Column Splittng

Let us immediately start off with an example to explain what it means to 'column split.' Imagine for a moment that we had a DataFrame below:

"player,year,season,stats 1,A,1,22-2 2,A,2,29-3 3,B,1,18-6 4,B,2,11-8 5,C,1,12-5 6,C,2,19-2"

The basic idea in splitting columns is we need a way to separate out the value in the cells. For simple cases, like if there is a character between them, we can use the SplitSelectedColumns(Array, text) action. For this, we first provide an array of the names of the headers for the new columns. Second, we provide it the character we want to split up the values by. Here is a brief description of how SplitSelectedColumns(Array, text) work:

Column Splitting Helper Functions
Class / Action Description Usage
dataFrameObject:SplitSelectedColumns(Array textHeader, text separatorChar)This action takes in an array that would be the new header for the created column as well as the character we are looking for to separate values by. The character will go through that selected column and create a new header column with just the separated value.frame:SplitSelectedColumns(newHeader,"-")

For this example, we will be using a sample file of player stats for an online game, league.csv. Here is an example of the CSV file:

Column Splitting Helper Functions
Player Tier LP Level Win Ratio Kills/Deaths/Assists
jianhunChallenger117053156-1124/33/11
TES 77Challenger113460209-1614/33/11
Shadow BongBongChallenger1121241312-2534/33/11
KING MIRChallenger1084654406-3554/33/11

We can click this link to download the file for the CSV.

What we want to do is try and separate the Win Ratio column so on one column, we have the wins and the next column we have the loses. So let us start off with downloading the provided file here. We will then want to create a DataFrame object, which we can call 'frame.' Our 'frame' object will contain all the actions we need in order to successfully split our column. We first want to load the file, which we can do by calling Load(text fileLocation) and put it in the file location of our league.csv file.

DataFrame frame
frame:Load("../Data/Culture and Entertainment/league.csv")

Next, we want to create a text array in which they will be the newly separated columns. Since we want to separate wins and loses, we will use the Add(text arrayElement) to add heading titles into our array.

Array headers
headers:Add("Wins")
headers:Add("Loses")

Next, we have to select the column we are trying to separate. Remember that in a CSV file, columns start off with 0 so in this case, the column with the non-separated wins and loses will be column 4. We would call the action AddSelectedColumn(integer columnNumber) and pass in 4 which will read in the column of 'Win Ratio.' An alternative method that accomplishes the same read-in is to use the action AddSelectedColumns(text columnHeader) which we would pass in the 'Win Ratio' column header and read in that same column. For this tutorial, we will just add the numerical assigned column.

frame:AddSelectedColumn(4)

The last step is to call the action SplitSelectedColumns(Array , text) on our 'frame' object. For the first parameter, we will pass in the 'headers' array we created previously and in the second parameter, we will want to pass in the character we want to separate the column by. What this does is that it will search through the column and find that separator character. Once it finds the character, it separates the values from the left and right hand sides and puts them into separate columns. In this dataset, the values we want to separate are divided by a '-' so we will be inserting that character.

We also want to save our newly created columns into a new DataFrame so we will assign this modified frame into a new frame called 'result.' We finally can output both DataFrames onto the console to see our changes.

DataFrame result = frame:SplitSelectedColumns(headers, "-")

Below is an example of the entire program:

use Libraries.Compute.Statistics.DataFrame
use Libraries.Containers.Array
    DataFrame frame
    frame:Load("../Data/Culture and Entertainment/league.csv")
    
    //create the names for the new columns
    Array headers
    headers:Add("Wins")
    headers:Add("Loses")

    //set which column or columns we want to split in this way
    //then split the values and output them to the console
    frame:AddSelectedColumn(4)
    DataFrame result = frame:SplitSelectedColumns(headers, "-")

    output "original frame"
    output frame:ToText()

    output "column split frame of wins/loses"
    output result:ToText()

Run the Example

Column Splitting for data transformations

To view the program we have made, we can download the program file

Column Splitting With Inheritance

While this works for a common case, for more complex examples, we may want to split up a value in any number of complicated ways. To do so is more complex. Specifically, we need to use Inheritance. Overriding the class named ColumnSplitter, then overriding one action. Here is an example of the class that implements the basic splitting shown above:
package Libraries.Compute.Statistics.Transforms
use Libraries.Containers.Arrayi
use Libraries.Compute.Statistics.DataFrame

class LiteralColumnSplitter is ColumnSplitter
text delimiter = undefined

action SplitValue(text value) returns Array
    if delimiter = undefined or delimiter:IsEmpty()
        alert("Cannot split the column based on an undefined text splitter.")
    end

    if value = undefined or value:IsEmpty()
        alert("Cannot split the column based on an undefined text value.")
    end

    return value:Split(delimiter)
end

action GetDelimiter returns text
    return delimiter
end

action SetDelimiter(text delimiter)
    me:delimiter = delimiter
end
end

Once we have created a new splitter, we can send it to any DataFrame using its Transform action. Now, we will go through the process on how to use this same idea with our DataFrames. Again, we will want to create a DataFrame object, which we can call it 'frame.' Our 'frame' object will contain all the actions we need in order to successfully split our column. We first want to load the file, which we can do by calling Load(text fileLocation) and put it in the file location of our league.csv file.

DataFrame frame
frame:Load("../Data/Culture and Entertainment/league.csv")

Next, we want to create a text array in which they will be the newly separated columns. Since we want to separate kills, deaths and assists, we will use the Add(text arrayElement) to add heading titles into our array.

Array  headers
headers:Add("Kills")
headers:Add("Deaths")
headers:Add("Assists")

Finally, we want to select the column we are splitting, 'Kills/Deaths/Assists' using our action AddSelectedColumns(text header). This will load our selected column into the DataFrame in order to properly split up the entries. We also want to call another action SplitSelectedColumns(Array headers, text charDelim, bool hasDelimeter) which takes in three parameters: the array we have created, a character separating each value, and boolean to verify such delimiter in the column. This is very similar to our previous splitting action. Now that we have our column set up, let's create a new DataFrame and call it 'result' and save the result from SplitSelectedColumns().

use Libraries.Compute.Statistics.DataFrame
use Libraries.Containers.Array

DataFrame frame

frame:Load("../Data/Culture and Entertainment/league.csv")
 
    //create the names for the new columns
    Array headers
    headers:Add("Kills")
    headers:Add("Deaths")
    headers:Add("Assists")


    text delimiter = undefined
    boolean hasDelimeter = true

    frame:AddSelectedColumns("Kills/Deaths/Assists")
    DataFrame result = frame:SplitSelectedColumns(headers, "/", hasDelimeter)

    output result:ToText()
    output frame:ToText()

Run the Example

Column Splitting with Inheritance for data transformations

To view the program we have made, we can download the program file