Filtering Columns
This tutorial demonstrates how to filter columns in our datasetsFiltering a Column
While filtering rows can be a somewhat complicated operation, filtering columns is straightforward. In a DataFrame, we can remove any column we want to get rid of by calling RemoveColumnAt() and telling it which one we want to remove. We can also copy a data frame first before doing so, to preserve the original data, but for now, let us try to transform our data by removing the unnecessary columns. Here is a brief description of what the function we are using does.
Class / Action | Description | Usage |
---|---|---|
DataFrame:RemoveColumnAt(integer columnNumber) | This function will remove unwanted columns from our dataset starting from 0 to N, where N is the number of total columns contained in the dataset. It takes in integers, signifying the corresponding column number to remove. | //This action removes various empty entries from our CSV file frame:RemoveColumnAt(3) |
To accomplish this, we will be going back to our AskAManager.csv file and begin to load it back into our DataFrame, 'frame' using the Load(text fileLocation) action. After properly loading our file, we will be able to use the RemoveColumnAt(integer num) action which will remove unwanted columns with empty entries. In this case, as when looking at the dataset, there are missing chunks of information on these columns: additional context on job title, other monetary comp, currency - other, and additional context on income. We would not want to have undefined values and mostly, we have decided that these columns are not useful pieces of information so it would be okay to get rid of them.
We can click this link to download the file for the CSV.//Create a DataFrame, which is essentially a table that understands
//more information about the data that is being loaded.
DataFrame frame
//This loads data relative to the project, so put the AskAManager.csv file in the Data/Miscellaneous folder
frame:Load("../Data/Miscellaneous/AskAManager.csv")
We want to figure out the corresponding column number to these columns. We will be calling RemoveColumnAt() action using our 'frame' object. Reminder that with a CSV file, columns start off at 0, therefore the corresponding column numbers for these headings we are filtering are as follows:
- Additional context on job title: 3
- Other monetary comp: 5
- Currency - other: 7
- Additional context on income: 8
//This action removes various empty entries from our CSV file
frame:RemoveColumnAt(3) // additional context on job title
frame:RemoveColumnAt(5) // other monetary comp
frame:RemoveColumnAt(7) // currency - other
frame:RemoveColumnAt(8) // additional context on income
Finally, we can output our dataset to confirm the filter using frame:ToText(). The full code is shown below.
//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Transforms.RemoveUndefinedRowsTransform
//Create a DataFrame, which is essentially a table that understands
//more information about the data that is being loaded.
DataFrame frame
//This loads data relative to the project, so put the AskAManager.csv file in the Data/Miscellaneous folder
frame:Load("../Data/Miscellaneous/AskAManager.csv")
//This action removes various empty entries from our CSV file
frame:RemoveColumnAt(3) // additional context on job title
frame:RemoveColumnAt(5) // other monetary comp
frame:RemoveColumnAt(7) // currency - other
frame:RemoveColumnAt(8) // additional context on income
//We can save the frame or output it to the console, like we are doing here.
output frame:ToText()
Try it Yourself!
Press the blue run button to execute the code in the code editor. Press the red stop button to end the program. Your program will work when the console outputs "Build Successful!"
To view the program we have made, we can download the program file
Next Tutorial
In the next tutorial, we will discuss splitting columns, which describes splitting columns.