Filtering Data in a Dataset

This tutorial explains how to use the DataFrame to filter a dataset.

Filtering the data

Being able to look at specific sections of our data is useful when our dataset has a large number of entries. For example, say that we have a survey sample of 2000 entries on music and we want to look only at the entries that contain "indie pop" within the genre category. Instead of manually going through our data to find the individual rows containing "indie pop," we can make it easier on ourselves by filtering.

The example we will be looking at is about dog breeds. This dataset lists over 100 entries of different dog breeds and describes their life span, weight, height, temperament, etc. Looking at all these different dogs could be a bit overwhelming, and right now we only want to focus on dogs that may be good in an apartment, so it makes sense to look for smaller dogs. Instead of scrolling through every row to manually look for small dogs, we can filter by the "Breed Group." To do so, we can select the dog breeds we want using the Filter(text source) function of the DataFrame class.

Here is a brief description of how the Filter() function works.

Filter Helper Function
Function: frame:Filter(text source)
Description: This function takes in a text expression similar to writing an expression in Quorum. The expression is evaluated to true or false on each row, and the result decides whether that row is kept in the filtered data.
Usage:
text dq = ""
dq = dq:GetDoubleQuote()
frame:Filter("Breed = " + dq + "Toy" + dq)

Let's try an example and retrieve all rows that contain "Toy" as the breed group. Doing so will create a copy of our data frame that contains ONLY the rows with "Toy"; in other words, we are filtering our dataset. Please note that the Quorum Language has a usability flaw in regard to double quotes (notably, Quorum lacks escape characters), so we must build our double quotes manually as shown above.

//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Transforms.RemoveUndefinedRowsTransform

//Create a DataFrame, which is essentially a table that understands 
//more information about the data that is being loaded.
DataFrame frame

//This loads data relative to the project. The file name and folder used here
//are assumptions; replace them with the location of the dog breeds CSV file.
frame:Load("Data/Dogs.csv")

text dq = ""
dq = dq:GetDoubleQuote()
//Quorum has no escape characters, so we build the double quotes manually.
//This makes Quorum interpret the word "Toy" as text.
//The expression then compares the value in the column Breed with
//the constant text value "Toy".
frame:Filter("Breed = " + dq + "Toy" + dq)

//We can save the frame or output it to the console, like we are doing here.
output frame:ToText()
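
To see exactly what text the Filter function receives, the expression can be stored in a variable and printed before it is used. The following is a minimal sketch that relies only on calls already shown in this tutorial; the variable names group and expression are hypothetical and chosen for illustration.

```quorum
//Build the double quote character manually, since Quorum lacks escape characters.
text dq = ""
dq = dq:GetDoubleQuote()

//Hypothetical variable holding the value we want to match.
text group = "Toy"

//Assemble the filter expression piece by piece.
text expression = "Breed = " + dq + group + dq

//Print the expression to check it before passing it to frame:Filter(expression).
output expression
```

With these values, the console should show Breed = "Toy", which is the same text the Filter call in the example is given. Changing the value of group changes which rows are kept, without having to retype the quotes.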

Try it Yourself!

Press the blue run button to execute the code in the code editor. Press the red stop button to end the program. Your program will work when the console outputs "Build Successful!"

Congratulations! We have successfully filtered the data from a dataset.