The student will learn:
- Students will be able to independently filter their datasets by sub-groups.
- Students will be able to create new columns.
- Students will be able to separate columns and add the parts under new headings.
Obtaining and Examining Our Datasets (5 Minutes)
For this lesson, we will be using two different datasets: one is a numerical dataset filled with arbitrary values, and the other is a more 'real world' dataset with both numerical and nonnumerical values. The purpose of the data is to show how well users perform in a game.
Our numerical dataset will be used for calculating columns, because those functions require us to form equations and numerical data is much easier to work with in an equation. Our 'real world' dataset will be used for splitting columns.
Calculating Columns (25 Minutes)
Let us say we have two or more columns of numerical data from which we want to calculate something, such as their sum. We can do that by forming an equation that adds up the desired headers; the calculation modifies the DataFrame by inserting a new column that holds the new values.
First, let us load the file and create a copy of the original frame. Next, we add a column to the frame, passing the heading title, 'total games', followed by our equation (Wins + Losses) as the second parameter. Finally, we output both the original frame (copy) and the newly modified frame (frame). We have now created a new column named 'total games', a statistic that is essential in the gaming world.
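The steps above can be sketched as follows. This is a minimal sketch, assuming the frame is a pandas DataFrame; the lesson does not name its library, so pandas bracket assignment stands in here for the two-parameter call described above. The sample rows and the 'Player' column are hypothetical, built in memory so the example runs without the lesson's file.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's numerical file: these rows and the
# 'Player' column are made up so the sketch runs on its own.
frame = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Wins": [10, 4, 7],
    "Losses": [5, 6, 3],
})

# Keep an untouched copy of the original frame for comparison.
copy = frame.copy()

# Add the new column: heading title on the left, equation on the right.
frame["total games"] = frame["Wins"] + frame["Losses"]

print(copy)   # original frame, without 'total games'
print(frame)  # modified frame, with 'total games'
```

With a real file, the in-memory frame would be replaced by loading the CSV (for example with pd.read_csv), and the rest of the steps stay the same.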
If you would like more detailed steps, the column calculation tutorial on our website works through the 'random.csv' dataset. As a reminder, the values in that dataset are arbitrary; they exist for learning purposes, so that our equations stay simple.
Column Splitting (25 Minutes)
Column splitting is used when a single column holds data that needs to be separated. As an example, consider the following column:
We see that each entry lists an ice cream flavor, immediately followed by its price. In a sense, we could say we are trying to make our data tidy. Suppose we wanted to tidy up our data automatically by making new columns for 'Ice Cream' and 'Price'.
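One way such a split might look is sketched below, assuming a pandas DataFrame and assuming the flavor and price are separated by a space. The 'Item' column name and its values are hypothetical, since the lesson's example column is not reproduced here.

```python
import pandas as pd

# Hypothetical data: flavor and price packed into one column.
frame = pd.DataFrame({"Item": ["Vanilla 2.50", "Chocolate 3.00", "Strawberry 2.75"]})

# Split each entry once, at the last space, so a multi-word flavor
# would stay in one piece. expand=True returns the parts as columns.
parts = frame["Item"].str.rsplit(" ", n=1, expand=True)

# Store the parts under the new headings and convert prices to numbers.
frame["Ice Cream"] = parts[0]
frame["Price"] = parts[1].astype(float)

# Drop the combined column now that it has been separated.
frame = frame.drop(columns="Item")

print(frame)
```

Splitting from the right is a design choice for this made-up format: the price is always the last token, while a flavor name may contain spaces.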
We can achieve this separation by column splitting. For this section, we will be modifying our 'League.csv' dataset and separating various factors, following our tutorial on column splitting. Notice that the tutorial has two major sections, the second of which is a more complicated way to split columns using inheritance. In computer science, inheritance is when we define a new class that is based on an existing class, so that objects of the new class reuse the existing class's behavior.
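To make the idea of inheritance concrete, here is a small sketch in plain Python. It is not the tutorial's code: the class names Splitter and PriceSplitter are hypothetical and chosen only to illustrate a new class building on an existing one.

```python
class Splitter:
    """Base class: splits a string in two at the last separator."""

    def __init__(self, sep=" "):
        self.sep = sep

    def split(self, text):
        # rpartition returns (before, separator, after) for the last match.
        left, _, right = text.rpartition(self.sep)
        return left, right


class PriceSplitter(Splitter):
    """Inherits from Splitter and converts the second part to a number."""

    def split(self, text):
        # Reuse the parent's splitting logic, then add the conversion.
        left, right = super().split(text)
        return left, float(right)


result = PriceSplitter().split("Mint Chip 3.25")
print(result)
```

PriceSplitter never defines its own constructor or separator handling; it gets both from Splitter, which is the reuse that inheritance provides.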
Wrap-up with Data Transformations (5 Minutes)
Let us recall the two previous sections on cleaning up our data and focus on why data transformations are important. Again, let us return to our original AskAManager dataset and talk about what can be modified or removed to make it a tidy dataset.
In the next tutorial, we will discuss Scatterplots and Correlations, which covers understanding scatter plots, correlation, and R^2.