Calculating the Skew
In statistics, skewness measures the lack of symmetry in a data set. When analyzing data, mathematical properties of that data can help us understand it. Two useful metrics are skew and, in the next section, kurtosis. Skew is a single number, positive or negative, that indicates whether the data is shifted in one direction or the other. A skew of 0 means that the data is symmetrical around the mean. A positive skew, or right skew, indicates that the tail of the data is longer above the mean. A negative skew, or left skew, indicates that the tail is longer below the mean. To calculate skew, we use an equation that is common in statistical packages, called the "Fisher-Pearson Standardized Moment Coefficient."
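For reference, the Fisher-Pearson coefficient is usually written as the third central moment divided by the second central moment raised to the power 3/2. The symbols below (n for the number of values, x_i for each value, and x-bar for the mean) are notation introduced here for illustration. Some packages also multiply the result by a sample-size correction, and this tutorial does not state which variant Skew() uses, so treat this as a sketch of the basic form only.

```latex
g_1 = \frac{m_3}{m_2^{3/2}},
\qquad
m_k = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^k,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

When the values are spread evenly on both sides of the mean, the cubed deviations cancel and g_1 is 0. A long tail above the mean makes m_3 positive, and a long tail below the mean makes it negative.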
The skew is calculated using the helper action Skew(), which is part of the DataFrame class. To do this, we will use our 'frame' object and call the action Skew(). In this case, we will calculate the skew of the area values in the dry bean classification data. Here is a brief description of how Skew() works.
This action takes the column that you have passed and calculates its skew. Note that it can only calculate the skew of one column at a time.
The following code shows how to calculate the skew:
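To make the one-column-at-a-time idea concrete, here is a minimal sketch in Python. It is not Quorum code and not the library's implementation of Skew(). The function name `skew`, the `table` dictionary, and the numbers in it are all made up for illustration (they are not taken from the dry bean data), and the sketch assumes the basic, uncorrected form of the Fisher-Pearson coefficient.

```python
def skew(values):
    """Basic Fisher-Pearson skew: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to calculate skew")
    mean = sum(values) / n
    # Second and third central moments (averages of squared and cubed deviations).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skew is undefined when all values are equal")
    return m3 / m2 ** 1.5


# A hypothetical two-column table; the values are invented for this example.
table = {
    "Symmetric": [1, 2, 3, 4, 5],
    "RightTailed": [1, 2, 3, 10],
}

# The calculation handles one column at a time, so we loop over the columns.
for name, column in table.items():
    print(name, skew(column))
```

The symmetric column gives a skew of 0, while the column with one large value above the mean gives a positive skew.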
//We need the DataFrame class to load in files for Data Science operations.
//Create a DataFrame, which is essentially a table that understands
//more information about the data that is being loaded.
//Using the default loader is enough for our purposes
//Tell the frame we want the first column selected
Try it Yourself!
Press the blue run button to execute the code in the code editor. Press the red stop button to end the program. Your program will work when the console outputs "Build Successful!"
Congrats! We have just learned how to calculate the skew! To view the whole file, we can click here.
In the next tutorial, we will discuss how to calculate kurtosis.