Calculating the Skew

In statistics, skewness can be thought of as a lack of symmetry in the data. When analyzing data, its mathematical properties can help us understand it. Two rather useful metrics are skew and, in the next section, kurtosis. We measure skew as a number, positive or negative, which indicates whether the data set is shifted in one direction or the other. A skew of 0 means that the data is symmetrical around the mean. A positive skew, or right skew, indicates that the tail of the data is longer above the mean. A negative skew, or left skew, is the opposite: the tail is longer below the mean. To calculate skew, we use an equation that is common in statistical packages, called the "Fisher-Pearson Standardized Moment Coefficient."

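$$\text{Skew} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

Here n is the number of values, x_i is the i-th value, x̄ is the mean, and s is the sample standard deviation.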
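As a quick check of how the formula behaves, here is a worked example on a small set of made-up values (1, 2, 3, 6), chosen only for illustration and not taken from the dry bean data. It assumes s is the sample standard deviation, which divides by n - 1.

$$
\begin{aligned}
n &= 4, \qquad \bar{x} = \frac{1 + 2 + 3 + 6}{4} = 3 \\
s &= \sqrt{\frac{(-2)^2 + (-1)^2 + 0^2 + 3^2}{n - 1}} = \sqrt{\frac{14}{3}} \approx 2.160 \\
\sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 &= \frac{(-2)^3 + (-1)^3 + 0^3 + 3^3}{s^3} = \frac{18}{s^3} \approx 1.786 \\
\text{Skew} &= \frac{4}{(3)(2)} \times 1.786 \approx 1.19
\end{aligned}
$$

The result is positive because the value 6 stretches the tail above the mean, which matches the description of a right skew.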

The skew is calculated using the helper action Skew(), which belongs to the DataFrame class. To do this, we will use our 'frame' object and call the action Skew(). In this case, we will calculate the skew of the area column in the dry bean classification data. Here is a brief description of how Skew() works.

Skew Function

| Function | Description | Usage |
| --- | --- | --- |
| dataFrameObject:Skew() | This action takes the column that you have selected and calculates the skew of that column. Note that it can only calculate the skew of one column at a time. | frame:Skew() |

Here is some code on how to calculate the skew:

//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame

//Create a DataFrame, which is essentially a table that understands 
//more information about the data that is being loaded.
//Using the default loader is enough for our purposes
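DataFrame frame

//Load the dry bean data set. The file path below is an assumption;
//change it to wherever the data file is stored.
frame:Load("Data/Dry_Bean_Dataset.csv")

//Tell the frame we want the first column (the area) selected.
//AddSelectedColumn is assumed here to be the selection action;
//check the DataFrame documentation for the exact name.
frame:AddSelectedColumn(0)

//Calculate the skew of the selected column and output it
output frame:Skew()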


Congrats! We have just learned how to calculate the skew!

Next Tutorial

In the next tutorial, we will discuss kurtosis and how to calculate it.