## Calculating the Kurtosis

In statistics, kurtosis is a measure of how heavy or light the tails of our data are. By heavy tails, we mean that more of the data sits far from the mean, so extreme values are more common. By light tails, we mean the opposite; for example, a dataset that is largely flat across the board has light tails. One reason we care about this is that many statistical procedures assume data is distributed in a particular way, and unusually heavy or light tails can cause problems in data analysis. Put simply, we use kurtosis to check whether our data has more or fewer extreme values than we would expect.

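$\text{Kurtosis} = \frac{n\left(n+1\right)}{\left(n-1\right)\left(n-2\right)\left(n-3\right){s}^{4}}\sum_{i=1}^{n}{\left(x_{i}-\overline{x}\right)}^{4}-\frac{3{\left(n-1\right)}^{2}}{\left(n-2\right)\left(n-3\right)}$

Here $n$ is the number of values, $x_{i}$ is the $i$-th value, $\overline{x}$ is the mean, and $s$ is the standard deviation.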
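
As a quick worked example of the formula, take the five values 1, 2, 3, 4, 10. These are made-up numbers chosen only for illustration, and we assume $s$ is the sample standard deviation (with $n-1$ in the denominator), as in Apache Commons:

$n = 5,\quad \overline{x} = 4,\quad \sum {\left(x_{i}-\overline{x}\right)}^{2} = 9+4+1+0+36 = 50$

${s}^{2} = \frac{50}{4} = 12.5,\quad {s}^{4} = 156.25$

$\sum {\left(x_{i}-\overline{x}\right)}^{4} = 81+16+1+0+1296 = 1394$

$\text{Kurtosis} = \frac{5\cdot 6}{4\cdot 3\cdot 2\cdot 156.25}\cdot 1394-\frac{3\cdot {4}^{2}}{3\cdot 2} = 11.152-8 = 3.152$

The value 10 sits far from the rest of the data, which is what produces the positive result.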

While one might assume that the equation for kurtosis is standardized, in practice different statistical packages provide slightly different answers. In our case, we document the equation we used above. The packages generally provide similar answers, and our equation matches the one used in the Apache Commons Math library.
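
To make the point about differing packages concrete, here is a minimal sketch in plain Python (not Quorum, and not the library's implementation). It compares the bias-corrected formula given above with the simpler population definition, the fourth central moment divided by the squared variance, minus 3, which some tools use by default. The function names and the sample data are hypothetical, and $s$ is assumed to be the sample standard deviation.

```python
def sample_kurtosis(values):
    """Bias-corrected sample kurtosis, following the formula given above.

    Assumes s is the sample standard deviation (n - 1 in the denominator).
    """
    n = len(values)
    if n < 4:
        raise ValueError("the formula needs at least 4 values")
    mean = sum(values) / n
    sum_squares = sum((x - mean) ** 2 for x in values)
    sum_fourths = sum((x - mean) ** 4 for x in values)
    variance = sum_squares / (n - 1)  # s^2, so s^4 is variance ** 2
    if variance == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_fourths / variance ** 2
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second


def population_excess_kurtosis(values):
    """Population definition: m4 / m2^2 - 3, with no small-sample correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


# Hypothetical data, chosen only for illustration.
data = [1, 2, 3, 4, 10]
print(round(sample_kurtosis(data), 3))             # 3.152
print(round(population_excess_kurtosis(data), 3))  # -0.212
```

On a dataset this small the two definitions disagree noticeably. The gap shrinks as the number of values grows, because the correction factors approach 1 and 3 respectively.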

The kurtosis can be calculated by calling the helper action Kurtosis(), which is contained in the DataFrame class. To do this, we will use our 'frame' object and call Kurtosis() on it. In this case, we will be calculating the kurtosis of the area of dry bean classifications. Here is a brief description of how Kurtosis() works.

Kurtosis Function

| Function | Description | Usage |
| --- | --- | --- |
| dataFrameObject:Kurtosis() | This action takes the column that you have passed and calculates the kurtosis of that column. Note that it can only calculate the kurtosis of one column at a time. | frame:Kurtosis() |

Here is some code showing how to calculate the kurtosis:

//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame

//Create a DataFrame, which is essentially a table that understands
//Using the default loader is enough for our purposes
DataFrame frame

//Tell the frame we want the first column selected