Calculating the Variance and Standard Deviation
In data science, standard deviation measures the variability of a set of numerical values. A low standard deviation means the values are clustered close together, whereas a high standard deviation means they are spread out over a wider range. Standard deviation is the square root of variance. Variance is the average of the squared deviations, so it describes how far the observations vary from their arithmetic mean. To compute it, find the difference between each value and the mean, square each difference, add the squares together, and divide that sum of squares by the number of values. Here are the equations for standard deviation and variance, where $N$ is the number of values, $x_i$ is each value, and $\mu$ is the mean:
Standard Deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
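To make the formulas concrete, here is a small worked example. The eight values below are made up for illustration and are not from the tutorial's dataset; the calculation divides by $N$, matching the "average of the squared deviations" definition.

$$
\begin{aligned}
x &= \{2, 4, 4, 4, 5, 5, 7, 9\}, \quad N = 8 \\
\mu &= \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5 \\
\sum_{i=1}^{N}(x_i - \mu)^2 &= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32 \\
\sigma^2 &= \frac{32}{8} = 4 \\
\sigma &= \sqrt{4} = 2
\end{aligned}
$$

If the same values were treated as a sample rather than a whole population, the sum of squares is commonly divided by $N - 1 = 7$ instead, giving a variance of $32/7 \approx 4.57$.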
While the mean and median are measures of central tendency, variance and standard deviation are measures of dispersion. Put another way, variance and standard deviation both describe how far the data points lie from a center point, the mean. Even though the two are mathematically related, this example uses both the Variance() action and the StandardDeviation() action.
After loading the data, we calculate the variance and standard deviation by calling two actions on our DataFrame, frame: Variance() and StandardDeviation(). Both operate on one column of our dataset. Here are brief descriptions of how these actions work:
| Action | Description | Usage |
| --- | --- | --- |
| dataFrameObject:StandardDeviation() | This action will calculate the standard deviation from a single column of data within the dataset. | frame:StandardDeviation() |
| dataFrameObject:Variance() | This action will calculate the variance from a single column of data within the dataset. | frame:Variance() |
Here is the code to calculate the variance and standard deviation:
//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame

//Create a DataFrame, which is essentially a table that understands
//more information about the data that is being loaded.
//Using the default loader is enough for our purposes
DataFrame frame

//This loads data relative to the project, so put the DryBeans.csv file in the Data/Miscellaneous folder
frame:Load("../Data/Miscellaneous/DryBeans.csv")

//Tell the frame we want the roundness column, which is identified by its header text
frame:AddSelectedColumns("roundness")

//When we call Variance or StandardDeviation, the frame now knows
//we are referencing the selected roundness column.
output "Variance: " + frame:Variance()
output "Standard Deviation: " + frame:StandardDeviation()
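To see what the two actions compute, the sketch below calculates the variance and standard deviation by hand, following the three steps described at the start of this tutorial. It is an illustration, not how the DataFrame implements these actions. The values are hypothetical stand-ins for a column of data, and the code assumes the Array class from Libraries.Containers and the SquareRoot action of the Math class in Libraries.Compute. It divides by the number of values (the population form); this tutorial does not say whether frame:Variance() divides by N or N - 1, so its results on real data may differ slightly.

```quorum
use Libraries.Containers.Array
use Libraries.Compute.Math

// Hypothetical sample values standing in for one column of data
Array<number> values
values:Add(2.0)
values:Add(4.0)
values:Add(4.0)
values:Add(4.0)
values:Add(5.0)
values:Add(5.0)
values:Add(7.0)
values:Add(9.0)

// Step 1: compute the arithmetic mean
number total = 0.0
integer i = 0
repeat while i < values:GetSize()
    total = total + values:Get(i)
    i = i + 1
end
number mean = total / values:GetSize()

// Step 2: average the squared deviations from the mean (divides by N)
number sumOfSquares = 0.0
number deviation = 0.0
i = 0
repeat while i < values:GetSize()
    deviation = values:Get(i) - mean
    sumOfSquares = sumOfSquares + deviation * deviation
    i = i + 1
end
number variance = sumOfSquares / values:GetSize()

// Step 3: the standard deviation is the square root of the variance
Math math
number standardDeviation = math:SquareRoot(variance)

output "Variance: " + variance
output "Standard Deviation: " + standardDeviation
```

For these eight values the mean is 5, the sum of squares is 32, and so the variance is 4 and the standard deviation is 2.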
Congrats! We have just learned how to calculate the variance and standard deviation!
In the next tutorial, we will discuss the IQR (interquartile range) and how to calculate it.