Data Science 1

A set of problems that helps you practice making calculations with data loaded from a spreadsheet.

Using Data Frames

Data frames are objects that can load data from a spreadsheet and let you make calculations and charts from that data. As you did with the Math library, add the use block for the DataFrame library and create a DataFrame object. Then run the program to see some of the data.
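
For readers who also want to see the idea in text-based code, the following is a minimal Python sketch of what a data frame does when it loads a spreadsheet: it reads the title row, then keeps the remaining rows as data. It uses Python's standard csv module, not the DataFrame library from this lesson, and the sample text and its values are made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for a spreadsheet file; the values are made up.
SAMPLE_CSV = """Name,Minimum Weight,Maximum Weight
Alpha,6,10
Beta,8,15
Gamma,5,9
"""

def load_rows(text):
    """Parse CSV text into a list of column titles and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)          # the first line holds the column titles
    rows = [row for row in reader] # every remaining line is one row of data
    return header, rows

header, rows = load_rows(SAMPLE_CSV)
print(header)
print(rows[0])
```

Note that every value arrives as text, so a numeric column has to be converted before any calculation is made on it.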

Loading CSV Files

This program needs to load a data set and output the first four columns and two rows of that data set. Any of the load blocks in the tray will work, but you need to find the right one. Add a block that loads a data set that contains the value 175049.0 in the fourth column and first row of data.

Displaying Data

Display the first 5 rows of data from the covid19.csv spreadsheet. Don't include the column titles in your row count.
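
The point about not counting the column titles can be sketched in Python as follows. This is an illustration under assumptions, not the lesson's own blocks: the function name first_data_rows and the sample data are hypothetical, and the real covid19.csv has different columns and values.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for a spreadsheet; the real file's contents differ.
SAMPLE_CSV = """Region,Doses
A,100
B,200
C,300
D,400
E,500
F,600
"""

def first_data_rows(text, count):
    """Return the title row plus the first `count` rows of data."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                       # consume the titles so they are not counted
    return header, list(islice(reader, count))  # take at most `count` data rows

header, rows = first_data_rows(SAMPLE_CSV, 5)
print(header)
for row in rows:
    print(row)
```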

Selecting Data

This program uses the Cats.csv file to find the mean of a column from the data. Find the column that has a mean of 15.417910447761205 by choosing a select column block from the tray, dropping it into the program before the mean is calculated, and running it.

Only four columns in this data can be used to find the mean: Minimum Life Span at index 2, Maximum Life Span at index 3, Minimum Weight at index 4, and Maximum Weight at index 5.
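
One way to approach this kind of search is to compute the mean of each candidate column and compare the results with the target value. The Python sketch below assumes a small made-up table in which indexes 0 and 1 hold text and indexes 2 through 5 hold numbers; the rows are not taken from Cats.csv, and column_mean is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical rows: two text columns, then four numeric columns at indexes 2-5.
rows = [
    ["Alpha", "X", 10, 14, 6, 10],
    ["Beta", "Y", 12, 16, 8, 15],
    ["Gamma", "Z", 9, 15, 5, 9],
]

def column_mean(rows, index):
    """Select one column by its zero-based index and return its mean."""
    values = [float(row[index]) for row in rows]
    return mean(values)

# Try every numeric column so the means can be compared with a target value.
means = {index: column_mean(rows, index) for index in range(2, 6)}
for index, value in means.items():
    print(index, value)
```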

Finding the Mean

Over the next six problems you will work to build up a program that calculates central tendencies for two columns of data located in the covid19.csv file (originally from the CDC website).

For this problem, focus on calculating the mean for the 1st Dose Allocations column. First, read through the existing program and decide which block in the tray needs to be added to calculate the mean. Second, notice that column index 2 corresponds to the 1st Dose Allocations column, which is the third column in the covid19.csv file, because column indexes start at zero. This column is selected before the mean calculation is done. Finally, once you have added the mean action block, run the program. The following is the output you should expect to get.

Central Tendencies of the following Column: 1st Dose Allocations
Mean: 62061.63265306125
Median: 0.0
Standard Deviation: 0.0
Variance: 0.0
Kurtosis: 0.0

Finding the Median

Over this and the next few problems you will work to build up a program that calculates central tendencies for two columns of data located in the covid19.csv file (originally from the CDC website).

For this problem, focus on calculating the median for the 1st Dose Allocations column. Once you have added the median action block, run the program. The following is the output you should expect to get.

Central Tendencies of the following Column: 1st Dose Allocations
Mean: 62061.63265306125
Median: 38025.0
Standard Deviation: 0.0
Variance: 0.0
Kurtosis: 0.0
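
In the output above the mean is noticeably larger than the median, which is the pattern you would expect when a few very large values pull the mean upward. The Python sketch below shows the effect on a small made-up list; the values and the helper name middle_value are hypothetical and are not taken from covid19.csv.

```python
from statistics import mean, median

# Hypothetical values with one very large entry, loosely mimicking a skewed column.
values = [10.0, 20.0, 30.0, 40.0, 400.0]

def middle_value(data):
    """Return the median: the middle of the sorted values."""
    ordered = sorted(data)
    half = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[half]                        # odd count: the single middle value
    return (ordered[half - 1] + ordered[half]) / 2  # even count: average the two middle values

print("Mean:", mean(values))
print("Median:", middle_value(values))
```

The hand-written function agrees with the library's median, and the single large value raises the mean without moving the median.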

Finding the Standard Deviation

Over this and the next few problems you will work to build up a program that calculates central tendencies for two columns of data located in the covid19.csv file (originally from the CDC website).

For this problem, focus on calculating the standard deviation for the 1st Dose Allocations column. Standard deviation measures how far the values in the data typically fall from the mean; it is the square root of the variance. Once you have added the standard deviation action block, run the program. The following is the output you should expect to get.

Central Tendencies of the following Column: 1st Dose Allocations
Mean: 62061.63265306125
Median: 38025.0
Standard Deviation: 81206.03894539685
Variance: 0.0
Kurtosis: 0.0
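
As a sketch of what a standard deviation calculation involves, the Python function below takes the square root of the averaged squared distances from the mean. Whether the lesson's block divides by the number of values (population form) or by one less (sample form) is not stated here, so the function supports both as an assumption. The data and the name standard_deviation are hypothetical.

```python
import math
from statistics import mean, pstdev, stdev

# Hypothetical values chosen so the population result is a round number.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def standard_deviation(data, sample=True):
    """Square root of the averaged squared distances from the mean."""
    center = mean(data)
    squared_total = sum((x - center) ** 2 for x in data)
    # Sample form divides by n - 1; population form divides by n.
    divisor = len(data) - 1 if sample else len(data)
    return math.sqrt(squared_total / divisor)

print("Sample:", standard_deviation(values))
print("Population:", standard_deviation(values, sample=False))
```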

Finding the Variance

Over this and the next couple of problems, you will work to build up a program that calculates central tendencies for two columns of data located in the covid19.csv file (originally from the CDC website).

For this problem, focus on calculating the variance for the 1st Dose Allocations column. Variance measures how far a set of numbers is spread out from its mean, and it is calculated from the squared differences between each value and the mean. Once you have added the variance action block, run the program. The following is the output you should expect to get.

Central Tendencies of the following Column: 1st Dose Allocations
Mean: 62061.63265306125
Median: 38025.0
Standard Deviation: 81206.03894539685
Variance: 6594420761.20131
Kurtosis: 0.0
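
Variance and standard deviation are linked: squaring the standard deviation gives the variance. The short Python check below applies that relationship to the two numbers copied from the expected output above, allowing a small tolerance for floating-point rounding. The variable names are hypothetical.

```python
import math

# Values copied from the expected output above.
reported_standard_deviation = 81206.03894539685
reported_variance = 6594420761.20131

# Squaring the standard deviation should reproduce the variance,
# up to a small floating-point rounding difference.
squares_match = math.isclose(
    reported_standard_deviation ** 2, reported_variance, rel_tol=1e-9
)
print(squares_match)
```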

Finding the Kurtosis

Over this and the next problem, you will work to build up a program that calculates central tendencies for two columns of data located in the covid19.csv file (originally from the CDC website).

For this problem, focus on calculating the kurtosis for the 1st Dose Allocations column. Kurtosis describes how heavy the tails of a distribution are compared with the tails of a normal distribution. Once you have added the kurtosis action block, run the program. The following is the output you should expect to get.

Central Tendencies of the following Column: 1st Dose Allocations
Mean: 62061.63265306125
Median: 38025.0
Standard Deviation: 81206.03894539685
Variance: 6594420761.20131
Kurtosis: 14.60309669876606
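
There are several common kurtosis formulas, and the one used by the lesson's kurtosis block is not stated here. As an illustration only, the Python sketch below computes population excess kurtosis, which is the fourth central moment divided by the squared variance, minus 3 so that a normal distribution scores 0. A library that applies a bias correction will report somewhat different numbers. The data and the name excess_kurtosis are hypothetical.

```python
from statistics import mean

def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (an assumed formula)."""
    center = mean(data)
    count = len(data)
    m2 = sum((x - center) ** 2 for x in data) / count  # second central moment (variance)
    m4 = sum((x - center) ** 4 for x in data) / count  # fourth central moment
    return m4 / (m2 ** 2) - 3

# Hypothetical data sets: one fairly even, one with a single extreme value.
even_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
heavy_tail_values = [0.0] * 9 + [100.0]

print("Even:", excess_kurtosis(even_values))
print("Heavy tail:", excess_kurtosis(heavy_tail_values))
```

A large positive result, like the one in the expected output above, suggests that the column contains a few values far from the rest.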

Practice Calculating

In this final problem, you will work to build up the program so that it calculates central tendencies for the 2nd Dose Allocations column located in the covid19.csv file (originally from the CDC website).

For this problem, focus on calculating all the same central tendencies for the 2nd Dose Allocations column. First, add the block that removes the selected column. Then add the selection block for the new column. Next, add each of the calculations. The output statements will then display the new values for each of the calculations. Once you have completed all the steps, run the program. The following is the output you should expect to get.

Central Tendencies of the following Column: 1st Dose Allocations
Mean: 62061.63265306125
Median: 38025.0
Standard Deviation: 81206.03894539685
Variance: 6594420761.20131
Kurtosis: 14.60309669876606
Central Tendencies of the following Column: 2nd Dose Allocations
Mean: 61923.45238095237
Median: 38025.0
Standard Deviation: 81217.91840322685
Variance: 6596350269.753216
Kurtosis: 14.611164710632755
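
The pattern in this problem, selecting a different column and repeating the same calculations, can be sketched in Python by putting the calculations in one function and calling it once per column. The table below is made up, the position of the 2nd Dose Allocations column at index 3 is an assumption, and summarize is a hypothetical helper. The sample forms of standard deviation and variance are used, which may differ from the lesson's blocks.

```python
from statistics import mean, median, stdev, variance

# Hypothetical table; the real covid19.csv has different rows and values.
header = ["Region", "Week", "1st Dose Allocations", "2nd Dose Allocations"]
rows = [
    ["A", "w1", 100.0, 100.0],
    ["B", "w1", 200.0, 180.0],
    ["C", "w1", 600.0, 620.0],
]

def summarize(rows, index):
    """Select one column by index and run the same calculations on it."""
    values = [row[index] for row in rows]
    return {
        "Mean": mean(values),
        "Median": median(values),
        "Standard Deviation": stdev(values),
        "Variance": variance(values),
    }

# Reuse the same calculations for each selected column.
for index in (2, 3):
    print("Central Tendencies of the following Column:", header[index])
    for name, value in summarize(rows, index).items():
        print(name + ":", value)
```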