Tutorial: Databases

This tutorial shows us how to transform our database tables into meanful summarized data

Interpreting Summarized Data

Databases contain massive amounts of data which can be utilized in the world of Data Science. This could be using tables to measure central tendency, transform our data, or even create a visual representation of a table inside of our database. Recall that the DataFrames object is a container we heavily use for data manipulation and data representation. Because we can easily convert database tables as DataFrames in Quorum Studios, the functionality of a database has become so versatile. We will be able to draw conclusions from our database tables using summary statistics.

DataFrame frame

For this tutorial, we will be measuring the mean, median, variance, and standard deviation of our "grocerylist" table inside of our database. We will want to connect our database once again as well as create a DataFrame object. To learn more about how to convert our tables into a DataFrame we can use the following tutorial on converting a data table to a DataFrame.

Assuming that our "grocerylist" table is now a DataFrame, we will be using the "frame" object for the rest of the tutorial and find the Mean(), Median(), StandardDeviation(), and Variance(). The other actions we will be using with our "frame" are AddSelectedColumn(integer columnNumber) and EmptySelectedColumns().

Let us show an example of how to find the mean because the other measures also follow this same format. We first want to select the column we are going to measure. Recall that the columns in our "grocerylist" table are: FoodID, FoodName, FoodCost, FoodCalories, and FoodQuanity as well as know that our first column, FoodID column is at element 0 in our "frame."

Using our frame, we will select the "FoodCost" column which is also column 2 inside the frame using the action AddSelectedColumn(integer columnNumber). Then, we will properly format our data and call the Mean() action with our frame. After calling Mean(), we will need to empty our DataFrame using the action EmptySelectedColumns(). We do this because we can only measure one central tendency at a time and it also helps use different columns for calculations. This same process should be followed for Median(), Variance(), and StandardDeviation().

frame:AddSelectedColumn(2)
output " "
output "Mean of the FoodCost:"
output frame:Mean()
frame:EmptySelectedColumns()

Now that we have our data measures, we can describe our data in a more presentable way versus showing a data table. Data tables are great for holding massive amounts of data, but they are difficult to represent without showing any sort of quantitative measure. To learn more about Data Science and how to use DataFrames in a more complex manner, we can reference the Introduction to Data Science learning track.

We can view our entire program below.

use Libraries.Data.Database.Database
use Libraries.Compute.Statistics.DataFrame

Database database
database:Connect("localhost", "myfirstdatabase", "root", "password")

output database:IsConnected()

DataFrame frame

frame = database:FindAsDataFrame("grocerylist")

output frame:ToText()

frame:AddSelectedColumn(2)
output " "
output "Mean of the FoodCost:"
output frame:Mean()
frame:EmptySelectedColumns()

frame:AddSelectedColumn(3)
output " "
output "Median of the FoodCalories:"
output frame:Median()
frame:EmptySelectedColumns()

frame:AddSelectedColumn(4)
output " "
output "Standard of the FoodQuantity:"
output frame:StandardDeviation()
frame:EmptySelectedColumns()

frame:AddSelectedColumn(4)
output " "
output "Variance of the FoodQuantity:"
output frame:Variance()

Next Tutorial

In the next tutorial, we will discuss Database Chart creations, which describes how to create a bar chart from a database table.