Loading Tidy Data Frames

Learn about Tidy format and DataFrames in Quorum

Learning Objectives

By the end of this tutorial, students will be able to:

  1. Run a Quorum program online
  2. Obtain a Comma Separated Value (CSV) file from the Quorum server
  3. Create a DataFrame
  4. Explain the purpose of Tidy format, where each column is a variable, each row is an observation, and each cell holds one data point

Discuss DataFrame (25 minutes)

In Quorum, DataFrame is a library designed to load and manipulate data. We begin our discussion by looking at the reference page for DataFrame. In this session, we will discuss each line of its example code and run the program online.
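
For reference during the discussion, here is a minimal sketch of what such a program might look like. It assumes that DataFrame lives in Libraries.Compute.Statistics and that the frame's contents can be printed with ToText. The example on the reference page may differ in its details.

```quorum
use Libraries.Compute.Statistics.DataFrame

// Create an empty DataFrame (assumed to live in Libraries.Compute.Statistics)
DataFrame frame

// Load the CSV file; the path is relative to where the program runs
frame:Load("data/Height of Male and Female by Country 2022.csv")

// Print the contents of the frame as text
output frame:ToText()
```

Each statement does one job: name the library, create the frame, load the file, and print the result. This makes the program convenient to walk through line by line.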

Obtain a Sample Comma Separated Value (CSV) File from the Quorum Server and Examine It (20 minutes)

Notice that when we run a program on the Quorum website, it gets its data from somewhere. The code gives us a clue about where, and the relevant line is this one:

frame:Load("data/Height of Male and Female by Country 2022.csv")

The Load action tells us where on the Internet the file lives. If we type that exact path into a browser with quorumlanguage.com at the front, the browser will find the file. As such, we can download the height file from this link or by copying the path inside of Load. Note that the data is written in Tidy format.

Create a Quorum Studio Project to Run the Same Code Offline (15 minutes)

Now that we have discussed DataFrame and Tidy format and have run our program online, we will make a project in Quorum Studio and run the identical program offline. If needed, we can reference the Getting Started tutorial when making the project. Once the project exists, we place the file we downloaded into a folder named data, inside the folder where we created the project. We then run the program and make sure we get the same output as the online version. The goal here is to be able to take a program written online and run it either there or on our desktops. This is important because many kinds of software in data science are too complex to run in the browser, especially as data sets get large. As a rule of thumb, a few hundred lines of data is fine in the browser, but if we have millions or billions, running our code on the desktop will make it go much faster.

Next Tutorial

In the next tutorial, Overview of Chart Types and Examples, we will learn about different types of charts and when to use them.