Tidy Data
This tutorial introduces tidy data and explains why the Quorum Programming Language uses it.
What is Tidy Data?
There are many ways to structure datasets for analysis. Although no single format is required, a consistent format has clear advantages. It lets 1) people reading the data know ahead of time how it is structured, and 2) software rely on that structure when running statistical tests and building charts. For this reason, Quorum has adopted what Hadley Wickham calls the tidy data format.
The basic idea behind tidy data is that a dataset must have three properties:
- Variables must be in columns
- Observations must be in rows
- Individual cells must represent only one value
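To make the first and third properties concrete, here is a minimal Python sketch. It is not Quorum code, because this tutorial does not show Quorum's data API. It assumes a table stored as a header list plus one list per row, and a hypothetical messy column named `x1/x2` that packs two values into each cell, separated by `/`. The helper `split_column` is a hypothetical name used only for illustration. It turns that combined column into two single-value columns, so each variable gets its own column and each cell holds one value.

```python
def split_column(header, rows, column, new_names, sep="/"):
    """Split one combined column into several single-value columns.

    header: list of column names; rows: list of row lists (one observation each).
    Hypothetical helper for illustration, not part of any Quorum library.
    """
    idx = header.index(column)
    new_header = header[:idx] + new_names + header[idx + 1:]
    new_rows = []
    for row in rows:
        parts = row[idx].split(sep)
        if len(parts) != len(new_names):
            raise ValueError(f"cell {row[idx]!r} does not split into {len(new_names)} values")
        # Replace the combined cell with its separate values, in place.
        new_rows.append(row[:idx] + parts + row[idx + 1:])
    return new_header, new_rows


# Messy (assumed) input: the "x1/x2" column stores two values in every cell.
messy_header = ["y", "x1/x2"]
messy_rows = [["2", "4/-9"], ["6", "7/-19"]]

header, rows = split_column(messy_header, messy_rows, "x1/x2", ["x1", "x2"])
print(header)  # ['y', 'x1', 'x2']
print(rows)    # [['2', '4', '-9'], ['6', '7', '-19']]
```

The sketch raises an error instead of guessing when a cell does not split into the expected number of values, since silently padding or truncating would break the one-value-per-cell rule.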
In Quorum, all charts, statistical tests, and other data-related actions assume that data are in tidy format. Some operations may need to convert the data into another layout internally, for example to run a particular test, but this conversion can be done automatically. Here is an example of data in tidy format:
y | x1 | x2 | x3 |
---|---|---|---|
2 | 4 | -9 | 9 |
6 | 7 | -19 | 19 |
3 | 4 | -18 | 18 |
9 | 8 | -16 | 16 |
15 | 17 | -2 | 2 |
1 | 3 | -4 | 4 |
In this example, y, x1, x2, and x3 are all variables, and the first row is a header holding their names. Each remaining row is one observation, with one data point for each variable, and each cell contains exactly one value.
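Because the table follows the three tidy properties, a program can rely on its shape. The following minimal Python sketch (not Quorum code; `parse_tidy`, `column`, and `to_long` are hypothetical helper names) parses the example table, rejects any row whose cell count differs from the header, and reads a variable by name. The last helper illustrates, under that same assumption, the kind of automatic reshaping mentioned above. It produces one `(observation, variable, value)` triple per cell, a "long" layout that some tools expect.

```python
# The example table from this tutorial, as pipe-delimited text.
TABLE = """\
y | x1 | x2 | x3 |
---|---|---|---|
2 | 4 | -9 | 9 |
6 | 7 | -19 | 19 |
3 | 4 | -18 | 18 |
9 | 8 | -16 | 16 |
15 | 17 | -2 | 2 |
1 | 3 | -4 | 4 |
"""


def cells(line):
    """Split one pipe-delimited line into trimmed cell strings."""
    return [c.strip() for c in line.strip().strip("|").split("|")]


def parse_tidy(text):
    """Return (header, rows): variable names and one list of numbers per observation."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = cells(lines[0])
    rows = []
    for line in lines[2:]:  # skip the header and the ---|--- separator
        values = cells(line)
        if len(values) != len(header):
            raise ValueError(f"{line!r}: expected {len(header)} cells, got {len(values)}")
        # float() raises ValueError on a cell such as "4/7" (two values in one cell).
        rows.append([float(v) for v in values])
    return header, rows


def column(header, rows, name):
    """Collect every observation's value for one variable."""
    i = header.index(name)
    return [row[i] for row in rows]


def to_long(header, rows):
    """Reshape into (observation, variable, value) triples, one per cell."""
    return [
        (obs, name, value)
        for obs, row in enumerate(rows, start=1)
        for name, value in zip(header, row)
    ]


header, rows = parse_tidy(TABLE)
y = column(header, rows, "y")
print(len(y), sum(y) / len(y))    # 6 6.0
print(to_long(header, rows)[:2])  # [(1, 'y', 2.0), (1, 'x1', 4.0)]
```

Every helper looks up a variable by its header name rather than by position in the file. That is only safe because the tidy layout guarantees one column per variable and one row per observation.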
Next Tutorial
In the next tutorial, we will discuss CSV exporting: how to export a CSV file from Excel and Google Sheets.