What is Tidy Data?

There are many ways to structure a dataset for analysis. In practice anything goes, but a consistent format has advantages. Notably, it lets 1) humans reading the data know ahead of time how it is structured and 2) computers rely on that structure when running statistical tests and drawing charts. For this reason, Quorum has adopted what Hadley Wickham calls the tidy data format.

The basic idea behind tidy data is that a dataset must have three properties:

  • Variables must be in columns
  • Observations must be in rows
  • Individual cells must represent only one value
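
The third property is the one most often broken in practice. As a minimal sketch, written in Python rather than Quorum and using made-up data, the records below pack two values into one "bp" cell, and the hypothetical helper tidy_rows splits that cell so each value gets its own column:

```python
# Hypothetical untidy records: each "bp" cell packs two values into one string.
untidy = [
    {"patient": "a", "bp": "120/80"},
    {"patient": "b", "bp": "135/85"},
]


def tidy_rows(rows):
    """Split each packed "bp" cell so that every cell holds exactly one value."""
    tidy = []
    for row in rows:
        systolic, diastolic = row["bp"].split("/")
        tidy.append({
            "patient": row["patient"],
            "systolic": int(systolic),
            "diastolic": int(diastolic),
        })
    return tidy


tidy = tidy_rows(untidy)
for row in tidy:
    print(row)
```

After the split, systolic and diastolic are separate variables in separate columns, and each row is still one observation.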

In Quorum, charts, statistical tests, and other data-related actions all assume that data are in tidy format. This means that some systems must sometimes convert the data to another format to conduct a test or similar operation, but this conversion can be done automatically. Here is an example of data in tidy format:

Tidy Format Example
y     x1    x2    x3
2     4     -9    9
6     7     -19   19
3     4     -18   18
9     8     -16   16
15    17    -2    2
1     3     -4    4

In this example, y, x1, x2, and x3 are all variables, and the first row is a header containing the names of those variables. Each remaining row is one observation, holding one data point for each variable. Each cell contains only one value.
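
Because the layout is predictable, a program can treat each column as one variable without any extra description of the data. As a minimal sketch, in Python rather than Quorum, the example can be written as CSV text and read into one list per variable. The name load_columns is hypothetical, and the cell values assume each row of the example splits into four numbers in the order y, x1, x2, x3:

```python
import csv
import io

# The example written as CSV text: a header row, then one observation per row.
# Assumption: each row splits into four integer values (y, x1, x2, x3).
TIDY_CSV = """y,x1,x2,x3
2,4,-9,9
6,7,-19,19
3,4,-18,18
9,8,-16,16
15,17,-2,2
1,3,-4,4
"""


def load_columns(text):
    """Read tidy CSV text into a dict mapping each variable name to its values."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, cell in row.items():
            columns[name].append(int(cell))
    return columns


columns = load_columns(TIDY_CSV)

# Any per-variable statistic is now a simple operation on one list.
mean_y = sum(columns["y"]) / len(columns["y"])
print(columns["x1"])
print(mean_y)
```

The header row supplies the variable names, so nothing besides the data itself is needed to compute a statistic such as the mean of y.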

Next Tutorial

In the next tutorial, we will discuss CSV exporting, which describes how to export a CSV file from Excel and Google Sheets.