Once data has been cleaned and is ready to present, scientists share it with others. They rarely hand a person or community the raw data set, because raw data can be hard to work with: it may be huge, or it may be difficult to interpret. Instead, we often use charts to summarize the information in our data.
Consider an example from a dataset about dog breeds and their attributes, such as minimum and maximum lifespan, height, and weight. Suppose we want to show the breeds ordered by height, from shortest to tallest, in a line chart. The chart would make it clear that the Irish Wolfhound is the tallest breed and the Yorkshire Terrier is the shortest, with every other breed falling somewhere in between. As data scientists, we could then look for patterns, such as whether a breed's height hints at the purpose it was bred for. Patterns like these are much harder to spot in a data table alone. We will start by learning how to load our data for use with charts.
Loading Data and Formatting Data
We recommend running these programs in Quorum Studio.
To follow along, download the Dogs.csv dataset here.
To read in the data, we use a DataFrame to load the Dogs.csv file. To do that, we include the DataFrame library with a use statement and create a DataFrame:
use Libraries.Compute.Statistics.DataFrame

// Create a DataFrame and load the dogs dataset from the Data folder
DataFrame frame
frame:Load("../Data/Animals/Dogs.csv")
For this tutorial, Dogs.csv is stored in an Animals folder inside a Data folder, which keeps the datasets organized. The Data folder sits outside the project folder, in the parent directory that contains the project, which is why the path starts with "../".
To format the data, we select the columns from the CSV file that we want to include in our chart. Let us pull out the minimum and maximum height and weight, as well as the breed group:
| Breed Group | Max Weight | Min Weight | Max Height | Min Height |
On our frame, we call AddSelectedFactors to choose the column that labels the x-axis of the chart. We call AddSelectedColumns to choose the columns whose values are plotted on the y-axis; think of these as the contents of the bars, lines, and so on. Both AddSelectedFactors() and AddSelectedColumns() accept either a column header as text or the column's number in the table. In this example, we use the text headers to select our data.
// Pull out the breed group from the table to label the x-axis
frame:AddSelectedFactors("Breed Group")
// Pull out the weight and height columns from the table to plot on the y-axis
frame:AddSelectedColumns("Maximum Weight")
frame:AddSelectedColumns("Minimum Weight")
frame:AddSelectedColumns("Maximum Height")
frame:AddSelectedColumns("Minimum Height")
Now that we have read in our data, let us turn it into a chart. For this example, we will build a bar chart, which requires creating a chart object. We will learn more about bar charts and how to construct them properly in a later tutorial; for now, the goal is to see how a data frame and a chart work together.
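Putting the steps together, a complete program might look like the minimal sketch below. It reuses the loading and column-selection code from this tutorial and then builds and shows a bar chart. The chart part is an assumption, not code taken from this tutorial: it assumes the DataFrame has a BarChart action that returns a BarChart from Libraries.Interface.Controls.Charts, and that the chart has a Display action. Check these names against the Quorum library documentation for your version.

```quorum
// Sketch only. Assumption: DataFrame provides BarChart() and the returned
// chart provides Display(); verify both in the Quorum library documentation.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

// Read the CSV; the path assumes the Data folder sits next to the project folder
DataFrame frame
frame:Load("../Data/Animals/Dogs.csv")

// The breed group labels the x-axis
frame:AddSelectedFactors("Breed Group")

// These columns supply the values plotted on the y-axis
frame:AddSelectedColumns("Maximum Weight")
frame:AddSelectedColumns("Minimum Weight")
frame:AddSelectedColumns("Maximum Height")
frame:AddSelectedColumns("Minimum Height")

// Assumed API: build a bar chart from the current selection and show it
BarChart chart = frame:BarChart()
chart:Display()
```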
In the next tutorial, we will discuss color accessibility, that is, how to use color in charts so that everyone can read them.