Making a Histogram
A histogram is a very common chart for visualizing numeric data, typically used to show how the values of a single variable are distributed. The variable we observe is divided into intervals (bins), and the height of each bar shows how many values fall in that interval. Looking at the graph as a whole, we can examine its shape, whether it is skewed, normal, uniform, bimodal (two distinct peaks), etc., and draw conclusions about the data. Typical candidates for a histogram are datasets featuring costs, ages, GPAs, and test scores. Note that histograms are not recommended for non-numeric data.
For this lesson, we will examine the nightly prices of Airbnb listings in NYC in 2019. We can see how expensive it is to rent an Airbnb and decide whether using the app is worth it.
The first step is to load the dataset and format it. It is best to keep track of where we store our data files. For this tutorial, our dataset is inside a Data folder contained in another folder called "Other."
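To make the idea of intervals concrete, here is a minimal sketch that counts a handful of values into ranges by hand, which is roughly the tallying a histogram does for us. The prices and the 50-wide ranges are made up for illustration and do not come from any dataset.

```quorum
use Libraries.Containers.Array

// Made-up sample prices, for illustration only.
Array<integer> prices
prices:Add(35)
prices:Add(80)
prices:Add(89)
prices:Add(120)

// One counter per interval, each interval 50 wide (the last one is open-ended).
integer under50 = 0
integer from50To99 = 0
integer from100Up = 0

integer price = 0
integer i = 0
repeat while i < prices:GetSize()
    price = prices:Get(i)
    if price < 50
        under50 = under50 + 1
    elseif price < 100
        from50To99 = from50To99 + 1
    else
        from100Up = from100Up + 1
    end
    i = i + 1
end

// Each count corresponds to the height of one bar in a histogram.
output "0-49: " + under50
output "50-99: " + from50To99
output "100 and up: " + from100Up
```

A histogram chart performs this counting automatically and draws one bar per interval, so we never need to write such a loop ourselves.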
To follow along, we can download the NYC Airbnb dataset here.
Here is a snippet of what the dataset should look like:
|Neighborhood Group||Neighborhood||Room Type||Price|
|Brooklyn||Clinton Hill||Entire home/apt||89|
|Manhattan||East Harlem||Entire home/apt||80|
Loading and Formatting
To load and read in the dataset, we will need to create a DataFrame component named "frame". With the frame, we call the Load function and pass in the file path of the Airbnb CSV.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.Histogram

// Create a DataFrame to hold the data.
DataFrame frame

// Load your data file into the frame.
frame:Load("data/AB_NYC_2019.csv")
Once the data has been loaded, we select the data to use in the chart. The frame component provides two functions for this, AddSelectedColumns(text header) and AddSelectedFactors(text header). These can take either the column number or the column label in the CSV file; in this lesson we will use the column label (the text header). For a histogram, only the selected column matters: its values are spread along the x axis, and the y axis is a count.
Notice that our data has a lot of columns, but feel free to ignore them; the only column we will be pulling is the price. For a Histogram, we only use AddSelectedColumns(text header) because we want to count how many Airbnb listings fall in each price range. The same goes for other datasets: we do not set the y axis ourselves. AddSelectedColumns(text header) takes a string as its parameter, which is the text header of the column in the data file.
We should have the following code:
// Select data from the frame that you wish to use in your histogram.
// Note: In Histograms, factors and non-numerical columns are not supported.
// For this example we will be using the numerical column "price".
frame:AddSelectedColumns("price")
Now it is time to create the Histogram, which can be done with the following code. This creates a chart object from our DataFrame component, frame. For the rest of this lesson, we will use this chart object to change and format our histogram.
// Using the frame, create a Histogram object.
Histogram chart = frame:Histogram()

// Display your histogram.
chart:Display()
Example of loading our data and creating histogram object
Calling the Display() function gives us a pop-up of our data as formatted so far. We still need to give meaning to our data, so the following steps show how to label and customize our chart.
Labeling The Histogram
Labels help viewers understand what is being presented. We will label the x axis, y axis, and legend, and give our chart a title and subtitle that describe the dataset. To do so, we call the following functions on our "chart" object: SetTitle(text title), SetXAxisTitle(text title), SetYAxisTitle(text title), SetLegendTitle(text title), and SetSubtitle(text title). Here is a brief description of what each function does and what it takes in.
|SetTitle(text name)*||SetTitle() takes in a string as a parameter, which would be the title of the chart. For this example, we will name the chart "Price per night with AirBnB in 2019 (NYC)"||chart:SetTitle("Price per night with AirBnB in 2019 (NYC)")|
|SetXAxisTitle(text name)||SetXAxisTitle() takes in a string as a parameter, which would be the label of the x axis. For this example, we will label this section "Price ($)" because the x axis shows the nightly price ranges we are observing. This is also a good place to state the unit, such as dollars||chart:SetXAxisTitle("Price ($)")|
|SetYAxisTitle(text name)||SetYAxisTitle() takes in a string as a parameter, which would be the label of the y axis. For this example, we will label this section "Number of Stays" because the y axis counts how many listings fall in each price range||chart:SetYAxisTitle("Number of Stays")|
|SetLegendTitle(text name)||SetLegendTitle() takes in a string as a parameter, which would label the legend of the chart. The legend identifies the series shown in the chart. For this example, we will label the legend "Cost per Night"||chart:SetLegendTitle("Cost per Night")|
|SetSubtitle(text title)||SetSubtitle() takes in a string as a parameter which would set a subtitle under the title. This can be any short description or any other necessary information for our chart. For this example, we will label the subtitle "How expensive is it to stay in NYC?"||chart:SetSubtitle("How expensive is it to stay in NYC?")|
// Give the chart a descriptive title.
chart:SetTitle("Price per night with AirBnB in 2019 (NYC)")

// Add a subtitle for further description.
chart:SetSubtitle("How expensive is it to stay in NYC?")

// Give the x axis a descriptive title.
chart:SetXAxisTitle("Price ($)")

// Give the y axis a descriptive title.
chart:SetYAxisTitle("Number of Stays")
*Note: We will also adjust the font size of the title so it fits nicely with our histogram. For this, we call the function SetTitleFontSize(integer size) on our chart object. It takes an integer (size) as its parameter; in this case, we will pass in 20.
Example of labeling our histogram
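The note about the title font size can be sketched as follows. This is a minimal example that reuses the file path, column, and title from this lesson; where exactly the call sits relative to the other labeling calls is our assumption.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.Histogram

// Load the data and select the numerical column, as earlier in this lesson.
DataFrame frame
frame:Load("data/AB_NYC_2019.csv")
frame:AddSelectedColumns("price")

// Create the histogram and give it a title.
Histogram chart = frame:Histogram()
chart:SetTitle("Price per night with AirBnB in 2019 (NYC)")

// Set the title font size to 20 so the long title fits with the histogram.
chart:SetTitleFontSize(20)

chart:Display()
```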
Customizing The Data Chart
Now that our chart is labeled, we can customize it to our liking, such as adjusting the intervals, changing the starting value, and changing the color. We will try all of these features, again using our chart object to call the functions: SetXTickInterval(number setX), SetXAxisMinimum(number min), and SetColorPaletteToWarmScale().
|SetXTickInterval(number setX)||SetXTickInterval() takes in a number as a parameter and sets the interval in multiples of the setX value given. For this tutorial, we will insert 50 as the tick interval||chart:SetXTickInterval(50)|
|SetXAxisMinimum(number min)||SetXAxisMinimum() takes in a number as a parameter and adjusts the starting point for the x axis. For this tutorial, we will insert 0 as our minimum so we can see the price ranges from 0-50, 50-100, etc.||chart:SetXAxisMinimum(0)|
|SetColorPaletteToWarmScale()||SetColorPaletteToWarmScale() changes the color palette to warmer colors such as yellows, reds, and oranges.||chart:SetColorPaletteToWarmScale()|
// You can hide the legend, as it is not needed for this example.
chart:ShowLegend(false)

// If needed, you can change the color palette to a predefined palette or create a custom one.
chart:SetColorPaletteToWarmScale()

// Set a custom interval along the x axis to 50 for a better display.
// Alternatively, you can use SetBinWidth(50) for the same result.
chart:SetXTickInterval(50)

// Start the x axis at 0 for a better display.
chart:SetXAxisMinimum(0)
Example of customizing our histogram
Displaying The Chart
Congratulations, our Histogram is constructed! Now we can display our chart with the Display() function. There are two ways to do this: letting it size itself automatically, or specifying a window size. Calling chart:Display() displays the chart at a size equal to the screen size. Calling chart:Display(num, num) displays the chart in a window of the given size. We will use the default display to show the histogram.
Now, feel free to clean, build, and run our program; shortly we should see a Game window pop up. This is our Histogram! To view the entire code, click here to view the file.
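For comparison, here is a minimal sketch of the second form, Display(num, num). The values 1000 and 800 are arbitrary, and reading them as width followed by height is our assumption; check the Histogram documentation for the exact parameter order and types.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.Histogram

// Load the data and select the numerical column, as earlier in this lesson.
DataFrame frame
frame:Load("data/AB_NYC_2019.csv")
frame:AddSelectedColumns("price")

Histogram chart = frame:Histogram()

// Display in a window of a chosen size instead of the screen-sized default.
// Assumption: the two values are the window width and height, in that order.
chart:Display(1000, 800)
```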
Full Example of the Histogram
Further Useful Histogram Functions
|StackBars(boolean)||StackBars() will stack bars on top of each other when a group contains multiple bars. They are not stacked by default.||chart:StackBars(true)|
|OverlayBars(boolean)||OverlayBars() will overlay bars on top of each other when a group contains multiple bars. They are not overlaid by default.||chart:OverlayBars(true)|
|SetBinWidth(number)||SetBinWidth() will override the auto-calculated bin width with a user determined interval.||chart:SetBinWidth(10)|
|SeparateBySeries(integer)||SeparateBySeries() will separate the chart into a grid of subcharts based on the legend (series). It takes in an integer as the number of columns in the grid. If empty, it results in a single-column grid.||chart:SeparateBySeries(4)|
|SeparateBySeries(boolean)||SeparateBySeries() will separate the chart into a grid of subcharts based on the legend (series). If set to false, it will combine the subcharts into one histogram.||chart:SeparateBySeries(false)|
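As a short sketch of one function from this table, the example below uses SetBinWidth() in place of SetXTickInterval(). The width of 50 matches the interval used earlier in this lesson; the rest of the setup is repeated only so the example stands on its own.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.Histogram

// Load the data and select the numerical column, as earlier in this lesson.
DataFrame frame
frame:Load("data/AB_NYC_2019.csv")
frame:AddSelectedColumns("price")

Histogram chart = frame:Histogram()

// Override the auto-calculated bin width so each bar covers a range of 50.
chart:SetBinWidth(50)

// Start the x axis at 0 so the bins read 0-50, 50-100, and so on.
chart:SetXAxisMinimum(0)

chart:Display()
```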
To view more examples with charts, we can reference the Quorum Curriculum Repository for charts.
In the next tutorial, we will discuss Creating a Pie Chart, which describes how to use pie charts in Quorum Studio.