Making a Histogram

A histogram is a very common chart for visualizing numeric data and is typically used to represent its distribution. The variable we observe is divided into intervals, and the chart shows how many values fall into each one. Looking at the graph as a whole, we can examine whether its shape is skewed, normal, uniform, bimodal (two distinct peaks), etc. and draw conclusions about the data. Typical datasets to look out for when constructing a histogram are those featuring costs, ages, GPAs, and test scores. Note that datasets with non-numeric data are not recommended for histograms.
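To make the idea of intervals concrete, here is a minimal sketch that works out which interval a few prices fall into. It is only an illustration of the concept and not how the Histogram chart is implemented. The five prices come from the dataset snippet shown later in this lesson, and the interval width of 50 and the variable names are assumptions made for this sketch.

```quorum
use Libraries.Containers.Array

// sample prices taken from the dataset snippet in this lesson
Array<integer> prices
prices:Add(149)
prices:Add(225)
prices:Add(150)
prices:Add(89)
prices:Add(80)

// width of each interval (an assumption for this sketch)
integer width = 50

integer i = 0
repeat while i < prices:GetSize()
    integer price = prices:Get(i)
    // round the price down to the start of its interval
    integer binStart = price - (price mod width)
    integer binEnd = binStart + width
    output "Price " + price + " falls in the interval " + binStart + " to " + binEnd
    i = i + 1
end
```

A histogram counts how many values land in each interval and draws one bar per interval, with the bar height equal to that count.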

For this lesson, we will examine the nightly prices of Airbnb listings in NYC in 2019. We can see how expensive it is to rent an Airbnb and draw a conclusion about whether using the app is worth it.

The first step is to load the dataset and format it. It is best to keep track of where we are storing our data files. For this tutorial, our dataset is in a folder called "Other" inside a Data folder, which matches the file path used in the code below.

To follow along, we can download the NYC Airbnb dataset here.

Here is a snippet of what the dataset should look like:

Airbnb Prices in NYC CSV

| Neighborhood Group | Neighborhood | Room Type | Price |
| --- | --- | --- | --- |
| Brooklyn | Kensington | Private room | 149 |
| Manhattan | Midtown | Entire home/apt | 225 |
| Manhattan | Harlem | Private room | 150 |
| Brooklyn | Clinton Hill | Entire home/apt | 89 |
| Manhattan | East Harlem | Entire home/apt | 80 |

Loading and Formatting

As mentioned previously, to load and read in the dataset, we need to create a DataFrame component named "frame". With the frame, we call the Load function and pass in the file path of the Airbnb CSV.

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.Histogram

// create frame component
DataFrame frame
// read in data from the Airbnb csv
frame:Load("../Data/Other/AB_NYC_2019.csv")

Once the data has been loaded, we select the data to use in the chart. The frame component provides two functions for this, AddSelectedColumns(text header) and AddSelectedFactors(text header). In other chart types, the selected columns and factors determine what is shown on each axis. Both functions take either the column number or the column label in the CSV file as a parameter. We will be using the column label to demonstrate.

Notice that our data has a lot of columns, but feel free to ignore them; the only column we will be pulling is the price column. Please note that for a Histogram, we will only be using AddSelectedColumns(text header) because we would like to count how many Airbnbs fall into each price range. This goes for other datasets as well: we will not be modifying the y axis. AddSelectedColumns(text header) takes a string as the parameter, which is the column's text header in the data file.

We should have the following code:

// pull out specific columns from csv that we are comparing
// note: histograms do not support factors
frame:AddSelectedColumns("price")

Now it is time to create the Histogram, which can be done with the following code. This creates a chart object from our DataFrame component, frame. For the rest of this lesson, we will use this chart object to change and format our histogram.

// create the histogram chart component from the frame
Histogram chart = frame:Histogram()
chart:Display()

Example of loading our data and creating histogram object

Calling the Display() function will give us a pop-up of our formatted data so far. We still need to give meaning to our data, therefore, the following steps will show us how to label and customize our chart.

Labeling The Histogram

Labels help viewers understand what data is being presented. This means that we will be labeling the x axis, y axis, and legend, and giving our chart a title that describes the dataset. To do so, we will call the following functions with our "chart" object: SetTitle(text title), SetXAxisTitle(text title), SetYAxisTitle(text title), SetLegendTitle(text title), and SetSubtitle(text title). Here is a brief description of what each function does and what it takes in.

Histogram Labeling Functions

| Function | Description | Usage |
| --- | --- | --- |
| SetTitle(text name)* | SetTitle() takes in a string as a parameter, which would be the title of the chart. For this example, we will name the chart "Average Price a Night for an AirBnB in 2019 (NYC)". | chart:SetTitle("Average Price a Night for an AirBnB in 2019 (NYC)") |
| SetXAxisTitle(text name) | SetXAxisTitle() takes in a string as a parameter, which would be the label of the x axis. For this example, we will label this axis "Price" because it shows the nightly price ranges we are observing. | chart:SetXAxisTitle("Price") |
| SetYAxisTitle(text name) | SetYAxisTitle() takes in a string as a parameter, which would be the label of the y axis. For this example, we will label this axis "Count" because it shows how many listings fall into each price range. An axis label is also a good place to state the unit being measured, such as dollars. | chart:SetYAxisTitle("Count") |
| SetLegendTitle(text name) | SetLegendTitle() takes in a string as a parameter, which would label the legend of the chart. The legend identifies the data series shown in the chart. For this example, we will label the legend "Cost a Night". | chart:SetLegendTitle("Cost a Night") |
| SetSubtitle(text title) | SetSubtitle() takes in a string as a parameter, which would set a subtitle under the title. This can be any short description or any other necessary information for our chart. For this example, we will set the subtitle to "How expensive does staying in NYC cost". | chart:SetSubtitle("How expensive does staying in NYC cost") |

// create a title to describe the chart 
chart:SetTitle("Average Price a Night for an AirBnB in 2019 (NYC)")
// let's adjust the font size so it appears nicely on the screen
chart:SetTitleFontSize(20)

// label the x axis and y axis, then set the subtitle and the legend title
chart:SetXAxisTitle("Price")
chart:SetYAxisTitle("Count")
chart:SetSubtitle("How expensive does staying in NYC cost")
chart:SetLegendTitle("Cost a Night")

*Note: We also adjust the font size of the title so it fits nicely with our histogram. For this, we call the function SetTitleFontSize(integer size) with our chart object. It takes in an integer (size) as the parameter, so in this case, we pass in 20.

Example of labeling our histogram

Customizing The Data Chart

Now that we have our data labeled, we can customize our chart to our liking, such as adjusting the intervals, changing starting values, and changing the color. We will try out all of these features, again using our chart object to call the functions. The functions we will be using are: SetXTickInterval(integer num), SetXAxisMinimum(integer num), and SetColorPaletteToWarmScale().

Other Useful Histogram Functions

| Function | Description | Usage |
| --- | --- | --- |
| SetXTickInterval(number setX) | SetXTickInterval() takes in an integer as a parameter and sets the interval in multiples of the setX value given. For this tutorial, we will insert 50 as the tick interval. | chart:SetXTickInterval(50) |
| SetXAxisMinimum(number min) | SetXAxisMinimum() takes in an integer as a parameter and adjusts the starting point for the x axis. For this tutorial, we will insert 0 as our minimum so we can see the price ranges from 0-50, 50-100, etc. | chart:SetXAxisMinimum(0) |
| SetColorPaletteToWarmScale() | SetColorPaletteToWarmScale() changes the color palette to warmer colors such as yellows, reds, and oranges. | chart:SetColorPaletteToWarmScale() |

// customization features
// sets this to warm tones
chart:SetColorPaletteToWarmScale()
// define a clear interval, we separate each interval by 50
chart:SetXTickInterval(50)
// let's start our chart at 0 to examine a curve as a whole
chart:SetXAxisMinimum(0)

Example of customizing our histogram

Displaying The Chart

Congratulations, our Histogram is constructed! Now we can display our chart with the Display() function. There are two ways to do this: letting it size itself automatically or specifying a window size. Calling chart:Display() displays the chart at a size equal to the screen size. Calling chart:Display(num, num) displays the chart in a window constrained to the given size. We will be using the default display to show the histogram.

chart:Display()
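For comparison, here is a minimal sketch of the second option, where the window size is passed to Display. The values 1000 and 800 are hypothetical, and this sketch assumes the two parameters are the width followed by the height; check the chart documentation before relying on that order.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.Histogram

// load the dataset and select the price column, as earlier in this lesson
DataFrame frame
frame:Load("../Data/Other/AB_NYC_2019.csv")
frame:AddSelectedColumns("price")

Histogram chart = frame:Histogram()
// hypothetical window size, assumed to be width then height
chart:Display(1000, 800)
```

A fixed size like this can be handy when the chart should not take over the whole screen.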

Now, feel free to clean, build, and run our program, and we should shortly see a Game window pop up. This is our Histogram! To view the entire code, click here to view the file.
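For reference, the snippets from this lesson can be assembled into a single program along the following lines. This is a sketch put together from the code shown above rather than a copy of the linked file, so the linked file may differ in its details. It calls Display() once at the end, after all labels and customizations have been set.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.Histogram

// create frame component and read in data from the Airbnb csv
DataFrame frame
frame:Load("../Data/Other/AB_NYC_2019.csv")

// pull out the price column, the only column the histogram needs
frame:AddSelectedColumns("price")

// create the histogram chart component from the frame
Histogram chart = frame:Histogram()

// title and title font size
chart:SetTitle("Average Price a Night for an AirBnB in 2019 (NYC)")
chart:SetTitleFontSize(20)

// axis labels, subtitle, and legend title
chart:SetXAxisTitle("Price")
chart:SetYAxisTitle("Count")
chart:SetSubtitle("How expensive does staying in NYC cost")
chart:SetLegendTitle("Cost a Night")

// customization: warm colors, intervals of 50, x axis starting at 0
chart:SetColorPaletteToWarmScale()
chart:SetXTickInterval(50)
chart:SetXAxisMinimum(0)

// show the finished chart at the default size
chart:Display()
```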

Run the Program

Full Example of the Histogram

Final Chart

[Chart: histogram titled "Average Price a Night for an AirBnB in 2019 (NYC)" with the subtitle "How expensive does staying in NYC cost". The x axis is labeled "Price" and runs from 0 to 800 in steps of 50. The y axis is labeled "Count" and runs from 0 to 90 in steps of 10. The legend is titled "Cost a Night" and contains one entry, price.]

Further Useful Histogram Functions

Other Useful Histogram Functions

| Function | Description | Usage |
| --- | --- | --- |
| SeparateByFactor(integer) | SeparateByFactor() separates the chart into a grid of subcharts. It takes in an integer for the number of columns we want in the grid, but it can be left empty. If left empty, the chart stays a single chart by default. | chart:SeparateByFactor(4) |

To view more examples with charts, we can reference the Quorum Curriculum Repository for charts.

Next Tutorial

In the next tutorial, we will discuss Creating a Pie Chart, which describes how to use pie charts in Quorum Studio.