Making a Scatter Plot

The next chart we will learn to create is a scatter plot. Scatter plots are used to observe relationships between two variables. For example, we could compare the heights and diameters of trees, where the position of each dot corresponds to one tree's height and diameter. Taken as a whole, the data can show a relationship: strong positive or negative linear, moderate positive or negative linear, or no relationship. As data scientists, some of the most important patterns to look for are how points cluster, whether there are gaps in the dataset, and whether the set contains outliers. We focus on these aspects to understand trends and make predictions about future datasets.

The dataset we will use to create our scatter plot is about insurance costs for various customers. We will compare the cost of insurance against each customer's BMI (body mass index). Each dot also indicates the age of the customer.

To follow along, we can download the insurance dataset here.

Here is a snippet of what the dataset should look like:

Insurance CSV
Age Sex BMI Charges

Loading and Formatting

As mentioned previously, to load and read in the dataset, we need to create a DataFrame component named frame. Using frame, we call the Load function and pass in the file path of the insurance CSV. Recall that a CSV is a comma-separated text file that holds data.

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.ScatterPlot

/*
    This is an example of a simple scatter plot in Quorum.
    The data collected is about medical insurance costs in relation to BMI (body mass index).
*/

// create dataframe to read in data
DataFrame frame

Note that we stored this dataset in a Data folder, inside an inner folder named Science.
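As a minimal sketch of the loading step described above, the Load call might look like the following. The file name insurance.csv is an assumption made for illustration; the text only tells us that the dataset sits in a Science folder inside a Data folder, so substitute the actual name of the downloaded file.

```quorum
use Libraries.Compute.Statistics.DataFrame

// create dataframe to read in data
DataFrame frame

// NOTE: "insurance.csv" is an assumed file name, not one given in this lesson.
// Replace it with the name of the file you downloaded.
frame:Load("Data/Science/insurance.csv")
```

The path is written relative to the project folder, which is why it starts with Data rather than a drive or root directory.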

Once the data has been loaded, we will extract the parts we want to use in the chart. We will use two functions from our frame component, AddSelectedColumns(text heading) and AddSelectedFactors(text heading). Because scatter plots need two data variables to compare, we will extract the BMI and charges using AddSelectedColumns(text heading). We will also extract the ages with AddSelectedFactors(text heading) so that the dots can be distinguished by age. The usage of these two functions is shown below:

Adding CSV columns onto Charts
Function | Description | Usage
frame:AddSelectedColumns(text heading) | AddSelectedColumns() takes in a string that matches a column heading from our dataset. This function is used to format our axes. For this tutorial, we will call this function twice to extract "bmi" and "charges". | frame:AddSelectedColumns("heading")
frame:AddSelectedFactors(text heading) | AddSelectedFactors() takes in a string that matches a column heading from our dataset. This function is used to label our dots and form the legend for the two variables we are comparing. For this tutorial, we will extract "age". | frame:AddSelectedFactors("heading")

We should have the following code:

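// pull out selected data, for this we will be categorizing by bmi, charges, and age
frame:AddSelectedColumns("bmi")
frame:AddSelectedColumns("charges")
frame:AddSelectedFactors("age")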

Now it is time to create the scatter plot, which can be done with the following code. This creates a chart object from our DataFrame component, frame. For the rest of this lesson, we will use this chart object to change and format our scatter plot.

// using the data frame, format data by creating a scatter plot chart component
ScatterPlot chart = frame:ScatterPlot()

Example of loading our data and creating scatter plot object

Calling the Display() function will give us a pop-up of our formatted data so far. We still need to give meaning to our data, so the following steps show how to label and customize our chart.

Labeling the Scatter Plot

Labels help viewers understand what is being presented. We will label the x axis, the y axis, and the legend, and give our chart a title and subtitle that describe the dataset. To do so, we will call the following functions on our chart object: SetTitle(text name), SetXAxisTitle(text name), SetYAxisTitle(text name), SetLegendTitle(text name), and SetSubtitle(text name). Here is a brief description of what each function does and what it takes in.

Labeling Charts
Function | Description | Usage
SetTitle(text name) | SetTitle() takes in a string as a parameter, which is the title of the chart. For this example, we will name the chart "Charges of Insurance Based Off of BMI and Age". | chart:SetTitle("Charges of Insurance Based Off of BMI and Age")
SetXAxisTitle(text name) | SetXAxisTitle() takes in a string as a parameter, which is the label of the x axis. For this example, we will label the axis "Body Mass Index (BMI)". | chart:SetXAxisTitle("Body Mass Index (BMI)")
SetYAxisTitle(text name) | SetYAxisTitle() takes in a string as a parameter, which is the label of the y axis. For this example, we will label the axis "Insurance Cost (in $)". This is also a good place to state the unit we are comparing, such as dollars. | chart:SetYAxisTitle("Insurance Cost (in $)")
SetLegendTitle(text name) | SetLegendTitle() takes in a string as a parameter, which labels the legend of the chart. The legend identifies the separate ages for the dots. For this example, we will label the legend "Age Group". | chart:SetLegendTitle("Age Group")
SetSubtitle(text title) | SetSubtitle() takes in a string as a parameter, which sets a subtitle under the title. This can be a short description or any other necessary information for our chart. For this example, we will set the subtitle to "Does body weight and age affect cost of insurance?" | chart:SetSubtitle("Does body weight and age affect cost of insurance?")

// label your scatter plot
chart:SetXAxisTitle("Body Mass Index (BMI)")
chart:SetYAxisTitle("Insurance Cost (in $)")
chart:SetLegendTitle("Age Group")
chart:SetSubtitle("Does body weight and age affect cost of insurance?")
chart:SetTitle("Charges of Insurance Based Off of BMI and Age")

Note that if we would like to see the chart so far, we can call chart:Display() to view it with the labels we created.

Example of labeling our scatter plot

Customizing the Data Chart

Now that our data is labeled, we can customize the chart to our liking, for example by moving the legend, changing the color palette, and adjusting the font size. We will again use our chart object to call these functions: SetLegendLocation(text location), SetColorPaletteToDisturbing(), SetFontSize(integer size), FlipOrientation(), and ShowLinearRegression(boolean). Here are brief descriptions of what each function does and how to use it.

Customizing Charts
Function | Description | Usage
SetLegendLocation(text location) | SetLegendLocation() takes in a string as a parameter, which is one of the directions left, right, top, or bottom. The legend is placed at the specified location. For this example, we will place the legend on the "bottom". | chart:SetLegendLocation("bottom")
SetFontSize(integer size) | SetFontSize() takes in an integer as a parameter and sets the font size of all text to that value. For this tutorial, we will use 30 as the font size. | chart:SetFontSize(30)
SetColorPaletteToDisturbing() | SetColorPaletteToDisturbing() takes in no parameters and sets the color palette to one based on yellows, browns, oranges, and greens. | chart:SetColorPaletteToDisturbing()
ShowLinearRegression(boolean) | ShowLinearRegression() takes in a true or false value (boolean) and shows the regression lines and equations for the chart. | chart:ShowLinearRegression(true)
FlipOrientation() | FlipOrientation() takes in no parameters and swaps the places of the x and y axes. | chart:FlipOrientation()

// set the legend location, choices are left, right, top and bottom
chart:SetLegendLocation("bottom")

// color palette contains yellows, oranges, browns, and greens
chart:SetColorPaletteToDisturbing()

// adjust font size by preference, here we set it to 30 pt
chart:SetFontSize(30)

// if we would like to switch the x and y axis
chart:FlipOrientation()

Example of customizing our scatter plot

Congratulations, our scatter plot is constructed! Now we can display the chart with the Display() function. There are two ways to do this: letting the window size be chosen automatically, or specifying a window size. Calling chart:Display() displays the chart in a window equal to the screen size. Calling chart:Display(num, num) displays the chart in a window of the specified size. We will use the second form.

chart:Display(1000, 750)

Now, feel free to clean, build, and run our program; shortly we should see a Game window pop up. This is our scatter plot! To view the entire code, click here to view the file.

Full Example of the Scatter Plot
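Putting the steps of this lesson together, the whole program might look roughly like the sketch below. It uses only the functions named in this lesson. The file name insurance.csv is an assumption made for illustration, and the legend location "bottom" follows the description in the customization table; adjust both to match your own project.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.ScatterPlot

// create dataframe to read in data
// NOTE: "insurance.csv" is an assumed file name, not one given in this lesson
DataFrame frame
frame:Load("Data/Science/insurance.csv")

// pull out selected data: bmi and charges for the axes, age for the legend
frame:AddSelectedColumns("bmi")
frame:AddSelectedColumns("charges")
frame:AddSelectedFactors("age")

// using the data frame, create a scatter plot chart component
ScatterPlot chart = frame:ScatterPlot()

// label the scatter plot
chart:SetTitle("Charges of Insurance Based Off of BMI and Age")
chart:SetSubtitle("Does body weight and age affect cost of insurance?")
chart:SetXAxisTitle("Body Mass Index (BMI)")
chart:SetYAxisTitle("Insurance Cost (in $)")
chart:SetLegendTitle("Age Group")

// customize the scatter plot
chart:SetLegendLocation("bottom")
chart:SetColorPaletteToDisturbing()
chart:SetFontSize(30)

// optional: show the regression lines and equations
chart:ShowLinearRegression(true)

// display the chart in a window of the specified size
chart:Display(1000, 750)
```

The FlipOrientation() call is left out of this sketch so that the axis titles stay on the axes they were written for; add chart:FlipOrientation() before the Display call to swap the two axes.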

Final Chart

[Figure: scatter plot titled "Charges of Insurance Based Off of BMI and Age", subtitled "Does body weight and age affect cost of insurance?". Axes: Body Mass Index (BMI), ticks from 15 to 55, and Insurance Cost (in $), ticks from 0 to 70000. Legend "Age Group" lists each age from 18.0 to 64.0.]

Further Useful Scatter Plot Functions

Extra Functions
Function | Description | Usage
SetPointDensity(integer num) | This function takes in an integer that adjusts the size of the dots of the scatter plot. | chart:SetPointDensity(5)

To view more examples with charts, we can reference the Quorum Curriculum Repository for charts.

Next Tutorial

In the next tutorial, we will discuss box plots and how to use the box plot chart in Quorum.