Data Science Hour of Code

Activity 3: Selecting the Data

Scenario:

  • You're a data scientist working with a group of researchers from Antarctica.
  • They have collected a bunch of data about penguins, but need help answering questions about it.
  • Your job is to make some charts from the data and answer some of their questions.

Introduction:

Now that we're getting the hang of it, we can start to explore more data. Data scientists often work with large datasets that have many columns, and they investigate how the variables in those columns relate to one another. The table below shows a sample of Penguins2.csv. This dataset is a lot like Penguins1.csv, except it has a few more columns. For each entry, we can look at the species of the penguin (Adelie, Gentoo, Chinstrap), the island it lives on (Torgersen, Biscoe, Dream), and measurements such as bill depth, bill length, and flipper length.

Sample of the Penguins2.csv file

species    island     bill_depth  bill_length  flipper_length
Adelie     Torgersen  18.7        39.1         181
Adelie     Biscoe     18.3        37.8         174
Adelie     Dream      18.5        36.8         193
Gentoo     Biscoe     13.2        46.1         211
Chinstrap  Dream      17.9        46.5         192

Instructions:

In the code editor below, we have a program that makes a Chart. Take code blocks from the palette and place them below the block that loads the .csv file, but above the block that makes the Chart object.

  1. Use the block(s) in the palette on the left.
  2. Place the 'frame:AddSelectedColumns("bill_depth,bill_length,flipper_length")' block below the 'frame:Load("data/Penguins2.csv")' block in the block editor.
  3. Run the program.
  4. Use the chart in the canvas to answer the questions in the Activity section.
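
If you are curious what selecting columns means outside the block editor, here is a rough sketch in Python. It is not Quorum code and not how the block is implemented. It assumes that selecting columns simply means keeping only the named columns from each row. The helper name select_columns is made up for this illustration, and the data is just the five sample rows from the Penguins2.csv table.

```python
import csv
import io

# The five sample rows from the Penguins2.csv table, written as CSV text.
SAMPLE_CSV = """species,island,bill_depth,bill_length,flipper_length
Adelie,Torgersen,18.7,39.1,181
Adelie,Biscoe,18.3,37.8,174
Adelie,Dream,18.5,36.8,193
Gentoo,Biscoe,13.2,46.1,211
Chinstrap,Dream,17.9,46.5,192
"""


def select_columns(rows, names):
    """Keep only the named columns from each row.

    Hypothetical helper for illustration: `names` is a comma-separated
    string, mirroring the text passed to the block in step 2.
    """
    wanted = [name.strip() for name in names.split(",")]
    # Convert each kept value to a number so it could be charted.
    return [{name: float(row[name]) for name in wanted} for row in rows]


# Read the CSV text into a list of rows, one dictionary per penguin.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Keep the three measurement columns and drop species and island.
selected = select_columns(rows, "bill_depth,bill_length,flipper_length")

for row in selected:
    print(row)
```

Each printed row now holds only the three measurements, which are the numeric columns a chart can plot. The species and island columns are still in the original rows, so nothing is lost by selecting.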

Coding:

Blocks

Activity:

Use the chart(s) you've created in the Coding section to answer a few questions.


Next Tutorial

In the next tutorial, we will discuss Selecting a Factor, which describes how to split our data into groups based on factors.