Overview

First, students are introduced to the concept of "big data": where it comes from, what makes it "big," how people use it to solve problems, and how much of their own lives are (or could be) "datafied." The lesson concludes with a brief introduction to the AP Explore Performance Task, which students are encouraged to complete at the end of the unit.

Vocabulary

Goals

Students will be able to:

Purpose

Big data is a big deal right now, both in the field of computer science and more broadly across other fields and industries. Understanding the types of things that can be captured in data and anticipating the types of innovations or new knowledge that can be built upon this data is increasingly the role of the computer scientist. A first step toward understanding big data is a survey of how big data is already being used to learn and solve problems across numerous disciplines. The scale of big data makes it hard to "see" sometimes, and techniques for looking at, working with, and understanding data change once the data is "big." Everything, from how it's stored to how it's processed to how it's visualized, is a little different once you enter the realm of big data.

Resources

Activity Guides

College Board Resources

Getting Started

First, the students will watch and/or listen to the following video on big data.

Discuss with the students the following:

Based on what they saw in the video, what is big data? If possible, have students share their responses in small groups, then discuss as a whole class. Try to keep these things in mind:

Activity

Walk the students through the graphics linked below. Part of what makes data "big" is the sheer growth in the amount of data in the world. As a class, examine just how large big data is. The amount of data flying around is growing exponentially, doubling roughly every two years. Here's a way to think about how fast this is: over the next two years, the world will produce as much digital data as existed in all of human history before then. And it will do the same in the two years after that, and so on. That's a lot!

A graph demonstrating Moore's Law, showing the growth of data from 2009 to 2012 and predicting its growth through 2020.
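The doubling claim above can be checked with a little arithmetic. The sketch below assumes, hypothetically, that the world's total data exactly doubles every two-year period, starting from an arbitrary 1 unit; real growth is only roughly exponential. The helper name `data_totals` is made up for illustration. In every period, the new data produced equals all the data that existed before that period.

```python
# Sketch: assumes total data exactly doubles each 2-year period.
# The starting amount (1 unit) is an arbitrary, hypothetical choice.

def data_totals(start, periods):
    """Return the total amount of data at the start and after each period."""
    totals = [start]
    for _ in range(periods):
        totals.append(totals[-1] * 2)  # total doubles every period
    return totals

totals = data_totals(1, 5)
for i in range(1, len(totals)):
    produced = totals[i] - totals[i - 1]  # new data created during period i
    prior = totals[i - 1]                 # all data that existed before period i
    print(f"Period {i}: produced {produced}, all prior data {prior}")
```

Each printed line shows the two numbers matching, which is exactly what "as much data in the next two years as in all of history before" means under steady doubling.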

Reading for Students

Now, let's talk about "Moore's Law." Look over the definition at the beginning of the lesson again. It is not a law of nature or mathematics, but simply a surprisingly accurate prediction made decades ago. In 1965, Gordon Moore, an engineer who later co-founded Intel, predicted that the number of transistors that could fit on a chip would keep doubling at a steady pace: at first he said every year, and in 1975 he revised this to about every two years. (The popular "every 18 months" version came later.) Amazingly, that prediction has more or less held true for decades. The result is that since about 1970, computers have gotten twice as fast, at half the cost, roughly every 1.5-2 years. With some small differences, the same is true for data storage capacity. This extraordinarily fast growth is called exponential growth. With more and more machines that are faster and faster, the amount of data being pushed around, saved, and processed is also growing exponentially. This is so fast that it's hard to fathom and even harder to plan for. Mention the following example: if the average hard drive today is 1 TB and you are planning for something about 6 years away, you should expect average hard drives to be 8-10 TB. Keep in mind the following things about Moore's Law:
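The hard drive example is the same doubling arithmetic: divide the number of years by the doubling period to get the number of doublings, then multiply today's size by 2 raised to that power. In this sketch the function name `projected_size` is hypothetical, and the doubling periods of 2.0 and 1.8 years are illustrative assumptions, not measured figures.

```python
def projected_size(current_tb, years, doubling_period_years):
    """Project capacity, assuming it doubles once every doubling_period_years."""
    doublings = years / doubling_period_years  # how many times capacity doubles
    return current_tb * 2 ** doublings

# Assumed doubling periods (illustrative only); 1 TB drive today, 6 years out.
for period in (2.0, 1.8):
    size = projected_size(1, 6, period)
    print(f"Doubling every {period} years: about {size:.1f} TB")
# Doubling every 2.0 years: about 8.0 TB
# Doubling every 1.8 years: about 10.1 TB
```

Both results land in the 8-10 TB range from the example. Shortening the assumed doubling period to 1.5 years would give 16 TB, which shows how sensitive long-range plans are to the assumed growth rate.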

Big Data Sleuth Card Activity

Big data surrounds us, but it can be surprisingly challenging to get access to it, use it, or even see it. Much of the data out there is in the "wild." Even when the data is "available," it can be hard to figure out where it came from or how to use it. Hand out the "Big Data Sleuth Card Activity" guide and have students choose one of the websites listed on it. Students should answer the guide's questions as they explore their chosen website.

Wrap Up

Students should share their results from the Big Data Sleuth Cards with members of another group. This can also be conducted as a class-wide discussion. Think about the following questions:

Introduce AP Computer Science Explore Performance Task

The College Board Explore Performance Task is part of the AP Computer Science Principles assessment. As shown in the linked PDF, the Explore PT has two major components: (1) a computational artifact and (2) written responses. Read through the submission requirements and prompts on pages 5-6 of the PDF. Students can also look at sample responses from previous years on the College Board's website.

Assessment

Historically, it has been observed that computer processing speeds tend to double every two years. This is known as:

When a computer scientist uses the term "Big Data" what do they typically mean?

Extended Learning

The students might be interested in looking at some of the publicly available datasets linked below.

Standards Alignment