Data - Lesson 6: Machine Learning and Bias

Overview

In this lesson, students are introduced to artificial intelligence and machine learning through an unplugged activity. First, students and a partner gather objects from around the room that they will attempt to classify. One partner plays the artificial intelligence (AI) and is trained on this labeled data. Then the partners gather new objects, including some chosen to try to trick the AI, and the AI student classifies these objects it never encountered in its training.

Goals

Students will be able to:

  • Train and test a machine learning model.
  • Reason about how human bias plays a role in machine learning.

Purpose

This tutorial is designed to quickly introduce students to machine learning, a type of artificial intelligence. Students will explore how training data is used to enable a machine learning model to classify new data.

Resources

Preparation

Check the classroom for items that can be classified in some way, or create categories ahead of time. For remote students, make sure there are items that can be classified either over a camera or through sound.

Activity (35 mins)

Teaching Tip

Artificial intelligence and machine learning use algorithms, just like any other area of computer science, to make decisions based on the labels attached to their data. The area of mathematics they arguably rely on most is linear algebra, the study of vectors and matrices. More information on the standard terms people use when describing AI, and some of the math, can be found at explained.ai.

Video: Play the video 'What is Machine Learning.'

Remarks

  • Machine learning refers to computers recognizing patterns and making decisions by learning the rules on their own from data. In this activity, you're going to supply the data to train your own machine learning model. Imagine one student is a human programmer training an artificial intelligence and the other is the computer being trained.

Teaching Tip

Neural networks for image recognition are often pre-trained on a huge dataset, such as ImageNet. ImageNet contains over 14 million hand-annotated images in more than 20,000 categories, with a typical category, such as balloon or strawberry, consisting of several hundred images. When an AI scans new items and makes its own predictions, it is comparing each new image against the patterns it found in the training dataset for the possible categories.
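For teachers who want a concrete picture of "comparing a new item with the training data," here is a minimal sketch using a 1-nearest-neighbor classifier. This is a much simpler technique than the neural networks trained on ImageNet, chosen only for illustration. It assumes each item has been reduced to two hypothetical measurements on a 0 to 1 scale (roundness and heaviness); the items, numbers, and function names are made up. The classifier copies the label of the most similar training item.

```python
import math

# Hypothetical training data recorded by the "human" student:
# each item is described by two made-up measurements on a 0-1 scale,
# (roundness, heaviness), together with the label the human gave it.
training_data = [
    ((0.9, 0.4), "round"),   # item 1: soda can
    ((0.95, 0.6), "round"),  # item 2: water bottle
    ((0.1, 0.3), "box"),     # item 3: juice box
    ((0.05, 0.7), "box"),    # item 4: cereal box
]

def classify(item, data):
    """1-nearest-neighbor: return the label of the closest training item.

    "Closest" means the smallest straight-line (Euclidean) distance
    between the two feature tuples.
    """
    _, label = min(data, key=lambda pair: math.dist(pair[0], item))
    return label

print(classify((0.8, 0.5), training_data))  # round
print(classify((0.5, 0.5), training_data))  # a "sort of round" item: hard to predict by eye
```

Because the classifier copies whatever label the human supplied, a mislabeled training item is reproduced faithfully, just as the AI student in the activity must accept the human student's labels.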

Do This: As a first step, both students should find objects around the room that share a property, along with objects that differ in that property. For example, one might find containers of juice, water, or soda that come either in a box or in a round can. Students should identify a particular property they want to train the computer on and should be encouraged to be creative with the properties they choose. One student might choose a visual property, like the color of a can, while another might provide information to their machine via another sense (e.g., feeling whether an item is a box or a can, or judging its weight). Once items are gathered, have the human student categorize them by the chosen property and hand them to the AI student one at a time, each with a number and a label (e.g., item 1, round). The AI student records each number and label in Excel, on paper, or with any other method, and should record the labels even if they are wrong. The AI student does not get to decide whether the human student is correct or even telling the truth.

Prompt: The artificial intelligence has now received data on a number of items. Using this data, how could a machine classify new items that are not in the training set?

Do This: The students should now find new items to give the AI for classification, intentionally looking for items that might be tricky to classify given the property they chose (e.g., something that is sort of round). Now the tables have turned: the human student feeds the new items to the AI, and the AI student returns a classification for each. Do not worry about how an AI would calculate the label, as that is an advanced topic. Instead, focus on what the students think an AI might predict and the kinds of mistakes it might make in the process. The training may even have introduced bias into what the AI reports.

Discuss: How well did the AI student do in classifying the new items? How do you think a computer would decide how to classify the items?

Discussion Goal

Goal: Get students to reflect on their experience so far. At this point, it is important that they realize the labeling they are doing is similar to programming the computer. The examples they show the AI are its 'training data.'

Video: Play the video 'Training Data & Bias.'

Prompt: How could biased data result in problems for artificial intelligence? What are ways to address this?

Discussion Goal

Goal: At this point, students should have some preliminary thoughts on how biased data leads to problems for artificial intelligence. They may point out that if a model is trained on data that is mislabeled or unrepresentative, it will draw incorrect or misinterpreted conclusions. One way to address this is to use diverse training sets. The following video dives into this subject further.
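A short sketch can make the effect of unrepresentative data concrete. Assuming items are described by two hypothetical 0 to 1 measurements (roundness, heaviness), the code classifies a new item by majority vote among its three most similar training items (a k-nearest-neighbor classifier, chosen here only for simplicity). All data and names are made up for illustration. With only one box in the training set, a clearly box-like item is outvoted by round items; adding more box examples changes the prediction.

```python
import math
from collections import Counter

def classify(item, data, k=3):
    """k-nearest-neighbor: majority vote among the k closest training items."""
    nearest = sorted(data, key=lambda pair: math.dist(pair[0], item))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical skewed training set: five round items, only one box.
skewed = [
    ((0.9, 0.4), "round"),
    ((0.95, 0.6), "round"),
    ((0.85, 0.5), "round"),
    ((0.8, 0.3), "round"),
    ((0.9, 0.7), "round"),
    ((0.1, 0.5), "box"),
]

# The same data plus more examples of the underrepresented category.
balanced = skewed + [
    ((0.15, 0.4), "box"),
    ((0.05, 0.6), "box"),
]

new_item = (0.2, 0.5)  # clearly box-like: low roundness
print(classify(new_item, skewed))    # round
print(classify(new_item, balanced))  # box
```

Adding five more round items would generally not fix this prediction; what helps here is that the underrepresented category gains examples, not that the dataset simply grows.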

Video: Play the video 'How I'm fighting bias in algorithms', with Joy Buolamwini.

Prompts

  • How can computing innovations which make use of Machine Learning reflect existing human bias?
  • How could Machine Learning be used to discriminate against groups of individuals?
  • How can that bias be minimized?

Remarks

  • As we've seen, problems of bias are often created by the type or source of data being collected. Collecting more data does not mean that the bias is removed. Computing innovations can reflect existing human biases because of biases written into the algorithms or biases in the data used by the innovation.
  • Machine learning and data mining have led to innovations in medicine, business, and science, but information discovered in this way has also been used to discriminate against groups of individuals.
  • Programmers (that includes you!) should take action to reduce bias in algorithms used for computing innovations as a way to combat existing human biases. Be on the lookout! Bias can occur at any level in software development.

Review: Play the video 'Impact on Society' which recaps the concepts discussed today.

Wrap Up (5 mins)

Prompt: Which steps of this process do you think have to be done by humans? Would you be concerned if any of them were automated?

Discuss: Time may be running short at this point in the class. Encourage students to share with a neighbor or share out with the room. The conversation should focus on bias.

Remarks

  • At this point, you've fully explored the core parts of the Data Analysis Process. Ultimately you are able to use the new information gained through visualizing and finding patterns (whether yourself or using Machine Learning) to make decisions. This is why being careful about bias is so important!

Assessment: Check for Understanding

For Students

Open a Word document or Google Doc and copy and paste the following question.

Question

Think about examples of Machine Learning you may have encountered in the past such as a website that recommends what video you may be interested in watching next. Are the recommendations ever wrong or unfair? Give an example and explain how this could be addressed.

Standards Alignment

  • CSTA K-12 Computer Science Standards (2017): 3B-AP-08
  • CSP2021: DAT-2.C.5
  • CSP2021: IOC-1.B.1, IOC-1.D.1, IOC-1.D.2, IOC-1.D.3

Next Tutorial

In the next tutorial, we will discuss Code.org Unit 9, which explores innovations in everyday life.