Hour 10: Traversals

This lesson is to introduce you to traversals.

Overview

One of the most enduring reasons why loops and conditionals are useful in computer science is that they offer a way to repeat behavior and make decisions. That combination together provides an incredibly useful technique you can apply over and over again to create algorithms. In this lesson, you will practice applying these techniques together to traverse a data structure.

Goals

You have the following goals for this lesson:

  • Learn about data structures and how to traverse them
  • Learn how to create your own algorithm
  • Practice using loops and conditionals together

Warm up

Imagine you are looking for information in a book that contains the phone number of every cell phone in the world. This book contained every person's name, their address, and phone number and it was not sorted. How many steps would it take a human to flip through the book and find the phone number you want?

Vocabulary

You will be learning about the following vocabulary words:

Vocabulary
TermDefinition
TraversalAn algorithm that visits all or part of a data structure, typically from the start to the finish.
Control FlowHow the code makes decisions on which statements will run and in which order.
AlgorithmA finite series of steps a computer program can take to solve a problem. 
Data StructureA way to store data in the computer that typically has actions to store, retrieve, and traverse that data.

Code

You will be using the following new pieces of code:

New Code to Learn
Quorum CodeCodeExplanation
DataFrameColumnDataFrameColumn column = frame:GetColumn(0)A DataFrameColumn is an in memory representation of a column in a spreadsheet.
column:GetAsInteger(index)integer value = column:GetAsInteger(i)A line of code that gets the value of in one cell of a column and gives it back as an integer.
column:GetSize()integer size = column:GetSize()A line of code that can obtain the total number of items in a DataFrameColumn.

CSTA Standards

This lesson covers the following standards:

  • 3A-AP-12: Create computational models that represent the relationships among different elements of data collected from a phenomenon or process.
  • 3A-AP-13: Create prototypes that use algorithms to solve computational problems by leveraging prior student knowledge and personal interests.
  • 3A-AP-14: Use lists to simplify solutions, generalizing computational problems instead of repeatedly using simple variables.
  • 3A-AP-15: Justify the selection of specific control structures when tradeoffs involve implementation, readability, and program performance, and explain the benefits and drawbacks of choices made.
  • 3A-AP-16: Design and iteratively develop computational artifacts for practical intent, personal expression, or to address a societal issue by using events to initiate instructions.

Explore

While you can write computer programs that store individual variables to gather information, this quickly becomes impractical or impossible. For example, in the warm-up, you considered the idea of a book that contains the phone numbers of all people on earth. If such a book existed, it would be impractical to write down a variable for every person. In fact, even if you wrote such a program for a small city of 10,000 people, writing new code and recompiling your software every time a new person came to town would not be particularly sound engineering.

How to manage data is a massive topic in the field of computer science. In a college sequence, for example, just to manage storage in memory is a one year college sequence. Once you get to storing items outside of memory on a hard disk, on other people's computers like the cloud, or in other ways, it takes even more practice. The good news is, you do not need to know all of these to make apps. Knowing about them, or abstracting, can still let you meaningfully write apps.

Data Structures

Consider a few examples. You may have heard words like array, list, linked list, or other names, all of which store data in memory. You may have even heard of the seemingly magical approach called the hash table. This extremely commonly used structure has the peculiar property of being able to access disconnected data extremely quickly. Overall, storage mechanisms like these are typically called data structures. With data structures, they are often not necessarily better than each other. Instead, data structures have mathematical properties, like how fast they can read or write, or details on how they store, which make them situationally better or worse.

There are many ways to interact with data structures and over time it can be helpful to learn a variety of them, but one crucial method is called traversal. Generally, the idea in traversal is to teach a computer program to either process every single value in a structure or at least to keep searching until it finds a particular one. You have already used the DataFrame object, which is one kind of such structure. There is nothing special about DataFrame, but it has its own type because it processes data in a very specific way related to data science.

Examine again the Dogs.csv comma separated value file mentioned in a previous lesson, but this time consider how you might go about doing the kinds of calculations it can with your own algorithm. The first point to recognize is that, while not true for all, many kinds of data structures have what is called an index. The idea is to label each item in the structure with a number that you can reference. For example, the first four columns in the Dog.csv file are:

  • id
  • Name
  • Breed Group
  • Bred For

Indexing

Historically, it would be rational to assume that computer scientists got together, wrote a standard, and decided what the labels should be. Sadly, the community is not so lucky. As such, today, different programming languages sometimes use what is called 0-based indexing and sometimes use 1-based indexing.

The difference here is important and the idea in 0-based is the first label is 0 (arguably, weirdly) and in 1-based the first is 1 (arguably less weirdly). While the word weird is hard to quantify, the point is that for every programming language you use, you need to know what the designer chose and they are not all the same. Some chose both (e.g., Java's arrays versus their database libraries).

That said, the most common choice is probably 0-based. The reason is because under the hood, the computer can do the calculations just slightly faster with that choice. Children's products sometimes choose 1, as does the College Board in AP Computer Science, but this is arguably rare and especially so for professional tools. Evidence on the choice is not definitive in the literature as of the time of this writing.

Traversing in a DataFrame

In any case, in the Quorum programming language, the choice is 0-based indexing. This means the above list would start at 0 instead of 1. With that information in hand, you can get any of the columns in the data frame. The following code does this:

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.DataFrameColumn
DataFrame frame
frame:Load("data/Dogs.csv")

//The first column is column 0. Ya, it's weird.
DataFrameColumn column = frame:GetColumn(0)

Notice that DataFrame has an action you have not encountered before: GetColumn. The GetColumn action takes one parameter, which is the index. This indicates which column to get. Once you have the column, it contains all of the raw data from the DataFrame in memory, stored in whatever way the designer chose to store it. For the first column, how this storage works is outside the scope. While not the complete story, the easiest way to think about it is the storage used by DataFrame was optimized for statistics. Here is an abridged list of column actions in DataFrameColumn and what they do:

Sample Actions in DataFrameColumn
Action in DataFrameColumnReturnsWhat it does
GetAsInteger(index)integerGet a value from the column as an integer
GetAsNumber(index)numberGet a value from the column as a number
GetAsBoolean(index)booleanGet a value from the column as a boolean
GetAsText(index)textGet a value from the column as text
GetSize()integerGet the number of items currently in the column

There are many other actions, like those for adding and removing data, getting the name of the column, and others. However, imagine you want to add up all of the items in the column or calculate the average. Now, with traversals and indices, you can do just that. Here is an example of code that adds up all of the values:

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.DataFrameColumn
DataFrame frame
frame:Load("data/Dogs.csv")
DataFrameColumn column = frame:GetColumn(0)

//Use a loop to add all the values
integer sum = 0
i = 0
repeat while i < column:GetSize()
	integer value = column:GetAsInteger(i)
	sum = sum + value
	i = i + 1
end

output sum

The key here is to notice that once you have a column, you can use loops to get each value, referenced by its index, and calculate the total. To do so you first create an integer variable for sum and set it to 0. Next, while you can use any loop type, repeat while allows you to set a condition that the loop should stop once it has traversed through the entire column. On each iteration of the loop, you get the value of the column, starting from the top and going down. You then add the value to the sum and crucially increment a counter variable by 1. That counter variable tells the loop when to stop and forgetting it would cause your program to never stop. This is often called an infinite loop.

Creating Custom Algorithms

The real power of algorithms, though, is in creating your own for any specialized needs you have. For example, suppose you needed to count in the total only every other item in the column. To do this, you would nest an if statement inside of the loop. That might change the loop to something like this:

repeat while i < column:GetSize()
	integer value = column:GetAsInteger(i)
	if i mod 2 = 0
		sum = sum + value
	end
	i = i + 1
end

In this case, recall that the mod operator is the remainder of a division. If i is 0, then 0 mod 2 is also 0. If it is 1, then 1 mod 2 is 1. It goes on with every even value being 0 and every odd one being 1, which allows the program to count every other. Algorithms and traversals are very common and writing them is the type of activity a programmer in practice would do every day. They are an essential part of computer science that you will practice next.

Perhaps this lesson might appear as very abstract, leaving you wondering why traversals would matter. However, traversals are exceedingly common. So common, in fact, that for professionals they might show up in a significant majority of programs they ever write. This might include counting parts of data, calculating information, retrieving the settings a user has, loading a configuration file, talking to collections of web servers, sending information to many players in a game, saving information from a list, or so many other things. Traversals, at the end of the day, are a skill worth practicing.

Engage

Learning to combine iteration and decision making, loops and conditionals, into algorithms is a core and extremely important skill to practice. Computer scientists do these kinds of activities every day and being able to do them fluently makes it easier to grasp later concepts.

For these problems, consider that each is a bit of a puzzle. It is normal to run the program often, get stuck, and tweak as you go. Sometimes it helps to add extra output statements as you debug to try and learn the behavior of the loop.

As one final point, do not forget that if you accidentally tell your web browser to run a loop that would run forever, you can crash it. This is not scary and not a big deal. In fact, any developer from any website could have accidentally put an infinite loop on a webpage you already visit. If this happens in your code, just refresh the page.

Directions

You will practice algorithms again using Parsons problems on traversal. You can drag and drop, use the keyboard, or even just write in the editor the solution to the problem and run the code. As a reminder, the hotkey to run the code is ALT + SHIFT + R on Windows and CTRL + SHIFT + R on Mac.

  1. Learn about Traversals in DataFrameColumn objects

Wrap up

While traversals are very powerful, they have one big downside. As the amount of data you have to process grows, it takes the computer longer and longer to process. How much data do you think the computer can process before it starts to struggle? In fact, what does it even mean for a computer to struggle with large amounts of data?

Next Tutorial

In the next tutorial, we will discuss Apps1 Online, which describes how to make your own apps.