In this lesson, students examine a classic problem in computer science, the Traveling Salesperson Problem (TSP). Students solve small instances of the problem, try to formulate algorithms to solve it, and discuss why these algorithms take a long time for computers (and humans) to compute. Students learn how the TSP grows in size much faster than the problem of adding characters to a password. Even though we use encryption to motivate a desire to learn about computationally hard problems, they are valuable to know about, in and of themselves. This lesson covers some territory about how we reason formally and mathematically about algorithms and figuring out how “hard” something is for a computer to do.
Students will be able to:
In this final algorithm detour, students are introduced to the Traveling Salesperson Problem, a classic problem in computer science for which there is no known algorithm to solve it, other than brute force. The number of possible solutions grows extremely fast, even for small inputs, and quickly becomes "unreasonable" to solve, making it a computationally hard problem. The ideas of computationally hard problems are leveraged for encryption to make ciphers that take an unreasonable amount of time to crack (as in thousands of trillions of years), but computationally hard problems are also important in their own right. There are many problems for which we wish we had reasonable algorithmic solutions - especially in medical fields - and we're still on the hunt to find them. No one has yet mathematically proven whether or not the problems we currently think are "hard" actually are.
Remind the students the following as a recall:
Now, introduce today's lesson to the students:
After distributing The Traveling Salesperson Problem Activity Guide,
If you ask students how they arrived at their solutions they might say, "I tried a bunch of different paths and kept track of the shortest." Or some form of greedy algorithm: "I want to avoid that large edge" or "I want to use that small edge." These are good strategies, but they can always be thwarted with the question, "How do you know for sure it's the shortest?"
The goal of the discussion is to bring out the fact that this would be a much harder problem for computers to solve than the shortest path problem or minimum spanning tree problem. "Hard" means that there is no known way to find the correct answer without generating ALL possible answers. Connect this problem to our desire to understand problems that are computationally hard for computers for encryption.
Here are some of the questions you can ask the students:
This is a fairly extended Wrap Up that asks the teacher to lead students through some reasoning about the traveling salesperson problem. Some preparation is recommended.
Many people for many years have tried to design an algorithm that works in all cases to find the exact optimum route for the traveling salesperson problem, with no success. So there is currently no algorithm to find the exact answer besides "brute force," trying all the possibilities.
Since we have to calculate the distances of all the tours anyway, we might as well count how many tours there are in the first place, just so we know how many calculations we have to perform.
For this demonstration, let's also consider a version of the problem faced by, say, an airline. For an airline, there are cities all over the world, and you can fly from any one directly to any other. So for an airline, there is a possible path that connects every possible node; we say the graph is "fully connected."
Show the students how the problem grows very quickly, by constructing all the possible tours for larger and larger numbers of nodes.
Starting with 3 nodes, there is only one possible tour as shown in the image below.
With 4 nodes, there are 3 possible tours as shown in the image below.
With 5 nodes, there are 12 possible tours as shown in the image below. The way shown in the image is one way to reason through the different tours that 5 nodes can take: a 5th node could be added between each pair of nodes for every possible 4-node tour. Here, the actual distances don't matter because we're just trying to count the number of tours.
A different way to reason through the possible tours of 5 nodes is to "break" one of the edges, connecting it to the new 5th node.
Without having to draw pictures, we can reason about 6 nodes: add a node to every possible edge of the 12 solutions 5-node solutions. Since each of 12 solutions above has 5 edges, it means we have 12*5 = 60 tours!
We can see how fast this grows: between a graph with 3 nodes and a graph of 6 nodes, our total number of possible tours grew from 1 to 60. This problem grows very quickly!
With just 10 nodes, this grows to about 181,440 possible tours.
With just 26 nodes, this grows to a 25-digit number: 7,755,605,021,665,492,992,000,000. (By comparison, the width of the entire observable universe(!!!!) in miles is roughly a 25-digit number.)
With 100 nodes you're up to a roughly a 155-digit number.
The math: With n nodes there are (n-1)!/2 ("n-1 factorial divided by 2") possible tours. n! (or "n factorial") is n(n-1)(n-2)(n-3)...(2)(1). So for example 10! = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1. You can confirm the formula with the examples above: 5 nodes = (5-1)! / 2 = 12.
Discuss the following with the students.
Computers work fast, but they have limits. In computer science, we have an actual mathematical hard line between reasonable and unreasonable runtimes.
Reasonable means the number of things the computer has to do is proportional to the size of the input to the problem. For example, the Minimum Spanning Tree and Shortest Path Problems are "reasonable" because they had algorithms that solved them by considering every edge in the graph once. The amount of time it takes is proportional to the number of edges. If the number of edges is n, even if there was an algorithm that had to look at the edge n^2 times, or n^3 times, that's still reasonable.
Unreasonable means the number of things the computer has to do grows as an exponent of the size of the input. So if you discovered that an algorithm made the computer do 2^n things, that's not reasonable, because it means every time the size of the input (n) gets bigger, the solution gets massively further out of reach. n! is another running time that is considered unreasonable. In real life, "unreasonable" problems would take a modern computer trillions and trillions of years to churn through all the possibilities.
So the brute force solution to TSP is unreasonable -- at least as far as we know! In fact, it's an open question how to solve this problem efficiently. If anyone finds a solution that runs in a reasonable time, a lot of security and encryption algorithms are in trouble, because many are based on the fact that this (and related problems) are unreasonable to solve.
Hard problems like this one can be used to create public keys that are tough for computers to crack.
Ask the students to think about the following: If the TSP is unsolvable for finding an exact solution, how do you think package delivery companies optimize their delivery routes?
Do a quick Google search on the Traveling Salesperson Problem, and explore the many many interesting resources you can find.