The Traveling Salesperson Problem¶

Consider theTraveling Salesperson Problem:

Given a set of cities and the distances between each pair of cities, what is the shortest possible tour that visits each city exactly once, and returns to the starting city?

In this notebook we will develop some solutions to the problem, and more generally showhow to think about solving a problem like this.Elsewhere you can read about how the algorithms developed here are used in serious applications that millions of people rely on every day.

$No description has been provided for this image$

An example tour.

Understanding What We're Talking About (Vocabulary)¶

Do we understand precisely what the problem is asking? Do we understand all the concepts that the problem talks about? Do we understand them well enough to implement them in a programming language? Let's take a first pass:

A set of cities: We will need to represent a set of cities; Python'sset datatype might be appropriate.
Distance between each pair of cities: IfA andB are cities, this could be a function,distance(A, B), or a table lookup,distance[A][B]. The resulting distance will be a real number.
City: All we have to know about an individual city is how far it is from other cities. We don't have to know its name, population, best restaurants, or anything else. So a city could be just an integer (0, 1, 2, ...) used as an index into a distance table, or a city could be a pair of (x, y) coordinates, if we are using straight-line distance on a plane.
Tour: A tour is a specified order in which to visit the cities; Python'slist ortuple datatypes would work. For example, given the set of cities{A, B, C, D}, a tour might be the list[B, D, A, C], which means to travel fromB toD toA toC and finally back toB.
Shortest possible tour: The shortest tour is the one whose tour length is the minimum of all tours.
Tour length: The sum of the distances between adjacent cities in the tour (including the last city back to the first city). Probably a function,tour_length(tour).
What is ...: We can define a function to answer the questionwhat is the shortest possible tour? The function takes a set of cities as input and returns a tour as output. I will use the convention that any such function will have a name ending in the letters "tsp", the traditional abbreviation for Traveling Salesperson Problem.

At this stage I have a rough sketch of how to attack the problem. I don't have all the answers, and I haven't committed to specific representations for all the concepts, but I know what all the pieces are, and I don't see anything that stops me from proceeding.

Here are the imports used throughout this notebook. I'm assuming Python 3.

In [1]:

%matplotlib inlineimportmatplotlib.pyplotaspltimportrandomimporttimeimportitertoolsimporturllibimportcsvimportfunctoolsfromstatisticsimportmean,stdev

All Tours Algorithm:`alltours_tsp`¶

Let's start with an algorithm that is guaranteed to solve the problem, although it is inefficient for large sets of cities:

All Tours Algorithm:Generate all possible tours of the cities, and choose the shortest tour (the one with minimum tour length).

My design philosophy is to first write an English description of the algorithm, then write Python code that closely mirrors the English description. This will probably require some auxilliary functions and data structures; just assume they exist; put them on a TO DO list, and eventually define them with the same design philosophy.

Here is the start of the implementation:

In [2]:

defalltours_tsp(cities):"Generate all possible tours of the cities and choose the shortest tour."returnshortest_tour(alltours(cities))defshortest_tour(tours):"Choose the tour with the minimum tour length."returnmin(tours,key=tour_length)# TO DO: Data types: cities, tours, Functions: alltours, tour_length

Note: In Pythonmin(collection,key=function) means to find the elementx that is a member ofcollection such thatfunction(x) is minimized. Soshortest finds the tour whosetour_length in the minimal among the tours.

This gives us a good start; the Python code closely matches the English description. And we know what we need to do next: represent cities and tours, and implement the functionsalltours andtour_length. Let's start with tours.

Representing Tours¶

A tour starts in one city, and then visits each of the other cities in order, before returning to the start city. A natural representation of a tour is a sequence of cities. For example(1, 2, 3) could represent a tour that starts in city 1, moves to 2, then 3, and finally returns to 1.

Note: I considered using(1, 2, 3, 1) as the representation of this tour. I also considered an ordered list ofedges between cities:((1, 2), (2, 3), (3, 1)). In the end, I decided(1, 2, 3) was simplest.

Now for thealltours function. If a tour is a sequence of cities, then all the tours arepermutations of the set of all cities. A function to generate all permutations of a set is already provided in Python's standarditertools library module; we can use it as our implementation ofalltours:

In [3]:

alltours=itertools.permutations

Forn cities there aren! (that is, the factorial ofn) permutations.Here's are all 3! = 6 tours of 3 cities:

In [4]:

cities={1,2,3}list(alltours(cities))

Out[4]:

[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]

The length of a tour is the sum of the lengths of each edge in the tour; in other words, the sum of the distances between consecutive cities in the tour, including the distance form the last city back to the first:

In [5]:

deftour_length(tour):"The total of distances between each pair of consecutive cities in the tour."returnsum(distance(tour[i],tour[i-1])foriinrange(len(tour)))# TO DO: Functions: distance, Data types: cities

Note: I use one Python-specific trick: wheni is 0, thendistance(tour[0], tour[-1]) gives us the wrap-around distance between the first and last cities, becausetour[-1] is the last element oftour.

Representing Cities¶

We determined that the only thing that matters about cities is the distance between them. But before we can decide about how to represent cities, and before we can definedistance(A, B), we have to make a choice. In the fully general version of the TSP, the "distance" between two cities could be anything: it could factor in the amount of time it takes to travel between cities, the twistiness of the road, or anything else. Thedistance(A, B) might be different fromdistance(B, A). So the distances could be represented by a matrixdistance[A][B], where any entry in the matrix could be any (non-negative) numeric value.

But we will ignore the fully general TSP and concentrate on an important special case, theEuclidean TSP, where the distance between any two cities is theEuclidean distance, the straight-line distance between points in a two-dimensional plane. So a city can be represented by a two-dimensional point: a pair ofx andy coordinates. We will use the constructor functionCity, so thatCity(300, 0) creates a city with x-coordinate of 300 and y coordinate of 0. Thendistance(A, B) will be a function that uses thex andy coordinates to compute the distance betweenA andB.

Representing Points and Computing`distance`¶

OK, so a city can be represented as just a two-dimensional point. But how will we represent points? Here are some choices, with their pros and cons:

tuple: A point is a two-tuple of (x,y) coordinates, for example,(300, 0).Pro: Very simple.Con: doesn't distinguish Points from other two-tuples.
class: Define a customPoint class withx andy slots.Pro: explicit, gives usp.x andp.y accessors.Con: less efficient.
complex: Python already has the two-dimensional point as a built-in numeric data type, but in a non-obvious way: ascomplex numbers, which inhabit the two-dimensional (real × imaginary) plane.Pro: efficient.Con: a little confusing; doesn't distinguish Points from other complex numbers.
subclass of complex: All the pros ofcomplex, and eliminating the major con.

Any of these choices would work perfectly well; I decided to use a subclass ofcomplex:

In [6]:

# Cities are represented as Points, which are a subclass of complex numbersclassPoint(complex):x=property(lambdap:p.real)y=property(lambdap:p.imag)City=Pointdefdistance(A,B):"The distance between two points."returnabs(A-B)

Here's an example of computing the distance between two cities:

In [7]:

A=City(3,0)B=City(0,4)distance(A,B)

Out[7]:

5.0

Random Sets of Cities¶

The input to a TSP algorithm should be a set of cities. I can make a random set ofn cities by callingCityn times, each with different randomx andy coordinates:

In [8]:

{City(random.randrange(1000),random.randrange(1000))forcinrange(6)}

Out[8]:

{(193+375j), (427+384j), (497+585j), (179+546j), (224+543j), (245+643j)}

The functionCities does that (and a bit more):

In [9]:

defCities(n,width=900,height=600,seed=42):"Make a set of n cities, each with random coordinates within a (width x height) rectangle."random.seed(seed*n)returnfrozenset(City(random.randrange(width),random.randrange(height))forcinrange(n))

There are three complications that I decided to tackle inCities:

IPython's matplotlib plots (by default) in a rectangle that is 1.5 times wider than it is high; that's why I specified a width of 900 and a height of 600. If you want the coordinates of your cities to be bounded by a different size rectangle, you can change width or height.
Sometimes I wantCities(n) to be a true function, returning the same result each time. This is very helpful for getting repeatable results: if I run a test twice, I get the same results twice.But other times I would like to be able to do an experiment, where, for example, I callCities(n) 30 times and get 30 different sets, and I then compute the average tour length produced by my algorithm across these 30 sets. Can I get both behaviors out of one function?Yes! The trick is the additional optional parameter,seed. Two calls toCities with the samen andseed parameters will always return the same set of cities, and two calls with different values forseed will return different sets. This is implemented by calling the functionrandom.seed, which resets the random number generator.
Once I create a set of Cities, I don't want anyone messing with my set. For example, I don't want an algorithm that claims to "solve" a problem by deleting half the cities from the input set, then finding a tour of the remaining cities. Therefore, I makeCities return afrozenset rather than aset. Afrozenset isimmutable; nobody can change it once it is created. (Likewise, each city is immutable.)

For example:

In [10]:

# A set of 5 citiesCities(5)

Out[10]:

frozenset({(172+20j), (234+40j), (696+415j), (393+7j), (671+296j)})

In [11]:

# The exact same set of 5 cities each time[Cities(5)foriinrange(3)]

Out[11]:

[frozenset({(172+20j), (234+40j), (696+415j), (393+7j), (671+296j)}), frozenset({(172+20j), (234+40j), (696+415j), (393+7j), (671+296j)}), frozenset({(172+20j), (234+40j), (696+415j), (393+7j), (671+296j)})]

In [12]:

# A different set of 5 cities each time[Cities(5,seed=i)foriinrange(3)]

Out[12]:

[frozenset({(414+310j), (776+430j), (41+265j), (864+394j), (523+497j)}), frozenset({(814+542j), (29+476j), (637+261j), (759+367j), (794+255j)}), frozenset({(439+494j), (211+473j), (585+33j), (832+503j), (591+15j)})]

Now we are ready to apply thealltours_tsp function to find the shortest tour:

In [13]:

alltours_tsp(Cities(8))

Out[13]:

((6+546j), (199+147j), (350+65j), (737+26j), (847+187j), (891+465j), (554+374j), (505+548j))

In [14]:

tour_length(alltours_tsp(Cities(8)))

Out[14]:

2509.307587720301

Quick, is that the right answer? I have no idea, and you probably can't tell either. But if we couldplot the tour we'd understand it better and might be able to see at a glance if the tour is optimal.

Plotting Tours¶

I defineplot_tour(tour) to plot the cities (as circles) and the tour (as lines):

In [15]:

defplot_tour(tour):"Plot the cities as circles and the tour as lines between them."plot_lines(list(tour)+[tour[0]])defplot_lines(points,style='bo-'):"Plot lines to connect a series of points."plt.plot([p.xforpinpoints],[p.yforpinpoints],style)plt.axis('scaled');plt.axis('off')

In [16]:

plot_tour(alltours_tsp(Cities(8)))

No description has been provided for this image

That looks much better! To me, it looks like the shortest possible tour, although I don't have an easy way to prove it. Let's go one step further and define a function,plot_tsp(algorithm, cities) that will take a TSP algorithm (such asalltours_tsp) and a set of cities, apply the algorithm to the cities to get a tour, check that the tour is reasonable, plot the tour, and print information about the length of the tour and the time it took to find it:

In [17]:

defplot_tsp(algorithm,cities):"Apply a TSP algorithm to cities, plot the resulting tour, and print information."# Find the solution and time how long it takest0=time.clock()tour=algorithm(cities)t1=time.clock()assertvalid_tour(tour,cities)plot_tour(tour);plt.show()print("{} city tour with length{:.1f} in{:.3f} secs for{}".format(len(tour),tour_length(tour),t1-t0,algorithm.__name__))defvalid_tour(tour,cities):"Is tour a valid tour for these cities?"returnset(tour)==set(cities)andlen(tour)==len(cities)

In [18]:

plot_tsp(alltours_tsp,Cities(8))

8 city tour with length 2509.3 in 0.110 secs for alltours_tsp

All Non-Redundant Tours Algorithm (improved`alltours_tsp`)¶

We said there aren! tours ofn cities, and thus 6 tours of 3 cities:

In [19]:

list(alltours({1,2,3}))

Out[19]:

[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]

But this is redundant:(1, 2, 3),(2, 3, 1), and(3, 1, 2) are three ways of describing the same tour. So let's arbitrarily say that all tours must start with the first city in the set of cities. We'll just pull the first city out, and then tack it back on to all the permutations of the rest of the cities.

While we're re-assembling a tour from the start city and the rest, we'll take the opportunity to construct the tour as alist rather than atuple. It doesn't matter much now, but later on we will want to representpartial tours, to which we will want to append cities one by one; appending can only be done to lists, not tuples.

In [20]:

defalltours(cities):"Return a list of tours, each a permutation of cities, but each one starting with the same city."start=first(cities)return[[start]+Tour(rest)forrestinitertools.permutations(cities-{start})]deffirst(collection):"Start iterating over collection, and return the first element."returnnext(iter(collection))Tour=list# Tours are implemented as lists of cities

We can verify that for 3 cities there are now only 2 tours (not 6) and for 4 cities there are 6 tours (not 24):

In [21]:

alltours({1,2,3})

Out[21]:

[[1, 2, 3], [1, 3, 2]]

In [22]:

alltours({1,2,3,4})

Out[22]:

[[1, 2, 3, 4], [1, 2, 4, 3], [1, 3, 2, 4], [1, 3, 4, 2], [1, 4, 2, 3], [1, 4, 3, 2]]

Note: We could say that there is only one tour of three cities, because[1, 2, 3] and[1, 3, 2] are in some sense the same tour, one going clockwise and the other counterclockwise. However, I choose not to do that, for two reasons. First, it would mean we can never handle TSP problems where the distance from A to B is different from B to A. Second, it would complicate the code (if only by a line or two).

We can verify that callingalltours_tsp(Cities(8)) still works and gives the same tour with the same total distance. But it now runs faster:

In [23]:

plot_tsp(alltours_tsp,Cities(8))

8 city tour with length 2509.3 in 0.018 secs for alltours_tsp

Now let's try a much harder 10-city tour:

In [24]:

plot_tsp(alltours_tsp,Cities(10))

10 city tour with length 2291.8 in 1.636 secs for alltours_tsp

Complexity of`alltours_tsp`¶

It takes about 2 seconds on my machine to solve this 10-city problem. In general, the functionTSP looks at (n-1)! tours for ann-city problem, and each tour hasn cities, so the total time required forn cities should be roughly proportional ton!. This means that the time grows rapidly with the number of cities.Really rapidly. This table shows the actual time for solving a 10 city problem, and the exepcted time for solving larger problems:

n	expected time for `alltours_tsp(Cities(n))`
10	Covering 10! tours = 2 secs
11	2 secs × 11! / 10! ≈ 22 secs
12	2 secs × 12! / 10! ≈ 4 mins
14	2 secs × 14! / 10! ≈ 13 hours
16	2 secs × 16! / 10! ≈ 200 days
18	2 secs × 18! / 10! ≈ 112 years
25	2 secs × 25! / 10! ≈270 billion years

There must be a better way ...

Approximate Algorithms¶

What if we are willing to settle for a tour that is short, but not guaranteed to be shortest? Then we can save billions of years of compute time: we will show severalapproximate algorithms, which find tours that are typically within 10% of the shortest possible tour, and can handle thousands of cities in a few seconds. (Note: There are more sophisticated approximate algorithms that can handle hundreds of thousands of cities and come within 0.01% or better of the shortest possible tour.)

So how do we come up with an approximate algorithm? Here are two general plans of how to create a tour:

Nearest Neighbor Algorithm: Make the tour go from a city to its nearest neighbor. Repeat.
Greedy Algorithm: Find the shortest distance between any two cities and include that edge in the tour. Repeat.

We will expand these ideas into full algorithms.

In addition, here are four very general strategies that apply not just to TSP, but to any optimization problem. Anoptimization problem is one in which the goal is to find a solution that is best (or near-best) according to some metric,out of a pool of many candidate solutions. The strategies are:

Repetition Strategy: Take some algorithm and re-run it multiple times, varying some aspect each time, and take the solution with the best score.
Alteration Strategy: Use some algorithm to create a solution, then make small changes to the solution to improve it.
Ensemble Strategy: Take two or more algorithms, apply all of them to the problem, and pick the best solution.

And here are two more strategies that work for a wide variety of problems:

Divide and Conquer: Split the input in half, solve the problem for each half, and then combine the two partial solutions.
Stand on the Shoulders of GiantsorJust Google It: Find out what others have done in the past, and either copy it or build on it.

Nearest Neighbor Algorithm:`nn_tsp`¶

Here is a description of the nearest neighbor algorithm:

Nearest Neighbor Algorithm:Start at any city; at each step extend the tour by moving from the previous city to its nearest neighbor that has not yet been visited.

So now, instead of considering alln! tours, we are generating a single tour. It takes O(n² ) time to find the tour, because it hasn-1 steps, and at each step we consider each of the remaining cities.I implement the algorithm as follows:

"Start at any city": arbitrarily pick the first city.
"extend the tour": append to the end of a list of cities.
"by moving from the previous city": previous city istour[-1].
"to its nearest neighbor": I will define the functionnearest_neighbor.
"that has not yet been visited": I will keep a set ofunvisited cities.

That gives us:

In [25]:

defnn_tsp(cities):"""Start the tour at the first city; at each step extend the tour    by moving from the previous city to the nearest neighboring city, C,    that has not yet been visited."""start=first(cities)tour=[start]unvisited=set(cities-{start})whileunvisited:C=nearest_neighbor(tour[-1],unvisited)tour.append(C)unvisited.remove(C)returntourdefnearest_neighbor(A,cities):"Find the city in cities that is nearest to city A."returnmin(cities,key=lambdac:distance(c,A))

Note: In Python, as in the formal mathematical theory of computability,lambda (or λ) is the symbol forfunction, so "lambda c: distance(c, A)" means the function ofc that computes the distance fromc to the cityA.

We can compare the the slow (but optimal)alltours_tsp algorithm to the new fast (but approximate)nn_tsp algorithm:

In [26]:

plot_tsp(alltours_tsp,Cities(10))

10 city tour with length 2291.8 in 1.681 secs for alltours_tsp

In [27]:

plot_tsp(nn_tsp,Cities(10))

10 city tour with length 2381.4 in 0.000 secs for nn_tsp

So the nearest neighbor algorithm is a lot faster, but it didn't find the shortest tour. To understand where it went wrong, it would be helpful to know what city it started from. I can modifyplot_tour by adding one line of code to highlight the start city with a red square:

In [28]:

defplot_tour(tour):"Plot the cities as circles and the tour as lines between them. Start city is red square."start=tour[0]plot_lines(list(tour)+[start])plot_lines([start],'rs')# Mark the start city with a red square

In [29]:

plot_tsp(nn_tsp,Cities(10))

10 city tour with length 2381.4 in 0.000 secs for nn_tsp

We can see that the tour moves clockwise from the start city, and mostly makes good decisions, but not optimal ones.

We can compare the performance of these two algorithms on, say, eleven different sets of cities instead of just one:

In [30]:

deflength_ratio(cities):"The ratio of the tour lengths for nn_tsp and alltours_tsp algorithms."returntour_length(nn_tsp(cities))/tour_length(alltours_tsp(cities))sorted(length_ratio(Cities(8,seed=i*i))foriinrange(11))

Out[30]:

[1.0, 1.0, 1.0, 1.0, 1.0118279018107388, 1.0121039193389436, 1.107851821362778, 1.139713084817861, 1.1531140497779002, 1.1972133336642432, 1.2160497559961319]

The ratio of1.0 means the two algorithms got the same (optimal) result; that happened 4 times out of 10. The other times, we see that thenn_tsp produces a longer tour, by anything up to 21% worse, with a median of 1% worse.

But more important than that 1% (or even 21%) difference is that the nearest neighbor algorithm can quickly tackle problems that the all tours algorithm can't touch in the lifetime of the universe. Finding a tour of 1000 cities takes well under a second:

In [31]:

plot_tsp(nn_tsp,Cities(1000))

1000 city tour with length 21275.9 in 0.145 secs for nn_tsp

Can we do better? Can we combine the speed of the nearest neighbor algorithm with the optimality of the all tours algorithm?

Let's consider wherenn_tsp can go wrong. At the end ofplot_tsp(nn_tsp, Cities(10)), we see a very long edge, because there are no remaining cities near by. In a way, this just seems like bad luck—we started in a place that left us with no good choices at the end. Just as with buying lottery tickets, we could improve our chance of winning by trying more often; in other words, by using therepetition strategy.

Repeated Nearest Neighbor Algorithm:`repeated_nn_tsp`¶

Here is an easy way to apply therepetition strategy to improvenearest neighbors:

Repeated Nearest Neighbor Algorithm:For each of the cities, run the nearest neighbor algorithm with that city as the starting point, and choose the resulting tour with the shortest total distance.

So, withn cities we could run thenn_tsp algorithmn times, regrettably making the total run timen times longer, but hopefully making at least one of then tours shorter.

To implementrepeated_nn_tsp we just take the shortest tour over all starting cities:

In [32]:

defrepeated_nn_tsp(cities):"Repeat the nn_tsp algorithm starting from each city; return the shortest tour."returnshortest_tour(nn_tsp(cities,start)forstartincities)

To do that requires a modification ofnn_tsp so that thestart city can be specified as an optional argument:

In [33]:

defnn_tsp(cities,start=None):"""Start the tour at the first city; at each step extend the tour    by moving from the previous city to its nearest neighbor    that has not yet been visited."""ifstartisNone:start=first(cities)tour=[start]unvisited=set(cities-{start})whileunvisited:C=nearest_neighbor(tour[-1],unvisited)tour.append(C)unvisited.remove(C)returntour

In [34]:

# Compare nn_tsp to repeated_nn_tspplot_tsp(nn_tsp,Cities(100))plot_tsp(repeated_nn_tsp,Cities(100))

100 city tour with length 6734.1 in 0.002 secs for nn_tsp

100 city tour with length 5912.6 in 0.157 secs for repeated_nn_tsp

We see thatrepeated_nn_tsp does indeed take longer to run, and yields a tour that is shorter.

Let's try again with a smaller map that makes it easier to visualize the tours:

In [35]:

forfin[nn_tsp,repeated_nn_tsp,alltours_tsp]:plot_tsp(f,Cities(10))

10 city tour with length 2381.4 in 0.000 secs for nn_tsp

10 city tour with length 2297.7 in 0.000 secs for repeated_nn_tsp

10 city tour with length 2291.8 in 1.619 secs for alltours_tsp

This time therepeated_nn_tsp gives us a tour that is better thannn_tsp, but not quite optimal. So, it looks like repetition is helping. But if I want to tackle 1000 cities, I don't really want the run time to be 1000 times slower. I'd like a way to moderate the repetition—to repeat thenn_tsp starting froma sample of the cities but notall the cities.

Sampled Repeated Nearest Neighbor Algorithm: revised`repeated_nn_tsp`¶

We can giverepeated_nn_tsp an optional argument specifying the number of different cities to try starting from. We will implement the functionsample to draw a random sample of the specified size from all the cities. Most of the work is done by the standard library functionrandom.sample. What oursample adds is the same thing we did with the functionCities: we ensure that the function returns the same result each time for the same arguments, but can return different results if aseed parameter is passed in. (In addition, if the sample size,k isNone or is larger than the population, then return the whole population.)

In [36]:

defrepeated_nn_tsp(cities,repetitions=100):"Repeat the nn_tsp algorithm starting from specified number of cities; return the shortest tour."returnshortest_tour(nn_tsp(cities,start)forstartinsample(cities,repetitions))defsample(population,k,seed=42):"Return a list of k elements sampled from population. Set random.seed with seed."ifkisNoneork>len(population):returnpopulationrandom.seed(len(population)*k*seed)returnrandom.sample(population,k)

Let's compare with 1, 10, and 100 starting cities on a 300 city map:

In [37]:

defrepeat_10_nn_tsp(cities):returnrepeated_nn_tsp(cities,10)defrepeat_100_nn_tsp(cities):returnrepeated_nn_tsp(cities,100)

In [38]:

plot_tsp(nn_tsp,Cities(300))plot_tsp(repeat_10_nn_tsp,Cities(300))plot_tsp(repeat_100_nn_tsp,Cities(300))

299 city tour with length 12752.7 in 0.014 secs for nn_tsp

299 city tour with length 12070.6 in 0.127 secs for repeat_10_nn_tsp

299 city tour with length 11598.6 in 1.285 secs for repeat_100_nn_tsp

As we add more starting cities, the run times get longer and the tours get shorter.

I'd like to understand the tradefoff better. I'd like to have a way to compare different algorithms (or different choices of parameters for one algorithm) overmultiple trials, and summarize the results. That means we now have a new vocabulary item:

New Vocabulary: "Maps"¶

We use the termcities and the functionCities to denote a set of cities. But now I want to talk about multiple trials over a collection of sets of cities: a plural of a plural. English doesn't give us a good way to do that, so it would be nice to have asingular noun that is a synonym for "set of cities." We'll use the term "map" for this, and the functionMaps to create a collection of maps. Just likeCities, the functionMaps will give the same result every time it is called with the same arguments.

In [39]:

defMaps(num_maps,num_cities):"Return a tuple of maps, each consisting of the given number of cities."returntuple(Cities(num_cities,seed=(m,num_cities))forminrange(num_maps))

Benchmarking¶

The termbenchmarking means running a function on a standard collection of inputs, in order to compare its performance. We'll define a general-purpose function,benchmark, which takes a function and a collection of inputs for that function, and runs the function on each of the inputs. It then returns two values: the average time taken per input, and the list of results of the function.

In [40]:

@functools.lru_cache(None)defbenchmark(function,inputs):"Run function on all the inputs; return pair of (average_time_taken, results)."t0=time.clock()results=[function(x)forxininputs]t1=time.clock()average_time=(t1-t0)/len(inputs)return(average_time,results)

Note: Each time we develop a new algorithm, we would like to compare its performance to some standard old algorithms.The use of@functools.lru_cache here means that we don't need to to re-run the old algorithms on a standard data set each time; we can just cache the old results.

We can usebenchmark to see the average call to the absolute value function takes less than a microsecond:

In [41]:

benchmark(abs,range(-10,10))

Out[41]:

(5.00000000069889e-07, [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

And we can see thatalltours_tsp can handle 6-city maps in under a millisecond each:

In [42]:

benchmark(alltours_tsp,Maps(10,6))

Out[42]:

(0.00032370000000003785, [[(574+214j), (236+141j), (556+348j), (677+277j), (833+33j), (578+224j)],  [(433+6j), (396+143j), (527+431j), (167+227j), (113+100j), (127+105j)],  [(571+206j), (724+42j), (703+269j), (797+331j), (543+474j), (310+248j)],  [(12+30j), (344+45j), (693+77j), (548+186j), (279+508j), (171+229j)],  [(243+271j), (379+72j), (859+331j), (840+411j), (651+478j), (8+369j)],  [(672+502j), (820+460j), (887+489j), (853+65j), (422+69j), (433+135j)],  [(38+119j), (644+90j), (622+288j), (602+511j), (509+424j), (275+536j)],  [(18+208j), (832+456j), (483+477j), (314+533j), (314+539j), (23+596j)],  [(274+560j), (213+594j), (248+84j), (550+317j), (508+577j), (377+575j)],  [(813+467j), (438+216j), (270+118j), (71+18j), (125+320j), (199+578j)]])

Benchmarking Specifically for TSP Algorithms¶

Now let's add another function,benchmarks, which builds onbenchmark in two ways:

It compares multiple algorithms, rather than just running one algorithm. (Hence the pluralbenchmarks.)
It is specific toTSP algorithms, and rather than returning results, it prints summary statistics: the mean, standard deviation, min, and max of tour lengths, as well as the time taken and the number and size of the sets of cities.

In [43]:

defbenchmarks(tsp_algorithms,maps=Maps(30,60)):"Print benchmark statistics for each of the algorithms."fortspintsp_algorithms:time,results=benchmark(tsp,maps)lengths=[tour_length(r)forrinresults]print("{:>25} |{:7.0f} ±{:4.0f} ({:5.0f} to{:5.0f}) |{:7.3f} secs/map |{} ⨉{}-city maps".format(tsp.__name__,mean(lengths),stdev(lengths),min(lengths),max(lengths),time,len(maps),len(maps[0])))

How Many Starting Cities is best for`nn_tsp`?¶

Now we are in a position to gain some insight into how many repetitions, or starting cities, we need to get a good result fromnn_tsp.

In [44]:

defrepeat_25_nn_tsp(cities):returnrepeated_nn_tsp(cities,25)defrepeat_50_nn_tsp(cities):returnrepeated_nn_tsp(cities,50)

In [45]:

algorithms=[nn_tsp,repeat_10_nn_tsp,repeat_25_nn_tsp,repeat_50_nn_tsp,repeat_100_nn_tsp]benchmarks(algorithms,Maps(30,60))

                   nn_tsp |   5668 ± 488 ( 4674 to  6832) |  0.001 secs/map | 30 ⨉ 60-city maps         repeat_10_nn_tsp |   5232 ± 374 ( 4577 to  6172) |  0.006 secs/map | 30 ⨉ 60-city maps         repeat_25_nn_tsp |   5159 ± 394 ( 4620 to  6069) |  0.014 secs/map | 30 ⨉ 60-city maps         repeat_50_nn_tsp |   5118 ± 386 ( 4512 to  6069) |  0.029 secs/map | 30 ⨉ 60-city maps        repeat_100_nn_tsp |   5113 ± 384 ( 4512 to  6069) |  0.034 secs/map | 30 ⨉ 60-city maps

We see that adding more starting cities results in shorter tours, but you start getting diminishing returns after 50 repetitions.

Let's try again with bigger maps:

In [46]:

benchmarks(algorithms,Maps(30,120))

                   nn_tsp |   7789 ± 458 ( 6877 to  8632) |  0.002 secs/map | 30 ⨉ 120-city maps         repeat_10_nn_tsp |   7316 ± 334 ( 6646 to  7870) |  0.021 secs/map | 30 ⨉ 120-city maps         repeat_25_nn_tsp |   7242 ± 287 ( 6725 to  7870) |  0.053 secs/map | 30 ⨉ 120-city maps         repeat_50_nn_tsp |   7189 ± 295 ( 6646 to  7742) |  0.106 secs/map | 30 ⨉ 120-city maps        repeat_100_nn_tsp |   7173 ± 289 ( 6646 to  7736) |  0.213 secs/map | 30 ⨉ 120-city maps

In [47]:

benchmarks(algorithms,Maps(30,150))

                   nn_tsp |   8668 ± 485 ( 7183 to  9636) |  0.003 secs/map | 30 ⨉ 150-city maps         repeat_10_nn_tsp |   8220 ± 364 ( 7290 to  9197) |  0.033 secs/map | 30 ⨉ 150-city maps         repeat_25_nn_tsp |   8117 ± 326 ( 7222 to  8918) |  0.083 secs/map | 30 ⨉ 150-city maps         repeat_50_nn_tsp |   8086 ± 300 ( 7237 to  8676) |  0.166 secs/map | 30 ⨉ 150-city maps        repeat_100_nn_tsp |   8062 ± 284 ( 7174 to  8603) |  0.331 secs/map | 30 ⨉ 150-city maps

The results are similar. So depending on what your priorities are (run time versus tour length), somewhere around 25 or 50 repetitions might be a good tradeoff.

Next let's try to analyze where nearest neighbors goes wrong, and see if we can do something about it.

A Problem with Nearest Neighbors: Outliers¶

Consider the 20-city map that we build below:

In [48]:

outliers_list=[City(2,2),City(2,3),City(2,4),City(2,5),City(2,6),City(3,6),City(4,6),City(5,6),City(6,6),City(6,5),City(6,4),City(6,3),City(6,2),City(5,2),City(4,2),City(3,2),City(1,6.8),City(7.8,6.4),City(7,1.2),City(0.2,2.8)]outliers=set(outliers_list)plot_lines(outliers,'bo')

Let's see what a nearest neighbor search does on this map:

In [49]:

plot_tsp(nn_tsp,outliers)

20 city tour with length 38.8 in 0.000 secs for nn_tsp

The tour starts out going around the inner square. But then we are left with long lines to pick up the outliers.Let's try to understand what went wrong. First we'll create a new tool to draw better diagrams:

In [50]:

defplot_labeled_lines(points,*args):"""Plot individual points, labeled with an index number.    Then, args describe lines to draw between those points.    An arg can be a matplotlib style, like 'ro--', which sets the style until changed,    or it can be a list of indexes of points, like [0, 1, 2], saying what line to draw."""# Draw points and label them with their index numberplot_lines(points,'bo')for(label,p)inenumerate(points):plt.text(p.x,p.y,'  '+str(label))# Draw lines indicated by argsstyle='bo-'forarginargs:ifisinstance(arg,str):style=argelse:# arg is a list of indexes into points, forming a lineXs=[points[i].xforiinarg]Ys=[points[i].yforiinarg]plt.plot(Xs,Ys,style)plt.axis('scaled');plt.axis('off');plt.show()

In the diagram below, imagine we are running a nearest neighbor algorithm, and it has created a partial tour from city 0 to city 4. Now there is a choice. City 5 is the nearest neighbor. But if we don't take city 16 at this point, we will have to pay a higher price sometime later to pick up city 16.

In [51]:

plot_labeled_lines(outliers_list,'bo-',[0,1,2,3,4],'ro--',[4,16],'bo--',[4,5])

It seems that picking up an outlier issometimes a good idea, but sometimes going directly to the nearest neighbor is a better idea. So what can we do? It is difficult to make the choice between an outlier and a nearest neighbor while we are constructing a tour, because we don't have the context of the whole tour yet. So here's an alternative idea: don't try to make the right choice while constructing the tour; just go ahead and make any choice, then when the tour is complete,alter it to correct problems caused by outliers (or other causes).

New Vocabulary: "Segment"¶

We'll define asegment as a subsequence of a tour: a sequence of consecutive cities within a tour. A tour forms a loop, but a segment does not have a loop; it is open-ended on both ends. So, if[A, B, C, D] is a 4-city tour, then segments include[A, B],[B, C, D], and many others. Note that the segment[A, B, C, D] is different than the tour[A, B, C, D]; the tour returns fromD toA but the segment does not.

Altering Tours by Reversing Segments¶

One way we could try to improve a tour is byreversing a segment. Consider this tour:

In [52]:

cross=[City(9,3),City(3,10),City(2,16),City(3,21),City(9,28),City(26,3),City(32,10),City(33,16),City(32,21),City(26,28)]plot_labeled_lines(cross,range(-1,10))

This is clearly not an optimal tour. We should "uncross" the lines, which can be achieved by reversing a segment. The tour as it stands is[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. If we reverse the segment[5, 6, 7, 8, 9], we get the tour[0, 1, 2, 3, 4, 9, 8, 7, 6, 5], which is the optimal tour. In the diagram below, reversing[5, 6, 7, 8, 9] is equivalent to deleting the red dashed lines and adding the green dotted lines. If the sum of the lengths of the green dotted lines is less than the sum of the lengths of the red dashed lines, then we know the reversal is an improvement.

In [53]:

plot_labeled_lines(cross,'bo-',range(5),range(5,10),'g:',(4,9),(0,5),'r--',(4,5),(0,9))

Here we see that reversing[5, 6, 7, 8, 9] works:

In [54]:

tour=Tour(cross)tour[5:10]=reversed(tour[5:10])plot_tour(tour)

Here is how we can check if reversing a segment is an improvement, and if so to do it:

In [55]:

defreverse_segment_if_better(tour,i,j):"If reversing tour[i:j] would make the tour shorter, then do it."# Given tour [...A-B...C-D...], consider reversing B...C to get [...A-C...B-D...]A,B,C,D=tour[i-1],tour[i],tour[j-1],tour[j%len(tour)]# Are old edges (AB + CD) longer than new ones (AC + BD)? If so, reverse segment.ifdistance(A,B)+distance(C,D)>distance(A,C)+distance(B,D):tour[i:j]=reversed(tour[i:j])

Now let's write a function,alter_tour, which finds segments to swap. What segments should we consider? I don't know how to be clever about the choice, but I do know how to be fairly thorough: try all segments of all lengths at all starting positions. I have an intuition that trying longer ones first is better (although I'm not sure).

I worry that even trying all segements won't be enough: after I reverse one segment, it might open up opportunities to reverse other segments. So, after trying all possible segments, I'll check the tour length. If it has been reduced, I'll go through thealter_tour process again.

In [56]:

defalter_tour(tour):"Try to alter tour for the better by reversing segments."original_length=tour_length(tour)for(start,end)inall_segments(len(tour)):reverse_segment_if_better(tour,start,end)# If we made an improvement, then try again; else stop and return tour.iftour_length(tour)<original_length:returnalter_tour(tour)returntourdefall_segments(N):"Return (start, end) pairs of indexes that form segments of tour of length N."return[(start,start+length)forlengthinrange(N,2-1,-1)forstartinrange(N-length+1)]

Here is what the list of all segments look like, for N=4:

In [57]:

all_segments(4)

Out[57]:

[(0, 4), (0, 3), (1, 4), (0, 2), (1, 3), (2, 4)]

We can see that altering the cross tour does straighten it out:

In [58]:

plot_tour(alter_tour(Tour(cross)))

Altered Nearest Neighbor Algorithm (`altered_nn_tsp`)¶

Let's see what happens when we alter the output ofnn_tsp:

In [59]:

defaltered_nn_tsp(cities):"Run nearest neighbor TSP algorithm, and alter the results by reversing segments."returnalter_tour(nn_tsp(cities))

Let's try this new algorithm on some test cases:

In [60]:

plot_tsp(altered_nn_tsp,set(cross))

10 city tour with length 93.2 in 0.000 secs for altered_nn_tsp

In [61]:

plot_tsp(altered_nn_tsp,Cities(10))

10 city tour with length 2333.4 in 0.000 secs for altered_nn_tsp

It fails to get the optimal result here. Let's try benchmarking:

In [62]:

algorithms=[nn_tsp,repeat_50_nn_tsp,altered_nn_tsp]benchmarks(algorithms)

                   nn_tsp |   5668 ± 488 ( 4674 to  6832) |  0.001 secs/map | 30 ⨉ 60-city maps         repeat_50_nn_tsp |   5118 ± 386 ( 4512 to  6069) |  0.029 secs/map | 30 ⨉ 60-city maps           altered_nn_tsp |   4820 ± 233 ( 4450 to  5346) |  0.008 secs/map | 30 ⨉ 60-city maps

This is quite encouraging;altered_nn_tsp gives shorter tours and is faster than repeating nearest neighbors from 50 starting cities. Could we do better?

Altered Repeated Nearest Neighbor Algorithm (`altered_repeated_nn_tsp`)¶

We have seen that thenearest neighbor algorithm is improved by both thealteration andrepetition strategies. So why not apply both strategies?

In [63]:

defrepeated_altered_nn_tsp(cities,repetitions=20):"Use alteration to improve each repetition of nearest neighbors."returnshortest_tour(alter_tour(nn_tsp(cities,start))forstartinsample(cities,repetitions))defrepeat_5_altered_nn_tsp(cities):returnrepeated_altered_nn_tsp(cities,5)

Let's see it in action:

In [64]:

plot_tsp(repeated_altered_nn_tsp,Cities(100))

100 city tour with length 5701.6 in 0.541 secs for repeated_altered_nn_tsp

That looks like a good tour. Let's gather more data:

In [65]:

algorithms=[nn_tsp,repeat_50_nn_tsp,altered_nn_tsp,repeated_altered_nn_tsp]benchmarks(algorithms)print('-'*100)benchmarks(algorithms,Maps(30,120))

                   nn_tsp |   5668 ± 488 ( 4674 to  6832) |  0.001 secs/map | 30 ⨉ 60-city maps         repeat_50_nn_tsp |   5118 ± 386 ( 4512 to  6069) |  0.029 secs/map | 30 ⨉ 60-city maps           altered_nn_tsp |   4820 ± 233 ( 4450 to  5346) |  0.008 secs/map | 30 ⨉ 60-city maps  repeated_altered_nn_tsp |   4640 ± 194 ( 4298 to  4991) |  0.148 secs/map | 30 ⨉ 60-city maps----------------------------------------------------------------------------------------------------                   nn_tsp |   7789 ± 458 ( 6877 to  8632) |  0.002 secs/map | 30 ⨉ 120-city maps         repeat_50_nn_tsp |   7189 ± 295 ( 6646 to  7742) |  0.106 secs/map | 30 ⨉ 120-city maps           altered_nn_tsp |   6589 ± 202 ( 6188 to  7016) |  0.036 secs/map | 30 ⨉ 120-city maps  repeated_altered_nn_tsp |   6402 ± 185 ( 6015 to  6779) |  0.701 secs/map | 30 ⨉ 120-city maps

So, alteration gives the most gain, but alteration plus repetition gives a modest improvement in tour length, at the cost of 20 times more run time.

Non-Random Maps¶

I thought it would be fun to work on somereal maps, instead of random maps. First I founda page that lists geographical coordinates of US cities. Here is an excerpt from that page:

[TCL]  33.23   87.62  Tuscaloosa,AL[FLG]  35.13  111.67  Flagstaff,AZ[PHX]  33.43  112.02  Phoenix,AZ

I also found ablog post by Randal S. Olson who chose 50 landmarks across the states and found a tour based on actual road-travel distances, not straight-line distance. His data looks like this:

Mount Rushmore National Memorial, South Dakota 244, Keystone, SD43.879102-103.459067Toltec Mounds, Scott, AR34.647037-92.065143Ashfall Fossil Bed, Royal, NE42.425000-98.158611

You can't see, but fields are separated by tabs in this data.

Now we have a problem: we have two similar but different data formats, and we want to convert both of them toMaps (sets of cities). Python provides a module,csv (for "comma-separated values"), to parse data like this. The functioncsv.reader takes an input that should be an iterable over lines of text, and optionally you can tell it what character to use as a delimiter (as well as several other options). For each line, it generates alist of fields. For example, for the line"[TCL] 33.23 87.62 Tuscaloosa,AL" it would generate the list['[TCL]', '33.23', '87.62', 'Tuscaloosa,AL'].

I define the functionCoordinate_map to take an iterable of lines (a file object or a list of strings), parse it withcsv_reader, pick out the latitude and longitude columns, and build aCity out of each one:

In [66]:

deflines(text):returntext.strip().splitlines()defCoordinate_map(lines,delimiter=' ',lat_col=1,long_col=2,lat_scale=69,long_scale=-48):"""Make a set of Cities from an iterable of lines of text.    Specify the column delimiter, and the zero-based column number of lat and long.    Treat long/lat as a square x/y grid, scaled by long_scale and lat_scale.    Source can be a file object, or list of lines."""returnfrozenset(City(long_scale*float(row[long_col]),lat_scale*float(row[lat_col]))forrowincsv.reader(lines,delimiter=delimiter,skipinitialspace=True))

You might be wondering about thelat_scale=69, long_scale=-48 part. The issue is that we have latitude and longitude for cities, and we want to compute the distance between cities. To do that accurately requirescomplicated trigonometry. But we can get an approximation by assuming the earth is flat, and that latitude and longitude are on a rectangular grid. (This is a bad approximation if you're talking about distances of 10,000 miles, but close enough for 100 miles, as long as you're not too close to the poles.) I took the latitude of the center of the country (Wichita, KS: latitude 37.65) and plugged it into aLength Of A Degree Of LatitudeAnd Longitude Calculator to find that, in Wichita, one degree of latitude is 69 miles, and one degree of longitude is 48 miles. (It is -48 rather than +48 because the US is west of the prime meridian.)

Now let's create the map of USA cities, and find a tour for it:

In [67]:

USA_map=Coordinate_map(lines("""[TCL]  33.23   87.62  Tuscaloosa,AL[FLG]  35.13  111.67  Flagstaff,AZ[PHX]  33.43  112.02  Phoenix,AZ[PGA]  36.93  111.45  Page,AZ[TUS]  32.12  110.93  Tucson,AZ[LIT]  35.22   92.38  Little Rock,AR[SFO]  37.62  122.38  San Francisco,CA[LAX]  33.93  118.40  Los Angeles,CA[SAC]  38.52  121.50  Sacramento,CA[SAN]  32.73  117.17  San Diego,CA[SBP]  35.23  120.65  San Luis Obi,CA[EKA]  41.33  124.28  Eureka,CA[DEN]  39.75  104.87  Denver,CO[DCA]  38.85   77.04  Washington/Natl,DC[MIA]  25.82   80.28  Miami Intl,FL[TPA]  27.97   82.53  Tampa Intl,FL[JAX]  30.50   81.70  Jacksonville,FL[TLH]  30.38   84.37  Tallahassee,FL[ATL]  33.65   84.42  Atlanta,GA[BOI]  43.57  116.22  Boise,ID[CHI]  41.90   87.65  Chicago,IL[IND]  39.73   86.27  Indianapolis,IN[DSM]  41.53   93.65  Des Moines,IA[SUX]  42.40   96.38  Sioux City,IA[ICT]  37.65   97.43  Wichita,KS[LEX]  38.05   85.00  Lexington,KY[NEW]  30.03   90.03  New Orleans,LA[BOS]  42.37   71.03  Boston,MA[PWM]  43.65   70.32  Portland,ME[BGR]  44.80   68.82  Bangor,ME[CAR]  46.87   68.02  Caribou Mun,ME[DET]  42.42   83.02  Detroit,MI[STC]  45.55   94.07  St Cloud,MN[DLH]  46.83   92.18  Duluth,MN[STL]  38.75   90.37  St Louis,MO[JAN]  32.32   90.08  Jackson,MS[BIL]  45.80  108.53  Billings,MT[BTM]  45.95  112.50  Butte,MT[RDU]  35.87   78.78  Raleigh-Durh,NC[INT]  36.13   80.23  Winston-Salem,NC[OMA]  41.30   95.90  Omaha/Eppley,NE[LAS]  36.08  115.17  Las Vegas,NV[RNO]  39.50  119.78  Reno,NV[AWH]  41.33  116.25  Wildhorse,NV[EWR]  40.70   74.17  Newark Intl,NJ[SAF]  35.62  106.08  Santa Fe,NM[NYC]  40.77   73.98  New York,NY[BUF]  42.93   78.73  Buffalo,NY[ALB]  42.75   73.80  Albany,NY[FAR]  46.90   96.80  Fargo,ND[BIS]  46.77  100.75  Bismarck,ND[CVG]  39.05   84.67  Cincinnati,OH[CLE]  41.42   81.87  Cleveland,OH[OKC]  35.40   97.60  Oklahoma Cty,OK[PDX]  45.60  122.60  Portland,OR[MFR]  42.37  122.87  Medford,OR[AGC]  40.35   79.93  Pittsburgh,PA[PVD]  41.73   71.43  Providence,RI[CHS]  32.90   80.03  Charleston,SC[RAP]  44.05  103.07  Rapid City,SD[FSD]  43.58   96.73  Sioux Falls,SD[MEM]  35.05   90.00  Memphis Intl,TN[TYS]  35.82   83.98  Knoxville,TN[CRP]  27.77   97.50  Corpus Chrst,TX[DRT]  29.37  100.92  Del Rio,TX[IAH]  29.97   95.35  Houston,TX[SAT]  29.53   98.47  San Antonio,TX[LGU]  41.78  111.85  Logan,UT[SLC]  40.78  111.97  Salt Lake Ct,UT[SGU]  37.08  113.60  Saint George,UT[CNY]  38.77  109.75  Moab,UT[MPV]  44.20   72.57  Montpelier,VT[RIC]  37.50   77.33  Richmond,VA[BLI]  48.80  122.53  Bellingham,WA[SEA]  47.45  122.30  Seattle,WA[ALW]  46.10  118.28  Walla Walla,WA[GRB]  44.48   88.13  Green Bay,WI[MKE]  42.95   87.90  Milwaukee,WI[CYS]  41.15  104.82  Cheyenne,WY[SHR]  44.77  106.97  Sheridan,WY"""))

In [68]:

plot_lines(USA_map,'bo')

In [69]:

plot_tsp(repeated_altered_nn_tsp,USA_map)

80 city tour with length 13562.6 in 0.297 secs for repeated_altered_nn_tsp

Not bad! There are no obvious errors in the tour (although I'm not at all confident it is the optimal tour).

Now let's do the same for Randal Olson's landmarks. Note that the data is delimited by tabs, not spaces, and the longitude already has a minus sign, so we don't need another one inlong_scale.

In [70]:

USA_landmarks_map=Coordinate_map(lines("""Mount Rushmore National Memorial, South Dakota 244, Keystone, SD43.879102-103.459067Toltec Mounds, Scott, AR34.647037-92.065143Ashfall Fossil Bed, Royal, NE42.425000-98.158611Maryland State House, 100 State Cir, Annapolis, MD 2140138.978828-76.490974The Mark Twain House & Museum, Farmington Avenue, Hartford, CT41.766759-72.701173Columbia River Gorge National Scenic Area, Oregon45.711564-121.519633Mammoth Cave National Park, Mammoth Cave Pkwy, Mammoth Cave, KY37.186998-86.100528Bryce Canyon National Park, Hwy 63, Bryce, UT37.593038-112.187089USS Alabama, Battleship Parkway, Mobile, AL30.681803-88.014426Graceland, Elvis Presley Boulevard, Memphis, TN35.047691-90.026049Wright Brothers National Memorial Visitor Center, Manteo, NC35.908226-75.675730Vicksburg National Military Park, Clay Street, Vicksburg, MS32.346550-90.849850Statue of Liberty, Liberty Island, NYC, NY40.689249-74.044500Mount Vernon, Fairfax County, Virginia38.729314-77.107386Fort Union Trading Post National Historic Site, Williston, North Dakota 1804, ND48.000160-104.041483San Andreas Fault, San Benito County, CA36.576088-120.987632Chickasaw National Recreation Area, 1008 W 2nd St, Sulphur, OK 7308634.457043-97.012213Hanford Site, Benton County, WA46.550684-119.488974Spring Grove Cemetery, Spring Grove Avenue, Cincinnati, OH39.174331-84.524997Craters of the Moon National Monument & Preserve, Arco, ID43.416650-113.516650The Alamo, Alamo Plaza, San Antonio, TX29.425967-98.486142New Castle Historic District, Delaware38.910832-75.527670Gateway Arch, Washington Avenue, St Louis, MO38.624647-90.184992West Baden Springs Hotel, West Baden Avenue, West Baden Springs, IN38.566697-86.617524Carlsbad Caverns National Park, Carlsbad, NM32.123169-104.587450Pikes Peak, Colorado38.840871-105.042260Okefenokee Swamp Park, Okefenokee Swamp Park Road, Waycross, GA31.056794-82.272327Cape Canaveral, FL28.388333-80.603611Glacier National Park, West Glacier, MT48.759613-113.787023Congress Hall, Congress Place, Cape May, NJ 0820438.931843-74.924184Olympia Entertainment, Woodward Avenue, Detroit, MI42.387579-83.084943Fort Snelling, Tower Avenue, Saint Paul, MN44.892850-93.180627Hoover Dam, Boulder City, CO36.012638-114.742225White House, Pennsylvania Avenue Northwest, Washington, DC38.897676-77.036530USS Constitution, Boston, MA42.372470-71.056575Omni Mount Washington Resort, Mount Washington Hotel Road, Bretton Woods, NH44.258120-71.441189Grand Canyon National Park, Arizona36.106965-112.112997The Breakers, Ochre Point Avenue, Newport, RI41.469858-71.298265Fort Sumter National Monument, Sullivan's Island, SC32.752348-79.874692Cable Car Museum, 94108, 1201 Mason St, San Francisco, CA 9410837.794781-122.411715Yellowstone National Park, WY 8219044.462085-110.642441French Quarter, New Orleans, LA29.958443-90.064411C. W. Parker Carousel Museum, South Esplanade Street, Leavenworth, KS39.317245-94.909536Shelburne Farms, Harbor Road, Shelburne, VT44.408948-73.247227Taliesin, County Road C, Spring Green, Wisconsin43.141031-90.070467Acadia National Park, Maine44.338556-68.273335Liberty Bell, 6th Street, Philadelphia, PA39.949610-75.150282Terrace Hill, Grand Avenue, Des Moines, IA41.583218-93.648542Lincoln Home National Historic Site Visitor Center, 426 South 7th Street, Springfield, IL39.797501-89.646211Lost World Caverns, Lewisburg, WV37.801788-80.445630"""),delimiter='\t',long_scale=48)

In [71]:

plot_lines(USA_landmarks_map,'bo')

In [72]:

plot_tsp(repeated_altered_nn_tsp,USA_landmarks_map)

50 city tour with length 10236.7 in 0.125 secs for repeated_altered_nn_tsp

We can compare that to the tour that Randal Olson computed as the shortest based on road distances:

The two tours are similar but not the same. I think the difference is that roads through the rockies and along the coast of the Carolinas tend to be very windy, so Randal's tour avoids them, whereas my program assumes staright-line roads and thus includes them. William Cook provides ananalysis, and atour that is shorter than either Randal's or mine.

Now let's go back to theoriginal web page to get a bigger map with over 1000 cities. A shell command fetches the file:

In [73]:

![-elatlong.htm]||curl-Ohttp://www.realestate3d.com/gps/latlong.htm

I note that the page has some lines that I don't want, so I will filter out lines that are not in the continental US (that is, cities in Alaska or Hawaii), as well as header lines that do not start with'['.

In [74]:

defcontinental_USA(line):"Does line denote a city in the continental United States?"returnline.startswith('[')and',AK'notinlineand',HI'notinlineUSA_big_map=Coordinate_map(filter(continental_USA,open('latlong.htm')))

In [75]:

plot_lines(USA_big_map,'bo')

Let's get a baseline tour withnn_tsp:

In [76]:

plot_tsp(nn_tsp,USA_big_map)

1089 city tour with length 52879.1 in 0.177 secs for nn_tsp

Now try to improve on that withrepeat_100_nn_tsp and withrepeat_5_altered_nn_tsp (which will take a while with over 1000 cities):

In [77]:

plot_tsp(repeat_100_nn_tsp,USA_big_map)

1089 city tour with length 50802.6 in 17.126 secs for repeat_100_nn_tsp

In [78]:

plot_tsp(repeat_5_altered_nn_tsp,USA_big_map)

1089 city tour with length 44234.6 in 23.149 secs for repeat_5_altered_nn_tsp

Again we see that we do better by spending our run time budget on alteration rather than on repetition. This time we saved over 8,000 miles of travel in half a minute of computation!

Greedy Algorithm:`greedy_tsp`¶

At the start of theApproximate Algorithms section, we mentioned two ideas:

Nearest Neighbor Algorithm: Make the tour go from a city to its nearest neighbor. Repeat.
Greedy Algorithm: Find the shortest distance between any two cities and include that in the tour. Repeat.

It is time to develop thegreedy algorithm, so-called because at every step it greedily adds to the tour the edge that is shortest (even if that is not best in terms of long-range planning). The nearest neighbor algorithm always extended the tour by adding on to the end. The greedy algorithm is different in that it doesn't have a notion ofend of the tour; instead it keeps aset of partial segments. Here's a brief statement of the algorithm:

Greedy Algorithm:Maintain a set of segments; intially each city defines its own 1-city segment. Find the shortest possible edge that connects two endpoints of two different segments, and join those segments with that edge. Repeat until we form a segment that tours all the cities.

On each step of the algorithm, we want to "find the shortest possible edge that connects two endpoints." That seems like an expensive operation to do on each step. So we will add in some data structures to enable us to speed up the computation. Here's a more detailed sketch of the algorithm:

Pre-compute a list ofedges, sorted by shortest edge first. An edge is a pair of cities; if the list contains(A, B) then it does not contain(B, A), and it never contains(A, A).
Maintain a dict that mapsendpoints tosegments, e.g.{A: [A, B, C, D], D: [A, B, C, D]}. Initially, each city is the endpoint of its own 1-city-long segment, but as we join segments together, some cities are no longer endpoints and are removed from the dict.
Go through the edges in shortest-first order. When you find an edge(A, B) such that bothA andB are endpoints of different segments, then join the two segments together. Maintain the endpoints dict to reflect this new segment. Stop when you createa segment that contains all the cities.

Let's consider an example: assume we have seven cities, labeled A through G. Suppose CG happens to be the shortest edge. We would add the edge to the partial tour, by joining the segment that contains C with the segment that contains G. In this case, the joining is easy, because each segment is one city long; we join them to form a segment two cities long. We then look at the next shortest edge and continue the process, joining segments as we go, as shown in the table below. Some edges cannot be used. For example, FD cannot be used, because by the time it becomes the shortest edge, D is already in the interior of a segment. Next, AE cannot be used, even though both A and E are endpoints, because it would make a loop out of ACGDE. Finally, note that sometimes we may have to reverse a segment. For example, EF can merge AGCDE and BF, but first we have to reverse BF to FB.

Shortest Edge	Usage of edge	Resulting Segments
—	—	A; B; C; D; E; F; G
CG	Join C to G	A; B;CG; D; E; F
DE	Join D to E	A; B; CG;DE; F
AC	Join A to CG	B;ACG; DE; F
GD	Join ACG to D	B; ACGDE; F
FD	Discard	B; ACGDE; F
AE	Discard	B; ACGDE; F
BF	Join B to F	BF; ACGDE
CF	Discard	BF; ACGDE
EF	Join ACGDE to FB	ACGDEFB

Here is the code:

In [79]:

defgreedy_tsp(cities):"""Go through edges, shortest first. Use edge to join segments if possible."""endpoints={c:[c]forcincities}# A dict of {endpoint: segment}for(A,B)inshortest_edges_first(cities):ifAinendpointsandBinendpointsandendpoints[A]!=endpoints[B]:new_segment=join_endpoints(endpoints,A,B)iflen(new_segment)==len(cities):returnnew_segment# TO DO: functions: shortest_edges_first, join_endpoints

Note: Theendpoints dict is serving two purposes. First, the keys of the dict are all the cities that are endpoints of some segments,making it possible to ask "A in endpoints" to see if cityA is an endpoint. Second, the values of the dict are all the segments, making it possible to ask "endpoints[A] != endpoints[B]" to make sure that the two cities are endpoints of different segments, not of the same segment.

Theshortest_edges_first function is easy: generate all(A, B) pairs of cities, and sort by the distance between the cities. (Note: I use the conditionalif id(A) < id(B) so that I won't have both(A, B) and(B, A) in my list of edges, and I won't ever have(A, A).)

In [80]:

defshortest_edges_first(cities):"Return all edges between distinct cities, sorted shortest first."edges=[(A,B)forAincitiesforBincitiesifid(A)<id(B)]returnsorted(edges,key=lambdaedge:distance(*edge))

For thejoin_endpoints function, I first make sure that A is the last element of one segment and B is the first element of the other, by reversing segments if necessary. Then I add the B segment on to the end of the A segment. Finally, I update theendpoints dict. This is a bit tricky! My first thought was that A and B are no longer endpoints, because they have been joined together in the interior of the segment. However, that isn't always true. If A was the endpoint of a 1-city segment, then when you join it to B, A is still an endpoint. I could have had complicated logic to handle the case when A, B, or both, or neither were 1-city segments, but I decided on a different tactic: first unconditionally delete A and B from the endpoints dict, no matter what. Then add the two endpoints of the new segment (which isAsegment) to the endpoints dict.

In [81]:

defjoin_endpoints(endpoints,A,B):"Join B's segment onto the end of A's and return the segment. Maintain endpoints dict."Asegment,Bsegment=endpoints[A],endpoints[B]ifAsegment[-1]isnotA:Asegment.reverse()ifBsegment[0]isnotB:Bsegment.reverse()Asegment.extend(Bsegment)delendpoints[A],endpoints[B]# A and B are no longer endpointsendpoints[Asegment[0]]=endpoints[Asegment[-1]]=AsegmentreturnAsegment

Let's try out thegreedy_tsp algorithm on the two USA maps:

In [82]:

plot_tsp(greedy_tsp,USA_map)

80 city tour with length 16087.5 in 0.006 secs for greedy_tsp

In [83]:

plot_tsp(greedy_tsp,USA_big_map)

1089 city tour with length 46981.5 in 1.052 secs for greedy_tsp

The greedy algorithm is worse than nearest neighbors, but it is fast (especially on the big map). Let's see if thealteration strategy can help:

In [84]:

defaltered_greedy_tsp(cities):"Run greedy TSP algorithm, and alter the results by reversing segments."returnalter_tour(greedy_tsp(cities))

In [85]:

plot_tsp(altered_greedy_tsp,USA_map)

80 city tour with length 14133.3 in 0.022 secs for altered_greedy_tsp

In [86]:

plot_tsp(altered_greedy_tsp,USA_big_map)

1089 city tour with length 43769.9 in 4.177 secs for altered_greedy_tsp

That's the best result yet on the big map. Let's look at some benchmarks:

In [87]:

algorithms=[altered_nn_tsp,altered_greedy_tsp,repeated_altered_nn_tsp]benchmarks(algorithms)print('-'*100)benchmarks(algorithms,Maps(30,120))

           altered_nn_tsp |   4820 ± 233 ( 4450 to  5346) |  0.008 secs/map | 30 ⨉ 60-city maps       altered_greedy_tsp |   4766 ± 207 ( 4320 to  5185) |  0.009 secs/map | 30 ⨉ 60-city maps  repeated_altered_nn_tsp |   4640 ± 194 ( 4298 to  4991) |  0.148 secs/map | 30 ⨉ 60-city maps----------------------------------------------------------------------------------------------------           altered_nn_tsp |   6589 ± 202 ( 6188 to  7016) |  0.036 secs/map | 30 ⨉ 120-city maps       altered_greedy_tsp |   6539 ± 240 ( 5994 to  7203) |  0.037 secs/map | 30 ⨉ 120-city maps  repeated_altered_nn_tsp |   6402 ± 185 ( 6015 to  6779) |  0.701 secs/map | 30 ⨉ 120-city maps

So overall, the altered greedy algorithm looks slightly better than the altered nearest neighbor algorithm and runs in about the same time. However, the repeated altered nearest neighbor algorithm does best of all (although it takes much longer).

What about a repeated altered greedy algorithm? That might be a good idea, but there is no obvious way to do it. We can't just start from a sample of cities, because the greedy algorithm doesn't have a notion of starting city.

Visualizing the Greedy Algorithm¶

I would like to see how the process of joining segments unfolds. Although I dislike copy-and-paste (because it violates theDon't Repeat Yourself principle), I'll copygreedy_tsp and make a new version calledvisualize_greedy_tsp which adds one line to plot the segments several times as the algorithm is running:

In [88]:

defvisualize_greedy_tsp(cities,plot_sizes):"""Go through edges, shortest first. Use edge to join segments if possible.    Plot segments at specified sizes."""edges=shortest_edges_first(cities)# A list of (A, B) pairsendpoints={c:[c]forcincities}# A dict of {endpoint: segment}for(A,B)inedges:ifAinendpointsandBinendpointsandendpoints[A]!=endpoints[B]:new_segment=join_endpoints(endpoints,A,B)plot_segments(endpoints,plot_sizes,distance(A,B))# <<<< NEWiflen(new_segment)==len(cities):returnnew_segmentdefplot_segments(endpoints,plot_sizes,dist):"If the number of distinct segments is one of plot_sizes, then plot segments."segments=set(map(tuple,endpoints.values()))iflen(segments)inplot_sizes:forsinsegments:plot_lines(s)plt.show()print('{} segments, longest edge ={:.0f}'.format(len(segments),dist))

In [89]:

visualize_greedy_tsp(USA_map,(50,25,10,5,2,1));

50 segments, longest edge = 119

25 segments, longest edge = 190

10 segments, longest edge = 255

5 segments, longest edge = 335

2 segments, longest edge = 597

1 segments, longest edge = 1021

Divide and Conquer Strategy¶

The next general strategy to consider isdivide and conquer. Suppose we have an algorithm, likealltours_tsp, that is inefficient for largen (thealltours_tsp algorithm is O(n!) forn cities). So we can't applyalltours_tsp directly to a large set of cities. But we can divide the problem into smaller pieces, and then combine those pieces:

Split the set of cities in half.
Find a tour for each half.
Join those two tours into one.

The trick is that whenn is small, then step 2 can be done directly. But whenn is large, step 2 is done with a recursive call, breaking the half into two smaller halves.

Let's work out by hand an example with a small map of just six cities. Here are the cities:

In [90]:

cities=Cities(6)plot_labeled_lines(cities)

Step 1 is to divide this set in half. I'll divide it into a left half (blue circles) and a right half (black squares):

In [91]:

plot_labeled_lines(list(cities),'bo',[3,4,0],'ks',[1,2,5])

Step 2 is to find a tour for each half:

In [92]:

plot_labeled_lines(list(cities),'bo-',[0,3,4,0],'ks-',[1,2,5,1])

Step 3 is to combine the two halves. We do that by choosing an edge from each half to delete (the edges marked by red dashes) and replacing those two edges by two edges that connect the halves (the blue dash-dot edges). Note that there are two choices of ways to connect the new dash-dot lines. Pick the one that yields the shortest tour.

In [93]:

# One way to connect the two segments, giving the tour [1, 3, 4, 0, 2, 5]plot_labeled_lines(list(cities),'bo-',[0,3,4],'ks-',[1,2,5],'b-.',[0,1],[4,5],'r--',[0,4],[1,5])

Now we have a feel for what we have to do. Let's define the divide and conquer algorithm, which we will calldq_tsp. Like alltsp algorithms it gets a set of cities as input and returns a tour. If the size of the set of cities is 3 or less, then just listing the cities in any order produces an optimal tour. If there are more than 3 cities, then split the cities in half (usingsplit_cities), find a tour for each half (usingdq_tsp recursively), and join the two tours together (usingjoin_tours):

In [94]:

defdq_tsp(cities):"""Find a tour by divide and conquer: if number of cities is 3 or less,    any tour is optimal.  Otherwise, split the cities in half, solve each    half recursively, then join those two tours together."""iflen(cities)<=3:returnTour(cities)else:Cs1,Cs2=split_cities(cities)returnjoin_tours(dq_tsp(Cs1),dq_tsp(Cs2))# TO DO: functions: split_cities, join_tours

Let's verify thatdq_tsp works for three cities:

In [95]:

plot_tsp(dq_tsp,Cities(3))

3 city tour with length 1203.4 in 0.000 secs for dq_tsp

If we have more than 3 cities, how do we split them? My approach is to imagine drawing an axis-aligned rectangle that is just big enough to contain all the cities. If the rectangle is wider than it is tall, then order all the cities byx coordiante and split that ordered list in half. If the rectangle is taller than it is wide, order and split the cities byy coordinate.

In [96]:

defsplit_cities(cities):"Split cities vertically if map is wider; horizontally if map is taller."width,height=extent([c.xforcincities]),extent([c.yforcincities])key='x'if(width>height)else'y'cities=sorted(cities,key=lambdac:getattr(c,key))mid=len(cities)//2returnfrozenset(cities[:mid]),frozenset(cities[mid:])defextent(numbers):returnmax(numbers)-min(numbers)

Let's show that split_cities is working:

In [97]:

Cs1,Cs2=split_cities(cities)plot_tour(dq_tsp(Cs1))plot_tour(dq_tsp(Cs2))

Now for the tricky part: joining two tours together. First we consider all ways of deleting one edge from each of the two tours. If we delete a edge from a tour we get a segment. We are representing segments as lists of cities, the same surface representation as tours. But there is a difference in their interpretation. The tour[0, 2, 5] is a triangle of three edges, but the segment[0, 2, 5] consists of only two edges, from0 to2 and from2 to5. The segments that result from deleting an edge from the tour[0, 2, 5] are:

[0, 2, 5],    [2, 5, 0],    [5, 0, 2]

You may recognize these as therotations of the segment[0, 2, 5]. So any candidate combined tour consists of taking a rotation of the first tour, and appending to it a rotation of the second tour, with one caveat: when we go to append the two segments, there are two ways of doing it: either keep the second segment as is, or reverse the second segment.

Note: In Python,sequence[::-1] means to reverse the sequence.

In [98]:

defjoin_tours(tour1,tour2):"Consider all ways of joining the two tours together, and pick the shortest."segments1,segments2=rotations(tour1),rotations(tour2)tours=[s1+s2fors1insegments1forsinsegments2fors2in(s,s[::-1])]returnshortest_tour(tours)defrotations(sequence):"All possible rotations of a sequence."# A rotation is some suffix of the sequence followed by the rest of the sequence.return[sequence[i:]+sequence[:i]foriinrange(len(sequence))]

Let's see if it works, first on the 6 city example:

In [99]:

plot_tsp(dq_tsp,Cities(6))

6 city tour with length 1431.7 in 0.000 secs for dq_tsp

That is indeed the optimal tour, achieved by deleting the two dashed red lines and adding the dotted blue lines.

Now for the USA map:

In [100]:

plot_tsp(dq_tsp,USA_map)

80 city tour with length 14883.2 in 0.142 secs for dq_tsp

Not quite as good asaltered_greedy_tsp. Let's alter it!

In [101]:

defaltered_dq_tsp(cities):returnalter_tour(dq_tsp(cities))

In [102]:

plot_tsp(altered_dq_tsp,USA_map)

80 city tour with length 14209.6 in 0.109 secs for altered_dq_tsp

Let's just remind ourselves how the algorithms behave on the standard test cases:

In [103]:

algorithms=[nn_tsp,greedy_tsp,dq_tsp,altered_dq_tsp,altered_nn_tsp,altered_greedy_tsp,repeated_altered_nn_tsp]benchmarks(algorithms)

                   nn_tsp |   5668 ± 488 ( 4674 to  6832) |  0.001 secs/map | 30 ⨉ 60-city maps               greedy_tsp |   5392 ± 306 ( 4554 to  5967) |  0.002 secs/map | 30 ⨉ 60-city maps                   dq_tsp |   5268 ± 236 ( 4743 to  5752) |  0.042 secs/map | 30 ⨉ 60-city maps           altered_dq_tsp |   4953 ± 221 ( 4575 to  5399) |  0.049 secs/map | 30 ⨉ 60-city maps           altered_nn_tsp |   4820 ± 233 ( 4450 to  5346) |  0.008 secs/map | 30 ⨉ 60-city maps       altered_greedy_tsp |   4766 ± 207 ( 4320 to  5185) |  0.009 secs/map | 30 ⨉ 60-city maps  repeated_altered_nn_tsp |   4640 ± 194 ( 4298 to  4991) |  0.148 secs/map | 30 ⨉ 60-city maps

Of the non-altered algorithms (the first three lines), divide and conquer (dq_tsp) does best. But interestingly, divide and conquer is helped less byalter_tour than is the greedy algorithm or nearest neighbor algorithm. Perhaps it is because divide and conquer constructs its tour by putting together pieces that are already good, soalter_tour is less able to improve it. ALso,dq_tsp has a standard deviation that is much smaller than the other two—this suggests thatdq_tsp is not producing really bad tours that can be easily improved byalter_tour. In any event,altered_dq_tsp is the worst of thealtered algorithms, both in average tour length and in run time.

repeated_altered_nn_tsp remains the best in tour length, although the worst in run time.

Shoulders of Giants: Minimum Spanning Tree Algorithm:`mst_tsp`¶

Joseph Kruskal (Wikipedia)

I hope you now believe that you could have come up with some ideas for solving the TSP. But even if you can't come up with something all on your own, you can alwaysGoogle it, in which case you'll no doubt find a giant of a mathematician,Joseph Kruskal, who, in 1956,publisheda paper that led to an algorithm thatmost people would not have thought of on their own(I know I wouldn't have):

Minimum Spanning Tree Traversal Algorithm:Construct a Minimum Spanning Tree, then do a pre-order traversal. That will give you a tour that is guaranteed to be no more than twice as long as the minimal tour.

What does all this jargon mean? It is part ofgraph theory, the study of vertexes and edges. Here is a glossary of terms:

Agraph is a collection of vertexes and edges.
Avertex is a point (such as a city).
Anedge is a link between two vertexes. Edges have lengths.
Adirected graph is a graph where the edges have a direction. We say that the edge goes from theparent vertex to thechild vertex.
Atree is a directed graph in which there is one distinguished vertex called theroot that has no parent; every other vertex has exactly one parent.
Aspanning tree (of a set of vertexes) is a tree that contains all the vertexes.
Aminimum spanning tree is a spanning tree with the smallest possible sum of edge lengths.
Atraversal of a tree is a way of visiting all the vertexes in some order.
Apre-order traversal means that you visit the root first, then do a pre-order traversal of each of the children.
Aguarantee means that, no matter what set of cities is selected, the tour found by the minimum spanning tree traversal algorithm will never be more than twice as long as the shortest possible tour. None of the other algorithms has any guarantee at all (except foralltours_tsp, which is guaranteed to find the optimal algorithm, if it has enough time to complete).

We will implement a vertex as a Point, and a directed graph as a dict of{parent: [child, ...]} pairs.

Visualizing Graphs and Trees¶

I think we will need visualization right away, so before doing anything else I will defineplot_graph. I will make it plot in red so that we can easily tell a tour (blue) from a graph (red).

In [104]:

defplot_graph(graph):"Given a graph of the form {parent: [child...]}, plot the vertexes and edges."vertexes={vforparentingraphforvingraph[parent]}|set(graph)edges={(parent,child)forparentingraphforchildingraph[parent]}foredgeinedges:plot_lines(edge,'ro-')total_length=sum(distance(p,c)for(p,c)inedges)print('{} node Graph of total length:{:.1f}'.format(len(vertexes),total_length))

Let's try it out:

In [105]:

Ps=[Point(0,0.1),Point(-2,-1),Point(0,-1),Point(2,-1),Point(-2.9,-1.9),Point(-1,-1.9),Point(1,-1.9),Point(2.9,-1.9)]Ptree={Ps[0]:Ps[1:4],Ps[1]:Ps[4:6],Ps[3]:Ps[6:8]}plot_graph(Ptree)

8 node Graph of total length: 10.9

Now our plan is:

Implement an algorithm to create a minimum spanning tree.
Implement a tree traversal; that will give us ourmst_tsp algorithm.
Understand the guarantee,

Creating a Minimum Spanning Tree (`mst`)¶

Now let's see how to create a minimum spanning tree (or MST). Kruskal has a very nice algorithm to find MSTs, but with what we have done so far, it will be a bit easier to implement another Giant's algorithm:

Prim's algorithm for creating a MST:List all the edges and sort them, shortest first. Initialize a tree to be a single root city (we'll arbitrarily shoose the first city). Now repeat the following until the tree contains all the cities: find the shortest edge that links a city (A) that is in the tree to a city (B) that is not yet in the tree, and add B to the list of A's children in the tree.

Here's the code. One tricky bit: In the first line inside thewhile loop, we define(A, B) to be an edge in which one ofA orB is in the tree, using the exclusive-or operator,^. Then in the next line, we make sure thatA is the one that is in the tree and B is not, by swapping if necessary.

In [106]:

defmst(vertexes):"""Given a set of vertexes, build a minimum spanning tree: a dict of the form {parent: [child...]},    where parent and children are vertexes, and the root of the tree is first(vertexes)."""tree={first(vertexes):[]}# the first city is the root of the tree.edges=shortest_edges_first(vertexes)whilelen(tree)<len(vertexes):(A,B)=shortest_usable_edge(edges,tree)tree[A].append(B)tree[B]=[]returntreedefshortest_usable_edge(edges,tree):"Find the ehortest edge (A, B) where A is in tree and B is not."(A,B)=first((A,B)for(A,B)inedgesif(Aintree)^(Bintree))# ^ is "xor"return(A,B)if(Aintree)else(B,A)

Let's see what a minimum spanning tree looks like:

In [107]:

plot_graph(mst(USA_map))

80 node Graph of total length: 11518.4

This algorithm clearly produced a spanning tree. It looks pretty good, but how can we be sure the algorithm willalways produce a minimum spanning tree?

The output is atree because (1) every city is connected by a path from the root, and (2) every city only gets one parent (we only add a B that is not in tree), so there can be no loops.
The output is aspanning tree because it contains all the cities.
The output is aminimum spanning tree because each city was added with the shortest possible edge. Suppose this algorithm produces the tree T. For another putative spanning tree to be shorter, it would have to contain at least one city C whose edge from its parent was shorter than the edge in T. But that is not possible, because the algorithm always chooses the shortest possible edge from C's parent to C.

Note: There are refinements to Prim's algorithm to make it more efficient. I won't bother with them because they complicate the code, and becausemst is already fast enough for our purposes.

Turning a Minimum Spanning Tree into a Tour (`mst_tsp`)¶

Given a minimum spanning tree, we can generate a tour by doing a pre-order traversal, which means the tour starts at the root, then visits all the cities in the pre-order traversal of the first child of the root, followed by the pre-order traversals of any other children.

In [108]:

defmst_tsp(cities):"Create a minimum spanning tree and walk it in pre-order, omitting duplicates."returnpreorder_traversal(mst(cities),first(cities))defpreorder_traversal(tree,root):"Traverse tree in pre-order, starting at root of tree."result=[root]forchildintree.get(root,()):result.extend(preorder_traversal(tree,child))returnresult

To better understand pre-order traversal, let's go back to thePtree example, and this time label the vertexes:

In [109]:

P=[Point(0,0.1),Point(-2,-1),Point(0,-1),Point(2,-1),Point(-2.9,-1.9),Point(-1,-1.9),Point(1,-1.9),Point(2.9,-1.9)]Ptree={P[0]:P[1:4],P[1]:P[4:6],P[3]:P[6:8]}plot_graph(Ptree)plot_labeled_lines(P)

8 node Graph of total length: 10.9

A pre-order traversal starting at 0 would go to the first child, 1, then to its children, 4 and 5, then since there are no children of 4 and 5, it would continue with the other children of 0, hitting 2, then 3, and finally the children of 3, namely 6 and 7. So the following should be true:

In [110]:

preorder_traversal(Ptree,P[0])==[P[0],P[1],P[4],P[5],P[2],P[3],P[6],P[7]]

Out[110]:

True

And this is what the pre-order traversal looks like as a tour:

In [111]:

plot_tour([P[0],P[1],P[4],P[5],P[2],P[3],P[6],P[7]])

You can think of this as starting at the root (at the top) and going around the outside of the tree counterclockwise, as if you were walking with your left hand always touching an edge, but skipping cities you have already been to.

We see that the result is a tour, but not an optimal one.

Let's see whatmst_tsp can do on the USA map:

In [112]:

plot_tsp(mst_tsp,USA_map)

80 city tour with length 17924.5 in 0.004 secs for mst_tsp

Not so great. Can the alteration strategy help?

In [113]:

defaltered_mst_tsp(cities):returnalter_tour(mst_tsp(cities))

In [114]:

plot_tsp(altered_mst_tsp,USA_map)

80 city tour with length 14105.0 in 0.022 secs for altered_mst_tsp

Better. Let's go to the benchmarks:

In [115]:

benchmarks([mst_tsp,nn_tsp,greedy_tsp,dq_tsp])

                  mst_tsp |   5953 ± 361 ( 5334 to  7030) |  0.002 secs/map | 30 ⨉ 60-city maps                   nn_tsp |   5668 ± 488 ( 4674 to  6832) |  0.001 secs/map | 30 ⨉ 60-city maps               greedy_tsp |   5392 ± 306 ( 4554 to  5967) |  0.002 secs/map | 30 ⨉ 60-city maps                   dq_tsp |   5268 ± 236 ( 4743 to  5752) |  0.042 secs/map | 30 ⨉ 60-city maps

Not very encouraging:mst_tsp is the second slowest and has the longest tours. I'm sure I could make it faster (at the cost of making the code a bit more complicated), but there is no point if the tours are going to be longer.

What happens when we add the alteration strategy?

In [116]:

benchmarks([altered_dq_tsp,altered_nn_tsp,altered_mst_tsp,altered_greedy_tsp,repeated_altered_nn_tsp])

           altered_dq_tsp |   4953 ± 221 ( 4575 to  5399) |  0.049 secs/map | 30 ⨉ 60-city maps           altered_nn_tsp |   4820 ± 233 ( 4450 to  5346) |  0.008 secs/map | 30 ⨉ 60-city maps          altered_mst_tsp |   4823 ± 227 ( 4354 to  5250) |  0.009 secs/map | 30 ⨉ 60-city maps       altered_greedy_tsp |   4766 ± 207 ( 4320 to  5185) |  0.009 secs/map | 30 ⨉ 60-city maps  repeated_altered_nn_tsp |   4640 ± 194 ( 4298 to  4991) |  0.148 secs/map | 30 ⨉ 60-city maps

Nowaltered_mst_tsp is in the middle of the pack, both in tour length and in run time.

So why would we want to use the rather complicated minimum spanning tree algorithm, when the greedy algorithm is simpler to implement, runs faster, and produces shorter tours?

Guaranteed Tour Length!¶

The great thing about the minimum spanning tree algorithm is that it comes with aguarantee, which none of the other algorithms offer. You are guaranteed that the tour length it comes up with will be no worse than twice as long as the optimal tour. (And, with a bit more complication, you can modify it to give a guarantee of 1.5 times longer.) The guarantee works like this:

The minimum spanning tree, by definition, connects all the cities with the shortest possible total edge length.
So if you could follow each edge in the spanning tree just once, and that formed a legal tour, then that would be guaranteed to bea minimal tour.
But you can't do that in general; in general there will be places where you skip to the next city without following the spanning tree. Any such skip, however, is a straight line, and thus will be less than you would take if you went to the next city by following along the spanning tree.
If you did follow along the spanning tree, you would follow some edges twice, and some edges once. Hence the total length of the tour would be at most twice the spanning tree, and thus at most twice the minimal tour.

A guarantee is great from a theoretical point of view, but in practice the greedy or nearest neighbor algorithms do just better than the minimum spanning tree, on the maps that we actually see.

Shoulders of Giants: Held-Karp Algorithm:`hk_tsp`¶

Held, Shareshian, Karp (Computer History Museum)

xkcd 399

Another algorithm that shows up with a literature search is theHeld-Karp Dynamic Programming Algorithm, named after giantsMichael Held andRichard Karp. It is an algorithm for finding optimal tours, not approximate ones, so it is not appropriate for largen. But even in its simplest form, without any programming tricks, it can go quite a bit further thanalltours_tsp. That is becausealltours_tsp is O(n!), while the Held-Karp algorithm is only O(n² 2ⁿ). How did Held and Karp achieve this speedup? They noticed thatalltours_tsp wastes a lot of time with permutations that can't possibly be optimal tours. Consider the following 10-city problem, with a 6-city segment shown:

In [117]:

plot_labeled_lines(cross,'r-',[0,4,1,3,2,9])

Thealltours_tsp would consider 4! = 24 different tours that start with those 6 cities. But that seems wasteful: there is no way that this segment could be part of an optimal tour, so why waste time onany continuation of it? The proof that this segment can never be part of an optimal tour comes down to two things. First, we demonstrate another tour that also starts in city 0 and ends in city 9, and along the way goes through cities 1 through 4, and is shorter:

In [118]:

plot_labeled_lines(cross,[0,1,2,3,4,9])

Second, we need this key property:

*Given a start city A, an end city C, and a set of middle cities Bs, then out of all the possible segments that start in A, end in C, and go through all and only the cities in Bs, only the shortest of those segments could ever be part of an optimal tour.

Of course, we don't know that the optimal tour goes through exactly those Bs cities before hitting C. But if it does, then we need only consider the permutation of Bs that leads to the shortest segment. So we can throw out the red zig-zag segment above, and keep the nice smooth blue segment.

So far we have only been talking about segments. We know that the TSP is defined for tours, not segments. So even if we find the shortest possible segment, it might not be the shortest possible tour. But here's something we do know: a tour has to end somewhere. So just find the shortest segment from the start city,A, to every possible end city,C. That will give youn-2 segments. Out of those, don't choose the shortestsegment, but rather choose the shortesttour.

That gives us our algorithm:

In [119]:

defhk_tsp(cities):"""The H eld-Karpshortest tour of this set of cities.    For each end city C, find the shortest segment from A (the start) to C.    Out of all these shortest segments, pick the one that is the shortest tour."""A=first(cities)returnshortest_tour(shortest_segment(A,cities-{A,C},C)forCincitiesifCisnotA)# TO DO: function: shortest_segment(A, Bs, C)

Now forshortest_segment(A, Bs, C). It is defined to produce the shortest segment that starts in cityA, ends inC, and visits some permutation ofBs cities in the middle. If there are noBs cities, then of course the shortest segment is to go directly fromA toC. If there areBs cities, then one of them has to be the lastB city visited (just before visitingC). So for eachB, find the shortest segment that first goes fromA, through all the otherBs cities, then toB, and finally toC. Out of all these candidate segments, return the one with the minimum segment length.

Note: the decorator@functools.lru_cache makes this adynamic programming algorithm, which is a fancy name meaning that we cache the results of sub-computations because we will re-use them multiple times.

In [120]:

@functools.lru_cache(None)defshortest_segment(A,Bs,C):"The shortest segment starting at A, going through all Bs, and ending at C."ifnotBs:return[A,C]else:segments=[shortest_segment(A,Bs-{B},B)+[C]forBinBs]returnmin(segments,key=segment_length)defsegment_length(segment):"The total of distances between each pair of consecutive cities in the segment."# Same as tour_length, but without distance(tour[0], tour[-1])returnsum(distance(segment[i],segment[i-1])foriinrange(1,len(segment)))

That's all there is to it. Let's comparealltours_tsp withhk_tsp on 10 city tours:

In [121]:

plot_tsp(alltours_tsp,Cities(10))

10 city tour with length 2291.8 in 1.650 secs for alltours_tsp

In [122]:

plot_tsp(hk_tsp,Cities(10))

10 city tour with length 2291.8 in 0.037 secs for hk_tsp

We see thathk_tsp returns the optimal tour, and it is a lot faster. We can takehk_tsp into uncharted territory well beyond the reach ofalltours_tsp:

In [123]:

plot_tsp(hk_tsp,Cities(14))

14 city tour with length 2886.6 in 1.464 secs for hk_tsp

In [124]:

plot_tsp(hk_tsp,Cities(16))

16 city tour with length 2868.6 in 9.018 secs for hk_tsp

Not bad! In 11 seconds, we did whatalltours_tsp would have taken an estimated 200 days to complete! Let's repeat the table of expected times, comparing the All Tours algorithm with the Held-Karp algorithm:

n	`alltours_tsp(Cities(n))`	`hk_tsp(Cities(n))`
	expected time ≈ O(n!)	expected time ≈ O(n² 2ⁿ)
10	10! tours = 2 secs	0.1 secs
11	2 secs × 11! / 10! ≈ 22 secs	0.2 secs
12	2 secs × 12! / 10! ≈ 4 mins	0.4 secs
14	2 secs × 14! / 10! ≈ 13 hours	3 secs
16	2 secs × 16! / 10! ≈ 200 days	16² 2¹⁶ tours = 11 secs
18	2 secs × 18! / 10! ≈ 112 years	11 secs × (18/16)² 2^(18-16) ≈ 1 min
25	2 secs × 25! / 10! ≈ 270 billion years	11 secs × (25/16)² 2^(25-16) ≈ 4 hours
50	2 secs × 50! / 10! ≈ 5 × 10⁵⁰ years	11 secs × (50/16)² 2^(50-16) ≈ 58,000 years

So if we had some patience, we could find the optimal tour for a 25 city map, but we still can't handle the 50-city landmarks map.(There are refinements to Held-Karp that can handle 50-city maps, and could do it even with 1960s-era computing power.)

We're starting to run out of tricks, but we have one more general strategy to consider.

Ensembles of Other Algorithms:`ensemble_tsp`¶

When we have several optimization algorithms and we're not sure which is best, we can always try them all and take the best result. We will defineensemble_tsp, to combine the algorithms we have previously developed. First, if the set of input cities is small enough, it solves the problem optimally withhk_tsp. If the set is too large, it tries a selection of algorithms, and chooses the best resulting tour. The result is guaranteed to be as good or better than any of the component algorithms; but the run time is guaranteed to be longer than any of the component algorithms.

In [125]:

ensemble=[altered_dq_tsp,altered_greedy_tsp,altered_mst_tsp,repeated_altered_nn_tsp]defensemble_tsp(cities,threshold=16,algorithms=ensemble):"Apply all algorithms to cities and take the shortest resulting tour."iflen(cities)<=threshold:returnhk_tsp(cities)else:returnshortest_tour(tsp(cities)fortspinalgorithms)

Let's go to the benchmarks:

In [126]:

benchmarks(ensemble+[ensemble_tsp])

           altered_dq_tsp |   4953 ± 221 ( 4575 to  5399) |  0.049 secs/map | 30 ⨉ 60-city maps       altered_greedy_tsp |   4766 ± 207 ( 4320 to  5185) |  0.009 secs/map | 30 ⨉ 60-city maps          altered_mst_tsp |   4823 ± 227 ( 4354 to  5250) |  0.009 secs/map | 30 ⨉ 60-city maps  repeated_altered_nn_tsp |   4640 ± 194 ( 4298 to  4991) |  0.148 secs/map | 30 ⨉ 60-city maps             ensemble_tsp |   4630 ± 187 ( 4298 to  4991) |  0.213 secs/map | 30 ⨉ 60-city maps

In [127]:

benchmarks(ensemble+[ensemble_tsp],Maps(30,120))

           altered_dq_tsp |   6771 ± 220 ( 6273 to  7248) |  0.347 secs/map | 30 ⨉ 120-city maps       altered_greedy_tsp |   6539 ± 240 ( 5994 to  7203) |  0.037 secs/map | 30 ⨉ 120-city maps          altered_mst_tsp |   6616 ± 213 ( 6268 to  7010) |  0.050 secs/map | 30 ⨉ 120-city maps  repeated_altered_nn_tsp |   6402 ± 185 ( 6015 to  6779) |  0.701 secs/map | 30 ⨉ 120-city maps             ensemble_tsp |   6390 ± 184 ( 6015 to  6779) |  1.100 secs/map | 30 ⨉ 120-city maps

In [128]:

benchmarks(ensemble+[ensemble_tsp],Maps(10,250))

           altered_dq_tsp |   9750 ± 288 ( 9187 to 10167) |  3.052 secs/map | 10 ⨉ 250-city maps       altered_greedy_tsp |   9229 ± 261 ( 8723 to  9606) |  0.215 secs/map | 10 ⨉ 250-city maps          altered_mst_tsp |   9484 ± 142 ( 9190 to  9668) |  0.221 secs/map | 10 ⨉ 250-city maps  repeated_altered_nn_tsp |   9187 ± 194 ( 8785 to  9390) |  3.524 secs/map | 10 ⨉ 250-city maps             ensemble_tsp |   9153 ± 216 ( 8723 to  9390) |  6.959 secs/map | 10 ⨉ 250-city maps

So theensemble_tsp returns tours that are shortest, but the run time is slowest, as expected. It improves onrepeated_altered_nn_tsp by less than 1%.

Further Explorations¶

That's all I'm going to write for now. But there are still plenty of open questions for you to explore:

Branch and Cut: this is a technique to cut off a search early, when a partial solution is obviously not optimal. We saw how Held-Karp cuts off some permutations of cities when another permutation is better. A refinement on that is to keep track of, say, the best total length of the segment going through all the Bs cities. Then, any time you have a partial segment through some of the Bs cities that exceeds the best total, we can stop right there, before even finishing all the Bs. With this technique, you can find optimal tours for around 50 cities.
Linear programming: Lookup the topic "linear programming" and see how it applies to TSP.
Heuristic Algorithms: There are many approaches for using heurisitic estimates to find good (but not optimal) tours. For example,ant colony optimization algorithms make random choices of which edge to follow, and then the edges that occur in the best tours get reinforced with some virtual pheromones, and other ants tend to follow those pheromones.Simulated annealing takes its inspiration from metallurgy.
TheLin-Kernighan heuristic is one of the best.
TheChristofides algorithm gives a guarantee of 3/2 the optimal tour length (improving on the minimum-spanning-tree guarantee).
altered as a function: we defined a lot of one-line functions that just called another algorithm, and then callsalter_tour on the result. Can you write a function,altered(func), which takes a TSP algorithm as argument, and returns a TSP algorithm that does the original algorithm and then callsalter_tour?
Why doesmst produce an optimal result, whilegreedy_tsp does not, even though the two algorithms have similar structure in the way they iterate overshortest_edges_first?
The code in this notebook was designed for clarity, not efficiency. Can you make the code faster?
William Cook maintains a great page on the TSP.
William Cook also has adraft chapter on Discrete Optimization featuring TSP. His algorithms are in C, and he chooses a different set of algorithms to explore. I find his explanation very beautiful and concise.
If you are heavily into math, there's ataxonomy of solutions.
What else are you interested in?

Movatterモバイル変換

The Traveling Salesperson Problem¶

Understanding What We're Talking About (Vocabulary)¶

All Tours Algorithm:alltours_tsp¶

Representing Tours¶

Representing Cities¶

Representing Points and Computingdistance¶

Random Sets of Cities¶

Plotting Tours¶

All Non-Redundant Tours Algorithm (improvedalltours_tsp)¶

Complexity ofalltours_tsp¶