In the last two Tech Tuesdays, I introduced the topic of computational complexity. Today, as promised, I want to dig a bit deeper into the question of best, average and worst case complexity. The canonical example used for this is sorting. For simplicity we will assume that we are sorting a list of numbers. Several different sorting algorithms have been developed over the years that go by fun names such as Insertion Sort, Quicksort and even Bubble Sort (here is a past Tech Tuesday three-part introduction to algorithms).
On the wonderful Internet someone has created an amazing web site that illustrates the different sorting algorithms at work. You should head over there now and check it out. You will notice that the site distinguishes between random data, nearly sorted data, reversed data and data with few uniques (meaning there are not that many distinct values). So what’s the best algorithm?
There is no single dominant sorting algorithm and that’s because different algorithms have different best, average and worst case characteristics. For instance, a very widely used sorting algorithm is Quicksort, which takes a “divide and conquer” approach. The basic idea is to pick a pivot element and then create two sublists – one with all the elements greater than the pivot and one with all the elements smaller. The pivot sits (by choice) right between those two sublists. Each of these lists is smaller than the original list and we can just apply Quicksort to them until we encounter lists of size 1, which are always sorted.
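For the curious, here is a minimal Python sketch of that idea (an illustration, not a production implementation – it naively picks the first element as the pivot and builds new lists rather than sorting in place):

```python
def quicksort(items):
    """Sort a list of numbers with a simple (non-in-place) Quicksort."""
    if len(items) <= 1:          # lists of size 0 or 1 are already sorted
        return items
    pivot = items[0]             # naive pivot choice: the first element
    rest = items[1:]
    smaller = [x for x in rest if x <= pivot]   # everything <= pivot
    greater = [x for x in rest if x > pivot]    # everything > pivot
    # sort each sublist and put the pivot back in between
    return quicksort(smaller) + [pivot] + quicksort(greater)

print(quicksort([5, 2, 9, 1, 7]))   # [1, 2, 5, 7, 9]
```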
Best case and average case performance for Quicksort both behave as n log n, meaning that if you have n items to sort then the growth rate of the time required is roughly n log n. The worst case performance of Quicksort though is n^2. Now you might be tempted to say n^2 doesn’t seem that much worse than n log n, but you would be mistaken. Here is a table that illustrates the difference between the two:
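| n | n log n (base-10 log) | n^2 |
|---|---|---|
| 10 | 10 | 100 |
| 100 | 200 | 10,000 |
| 1,000 | 3,000 | 1,000,000 |
| 10,000 | 40,000 | 100,000,000 |
| 100,000 | 500,000 | 10,000,000,000 |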
As you can see it starts out innocently enough, but when you get to 100,000 items (which is not a big number on, say, the Internet), the worst case is already 20,000 times worse than the average case.
Now here is something particularly intriguing. The worst case for some naive implementations of Quicksort turned out to be an already sorted list! That happens when the pivot is always the first (or last) element: on sorted input every split is maximally lopsided, with one empty sublist and one containing everything else. Various improvements have been made to Quicksort over the years and there are sorting algorithms, such as Heapsort, that have n log n as their worst case.
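Here is a small sketch (my own illustration, reusing the naive first-element pivot from the example above) that counts comparisons on random versus already sorted input:

```python
import random

def quicksort_count(items):
    """Naive Quicksort (first element as pivot); returns (sorted list, comparison count)."""
    if len(items) <= 1:
        return items, 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    left, c_left = quicksort_count(smaller)
    right, c_right = quicksort_count(greater)
    # standard accounting: one comparison per non-pivot element in this partition step
    return left + [pivot] + right, len(rest) + c_left + c_right

n = 500  # kept small so the lopsided recursion stays within Python's default limit
random_data = [random.random() for _ in range(n)]
sorted_data = sorted(random_data)

print(quicksort_count(random_data)[1])  # on the order of n log n – a few thousand comparisons
print(quicksort_count(sorted_data)[1])  # n*(n-1)/2 = 124,750 comparisons
```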
I hope this example already makes it clear why understanding the differences between average and worst case performance can matter a lot. There are examples where the worst case is even worse than above. For instance, the simplex algorithm used for solving linear programs has polynomial time average complexity. But people have been able to construct worst cases that have exponential complexity (and in case you missed it, that is bad, bad news). Thanks to Oliver Friedmann, co-founder of Ziggeo, for pointing out the simplex example to me (Oliver has authored many papers in this field with important results).
If you are still not convinced that this is an important matter, you should think about designing systems for the Internet where you do not control the inputs and where you may be facing adversarial opponents. If your worst case complexity grows too fast, you may wind up expending way more resources than you had anticipated. Conversely, in cryptography, if you are basing things on worst case complexity but the average case turns out to be much simpler, you might have a system that is easier to break than you imagined.