Last Tech Tuesday was now two weeks ago due to Hurricane Sandy. As a quick refresher, we looked at where the word algorithm comes from, saw that an algorithm is similar to a recipe and gave an example of a simple algorithm for finding a repeated number in a sequence of numbers. Today we will take a deeper dive on this particular algorithm and see at least one specific implementation. All of this is still part of a by now fairly long running series of posts on programming.
So revisiting the algorithm from last time it was supposed to detect whether a sequence of integers contains the same number at least four times in a row. I presented a candidate algorithm and then pointed out that it still had a mistake. Several people sent me email correctly identifying that the algorithm didn’t actually stop once it found four numbers in a row. Here then is the corrected version:
Put the numbers in a data structure Set counter to 1 Get the first number (which will be our previous number) While there is a next number *and* the counter is less than 4 Get the next number If the next number is the same as the previous number increment the counter Else Set counter to 1 Use the next number as the new previous number If the counter is 4 Display that a four times repeated number was found Else Display that no four times repeated number was found
Before we go on and turn this into code in a bunch of different programming languages there is a lot that can be learned just from looking at this algorithm so far.
First, we are solving a very specific problem with this algorithm. We are simply determining whether or not a number is repeated four times. We are not even displaying which number it is (how would you do that?) or where in the sequence it was found. Nor would we discover if there was a second sequence (because we stop after the first one). And for the case of finding multiple repeated sequences what about overlapping ones? E.g. does 3111115 contain one sequence of four 1s followed by a single 1 or two overlapping sequences of four 1s as in “31111” and “11115”? So it is always extremely important to understand the details of the problem because they may heavily influence the algorithm that works.
Second, we are doing no testing of inputs or examination of edge cases here. If instead of numbers the input that was supplied was a sequence of words our algorithm would not detect that. Sometimes it is OK to assume that the inputs will take the form stated in the problem. In other domains input validation must be part of the algorithm for it to succeed. As far as edge cases go, what about a sequence of length zero (ie containing no numbers at all)? How would our algorithm perform there? As we will see for our currently specified version there may be a problem (can you spot it?).
Third, even though we stated at the beginning to put the numbers into a data structure, we never specified which one. Instead we simply assumed that whatever data structure it was, it has the capability of determining whether there is a next number and if so giving us that next number. In fact, this algorithm would even work with an “online” stream of numbers. When we say “online” in an algorithm we generally mean as opposed to “batch,” meaning where we have all the numbers at the start. If we were being fed the numbers one by one from the keyboard or some data source (such as a sensor at a production line) the algorithm above would work. This is an example of the power of abstraction. We don’t need to know the specifics of the dat structure just that it supports specific operations.
Fourth, we can see from this algorithm without needing to implement it in any particular language that it is quite efficient. As we will discuss at some point in the future what we are particularly interested is the time and space efficiency of algorithms. By time we mean how long it takes to rund and by space we mean how much storage it consumes. In the example above we are going over the sequence of numbers only once (and less than that if a repetition is found). We say that the algorithm has linear running time, meaning that the increase in time is linear to the increase in the size of the problem. It also takes up very little space because in addition to the sequence itself it only has two additional variables.
Fifth, how did we come up with this algorithm? It seemed rather ad hoc — we simply tried something. In general that’s not a good approach to finding algorithms. Usually we try to determine what class of problems our specific problem belongs to and then select from one of several known algorithms instead of coming up with our own. As we will see in a subsequent post, this problem can be seen to belong to a class known as “pattern matching.” We are looking for a pattern of four repeated numbers. There are a variety of different pattern matching algorithms and we can draw on them to solve the problem. Using a known algorithm means that maybe someone else has already implemented it (or possibly it is even part of the programming language) and is available as open source or as a library.
All of these five points are important to take into consideration whenever you think about algorithms and I am planning to explore them in more detail at some future point. In the meantime though, as long promised, here finally is the first implementation of the algorithm in Python:
# numbers = [1, 15, 7, 3, 8, 8, 19, 5, 12, 8, 2, 6, 8] numbers = [1, 15, 8, 8, 8, 8, 7, 3, 19, 5, 12, 2, 6] counter = 1 prev = numbers.pop() while len(numbers) > 0 and counter < 4: next = numbers.pop() if prev == next: counter = counter + 1 else: counter = 1 prev = next if counter == 4: print "Found four times repeated number" else: print "No four times repeated number found"
One of the reasons I picked Python first (and also one of the reasons why people like Python) is that the code looks very clean and in fact amazingly close to our “pseudo code” version of the algorithm. The only thing that takes maybe a bit of explanation is that in the while loop we test for len(numbers) which gives us the length of the list of numbers. If that’s greater than 0 than it means there are still numbers for us to look at.
Another nice thing about this Python implementation is that because of the way Python handles data types, this code actually works not just with numbers but also with say words (as you can see here). Some will consider this a feature and others a bug. Speaking of bugs, I should point out that there are two things wrong with this implementation. One is that it doesn’t handle the edge case of an empty list of numbers (how would you fix that?). The other is not technically a bug but it does something that’s quite different from how you may have envisioned this algorithm working (here is a hint). Consider both of these as tests. Next time we will look at implementations of this same algorithm in other languages.