Last Tech Tuesday was now two weeks ago due to Hurricane Sandy. As a quick refresher, we looked at where the word algorithm comes from, saw that an algorithm is similar to a recipe and gave an example of a simple algorithm for finding a repeated number in a sequence of numbers. Today we will take a deeper dive on this particular algorithm and see at least one specific implementation. All of this is still part of a by now fairly long running series of posts on programming.
So revisiting the algorithm from last time it was supposed to detect whether a sequence of integers contains the same number at least four times in a row. I presented a candidate algorithm and then pointed out that it still had a mistake. Several people sent me email correctly identifying that the algorithm didn’t actually stop once it found four numbers in a row. Here then is the corrected version:
Before we go on and turn this into code in a bunch of different programming languages there is a lot that can be learned just from looking at this algorithm so far.
First, we are solving a very specific problem with this algorithm. We are simply determining whether or not a number is repeated four times. We are not even displaying which number it is (how would you do that?) or where in the sequence it was found. Nor would we discover if there was a second sequence (because we stop after the first one). And for the case of finding multiple repeated sequences what about overlapping ones? E.g. does 3111115 contain one sequence of four 1s followed by a single 1 or two overlapping sequences of four 1s as in “31111” and “11115”? So it is always extremely important to understand the details of the problem because they may heavily influence the algorithm that works.
Second, we are doing no testing of inputs or examination of edge cases here. If instead of numbers the input that was supplied was a sequence of words our algorithm would not detect that. Sometimes it is OK to assume that the inputs will take the form stated in the problem. In other domains input validation must be part of the algorithm for it to succeed. As far as edge cases go, what about a sequence of length zero (ie containing no numbers at all)? How would our algorithm perform there? As we will see for our currently specified version there may be a problem (can you spot it?).
Third, even though we stated at the beginning to put the numbers into a data structure, we never specified which one. Instead we simply assumed that whatever data structure it was, it has the capability of determining whether there is a next number and if so giving us that next number. In fact, this algorithm would even work with an “online” stream of numbers. When we say “online” in an algorithm we generally mean as opposed to “batch,” meaning where we have all the numbers at the start. If we were being fed the numbers one by one from the keyboard or some data source (such as a sensor at a production line) the algorithm above would work. This is an example of the power of abstraction. We don’t need to know the specifics of the dat structure just that it supports specific operations.
Fourth, we can see from this algorithm without needing to implement it in any particular language that it is quite efficient. As we will discuss at some point in the future what we are particularly interested is the time and space efficiency of algorithms. By time we mean how long it takes to rund and by space we mean how much storage it consumes. In the example above we are going over the sequence of numbers only once (and less than that if a repetition is found). We say that the algorithm has linear running time, meaning that the increase in time is linear to the increase in the size of the problem. It also takes up very little space because in addition to the sequence itself it only has two additional variables.
Fifth, how did we come up with this algorithm? It seemed rather ad hoc – we simply tried something. In general that’s not a good approach to finding algorithms. Usually we try to determine what class of problems our specific problem belongs to and then select from one of several known algorithms instead of coming up with our own. As we will see in a subsequent post, this problem can be seen to belong to a class known as “pattern matching.” We are looking for a pattern of four repeated numbers. There are a variety of different pattern matching algorithms and we can draw on them to solve the problem. Using a known algorithm means that maybe someone else has already implemented it (or possibly it is even part of the programming language) and is available as open source or as a library.
All of these five points are important to take into consideration whenever you think about algorithms and I am planning to explore them in more detail at some future point. In the meantime though, as long promised, here finally is the first implementation of the algorithm in Python: