Subscribe to Continuations to receive new posts directly to your inbox.
Subscribe to Continuations to receive new posts directly to your inbox.
Today’s Tech Tuesday post will wrap up the algorithms mini-series. Last Tuesday I presented implementations of the same algorithm in five different languages. I had mentioned in an earlier post that the particular problem we are solving – seeing if a sequence of numbers contains a number that is repeated at least four times in a row – belongs to a class of problems known as pattern detection or pattern matching. Now there are much more general algorithms for this class of problems and instead of rolling our own, we may be able to use such a more general algorithm. Here is an example of doing just that for our problem using something known as regular expressions in Javascript:
That’s it! Whoa, where did all the code go compared to the previous implementations? Seems like the longest parts here are setting up the numbers and then displaying the result. All the work is done by just one tiny piece of code:
How is that possible? Here we are setting up our numbers as a string and then using a so-called regular expression – the /(\d+)\s\1\s\1\s\1/ – to match a pattern in the string of numbers.
I won’t explain here in detail how regular expressions work (another post) but think of them as a language for describing patterns. Saying \d in this language means, “match a digit”. Putting a + behind it means at least one but possibly more of what comes before (so here at least one but possibly more digits). Putting parentheses around the the \d+ means remember that pattern of digits (which is saying, remember this number). The \s matches any whitespace character and a blank here separates two numbers. Finally the \1 is what’s known as a back-reference. It says, match what you previously remembered. So the whole expression says: match a number and then see if that number occurs three more times immediately afterwards.
This is very efficient from a programming perspective as we had to write a minimal amount of code. Someone else implemented the matching algorithm for us and we are simply describing what we want matched. Since regular expressions are widely used their implementations tend to be both efficient and bug free. Those are good reasons for using a more general algorithm that someone else has implemented either as part of a language, or as a library, or these days as a web service.
That doesn’t mean we couldn’t potentially do better with our custom algorithm in certain cases. For instance, the trick above would be inefficient to use in a situation where the numbers are arriving over time as opposed to being known upfront. We would have to re-run the regular expression over the entire sequence of numbers every time, which would get quite slow if the sequence becomes long. Also, if the sequence of numbers consists of relatively long integers, the generalized engine of regular expressions (which operate on text) may be slower than the comparison of integers used in our custom approach.
Next week I will provide a brief recap that relates everything we have now learned about variables, data types, reserved words, control structures, functions, data structures and algorithms back to the questions raised in my original post on programming. We will also come back to the idea that programming is about building up ever more powerful constructs which make it easy to solve problems. In particular we will see that regular expressions are an example of something known as a domain specific language (here the domain is: patterns and pattern matching).
Bonus: there is actually a bug with this regular expression that will arise in certain cases. Can you see what they are and suggest how to fix the problem?