Tech Tuesday: Algorithms (Wrap Up)

Today’s Tech Tuesday post will wrap up the algorithms mini-series.  Last Tuesday I presented implementations of the same algorithm in five different languages.  I had mentioned in an earlier post that the particular problem we are solving — seeing if a sequence of numbers contains a number that is repeated at least four times in a row — belongs to a class of problems known as pattern detection or pattern matching.  Now there are much more general algorithms for this class of problems and instead of rolling our own, we may be able to use such a more general algorithm.  Here is an example of doing just that for our problem using something known as regular expressions in Javascript:

// numbers = "1 15 7 3 8 8 19 5 12 8 2 6 8";
numbers = "1 15 8 8 8 8 7 3 19 5 12 2 6";
if (numbers.match(/(\d+)\s\1\s\1\s\1/)) print("Found four times repeated number");
else print("No four times repeated number found");

That’s it!  Whoa, where did all the code go compared to the previous implementations?  Seems like the longest parts here are setting up the numbers and then displaying the result.  All the work is done by just one tiny piece of code:

numbers.match(/(\d+)\s\1\s\1\s\1/)

How is that possible?  Here we are setting up our numbers as a string and then using a so-called regular expression — the /(\d+)\s\1\s\1\s\1/  — to match a pattern in the string of numbers.

I won’t explain here in detail how regular expressions work (another post) but think of them as a language for describing patterns.  Saying \d in this language means, “match a digit”.  Putting a + behind it means at least one but possibly more of what comes before (so here at least one but possibly more digits).  Putting parentheses around the the \d+ means remember that pattern of digits (which is saying, remember this number).  The \s matches any whitespace character and a blank here separates two numbers. Finally the \1 is what’s known as a back-reference.  It says, match what you previously remembered. So the whole expression says: match a number and then see if that number occurs three more times immediately afterwards.

This is very efficient from a programming perspective as we had to write a minimal amount of code.  Someone else implemented the matching algorithm for us and we are simply describing what we want matched.  Since regular expressions are widely used their implementations tend to be both efficient and bug free.  Those are good reasons for using a more general algorithm that someone else has implemented either as part of a language, or as a library, or these days as a web service.

That doesn’t mean we couldn’t potentially do better with our custom algorithm in certain cases.  For instance, the trick above would be inefficient to use in a situation where the numbers are arriving over time as opposed to being known upfront.  We would have to re-run the regular expression over the entire sequence of numbers every time, which would get quite slow if the sequence becomes long.  Also, if the sequence of numbers consists of relatively long integers, the generalized engine of regular expressions (which operate on text) may be slower than the comparison of integers used in our custom approach.

Next week I will provide a brief recap that relates everything we have now learned about variables, data types, reserved words, control structures, functions, data structures and algorithms back to the questions raised in my original post on programming.  We will also come back to the idea that programming is about building up ever more powerful constructs which make it easy to solve problems.  In particular we will see that regular expressions are an example of something known as a domain specific language (here the domain is: patterns and pattern matching).

Bonus: there is actually a bug with this regular expression that will arise in certain cases.  Can you see what they are and suggest how to fix the problem?

Enhanced by Zemanta

Posted: 20th November 2012Comments
Tags:  tech tuesday programming algorithms

Tech Tuesday: Algorithms (Implementations)

Last week in Tech Tuesday, we continued our introduction to algorithms by looking at a first implementation.  As a reminder we are trying to find out whether a sequence of integers contains a number that is repeated four times in a row. Here is that Python code together with a bug fix for dealing with empty number sequences:

# numbers = [1, 15, 7, 3, 8, 8, 19, 5, 12, 8, 2, 6, 8]
numbers = [1, 15, 8, 8, 8, 8, 7, 3, 19, 5, 12, 2, 6]

counter = 1
if len(numbers) > 0:
    prev = numbers.pop()

while len(numbers) > 0 and counter < 4:
    next = numbers.pop()
    if prev == next: 
        counter = counter + 1
    else:
        counter = 1
        prev = next

if counter == 4: 
    print "Found four times repeated number"
else: 
    print "No four times repeated number found"

I will now show implementations in a number of different languages.  These have been written specifically to stay as close as possible to the original English language description of the algorithm.  Or put more starkly: these are not the best possible code in any of these programming languages (including the Python example).  There are ways to solve the problem that are more idiomatic for the respective language.  But for now the main goal is to show that understanding which algorithm solves a problem is more important than any particular language.

From Python I decided to go to Javascript next.  Obviously that required some syntactic changes, such as adding var to define variables, semicolons at the end of lines and parentheses to enclose blocks of code (the first two of these are somewhat optional in Javascript)

# var numbers = [1, 15, 7, 3, 8, 8, 19, 5, 12, 8, 2, 6, 8];
var numbers = [1, 15, 8, 8, 8, 8, 7, 3, 19, 5, 12, 2, 6];

var counter = 1;
var index = 0;
var prev, next;
if (numbers.length > 0) { 
    prev = numbers[index++];
}

while (index < numbers.length && counter < 4) {
    next = numbers[index++];
    if (prev == next) { 
        counter++;
    }
    else {
        counter = 1;
        prev = next;
    }

if (counter == 4) { 
    print("Found four times repeated number");
}
else { 
    print("No four times repeated number found");
}

You can see this code running here.  In addition to the syntactic changes, because Javascript doesn’t have a list data type, I chose to use an array which required introducing an additional index variable to let us move through that array (note: arrays are available in Python as well of course and so we could go back and make the Python code work that way as well).

From Javascript, the next language I decided to use was PHP.  That turns out to be an easy change: just switch the var declarations to a $ preceding the variable name and use array() to initialize the number sequence (run it)

# $numbers = [1, 15, 7, 3, 8, 8, 19, 5, 12, 8, 2, 6, 8];
$numbers = [1, 15, 8, 8, 8, 8, 7, 3, 19, 5, 12, 2, 6];

$counter = 1;
$index = 0;
$prev, next;
if (count($numbers) > 0) { 
    prev = $numbers[$index++];
}

while ($index < count($numbers) && $counter < 4) {
    $next = $numbers[$index++];
    if ($prev == $next) { 
        $counter++;
    }
    else {
        $counter = 1;
        $prev = $next;
    }
}

if ($counter == 4) { 
    print("Found four times repeated number");
}
else { 
    print("No four times repeated number found");
}

Next up I decided to provide an implementation in C.  I went back and based that on the Javascript code because most of what I had to do was replace the var declarations with int (see the discussion of data types).  Here is the C code (run it)

int numbers[13] = {1, 15, 7, 3, 8, 8, 19, 5, 12, 8, 2, 6, 8};
/* int numbers[13] = {1, 15, 8, 8, 8, 8, 7, 3, 19, 5, 12, 2, 6}; */
   
int counter = 1;
int index = 0;
int prev, next;
int length = sizeof(numbers)/sizeof(int);

if (length > 0) { 
    prev = numbers[index++];
}
 
while (index < length && counter < 4) {
    next = numbers[index++];
    if (prev == next) counter++;
    else {
        counter = 1;
        prev = next;
    }
}
 
if (counter == 4) 
    printf("Found four times repeated number");
else 
    printf("No four times repeated number found");

The only other salient change is that in C we have to explicitly state up front how many numbers we will be looking at when we initialize the array.  I then use sizeof to find the length of that array so that if the array were to come from some other initialization process the remaining code would do just fine.

So far the syntax and reserved words of the languages used are amazingly similar and our code looks and feels much the same.  Again that’s because I purposefully avoided idioms from the respective languages.  Now just for the fun of it here is an implementation of this algorithm in Lisp.  This is written in completely non-idiomatic Lisp and without obvious shortcuts to follow the English language outline closely. I want to bring home the point that an understanding of how computation works is far more important than learning any one particular language.  So here is the Lisp code (run it)

; (setq numbers '(1 15 7 3 8 8 19 5 12 8 2 6 8))
(setq numbers '(1 15 8 8 8 8 7 3 19 5 12 2 6))

(setq counter 1)
(if (> (list-length numbers) 0) 
    (progn (setq prev (car numbers)) 
           (setq numbers (cdr numbers))))

(loop
    while (and (> (list-length numbers) 0) (< counter 4))
    do
        (setq next (car numbers))
        (setq numbers (cdr numbers))
        (if (eql prev next) 
            (setq counter (+ counter 1))
            (progn (setq counter 1) (setq prev next))))

(if (eql counter 4)
    (print "Found four times repeated number")
    (print "No four times repeated number found"))

It will take a second to get past the syntax of parenthesis and the somewhat odd keywords (car and cdr? see if you can figure out from the code what they do).  But then the code is actually is quite easy to follow because it mimics what all the other code examples do closely.

In upcoming posts we will learn more about different ways to approach this same problem.  By that I mean both other algorithms and more idiomatic implementations.  The key take away though for now should be that for a company to say that they are looking for a “python programmer” is a lot like saying they are looking for a “Black & Decker furniture maker.”

Enhanced by Zemanta

Posted: 13th November 2012Comments
Tags:  tech tuesday programming algorithms implementation

Tech Tuesday: Algorithms (Introduction)

We have covered a lot of ground in the previous Tech Tuesday posts on programming.  Now it is time to return to the original definition that I provided in the introductory post, where I wrote that programming is “telling a computer what to do.”  We have all the pieces we need:  we have covered the syntax and semantics of programming languages;  we have learned about variables and the data types they can hold;  we have looked at reserved words and the control structures we can form with them; we have seend how to define functions and pass arguments to them; and in the last six posts we have encountered a variety of different data structures. So how do we put all of these together to tell the computer to do something?

What we need is to define a proper sequence of steps that leads to the desired outcome.  In cooking we would call that a recipe. In programming we call it an algorithm, a word that comes from the Latin transliteration of the last name of the Persian mathematician Muḥammad ibn Mūsā al-Khwārizmī.  He wrote one of the first books on Algebra and provided detailed steps for solving quadratic equations.  Or as we would say today: he specified an algorithm for solving quadratic equations. It can be a big sounding word but it really is akin to a recipe.  If you follow a sufficiently detailed recipe (and do it right) the food will come out (roughly) the same every time.  Ditto for an algorithm.

Enough with the abstract, let’s go to a concrete example.  I encountered this problem over the past weekend when I was talking to a young student taking a computer science class.  Imagine you are given a sequence of integers and you are supposed to detect whether it contains an instance of the same integer at least four times in a row.  So consider the following two sequences:

1 15 7 3 8 8 19 5 12 8 2 6 8
1 15 8 8 8 8 7 3 19 5 12 2 6

The first sequence contains no such instance (it has the number 8 repeated twice but that’s it).  The second sequence has an instance of the number 8 repeated four times.  This may seem like a contrived problem but there are actually many real world applications that have roughly this structure.  Imagine for instance a production line that makes different parts and where making the same part repeatedly could be a sign of a problem with a machine that’s stuck.

Speaking of stuck, the student I was talking to was a bit stuck on some specifics of the programming language (Java) which he was supposed to use to solve this problem.  I explained that it’s a better idea to first express the algorithm in plain English and only then think about implementing it in a particular language.  So here is a first cut at it:

Start counting at 1.
As long as the next number in the sequence is the same as the previous one keep counting up. 
If your count gets to four stop. 
If the next number is not the same as the previous one restart counting at 1.

This is probably good enough for a human to follow but lacks a fair bit of detail for translating it into computer code.  This is where we start thinking about all the parts from above and how we might use them.  First, we will need data structure to hold our sequence of numbers.  We could use an array or a list for that.  Second, we need a variable to serve as our counter.  Third, we need to repeat going over the numbers until we are either done with them or the counter gets to four, so we need a loop and a conditional as our control structures.  So here is a second cut at the algorithm

Put the numbers in a data structure
Set counter to 1
Get the first number (which will be our previous number)
While there is a next number
    Get the next number
    If the next number is the same as the previous number increment the counter
    Else
        Set counter to 1
        Use the next number as the new previous number

This way of writing down the algorithm is beginning to look a lot like a computer program.  But it is not written in a specific programming language.  That’s a critical insight about algorithms — they are language independent recipes for getting things done.  Or put differently, an algorithm like the one above can be implemented in any programming language (as long as it is Turing complete, which is true for any general purpose language).

Next Tuesday we will first fix an important omission in the algorithm above (can you state what it is?). Then we will learn about something known as pseudo code and wind up writing our first implementation (I am open to suggestions for which language to use first).  In subsequent posts we will look at a bunch of different languages and learn about the performance of different algorithms.

Enhanced by Zemanta

Posted: 23rd October 2012Comments
Tags:  tech tuesday programming algorithms