Tech Tuesday: Concurrency (Intro)

This will be almost the last Tech Tuesday post in the initial series on programming. The basic premise of this series was that programming is a lot like giving a person instructions for how to do something. As I had mentioned in the very first post of the series and then again in the recap, one challenge is that we may not be the only ones providing instructions. And from everyday life we know how hard it can be to be on the receiving end of this.

Imagine for a moment working in a matrix organization where you report to two different managers (e.g., the regional head for North America but also the global head for your product). There are at least two canonical ways in which instructions from these two managers could cause problems for you. First, they could directly contradict each other, as in the global product lead wanting you to charge a high price for profitability and the regional manager wanting you to charge a low price for rapid growth. Second, the amount of work that the two pile onto you could add up to more than you have time or other resources for.

In all the discussion of programming so far we have (conveniently) assumed that our program was the only one executing on the machine. In reality that is hardly ever the case. Instead we have multiple programs running concurrently. Now I will take a somewhat loose view of what exactly that term means. There are some subtle differences between two programs truly running at the same time (as in a machine with multiple CPUs or a multi-core CPU, or a set of machines that are coupled to each other) and programs executing one at a time but in an interleaved fashion, but these won’t matter for understanding the basic issues.

With concurrency the machine faces exactly the same problems as the poor employee in the example above. If one program wants to set a value to high and the other to low, which value should be chosen? And if two programs together need more than the available memory, compute or input/output resources, which one should get them? This can lead to interesting problems even if all the code that is running is our own code. For instance, in writing code for a web site we often write it as if it were handling one visitor at a time. That of course works perfectly well when you have relatively few visitors to the site, as modern machines are amazingly fast.

But what happens if your site is on the home page of Reddit? Suddenly lots and lots of visitors show up at roughly the same time. One way your web server may deal with that is by firing off multiple instances of your code. These might all request a database connection, require memory to run in, and so on. And pretty quickly more resources are being requested than the machine can make available. So this is an example of the second problem.

Now imagine that your site allows users to upvote or downvote a news story. When visitors arrive only occasionally, one at a time, there are no problems. But as your traffic surges you may have many users nearly simultaneously clicking the upvote and downvote buttons. A naive implementation would retrieve the current votecount, increase or decrease it by one depending on which button was clicked, and then store the new votecount.
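
As a minimal sketch in Python of what such a naive handler could look like: a plain dictionary stands in for the database, and the names `store`, `get_votecount`, `set_votecount` and `handle_vote` are hypothetical, chosen only for illustration.

```python
# Hypothetical in-memory stand-in for the database of vote counts.
store = {"story-1": 10}


def get_votecount(story_id):
    """Retrieve the current vote count for a story."""
    return store[story_id]


def set_votecount(story_id, value):
    """Store a new vote count for a story."""
    store[story_id] = value


def handle_vote(story_id, direction):
    """Naive handler: retrieve the count, change it, store it again."""
    votecount = get_votecount(story_id)   # step 1: retrieve
    if direction == "up":
        votecount += 1                    # step 2: modify
    elif direction == "down":
        votecount -= 1
    set_votecount(story_id, votecount)    # step 3: store
    return votecount
```

The point to notice is that retrieving, modifying and storing are three separate steps, so there is room for something else to happen in between them.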

If this is the only program running and only one copy of it is executing at a time, there is no issue and this will work just fine. But as soon as multiple copies of this are running concurrently we have an example of potentially conflicting instructions. Let’s say the votecount is currently at 10. If two copies are executing at the same time, then the following could happen. Let’s assume that copy 1 is slightly ahead. It retrieves a votecount of 10. Then copy 2 also retrieves a votecount of 10. Copy 1 has received an upvote and increases its votecount to 11. Copy 2 has received a downvote and decreases its votecount to 9. Copy 1 stores its votecount and immediately afterward so does copy 2. The new votecount is now 9. But it should be 10. The upvote was lost in the process. This is an example of conflicting instructions being improperly resolved.
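
The sequence of events just described can be replayed by hand. This is only a sketch: it does not run two real copies of a program, it simply performs the steps in the unlucky order. The dictionary `store` and the variable names are assumptions made for illustration.

```python
# Shared vote count; a dictionary stands in for the database (an assumption).
store = {"votecount": 10}

# Both copies retrieve the count before either one stores its result.
copy1_count = store["votecount"]   # copy 1 retrieves 10
copy2_count = store["votecount"]   # copy 2 also retrieves 10

copy1_count += 1                   # copy 1 applies its upvote: 11
copy2_count -= 1                   # copy 2 applies its downvote: 9

store["votecount"] = copy1_count   # copy 1 stores 11
store["votecount"] = copy2_count   # copy 2 overwrites it with 9

final_count = store["votecount"]
print(final_count)                 # 9, although one upvote plus one downvote should leave 10
```

On a real machine the order of these steps is not something we choose. It depends on timing and can change from one run to the next, which is why the order is fixed by hand here.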

Given the discussion of debugging and testing it should come as no surprise that concurrency tends to be the source of the most vexing bugs. When the code runs by itself everything is just fine. But certain constellations of simultaneous execution cause the bugs to surface! Next Tuesday I will talk a bit about how these problems arising from concurrency can be dealt with.