Tech Tuesday: Data Types (Part 1)

Since it’s been a couple of weeks, first a quick reminder that we are in the middle of a Tech Tuesday cycle on programming. We left off learning about literals, constants and variables. If you don’t remember, now is probably a good time to briefly look back at that post. As you do, you will notice that the examples involve both numbers, e.g. 42, and text, e.g. “Bill Gates.” In the last post these appear as literals and as values of named constants or variables.

As humans we bring a lot of knowledge with us that makes us recognize numbers and text as being different “types” of data. For instance, we know that we can do calculations with numbers. We immediately recognize an expression such as 35 + 7 and can calculate the result as 42. Similarly, we know that text has certain characteristics, such as consisting of a known number of characters (true even for a language such as Mandarin). For example, “Bill Gates” consists of two separate words and a total of 10 characters if you count the blank space between the two words.

Now if you have been following Tech Tuesday from the beginning you will have learned that inside a computer both numbers and text are represented using the binary system. If not, you should go and quickly read that. You will see how numbers, texts, images, etc. are all represented by sequences of bits (0s and 1s). So here then comes the big question: how does the computer know what’s a number and what is text and what it can do with each?

Consider the following bit of JavaScript code to really see the issue:
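A minimal version of the code in question, reconstructed from the description that follows. The variable names a, b and wtf and both values are taken from the text; the console.log line is an assumption about how the output gets displayed.

```javascript
// A number and a piece of text, stored in two variables.
var a = 5;
var b = "some short words";

// "Add" the two values. The syntax is valid; the open question is what it means.
var wtf = a + b;

// Assumed way of showing the result (works in a browser console or in Node).
console.log(wtf);
```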

As humans reading this code we see there is a potential problem here. We are creating two variables, one containing a number (a) and the other containing text (b), and then are trying to create a third variable by “adding” the values of the first two. But inside the computer each of these is just a sequence of bits. So what should this code do? What will this code do? Note that these are two different questions. The first is normative (“should”) and the second is descriptive (“will”). Also note that this code has correct syntax, and what we are really investigating are the semantics (what does it mean?).

Let’s start with the descriptive one, which we can answer by simply running this code in a web browser or some JavaScript console. If you do this you will find that the output is “5some short words”. What happened? The computer in executing this code somehow decided to turn the number 5 into the character “5” and then combine that with the text “some short words” into “5some short words”.
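One way to watch this conversion happen is to ask JavaScript what type each value has, using the built-in typeof operator. This is only a small sketch; the explicit String(a) call is there to show that converting the number by hand gives the same result as the conversion the computer did on its own.

```javascript
var a = 5;
var b = "some short words";
var wtf = a + b;

// Ask for the type of each value.
console.log(typeof a);   // "number"
console.log(typeof b);   // "string"
console.log(typeof wtf); // "string"

// Converting the number to text ourselves produces the same result.
var explicit = String(a) + b;
console.log(explicit === wtf); // true
```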

Now let’s take a look at the normative question instead. What “should” have happened? That turns out to be at the root of some very fierce arguments between people over the years. Before we get there, let’s try to make a list of possible outcomes:

1. The computer could throw some kind of “I don’t know what to do here” error when executing the program as it gets to the part where we try “var wtf = a + b;”.

2. The computer could present this error after “reading” the program but before “executing” it (that assumes the computer has an opportunity to analyze the code before executing it).

3. The computer could try to “add” the two in some fashion as it did in JavaScript. This requires (a) some kind of conversion and (b) some interpretation of the meaning of “add.” For instance, the result could be 21 instead (how?).

Data types in programming are all about which of these three possibilities is chosen in practice. Different programming languages and in some cases even different ways of using the same programming language will result in different outcomes being realized.
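The first and third outcomes from the list can be sketched in JavaScript itself. Both functions below, strictAdd and addWithLength, are hypothetical helpers made up for this illustration and not part of the language. The second one shows just one possible reading of how “adding” could produce 21 (spoiler: it converts the text to a number by counting its characters). The second outcome is not shown, since it needs a step that analyzes the code before running it.

```javascript
// Outcome 1 (hypothetical helper): refuse to add values of different types
// and raise an error at the moment the addition is attempted.
function strictAdd(x, y) {
  if (typeof x !== typeof y) {
    throw new TypeError(
      "I don't know how to add a " + typeof x + " and a " + typeof y
    );
  }
  return x + y;
}

// Outcome 3 (hypothetical helper): convert text to a number by counting
// its characters, then add the numbers.
function addWithLength(x, y) {
  function toNumber(value) {
    return typeof value === "string" ? value.length : value;
  }
  return toNumber(x) + toNumber(y);
}

var a = 5;
var b = "some short words";

try {
  strictAdd(a, b);
} catch (e) {
  console.log(e.message); // I don't know how to add a number and a string
}

console.log(addWithLength(a, b)); // 21, because the text has 16 characters
```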

I feel this post is getting long, so you will have to wait until part 2 (hopefully next Tuesday) to learn more about how and why this issue gets resolved differently! In the meantime, I hope this provides a very different angle on data types than the usual clinical enumeration of booleans, integers, strings, etc.