Last Tech Tuesday we got started on data types as part of the ongoing cycle on programming. This post will not make sense without Part 1, so if you haven’t read it yet, you should go back and do that now. There we encountered the problem of “adding” two variables where one holds a number and the other holds some text. Today we will dig a bit deeper into this problem.
Let’s start by looking at the question of what it even means for a variable to hold a value. As I explained in the post on literals, constants and variables, a variable is a name that we can use to refer to its value, and the value can change as our program executes. At the heart of any way of doing this will eventually be an address in the computer’s memory. It’s at that address that we find a binary representation of the data in question. For instance, for text we might find a sequence of bytes where each byte represents a character encoded using ASCII, followed at the end by a byte containing 0 (null) to mark the end of the text (that, in any case, is how it worked until we needed multi-byte characters for internationalization). For a whole number (an integer) we might simply find the binary version of that number.
Separately, we might have a kind of internal “phone book” or address table where for each variable name we keep track of the address in memory at which the value for that variable lives. So coming back to our example from Part 1 where we had:
var a = 5;
var b = "some short words";
The “phone book” here would have two entries, one for the name “a” and one for the name “b”, roughly as follows:
a : 1000
b : 1002
Here 1000 and 1002 are addresses in the computer’s memory. When we look at the actual memory locations, we might see something like the following:
1000 : 00 <- first byte of number 5
1001 : 05 <- second byte of number 5
1002 : 115 <- ASCII for "s"
1003 : 111 <- ASCII for "o"
1004 : 109 <- ASCII for "m"
1005 : 101 <- ASCII for "e"
1006 : 32 <- ASCII for " " (a blank space)
1007 ... 1015 : more ASCII codes
1016 : 100 <- ASCII for "d"
1017 : 115 <- ASCII for "s"
1018 : 0 <- NULL
So in this setup neither the address table nor the actual memory locations contain any information about what kind of data is held there. How could we then possibly have a computer program that knows what’s there and how to properly deal with it?
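To make the problem concrete, here is a minimal sketch in C of the setup above. Everything in it is made up for illustration: the names `entry`, `table`, `memory`, `BASE`, `lookup` and `read_number` are hypothetical, and the two-byte number with the high byte first is simply copied from the listing. Real language runtimes organize this quite differently.

```c
#include <stddef.h>
#include <string.h>

/* One "phone book" entry: a variable name and the address of its value. */
struct entry {
    const char *name;
    size_t address;
};

/* Toy memory: index 0 stands for address 1000 in the listing. */
#define BASE 1000
static const unsigned char memory[] = {
    0, 5,                         /* 1000-1001: the number 5 in two bytes */
    115, 111, 109, 101, 32,       /* 1002-1006: "some "  */
    115, 104, 111, 114, 116, 32,  /* 1007-1012: "short " */
    119, 111, 114, 100, 115,      /* 1013-1017: "words"  */
    0                             /* 1018: the null byte */
};

static const struct entry table[] = { {"a", 1000}, {"b", 1002} };

/* Look a name up in the table. Returns a pointer to the first byte of
   its value, or NULL if the name is unknown. Note that nothing here
   says what kind of data lives at that address. */
const unsigned char *lookup(const char *name) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return &memory[table[i].address - BASE];
        }
    }
    return NULL;
}

/* Interpret two bytes as a number, high byte first. */
unsigned read_number(const unsigned char *p) {
    return (unsigned)p[0] * 256u + p[1];
}
```

Calling `read_number(lookup("a"))` gives 5, as intended. But `read_number(lookup("b"))` works just as well and gives 29551, because it treats the bytes for “s” and “o” as a number. Neither the table nor the memory objects, which is exactly the problem.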
One answer is to state the type of each variable in the code itself. In the C programming language, for instance, we would write:
int a = 5;
char b[] = "some short words";
The “int” in front of the “a” says that the name “a” will refer to an integer value. And the “char” in front of “b” says that “b” will refer to text made up of characters, with the “[]” following the “b” saying there is more than one single character in “b”. When the computer takes our C program and turns it into machine code executable by the CPU, it can now make sure to use instructions that work for integers when dealing with variable “a” and instructions that work for sequences of characters when dealing with “b”.
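As a rough illustration of what the compiler gains from these declarations, here is a small sketch in C. The function names are hypothetical, and it only scratches the surface of what the compiler does with type information.

```c
#include <stddef.h>

int a = 5;
char b[] = "some short words";

/* sizeof is answered by the compiler from the declared types alone. */
size_t size_of_a(void) { return sizeof a; }  /* size of an int on this machine */
size_t size_of_b(void) { return sizeof b; }  /* 16 characters plus the null byte */

/* Because "a" is an int, "+ 1" is compiled to integer arithmetic. */
int next_number(void) { return a + 1; }

/* Because "b" is an array of char, indexing picks out a single character. */
char first_char(void) { return b[0]; }
```

None of this information lives in memory next to the values. It exists only in the source code, and the compiler uses it while choosing instructions.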
Now let’s come back to the problem from Part 1, where we tried to “add” the two variables:
var wtf = a + b;
In C, we would have to decide up front in our code whether we want “wtf” to be a number (int) or text (char). Let’s say we wrote
int a = 5;
char b[] = "some short words";
char wtf[] = a + b;
Leaving aside for the moment that this is *not* correct C, we can now easily see how the computer has all the relevant information in the code of our program to do one of two things:
1. It could refuse to turn this program into machine code and instead complain that it does not know how to combine an integer with a sequence of characters.
2. It could infer from our declaration that we want “wtf” to be a sequence of characters, and that it should therefore turn anything that follows into a sequence of characters.
As it turns out, a so-called C compiler (the program that turns a program written in C into machine code) would choose option 1. In fact, in order to get this to work in C we need to explicitly call a function that both converts the number and combines it with the text:
int a = 5;
char b[] = "some short words";
char wtf[32]; /* 32 is an arbitrary length, big enough for the result */
sprintf(wtf, "%d%s", a, b);
The sprintf stands for something like “string print formatted”. We give it the variable where we want the combined text to go (“wtf”), then a format (the “%d%s”, which means a number followed by some text), and finally the variables holding the number and the text. Also note that we had to pick an explicit length for “wtf” to make this work.
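That explicit length matters: sprintf trusts us that “wtf” is long enough and will happily write past the end if it is not. A common, more cautious variation is the standard function snprintf, which also takes the size of the destination. The sketch below wraps it in a helper; the name `combine` and its return convention are my own assumptions for illustration, not anything from the C library.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes the number followed by the text into "out",
   never using more than "size" bytes (including the null byte).
   Returns 1 if everything fit, 0 if the result was cut off or an error
   occurred. */
int combine(char *out, size_t size, int a, const char *b) {
    /* snprintf returns the length the full result would have had. */
    int needed = snprintf(out, size, "%d%s", a, b);
    return needed >= 0 && (size_t)needed < size;
}
```

With a 32-byte destination, `combine(wtf, sizeof wtf, 5, "some short words")` leaves “5some short words” in “wtf” and returns 1. With a destination of only 4 bytes it stores “5so” and returns 0 instead of overwriting neighboring memory.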