Tech Tuesday: Data Types (Part 3)

This is the third and final part of the series on data types in the programming cycle on Tech Tuesdays. Part 1 introduced the basic issue. Part 2 went under the hood and showed how explicit type declarations are one way to solve the problem. Today we will look at another approach. None of this will make much sense if you haven’t read the first two parts.

Last week I showed how an address table can be used to keep track of the current value of a variable. I then also showed how these values might be represented in memory. At the time I did not point out that in the early days of computing memory was incredibly expensive. For instance, the Apple II on which I learned most of my early programming had 48 KB of main memory. That in turn explains why back then it made sense to use as little memory for each value as possible. So if you knew that a variable would only take on values between 0 and 255, you would explicitly declare that variable as a byte (or a char) in your program.

Fast forward to today, when memory is hardly ever a constraint (for instance, I am writing this on a MacBook with 4 GB of memory!). So we can afford to store a lot of metadata (i.e. data about data) along with the actual data. For instance, we can include information about whether something is text or a number. For text we can include how long the text is and what kind of encoding it uses. For numbers we can include their precision or even their constituent parts (e.g. for rational numbers keep the numerator and denominator).
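To make the idea of metadata a bit more concrete, here is a minimal sketch in JavaScript. The field names (kind, length, encoding, numerator, denominator) and the helpers makeText and makeRational are made up for illustration; this is not how any particular language lays out its values internally.

```javascript
// Hypothetical tagged values: each one carries metadata next to the actual data.
// All field and function names here are invented for illustration.

function makeText(value) {
  return {
    kind: "text",          // what sort of data this is
    length: value.length,  // how long the text is (in UTF-16 code units)
    encoding: "UTF-16",    // JavaScript strings are sequences of UTF-16 code units
    value: value,
  };
}

function makeRational(numerator, denominator) {
  return {
    kind: "rational",      // a number kept as its constituent parts
    numerator: numerator,
    denominator: denominator,
  };
}

const words = makeText("some short words");
const third = makeRational(1, 3);

console.log(words.kind, words.length);                              // text 16
console.log(third.kind, third.numerator + "/" + third.denominator); // rational 1/3
```

The extra fields cost a few bytes per value, which is exactly the kind of overhead that was unaffordable on a 48 KB machine and is negligible today.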

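So let’s revisit the canonical problem from Part 1 which gave rise to our whole discussion of data types: a program in which a holds the number 5, b holds the text “some short words”, and the two are added together with wtf = a + b.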

When this program gets to the wtf = a + b step, it can now “inspect” each variable and retrieve information about the data type of the current value for that variable. That information can then be used to decide how to carry out any conversions that may be necessary. As we saw in Part 1, JavaScript takes the number 5, turns it into the one-character text “5” and then simply appends “some short words” to that.
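Here is a rough sketch of that inspection, assuming (as the paragraph above suggests) that a holds the number 5 and b holds the text. The typeof operator is built into JavaScript. The add function is a hypothetical, hand-written imitation of what the + step does and only covers numbers and text.

```javascript
// Assumed setup, matching the description of the Part 1 program.
const a = 5;
const b = "some short words";

// The type travels with the value, so it can be inspected at run time.
console.log(typeof a); // number
console.log(typeof b); // string

// Hypothetical imitation of the + step for numbers and text only:
// if either side is text, convert the other side to text and append.
function add(x, y) {
  if (typeof x === "string" || typeof y === "string") {
    return String(x) + String(y);
  }
  return x + y;
}

const wtf = a + b;
console.log(wtf);               // 5some short words
console.log(add(a, b) === wtf); // true
```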

We have now encountered two fundamentally different ways of resolving the problem. One is to include data type information in the program itself and use it at “compile time” (when translating into machine code). The other is to keep the data types with the data and have them available at “run time” (when the program executes). As it turns out, those two methods are not mutually exclusive. In fact we can think of a 2 x 2 matrix of possibilities (flattened out into a table) that distinguishes different programming languages from each other:

Types in Code | Types with Data | Programming Languages
Yes           | Yes             | Java, C#
No            | Yes             | Perl, Python, Ruby, PHP, JavaScript
Yes           | No              | C, Pascal
No            | No              | – not possible –

Only the fourth combination is not possible at all because there is no mechanism to resolve the problem we identified. Now some people would say that Assembly language should go there, but that’s really not quite the right comparison because Assembly has no concept of the kind of higher level operations that would result in data type issues of the kind we have discussed throughout these posts.

Now you may have heard before about “strongly typed” versus “weakly typed” languages. Strongly typed languages enforce a lot of rules about data types whereas weakly typed ones do not. We can now begin to understand that the reality is more complicated because some languages have data type information only at compile time, some only at run time and some at both. The degree of “type” enforcement and/or automatic conversion that is possible depends on what is available. For instance, C can only check at compile time, so when a data type problem cannot be detected then (and there are many such problems), a C program may simply not work as expected.

Conversely, just because a language has run time information about data types available does *not* mean that it has to do automatic conversion. It could simply decide not to carry out the operation. PHP does a lot of automatic type conversion, whereas Python does very little. We will revisit the topic of implicit conversion once more after we have learned about objects and object-oriented programming, but to give you a preview you may want to read briefly about the wonderfully named “duck typing.”
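To illustrate the “decide not to carry out the operation” option, here is a small sketch in JavaScript. The strictAdd function is hypothetical and not part of the language. It uses the same run time type information as before, but refuses mixed types instead of converting.

```javascript
// Hypothetical helper: uses run time type information to refuse mixed-type
// addition instead of converting automatically.
function strictAdd(x, y) {
  if (typeof x !== typeof y) {
    throw new TypeError("cannot add " + typeof x + " and " + typeof y);
  }
  return x + y;
}

console.log(strictAdd(2, 3));             // 5
console.log(strictAdd("some ", "words")); // some words

try {
  strictAdd(5, "some short words");
} catch (err) {
  console.log(err.message); // cannot add number and string
}
```

The information available is identical in both cases; what differs is the policy the language (or here, the helper) applies to it.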

That wound up being a lot more about data types than I originally thought I would write. But I felt that most introductions to data types that I have seen out there simply enumerate what’s available and state the rules for a particular language, without providing any background as to why data types even exist and how to think about them at a higher level. Next week I think I will tackle reserved words and control structures.