Today on Tech Tuesday we are diving into Data Structures. The idea of a data structure that combines multiple attributes to describe a thing goes back as far as the 1960s and possibly earlier. Both the COBOL and ALGOL programming languages already had this concept, and IBM’s mainframe operating systems such as DOS/360 had built-in storage support for it (in the form of ISAM files). At the time most computing was very pragmatic and motivated by solving business problems – in fact COBOL stands for COmmon Business-Oriented Language. So it was natural to have a programming construct that brings together a customer’s name and address or, say, the dimensions, weight and material of a product.
A term commonly used at the time to describe a collection of attributes that together make up a thing was a “record” (a term that lives on in SQL databases). Another term was “struct”, the keyword used in ALGOL 68 and later in the C programming language. Let’s take a really simple example. Suppose we want to write a program that deals with pixels on the screen or points in a graph. For each point we need to keep track of its x and y coordinates. In C we would do that with a struct as follows:
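A minimal sketch of such a definition, written to match the description that follows (the exact layout of the listing is an assumption):

```c
/* A point groups two integer coordinates into one new type. */
typedef struct {
    int x;  /* horizontal coordinate */
    int y;  /* vertical coordinate */
} point;
```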
where “typedef” is a keyword used to define new data types. Here our new data type is called “point” and has two coordinates – x and y – which are both integer values. We can now create a couple of points (comments in C start with /* and end with */):
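Here is one way that could look. This is only a sketch: the function name make_points and the coordinate values are made up for illustration.

```c
typedef struct {
    int x;
    int y;
} point;

/* Hypothetical helper for this sketch: creates two points and hands one back. */
point make_points(void) {
    point a = {1, 2};   /* a is at x = 1, y = 2 */
    point b;            /* b is declared first ... */

    b = a;              /* ... then copied from a as a whole, both coordinates at once */
    b.x = 5;            /* changing b leaves a untouched */

    return b;           /* a point can also be returned as a single value */
}
```

Calling make_points() from main gives back a point with x = 5 and y = 2.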
What this demonstrates is that we can refer to a point as a whole instead of having to pass around its components separately. That, for instance, allows us to write a function distance that takes two points as its arguments and returns the distance between them, as follows:
This is getting us closer to the level of abstraction that we are aiming for. We can now write code that works with points and hides the handling of individual coordinates inside functions such as the distance function defined above. You can see a running example of this here.
Now if you have been following along with Tech Tuesday, and in particular have read the three previous posts on Data Types, you will know the question that we want to ask. Does the “point” data type exist only in the code or is it stored with the data? As we have seen with the C programming language before, this type information really only exists in the code. What that means is that the program, once it runs, has no way of determining whether a variable is of type point. In fact, even the labels that we used for the coordinates (x and y) only exist in the code. They are not kept with the data. The C struct instead packs the two integers right next to each other in memory. That is very efficient in terms of memory usage, and every access to a coordinate is simply a fixed offset from the beginning of the structure.
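We can make that layout visible with a small sketch. It uses the standard sizeof operator and the offsetof macro from stddef.h; the function names y_by_offset and print_layout are made up for illustration.

```c
#include <stddef.h>  /* offsetof */
#include <stdio.h>

typedef struct {
    int x;
    int y;
} point;

/* Reads y the way compiled code does: start address plus a fixed offset.
   No label "y" is looked up when the program runs. */
int y_by_offset(const point *p) {
    const char *base = (const char *)p;
    return *(const int *)(base + offsetof(point, y));
}

/* Prints how the struct is laid out; the numbers depend on compiler and platform. */
void print_layout(void) {
    printf("sizeof(point) = %zu\n", sizeof(point));
    printf("offset of x   = %zu\n", offsetof(point, x));
    printf("offset of y   = %zu\n", offsetof(point, y));
}
```

On a typical platform with 4-byte integers we would expect a size of 8 and offsets of 0 and 4, although the language itself does not guarantee those exact numbers.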
Next week we will look at alternative ways of defining “things” that are more dynamic and carry more information with them when the program executes. Your takeaway today, though, should be that the ability to group attributes and put them into a type has been around for a very long time.