Tech Tuesday: Data Structures (Maps and Hashes and Dictionaries, oh my!)

Last Tech Tuesday we started digging into Data Structures by examining records and structs. We saw that these allow us to group attributes belonging to something like a person, a product, or a geometric point, but generally don’t keep the metadata around when the program is executing. Metadata means data about data; here it would be the names of the attributes (and potentially their respective data types). At runtime, all we have access to are the attribute values.

There are other data structures that provide us more information at runtime. There is a group of these which we will look at today that goes by different names depending on the programming language. You may encounter them as a map, a dictionary, a hash, an associative array, and probably a few more. Let’s start with a simple piece of PHP code that illustrates the idea by reprising our example from last time:
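A minimal sketch of such a snippet, matching the description that follows. Point a uses the keys and values named in the text; the second point b and its coordinates are assumptions added for illustration.

```php
<?php
// Point a: an associative array with keys "x" and "y".
$a = array("x" => 3, "y" => 2);

// Point b is an assumed second point, added for illustration only.
$b = array("x" => 7, "y" => 5);

// Look up the value stored for a given key.
echo $a["x"], "\n"; // prints 3
echo $a["y"], "\n"; // prints 2
```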

We can see an immediate big difference here from last week’s C code. We are not actually declaring a data type at all. We are simply assigning something directly to the variable $a. That something is an associative array, which is created with array() (strictly speaking a language construct in PHP, although it reads like a function call) and takes as its inputs so-called key-value pairs. The keys here are "x" and "y" and the values are 3 and 2 respectively. We can access the value stored for a given key by using the notation $variable["keygoeshere"].

Now how would we go about defining a distance function? Here is the PHP code:
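One way this might look, as a sketch rather than the original listing. It assumes the usual Euclidean distance and reuses the assumed point b from before; the parameter names $p and $q are arbitrary.

```php
<?php
// Euclidean distance between two 2D points stored as associative arrays.
// Nothing here enforces that $p and $q really are points.
function distance($p, $q) {
    $dx = $p["x"] - $q["x"];
    $dy = $p["y"] - $q["y"];
    return sqrt($dx * $dx + $dy * $dy);
}

$a = array("x" => 3, "y" => 2);
$b = array("x" => 7, "y" => 5); // assumed values for illustration

echo distance($a, $b), "\n"; // prints 5
```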

You can see it in action in this simple example. Since there is no point data type here, the two arguments to the function are simply variable names. That means that static inspection of the program code alone wouldn’t be able to catch a bug such as the call distance("I am not a point", "neither am I"), which of course would not work; the problem would only show up when the program runs.

So far it sounds like we have only lost things (data type checking of the code) and have wound up with slightly uglier notation. Why would we ever want to do this? Here is the first reason. Let’s say we later decide that we want to keep track of the color of points. We can simply write the following:
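A sketch of that assignment, again with point b and its values assumed for illustration:

```php
<?php
$a = array("x" => 3, "y" => 2);
$b = array("x" => 7, "y" => 5); // assumed second point

// Add a new key-value pair to point a only; no type definition needs to change.
$a["color"] = "blue";

echo $a["color"], "\n"; // prints blue
echo count($a), "\n";   // prints 3 (x, y, color)
echo count($b), "\n";   // prints 2 (x, y)
```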

Our point a now has a new attribute called "color" with the value "blue". This extra attribute only exists for point a; point b does not (yet) have it. We were able to do this without changing any of the existing code and everything will execute just fine. In the C example we would have had to change our definition of the struct, and every new point would make room for a color attribute whether or not we use it.

A second advantage of this type of data structure is that we can examine which attributes are present at runtime. For instance, if our code dealt with both 2D and 3D points, where 3D points have a "z" attribute, we can determine if a particular point has the "z" attribute as follows:
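One way to write such a check is with PHP's array_key_exists. The helper name has_z and the sample points are made up for this sketch.

```php
<?php
// Hypothetical helper: does this point carry a "z" attribute?
function has_z($point) {
    return array_key_exists("z", $point);
}

$flat  = array("x" => 3, "y" => 2);           // a 2D point
$solid = array("x" => 3, "y" => 2, "z" => 4); // a 3D point (assumed values)

var_dump(has_z($flat));  // prints bool(false)
var_dump(has_z($solid)); // prints bool(true)
```

isset($point["z"]) is a common alternative; it differs in that it also returns false when the key exists but holds null.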

Of course we might have accidentally stored a string in the "z" attribute, and the code above wouldn’t detect that, but we could add a check to make sure it contains, say, an integer. One place we might want to use the ability to examine points is in our distance calculation. If we have two 3D points we would use a different formula than for two 2D points.
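Both ideas can be combined in a rough sketch. The helper is_3d is hypothetical, and falling back to the 2D formula when only one of the points has a usable "z" is a design choice made for this example, not the only reasonable one.

```php
<?php
// Hypothetical helper: true only if the point has an integer "z" value.
function is_3d($point) {
    return array_key_exists("z", $point) && is_int($point["z"]);
}

// Uses the 3D formula when both points are 3D, otherwise the 2D formula.
function distance($p, $q) {
    $dx = $p["x"] - $q["x"];
    $dy = $p["y"] - $q["y"];
    $sum = $dx * $dx + $dy * $dy;
    if (is_3d($p) && is_3d($q)) {
        $dz = $p["z"] - $q["z"];
        $sum += $dz * $dz;
    }
    return sqrt($sum);
}

$p = array("x" => 1, "y" => 2, "z" => 3);
$q = array("x" => 3, "y" => 4, "z" => 4);

echo distance($p, $q), "\n"; // prints 3
echo distance(array("x" => 3, "y" => 2), array("x" => 7, "y" => 5)), "\n"; // prints 5
```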

Associative arrays, maps, dictionaries, or whatever they may be called in a particular language, have many uses other than keeping the attributes of a thing together, and we may get around to examining those in the future. Next Tuesday we will look at a particular way of expressing associative arrays called JSON, which is short for JavaScript Object Notation. As the name suggests, this originated with JavaScript but is now used much more widely.