Tech Tuesday: Data Structures (Lists, Trees, Graphs)

As previously promised, this will be the last Tech Tuesday post on Data Structures for a while and starting next week we will return to other aspects of programming. Much of what we have seen in the previous data structure posts are structures that are built into the respective programming languages, such as structs in C, associative arrays in PHP, JSON in Javascript, objects in PHP and multi-dimensional arrays in C. Today we will take a look at rolling our own data structures that let us set up such things as linked lists, binary trees, graphs and more.

We will stick with our example of points. Let’s imagine we want a structure that lets us put together a list of points that together make up a shape. For each point we want to be able to access the next point in the shape. Here is a bit of Python code that gives us a data structure to accomplish that

And here is how we could use this kind of point in our subsequent code

You can just fire up Python from your terminal on a Mac or see this little bit of code running here.

Why and how does this data structure work? Each point is an object that as one element has a field refers to the next point in the shape. We make the connection to the next point by calling the append method on an existing point with the next point as the argument. The critical piece is the assignment “self.next = point” inside the append method. That does not create a copy of the point but instead makes self.next refer to the same object as point does. You can verify that by trying out the following (assumes you have executed the previous code):

You will see that changing the x coordinate of point b also changes it for the next point referred to by point a because this is one and the same object.

As it turns out Python has a fair bit of support for so-called collections, which are container data structures that we can use instead of having to roll our own. So instead of the above code, we could have used a deque to hold our points. Built-in collections are likely to be faster than anything we come up with but there is not always a built-in collection that meets our needs. The method of storing references to objects shown above could for instance be used to build a tree of points or an arbitrarily interconnected graph of points.

In programming then a good bit of time should be invested in figuring out the right data structure to use. That could be a built-in one or one that we create ourselves to fit our specific problem. But data structures alone don’t make a good program. We also need ways to insert data, retrieve it, manipulate it, etc. For instance if we have a list of points, how do we figure out whether they form a closed shape? Coming up with a series of steps to solve such a problem is called and algorithm and that is what the upcoming posts will be about.