Last Tech Tuesday, I introduced programming languages by making a series of analogies to human languages. We will continue the programming cycle today by deepening that analogy with regard to syntax. Syntax is the set of rules that describe how you can put together the words of a language to form a sentence. For instance, part of the syntax of English is that we separate words with a space insteadofputtingthemrightnextoeachother. Another part of the syntax of English is that we end sentences with a period, exclamation mark or question mark instead of say a comma, We then start the next sentence with a capital letter. Some of these syntax rules are fairly obvious others are a bit more difficult such as word order. In English we generally use subject, verb, object in that order instead of putting us the verb first.
Why do languages have syntax? As the examples above illustrate, you were able to read even those parts where I purposefully violated the very syntax rule I was describing. But those parts were more difficult to read – you had to make more effort to find the meaning. In fact, consider the following example: “Eats shoots and leaves” and compare it to “Eats, shoots and leaves.” The former is likely about a panda, the latter about a cowboy. Now with a bit of context you can figure that out independent of the placement of the comma, but the comma makes finding the meaning easier and more precise.
Conversely, correct syntax by itself doesn’t in any way guarantee meaning. That too can easily be seen in human languages. "The lazy car swam slowly over an obtuse rainbow" is a sentence with correct syntax but without any obvious meaning. That’s not to say that this sentence could not have meaning in the context of a larger story that redefines how we commonly use some of the words, but just by itself it has no meaning. We see this when we contrast it with another sentence with the same syntactical structure, as in “The old car drove quickly around a sharp corner.”
The same is true for programming languages. Syntax helps the computer establish the likely intended meaning but it doesn’t guarantee meaning by itself. Computers and programming languages, though, tend to be much less forgiving about syntax than humans. For instance, in most programming languages “5 + 3” is a calculation that results in the value 8. If you write “5 3 +” instead, in most programming languages that will result in a syntax error: the program will either stop running or not even start, and the computer will complain about the offending code.
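To make this concrete, here is a minimal sketch in Python. The helper name `check_syntax` is made up for this illustration; it simply asks Python's own parser whether a piece of text is a well-formed expression. The last example is the programming equivalent of the lazy car and the obtuse rainbow: the syntax is fine, but the computer cannot find a meaning for it.

```python
import ast


def check_syntax(source: str) -> str:
    """Return 'ok' if source parses as a Python expression, else 'syntax error'."""
    try:
        ast.parse(source, mode="eval")
        return "ok"
    except SyntaxError:
        return "syntax error"


print(check_syntax("5 + 3"))  # ok
print(eval("5 + 3"))          # 8
print(check_syntax("5 3 +"))  # syntax error

# Correct syntax does not guarantee meaning: this expression parses fine,
# but subtracting a number from a piece of text fails when it is evaluated.
print(check_syntax('"car" - 5'))  # ok
try:
    eval('"car" - 5')
except TypeError as err:
    print("no meaning:", err)
```

Note that Python rejects “5 3 +” before running anything at all, whereas the meaningless expression only fails once the computer actually tries to carry it out.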
There are two important reasons why computers tend to be stricter about syntax. First, they tend to know a lot less than humans (I will come back to that next Tuesday), so inferring meaning when the syntax is wrong is more difficult. Second, syntax errors may indicate situations where the programmer actually meant to write something different altogether, say in the example above maybe “53 + 7” where the extra space between the 5 and the 3 and the missing 7 were oversights. Would you rather have the computer assume you meant “5 + 3” or flag this as an error?
Syntax can become quite the hot button issue at times. For instance, in Javascript, the syntax rule is for lines of code to end with a semicolon. But it is a rule that Javascript isn’t strict about. Most of the time if you end a line without a semicolon, Javascript will insert that semicolon for you as it figures out the meaning of the code. As it turns out though in certain edge cases (and when used in conjunction with other tools) this can break in somewhat unpredictable ways. Here is an epic debate between various programmers about just such a situation.
Whatever programming language(s) you learn, it’s essential to understand the syntax in order to write working code. And it is worth keeping in mind that the computer may not be the only one needing to read your code. Other programmers working with you or after you might need to as well. So even if an expression is syntactically correct, sometimes it is worth picking a different one that is easier to read. That is also the reason why most programming languages have formatting conventions. In some languages, most notably Python, formatting (which lines are indented) is in fact part of the formal syntax of the language.
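Here is a small sketch of what it means for indentation to be part of Python's syntax. The two snippets below contain exactly the same words; only the indentation of the last line differs, and that alone changes what the code does. The names `inside_loop`, `after_loop` and `run` are invented for this illustration.

```python
# Same words in both snippets; only the indentation of the last line differs.
inside_loop = """
total = 0
for n in [1, 2, 3]:
    total += n
    total *= 2
"""

after_loop = """
total = 0
for n in [1, 2, 3]:
    total += n
total *= 2
"""


def run(source: str) -> int:
    """Execute the snippet and return the final value of `total`."""
    scope = {}
    exec(source, scope)
    return scope["total"]


print(run(inside_loop))  # 22: the doubling happens on every pass through the loop
print(run(after_loop))   # 12: the doubling happens once, after the loop

# Leaving out the indentation entirely is a syntax error.
try:
    compile("for n in [1, 2, 3]:\ntotal = n\n", "<example>", "exec")
except IndentationError as err:
    print("syntax error:", err.msg)
```

In many other languages the same change in layout would make no difference to the computer, only to the human reader.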
Just to show the potential diversity of syntax among computer languages here is a recursive factorial function in three different programming languages. The point here is not necessarily for you to understand what is going on but rather appreciate that code can look quite different.
Lisp
Javascript
Python
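A minimal sketch of what such a function might look like in Python, assuming the argument is a non-negative whole number (this is an illustration, not necessarily the exact code originally shown here):

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n, computed recursively."""
    if n <= 1:
        # Base case: 0! and 1! are both 1, so the recursion stops here.
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)


print(factorial(5))  # 120
```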
Next Tuesday will be about the semantics of programming languages, which is how the code has meaning for the computer (and also what it means for the code to have meaning).