Last week’s overview of “How the Web Works” introduced the URL (Uniform Resource Locator) as the fundamental way things are addressed on the web. Before we pick apart some actual URLs, it is worth looking at the name itself. The promise behind “Uniform” is that this addressing scheme can be used across all kinds of resources, and that explains why URLs are so powerful: they can be used to address content such as a blog but also services such as the Twilio telephony API. On the web a blog entry and an incoming phone call are both simply “resources”. That means a resource is a highly abstracted concept and, as you will learn if you stick with Tech Tuesday, abstraction is amazingly powerful. And on the web the URL is the most powerful abstraction of them all!
So here is a URL to pick apart:
http://blog.dailylit.com/2012/01/16/in-honor-of-dr-martin-luther-king-jr/
The very first part, the “http:”, indicates which protocol to use to access this resource (formally, this part of a URL is called the scheme). What other protocols might we find there? The obvious one is “https:”, the secure (meaning encrypted) version of “http:”. Here are some other protocols that you may have encountered around the web: “mailto:”, which indicates that the resource that follows is an email address and that the protocol to speak to it is SMTP, and “ftp:”, for resources that are accessible via File Transfer Protocol (FTP). Another protocol supported by many browsers is “file:”, which means that the resource that follows is a file on the machine on which the browser is running.
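To make the idea of the leading protocol part concrete, here is a minimal sketch using Python's standard urllib.parse module. Only the first URL comes from this post; the other addresses (example.com, the FTP path, the file path) are made-up placeholders, and scheme_of is a helper name invented for this illustration.

```python
from urllib.parse import urlsplit

# Only the first URL is from the post; the rest are illustrative placeholders.
examples = [
    "http://blog.dailylit.com/2012/01/16/in-honor-of-dr-martin-luther-king-jr/",
    "https://example.com/",
    "mailto:someone@example.com",
    "ftp://ftp.example.com/pub/readme.txt",
    "file:///tmp/notes.txt",
]


def scheme_of(url: str) -> str:
    """Return the part of the URL before the first colon, e.g. 'http'."""
    return urlsplit(url).scheme


for url in examples:
    print(scheme_of(url), "<-", url)
```

Note that “mailto:” is not followed by “//”, yet the part before the colon is still recognized in the same way as for the other URLs.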
Following the “http:” are two forward slashes “//” – these indicate that this URL starts with a domain name, which in this case is “blog.dailylit.com” – we will dissect domain names in more detail in the Tech Tuesday on DNS. There we will investigate the relationship between domains and actual servers but for now it is worth pointing out that grouping resources by domain serves an important trust purpose. Your expectations about accessing content at chase.com are meaningfully different from wepretendtobechase.com. Of course it’s not always that obvious and people go to great lengths to pretend to be someone else. There is a good test of your knowledge of which domains to trust.
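As a rough illustration of why the exact domain matters, the sketch below pulls the domain out of a URL with Python's urllib.parse and compares it against a trusted one. The helper names host_of and belongs_to, and the www.chase.com address, are assumptions made up for this example; this is a simplified comparison and not how browsers actually decide what to trust.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host name of a URL, or '' if it has none."""
    return urlsplit(url).hostname or ""


def belongs_to(host: str, trusted: str) -> bool:
    """True if host is the trusted domain itself or a subdomain of it."""
    return host == trusted or host.endswith("." + trusted)


print(host_of("http://blog.dailylit.com/2012/01/16/in-honor-of-dr-martin-luther-king-jr/"))

# A naive "ends with" check is fooled by a lookalike domain...
print("wepretendtobechase.com".endswith("chase.com"))
# ...while comparing on the dot boundary is not.
print(belongs_to(host_of("https://wepretendtobechase.com/"), "chase.com"))
print(belongs_to(host_of("https://www.chase.com/"), "chase.com"))
```

The point of the dot-boundary comparison is that a domain name is read from right to left in whole labels, so a name that merely ends in the same letters is a completely different domain.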
Following the domain name is the location of the resource within that domain. This is the “/2012/01/16/in-honor-of-dr-martin-luther-king-jr/” part in the URL above. There are several things going on here that are worth noting. First, this location is structured in an easily human readable and comprehensible form. Just by looking at the URL you can infer that this is a post about Martin Luther King on MLK day. We call this kind of location a “pretty URL.” Having pretty URLs is a good idea not just because it helps humans figure out what they are likely to get when they access the resource but also because search engines, especially Google, make pages with pretty URLs rank higher in search results (assuming that the page content actually appears to be a match for the URL).
But there is even more to a pretty URL like “/2012/01/16/in-honor-of-dr-martin-luther-king-jr/” – the slashes “/” in the URL indicate some notion of hierarchy or of a path to the resource. They also suggest that the following shorter URL should point to something useful: http://blog.dailylit.com/2012 . In fact this retrieves all the blog posts from 2012. There is no requirement that the domain fulfilling the request understand this shorter URL, but the fact that it does both matches intuition and allows for additional degrees of automation and discovery. For instance, without any further knowledge you should be able to construct the URL for finding all the blog posts from November 2011. Here it is: http://blog.dailylit.com/2011/11/ . Again, there is no requirement on the server to respond to this with a list of posts and it could instead respond with, say, a 404 Not Found. HTTP does not speak to this, which is one of its many strengths as it lets the person or organization controlling the resource decide how to respond.
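The kind of automation this hierarchy allows can be sketched in a few lines of Python. The function names parent_urls and month_archive are invented for this illustration, and the year/month layout is simply the pattern observed on this one blog, not something every site follows. Whether the server answers any of these URLs with something useful is, as noted above, entirely up to the server.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url: str) -> list:
    """List the URLs obtained by dropping trailing path segments one by one."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    for i in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:i])
        if i:
            path += "/"  # keep the trailing slash used by the original URL
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


def month_archive(year: int, month: int) -> str:
    """Guess the archive URL for a month, assuming the /YYYY/MM/ pattern."""
    return f"http://blog.dailylit.com/{year}/{month:02d}/"


post = "http://blog.dailylit.com/2012/01/16/in-honor-of-dr-martin-luther-king-jr/"
for candidate in parent_urls(post):
    print(candidate)
print(month_archive(2011, 11))
```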
Now not every URL contains a “//” followed by a domain name. There are also URLs that don’t contain a domain but instead just a path to a resource. Consider for instance the following: “/”. Where the resource pointed to by this URL is located depends on the context in which it is encountered. This is an example of a relative URL. It points to a resource within the context of another resource. If you are reading this in the context of the Tumblr dashboard, the link will take you to your dashboard. If you are reading this on my blog, which is at the domain “continuations.com”, it will take you to the home page of my blog. Relative URLs allow for more compact expression of the location of a resource but they can also introduce interesting errors. For instance, think about what resource that relative URL will point to if you simply copy it and send it to someone via email and they open it in a web mail client!
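Here is a small sketch of how a relative URL consisting of just the path “/” gets resolved against different contexts, using urljoin from Python's standard library. The base URLs are assumptions chosen for illustration: the path on continuations.com, the Tumblr dashboard address and the web mail address are placeholders, and resolve_in_context is a name invented for this example.

```python
from urllib.parse import urljoin

# The relative URL: no protocol, no domain, just a path.
relative = "/"


def resolve_in_context(base: str, rel: str = relative) -> str:
    """Resolve a relative URL against the URL of the page it appears on."""
    return urljoin(base, rel)


# Hypothetical pages on which the same relative link might be encountered.
contexts = [
    "http://continuations.com/some/post",
    "https://www.tumblr.com/dashboard",
    "https://mail.example.com/inbox",
]

for base in contexts:
    print(base, "->", resolve_in_context(base))
```

The same one-character link ends up at three different domains, which is exactly the kind of surprise you get when a relative URL is copied out of its original context. What each server then does with a request for its root is again its own decision.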
This post is getting quite long and I haven’t yet covered fragment identifiers or query strings. Instead of going on, I will cover fragment identifiers in the context of HTML and query strings when describing how URLs can be used to transmit additional information that can be used by the server in deciding how to respond to the request to the resource, so keep following Tech Tuesday!