Today we are continuing with the web request cycle. After the browser has parsed the URL and obtained the IP address of the server via DNS, the browser now has to communicate with the server. That is done using the so-called Hypertext Transfer Protocol or HTTP for short. The beginnings of HTTP go back to the early 1990s when Tim Berners-Lee first devised it, drawing inspiration from Ted Nelson, who had coined the term Hypertext in 1963. For an even earlier description of a similar idea it is worth reading Vannevar Bush’s amazing “As We May Think” from 1945!
HTTP builds on top of the lower level Internet protocol TCP which permits establishing a connection between two machines (see my introduction to networking). A so-called HTTP session consists of a series of requests from the browser followed by responses from the server. Each request consists of a request method, a resource (URL), a set of headers and optionally a request body.
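To make the four parts of a request concrete, here is a minimal sketch of how an HTTP/1.1 request could be written out as the bytes that travel over the TCP connection. The function name build_request and the host example.com are my own placeholders, not anything from a real library. The sketch also assumes the common HTTP/1.1 convention of putting the path on the first line and the host name in a “Host” header.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request (hypothetical helper for illustration)."""
    # First line: method, resource path and protocol version.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    all_headers = dict(headers or {})
    if body:
        # Tell the server how many body bytes follow the headers.
        all_headers["Content-Length"] = str(len(body))
    for name, value in all_headers.items():
        lines.append(f"{name}: {value}")
    # A blank line separates the headers from the optional body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request("GET", "/index.html", "example.com", {"Accept": "text/html"})
print(raw.decode("ascii"))
```

A GET request like this one has no body, so the message ends right after the blank line.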
The most common HTTP request methods are verbs such as GET, POST, PUT and DELETE (I am capitalizing them because that’s how they appear in the protocol). What’s great about these is that they are wonderfully descriptive of what you expect the request to do. GET is supposed to, well, get information from the resource. I say resource rather than server because that is the right level to think about with regard to HTTP – it is about manipulating abstract resources rather than talking to particular machines. PUT on the other hand puts information at the resource (without regard to what’s already there). DELETE – you get the idea – deletes the information at the resource. This relative obviousness and the associated expectations around how these methods behave provide a powerful foundation for the transfer of information (more on that in a future post on so-called RESTful APIs).
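The expectations around GET, PUT and DELETE can be sketched with a toy in-memory store. This is not how any real server is implemented. ResourceStore and the path /notes/1 are hypothetical names I made up to show the behavior a client should be able to rely on.

```python
class ResourceStore:
    """Hypothetical in-memory stand-in for a server's resources, keyed by path."""

    def __init__(self):
        self._resources = {}

    def get(self, path):
        # GET only reads; it must not change anything on the server.
        return self._resources.get(path)

    def put(self, path, data):
        # PUT replaces whatever is stored at the path, regardless of prior state.
        self._resources[path] = data

    def delete(self, path):
        # DELETE removes the resource; deleting a missing one is harmless.
        self._resources.pop(path, None)


store = ResourceStore()
store.put("/notes/1", "first draft")
store.put("/notes/1", "second draft")   # overwrites the first draft
after_put = store.get("/notes/1")
store.delete("/notes/1")
store.delete("/notes/1")                # repeating the request changes nothing further
after_delete = store.get("/notes/1")
print(after_put, after_delete)
```

Note that repeating a PUT or a DELETE leaves the store in the same state as doing it once, which is what lets a client safely retry such a request.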
The headers contain additional information about the request. For instance, the “Date” header field contains the date and time when the request was sent. Or the “Referer” header (misspelled in the protocol and in most implementations!) contains the URL of the page on which the currently requested resource was found. It is worth looking at the list of possible HTTP headers, which also shows the headers for a response (see below). It should be pointed out that the HTTP protocol allows for the creation of additional headers which can carry custom information (not always what you would want as in the recent case of O2 sending users’ phone numbers!).
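On the wire, each header is simply a “Name: value” line. Here is a small sketch of what a header block might look like, using Python’s standard email.utils.formatdate to produce the date format HTTP uses. The helper format_headers, the URL and the custom header name X-Example-Trace are all made up for illustration.

```python
from email.utils import formatdate


def format_headers(headers):
    """Render a dict of headers as 'Name: value' lines (hypothetical helper)."""
    return "".join(f"{name}: {value}\r\n" for name, value in headers.items())


headers = {
    # Timestamp 0 is used here only so the output is reproducible.
    "Date": formatdate(0, usegmt=True),
    # Spelled with a single "r", exactly as the protocol does.
    "Referer": "https://example.com/articles",
    # An invented custom header carrying application-specific information.
    "X-Example-Trace": "abc123",
}
header_block = format_headers(headers)
print(header_block)
```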
The request body is used for POST and PUT requests to carry the data. For instance when you encounter a registration form on the web that asks for your name and email address, the information you type into the form fields is (generally) carried in the body of the resulting HTTP request.
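As a rough illustration, this is how the fields of such a registration form could be encoded into a request body. I am assuming the default encoding browsers use for simple forms (application/x-www-form-urlencoded), and the field names and values are invented.

```python
from urllib.parse import parse_qs, urlencode

# Invented form fields, as a user might type them into a registration form.
form_fields = {"name": "Ada Lovelace", "email": "ada@example.com"}

# Spaces become "+" and reserved characters such as "@" are percent-encoded.
body = urlencode(form_fields).encode("ascii")
print(body)

# The server reverses the encoding to recover the original values.
decoded = parse_qs(body.decode("ascii"))
print(decoded)
```

The request would carry this body together with a “Content-Type” header naming the encoding, so that the server knows how to decode it.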
Once the server has received and processed the request it will send an HTTP response. The response has a structure that’s quite similar to the request. Instead of the method, the server returns a status code, then some headers, and finally a response body.
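The three parts of a response can be pulled apart with a few lines of code. The sketch below is deliberately simplified and handles only a complete, well-formed response held in memory. The function parse_response and the sample response are my own inventions for illustration.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into status, headers and body (simplified)."""
    # The first blank line separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, numeric code, human-readable reason.
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
status, reason, headers, body = parse_response(sample)
print(status, reason, headers, body)
```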
The status code indicates what happened at the server and hence what to expect in the body of the response. The standard code is “200 OK” which means the server processed the request and everything went well. There are more precise responses in that vein, such as “201 Created” which means the server created a new resource (e.g. in response to a PUT request). There is a series of codes to deal with resources that have moved, such as “301 Moved Permanently” which provides a new URL that should be used for all future requests for this resource. And there are a bunch of codes to indicate various error situations such as the famous “404 Not Found” for which some web sites return very funny contents in the body of the response. Again, it’s worth browsing the complete list of response codes.
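Python’s standard library ships the list of status codes as http.HTTPStatus, which makes it easy to see how the first digit groups codes into families. The describe function and the family labels below are my own shorthand rather than official wording.

```python
from http import HTTPStatus

# The first digit of a status code identifies its family.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code, its standard phrase and its family (hypothetical helper)."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({FAMILIES[code // 100]})"


descriptions = [describe(code) for code in (200, 201, 301, 404)]
for line in descriptions:
    print(line)
```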
The response headers contain a lot of additional information about the response. For instance the “Content-Type” header field describes what kind of content the response body contains. Examples of different values for this header field are “text/html; charset=utf-8” for a web page in HTML using the UTF-8 character set or “image/jpeg” for an image that is compressed using JPEG. Without knowing this the browser would have to infer the content type from inspecting the body of the response which would be very cumbersome. There are a ton more headers in a response that are similarly critical to the proper functioning of the HTTP protocol, such as how long a recipient can “cache” (locally store) the body (in order to help speed up a subsequent access and also relieve the server and network).
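Here is a small sketch of how a client might read these two kinds of headers. For the content type I am leaning on the standard library’s email.message.Message, which understands the same “type; parameter=value” syntax. For caching I am assuming the lifetime arrives in a “Cache-Control” header with a max-age directive, which is one common way servers express it. Both function names are invented.

```python
from email.message import Message


def parse_content_type(value):
    """Return (media type, charset or None) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


def max_age_seconds(cache_control):
    """Return the max-age directive in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


html_type = parse_content_type("text/html; charset=utf-8")
image_type = parse_content_type("image/jpeg")
lifetime = max_age_seconds("public, max-age=3600")
print(html_type, image_type, lifetime)
```

With this information the browser knows to decode the first body as UTF-8 text, to hand the second to its JPEG decoder, and that it may reuse the cached copy for an hour.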
Finally there is the body of the response which contains the actual information. The body is a bunch of bytes. What they represent can vary wildly as explained above. It could be an HTML web page or an image or something altogether different. One of the great powers of the HTTP protocol is that it is really content agnostic.
There is a lot going on under the hood of HTTP, and much of it matters. It is therefore a bit of a shame that many people, including active developers, don’t really understand it. As a result they either create things that don’t work as expected (e.g., making resources change in response to a GET request) or re-invent features on top of HTTP that HTTP already contains (e.g., content caching). If you do any work on the web it is well worth digging deeper than this post!