Tech Tuesday: Web Servers

If you have been following along, we are now on Step 5 of the web cycle, where we find ourselves at the server that is answering an HTTP request. Because HTTP is well defined, anyone can in theory implement a web server. In practice, most people these days run one of fewer than a handful of servers, with Apache, IIS (Microsoft) and more recently nginx accounting for the bulk of all web sites. The reason for this degree of concentration is that, much like database software, the web server is a mission-critical piece of the stack, and a lot of work has gone into making sure these servers work well for a wide variety of uses.

Let’s first consider the simplest possible situation: a GET request for a URL where the resource on the server is a file. In this case all the web server needs to do is read the file from disk and send it back packaged up as an HTTP response. That means sending an HTTP status code of 200 (assuming the file was found and properly read), followed by a bunch of headers describing the response (for instance, a Content-Type header saying whether the file contains text or HTML or an image), followed by the actual contents of the file. If the file is not found, the server returns a status code of 404. Or if the server finds the file but cannot read it for some reason, it might return a status code of 500.
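To make the file case concrete, here is a minimal sketch in Python of the decision a server makes for such a GET request. The function name serve_file and the (status, headers, body) return shape are made up for illustration; they are not how Apache, IIS or nginx are implemented, and a real server does a good deal more (directory indexes, permissions, streaming large files).

```python
import mimetypes
from pathlib import Path


def serve_file(doc_root: Path, url_path: str):
    """Return (status, headers, body) for a GET of url_path under doc_root.

    Hypothetical helper for illustration only.
    """
    root = doc_root.resolve()
    target = (root / url_path.lstrip("/")).resolve()

    # Refuse paths that escape the document root (e.g. "/../secret")
    # and anything that is not a regular file.
    if root not in target.parents or not target.is_file():
        return 404, {"Content-Type": "text/plain"}, b"Not Found"

    try:
        body = target.read_bytes()
    except OSError:
        # The file exists but could not be read.
        return 500, {"Content-Type": "text/plain"}, b"Internal Server Error"

    # Guess the response type from the file extension.
    content_type = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
    headers = {"Content-Type": content_type, "Content-Length": str(len(body))}
    return 200, headers, body
```

Calling serve_file(Path("/var/www"), "/index.html") would return a 200 with a text/html content type if that file exists, and a 404 otherwise. The document root /var/www is just an example path.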

Even this relatively simple task of answering a GET request for a file is in reality a bit more complicated, because HTTP has a bunch of important optimizations. Imagine a situation where a great many browsers each request the same file over and over. It would be very inefficient to actually send an unchanged file back again and again. So instead HTTP provides a couple of different mechanisms, such as the Cache-Control and ETag headers, that let the browser and the server work out between them whether a resource (here the file) has changed and needs to be served anew. If based on this the web server determines that it does not need to resend the file, it will send a 304 Not Modified status code instead.
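The ETag variant of this exchange can be sketched as follows. The server labels each response with a tag derived from the content; the browser sends that tag back in an If-None-Match header on its next request, and the server answers 304 with an empty body if the tag still matches. The helper names make_etag and conditional_get are hypothetical, hashing the content is only one possible way to produce a tag, and the sketch assumes a single tag with exact header-name casing, which real servers do not.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted tag from the content (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, request_headers: dict):
    """Return (status, headers, body), skipping the body if the client copy is current."""
    etag = make_etag(body)
    # max-age=60 is an arbitrary example value: the browser may reuse its
    # copy for 60 seconds without asking the server at all.
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}

    if request_headers.get("If-None-Match") == etag:
        # The browser already holds this exact version.
        return 304, headers, b""

    return 200, headers, body
```

A first request with no If-None-Match header gets the full file and its tag. A repeat request that echoes the tag gets the much smaller 304 response, until the file changes and the tag with it.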

Now things get a fair bit more complicated when the web server has to deal with the submission of data via a POST request. In general, the web server needs to do a bunch of work to figure out how to respond. The response will generally depend on the data that was submitted with the request (typically the fields of a form). Web servers therefore provide mechanisms for invoking a program and passing the submitted data to that program (which might be written in a language such as PHP, or Python, or Ruby, or pretty much any other language for that matter). The program can then determine what to do based on the data and dynamically assemble the response. The web server then passes that response back to the browser.
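As a rough illustration of that hand-off, the sketch below parses a form-encoded POST body and passes the fields to a handler that assembles the response. Everything here is hypothetical: the /greet path, the ROUTES table and the greet_handler function are invented for the example, and calling a Python function in the same process stands in for whatever mechanism a real web server uses to invoke an external program.

```python
import html
from urllib.parse import parse_qs


def greet_handler(form: dict):
    """Hypothetical program: builds a response from the submitted 'name' field."""
    name = form.get("name", ["stranger"])[0]
    # Escape user input before placing it in HTML.
    body = f"<p>Hello, {html.escape(name)}!</p>".encode("utf-8")
    return 200, {"Content-Type": "text/html; charset=utf-8"}, body


# Hypothetical mapping from URL path to the program that handles it.
ROUTES = {"/greet": greet_handler}


def handle_post(path: str, raw_body: bytes):
    """Return (status, headers, body) for a form-encoded POST to path."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {"Content-Type": "text/plain"}, b"Not Found"

    # Decode "name=Ada+Lovelace&x=1" style data into {"name": ["Ada Lovelace"], ...}.
    form = parse_qs(raw_body.decode("utf-8"))
    return handler(form)
```

For example, handle_post("/greet", b"name=Ada+Lovelace") yields a 200 response whose body greets Ada Lovelace, while the same request with a different name produces a different page. That is what makes the response dynamic rather than a file read from disk.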

Web servers do a great many other things, such as imposing load limits and doing URL rewriting, but this should give you the basic outline.