Tech Tuesday: How The Web Works (Overview)

As promised at the end of last year’s Tech Tuesday, we are starting this year with a cycle on how the web works. Just as a reminder, Tech Tuesday’s aim is to require no previous knowledge other than what has been covered before. So this overview may be trivial for some readers but I wanted to make sure to bring everyone along.

Let’s assume you have fired up your favorite web browser. Now you type the address “dailylit.com” into the address bar (if you always go to web sites by typing their name into a search engine, I urge you to discover the address bar and type in the address).

What happens now? How does the browser go from a web address for a site to that site’s content on your screen? That turns out to be an amazingly complex series of steps:

Step 1: The address “dailylit.com” is part of what is known as a URL. The full URL is “http://dailylit.com” and your browser automatically prepends the “http://” to save you the typing. The HTTP bit indicates to the browser which protocol to use to speak to the server (more on that in Step 3 below). Typing a URL into the address bar starts the same sequence of steps as clicking on a link (e.g. among a set of search results) that points to the same location, in this case DailyLit. In the very first step the browser “parses” the URL (meaning it takes the URL apart into its various parts) in order to determine where it is supposed to look for content.
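To make the parsing step a bit more concrete, here is a minimal sketch in Python using the standard library's urlsplit. The function name parse_address and the rule of assuming "http://" when nothing else was typed are my own simplifications for illustration; a real browser does considerably more.

```python
from urllib.parse import urlsplit


def parse_address(typed: str) -> dict:
    """Split what was typed into the address bar into its URL parts."""
    # Mimic the browser's convenience: assume http:// when no scheme was typed.
    if "://" not in typed:
        typed = "http://" + typed
    parts = urlsplit(typed)
    return {
        "scheme": parts.scheme,     # which protocol to speak
        "host": parts.hostname,     # the domain name to look up in Step 2
        "path": parts.path or "/",  # which content to ask for on that server
    }


print(parse_address("dailylit.com"))
# {'scheme': 'http', 'host': 'dailylit.com', 'path': '/'}
```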

Step 2: The “dailylit.com” portion of the URL is the domain name (you can get your own from a domain registrar). Think of this much like the name of a person. If you want to call a person on the phone, you need to look up their phone number based on their name in some phone book (e.g. the contact list on your cell phone or, in the dark ages, some paper book made from dead trees). Similarly, in order for your browser to retrieve the content from DailyLit, it first needs to look up the IP address of the server on which the content lives. This is done by consulting a “phone book” known as DNS, which stands for Domain Name System and is a near miraculous invention.
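If you want to try the "phone book" lookup yourself, a rough sketch in Python looks like the following. It hands the name to your operating system's resolver, which in turn consults DNS. The function name look_up_addresses is made up for this example, and I am only asking for IPv4 addresses to keep things simple.

```python
import socket


def look_up_addresses(hostname: str) -> list[str]:
    """Ask the system resolver (which consults DNS) for IPv4 addresses."""
    results = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (ip_address, port) pair.
    return sorted({info[4][0] for info in results})
```

Calling look_up_addresses("dailylit.com") needs a working network connection, and the answer can change over time as sites move between servers, so don't be surprised if you see a different address from the one quoted in this post.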

Step 3: Now that the browser has an IP address, in the case of DailyLit currently 72.32.133.224, it makes a request to “GET” content from 72.32.133.224. GET is capitalized and in quotes here because it is one of several defined requests supported by the so-called Hypertext Transfer Protocol or HTTP, which is what the “http” at the beginning of the URL refers to. In essence this request simply says GET me the content that resides at 72.32.133.224. This is the protocol that got started with Tim Berners-Lee’s work in the very late 80s and early 90s and is to this day the underpinning of the interaction between web browsers and web servers.
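A GET request is just a few lines of plain text. Here is a small sketch that builds a bare-bones HTTP/1.1 request without sending it anywhere. The function name build_get_request is hypothetical, and real browsers send many more headers than this. Note that the request carries the domain name in a Host line, so a server can tell which site is being asked for.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # the request: method, path, protocol version
        f"Host: {host}",         # which site on the server we are asking for
        "Connection: close",     # ask the server to close the connection when done
        "",                      # a blank line marks the end of the headers
        "",
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("dailylit.com").decode("ascii"))
```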

Step 4: Through the magic of Internet networking, that GET request is routed via a whole bunch of intermediary devices (routers, switches, firewalls, load balancers, oh my!) to the machine with the IP address. In fact, you can look up for yourself how many intermediate hops exist between you and the server, and that’s something we will do in an upcoming Tech Tuesday.

Step 5: We are now on the Server. The server receives the incoming GET request. On the server machine the work is co-ordinated by a program known as a web server (something like Apache or NGINX). The web server retrieves the contents for the page and starts sending them back to the browser, again over the Internet. Important side note: because the Internet is packet switched, the content is cut up into smaller parts (packets) that may travel different routes to get back to the browser. All of that cutting up and re-assembling is handled by lower levels of the network and is transparent to both the web server and the web browser. That too is one of the many awesome features of the Internet that we easily take for granted.
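The cutting up and re-assembling can be illustrated with a toy model. This is emphatically not how the real network layers are implemented; it is just a sketch of the idea that numbered pieces can arrive in any order and still be put back together. The function names and the tiny 8-byte packet size are invented for the example.

```python
import random


def split_into_packets(content: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut content into numbered chunks of at most `size` bytes."""
    return [
        (seq, content[start:start + size])
        for seq, start in enumerate(range(0, len(content), size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Put the chunks back in order using their sequence numbers."""
    return b"".join(chunk for _, chunk in sorted(packets))


page = b"<html><body>Hello from the server</body></html>"
packets = split_into_packets(page, size=8)
random.shuffle(packets)  # simulate packets arriving out of order
assert reassemble(packets) == page
```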

Step 6: Back at the browser. The browser is receiving the content in the form of an HTTP Response. That response contains a bunch of different stuff. For instance, it contains a so-called HTTP Response Status Code to indicate to the browser whether the server thinks it has some useful information [Response Code 200 OK]. If the server had a problem, e.g. it didn’t find any content for this URL, it will send a different code, such as the famous 404 Not Found. The browser needs to start parsing the response to figure out what to do next. That will in all likelihood include many additional requests by the browser to the same and possibly other servers to retrieve content that was referenced in the initial response, such as CSS and JavaScript files. Every one of these additional requests involves all the steps from 1-6 AGAIN!
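Here is a rough sketch of what "parsing the response" means, using a made-up response that I typed in by hand rather than one fetched from a real server. It pulls out the status code and then scans the HTML for stylesheets and scripts that would each trigger a further request. The names split_response and ResourceFinder are invented for this illustration, and a real browser looks for many more kinds of references (images, fonts and so on).

```python
from html.parser import HTMLParser

# A hand-written example response, not captured from any real server.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    '<html><head><link rel="stylesheet" href="/site.css">'
    '<script src="/app.js"></script></head>'
    "<body>Hello</body></html>"
)


def split_response(raw: str) -> tuple[int, dict, str]:
    """Return (status code, headers, body) from a raw HTTP response."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])  # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


class ResourceFinder(HTMLParser):
    """Collect the URLs of stylesheets and scripts referenced by a page."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self.resources.append(attrs["src"])


status, headers, body = split_response(SAMPLE_RESPONSE)
finder = ResourceFinder()
finder.feed(body)
print(status, finder.resources)
# 200 ['/site.css', '/app.js']
```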

Step 7: Even while it is still waiting for the responses from these additional requests (and possibly even more pieces of the original request) to arrive, the browser will start to figure out how to render the content that it has received on the screen. That means figuring out what to show where, which is made incredibly complex by the interaction between the HTML (roughly: the content itself), the CSS (roughly: the styling or look and feel of the content) and the JavaScript (roughly: the dynamic behavior of the content). This work involves a so-called rendering engine and also a full-fledged computer language interpreter (for JavaScript).

Step 8: The browser continues to execute the JavaScript code (which might, for example, animate an object to move across the page) while at the same time waiting for input from you. For instance, when you hover with the mouse above a link, that might change the look and feel of that link. In the early days of the web, the most that would happen now is that a click on a link would start the whole process over at Step 1 for the next page. Today, however, many additional requests to the web server may occur without the page ever refreshing, as new content is dynamically fetched and added to the existing page and other content is written back to the server.

In the upcoming Tech Tuesdays we will look at each of these steps in some detail, starting with the anatomy of a URL next Tuesday. In the meantime, I hope I have managed to convey some of the amazing complexity that is involved in something that we now take for granted and that people around the world interact with billions of times every day. And all along you should keep in mind that I haven’t even mentioned any of the complexity behind the scenes, such as the browser interacting with the computer’s operating system to make all of these steps happen.