Saturday, March 17, 2018

What happens when you type google.com into your browser and hit enter?

This is a question I heard often while interviewing engineering candidates, and it sounds so simple that most people would say something like: "It loads Google, duh!" However, a good engineer can take over an hour to answer it. Computers are extremely complex, which is no surprise to anyone, and it's easy to take for granted that it all works so seamlessly every time we sit down and load a website like Google. I wanted to challenge myself and try to answer the question from memory as best I can (mistakes and all!), and possibly revisit the question in the future as I learn more.
  1. Hitting enter on the keyboard completes a circuit, and the keyboard sends a signal to the computer. The operating system passes the key press to the browser, which starts loading the website.
  2. The operating system has to allocate memory for the incoming data to be stored locally and displayed by the browser (an application, itself running in RAM).
  3. All of the code is broken down into 1's and 0's for the processor to compute the data. There are a lot of registers, caches, and a specific instruction set, which is like a language used only by that type of processor.
  4. Computing uses power and generates heat, which is dissipated by heatsinks and fans in most modern desktops and laptops, or by passive cooling in mobile devices.
  5. The computer's NIC (Network Interface Card) communicates with the local switch and router/gateway to send and receive a stream of data. The data is wrapped in a TCP segment inside an IP packet, with headers specifying the address and port it wants to go to, a return address and port for where the reply should be sent, and the payload carrying the data.
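  6. The router changes the return (source) address to its own public address, a process called NAT (Network Address Translation), so that replies come back to it and it can forward them to the right computer. Every later hop between the client and server simply forwards the packet toward its destination, rewriting only the link-level (MAC) addresses. This is like a mailman carrying a letter to the post office, and the post office intentionally relabeling the return address with its own so the recipient of the letter knows where to send the reply.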
  7. The communication happens over copper CAT5 cables containing 8 smaller wires arranged in four pairs, which is why the cable is called "Twisted Pair". On 100 Mbps Ethernet, one pair is TX (transmit) and one pair is RX (receive); gigabit Ethernet uses all four pairs in both directions. Each pair is twisted slightly differently to reduce interference between them. The long distance communication happens over fiber optic cables, where light travels at roughly two-thirds of its speed in a vacuum. I once heard that a single fiber optic cable can handle enough bandwidth for the entire world to call one place. The limitation is the hardware on both sides, which is always improving.
  8. Computers don't know how to load websites by name without relying on DNS (Domain Name System) to resolve the name to an IP address. So a request is sent to your local DNS server asking "What is the IP Address for Google.com?". The DNS server will likely have this in its cache and can answer very quickly; otherwise it has to ask other DNS servers (the root servers, then the .com servers, then Google's own name servers) for the address.
  9. Once it has the IP Address, the browser will utilize the network stack of the operating system to open a socket on the high end of the port spectrum (an ephemeral port, typically somewhere between 32768 and 65535 depending on the operating system) and initiate a TCP (Transmission Control Protocol) connection with Google's web server using a three-way handshake: SYN, SYN-ACK, ACK (Synchronize, Synchronize-Acknowledge, Acknowledge).
  10. TCP is designed so that received data is always acknowledged and verified with a checksum, and any missing data is re-transmitted.
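  11. The initial connection may be made on port 80 (HTTP), in which case the web server replies with a redirect to the HTTPS version of the site and the browser opens a new connection on port 443 (HTTPS). A browser that already knows the site requires HTTPS goes straight to port 443.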
  12. The SSL/TLS handshake begins with a "CLIENT HELLO", in which the client lists the ciphers it supports. The server picks one from that list, ideally the most secure cipher both sides support. Additionally, the client will examine the server's SSL Certificate for validity and authenticity by checking that it was signed by a CA (Certificate Authority) the client already trusts, in effect asking: "Is this really Google?".
  13. CAs are the issuers of SSL Certificates. It is their job to verify that the people requesting a certificate control the domain and are really who they say they are. With EV (Extended Validation) certificates, the CA will take it a step further and verify the organization's legal identity, phone number, and address.
  14. Once a session key is negotiated using some amazing math (basically magic to me at this point) with prime numbers, encrypted data can be sent to and from the server where only the client and server can decrypt the data. Both sides encrypt and decrypt with this shared session key; the server's public and private keys are only used during the handshake, to prove its identity and protect the key exchange. A man in the middle cannot easily decrypt and read the data without a TLS proxy or having already compromised one of the endpoints. Ideally it will be using a cipher with FS (Forward Secrecy), which generates fresh, temporary keys for each session. This means that even if the server's private key is later compromised, previously recorded traffic cannot be decrypted with it.
  15. The browser communicates with the web server via the HTTP protocol by sending a GET request. The server replies with data the browser can use to display the website.
  16. The HTML, CSS, and JavaScript are parsed and rendered into the human readable page that we all see every day.
  17. Some additional data is exchanged for analytics, advertisements, and session cookies.
  18. The information is displayed on a monitor made up of millions of pixels. At its most basic level, a pixel is a combination of Red, Green, and Blue light that is mixed in different amounts to display different colors.
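
As a rough illustration, the network portion of the list (steps 8, 9, and 12 through 15) can be sketched in a few lines of Python using only the standard library's socket and ssl modules. This is a simplified sketch and not what a real browser does: the function names (build_get_request, parse_status_line, fetch) are made up for this example, the 10 second timeout is an arbitrary choice, and it skips the port 80 redirect from step 11 by connecting directly to port 443.

```python
import socket
import ssl
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Assemble a minimal HTTP/1.1 GET request (step 15)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the socket when done
        "",                   # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first line of an HTTP response into version, code, reason."""
    first_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = first_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def fetch(host: str, path: str = "/", port: int = 443) -> bytes:
    """Resolve the name, connect, negotiate TLS, and send one GET request."""
    # Step 8: name resolution. The operating system's resolver (and its
    # caches) does the real work; we just take the first result.
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]

    # Step 9: the OS picks an ephemeral source port and performs the
    # SYN, SYN-ACK, ACK handshake inside connect().
    with socket.socket(family, socktype, proto) as raw_sock:
        raw_sock.settimeout(10)  # arbitrary timeout chosen for this sketch
        raw_sock.connect(address)

        # Steps 12-14: the TLS handshake. The default context verifies the
        # certificate chain against locally trusted CAs and checks that the
        # certificate matches the host name.
        context = ssl.create_default_context()
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            # Step 15: send the GET request and read until the server closes.
            tls_sock.sendall(build_get_request(host, path))
            chunks = []
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    return b"".join(chunks)
```

On a machine with internet access, parse_status_line(fetch("google.com")) should return the HTTP version, status code, and reason phrase of Google's reply. The exact status depends on the server, so none is shown here.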
This is not a perfect picture of everything that happens under the hood. You would get a different answer from every engineer you asked, and their answers would highlight their areas of expertise. I think about this question all of the time, and strive to learn more every day. The most amazing thing about this process is that it happens in milliseconds, billions of times every day, with reliability that we've learned to depend on.
