Hi, everyone! I'm very happy that you came here! I will do my best to make quality content and explain the main idea of this post. Let's do this!
I have read a lot of sources on this question: what happens when I try to open a page on the Internet? Many of them are very useful, but each one skipped some steps or did not describe them in enough depth, so I will try to collect all of this information in one post and walk you through it step by step.
Before we start, I want to say that I will not describe the physical side (like how the keyboard sends its signal), but will look at this process from the programmer's side.
1. Browser actions
1.1 When you start typing the URL (Uniform Resource Locator) of a website in the address bar, your web browser runs its autocomplete function and suggests sites based on what you have typed so far. The suggestions are built from your history or from the browser's general records (e.g. google.com, yandex.com, etc.).
1.2 When you press the "Enter" key, the browser takes the text from the address bar and checks which characters it contains. A URL can consist of the following characters:
If the host name contains other characters, the browser converts them with "Punycode", an encoding that represents these symbols in ASCII (American Standard Code for Information Interchange) format.
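To get a feeling for this conversion, here is a minimal sketch in Python. It relies on Python's built-in "idna" codec, which is only one possible implementation (browsers have their own), and the function name to_ascii_hostname is my own, not a browser API.

```python
def to_ascii_hostname(hostname):
    """Convert a host name with non-ASCII characters to its ASCII form.

    Uses Python's built-in "idna" codec, which applies Punycode to every
    label that contains non-ASCII characters and adds the "xn--" prefix.
    """
    return hostname.encode("idna").decode("ascii")


# A pure ASCII name is returned unchanged.
print(to_ascii_hostname("google.com"))      # google.com
# A name with "ü" is converted to its Punycode form.
print(to_ascii_hostname("bücher.example"))  # xn--bcher-kva.example
```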
1.3 At the next stage the browser checks whether the request goes to an external or an internal resource. An internal resource can be, for example, a local file with HTML (Hypertext Markup Language) text.
If this is an external request, the browser moves on to the next step.
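A very rough way to picture this check is to look at the URL scheme. The sketch below assumes that "internal" simply means the "file" scheme. Real browsers have more cases, and the function name is_local_resource is only for illustration.

```python
from urllib.parse import urlsplit


def is_local_resource(url):
    """Return True if the URL points to a local file (assumed: "file" scheme)."""
    return urlsplit(url).scheme == "file"


print(is_local_resource("file:///home/user/index.html"))  # True
print(is_local_resource("https://google.com/"))           # False
```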
1.4 The next step for the browser is checking whether the site requires a strict (secure-only) connection.
For this the browser uses HSTS (HTTP Strict Transport Security). It checks its list of secure sites and, if the site is found there, forces a secure connection to this resource (using HTTPS instead of HTTP).
You can find this list on the site of your browser :
You can also change the HSTS list in the settings of your browser :
Chrome : chrome://net-internals/#hsts
Firefox : about:preferences#advanced (certificate tab)
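The HSTS check from step 1.4 can be sketched like this. The HSTS_HOSTS set is a made-up stand-in for the browser's real list, and the sketch ignores explicit ports and subdomain rules, so treat it as an illustration of the idea and not as how a browser really does it.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the browser's HSTS list.
HSTS_HOSTS = {"example.com"}


def apply_hsts(url, hsts_hosts=HSTS_HOSTS):
    """Upgrade an http:// URL to https:// if its host is in the HSTS list."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in hsts_hosts:
        # Force the secure scheme, keep everything else as it was.
        return urlunsplit(parts._replace(scheme="https"))
    return url


print(apply_hsts("http://example.com/page"))  # https://example.com/page
print(apply_hsts("http://other.test/"))       # http://other.test/
```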
2. DNS lookup
DNS (Domain Name System) is a decentralized naming system. It associates various information with the domain names assigned to each of the participating entities.
2.1 If this is an external request, the browser needs to know the address of the host it will send the request to. The first step is checking the browser's internal cache (for example, in Chrome it is the "chrome://net-internals/#dns" page).
If there are no records for this domain in the cache, the "gethostbyname" function is called, which first checks a file on your machine for the address.
gethostbyname - a function which returns a value of type "hostent". Here name is either a hostname, or an IPv4 address in standard dot notation (as for inet_addr(3)), or an IPv6 address in colon (and possibly dot) notation.
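Python exposes a wrapper with the same name, so the call can be tried directly. This is just a small sketch. Note that the Python wrapper returns a plain IPv4 string instead of a "hostent" structure, and the helper name resolve_ipv4 is my own.

```python
import socket


def resolve_ipv4(name):
    """Resolve a host name to an IPv4 address string.

    If `name` is already an IPv4 address in dot notation,
    it is returned unchanged and no lookup is needed.
    """
    return socket.gethostbyname(name)


print(resolve_ipv4("127.0.0.1"))  # 127.0.0.1
```

Calling resolve_ipv4("google.com") would go through the whole chain described in the next steps, so the result depends on your machine and network.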
2.2 The first step of this function is checking the local configuration (at the OS level) for the needed address.
This file is stored in different places depending on the OS and platform. These are only a few of them :
- Linux : /etc/hosts
- Windows : %SystemRoot%\System32\drivers\etc\hosts
- Macintosh : /etc/hosts (a symbolic link to /private/etc/hosts)
You can find the full list on Wikipedia.
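The "hosts" file itself has a very simple format: an IP address followed by one or more names, with "#" starting a comment. Here is a minimal sketch of how such a file could be read. The function name parse_hosts and the sample entries are invented for the example.

```python
def parse_hosts(text):
    """Parse the text of a hosts file into a {hostname: ip_address} dict."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # The first entry for a name wins.
            table.setdefault(name.lower(), address)
    return table


# Invented sample content, not a real configuration.
sample = """
127.0.0.1   localhost
# office machines
192.0.2.7   intranet.test intranet   # file server
"""
print(parse_hosts(sample)["intranet"])  # 192.0.2.7
```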
2.3 If this function can't find the record in either the cache or the “hosts” file, it makes a request to the DNS server configured in the network stack. This is typically the local router or the ISP's (Internet Service Provider) caching DNS server.
2.4 If neither your router nor your provider has this domain, the request is forwarded to the "root servers".
A root name server is a name server for the root zone of the Domain Name System (DNS) of the Internet.
You can find the full list of these servers at : https://www.iana.org/domains/root/servers
However, the "root servers" will not give you the information about the needed host. They only refer you to a lower-level domain server, and this repeats until you reach the lowest domain server and receive the needed information.
For example, say you want to get the address of "google.com". The first request goes to the nearest root server (written as "."), asking for the IP address of the Google site, but in response you get the address of the ".com" DNS server. In the second step you ask the ".com" server the same question, and the answer is the address of the "google.com" DNS server. In the third step you ask the "google.com" server, and only at this step do you receive the needed address.
You may ask, why is such a long path needed? Why can't we make it easier? The main point is that you will not be sending requests about "google.com" to the root servers or other DNS servers all the time. After the first request, your provider caches the address of this domain, and for other people the provider answers from its own cache. That's why this process is faster than you think.
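The walk from the root server down to the "google.com" server, plus the caching, can be modelled with a toy example. Everything in it is invented for illustration: the server names, the SERVERS table and the address 192.0.2.10 (a documentation-only placeholder, not Google's real address). No real DNS traffic is sent.

```python
# Toy model: each server either refers us to the next server or knows the answer.
SERVERS = {
    "root": {"referral": {"com": "tld-com"}},
    "tld-com": {"referral": {"google.com": "ns-google"}},
    "ns-google": {"answer": {"google.com": "192.0.2.10"}},  # placeholder address
}


def resolve(name, cache):
    """Return (address, list_of_servers_asked) for `name`."""
    if name in cache:
        return cache[name], ["cache"]
    server = "root"
    asked = []
    while True:
        asked.append(server)
        data = SERVERS[server]
        if name in data.get("answer", {}):
            cache[name] = data["answer"][name]  # remember for next time
            return cache[name], asked
        for zone, next_server in data.get("referral", {}).items():
            if name == zone or name.endswith("." + zone):
                server = next_server  # follow the referral one level down
                break
        else:
            raise LookupError("no server knows " + name)


cache = {}
print(resolve("google.com", cache))  # ('192.0.2.10', ['root', 'tld-com', 'ns-google'])
print(resolve("google.com", cache))  # ('192.0.2.10', ['cache'])
```

The second call never leaves the cache, which is the reason the long path is usually walked only once.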
3. ARP process
After getting the needed address, to build a route to the destination host and form the data packets, you need to know the physical address of the host. ARP (Address Resolution Protocol) helps with this.
3.1 Your host makes a request to the gateway (in most cases a router or switch) for the MAC address.
3.2 Your device checks its MAC address table. It looks like this :
IP address = MAC address
address1 = MAC1
address2 = MAC2
3.3 If there is no record for the requested host, the device sends a broadcast request to all devices in the local network, including the gateway. The request has the following format :
Sender hardware address - 08:00:5A:21:A7:22
Sender protocol address - 192.168.0.1
Target hardware address - (not known yet)
Target protocol address - 126.96.36.199
3.4 When the destination host receives this request and sees that the target protocol address matches its own address, it builds an ARP response packet :
Sender hardware address - 03:01:1A:2B:A7:21
Sender protocol address - 188.8.131.52
Target hardware address - 08:00:5A:21:A7:22
Target protocol address - 192.168.0.1
3.5 You receive the needed physical address and, with it, the route to this host.
ARP itself works below the transport layer. All the following steps go over the transport protocol, using a TCP connection.
4. Transport of data
4.1 Next, a socket is opened to the destination host to transport the data. The types of sockets and how they work are described in my previous post, so if you need it, have a look at the link
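Opening such a socket takes only a few lines in Python. This is a minimal sketch: the function name open_tcp_connection and the 5 second timeout are my own choices, and the host and port in the comment are just an example.

```python
import socket


def open_tcp_connection(host, port, timeout=5.0):
    """Open a TCP socket to (host, port) and return it.

    socket.create_connection resolves the host name and
    performs the TCP connect for us.
    """
    return socket.create_connection((host, port), timeout=timeout)


# Example use (needs network access):
#   conn = open_tcp_connection("google.com", 80)
#   conn.close()
```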
4.2 If the destination host accepts only secure connections, one of the steps will be the TLS handshake.
4.2.1 Your host sends a "ClientHello" message to the destination with its TLS version.
4.2.2 The server replies with a "ServerHello" message to the client with the TLS version, selected cipher, selected compression methods and the server's public certificate signed by a CA (Certificate Authority). The certificate contains a public key that will be used by the client to encrypt the rest of the handshake until a symmetric key can be agreed upon.
4.2.3 The client verifies the server's digital certificate against its list of trusted CAs. If trust can be established based on the CA, the client generates a string of pseudo-random bytes and encrypts this with the server's public key. These random bytes can be used to determine the symmetric key.
4.2.4 The server decrypts the random bytes using its private key and uses these bytes to generate its own copy of the symmetric master key.
4.2.5 The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key.
4.2.6 The server generates its own hash and then decrypts the client-sent hash to verify that it matches. If it does, it sends its own Finished message to the client, also encrypted with the symmetric key.
4.2.7 From now on the TLS session transmits the application (HTTP) data encrypted with the agreed symmetric key.
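In practice you rarely write these handshake steps by hand, because a TLS library does them for you. Here is a minimal sketch with Python's standard ssl module. The function names are my own, and the exact messages exchanged inside wrap_socket depend on the TLS version the two sides agree on, so they may differ from the steps listed above.

```python
import socket
import ssl


def make_tls_context():
    """Create a client context that verifies the server certificate
    against the trusted CAs and checks the host name."""
    return ssl.create_default_context()


def open_tls_connection(host, port=443, timeout=5.0):
    """Open a TCP connection and run the TLS handshake on top of it."""
    context = make_tls_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # The whole handshake happens inside wrap_socket.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


# Example use (needs network access):
#   tls = open_tls_connection("google.com")
#   print(tls.version())   # the negotiated TLS version
#   tls.close()
```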
The information about the TLS handshake was taken from a post by user "Alex" on GitHub.
5. Last steps
5.1 If this is a simple request for a web page, the source host sends an HTTP response, and the browser shows it to you. You can find the full list of HTTP status codes on Wikipedia.
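To show what the browser receives, here is a small sketch that splits a raw HTTP/1.1 response into status code, reason, headers and body. The function names and the sample response are invented for the example, and the parser is deliberately simplified (for instance, repeated headers overwrite each other).

```python
from http import HTTPStatus


def parse_response(raw):
    """Split raw HTTP/1.1 response bytes into (code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)          # e.g. "HTTP/1.1 200 OK"
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers, body


def describe_status(code):
    """Return the standard phrase for a status code, e.g. 404 -> "Not Found"."""
    return HTTPStatus(code).phrase


sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_response(sample)[0])  # 200
print(describe_status(404))       # Not Found
```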
5.2 After receiving the HTML text and before rendering it, the browser tries to fetch all its dependencies (like CSS, JS or images).
5.3 The browser parses all the received data and, based on the HTML tags and CSS styles, builds the interface that is shown to you.
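Finding the dependencies of a page can be sketched with Python's built-in HTML parser. The class and function names are my own, and the sketch only looks at stylesheets, scripts and images. A real browser handles many more cases.

```python
from html.parser import HTMLParser


class DependencyCollector(HTMLParser):
    """Collect URLs of stylesheets, scripts and images from HTML text."""

    def __init__(self):
        super().__init__()
        self.dependencies = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            url = attributes.get("href")
        elif tag in ("script", "img"):
            url = attributes.get("src")
        else:
            url = None
        if url:
            self.dependencies.append((tag, url))


def find_dependencies(html_text):
    """Return a list of (tag, url) pairs in document order."""
    collector = DependencyCollector()
    collector.feed(html_text)
    collector.close()
    return collector.dependencies


page = (
    '<html><head><link rel="stylesheet" href="style.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="logo.png"><p>Hi</p></body></html>'
)
print(find_dependencies(page))
# [('link', 'style.css'), ('script', 'app.js'), ('img', 'logo.png')]
```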
And again, thank you all! I hope I lifted the dark curtain and now you can imagine how your computer gets your favorite sites :)
I'll be in touch!