Under the hood of Browsers
The web browser is arguably the most common portal for users to access the web. Advances in web browsers have led many traditional “thick clients” to be replaced by browser-based applications, which has increased the browser's usability and ubiquity.
A web browser is an application that provides access to web servers: it sends a network request to a URL, obtains the resources, and presents them in an interactive way. Common browsers include Internet Explorer, Firefox, Google Chrome, Safari, and Opera.
Functionality of Web Browser
The browser presents the web resource you choose in the browser window and then handles user interaction with it. In short, its work consists of fetching, processing, displaying, and storing resources.
Structure of Web Browser
- User Interface
- Browser Engine
- Rendering Engine
- UI Back End
- Networking (HTTP at minimum; also FTP, SMTP, etc.)
User Interface
The user interface is the space where interaction between the user and the browser (the application) takes place, through the controls the browser presents. No specific standard dictates how web browsers should look and feel. The HTML5 specification doesn’t define UI elements but lists some common ones: location bar, personal bar, scrollbars, status bar, and toolbar.
Browser Engine
The browser engine provides a high-level interface between the UI and the underlying rendering engine. It queries and manipulates the rendering engine in response to user interaction. It provides methods to start loading a URL and takes care of the reload, back, and forward navigation actions.
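The back, forward, and reload actions can be pictured as operations on a list of visited URLs plus a cursor. The sketch below is only an illustration of that idea: the `NavigationHistory` class and its method names are hypothetical and do not correspond to any real browser's API.

```python
class NavigationHistory:
    """Hypothetical back/forward history: a list of URLs plus a cursor."""

    def __init__(self):
        self.entries = []
        self.index = -1  # -1 means nothing has been loaded yet

    def load(self, url):
        # Loading a new URL discards any "forward" entries past the cursor.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index += 1
        return url

    def back(self):
        # Move the cursor one step back, unless already at the first entry.
        if self.index > 0:
            self.index -= 1
        return self.current()

    def forward(self):
        # Move the cursor one step forward, unless already at the last entry.
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.current()

    def reload(self):
        # Reloading keeps the cursor where it is.
        return self.current()

    def current(self):
        return self.entries[self.index] if self.index >= 0 else None
```

In this model, visiting a new page after going back drops the forward entries, which matches the behavior most users expect from the back and forward buttons.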
Rendering Engine
The rendering engine is responsible for displaying the content of the web page on the screen. Its primary operation is to parse HTML. By default, the rendering engine can display HTML, XML, and images, and it supports other data types via plugins or extensions.
Rendering Engine flow
Modern browsers use different rendering engines:
Gecko: Firefox
WebKit: Safari
Blink: Chrome, Opera (version 15 onwards)
Web content is displayed through a series of steps:
HTML Data to DOM
The rendering engine receives the requested content from the networking layer, generally in 8 KB chunks. The raw bytes of the HTML file are first converted into characters, based on the character encoding. The characters are then converted into tokens: the lexer carries out lexical analysis, breaking the input into tokens. During tokenization, every start and end tag in the file is accounted for. The lexer knows how to strip out irrelevant characters such as white space and line breaks.
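The tokenization step can be sketched in a few lines. This is a deliberately simplified illustration, not how a real engine works: production tokenizers are state machines, while the hypothetical `tokenize` function below uses a regular expression, ignores attributes, and skips comments.

```python
import re

# One alternative per kind of input: comment, tag, or text up to the next "<".
TOKEN_RE = re.compile(r"<!--.*?-->|</?[a-zA-Z][^>]*>|[^<]+", re.DOTALL)


def tokenize(html):
    """Break an HTML string into (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(html):
        raw = match.group(0)
        if raw.startswith("<!--"):
            continue  # comments produce no token in this sketch
        if raw.startswith("</"):
            tokens.append(("EndTag", raw[2:-1].strip().lower()))
        elif raw.startswith("<"):
            # Keep only the tag name; attributes are dropped for brevity.
            name = raw[1:-1].strip().rstrip("/").split()[0].lower()
            tokens.append(("StartTag", name))
        else:
            # Collapse runs of white space and line breaks in text.
            text = " ".join(raw.split())
            if text:
                tokens.append(("Text", text))
    return tokens
```

For example, `tokenize("<p>Hello <b>web</b>\n</p>")` yields a start tag for `p`, the text `Hello`, a start tag for `b`, the text `web`, and then the two end tags; the lone line break produces no token.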
The parser then carries out syntax analysis, applying the syntax rules of the language to the document structure in order to construct the parse tree. The parsing process is iterative. The parser asks the lexer for a new token and, if the token matches a syntax rule, adds it to the parse tree. The parser then asks for another token. If no rule matches, the parser stores the token internally and keeps asking for tokens until it finds a rule that matches all of the internally stored tokens. If no such rule is found, the parser raises an exception, which means the document was not valid and contained syntax errors. This describes a conventional parser; in practice, HTML parsers are error-tolerant and recover from invalid markup rather than rejecting the document.
The tokens are converted into nodes, and these nodes are linked in a tree data structure called the DOM (Document Object Model), which establishes the parent-child and adjacent-sibling relationships between them.
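To make the tree-building step concrete, here is a minimal sketch that uses the tokenizer in Python's standard `html.parser` module and links the resulting nodes with a stack of open elements. The `Node`, `TreeBuilder`, and `build_dom` names and the short `VOID_TAGS` set are assumptions made for this example; a real engine follows a far more detailed tree-construction algorithm.

```python
from html.parser import HTMLParser

# Elements that never have children or an end tag (partial list, assumed here).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, name, parent=None, text=""):
        self.name = name
        self.text = text
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # establishes the parent-child link


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.open_elements = [self.root]  # stack of elements not yet closed

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.open_elements[-1])
        if tag not in VOID_TAGS:
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].name == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        if data.strip():
            Node("#text", parent=self.open_elements[-1], text=data.strip())


def build_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Siblings are simply neighboring entries in a parent's `children` list, so both relationships mentioned above fall out of the same structure.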
CSS Data to CSSOM
The raw bytes of CSS data are converted into characters, then tokens, then nodes, and finally into the CSSOM (CSS Object Model). CSS has a mechanism called the cascade, which determines which styles are applied to an element. The style data for an element can come from its parents (via inheritance) or be set on the element itself. The browser has to go recursively through the CSS tree structure to determine the style of a particular element.
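The interplay of inheritance and the cascade can be sketched as follows. This is a heavily simplified model under stated assumptions: selectors are limited to a single tag, class, or id; only a small assumed set of properties inherits; and the `Element`, `specificity`, `matches`, and `computed_style` names are invented for the example.

```python
# Properties that pass from parent to child in this sketch (assumed subset).
INHERITED = {"color", "font-family", "font-size"}


class Element:
    def __init__(self, tag, elem_id=None, classes=(), parent=None):
        self.tag = tag
        self.elem_id = elem_id
        self.classes = set(classes)
        self.parent = parent


def specificity(selector):
    # An id selector beats a class selector, which beats a tag selector.
    if selector.startswith("#"):
        return (1, 0, 0)
    if selector.startswith("."):
        return (0, 1, 0)
    return (0, 0, 1)


def matches(selector, element):
    if selector.startswith("#"):
        return element.elem_id == selector[1:]
    if selector.startswith("."):
        return selector[1:] in element.classes
    return element.tag == selector


def computed_style(element, rules):
    """rules is a list of (selector, declarations) pairs in source order."""
    style = {}
    if element.parent is not None:
        # Recurse upward: start from the inheritable part of the parent's style.
        parent_style = computed_style(element.parent, rules)
        style = {p: v for p, v in parent_style.items() if p in INHERITED}
    matching = [
        (specificity(selector), order, declarations)
        for order, (selector, declarations) in enumerate(rules)
        if matches(selector, element)
    ]
    # Apply weakest first, so higher specificity and later rules win.
    for _, _, declarations in sorted(matching, key=lambda m: m[:2]):
        style.update(declarations)
    return style
```

With a rule list such as `[("p", {"color": "black"}), ("#lead", {"color": "blue"})]`, a `p` element with id `lead` ends up blue, and a child `span` with no rules of its own inherits that color.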
Combination of DOM and CSSOM to Render Tree
The DOM tree contains information about the relationships between HTML elements, and the CSSOM tree contains information on how these elements are styled. Starting from the root node, the browser traverses each of the visible nodes. Some nodes are hidden via CSS (for example, with display: none) and are not reflected in the rendered output. For each visible node, the browser finds the matching rules defined in the CSSOM, and finally emits the visible nodes together with their content and styling. The result is called the render tree.
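A rough sketch of this traversal is shown below. It assumes each DOM node already carries its resolved styles in a `style` dictionary, which stands in for the CSSOM lookup; the `DomNode`, `RenderNode`, and `build_render_tree` names and the `NON_VISUAL` set are hypothetical.

```python
# Elements that never produce visual output (assumed partial list).
NON_VISUAL = {"head", "script", "meta", "link", "style", "title"}


class DomNode:
    def __init__(self, name, style=None, children=()):
        self.name = name
        self.style = style or {}  # stands in for styles resolved from the CSSOM
        self.children = list(children)


class RenderNode:
    def __init__(self, name, style):
        self.name = name
        self.style = style
        self.children = []


def build_render_tree(dom_node):
    """Return the render node for dom_node, or None if it is not rendered."""
    if dom_node.name in NON_VISUAL or dom_node.style.get("display") == "none":
        return None  # the node and its whole subtree are left out
    render_node = RenderNode(dom_node.name, dom_node.style)
    for child in dom_node.children:
        rendered = build_render_tree(child)
        if rendered is not None:
            render_node.children.append(rendered)
    return render_node
```

Note that an element with `visibility: hidden` stays in the render tree in this sketch, since it still occupies space on the page, whereas `display: none` removes the element and its descendants entirely.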
Layout
Once the render tree is built, the browser proceeds to the next stage, called layout. The exact size and position of each node must be calculated before it can be rendered on the page (the browser viewport). This process is also referred to as reflow. HTML uses a flow-based layout model, meaning that most of the time the geometry can be computed in a single pass. Layout is a recursive process that starts from the root element (<html>) of the document.
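The recursive, single-pass nature of flow layout can be illustrated with a toy model in which every box is a block that fills its parent's width and is stacked below its previous sibling. The `Box` class, the `layout` function, and the fixed `content_height` of leaf boxes are assumptions for the example; real layout also handles inline content, margins, floats, and much more.

```python
class Box:
    def __init__(self, name, content_height=0, children=()):
        self.name = name
        self.content_height = content_height  # assumed intrinsic height of a leaf
        self.children = list(children)
        self.x = self.y = self.width = self.height = 0


def layout(box, x, y, width):
    """Place box at (x, y) with the given width and return its height."""
    box.x, box.y, box.width = x, y, width
    cursor_y = y
    for child in box.children:
        # Each child starts where the previous sibling ended.
        cursor_y += layout(child, x, cursor_y, width)
    # A container is as tall as its stacked children; a leaf uses its own height.
    box.height = (cursor_y - y) if box.children else box.content_height
    return box.height
```

Calling `layout(root, 0, 0, 800)` on the root box positions every descendant in one top-down traversal: widths flow down from parent to child, and heights flow back up from child to parent.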
Painting
In the painting stage, each of the renderers is traversed and its paint method is called to display its content on the screen. Painting can be global (the entire tree is painted) or incremental. In incremental painting, a renderer invalidates its rectangle on the screen, the OS generates a paint event for that region only, and the rest of the tree is not affected. Painting is also a gradual process: some parts of the content are parsed and rendered while the rest is still arriving from the network.
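JavaScript Engine
Different browsers use different JavaScript engines:
Chrome: V8 (Node.js is built on top of it)
Firefox: SpiderMonkey
Safari: JavaScriptCore (an earlier version was code-named ‘SquirrelFish’)
Microsoft Edge: Chakra (legacy Edge; the Chromium-based Edge uses V8)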
UI Back End
The UI back end is used for drawing basic widgets such as combo boxes and windows. It exposes a generic interface that is not platform-specific; underneath, it uses the operating system's user interface methods.
Data Storage
The data storage layer is a persistence layer that lets the browser store data locally (cookies, session storage, IndexedDB, Web SQL, bookmarks, preferences, etc.). The HTML5 specification describes a complete database within the web browser.
Networking
The networking layer handles all network communication within the browser. It uses a set of communication protocols such as HTTP, HTTPS, and FTP to fetch resources from the requested URLs.
The web browser relies on DNS to resolve the host name in a URL to an IP address. DNS records may be cached by the browser, the OS, the router, or the ISP. If the requested host name is not cached at any of these levels, the ISP's DNS server initiates a DNS query to find the IP address of the server. After receiving the IP address, the browser establishes a TCP connection with the server. The browser sends a SYN (synchronize) packet to the server, asking whether it is open for a TCP connection. The server acknowledges the SYN packet by responding with a SYN/ACK packet.
The browser receives the SYN/ACK packet from the server and acknowledges it by sending an ACK packet. This completes the three-way handshake, and the TCP connection is established and ready for data transfer. To transfer data, the communication must follow the rules of the HTTP protocol for connections, messages, requests, and responses.
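The sequence above (name resolution, TCP connection, then an HTTP exchange) can be sketched with Python's standard library. The helper names `build_get_request`, `parse_response`, and `fetch` are invented for this example, and the sketch assumes plain HTTP/1.1 on port 80 with no TLS, redirects, or chunked transfer encoding. The SYN, SYN/ACK, and ACK packets never appear in the code, because the operating system performs the handshake inside the connect call.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Serialize a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",  # ask the server to close when the response is done
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split raw response bytes into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, *_ = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), headers, body


def fetch(url, timeout=5.0):
    """Resolve the host, connect over TCP, and perform one GET request."""
    parts = urlsplit(url)
    address = (parts.hostname, parts.port or 80)
    # create_connection resolves the host name and completes the TCP handshake.
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(build_get_request(url))
        chunks = []
        while chunk := conn.recv(8192):
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

A call such as `fetch("http://example.com/")` would return the status code, a dictionary of response headers, and the body bytes, provided the host is reachable over plain HTTP.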
Comparison of Browsers
There are many different web browsers on the market today. Although their primary purpose is the same, they differ from each other in more than one respect. The distinguishing areas are supported platforms (Linux, Windows, Mac, BSD, and other Unix systems), supported protocols, graphical user interface (GUI), HTML5 support, and licensing (open source or proprietary).