Many protocols are now being used in a way, not at all envisioned at the time of their creation. HTTP is no exception to this. Since the amount of transferred data is getting higher each year, the HTTP protocol is regularly adopting its functioning to enhance the speed of data transfer over the wire.
In this article, I dive into one of the key features, based on which the HTTP/2 protocol significantly reduces the amount of transferred data from one entity to another. This feature is the header compression format, called HPACK. The older versions of HTTP protocol support data compression, but the HPACK introduces a whole new level of compression.
HPACK introduces a completely new approach to header packaging and management. Websites today require dozens or hundreds of requests and the redundant header fields in these requests consume bandwidth unnecessarily. Therefore, HPACK is a compressor, which's main function is to eliminate redundant header fields.
HPACK specification is rather short, but as it goes for other HTTP/2 related specifications, this one is also often unclear and ambiguous, creating numerous issues and uncertainty for implementers. It is also written with an experienced developer in mind and focuses primarily on the decoder functioning and assumes that the implementor will be knowledgeable enough to add all details he sees are needed for the working product.
On top of that, a significant shift in thinking is required from the implementer of the HTTP/2 protocol. It’s not only a single request/response session that a connection in HTTP/2 represents. We can start multiple simultaneous streams in one connection, representing multiple request/response sessions, which was not possible in the previous versions of the HTTP protocol. The HPACK compressor uses this characteristic of HTTP/2 by indexing headers considering the whole connection and not per stream, which might seem somewhat unusual. Since I was well acquainted with the HTTP protocol, I somehow missed this particular information during my first read, that’s why I’m specifically addressing it here. So please take a moment to let this last paragraph sink in before you continue reading.
Specification quickly goes into a very technical level of the HPACK decoding so it gives the impression that HPACK is very complex and contains a lot of “unnecessary” rules. While in fact, the implementation of HPACK contains three main parts of the process:
Indexing table is a list, to which the HPACK saves the commonly used headers. Each entity indexes headers per connection, separately for incoming (decoding) and for outgoing (encoding) data.
The numbering of entries starts with index
1 and the first 61 headers are static items, keeping their position in the table. These are specific headers, provided by the HPACK specification, based on their statistical relevance, and therefore deserve their permanent position in the table.
Other headers are listed in the table from position 62 onwards and are called dynamic headers. Header entries are ordered as FIFO (first-in, first-out) and duplicated entries are allowed. Dynamic headers are always inserted at index 62, which shifts all indexes of the existing custom headers one step lower. For the dynamic part of the table, we need to set a limit of how many bytes of the dynamic headers the table is allowed to store. When, while adding a header, this limit is crossed, the headers are evicted from the back of the table, so the table never exceeds the limit.
This specific functioning is addressed in the HPACk specification as two separate tables, to which it refers as the static and the dynamic table. However, we are dealing with a single list, where two tables are combined into a single address space for defining index values.
The illustration below shows the structure of the indexing table.
<---------- Index Address Space ---------> < Static Table >< Dynamic Table > +--+-------------+--++--+-------------+--+ |01| ... |61||62| ... |XX| +--+-------------+--++II+-------------+DD+ II = Insertion point DD = Dropping point
Let's see how such a table is used by entities. When a client sends the request, it can indicate in the header block that a particular header and potentially also its value, should be indexed. The table for outgoing headers on the client's side would thus look something like this:
On the server’s side, when it reads the headers it would create a table that would look exactly the same. If the next client request would send the same headers, it could simply send a header block including only header indexes:
The server will then look up and expand into the full headers what those indexes represent. This essentially explains the whole concept. The mechanism is innovative and highly efficient. I guess no added discussion on its effects on the performance is necessary since there are plenty of benchmarks, proving its efficacy available online.
The encoder performs the task of data compression. It converts the data from its original readable form into an optimized byte sequence by applying the rules defined in the HPACK specification. As explained earlier, the specification is interspersed with rules, and is best to take notes and map these rules out to understand how the compressor performs each task. This way we can understand what needs to be implemented and, most importantly, how to start the implementation process itself.
The HPACK encoding has specific rules for representing integer and string primitive types. Usually, the implementer will start with this part, since all other encoding rules are based on these primitive type representations.
Integer representation defines the rules for encoding integer numbers. Integers are used to represent name indexes, header field indexes, or character string lengths.
String literal representation defines the rules for encoding string literals. With these, we encode the header name and value literals. The content of these rules can be written in plain text format or encoded with the Huffman algorithm.
With these basic rules, HPACK defines the binary formats for the representation of the actual headers.
Indexed header field representation represents fully indexed headers. These are the headers that are stored in the indexing table under specific index numbers. Since both the header name and value are stored in the indexing table, only this index number is encoded. Such headers are really minimal and therefore optimal in terms of performance.
Literal header field representation defines headers that are not or only partially indexed. If the header field name matches the header field name of an entry stored in the static or dynamic table, the header field name can be displayed using the index of this entry. Otherwise, the header field name is displayed as a string literal. Header values are always displayed as a string literal. Such headers can be marked as "index", "do not index" or "never index". The latter tells us that the data is sensitive and that the entity should handle it with some restrictions (e.g.: protect it with a password).
HPACK is designed as a single-standing mechanism that can also be used outside the HTTP/2 protocol. For this reason, the specification provides a rule for signaling changes related to the allowed size of the dynamic table.
- Dynamic table size update defines the rule for signaling changes in the size of the dynamic table. Such a change is signaled by the encoder, while the limit must be less than or equal to the limit determined by the protocol using HPACK. In HTTP/2 this limit is the last value of the SETTINGS_HEADER_TABLE_SIZE received by the decoder and acknowledged by the encoder. Encoder and decoder use the HTTP/2 protocol to communicate the change in table size and if the change is accepted at both ends, the encoder applies the change and reports it to the decoder using the HPACK mechanism.
These five rules, with some additional conditional rules as described in the HPACK specification, define the HPACK encoder.
The decoder takes over the task of the decompressor, i.e. it executes the commands inversely to the encoder. It converts the data back into its readable form, ensuring that the indexing table is identical to that on the encoder side.
The decoder is usually the one that determines how many resources can be used for the purposes of HPACK compression, among others. The decoder signals this to the encoder in the HTTP/2 protocol with the parameter
SETTINGS_HEADER_TABLE_SIZE using the
SETTINGS frame. This change is made when the settings are confirmed by both sides in a way that the HTTP/2 protocol requires. In fact, the encoder is the one that actually requires a change in the size of the dynamic table to meet the requirements of the values agreed upon via the HTTP/2 protocol.
Experiments show that HPACK works very well, especially on pages with large, repetitive headers (e.g. cookies). Since most of the headers sent from entity to entity for a given website are duplicated, HPACK's table lookup mechanisms effectively eliminate these duplicate bytes from communication.
In order to get an easy to read and understand the text, I intentionally kept back with some details described in the HPACK specifications. However, I strongly recommend that you also read the official HPACK documentation. Hopefully, this will be much easier to understand thanks to this blog. I have also written a complete HPACK implementation for HTTP/2 in Rust. It is publicly available on GitHub as an open-source project. The source code is well documented and therefore a good additional source of information for all who want to know all details and tricks of HPACK.