HTTP/2 Server Push Diary

#webperf #showdev #webdev

TL;DR: Commons Host now implements an HTTP/2 Server Push Diary to solve the over-push problem.

The server push diary is a Cuckoo Filter that tracks, per connection, which assets the server has already pushed. Before each push the server checks the diary and skips any resource it finds there, avoiding redundant data transfer.

Here is an example website with two pages sharing the same image and stylesheet dependencies.

A user (👨🏻‍💻) first visits one page and then another. The server (🤖) uses a Cuckoo Filter (🐦) as a server push diary to prevent over-push.
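The two-page visit can be sketched in a few lines of Python. This is only an illustration of the idea, not Commons Host's actual code: the `PUSH_MANIFEST` paths and the `Connection` class are hypothetical, and a plain `set` stands in for the Cuckoo Filter.

```python
# Hypothetical manifests: both pages depend on the same stylesheet and image.
PUSH_MANIFEST = {
    "/index.html": ["/style.css", "/logo.png"],
    "/about.html": ["/style.css", "/logo.png"],
}


class Connection:
    """One HTTP/2 connection with its own push diary."""

    def __init__(self):
        # Assumption: a plain set stands in for the Cuckoo Filter here.
        self.diary = set()

    def handle_request(self, path):
        """Return the resources the server would push for this request."""
        pushed = []
        for dependency in PUSH_MANIFEST.get(path, []):
            if dependency in self.diary:
                continue  # already pushed on this connection, so skip it
            self.diary.add(dependency)
            pushed.append(dependency)
        return pushed


connection = Connection()
first_visit = connection.handle_request("/index.html")
second_visit = connection.handle_request("/about.html")
print(first_visit)   # ['/style.css', '/logo.png']
print(second_visit)  # []
```

The second page triggers no pushes because both dependencies are already in the diary. A fresh connection starts with an empty diary and would push them again.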

Cuckoo Filters

The diary uses a Cuckoo Filter: an extremely space-efficient, high-performance data structure that makes it practical to track thousands of individually pushed resources, say an entire node_modules folder or a set of database records.

The diary is a probabilistic data structure. Data stored in it cannot be retrieved in its original form. Instead, the diary can answer whether the same data was previously stored. This is a convenient test when the cost of a repeated operation (e.g. a network transfer) far exceeds the cost of the filter (i.e. a tiny amount of RAM and CPU).

The server can tune the probability of false positives, where the diary wrongly reports a resource as already pushed. The optimal values are a matter of speculation, so I'd like to see how far people decide to push (pun so intended) this feature. Currently the diary is set to a size of ~1000 entries at ~12 bits per record. This allows for hundreds of pushed resources with very few false positives.
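To get a feel for those numbers, here is a rough back-of-the-envelope calculation. A Cuckoo Filter errs with false positives: a resource is wrongly treated as already pushed, so the browser simply requests it the normal way. The sketch rests on two assumptions that the figures above do not confirm: that "12 bits per record" is the fingerprint length, and that each bucket holds 4 fingerprints. The bound itself, 1 - (1 - 2^-f)^(2b), is the one given in the Cuckoo Filter paper.

```python
def diary_memory_bytes(entries, bits_per_entry):
    """Raw fingerprint storage, ignoring empty slots and bookkeeping overhead."""
    return entries * bits_per_entry / 8


def false_positive_bound(fingerprint_bits, bucket_size=4):
    """Upper bound on the false positive rate: 1 - (1 - 2^-f)^(2b).

    A lookup compares against up to 2 * bucket_size stored fingerprints,
    each matching by chance with probability 2^-fingerprint_bits.
    bucket_size=4 is an assumption, not a documented Commons Host setting.
    """
    return 1 - (1 - 2 ** -fingerprint_bits) ** (2 * bucket_size)


memory = diary_memory_bytes(entries=1000, bits_per_entry=12)
rate = false_positive_bound(fingerprint_bits=12)
print(f"{memory:.0f} bytes")  # 1500 bytes
print(f"{rate:.2%}")          # roughly 0.2%
```

Under these assumptions a diary costs about 1.5 KB per connection, and roughly one lookup in five hundred would wrongly skip a push.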

If this concept sounds familiar, you may have heard of Bloom Filters. The Cuckoo Filter offers efficiency improvements and, most importantly, allows removal of items. This is useful in the web context, where cached items expire and become stale. A high-performance implementation of the 2014 Cuckoo Filter research paper exists and has been ported to Node.js by Matteo Collina as cuckoofilter-native.
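To make the data structure concrete, here is a minimal Cuckoo Filter sketch in Python with insert, lookup and removal. It is a teaching toy under stated assumptions (SHA-256 for hashing, 256 buckets of 4 slots, 12-bit fingerprints); it is not the cuckoofilter-native API, and the class and method names are made up for this example.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter: stores short fingerprints, never the items themselves."""

    def __init__(self, num_buckets=256, bucket_size=4, fingerprint_bits=12, max_kicks=500):
        # A power-of-two bucket count keeps the XOR trick below reversible.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.fingerprint_bits = fingerprint_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _hash(self, data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return self._hash(b"fp:" + item.encode()) % (1 << self.fingerprint_bits)

    def _index(self, item):
        return self._hash(b"ix:" + item.encode()) % self.num_buckets

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: the other candidate bucket depends only
        # on the current bucket and the fingerprint, not on the original item.
        return (index ^ self._hash(fp.to_bytes(4, "big"))) % self.num_buckets

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; this toy drops the last evicted fingerprint

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


diary = CuckooFilter()
diary.insert("/style.css")
diary.insert("/logo.png")
print(diary.contains("/style.css"))  # True
diary.delete("/style.css")           # e.g. the cached copy expired
print(diary.contains("/logo.png"))   # True
```

Removal works because each stored fingerprint can be located and dropped from one of its two candidate buckets. A classic Bloom Filter cannot do this, since its bits are shared between items. Only delete items that were actually inserted, otherwise a colliding fingerprint belonging to another item may be removed.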

What About Cache Digests?

The Cache Digest HTTP/2 extension specification appears to be on hold, as far as I can tell.

Commons Host has always supported Cache Digests using the Cache-Digest header or cookie. Browsers could send a Bloom or Cuckoo Filter representing their cache to the server. The server used this as a diary to avoid over-pushing. Sadly, browser developers have yet to implement native support. Experimental implementations using Service Workers and the Cache API are technically viable but come with a fair set of developer considerations that have so far not proven popular.

Hopefully diaries, being automatically enabled and requiring zero developer effort, can help prove the merit of Server Push and Cache Digests. I believe they are elegant ideas, with solvable problems. We may yet see their success.