This post is a mirror of a post I wrote on my own blog. Feel free to check it out here! I release articles on my website 2 weeks earlier than here.
--
Let's imagine one day you've been poking around the network usage section of your phone--trying to see what apps are killing your allotted 10GB of mobile data.
You scroll down and notice the usual suspects: YouTube, TikTok, whatnot. Then, out of the blue, you start to see a bunch of applications that seem out of place: newspaper apps, stock apps, even some banking apps! These apps can sometimes use more bandwidth than you think.
How could that be? It turns out that many applications, from the New York Times to Robinhood, will re-poll for the latest information anywhere from every few minutes to every second. These constant GET requests, while small, can add up.
In this article, I'll explain a method many of these apps (hopefully) use to reduce the amount of bandwidth they take up: Conditional GETs. Conditional GETs can help prevent your apps from getting the same 20 KB response every time you ping your server.
The gist
Conditional GETs are used in asset caching to prevent a browser from receiving the same JavaScript/image/CSS payload if the browser has already cached the latest copy. We should try to use conditional GETs in any request to the server when we poll for cacheable content.
Let's look at a typical flow for the conditional request:
- The browser requests some content from a website.
- The server returns the content with one or both of these headers:
  - Last-Modified: some-date - The time (usually a timestamp) that this content was last modified
  - ETag: some-generated-value - A unique id referencing a resource at a particular state in time. An ETag could be a hash of the content, an id assigned whenever the content is updated, or a unique string representing the content
- The browser requests the same content at a later time; the browser can pass some conditional request headers:
  - If-Modified-Since: some-date - The last timestamp saved on the browser
  - If-None-Match: some-generated-value - The previous ETag saved on the browser
- The server will check whether either of those values satisfies these conditions:
  - If the content is the same, the server will return a 304 status
  - If the content is different, the server will return new data with a new Last-Modified and/or ETag.
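Browsers do this bookkeeping for you through the HTTP cache, but an app that polls an API by hand has to do it itself. Here is a minimal sketch of the client side of the flow above. The function names (buildConditionalHeaders, applyResponse) and the shape of the cache entry are hypothetical, made up for illustration.

```javascript
// Hypothetical cache entry shape: { etag, lastModified, body }

// Build the conditional request headers from whatever we saved last time.
function buildConditionalHeaders(cached) {
  const headers = {};
  if (!cached) return headers; // first request: nothing to validate against
  if (cached.etag) headers['If-None-Match'] = cached.etag;
  if (cached.lastModified) headers['If-Modified-Since'] = cached.lastModified;
  return headers;
}

// Decide what to keep after the server answers.
// `response` is assumed to look like { status, headers, body }
// with lowercase header names.
function applyResponse(cached, response) {
  if (response.status === 304) {
    return cached; // nothing changed: reuse the copy we already have
  }
  return {
    etag: response.headers['etag'] || null,
    lastModified: response.headers['last-modified'] || null,
    body: response.body,
  };
}
```

On every poll, you would send the headers from buildConditionalHeaders with the request and pass the answer through applyResponse. A 304 carries no body, so the saving is the payload you did not download.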
In Practice
In the example below, I am creating a server that allows a user to update and retrieve their user information. The application would allow us to fetch a user's social media information on request.
We use the updatedAt attribute of someUser to validate the "newness" of the response and return it as Last-Modified. We will work with ETags later.
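A minimal sketch of that idea, written as a plain function so it runs without a server. someUser and updatedAt come from the description above; the function name respondWithUser and the shape of its return value are assumptions made for illustration.

```javascript
// Decide how to answer a GET for a user, given the If-Modified-Since
// header value (or undefined) and a user object with an `updatedAt` Date.
function respondWithUser(ifModifiedSince, someUser) {
  // HTTP dates only have one-second resolution, so drop the milliseconds.
  const lastModifiedMs = Math.floor(someUser.updatedAt.getTime() / 1000) * 1000;
  const lastModified = new Date(lastModifiedMs).toUTCString();
  const headers = { 'Last-Modified': lastModified };

  // A missing or unparseable header means we cannot validate: send the data.
  const since = ifModifiedSince ? Date.parse(ifModifiedSince) : NaN;
  if (!Number.isNaN(since) && lastModifiedMs <= since) {
    return { status: 304, headers, body: null };
  }
  return { status: 200, headers, body: someUser };
}
```

In an Express route, you would read the header from req.headers['if-modified-since'], then copy the returned status, headers, and body onto res.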
Going Deeper
More headers!
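The conditional request specification gives us a few different conditional headers we can work with besides If-None-Match and If-Modified-Since. Those are listed below:
- If-Match: The server performs the request only if the resource's current ETag matches one of the ETags we pass in; otherwise it responds with 412 Precondition Failed. This is mostly used to guard updates (a PUT, for example) against overwriting someone else's changes.
- If-Unmodified-Since: The server performs the request only if the resource has not been modified since the timestamp we pass in; otherwise it responds with 412 Precondition Failed.
- If-Range: Used together with a Range header. If the ETag or timestamp we pass in still matches the resource, the server sends only the requested range (206 Partial Content); otherwise it sends the whole, current resource (200).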
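To make the first two headers concrete, here is a rough sketch of a precondition check a server could run before applying an update. The function name checkWritePreconditions and the shape of the stored record are hypothetical; the sketch also assumes ETags contain no commas, which keeps the parsing simple.

```javascript
// `current` is a hypothetical stored record: { etag: '"v2"', updatedAt: Date }.
// `headers` uses lowercase names, as Node and Express provide them.
function checkWritePreconditions(headers, current) {
  const ifMatch = headers['if-match'];
  if (ifMatch !== undefined) {
    // Simplification: assumes the ETags themselves contain no commas.
    const candidates = ifMatch.split(',').map((tag) => tag.trim());
    const matches = candidates.includes('*') || candidates.includes(current.etag);
    // When If-Match is present, If-Unmodified-Since is not consulted.
    return matches ? { ok: true } : { ok: false, status: 412 };
  }

  const ifUnmodifiedSince = headers['if-unmodified-since'];
  if (ifUnmodifiedSince !== undefined) {
    const since = Date.parse(ifUnmodifiedSince);
    // HTTP dates have one-second resolution, so compare whole seconds.
    const updatedSeconds = Math.floor(current.updatedAt.getTime() / 1000);
    if (!Number.isNaN(since) && updatedSeconds > since / 1000) {
      return { ok: false, status: 412 };
    }
  }
  return { ok: true };
}
```

A client that read version "v1" and tries to save over version "v2" gets a 412 instead of silently overwriting the newer data.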
Strong and Weak Validation
The HTTP specification gives us two ways to validate our ETags:
Strong validation must ensure that the content requested is byte-for-byte the same as the previously requested content for a client to receive a 304 response. An example could be a dataset containing all your banking information. If anything has changed on the server, we should always send the most recent data.
Weak validation means that the server's content could be different from what already is on the client, but the change is not significant enough for the server to pass back new data. Let's go back to that banking information example. Let's say the banking information also contains some metadata information on an A/B test going on. This information is not essential and probably doesn't need to be updated on the client if we are performing live updates on the browser.
In the HTTP specification, the server marks an ETag as weak by prefixing it with W/.
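As a rough sketch, the two ways of comparing ETags described in the HTTP specification look like this. The helper names (isWeak, opaqueTag, strongMatch, weakMatch) are made up for illustration.

```javascript
// A weak ETag looks like W/"abc"; a strong one looks like "abc".
const isWeak = (etag) => etag.startsWith('W/');
const opaqueTag = (etag) => (isWeak(etag) ? etag.slice(2) : etag);

// Strong comparison: neither tag may be weak, and they must be identical.
const strongMatch = (a, b) => !isWeak(a) && !isWeak(b) && a === b;

// Weak comparison: ignore the W/ prefix and compare the rest.
const weakMatch = (a, b) => opaqueTag(a) === opaqueTag(b);
```

Per the specification, If-None-Match uses the weak comparison and If-Match uses the strong one.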
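Let's build a server that can perform both strong and weak ETag validation. To keep the example simple, the client asks for weak validation by prepending W/ to the ETag it sends back.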
const express = require('express');
const md5 = require('md5');

const server = express();
const port = 3000;

const article = {
  content: 'Hello there! this is an article there!',
  meta: 'Meta content for user',
  adInfo: '349243'
};

// gets an article from "our database"
const getArticle = () => Promise.resolve(article);

// ETag format: <hash of the content>_<hash of everything else>
const generateETag = (article) => {
  const contentHash = md5(article.content);
  const metaHash = md5(article.meta + article.adInfo);
  return `${contentHash}_${metaHash}`;
};

// returns true when the client's copy is still considered up to date
const validateETag = (etag, article) => {
  const useWeakValidation = etag.startsWith('W/');
  const parsedTag = etag.replace('W/', '');
  if (useWeakValidation) {
    // weak: only the content hash has to match
    const weakCompare = md5(article.content);
    return weakCompare === parsedTag.split('_')[0];
  }
  // strong: both hashes have to match
  const strongCompare = generateETag(article);
  return strongCompare === parsedTag;
};

server.get('/article', async (req, res) => {
  const etag = req.headers['if-none-match'];
  const article = await getArticle();
  // only answer 304 when the client sent an ETag and it is still valid;
  // a request without If-None-Match always gets the full payload
  if (etag && validateETag(etag, article)) {
    res.sendStatus(304);
    return;
  }
  const nextEtag = generateETag(article);
  res.setHeader('ETag', nextEtag);
  res.send({ article });
});

server.listen(port, () => console.log(`App listening at http://localhost:${port}`));
Above, we created a function called generateETag that creates an ETag composed of two parts, a contentHash and a metaHash. The contentHash is an md5 hash of only the article's content. The metaHash is an md5 hash of all the non-content parts of this article.
We also created a validation function that will:
- If weak validation is requested: return a new payload only if the md5 hash of the article's content is different from the one the client sent. We will send a 304 if only other data has changed.
- If strong validation is requested: return new content if anything in our article object has changed; we check both parts of the md5 hash group.
Weak validation is a little more complicated to implement than just checking if any byte has changed. Still, weak validation can help avoid unnecessary payloads when doing repetitive polls.
Conclusion
Conditional GETs are a straightforward way to reduce the bandwidth handled through your application. The bandwidth savings can directly reduce your networking costs and also help your customers reduce their networking costs (if they pay for their bandwidth).
Try this solution out alongside client-side caching, and you can have even more savings as users who return to your website or app don't need to redownload content that hasn't changed since their last visit. Anyway, give it a go--let me know what you make!
Top comments (2)
Is there any way to do it automatically? I am also in need of this solution, but I want to do it automatically like static files.
Thanks for your solution!
Hi there - sorry for the delay.
What are you developing with? Most static asset middleware should already come with it built in. If you're using Express, the express.static function should automatically come with ETag support and caching.