A rant on failing to build

#http #cache #etag #headers

A humble beginning

I like Node servers, I like writing bare servers, and I also like Express.js. I consider myself a beginner in Node.js, but I wanted to write an Express middleware that generates ETag headers. ETags are a shiny and neat way to validate a cache: the server tags a response, and the client sends the tag back later to ask whether its cached copy is still good.

At first, when I read about ETag headers, I thought that having 'strong' and 'weak' variants was useless, and I came to know that I was wrong. A 'strong' ETag means that the header value identifies the exact bytes of the response body. So when I generate an ETag for any given content, it needs to be unique for that content; in fancy words, a strong ETag is typically a hash of the content generated by a collision-resistant algorithm. A 'weak' ETag, prefixed with W/, only promises that two responses are equivalent in meaning, not byte-for-byte identical.
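
To make the strong/weak distinction concrete, here is a minimal sketch of building a tag from the bytes of a body. The helper name makeETag and the length-plus-hash layout are assumptions made for illustration; they are not taken from any library.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not a library API): builds an ETag from a body.
const makeETag = (body, { weak = false } = {}) => {
  // Work on bytes, so strings and buffers are treated the same way.
  const buf = Buffer.isBuffer(body) ? body : Buffer.from(body, 'utf8');
  const hash = crypto.createHash('sha1').update(buf).digest('hex');

  // The tag is an opaque string wrapped in double quotes.
  const tag = `"${buf.length.toString(16)}-${hash}"`;

  // A weak tag carries the W/ prefix in front of the quoted string.
  return weak ? `W/${tag}` : tag;
};

console.log(makeETag('hello'));
console.log(makeETag('hello', { weak: true }));
```

The same bytes always give the same tag, and changing a single byte gives a different one, which is what a strong validator needs.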

The flashy code

Capturing the response body is pretty easy from a middleware. It goes something like...

const crypto = require('crypto');

const taggart = (opts = {}) => {
  // standard express style
  return (req, res, next) => {
    // save methods
    const write = res.write;
    const end = res.end;

    // sha1 ain't that bad
    const hash = crypto.createHash('sha1');

    // keep track, for content-length
    let length = 0;

    const onData = (chunk, encoding) => {
      // chunk can be 'undefined', or a callback when end() gets no data
      if (!chunk || typeof chunk === 'function') {
        return;
      }

      // convert chunk to buffer
      chunk = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding);

      // update hash using chunk data
      hash.update(chunk);
      // chunk is a buffer at this point, so its length is the byte count
      length += chunk.length;
    };

    const onEnd = (chunk, encoding) => {
      onData(chunk, encoding);

      // generate tag
      const l = length.toString(16);
      const h = hash.digest('hex');

      // weak or strong? use length and hash as ETag
      // (the tag itself must be wrapped in double quotes)
      const tag = opts.weak ? `W/"${l}-${h}"` : `"${l}-${h}"`;
      res.setHeader('ETag', tag);
    };

    // override the default methods
    res.write = (...args) => {
      onData(...args);
      write.apply(res, [...args]);
    };

    res.end = (...args) => {
      onEnd(...args);
      end.apply(res, [...args]);
    };

    next();
  };
};

module.exports = taggart;

The vision

What we are doing is hijacking the res.write and res.end methods of the res object, which is an instance of http.ServerResponse. These methods write data to the response that is sent to the client; http.ServerResponse behaves like a writable stream and gets them from its parent class, http.OutgoingMessage.

In the beginning we create a hash, and in the onData function we update the hash with each chunk and then let the chunk go (do not store the chunks, they can get pretty huge). We also keep track of the size of the response.
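
The reason we can throw the chunks away is that a hash can be fed piece by piece and still come out the same as hashing everything at once. A minimal sketch, where hashChunks is a made-up helper name for this post:

```javascript
const crypto = require('crypto');

// Hypothetical helper: hashes a sequence of chunks without keeping them.
const hashChunks = (chunks) => {
  const hash = crypto.createHash('sha1');
  let length = 0;

  for (const chunk of chunks) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, 'utf8');
    hash.update(buf); // feed the bytes in, then let the chunk go
    length += buf.length;
  }

  return { length, digest: hash.digest('hex') };
};

// Same bytes, different chunk boundaries, same digest.
const whole = hashChunks(['hello world']);
const pieces = hashChunks(['hel', 'lo ', 'world']);
console.log(whole.digest === pieces.digest); // true
console.log(pieces.length); // 11
```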

A call to res.end indicates that the response has ended, so we can finalize the hash in the onEnd function and set it as a header. But there is a catch. In HTTP, requests and responses are streamed. For every call to res.write, the partial response is sent to the client as soon as it can be, and the headers go out with the first chunk, which happens during the first call to res.write. After that, no header can be added.
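
This is easy to watch on a bare Node server. The sketch below starts a throwaway server on a random local port, makes one request to it, and records res.headersSent before and after the first res.write. The helper name observeHeadersSent is an assumption for illustration.

```javascript
const http = require('http');

// Hypothetical helper: reports what res.headersSent looks like
// around the first call to res.write.
const observeHeadersSent = () =>
  new Promise((resolve, reject) => {
    const seen = {};

    const server = http.createServer((req, res) => {
      seen.beforeWrite = res.headersSent;
      res.write('first chunk'); // the headers go out with this chunk
      seen.afterWrite = res.headersSent;

      try {
        res.setHeader('ETag', '"too-late"');
        seen.error = null;
      } catch (err) {
        seen.error = err.code; // setHeader throws once headers are sent
      }
      res.end();
    });

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      // agent: false gives a one-off connection, so the server can close.
      http
        .get({ host: '127.0.0.1', port, agent: false }, (response) => {
          response.resume();
          response.on('end', () => {
            server.close();
            resolve(seen);
          });
        })
        .on('error', reject);
    });
  });

observeHeadersSent().then((seen) => console.log(seen));
```

The recorded values show headersSent flipping from false to true across the first write, and the late setHeader call failing with the error code ERR_HTTP_HEADERS_SENT.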

The dead end

If you try to run the above code with a large response, you will get a fatal error: res.setHeader throws because the headers have already been sent. If your response is small enough (less than 65535 bytes or so ¯\_(ツ)_/¯), it can fit in a single chunk. When that whole chunk arrives in the call to res.end, with no res.write before it, you update the hash and set the header before anything has been sent. Totally works, but only if your responses are less than 64 KB or so.

But I want to send ETags for images and videos, which I'm pretty sure are not less than 64 KB. It would really help to send back a 304 HTTP status code for a 2 MB image, right? Due to the streaming nature of responses, we are unable to do this. The hash can be generated only when res.end is called, but by then the headers might have already been sent.

The only way is up

So, now for the compromises:

We can settle for ETags only on small responses.
We can generate hashes for static content beforehand, maybe in a build process or something, save them in a dictionary and retrieve them later. Hassle.
We can generate the hash on every request: first prepare the whole response, generate the hash, and then send the response with the proper header. Clean and practical.
Trailers can be used. I am working on it; it doesn't look promising, but I might be wrong.
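
The "prepare first, then send" option could look roughly like the sketch below. It is a hypothetical variant of the middleware (the name bufferedTaggart is made up) that holds every chunk in memory until res.end, so it trades memory for the ability to set the header in time. It also hides backpressure from the caller, so treat it as an illustration and not as something to put in front of large files.

```javascript
const crypto = require('crypto');

// Hypothetical variant: buffers the body instead of streaming it.
const bufferedTaggart = (opts = {}) => (req, res, next) => {
  const end = res.end;
  const chunks = [];

  const collect = (chunk, encoding) => {
    if (!chunk || typeof chunk === 'function') return;
    const enc = typeof encoding === 'string' ? encoding : 'utf8';
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, enc));
  };

  // Hold on to the chunks; nothing is sent to the client yet.
  res.write = (chunk, encoding, callback) => {
    collect(chunk, encoding);
    const cb = typeof encoding === 'function' ? encoding : callback;
    if (typeof cb === 'function') process.nextTick(cb);
    return true;
  };

  res.end = (chunk, encoding, callback) => {
    collect(chunk, encoding);

    // The whole body is known now, so the headers can still be set.
    const body = Buffer.concat(chunks);
    const hash = crypto.createHash('sha1').update(body).digest('hex');
    const tag = `"${body.length.toString(16)}-${hash}"`;
    res.setHeader('ETag', opts.weak ? `W/${tag}` : tag);
    res.setHeader('Content-Length', body.length);

    const cb = [chunk, encoding, callback].find((a) => typeof a === 'function');
    return end.call(res, body, cb);
  };

  next();
};

module.exports = bufferedTaggart;
```

With Express it would be mounted like any other middleware, for example app.use(bufferedTaggart({ weak: true })).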

The conclusion

We can use modules like etag, which is kind of coupled with send, which in turn looks like it is intended to be used with serve-static. They are written by the same guy anyway. send generates 'weak' ETags, and it does so based on file stats.
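
A stat-based tag never has to read the file body, which is why it sidesteps the streaming problem. Here is a minimal sketch of the idea; the helper statETag and the exact size-plus-mtime layout are assumptions for illustration, and the real modules may format their tags differently.

```javascript
// Hypothetical helper: builds a weak ETag from file metadata only.
// It expects an object shaped like the result of fs.stat,
// with a numeric size and a Date in mtime.
const statETag = (stat) => {
  const size = stat.size.toString(16);
  const mtime = stat.mtime.getTime().toString(16);

  // Weak, because the same size and mtime do not prove identical bytes.
  return `W/"${size}-${mtime}"`;
};

// Stand-in for a real fs.stat result.
const fakeStat = { size: 255, mtime: new Date(0) };
console.log(statETag(fakeStat)); // W/"ff-0"
```

In a real server the argument would come from fs.stat or fs.statSync on the file being served, before any byte of it is streamed.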

I've come to realize that generating ETags through a module is hard. It has to be hooked into your server; it's a low-level component. Checking whether the content is stale or not is pretty easy anyway.
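
The staleness check boils down to comparing the If-None-Match request header with the current tag and answering 304 Not Modified on a match. A minimal sketch follows; matchesETag and respond are made-up names, and splitting the header on commas is a simplification that assumes the tags themselves contain no commas.

```javascript
// Hypothetical helper: true when the client's cached copy is still good.
const matchesETag = (ifNoneMatch, etag) => {
  if (!ifNoneMatch || !etag) return false;
  if (ifNoneMatch.trim() === '*') return true;

  // If-None-Match uses weak comparison, so the W/ prefix is ignored.
  const strip = (tag) => tag.trim().replace(/^W\//, '');
  return ifNoneMatch
    .split(',')
    .some((candidate) => strip(candidate) === strip(etag));
};

// Hypothetical handler fragment: skip the body when the tag matches.
const respond = (req, res, etag, body) => {
  res.setHeader('ETag', etag);

  if (matchesETag(req.headers['if-none-match'], etag)) {
    res.statusCode = 304; // Not Modified, the client reuses its copy
    res.end();
    return;
  }

  res.end(body);
};
```

The hard part stays where it was: the tag has to be known before the first byte of the body goes out.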

References

Note: This is my first public blog post ever. Thanks for reading all the way through, hope you like it. I'm a non-native English speaker, and I'm working on my skills there.
My so-called failed middleware is on GitHub, and I am working on the fix-tagging branch. Do not check out the master branch, it's dirty.
Please leave suggestions, it would help me a lot. Thanks.

DEV Community