So an overly flashy title from one of my articles six years ago finally bit me in the backside... 😅
Bitmasks: A very esoteric (and impractical) way of managing booleans
Basti Ortiz ・ Oct 28 '18
In my defense, this was written by my naive 2018 self. Six years and one computer science degree later, I have vastly changed my perspective on this. This is why we are going in the completely opposite direction. In this article, we will discuss why bitmasks are not so esoteric and impractical after all.
Parallelized Truth-Checking
Let's go back to the example used in the original article.
const config = {
isOnline: true,
isFullscreen: false,
hasAudio: true,
hasPremiumAccount: false,
canSendTelemetry: true,
};
Now suppose that we are tasked with checking whether all of these flags are true. In a world without bitmasks, we may end up with the following solution:
function isAllTrue(flags: boolean[]) {
for (const flag of flags)
if (!flag) return false;
return true;
// ALTERNATE:
// return flags.every(flag => flag);
}
isAllTrue(Object.values(config)); // false
This is great and all. Some may even consider its alternate one-line version the epitome of clean JavaScript code. After all, who could say no to the Array#every method?
But unfortunately, as we increase the number of flags in config, our O(n) solution eventually shows its cracks. Of course, in practice, it is rare for a config object to scale up to a million flags. At that point, there are bigger architectural decisions that need to be reconsidered.
Nevertheless, for the sake of argument, we are left asking: is there a more efficient way to do this? Yes, as a matter of fact, there is an O(1) solution to this problem: bitmasks!
Bitmasks enable us to simultaneously query all of the bits in an integer with a single CPU instruction. Since we are interested in checking whether all the bits are turned on, we simply apply a bitwise-AND with an all-ones bitmask (i.e., 0b1111...1111) and check that the result equals the mask.
// ASSUME: We have a way to convert the `config` to its
// bitwise representation, along with its number of flags.
function isAllTrue(flags: number, count: number) {
  // Explicitly convert to an integer just in case.
  // https://tc39.es/ecma262/#sec-numberbitwiseop
  const bits = flags | 0;
  // Obtain a bitmask where the lowest `count` bits are ones.
  // (Shift counts wrap around modulo 32 in JavaScript, so the
  // 32-flag case needs special handling.)
  const BITMASK = count === 32 ? ~0 : (1 << count) - 1;
  // The bitwise-AND with the `BITMASK` verifies whether
  // the flag bits are all ones (i.e., all true).
  return (bits & BITMASK) === BITMASK;
}
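As a side note, the conversion assumed above could be sketched as follows. This is a hypothetical helper (not from the original article), where bit i is simply assigned to the i-th value of the object in insertion order:

```typescript
// Hypothetical sketch: pack the Boolean values of a config
// object into an integer, one bit per flag.
function toBits(config: Record<string, boolean>): number {
  let bits = 0;
  Object.values(config).forEach((flag, i) => {
    if (flag) bits |= 1 << i;
  });
  return bits;
}
```

For the config above, this yields 0b10101: the bits for isOnline, hasAudio, and canSendTelemetry are set.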
With that optimization, we now have a constant-time checker that can simultaneously check 32 bits (a.k.a. 32 Boolean values) in a single CPU instruction. This is much better than our original linear-time solution, which went through each bit one at a time.
Now you may ask: what if we have more than 32 Boolean values? Fortunately, some languages (such as JavaScript and Python) support arbitrarily-sized integers, which effectively remove the 32-bit limit. That is to say, we can operate on any finite number of flag bits and bitmasks. Specifically for JavaScript, we have the BigInt type.
For other lower-level languages that only have fixed-size integers (such as C, C++, Rust, and Zig), there is no magical way to extend this algorithm to more than the maximum number of bits that the language supports. Luckily for us, we can still at least "chunk" the input into 32-bit groups. In fact, this is how arbitrarily-sized integer operations are implemented under the hood.
Note that the algorithm would still be linear-time, but it is at least 32 times faster than iterating one bit at a time! That is not even to mention the drastically reduced for-loop overhead (e.g., branch instructions, jump instructions, etc.).
/**
 * @param flags - The bit flags stored as one big integer.
 * @param count - The actual number of bits in {@linkcode flags}.
 */
function isAllTrue_Chunked(flags: bigint, count: bigint) {
  while (count > 0n) {
    // The final chunk may contain fewer than 32 bits.
    const width = count < 32n ? count : 32n;
    const BITMASK = (1n << width) - 1n; // `width` ones
    // Only consider the bits for this chunk.
    if ((flags & BITMASK) !== BITMASK) return false;
    // Advance the cursor.
    flags >>= width;
    count -= width;
  }
  return true;
}
Analogously, we can also apply this algorithm to the bitwise-OR operator, which now checks if at least one bit is turned on rather than checking if all bits are turned on. The point is: bitwise operations enable us to write highly parallelized Boolean queries.
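A minimal sketch of that bitwise-OR analogue, assuming the same chunked BigInt representation (the name isAnyTrue is made up here):

```typescript
// Sketch of the "any true" counterpart: a nonzero chunk of
// bits means at least one flag is turned on.
function isAnyTrue(flags: bigint, count: bigint): boolean {
  const BITMASK = (1n << 32n) - 1n; // 32 ones
  while (count > 0n) {
    // If any bit in this chunk is set, we are done.
    if ((flags & BITMASK) !== 0n) return true;
    // Advance the cursor.
    flags >>= 32n;
    count -= 32n;
  }
  return false;
}
```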
Struct-Packing for Network Efficiency
Another neat use case for bitmasks is querying the bit flags in a tightly packed struct. This is especially relevant in computer networking, where data packets are typically serialized as a literal struct. For instance, one may represent the header of a Transmission Control Protocol (TCP) segment as a struct with the following fields:
#include <stdint.h>
struct tcp_header {
uint16_t src_port;
uint16_t dst_port;
uint32_t seq_number;
uint32_t ack_number;
uint8_t data_offset: 4;
uint8_t reserved: 4;
uint8_t flags;
uint16_t window_size;
uint16_t checksum;
uint16_t urgent_pointer;
uint32_t options;
};
Of particular interest is the flags field, which is essentially an array of bits that indicate whether the packet is a SYN, ACK, FIN, or RST packet (among other bit flags). The specifics of the protocol are beyond the scope of this article, so we instead explore why a TCP header uses an 8-bit integer for bit flags rather than individual bool fields.
It is important to first consider that a bool in C is essentially an 8-bit integer that can only be either 0 or 1. Although a single bit is sufficient (and necessary!) to represent false and true, the C standard still requires that a bool occupy at least one byte (8 bits on virtually all platforms), where 7 bits are left unused.
From the perspective of network efficiency, an array of bool is therefore terrible packet design! That is to say, an array of bool values requires 8 bits per flag. With the tcp_header.flags field containing 8 flags, we would require 64 bits just to represent them in a struct! These bandwidth inefficiencies add up at the unfathomable scale and volume of data transfers on the Internet.
The solution is struct-packing. Instead of wasting 7 bits per bool field, we should instead "pack" the eight bit flags as a single 8-bit integer. We then use bitmasking to query and extract each field of interest. And just like that, we saved ourselves from 56 unused bits.
const uint8_t CWR = 0x80; // 1000_0000
const uint8_t ECE = 0x40; // 0100_0000
const uint8_t URG = 0x20; // 0010_0000
const uint8_t ACK = 0x10; // 0001_0000
const uint8_t PSH = 0x08; // 0000_1000
const uint8_t RST = 0x04; // 0000_0100
const uint8_t SYN = 0x02; // 0000_0010
const uint8_t FIN = 0x01; // 0000_0001
// Is this a SYN packet? (The parentheses are required:
// `==` binds tighter than `&` in C.)
(tcp.flags & SYN) == SYN;
// Is this an ACK packet?
(tcp.flags & ACK) == ACK;
// Is this a SYN-ACK packet?
const uint8_t SYN_ACK = SYN | ACK;
(tcp.flags & SYN_ACK) == SYN_ACK;
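The same masking carries over to higher-level languages when parsing raw packet bytes. Here is a hedged TypeScript sketch (isSynAck is a made-up helper) that assumes the header layout above, where the flags byte sits at offset 13 (ports, sequence, and acknowledgment numbers occupy offsets 0 through 11, followed by the data offset/reserved byte):

```typescript
const SYN = 0x02;
const ACK = 0x10;

// Check whether a raw TCP header marks a SYN-ACK packet.
// Offset 13 holds the flags byte in the layout above.
function isSynAck(header: Uint8Array): boolean {
  const SYN_ACK = SYN | ACK;
  return (header[13] & SYN_ACK) === SYN_ACK;
}
```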
Caching with Bloom Filters
Following the theme of struct-packing, bitmasks once again show up in systems that require efficient caching strategies. Chief among the use cases are content delivery networks (CDNs) and database systems.
But first, let's go back to what a cache is for. When certain operations are too expensive to run frequently, a cache is responsible for intercepting inputs and checking whether a corresponding previously computed output may be returned instead.
Now let's consider a case study on web servers. Nowadays, many web servers dynamically render HTML on the fly depending on the current state of some internal database. Sometimes, this dynamic generation may be expensive. Perhaps the database query is a non-trivial aggregation over millions of rows. Perhaps an external API invocation takes a long time to process the request. Perhaps an upload is taking too long.
In any case, a cache layer may yield significant performance gains for the system. But what exactly are we caching? A simple example for the sake of demonstration is a parameterized web page whose contents depend on a portion of the URL, say /user/:id. Suppose that we would like to avoid regenerating user profiles as much as possible. Hence, the only sensible time to rerender is after the user modifies their profile. Everything is served from a CDN cache otherwise.
A naive in-memory cache in the web server could store the URLs (e.g., /user/100) of known cached pages in a set-like data structure. If a URL exists in the cache, then we serve from the CDN. Otherwise, we generate the HTML page from scratch. Sadly, this design comes at the cost of higher memory usage. Imagine having to cache /user/1 all the way up to /user/1000000. Yikes!
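For concreteness, that naive set-based design might look something like this (a sketch; cachedUrls and the helper names are assumptions, not from the original):

```typescript
// Hypothetical sketch of the naive design: remember every URL
// whose page has already been generated and cached by the CDN.
const cachedUrls = new Set<string>();

function markAsCached(url: URL): void {
  // Memory grows with every unique URL we remember!
  cachedUrls.add(url.pathname);
}

function shouldServeFromCdn(url: URL): boolean {
  return cachedUrls.has(url.pathname);
}
```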
A more memory-efficient way to cache this knowledge in-memory (without blowing up RAM) is through a Bloom filter. Instead of storing all of the known id values of cached pages, a Bloom filter keeps only a fixed-size array of bits derived from hashing those id values.
We thus set up the Bloom filter in such a way that if the hash of a URL gets a cache hit, then we know that the corresponding user may have modified their profile (which should be rare in practice). Otherwise, we know for certain that no such modifications have taken place yet, so it is safe to serve from the CDN cache directly.
In theory, we may implement a Bloom filter as an array of Boolean values. The URLs must then be hashed as a valid index into that array. Alternatively, as it is the theme of this article, we may use bitmasks instead.
function bloomFilterHash(url: URL) {
// Extract the :id from the URL.
const index = url.pathname.lastIndexOf('/');
if (index < 0) throw new Error(':id not found!');
// This is a simple hash where we simply take the ID modulo 32.
// This always gives us a valid index into a 32-bit Bloom filter.
const id = BigInt(url.pathname.slice(index + 1));
return { raw: id, bloom: id % 32n };
}
// Initially zeroed 32-bit Bloom filter.
let bloomFilter = 0n;
// GET handler for the /user/:id.
app.get('/user/:id', async req => {
// Hash the URL as an index into the Bloom filter.
const { raw, bloom } = bloomFilterHash(req.url);
const bitmask = 1n << bloom;
if ((bloomFilter & bitmask) !== bitmask) {
// The bit is turned off. Therefore, no modifications
// have been made before. We can use the cached profile.
return new Response(null, { status: 302, ... });
}
// The bit is turned on, which means one of the (possibly many)
// users that computed this hash has modified their profile.
// Time to generate a brand new profile page. Ideally, we should
// eventually clear the Bloom filter so that future requests
// need not pay this overhead (even though it should be rare).
const data = await fetchUserProfile(raw, ...);
const html = await generateProfileHtml(data, ...);
return new Response(html);
});
// POST handler for modifying user profiles.
app.post('/user/:id', async req => {
// Hash the URL as an index into the Bloom filter.
const { raw, bloom } = bloomFilterHash(req.url);
const bitmask = 1n << bloom;
// Mark the Bloom filter bit as "dirty" (i.e., modified).
bloomFilter |= bitmask;
await modifyUserProfile(raw, ...);
return new Response(null, { status: 204, ... });
});
Conclusion
In the original article that started this all, I framed bitmasks as an "esoteric" and "impractical" way to "over-engineer" Boolean values that only "hardcore computer scientists" should care about. Much of that negative sentiment and gatekeeping has gone away nowadays as I've grown as a mentor, a software engineer, and a computer scientist myself. In its place is a greater appreciation for the attention to detail involved in such low-level optimizations.
"[Bitmasking] is impractical and unreadable most of the time. Constant documentation and awareness are required to make sure that the correct bits are being selected and manipulated."
Though if there's anything worth saving from the original article, I still stand by my concerns on readability and maintainability. Without clear documentation, bitmasks look like magic numbers out of thin air. The runic soup of bitwise operators doesn't make the situation better either given its seemingly arbitrary incantations of tildes and minuses.
Wherever possible (and perhaps until performance bottlenecks arise), it is wiser to keep bitmasks as a last-resort tool in our belt. The cognitive overhead necessary in deciphering bitwise logic is frankly unappealing in many cases.
"Frankly, I wouldn't recommend using [bitmasks] regularly, if at all. As clever as it is, it is just too esoteric for common use... Overall, there aren't many applications for this, especially in a high-level language like JavaScript."
Parallelized truth-checking, struct-packing, probabilistic caching, bit fields, permission bits, and pattern matching are just a few examples of the more "practical" side of bitmasks in the real world. Even in high-level languages such as JavaScript, bitmasks are in fact not mere intellectual play among "hardcore computer scientists".
Much like design patterns, data structures, and algorithms, the humble bitmask serves as one of the many tricks that we have up our sleeves. As with most things in our practice, it's just a matter of using the right tool for the job while keeping code quality in check.
Just like last time... bitmask responsibly.
Top comments (3)
I am kinda amused. BigInt is in the stdlib for Zig, and there are third-party math libraries for all the others (given that it is required to implement RSA). Most C compilers (and probably C++) support implicit struct packing via extensions, which does the same thing.
To top it off, BigInt isn't the best way to implement this: accessing just one bit requires allocating a mask of the same size as the BigInt, whereas if you simply store your bits as an array of unsigned 32-bit ints, you can access any single bit and only have to allocate 32 bits for the mask.
TLDR: If using a language which has compiler support for bit packing, use that, if not search around for some implementation of enumset which does the same thing. Neither requires you to do any manual bit twiddling (which is fun and you should learn it anyway).
That's true. I just wanted to conceptually show that instead of storing an array of Booleans, a possible optimization is to use integers instead. The BigInt is just an implementation detail. As you proposed, you could also store an array of 32-bit integers instead. That would also be fine. 👍

This is a very wonderful article, thanks for sharing.