2024 July 20: Six years and one computer science degree later, I have since changed my mind on most of the negative sentiments in this article. Se...
For further actions, you may consider blocking this person and/or reporting abuse
While JS is a high-level language and it initially certainly wasn't meant to be used for much more than some simple script blocks, the engineering and runtimes that we use this language in have lowered things considerably.
Consider that JS runtimes have multiple tiers of JS code processing, the latter ones compiling it into direct assembly and throwing optimiser passes over it. JS engines detect and optimise shapes of objects to eliminate all the handling around string lookups for the keys of your objects.
So, while you can still use JS for everyday high-level tasks, you can now successfully use it for very high data throughput operations as well, both in the browser and in server contexts. And memory usage and data layout impacts throughput significantly. Consider this made up but actually kind of realistic metadata object:
To create this object, a JS engine has to allocate a standard JS object, allocate 3 strings for the keys and store the 3 values. Now if there are many objects being handled like this then JS engines will get smarter about them, but they still have to create new dynamically allocated objects for each one.
BTW, with "many" I'm thinking of the tens or hundreds of thousands or larger magnitudes.
Given that I know as the developer of this app that the
sortKey
field is never larger than, say, 1 million and thegroup
is at most 50 and thefirst
is a boolean, which is 1 or 0.I can use this info to put all this info into a single 32-bit integer.
sortKey
can fit inside 20 bits (2^20 > 1M),group
can fit into 6 bits (2^6 = 64) andfirst
can fit into a single bit, total 27 bits. I can store these as such:Where F = first, G = group and S = sortKey
The above object can then be represented as:
To create this value, the JS engine has to … allocate nothing (in optimised passes), it's just an integer that fits comfortably in a register inside the CPU, we even have 5 bits to spare! ;) I can now use the bit masking and insertion techniques you mentioned to read/write data in these values (did I mention that bit ops are single instructions on the CPU?)
Now, if this was indeed only for a single variable, we wouldn't have gained much, but we're talking about large amounts of data here. Say, I will process 1 million of these blocks at a time. To do that I can:
And done, all memory accounted for, and with simple indexing I can access each element.
Doing the same for the JS object variant would mean:
The first 2 can be moved to init time with object reuse, but the 3rd one is maybe even more important. Modern CPUs really don't like cache misses and fragmented memory reads, it can have a significant impact on the runtime of your code.
Either way, the end result is that by representing the data more efficiently the code will run (much) faster and take less memory, which means being potentially better than competition, having lower server costs (fewer servers needed with less memory in each), not having to wait for results, being able to handle bigger workloads, etc.
So again, for "normal" usage indeed, don't bother, but for high-volume data processing it really pays to look at data layout optimisation, and bit fields are one of the tools you can use for that!
This optimization is fantastic. And horrific at the same time. It can provide performance improvement seemingly out of thin air. And it can be a nightmare to support for anyone other than the author. What if the number of groups changes dramatically? Like, grows to 100, or 200? Or is removed entirely? What if first is replaced by 6 booleans? What if... the metadata changes completely?
My point: bitwise operations are a great tool, but it has extremely specific usage cases. I'd advice average software developers to be extremely skeptical about it (don't use it unless you know what you want to achieve and there's no better way, or it requires significant sacrifices elsewhere).
Not really, especially if you're used to it. It is common practice to assign each flag to a constant in order to improve readability, rather than using the numbers directly.
Bitmasks are quite common in systems programming and binary file formats. Several UI toolkits (Qt, Gtk...) also use them. They can also be used to implement certain mathematical functions on machines that don't have a FPU.
There's life outside web development.
I really love the example you gave. It's very interesting. I applaud the creativity and cleverness displayed in those 27 bits. I just learned something new and cool today. Thanks!
Stop confusing people with weird words like "CPU" and "register", what's wrong with you? Did you even read the article and especially the context in which it was written?
I really hope you're being satire.
I am new to the software development and we use bitwise masking. It's a real nightmare for newbies as code is totally unreadable. At least I might understand some basics now.
Permissions are the only real way most people will see bitmasks these days unless you're doing something particularly low-level.
Like file permissions:
reads literally as
Not realising this is exactly why everyone abuses
chmod -R 777
so wildly.It's also good1 for wrapping up complex permissions in a program:
vs.
Meh, "good". ↩
I was supposed to include an example on complex conditions in the article, but then I thought that it was probably getting too long already. Thanks for bringing this up!
Also, may I ask what the
x
s stand for inrwxr-xr-x
?They're the execute bits - if it's a file, you can run it as a command, if it's a directory, you can
cd
into it.Hey, your article was very useful. Thank you for that. I think it's useful for use-cases when you have lists of booleans and you want to compare against. Instead of diffing in loops "foo === arr[i] && true === arr[i]" having into numbers is simpler.
I've expanded and made a class for making it simpler to use. I've called it tuples-boolean-bitmasker, the source is in a project on a GitLab monorepo I own, here are the unit tests I've written while building BitMasker
I feel like this article and others here on DEV regarding bitwise operations are spreading fear about something quite simple. I think that the undertones of the article may make readers more likely to panic if they run into bitwise operations in the wild. I use them quite often for work, and its really not bad. Its just a thing. I do appreciate the depth of the article though, good post!
Great article! If you allow, I will add my example of bitmasking and a very cool one imho
When it comes to data compression algorithms many times you need to write or read just one bit. In nowadays computer systems that kind of granularity is impossible because the minimum amount of data is a byte. So if it weren't for bitmasking and binary operators in language compression algorithms would be very useless.
Sometimes, I feel like bit manipulation is an underappreciated art. Your example proves it. We use compression algorithms everyday, especially in the Web where all network requests must be minimized as much as possible, yet there is little attention and appreciation given to it.
I don't do much programming in Java script and I don't know why one would use bit masks in a language that is mostly for browser side high level coding.
But calling them esoteric and impractical would lead me to believe you don't deal with embedded coding and good old standard c. Often in embedded systems we can't afford to be wasteful with resources, since c has a much smaller compiled foot print than C++ and higher languages it is used.
Also when working with creating a client server protocol that sends and receives data packets (like a multi player game), you often need to send flags that say things like jump, attack, crouch, move, etc..
You could create a struct to send as a packet using a bunch of C++ BOOL's. But here's the rub, if you're using a bool per flag, a bool requires minimum 1 Byte or 8 bits; however, if you use an unsigned int as a bitfield, and bit masks, you can effectively have 8 bools per byte for a char, or uint8_t.
You get 8 times the bool flags, with 1/8th the data packet size. Reducing the data packet size reduces the latency. Also bitwise operators are very fast for the cpu to process.
For Java Script, I'd probably not bother with bitmasks and bitwise operators, but if you are using c or c++ and trying to minimize the amount of memory you're using, bit masks will always beat bool.
Yup, you are definitely correct there. Perhaps I have been too harsh on bitmasks. I never really considered embedded systems when I wrote this article. I simply had a "JavaScript mindset", so to speak.
Definitely not an expert but my take away from my limited experience with bit masks are like this:
For example you can have presets like
admin = 24; user = 17; poweruser = 31;
. And just setconfig = admin
and you are good to go rather than assigning many variables.Another example, bit masks are used in Unity 3D engine to set the layers a Camera renders in runtime. Unity has two parts, an editor where you set things using GUI, and your code. Layer count and names are customized in the Unity editor. And in your camera code, instead of using a whole list of exposed layer objects, you can provide a bit mask to modify which layers are rendered with that particular camera.
Yeah, I also have very limited experience with bitmasks myself. Seems like bitmasking isn't as impractical as I initially thought. Thanks for sharing!
I found your post long after it has been created, while I was looking for answer about bitwise operation.
I have a very good subject to use it.
I'm working on a personnal tool to play Soduku. It is a HTML, Javascript and CSS based tool.
In Sudoku game, the purpose is to complete a 9X9 cells grid, so the same digit exists only once in each row, column and 3X3 box. It is important to be able to control if a digit can be set at a specific place.
Classical control is quite complex, because you have to check every cell in the row, the column and the box hosting your target at least 3 loops with 9 steps each.
Using bits make things much easier : you just have to compute one value for each row, one for each column and one for each box of the grid, creating a binary digit corresponding (value 1) to the position of the possible digits. For example, if a row contains 3 an 8, the digit for that row is 101111011.
All 3 digits for a cell can be calculated in the same step of the loop.
For a cell, an simple AND on the three digits (row, column and box) to which belong the cell gives the possible values.
When a value is set, it is easy to update the corresponding digit in the 3 arrays, and then to check for next turns.
Thnaks for initial post and answers ...
Honestly, I should be the one thanking you. What a brilliant use of bitmasks! Thanks for showing me this example. It really started my turning my gears for possible uses of bitmasking.
I absolutely love working with bitmasks! They pair incredibly well with MongoDB: there are query expressions for $bitsAllSet/$bitsAnySet + $bitsAllClear/$bitsAnyClear to make for some very succinct and expressive queries, and then there also $bit update operators for setting/unsetting bits. Field selection is a lot more straightforward; it indexes well, especially w/ compound indexes; and it just feels clean. I've found that a lot of queries are just much easier to express like this, for instance:
Also, there are a lot of times in which logical expressions are much cleaner with bitmasks. For instance, say you wanted to update a doc and wanted to trigger some behavior based off of the changes; the optimal way to do this IMO is to use a findOneAndUpdate with bitwise updates ($bit) and then to return the old doc / sync the bitwise updates and to compute the XOR. Once you have the XOR of the update, it's really easy to derive events: IE
status & XOR & VERIFIED
-> they are verified & they weren't verified before this update. The most powerful use case I've found is for shared documents, where you have a subdocument array for user data w/ an ObjectID ref + a bitmask, and then you can use $elemMatch updates to set each users respective flags, and then you can do awesome things like taking the AND of each users flags + the XOR of an update as to derive events based of when flags are mutually set by both users and became such as a result of this update. This very simple design pattern works for a staggering number of use cases.Ayyyyy. I can't disagree with your level of enthusiasm! Bitmasks are definitely great for these kinds of use cases. As long as the code for these design patterns are relatively straightforward and easy to fathom, bitmasks aren't really as "bad" and "esoteric" as I have made them to seem like in the article.
Nice discussion!
As I see it, bitmasks are a very important tool for use cases where either performance or memory constraints are important. Performance, because bitwise operations can sometimes achieve the same as more complex arithmetic, but at a fraction of the cost. And memory constraints because storing a single conceptual boolean value in a single bit is about as efficient as you can get, memory-wise.
Although it surely isn't the only one, C++ is truly the language for highly optimized computing. I like the fact that one can have many options for a number data type. For example, we can choose to be as memory-efficient as we want by using a
short
over a normalint
type; unlike in JavaScript where all numbers are 64-bit IEEE 754-compliant. This is good for beginners because they don't have to worry about the intricacies of the language, but when it comes to servers and other large-scale operations, it might be a huge help to the workload if numbers weren't 64 bits long.I think you wrote an explanation of bitfields that is both accessible and in depth, well done.
I think that the best use case for bitfields/bitmasks are found where the various "flags" you toggle on and off are not independent of each other, but have some meaningful relationship, e.g. the values in the field are "additive" in a sense.
For example, log levels are considered additive by some applications: if you log at "info" level, then you also log "error" and "warn" level messages, this relationship between the different possible values is well captured by bitfields by representing the "lowest" value as the smallest value in the field:
Then whatever is logging can decide whether to log a message or not based on whether the configured log level "has" the specified flag by using bitwise and (&) as you mentioned, or making a numeric comparison:
Another example is configuring media "directions" (you can run into this with webrtc):
In these cases, the values have some meaningful relationship with each other and using bitfields feels quite natural to me to represent these relationships.
I never knew that error logs had levels. No wonder they look like hexadecimal gibberish to naive users like me.
WebRTC has a pretty cool API for using bits rather than strings, such as
'forwards'
,'backwards'
, or'both'
, to describe media direction. I definitely want to see more of that in modern APIs.I worked with bitmasks(a while back) to write a decoder for GRIB, a binary format for weather data. The utility of a bitmask in this context seems much more obvious. It allows you to extract individual bits from a given byte of data. There is NO other way you can do this because processors do not handle anything under 1 byte.
Very nice explanation. In JS React actually uses Bitmasks for the modes it provides since you can wrap many modes into each other it gives you a simple way to pass all the available mode states as a single variable.
In Unix file permissions are handled this way: like 6 = 110 (read & write) or 7 = 111 (rwx)
This makes me wonder what the rubrics are for determining the "best and optimal move" in Chess.
A while ago I found out that making chess engines is holy hell incredibly intrincate
I found this engine a guy made in Scala, exploiting it's concurrency and DSL capabilities: github.com/marianogappa/ostinato
It was one of those moments when I say, "damn, I do not know crap about programming and I suck"
But back to the post, great job! I'm myself delving a bit more into bit ops, and your post is super well explained.