DEV Community

Marc Katz
Marc Katz

Posted on

Dark Magic, Evil Bits, and WTF

How Quake III devs sold their soul to the devil to save a few milliseconds

Content warning: Math

In video games, one of the most computationally intensive function is determining lighting. This is because when calculating light reflections in three dimensions, you need to find 1 divided by the square root of a number, or in other words, 1/sqrt(n), its inverse square root. Since the number will be in floating-point format, division is computationally expensive. With modern computers, this effect is minimal, but back in the 90s, more thought had to be put into each machine cycle to get games to work.

The developers working on Quake III Arena, released in 1999, came up with an ingenious solution to this problem. They found a way to calculate an inverse square root without division. Their code in C, with the original comments, was:

float Q_rsqrt( float number )
{
 long i;
 float x2, y;
 const float threehalfs = 1.5F;

 x2 = number * 0.5F;
 y  = number;
 i  = * ( long * ) &y;                       // evil floating point bit level hacking
 i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
 y  = * ( float * ) &i;
 y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
// y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

 return y;
}
Enter fullscreen mode Exit fullscreen mode

Part 1: Evil Floating point bit level hacking

So, this code is… complicated to say the least, and without knowing what exactly is going on, it is incomprehensible. Even if you know what each line does, why it works is still a long way away.

Let’s start with the line i = * ( long * ) &y;. As commented, this line is evil, in the sense that it breaks the rules. We take y, which is a 32-bit floating-point number, and shove its bits into i, which is a long (a 32-bit integer).

In C, floats are stored in the following fashion: one bit for the sign, eight bits for the exponent (e), and 23 bits for the mantissa (m). If we look at the memory directly, y is 0eeeeeeeemmmmmmmmmmmmmmm (side note: we know that the sign bit will be 0, denoting a positive number, since the square root of a negative number is imaginary). This means that y = (1 + m) * 2 ^ e. Well, kinda. In order to have negative exponents, e is given an offset of 127 (1111111 in binary), so it actually is y = (1 + m) * 2 ^ (e — 127).

So why make it a long? In order to do the following steps, we need to act on y’s bits directly. It is easiest to do this as an integer, since C will now let us add and subtract without worrying about exponents and the like.

Part 2: WTF

Now we get to the next line: i = 0x5f3759df — ( i >> 1 );, and as the developers so succinctly put it in the code’s comments, “what the fuck?”
What are these arrows? What is that number? and how does any of this help us get a square root?

Let’s start with the easiest part first: bit shifting. i >> 1 simply shifts the bits in i once to the right. Since i is a binary number, this is equivalent to dividing by 2. Now, our number looks like 00eeeeeeeemmmmmmmmmmmmmmm. If we view this back as a float, this is approximately equivalent to (m/2) ^ (e/2).

But why do this? Well, we first have to reexamine what we are trying to do. Another way to view 1/sqrt(n) is n^(-1/2). So by bit-shifting right, we’ve (kinda) taken the square root. We negate it by subtracting our new number. But we still have to fix the errors that bit-shifting caused.

The major potential source of error is last bit in the exponent, represented the bold e above, which now has found itself in the area of the mantissa. In order to fix this error, and some other issues (Such as halving the mantissa when we didn’t want to), the developers of Quake III found a Magic Number that helps alleviate these accumulated errors. This magic number is 0x5f3759df, or 1011111001101110101100111011111 in binary, or 1597463007 in decimal. How this number is derived is beyond the scope of this article, but it involves blood sacrifices and dark wizardry (or so I assume). If you want to learn more about this, read Dr. Lomont’s paper, linked below.

And to put this all in a bow, we put our new, probably cursed, number back into a float with the line y = * ( float * ) &i;.

Part 3: Newton’s Method

We’re almost at the finish line, we just need to make our guess (y) more precise. The number we have is good, but it could be better. The way that the code does this is with one iteration of Newton’s method of approximation, x_n+1 = x_n — f(x_n) / f’(x_n), where x_n is the current guess, f(x) is the function we’re trying to approximate, and f’(x) is its derivative. Since Newton’s method finds 0s, we need a function that we want to be 0. In this case, our function will be the error, so our f(y) = 1/y²-x, where x is the number we started with. This means its derivative is f’(y) = -2/y³. Putting this together, we get y — ( (1/y²-x) / (-2/y³) ), which simplifies to y * (1.5 — x * y² / 2). This gives us the final piece of the puzzle, y = y * ( threehalfs — ( x2 * y * y ) );. Now, this method can be done multiple times to further make the it more accurate, but it seems the developers tried a second iteration and found it unnecessary.

And finally, we have our Fast Inverse Square Root. All it took was some hacking and magic. But…

Part 4: Was it worth it?

Yes. This method is way faster than the built-in ways of solving for it, and it is very accurate. The maximum relative error is only 0.175%. But is it worth doing today? Modern computers can handle exponentially more computations than in the 90s, but games and simulations that would requires lots of these calculations have also become more demanding, complex, and exact. Whether or not you want to use a quick approximation like this or just whatever is native to your system will depend on the exact situation you are in and how much computing power you need to squeeze out of your user’s computers.

Sources:
http://www.lomont.org/papers/2003/InvSqrt.pdf
https://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/

Top comments (0)