Quick-Preview
IEEE-754 floating-point arithmetic rounds the exact result of each operation to the nearest representable value. Because this rounding happens after every addition, the order in which you add numbers can change the final result, which matters for precision-sensitive work.
Difference Example
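A minimal sketch of the two summation orders discussed in this post, assuming FP32 arithmetic emulated with the standard library. Python floats are FP64, so the helpers `f32` and `add32` (hypothetical names, not from any library) round every intermediate result to FP32 via `struct`.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def add32(a: float, b: float) -> float:
    """Add two numbers, rounding the operands and the result to FP32."""
    return f32(f32(a) + f32(b))

big, small = 1e8, 1.0

# First order: add the small number to the big one, then subtract the big one.
first = add32(add32(big, small), -big)

# Second order: cancel the big numbers first, then add the small one.
second = add32(add32(big, -big), small)

print(first, second)  # prints: 0.0 1.0
```

The same three numbers are summed in both cases; only the order differs.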
Reason
When two floats are added, the exact sum is rounded to a value that is representable at the exponent of the result. For operands of very different magnitude, that is the exponent of the larger one, so any part of the smaller operand that falls below the spacing at that exponent is rounded away.
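Take FP32: it has 1 sign bit, 8 exponent bits, and 23 fraction bits, plus 1 implicit leading bit. A normalized float value has the form (-1)^s × 1.f × 2^(e-127), where s is the sign bit, f is the 23-bit fraction, and e is the stored 8-bit exponent.
The minimal spacing between adjacent representable floats (the ULP, unit in the last place) at unbiased exponent E = e - 127 is 2^(E-23).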
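The field layout and the spacing can be checked directly by inspecting the bits. This is a sketch for normalized values only; `fp32_fields` and `ulp32` are hypothetical helper names introduced here for illustration.

```python
import struct

def fp32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, stored exponent, fraction) of x encoded as FP32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31            # 1 sign bit
    exp = (bits >> 23) & 0xFF    # 8 exponent bits, biased by 127
    frac = bits & 0x7FFFFF       # 23 fraction bits
    return sign, exp, frac

def ulp32(x: float) -> float:
    """Spacing between adjacent FP32 values near x (normalized x only)."""
    _, exp, _ = fp32_fields(x)
    return 2.0 ** (exp - 127 - 23)

sign, exp, frac = fp32_fields(1e8)
print(sign, exp - 127, frac)  # prints: 0 26 4111392
print(ulp32(1e8))             # prints: 8.0
```

For 1e8 the unbiased exponent is 26, so the spacing is 2^(26-23) = 8, while for 1.0 it is 2^-23.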
Consider the first example, which adds 1 to 1e8 first. 1e8 lies between 2^26 and 2^27, so its exponent is 26. That is the larger exponent of the two operands, so the sum is rounded at that scale.
The minimal spacing is thus 2^(26-23) = 2^3 = 8. Any value in this range that is not a multiple of 8 cannot be represented exactly and must be rounded before it is stored.
When 1 is added to 1e8, the exact sum 100000001 falls between the representable neighbors 100000000 and 100000008. Since 1 is less than half the spacing (4), the sum rounds back down to 1e8, and the 1 is lost.
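To see the spacing of 8 in action, the sketch below rounds 1e8 + k to FP32 for a few small values of k. It assumes FP32 is emulated by round-tripping through `struct`; `f32` is a hypothetical helper name.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 1e8 + k is exact in FP64; f32() then rounds it to a multiple of 8.
for k in (1, 3, 5, 8):
    print(k, f32(1e8 + k))
```

Offsets below half the spacing (k = 1, 3) collapse back to 100000000.0, while offsets above it (k = 5) and exact multiples (k = 8) land on 100000008.0.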
In the second example, however, -1e8 and 1e8 cancel exactly first, and adding 1 to 0 requires no rounding, so the 1 survives.
Thus changing the order of summation can produce different results, which is worth keeping in mind whenever you accumulate floats of very different magnitudes.