It was after midnight, and I couldn't sleep. I opened up Chrome on my phone and I'd pulled up an editorial about COBOL earlier in the evening, and it gets me thinking about how our computers actually do math. The author linked an article all about issues with floating point math and some of the issues it's caused. You'd be surprised how many issues come from just not realizing that 0.1 can't be exactly represented in binary. So I decided to check out an example in the article with their test values, x=77617, y=33096:

Thanks to Quick LaTeX for the formatting!

# Trying It Out

I wanted to see if I could replicate their findings:

Calculation Precision | Result |
---|---|

Single | 1.172603... |

Double | 1.1726039400531... |

Extended | 1.172603940053178... |

Actual Value | -0.827396059946 |

So I implemented the function in Clojure:

```
(defn trial-double [x y]
(+
(* 333.75 (Math/pow y 6))
(* x x (-
(* 11 x x y y)
(Math/pow y 6)
(* 121 (Math/pow y 4))
2.0))
(* 5.5 (Math/pow y 8))
(/ x 2 y)))
```

Calling `(trial-double 77617 33096)`

yields the result `-1.1805916207174113E21`

. Oddly, this is **not** the same incorrect answer that they achieved in the article. Mine was 21 orders of magnitude different, and the wrong sign! I tested it out in Excel as well...

```
=333.75 * y^6 + x^2 * (11*x^2*y^2 - y^6 - 121 * y^4 - 2) + 5.5 * y^8 + x/2/y)
```

... and I got the same wrong number that I did with Clojure!

However, as the article mentions, this is **not** the correct answer. So I set out to calculate the right answer. I ended up trying it out with a combination of Clojure's Ratio type and Java's BigInteger type (since Clojure's BigInt had an overflow issue):

```
(defn trial-rational [^java.math.BigInteger x ^java.math.BigInteger y]
(+
(* 1335/4 (.pow y 6))
(* x x (-
(* 11 x x y y)
(.pow y 6)
(* 121 (.pow y 4))
2))
(* 11/2 (.pow y 8))
(/ x 2 y)))
```

Evaluating this with the test values, `(trial-rational (biginteger 77617) (biginteger 33096)`

yielded the fraction `-54767/66192`

. Coercing that to a `double`

evaluated to `-0.8273960599468214`

. I'd gotten the right answer!

# What Does It Mean?

While this example was clearly meant to show that floating point operations can break in dramatic fashion, let's not chalk it up as just an academic example. Instead, let's think about things that you've written in the past. How many times have you tried to increment something by 0.1 or 0.01, only to have errors accrue? If I add up 0.1 ten times (like you would expect to do in a simulation with 10 Hz data), depending on the precision I get either 1.0000000149011612 or 0.9999999999999999, **not** 1.0. Accretion of floating point errors just like this caused Patriot missiles to miss their targets in the 90s during the Gulf War, and soldiers died as a result. Regardless of your stance on the military, I think we can all agree that people ought not die because people and machines represent numbers differently.

# How Did You Get Here from COBOL?

The editorial that got me thinking about all this was diving into why a lot of the financial sector - especially the IRS - was still using COBOL. It turns out that floating point calculations were easier to understand (though more costly to perform at the time) than fixed point calculations, and so they progressively won out. However, when more exact representation of numbers are required, it turns out that fixed point gets the job done much better. I have no clue if the formula at the center of this thought exercise would be --more accurate-- even in the ballpark of the correct answer, so I can't say that it is always going to be better than floating point calcs.

# What Should We Do?

As it turns out, floating points are *generally* accurate enough for most applications, and that accuracy tends to degrade only as you add/multiply more and more numbers that don't have an exact floating point representations. However, if you absolutely have to have accuracy, there are a couple of options.

## Store as Integers

If you're consistently dealing with a consistent number of decimal places, you could consider storing values as integers with a known scale factor. For example, You Need a Budget stores currency values as "milliunits". So (USD) $123.45 is stored as 123450 millidollars. Why not 100? Because some currencies (they list the Jordanian dinar as an example) go to 3 decimal places. Storing things as scaled integers and casting to floating point as needed will increase the accuracy of your calculations in this case.

## Use Rational Numbers

If you find yourself dealing with fractional values often, it might be worth considering a library for rational numbers. Some languages have them built-in - Haskell and Clojure come immediately to mind, and I'm sure that others exist. Using rational numbers under the hood and only coercing to a floating point when necessary will reduce the amount of error that can creep into your calculations.

## Symbolic Computing

What if you find yourself dealing with lots of numerical methods? Anyone who has dealt with ordinary differential equation solvers has had to worry about floating point errors in fixed step solvers (Runge-Kutta 4 from Numeric Recipes, anyone?). While you could always migrate to an adaptive step size solver (assuming you have a continuous function), we now have the computing power available to us to perform symbolic computations now. Think of it like Wolfram Alpha but built in to your program. Remember the power rule for derivatives? So do symbolic computation engines. Colin Smith has ported a lot of Gerald Sussman's ideas from SICP and Structure and Interpretation of Classical Mechanics (SICM) from Scheme into Clojure. If you don't need to go quite that far, Automatic Differentiation may work out well.

# Conclusion

Floating point calculations are generally accurate enough. If you need better accuracy, thankfully there are several options available that you can explore. Just remember to take the time to determine what level of accuracy you need (or more importantly, what your customers need), and pick the type of calculations that suit those needs best. Good luck, and happy coding!

## Top comments (7)

Good question! So the powers lead to large values, but the precision being lost is due to there being data lost at lower values. For example, if I add 1,000,000,000 and 0.00000005, I have values at two very different orders of magnitude. In all likelihood, in single precision (at least, maybe double precision also), I will

losethat data behind the decimal place. Effectively, I'll end up with a stored floating point value of 1,000,000,000, and not 1,000,000,000.00000005. Floating points work very well when you have data in similar orders of magnitude, but not in the above example.Hopefully that clears some things up! If not, I'd be happy to try to provide a better explanation :-)

It actually is possible to represent

`0.1`

exactly in binary, but not in IEEE Floating Point. Check out`unum`

by John Gustafson.You are absolutely correct; there are multiple varieties of floating points available out there. I should have specified IEEE-754 definitions since those are the ubiquitous (but not sole) representations of floating points in modern computing. Thanks for the clarification!

Thank you! I was actually about to write something similar.

I often see people that use nodejs for payment stuff. Even here on dev.to there was an interview with a guy who said that they factored out a payment system and wrote it with

nodejs and mongodb! That's like asking someone: "What are two worst tools for building a Payment system?"Why? Because Javascript has

onlyfloating point numbers and payment data is inherently relational.Really gets me angry

Interesting article Jason! The project I'm working on is trying to deal with the floating-point accuracy problem of moving a player around in an "infinite" world.