Tomik

Posted on Mar 23 • Edited on Mar 30

0.1 + 0.2 != 0.3

#programming #computerscience

Today, I'm going to explain how computers store floating-point numbers. Computers cannot store these numbers the way we usually write them in decimal. Instead, they store everything using binary digits (0 and 1). The IEEE 754 standard was created for this purpose: it describes how floating-point numbers are stored in binary format.

A floating-point number in IEEE 754 has three parts:

Sign
Exponent
Mantissa (Fraction)

We will look at just the two most common floating-point formats:

Single precision (32 bits)
Double precision (64 bits)

Single precision

A 32-bit floating-point number is divided as follows:

1 bit for the sign
8 bits for the exponent
23 bits for the mantissa (fraction)

Sign bit:
- 0 means the number is positive
- 1 means the number is negative

Exponent:
Stored using a bias of 127, so the stored value is the actual exponent plus 127.

Mantissa (fraction):
Represents the fractional part, with an implicit leading 1 (for normalized numbers).
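The layout above can be checked from code. This is a minimal sketch using Python's standard `struct` module; the function name `float32_fields` is made up for illustration.

```python
import struct

def float32_fields(x):
    """Return (sign, stored exponent, fraction) of x rounded to single precision."""
    # Pack x as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # top bit
    exponent = (bits >> 23) & 0xFF  # next 8 bits, still including the bias of 127
    fraction = bits & 0x7FFFFF      # low 23 bits
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-2.0)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
# prints: 1 10000000 00000000000000000000000
```

Here -2.0 is 1.0 × 2^1 with a negative sign, so the stored exponent is 1 + 127 = 128 and the fraction bits are all zero.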

Double Precision

A 64-bit floating-point number is divided as follows:

1 bit for the sign
11 bits for the exponent
52 bits for the mantissa (fraction)

Sign bit:
- 0 means the number is positive
- 1 means the number is negative

Exponent:
Stored using a bias of 1023, so the stored value is the actual exponent plus 1023.

Mantissa (fraction):
Represents the fractional part, with an implicit leading 1 (for normalized numbers).
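The same kind of check works for the 64-bit layout. This sketch assumes a platform where Python's `float` is an IEEE 754 double, which is the usual case; `float64_fields` is an illustrative name.

```python
import struct
import sys

def float64_fields(x):
    """Return (sign, stored exponent, fraction) of a double-precision value."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # top bit
    exponent = (bits >> 52) & 0x7FF     # next 11 bits, still including the bias of 1023
    fraction = bits & ((1 << 52) - 1)   # low 52 bits
    return sign, exponent, fraction

print(sys.float_info.mant_dig)  # 53 on IEEE 754 platforms: 52 stored bits + the implicit 1
print(float64_fields(1.0))      # (0, 1023, 0)
```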

Example: Representing 4.5

Binary representation:
0 10000001 00100000000000000000000

sign is 0 (positive)
exponent = 10000001 (binary) = 129 (decimal)

Actual exponent = 129 - 127(bias) = 2
fraction = 00100000000000000000000

This represents the mantissa:

$1.001_2 = 1 + \frac{1}{8} = 1.125$

We will use the formula:

$value=(-1)^{sign}\times2^{(exponent-bias)}\times(1+fraction)$

Calculation:

$(-1)^0 \times 2^2 \times 1.125 = 4.5$
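The formula translates almost directly into code. This is a small sketch that only handles normalized single-precision numbers; `decode_float32` is a hypothetical helper name.

```python
def decode_float32(bit_string):
    """Decode a normalized single-precision pattern such as '0 10000001 0010...'."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)          # stored exponent, bias not yet removed
    fraction = int(bits[9:], 2) / 2**23   # fraction field as a value in [0, 1)
    return (-1) ** sign * 2.0 ** (exponent - 127) * (1 + fraction)

print(decode_float32("0 10000001 00100000000000000000000"))  # 4.5
```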

How to convert decimal to IEEE 754

1. Take the absolute value first:

Take the number -3.8125 as an example. Its absolute value is 3.8125.
Convert the integer and fractional parts to binary:

3 = 11₂
0.8125 = 0.1101₂
So: 3.8125 = 11.1101₂

2. Normalize (Scientific Binary Form):

We write it as:

$1.11101_2\times2^1$

3. Sign bit:

Number is negative -> sign = 1

4. Exponent:

Use bias (127):

1 + 127 = 128

Binary:

$128 = 10000000_2$

5. Mantissa:

Take the fractional part after the leading 1:
1.11101 -> 11101000000000000000000
(pad with zeros to 23 bits)

Result:

We got the value 1 10000000 11101000000000000000000.
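The five steps can be sketched in code and compared with what the machine actually stores. This is only an illustration: `encode_float32` is a made-up name, and it truncates the fraction instead of rounding, so it is only right for normal values that fit exactly into 23 fraction bits.

```python
import math
import struct

def encode_float32(x):
    """Follow the manual steps for a normal value that fits exactly in single precision."""
    sign = 1 if x < 0 else 0
    # math.frexp returns (m, e) with abs(x) == m * 2**e and 0.5 <= m < 1,
    # so rescale to the 1.xxx * 2**exponent form used in the text.
    m, e = math.frexp(abs(x))
    significand, exponent = m * 2, e - 1
    stored_exponent = exponent + 127           # add the bias
    fraction = int((significand - 1) * 2**23)  # drop the leading 1, keep 23 bits
    return f"{sign} {stored_exponent:08b} {fraction:023b}"

print(encode_float32(-3.8125))
# prints: 1 10000000 11101000000000000000000

# Cross-check against the bits produced by the struct module
(bits,) = struct.unpack(">I", struct.pack(">f", -3.8125))
print(f"{bits:032b}")
# prints: 11000000011101000000000000000000
```

Both outputs contain the same 32 bits; the second one just has no spaces between the fields.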

-3.8125 can be represented exactly in IEEE 754 because its binary expansion is finite. However, we cannot represent 0.1 exactly.

Converting 0.1 to IEEE 754

This time, let's try converting the number 0.1 to IEEE 754:

1. Convert to binary:

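That's where the complications begin. To convert the fractional part, we repeatedly multiply it by two and take the integer part of each result as the next binary digit, stopping when the fractional part becomes zero. For 0.1 that never happens: the fractional parts cycle through 0.2, 0.4, 0.8, 0.6 and back to 0.2.

As a result, we get an infinitely repeating binary expansion:

$0.1_{10}=0.000110011001100110011..._2$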
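The repeated multiplication by two is easy to sketch. This version uses `fractions.Fraction` so that the input is exactly one tenth and not an already rounded float; `binary_fraction_digits` is an illustrative name.

```python
from fractions import Fraction

def binary_fraction_digits(value, count):
    """Return the first `count` binary digits after the point for 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2
        if value >= 1:
            digits.append("1")   # integer part is 1: emit it and keep the remainder
            value -= 1
        else:
            digits.append("0")
    return "".join(digits)

print(binary_fraction_digits(Fraction(1, 10), 21))   # 000110011001100110011
print(binary_fraction_digits(Fraction(13, 16), 8))   # 11010000
```

For 13/16 = 0.8125 the remainder reaches zero after four digits, so the rest are zeros. For 1/10 the pattern 0011 keeps repeating no matter how many digits are requested.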

2. Normalize:

$0.0001100110011..._2 = 1.1001100110011..._2\times2^{-4}$

3. Sign Bit

Positive -> sign = 0

4. Exponent

−4 + 127 = 123

Binary:

$123 = 01111011_2$

5. Mantissa:

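Take the first 23 bits after the leading 1:
10011001100110011001100
The bits that follow are 1100..., which is more than half of the last kept bit, so rounding to nearest rounds the mantissa up:
10011001100110011001101

Final IEEE 754 Representation:

0 01111011 10011001100110011001101
Converting this value back to decimal gives 0.10000000149011612, not 0.1.
Double precision rounds 0.1 and 0.2 in the same way, just with more bits. That's why, in languages that use doubles by default, 0.1 + 0.2 does not equal 0.3 exactly but gives 0.30000000000000004.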
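Both effects can be reproduced in a few lines. This sketch assumes Python's `float` is an IEEE 754 double (the usual case) and uses `struct` to round 0.1 to single precision.

```python
import struct
from decimal import Decimal

# Round 0.1 to single precision, then widen it back to a Python float
(as_float32,) = struct.unpack(">f", struct.pack(">f", 0.1))
print(as_float32)            # 0.10000000149011612
print(Decimal(as_float32))   # 0.100000001490116119384765625 (the exact stored value)

# Python floats are doubles, so this is the double-precision result
print(0.1 + 0.2)             # 0.30000000000000004
print(0.1 + 0.2 == 0.3)      # False
```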

Special Values

+0 / -0:

Exponent: all 0s
Mantissa: all 0s
Meaning: Positive/Negative zero

0 00000000 00000000000000000000000 = +0 
1 00000000 00000000000000000000000 = -0

Denormalized:

Exponent: all 0s
Mantissa: non-zero
Meaning: Very small numbers close to zero, stored without the implicit leading 1

0 00000000 10000000000000000000000 = 5.877471754111438e-39

±Infinity:

Exponent: all 1s
Mantissa: all 0s
Meaning: Overflow / division by zero

0 11111111 00000000000000000000000 = +INF   
1 11111111 00000000000000000000000 = -INF

NaN:

Exponent: all 1s
Mantissa: non-zero
Meaning: Not a number (for example, the result of 0/0)

0 11111111 10000000000000000000000 = NaN
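The special bit patterns above can be decoded to see what they turn into. This is a minimal sketch with Python's `struct` module; `from_bits` is a hypothetical helper.

```python
import math
import struct

def from_bits(bit_string):
    """Interpret a 32-bit pattern (spaces allowed) as a single-precision float."""
    bits = int(bit_string.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

print(from_bits("1 00000000 00000000000000000000000"))  # -0.0
print(from_bits("0 00000000 10000000000000000000000"))  # 5.877471754111438e-39
print(from_bits("0 11111111 00000000000000000000000"))  # inf
print(from_bits("0 11111111 10000000000000000000000"))  # nan
```

The denormalized example has no implicit leading 1, so its value is 0.5 × 2^-126 = 2^-127.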
