Today, I'm going to explain how computers store floating-point numbers. Computers cannot store these numbers the way we usually write them in decimal notation. Instead, they store everything using binary digits (0 and 1). The IEEE 754 standard was created for this purpose: it describes how floating-point numbers are stored in binary format.
A floating-point number in IEEE 754 has three parts:
- Sign
- Exponent
- Mantissa (Fraction)
We will look at just two floating-point formats because they are the most common:
- Single precision (32 bits)
- Double precision (64 bits)
Single precision

A 32-bit floating-point number is divided as follows:
- 1 bit for the sign
- 8 bits for the exponent
- 23 bits for the mantissa (fraction)

The fields work like this:
- Sign bit:
  - 0 is a positive number
  - 1 is a negative number
- Exponent:
  Stored using a bias of 127, so the stored value is the actual exponent plus 127.
- Mantissa (fraction):
  Represents the fractional part, with an implicit leading 1 (for normalized numbers).
Double Precision
A 64-bit floating-point number is divided as follows:
- 1 bit for the sign
- 11 bits for the exponent
- 52 bits for the mantissa (fraction)

The fields work like this:
- Sign bit:
  - 0 is a positive number
  - 1 is a negative number
- Exponent:
  Stored using a bias of 1023, so the stored value is the actual exponent plus 1023.
- Mantissa (fraction):
  Represents the fractional part, with an implicit leading 1 (for normalized numbers).
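To see these layouts on real values, here is a small Python sketch that splits a number into its sign, exponent, and mantissa bits for both formats. The names `LAYOUTS` and `split_fields` are my own for illustration, and it assumes Python's standard `struct` module is a fair stand-in for how the hardware stores the value.

```python
import struct

# Per total width: (float pack code, same-size unsigned int code, exponent bits).
LAYOUTS = {32: (">f", ">I", 8), 64: (">d", ">Q", 11)}


def split_fields(value, width=32):
    """Return (sign, exponent, mantissa) bit strings for `value`."""
    float_code, int_code, exp_bits = LAYOUTS[width]
    # Reinterpret the float's bytes as an unsigned integer of the same size.
    (raw,) = struct.unpack(int_code, struct.pack(float_code, value))
    bits = format(raw, f"0{width}b")
    return bits[0], bits[1:1 + exp_bits], bits[1 + exp_bits:]


print(split_fields(4.5, 32))  # 1 + 8 + 23 bits
print(split_fields(4.5, 64))  # 1 + 11 + 52 bits
```

The same number gets a different exponent field in each format because the bias differs (127 versus 1023), while the mantissa just gains extra trailing bits.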
Example: Representing 4.5
Binary representation:
0 10000001 00100000000000000000000
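- Sign is 0 (positive).
- Exponent = 10000001 (binary) = 129 (decimal). Actual exponent = 129 - 127 (bias) = 2.
- Fraction = 00100000000000000000000. With the implicit leading 1, this represents the mantissa 1.001₂ = 1 + 1/8 = 1.125.

We will use the formula:
value = (-1)^sign × 1.fraction₂ × 2^(exponent - bias)
Calculation:
(-1)^0 × 1.125 × 2^2 = 1.125 × 4 = 4.5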
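The decoding above can be sketched in Python. This is a minimal version that only handles normalized single-precision numbers, using the usual formula value = (-1)^sign × (1 + fraction / 2^23) × 2^(exponent - 127). The function name `decode_float32` is just an illustrative choice.

```python
def decode_float32(bits):
    """Decode a normalized single-precision bit string such as
    '0 10000001 00100000000000000000000' into a Python float."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = int(bits[0])
    exponent = int(bits[1:9], 2)   # stored (biased) exponent
    fraction = int(bits[9:], 2)    # 23 bits after the implicit leading 1
    if exponent in (0, 255):
        raise ValueError("this sketch handles only normalized numbers")
    mantissa = 1 + fraction / 2**23
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


print(decode_float32("0 10000001 00100000000000000000000"))  # 4.5
```

Exponent fields of all 0s or all 1s are rejected on purpose, because those patterns are reserved for special values and do not follow this formula.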
How to convert decimal to IEEE 754
1. Take the absolute value first:
Take the number -3.8125 as an example. Its absolute value is 3.8125.
Convert to binary:
- 3 = 11₂
- 0.8125 = 0.1101₂

So: 3.8125 = 11.1101₂
2. Normalize (Scientific Binary Form):
We write it as:
11.1101₂ = 1.11101₂ × 2^1
3. Sign bit:
Number is negative -> sign = 1
4. Exponent:
Use bias (127):
1 + 127 = 128
Binary:
128 = 10000000₂
5. Mantissa:
Take the fractional part after the leading 1:
1.11101 -> 11101000000000000000000
(pad with zeros to 23 bits)
Result:
We got the value 1 10000000 11101000000000000000000.
-3.8125 is easy to represent as a floating-point number in IEEE 754 because its binary expansion is finite. However, we cannot represent 0.1 exactly, unlike -3.8125.
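The five steps can also be sketched in Python. This is only an illustration under a few assumptions: it handles finite, nonzero values in the normalized single-precision range, it uses `math.frexp` for the normalization step, and the names `encode_float32` and `native_bits` are mine. The `struct` call is there only to cross-check the result.

```python
import math
import struct


def encode_float32(x):
    """Return 'sign exponent mantissa' bit strings for a normalized float32."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("this sketch handles only finite, nonzero values")
    sign = 1 if x < 0 else 0
    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1,
    # which is the same as (2 * m) * 2**(e - 1) with 1 <= 2 * m < 2.
    m, e = math.frexp(abs(x))
    exponent = e - 1
    # Keep 23 bits after the leading 1 (round() rounds ties to even).
    fraction = round((2 * m - 1) * 2**23)
    if fraction == 2**23:  # rounding carried over into the leading 1
        fraction, exponent = 0, exponent + 1
    biased = exponent + 127
    if not 1 <= biased <= 254:
        raise ValueError("outside the normalized single-precision range")
    return f"{sign} {biased:08b} {fraction:023b}"


def native_bits(x):
    """Bit pattern that struct produces for x stored as a 32-bit float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return format(raw, "032b")


print(encode_float32(-3.8125))  # 1 10000000 11101000000000000000000
print(encode_float32(-3.8125).replace(" ", "") == native_bits(-3.8125))  # True
```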
Converting 0.1 to IEEE 754 and back to decimal
This time, let's try converting the number 0.1 to IEEE 754:
1. Convert to binary:
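That's where the complications begin. We repeatedly multiply the fractional part by two and take the integer part as the next bit, until the fractional part becomes zero:
- 0.1 × 2 = 0.2 -> 0
- 0.2 × 2 = 0.4 -> 0
- 0.4 × 2 = 0.8 -> 0
- 0.8 × 2 = 1.6 -> 1
- 0.6 × 2 = 1.2 -> 1
- 0.2 × 2 = 0.4 -> 0 (from here the pattern repeats)

The fractional part never becomes zero, so we get an infinitely repeating expansion:
0.1 = 0.000110011001100...₂
2. Normalize:
0.000110011001100...₂ = 1.10011001100...₂ × 2^-4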
3. Sign bit:
Positive -> sign = 0
4. Exponent:
Use bias (127):
-4 + 127 = 123
Binary:
123 = 01111011₂
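5. Mantissa:
Take 23 bits after the leading 1:
10011001100110011001100
We have to cut the infinite sequence here. The discarded bits (11001100...) are more than half of the last kept place, so we round up:
10011001100110011001101
Final IEEE 754 Representation:
0 01111011 10011001100110011001101
But when we convert this value back into decimal, we get 0.10000000149011612 instead of 0.1.
The same kind of rounding happens in double precision. That's why 0.1 plus 0.2 does not equal 0.3 exactly and gives 0.30000000000000004 instead.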
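Both effects are easy to reproduce. The sketch below first expands a decimal fraction by repeated doubling, using exact rational arithmetic so the demo itself adds no rounding, then shows what is actually stored for 0.1 in single precision. The helper names `binary_fraction_digits` and `to_float32` are hypothetical, and the input is assumed to be a decimal string between 0 and 1.

```python
import struct
from decimal import Decimal
from fractions import Fraction


def binary_fraction_digits(decimal_text, max_bits):
    """Expand a fraction in [0, 1) by repeated doubling.

    Returns (bits, terminated): the first bits after the binary point and
    whether the expansion ended within max_bits.
    """
    frac = Fraction(decimal_text)  # exact, e.g. Fraction("0.1") == 1/10
    digits = []
    while frac != 0 and len(digits) < max_bits:
        frac *= 2
        bit = 1 if frac >= 1 else 0  # integer part is the next bit
        digits.append(str(bit))
        frac -= bit
    return "".join(digits), frac == 0


def to_float32(x):
    """Round a Python float (a double) to single precision and back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(binary_fraction_digits("0.8125", 32))  # ('1101', True)
print(binary_fraction_digits("0.1", 12))     # ('000110011001', False)
print(Decimal(to_float32(0.1)))              # exact value stored for 0.1
print(0.1 + 0.2 == 0.3)                      # False
```

Note that Python's own floats are double precision, so the last line shows the 64-bit version of the same problem.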
Special Values
- +0 / -0:
  Exponent: all 0s
  Mantissa: all 0s
  Meaning: Positive/Negative zero
  0 00000000 00000000000000000000000 = +0
  1 00000000 00000000000000000000000 = -0
- Denormalized:
  Exponent: all 0s
  Mantissa: non-zero
  Meaning: Very small numbers
  0 00000000 10000000000000000000000 = 5.877471754111438e-39
- ±Infinity:
  Exponent: all 1s
  Mantissa: all 0s
  Meaning: Overflow / division by zero
  0 11111111 00000000000000000000000 = +INF
  1 11111111 00000000000000000000000 = -INF
- NaN:
  Exponent: all 1s
  Mantissa: non-zero
  Meaning: Not a number
  0 11111111 10000000000000000000000 = NaN
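The rules in this list boil down to checking the exponent and mantissa fields. Here is a short Python sketch of that check for single precision. The function names and the category labels are my own, and `struct` is used only to turn the bit pattern into a float for comparison.

```python
import math
import struct


def classify_float32(bits):
    """Classify a 32-bit pattern by its exponent and mantissa fields."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2)
    if exponent == 0:    # all 0s
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:  # all 1s
        return "infinity" if fraction == 0 else "nan"
    return "normalized"


def bits_to_float(bits):
    """Interpret a 32-bit pattern as a single-precision float."""
    raw = int(bits.replace(" ", ""), 2)
    return struct.unpack(">f", raw.to_bytes(4, "big"))[0]


pattern = "0 00000000 10000000000000000000000"
print(classify_float32(pattern))               # denormalized
print(bits_to_float(pattern) == 2.0 ** -127)   # True
print(math.isnan(bits_to_float("0 11111111 10000000000000000000000")))  # True
```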





