In this article, we deal with items 3-4 of the following key skills that build up to understanding the floating point system:

- Define and identify the most and least significant bits in a word. Count how many combinations are possible in a word given its bit length.
- Encode a positive integer in binary. Decode an unsigned integer to decimal.
- Encode an integer in two's complement notation. Decode a signed integer to decimal.
- Use sign extension to expand the number of bits in a word. Explain the mathematics.
- Convert a decimal fraction to binary. Explain why this does not require any particular encoding scheme, such as a floating point standard.

# Two's complement notation

Two's complement notation is a practical and widely-used way to represent signed integers. This notation is not tied to any particular computer architecture, real or abstract.

The only change from unsigned notation is that the most significant bit (the 31st bit in our case) has the same magnitude but a negative sign: If we use 32-bit words, then its value is -2^31 , not 2^31 .

Because of that, the available range is shifted by -2^31 units to the left. In unsigned notation, the range of representation was [0, 2^32 - 1]. In signed notation, the range of representation is [-2^31, 2^31 - 1]. Note that 2^32 is exactly twice the value of 2^31 , since they are subsequent powers of two (the base 2 makes them twice as different). Hence, 2^32 - 2^31 - 1 = (2^31 + 2^31 ) - 2^31 - 1 = 2^31 - 1.

It takes practice to understand how this change affects arithmetic. Let's get into practice.

## Practice

**QUESTION 1** Convert -56 to a signed binary integer.

**Answer**: We use the same process we used in unsigned notation. The "smaller" as in "closest smaller power of two" still means actually smaller (not absolute value).

In this case, -64 is the closest power of two to -56 that is smaller than it. Use the subtraction: -56 - (-64) = 8. -64 is -2^6 .

8 happens to be the third power of two.

Use bit positions 6 and 3: `100 1000`

. 8 - 64 is indeed -56.

If we were to use all eight bits of the byte, then the answer would be `1100 1000`

. -128 + 64 + 8 is also -56. This is not a coincidence: We will explore the *sign expansion* technique in a bit.

**QUESTION 2** Convert `101 1001`

to a decimal.

Add the powers of two. The most significant bit has a negative value assigned. So, the summation looks like: -2^6 + 2^4 + 2^3 + 2^0 = -39.

What about `0101 1001`

? Its summation looks like 2^6 + 2^4 + 2^3 + 2^0 = 89. So, this is not the same as before. In fact, it is a positive number.

Lastly, consider `1101 1001`

. Its summation looks like -2^7 + 2^6 + 2^4 + 2^3 + 2^0 = -39.

So, we put as many 1's to the left as possible if we wanted to expand the number of bits?

**QUESTION 3** There is a byte position in your memory that reads `0101 0100`

. You are asked to expand it to a four-byte word.

The given notation represents the number 84. We could either make it `1111 1111 1111 1111 1111 1111 0101 0100`

or `0000 0000 0000 0000 0000 0000 0101 0100`

. Let's determine the values of these competing representations.

The first represents -172. The second represents 84. In hindsight, it is clear that the first one would never work in the first place because the most significant digit is 1---it represents a negative integer. So, we should use the second representation.

# Sign expansion

As we see through examples, if we have a representation like `y...`

, and if we wanted to increase the number of bits in it, then we do not either just add 0s or 1s to the left of the MSB: The result may or may not equal the original number.

In fact, if we have `y...`

, then the correct expansion is `y...y...`

. That is, duplicate the MSB to the left.

For example, if the byte representation `1111 1111`

was to be now represented by a 2-byte string, then the result should be `1111 1111 1111 1111`

. In case of `0000 0000`

, it should be `0000 0000 0000 0000`

.

## The Mathematics

The technique of sign expansion relies on the fact that -2^n+1 + 2^n (`11...`

) is the same as -2^n (`1...`

). The following proof draws from the idea that they are powers of two, so one of them has a magnitude (absolute value) exactly twice as great as the other.

-2^n+1 + 2^n =?= -2^n

-( 2^n + 2^n ) + 2^n =?= -2^n

-2^n =?= -2^n

True.

# Bonus: Overflow detection when all operands agree on sign

If all operands have the same sign (positive or negative), then if the result has a different sign, then overflow happened.

How? If we add all positive numbers, then the result should be positive. If it is negative, then overflow happened, since it is arithmetically impossible.

## Practice

**QUESTION 1** Sign-expand the 16-bit word `1101 0101 0010 0000`

to 32 bits.

**Answer**: `1111 1111 1111 1111 1101 0101 0010 0000`

.

**Question 2** Is this statement true or false, in all cases? "If overflow did not happen, then at least one operand must had had a different sign than the rest."

**Answer**: False. `0001 + 0001 = 0010`

. No overflow, but all operands agree on sign.

**Question 3** "If overflow did not happen, then either or both of the following must be false: There was a sign agreement among the operands; the result was the same sign."

**Answer**: True by contraposition.

## Top comments (0)