Jenny Shaw

Posted on Jan 6, 2020 • Edited on Jan 8, 2020

The Binary Numeral System (in Under 7500 Bytes)

#codenewbie #todayilearned

In this second article of my Unicode blog series, I'm going to introduce the binary system.

A computer accepts, stores, processes, and outputs all data in binary -- that includes everything you consume on your computers, from the text and color on this page to images, GIFs, video, and sound.

So if we want to understand how Unicode is structured and how it's possible for us to have access to and use over 150 scripts and over 3000 emojis, we need to understand a little about how binary works.

What are Binary Numbers?

Binary data is composed of 1s and 0s.

A binary digit, or a bit, is the smallest unit or building block of information that can be stored in a computer, and it can have one of two values, either 0 or 1. Varying sequences of bits strung together can each represent unique data. So, with more bits, we can compose more complex pieces of information.
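To make the idea of "varying sequences of bits" concrete, here is a small Python sketch that lists every pattern a given number of bits can form. The function name `bit_patterns` is made up for this illustration.

```python
from itertools import product


def bit_patterns(bit_count):
    """Return every distinct sequence of 0s and 1s of the given length."""
    return ["".join(pattern) for pattern in product("01", repeat=bit_count)]


print(bit_patterns(2))       # ['00', '01', '10', '11']
print(len(bit_patterns(3)))  # 8
```

Each extra bit doubles the number of patterns, which is why more bits leave room for more complex information.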

Sizing Up

Decimal Values	Decimal Measure Names	Decimal Symbols
10⁰	Byte	B
10³	Kilobyte	KB
10⁶	Megabyte	MB
10⁹	Gigabyte	GB
10¹²	Terabyte	TB

Binary Values	Binary Measure Names	Binary Symbols
2⁰	Byte	B
2¹⁰	Kibibyte	KiB
2²⁰	Mebibyte	MiB
2³⁰	Gibibyte	GiB
2⁴⁰	Tebibyte	TiB
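The two tables look alike, but the units drift apart as they grow. The sketch below is only an illustration (the dictionary and function names are my own) of how much larger each binary unit is than its decimal counterpart.

```python
# Unit sizes in bytes, taken from the two tables above.
DECIMAL_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def unit_gap_percent(decimal_symbol, binary_symbol):
    """How much larger the binary unit is than the decimal one, in percent."""
    decimal_size = DECIMAL_UNITS[decimal_symbol]
    binary_size = BINARY_UNITS[binary_symbol]
    return (binary_size - decimal_size) / decimal_size * 100


# Both dictionaries are ordered from smallest to largest, so zip pairs them up.
for decimal_symbol, binary_symbol in zip(DECIMAL_UNITS, BINARY_UNITS):
    gap = unit_gap_percent(decimal_symbol, binary_symbol)
    size = BINARY_UNITS[binary_symbol]
    print(f"1 {binary_symbol} = {size:,} bytes ({gap:.1f}% larger than 1 {decimal_symbol})")
```

A kibibyte is 1,024 bytes, only 2.4% more than a kilobyte, but the gap widens with every step up the table.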

A byte consists of 8 bits. We'll discuss this in more depth later on when we get to ASCII and Unicode, but a byte is typically enough to store 1 typed character. That means your longest tweet will usually max out at 280 bytes, and that 500-word cover letter I recently slaved over for a certain company was likely just under 3000 bytes, nearly 3 kilobytes -- but I won't dwell on that.
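If you'd like to check the "one byte per typed character" rule of thumb yourself, here is a rough Python sketch. It assumes plain English text encoded as UTF-8, where each basic character takes one byte; other characters can take more. The helper name `byte_length` is hypothetical.

```python
def byte_length(text, encoding="utf-8"):
    """Count how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


print(byte_length("A"))              # 1
print(byte_length("Hello, world!"))  # 13

# A 280-character tweet made only of basic Latin letters.
tweet = "x" * 280
print(byte_length(tweet))      # 280
print(byte_length(tweet) * 8)  # 2240 bits, since a byte is 8 bits
```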

An image is composed of pixels, each represented by a binary number. A high-resolution image can contain millions of pixels, which means that an image of that quality could be anywhere between 1 and 5 megabytes.

A video is basically a collection of sequential frames or images that runs at 24 to 30 frames per second, and that's a lot more data than just one image! A YouTube video in HD could require as much as 12 megabytes per minute. But if we were to stretch to movie-length, then that could require as much as 4 gigabytes.

The amount of data required to fill a terabyte would be extraordinary. Check out the numbers from this blog and Dropbox and you'll quickly get the idea.

How Does the Binary System Work?

I think it's easiest to explain the binary system by refreshing our memories with a system we're all already very familiar with.

Quick Review: The Decimal System

When we learned how to count bigger numbers in school, we were taught to memorize base 10 place values from right to left -- 1s, 10s, 100s, 1000s, and so on. And we learned that each place fit one digit, and each digit value or state could range from 0 to 9.

Base Ten	10³	10²	10¹	10⁰
Place Values	1000	100	10	1

So if we were to break down our present year, 2020, it would have a value of 2 thousands, 0 hundreds, 2 tens, and 0 ones.

Digit / State	2	0	2	0
Place Values	10³	10²	10¹	10⁰

So when we multiply the digits with their respective place values and sum them together, we get the value 2020.

Let's do the math:

(2 * 1000) + (0 * 100) + (2 * 10) + (0 * 1)

= 2000 + 0 + 20 + 0

= 2020
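The same breakdown can be sketched in a few lines of Python. This is just an illustration for non-negative whole numbers, and `expand_decimal` is a name I made up for it.

```python
def expand_decimal(number):
    """Pair each digit of a non-negative integer with its base 10 place value."""
    digits = [int(character) for character in str(number)]
    highest_power = len(digits) - 1
    return [
        (digit, 10 ** (highest_power - position))
        for position, digit in enumerate(digits)
    ]


terms = expand_decimal(2020)
print(terms)  # [(2, 1000), (0, 100), (2, 10), (0, 1)]

# Multiply each digit by its place value and add everything up.
print(sum(digit * place_value for digit, place_value in terms))  # 2020
```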

Now Introducing: The Binary System

The binary system, on the other hand, has place values that are powers of 2.

Base Two	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
Place Values	128	64	32	16	8	4	2	1

And the system has only 2 states available, 0 and 1.
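As a quick sanity check on the table above, this small Python sketch generates the place values for any number of places. The function `place_values` and its `base` parameter are assumptions of this example, not anything standard.

```python
def place_values(place_count, base=2):
    """List the place values from the leftmost place down to the 1s place."""
    return [base ** power for power in range(place_count - 1, -1, -1)]


print(place_values(8))           # [128, 64, 32, 16, 8, 4, 2, 1]
print(place_values(4, base=10))  # [1000, 100, 10, 1]
```

The only thing that changes between the decimal and binary tables is the base being raised to each power.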

Counting Up in Binary

So let's start counting. The first two numbers are easy --

0, 1...

Great! We made it this far -- but how do we count to the next sequence of numbers if we're limited to using only 0s and 1s?

Just like with the decimal system, we assign a state to each place value.

So, to count to two, we'd place a 1 over the 2s place.

Binary Number	0	0	1	0
Place Values	8	4	2	1

Let's find the decimal value here:

(1 x 2) + (0 x 1)

= 2 + 0

= 2

And to count to three, we'd place a 1 over the 2s and the 1s place.

Binary Number	0	0	1	1
Place Values	8	4	2	1

(1 x 2) + (1 x 1)

= 2 + 1

= 3
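Here is one way to sketch that multiply-and-sum step in Python. The function name `binary_to_decimal` is my own, and it expects the binary number as a string of 0s and 1s.

```python
def binary_to_decimal(bits):
    """Sum the place values of every position that holds a 1."""
    total = 0
    place_value = 1  # the rightmost place is the 1s place
    for bit in reversed(bits):
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * place_value
        place_value *= 2  # each place to the left is worth twice as much
    return total


print(binary_to_decimal("0010"))  # 2
print(binary_to_decimal("0011"))  # 3
```

Python's built-in `int("0011", 2)` does the same conversion; the loop is spelled out here only to mirror the tables.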

Good job! We can count to three!

Counting with the Bits You're Given

Currently, our tables contain 4 bits. If we were to max out all 4 bits, our largest value would be 15 (and the number of distinct values 4 bits can store is 16 when we include 0).

Binary Number	1	1	1	1
Place Values	8	4	2	1

(1 x 8) + (1 x 4) + (1 x 2) + (1 x 1)

= 8 + 4 + 2 + 1

= 15

Handy Tip: If you have ten fingers to count with, you can count up to 1023 in binary!
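Both the 4-bit maximum and the ten-finger tip follow from the same pattern, which this short Python sketch spells out. The helper names are hypothetical.

```python
def max_value(bit_count):
    """Largest number we get when every one of the bits is set to 1."""
    return 2 ** bit_count - 1


def value_count(bit_count):
    """How many distinct values the bits can hold, counting 0."""
    return 2 ** bit_count


print(max_value(4), value_count(4))  # 15 16
print(max_value(10))                 # 1023, one bit per finger
```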

Converting Decimal Numbers to Binary

Now, let's try converting our year into binary. Because 2020 is such a large value and well over 15, it's going to require a lot more bits.

Base Two	2¹¹	2¹⁰	2⁹	2⁸	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
Place Values	2048	1024	512	256	128	64	32	16	8	4	2	1

To do this, we'll find the place that is equal to our target value, or the largest-valued place that is less than our target value, and assign it a state of 1.

We subtract that place value, 1024, from our target, 2020. And that leaves us with the remainder, 996.

Then we repeat the process with the following place values. If our remainder is ever less than the place value, we assign it a state of 0 and move on to the next place. We continue this algorithm until we're left with a remainder of 0, and any places still left over get a state of 0.

Here's our result!

Binary Number	0	1	1	1	1	1	1	0	0	1	0	0
Place Values	2048	1024	512	256	128	64	32	16	8	4	2	1

Now we know -- 2020 is equal to 011111100100 in binary!
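The subtract-and-move-on steps described above can be sketched in Python like this. It's a minimal version under my own assumptions: the function name `decimal_to_binary` and the default of 12 bits are chosen only to match the table.

```python
def decimal_to_binary(target, bit_count=12):
    """Convert a non-negative integer to binary by subtracting place values."""
    if not 0 <= target < 2 ** bit_count:
        raise ValueError(f"{target} does not fit in {bit_count} bits")

    remainder = target
    bits = []
    # Walk the place values from the largest down to the 1s place.
    for power in range(bit_count - 1, -1, -1):
        place_value = 2 ** power
        if place_value <= remainder:
            bits.append("1")          # the place fits, so switch it on...
            remainder -= place_value  # ...and subtract it from what's left
        else:
            bits.append("0")          # the place is too big, so leave it off
    return "".join(bits)


print(decimal_to_binary(2020))  # 011111100100
```

Python can also do this directly with `format(2020, "012b")`, which pads the result to 12 binary digits.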

Note: Don't be fooled by place values. It's easy to see that the 12th place of the binary system (2¹¹) has a value of 2048, as above, and underestimate the potential of 12 bits. Don't forget that with 12 bits, by using every available place, we can actually count up to 2¹² - 1. That's 4095 in decimal, nearly double that place value! So remember that when we're provided with more bits, we get substantially more room for data.

Applying Binary to Character Encoding

If you read my previous blog on character encoding, you'll see that the binary system is important here because it is the language that computers communicate in and is therefore essential for character encoding. And the more bits available, the more space there is to accommodate more data as well as more complex data, such as additional language scripts.

So, coming up in later posts, I'll discuss ASCII's 7-bit and Unicode's 16-bit encoding systems, how those systems and charts work, and what potential each of those systems can offer technologically as well as socially.
