In this second article of my Unicode blog series, I'm going to introduce the binary system.
A computer accepts, stores, processes, and outputs all data in binary. That includes everything you consume on your computer, from the text and color on this page to images, GIFs, video, and sound.
So if we want to understand how Unicode is structured and how it's possible for us to access and use over 150 scripts and over 3,000 emojis, we need to understand a little about how binary works.
Binary data is composed of 1s and 0s.
A binary digit, or a bit, is the smallest unit or building block of information that can be stored in a computer, and it can have one of two values, either 0 or 1. Varying sequences of bits strung together can each represent unique data. So, with more bits, we can compose more complex pieces of information.
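To make "more bits, more information" concrete, here is a minimal Python sketch that lists every distinct sequence you can build from a given number of bits. The function name `bit_patterns` is my own, made up for illustration.

```python
from itertools import product

def bit_patterns(n_bits):
    """Return every distinct sequence of n_bits bits, as strings of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

# Each extra bit doubles the number of distinct sequences available.
for n in (1, 2, 3):
    patterns = bit_patterns(n)
    print(n, "bit(s):", len(patterns), "patterns ->", patterns)
```

With 1 bit we get 2 patterns, with 2 bits we get 4, and with 3 bits we get 8, so the count doubles every time a bit is added.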
|Decimal Values||Decimal Measure Names||Decimal Symbols|
|Binary Values||Binary Measure Names||Binary Symbols|
A byte consists of 8 bits. We'll discuss this in more depth later on when we get to ASCII and Unicode, but a byte is typically enough to store 1 typed character. That means your longest tweet will usually max out at around 280 bytes, and that 500-word cover letter I recently slaved over for a certain company was likely just under 3000 bytes, nearly 3 kilobytes, but I won't dwell on that.
An image is composed of pixels, each represented by a binary number. A high-resolution image can contain millions of pixels, which means that an image of that quality could be anywhere from 1 to 5 megabytes.
A video is basically a collection of sequential frames, or images, that runs at 24 to 30 frames per second, and that's a lot more data than just one image! A YouTube video in HD could require as much as 12 megabytes per minute. And if we were to stretch that to movie length, it could require as much as 4 gigabytes.
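If you'd like to check the one-byte-per-character rule of thumb yourself, here is a small Python sketch. The helper name `byte_length` is hypothetical, and the sample strings are just placeholders. It also shows why I said "typically": a character outside plain English text can need more than one byte, depending on the encoding.

```python
def byte_length(text, encoding="ascii"):
    """Count the bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

print(byte_length("Hello, world!"))      # 13 characters, 13 bytes in ASCII
print(byte_length("a" * 280))            # a 280-character message, 280 bytes
print(byte_length("\u00e9", "utf-8"))    # an accented e takes 2 bytes in UTF-8
```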
I think it's easiest to explain the binary system by refreshing our memories with a system we're all already very familiar with.
When we learned how to count bigger numbers in school, we were taught to memorize base 10 place values from right to left -- 1s, 10s, 100s, 1000s, and so on. And we learned that each place fit one digit, and each digit value or state could range from 0 to 9.
So if we were to break down our present year, 2020, it would have a value of 2 thousands, 0 hundreds, 2 tens, and 0 ones.
|Place Value||1000||100||10||1|
|Digit / State||2||0||2||0|
So when we multiply the digits by their respective place values and sum them together, we get the value 2020.
Let's do the math:
(2 * 1000) + (0 * 100) + (2 * 10) + (0 * 1)
= 2000 + 0 + 20 + 0
= 2020
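The same place-value breakdown can be sketched in a few lines of Python. This is only an illustration, and `place_value_terms` is a name I made up for it. It peels off one digit at a time and pairs it with its place value.

```python
def place_value_terms(number, base=10):
    """Split a non-negative integer into (digit, place value) pairs, highest place first."""
    terms = []
    place = 1
    while number > 0:
        # divmod gives us the remaining number and the rightmost digit.
        number, digit = divmod(number, base)
        terms.append((digit, place))
        place *= base
    return list(reversed(terms)) or [(0, 1)]

terms = place_value_terms(2020)
print(terms)                                      # [(2, 1000), (0, 100), (2, 10), (0, 1)]
print(sum(digit * place for digit, place in terms))  # 2020
```

The `base` parameter defaults to 10 here, but nothing about the approach is specific to decimal.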
The binary system, on the other hand, has place values based on powers of 2: 1s, 2s, 4s, 8s, and so on.
And the system has only 2 states available, 0 and 1.
So let's start counting. The first two numbers are easy: 0 and 1.
Great! We made it this far. But how do we count to the next numbers if we're limited to using only 0s and 1s?
Just like with the decimal system, we assign a state to each place value.
So, to count to two, we'd place a 1 in the 2s place and a 0 in the 1s place, which gives us 10.
Let's find the decimal value here:
(1 x 2) + (0 x 1)
= 2 + 0
= 2
And to count to three, we'd place a 1 in both the 2s and the 1s places, which gives us 11.
(1 x 2) + (1 x 1)
= 2 + 1
= 3
Good job! We can count to three!
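If you want to double-check the counting so far, Python's built-in `format` can print a number in binary. Here's a quick sketch; the wrapper `count_in_binary` is just a name I picked for this example.

```python
def count_in_binary(last, width):
    """List the binary forms of 0 through last, padded with zeros to width places."""
    return [format(n, "0{}b".format(width)) for n in range(last + 1)]

# Count from zero to three using two places (the 2s place and the 1s place).
for decimal, binary in enumerate(count_in_binary(3, 2)):
    print(decimal, "->", binary)
```

This prints 00, 01, 10, and 11 next to the decimal values 0 through 3.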
Currently, our tables contain 4 bits. If we were to max out all 4 bits (1111), our largest value would be 15, and the number of distinct values 4 bits can store is 16 once we include 0.
(1 x 8) + (1 x 4) + (1 x 2) + (1 x 1)
= 8 + 4 + 2 + 1
= 15
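The multiply-and-sum steps we've been doing by hand can be sketched as a short Python function. `binary_to_decimal` is a hypothetical name, and Python's built-in `int(bits, 2)` does the same job; I'm only using it here as a cross-check.

```python
def binary_to_decimal(bits):
    """Sum each bit multiplied by its place value, working right to left."""
    total = 0
    place = 1  # the rightmost place is the 1s place
    for bit in reversed(bits):
        total += int(bit) * place
        place *= 2  # each place to the left is worth twice as much
    return total

for bits in ("10", "11", "1111"):
    print(bits, "->", binary_to_decimal(bits), "| built-in check:", int(bits, 2))
```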
Handy Tip: If you have ten fingers to count with, you can count up to 1023 in binary!
Now, let's try converting our year into binary. Because 2020 is such a large value and well over 15, it's going to require a lot more bits.
To do this, we'll find the place that is equal to our target value, or else the largest place value that is less than our target value, and assign it a state of 1. For 2020, that place is 1024 (2^10), since the next place up, 2048, is too large.
We subtract that place value, 1024, from our target, 2020. And that leaves us with the remainder, 996.
Then we repeat the process with each following place value. If a place value is ever greater than our remainder, we assign that place a state of 0 and move on to the next place. We continue this algorithm until we're left with a remainder of 0, and any places still left over get a state of 0.
Here's our result!
Now we know -- 2020 is equal to 011111100100 in binary!
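Here is a minimal Python sketch of the subtraction steps we just walked through. The function name `decimal_to_binary` and the default of 12 places are my own choices for illustration. One small difference from doing it by hand: the loop visits every place from largest to smallest instead of stopping early, so the leftover places are filled with 0s automatically.

```python
def decimal_to_binary(target, n_bits=12):
    """Convert a non-negative integer to binary by subtracting place values."""
    bits = []
    remainder = target
    # Walk the places from the largest (2^(n_bits - 1)) down to the 1s place.
    for exponent in range(n_bits - 1, -1, -1):
        place = 2 ** exponent
        if place <= remainder:
            bits.append("1")      # this place fits, so use it
            remainder -= place
        else:
            bits.append("0")      # this place is too big, so skip it
    if remainder != 0:
        raise ValueError("target does not fit in the given number of bits")
    return "".join(bits)

print(decimal_to_binary(2020))  # 011111100100
```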
Note: Don't be fooled by place values. It's easy to see, as above, the 12th place of the binary system (2^11) having a value of 2048 and underestimate the potential of 12 bits. Don't forget that with 12 bits, by using every available place, we can actually count up to 2^12 - 1. That's 4095 in decimal and nearly double the value! So remember that every extra bit we're given roughly doubles the room we have for data.
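To see the gap between the highest place value and the largest number we can actually count to, here is a short Python sketch. Both function names are hypothetical, chosen just for this example.

```python
def highest_place_value(n_bits):
    """Value of the leftmost place when we have n_bits places."""
    return 2 ** (n_bits - 1)

def largest_value(n_bits):
    """Largest number we can count to when every place is set to 1."""
    return 2 ** n_bits - 1

for n in (4, 10, 12):
    print(n, "bits: highest place =", highest_place_value(n),
          "| largest value =", largest_value(n))
```

For 4 bits that gives 8 and 15, for 10 bits (ten fingers!) it gives 512 and 1023, and for 12 bits it gives 2048 and 4095.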
If you read my previous post on character encoding, you'll see that the binary system is important here because it is the language that computers communicate in, and it is therefore essential for character encoding. And the more bits we have available, the more room there is for more data as well as more complex data, such as additional language scripts.
So, coming up in later posts, I'll discuss ASCII's 7-bit encoding and Unicode's encodings (Unicode was originally designed around 16 bits), how those systems and charts work, and what each of them can offer technologically as well as socially.