The content in this post builds on my last piece, Bits, Bytes, Building With Binary. If you are new to this topic, I'd recommend reading that piece first!
There are some numbers that come into your life and turn everything upside down. There are some numbers that make you see things from a completely different perspective. There are some numbers that you become a little bit obsessed with and decide to write a blog post about.
You know which numbers I'm referring to, right?
No? Ah, well let me introduce you!
Last week, we learned about binary, a number system that is based on a simple principle: each digit can only have two possible values, 1 or 0. Because computers are made up of transistors, or circuits, which, at the root of it, are just on/off switches, binary is a great number system and language for a computer. We obviously don't write in binary; there are levels of abstraction that do the compiling and converting for us.
But one of the things about binary that is crucial to all of those “levels of abstraction” is the set of units of measurement that binary gives us. Eight binary digits make up a byte, and bytes can be strung together to form kilobytes, megabytes, gigabytes, terabytes, and on and on and on (I'm skipping some units in the middle there, but I think you get the idea).
Yet there's this slightly annoying issue with binary: you need a lot of digits to represent just one single byte. To be more specific, you'd need 8 digits, or 8 bits.
Okay, but hang on: 8 digits per character perhaps isn't so bad. But what if you wanted to represent a word? How many bits would you need? What if you wanted to represent your name? What if we wanted to represent my name?
Well, let's try:
“vaidehi” compiled down directly to binary is
01110110 01100001 01101001 01100100 01100101 01101000 01101001
Yes, yes, but I'd like to capitalize my name, please. How do I represent that?
“Vaidehi” compiled down to binary directly is
01010110 01100001 01101001 01100100 01100101 01101000 01101001
Not great. Not that readable. Definitely not short. My name is equal to 56 bits (digits), or 7 bytes, or a really terrible headache, depending on how long you stare at all of those 1s and 0s (do not recommend!).
My point here is this: a byte is a powerful unit, sure, but remember: one byte can only represent a single character. So we can imagine the sheer number of bits that we'd need in order to represent an image or a gif or a video! Actually, that number is so big that I literally can't even imagine it.
All of this leads us to wonder: is there a better way to represent characters and words before they get compiled down to binary by the computer? There must be a way to represent these things so that they're more human-readable but can still be converted by our machines, right?
Yep. There totally is a better way. In fact, there are a few ways, and a lot of them come up again and again in computer science. Remember those levels of abstraction? Well, there are some magical numbers that are the cornerstones of how we abstract away bits and bytes of information into easier-to-read pockets of information.
How on earth did I know how to represent my name in binary?! We only learned how to convert from decimals (base 10) to binary; how do you convert from letters to binary? What magic is this?!
Well, it's not magic; it's an abstraction! And in this case, the abstraction we're dealing with is called encoding. Encoding is a standardized way of translating between two things, a bit like a Rosetta Stone for different number systems instead of languages.
I really love how David Zentgraf explains and defines encoding in his blog:
To use bits to represent anything at all besides bits, we need rules. We need to convert a sequence of bits into something like letters, numbers and pictures using an encoding scheme, or encoding for short.
So, what set of rules did I follow to convert from letters into binary? I used an encoding scheme that you might have already heard of: ASCII encoding.
ASCII encoding is a set of rules that allows us to translate certain characters into decimal numbers.
There are 95 “human-readable” characters that ASCII lets you translate between: the numbers 0–9, the English alphabet (the letters a–z in lowercase and in uppercase), a few different punctuation marks, math symbols, and other special characters. Interestingly, you can also translate spaces, tabs, backspaces, deletes, and new lines, which are incredibly important (even if it might not seem immediately obvious) since computers need to know exactly when and where in a text these actions take place.
The ASCII encoding scheme allows for 128 possible “translations”, which means that everything in ASCII, when converted to decimals, has to fall between the numbers 0 and 127. But we'll come back to this in a little bit.
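If you'd like to poke at those numbers yourself, here's a minimal Python sketch. It leans on Python's built-in `ord()` and `chr()`, which technically work with Unicode code points; for the 128 ASCII characters those code points are identical to the ASCII codes, so they work as an ASCII lookup table here.

```python
# The 95 "human-readable" (printable) ASCII characters sit at codes 32-126,
# starting with the space character and ending with "~".
printable = [chr(code) for code in range(32, 127)]
print(len(printable))                      # 95
print("".join(printable))

# The whole original ASCII table is 128 codes: 0 through 127.
ascii_codes = range(128)
print(min(ascii_codes), max(ascii_codes))  # 0 127

# Some of the non-printing characters mentioned above also have codes.
for name, ch in [("tab", "\t"), ("newline", "\n"), ("backspace", "\b"), ("delete", "\x7f")]:
    print(name, ord(ch))
```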
Using the ASCII table above, it's hopefully a little clearer how to translate my name (with a capital V). We can take it two steps at a time, converting first into decimal, and then into binary.
Vaidehi converted from ASCII into decimals:
86 97 105 100 101 104 105
Vaidehi converted from decimals into binary:
01010110 01100001 01101001 01100100 01100101 01101000 01101001
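The same two-step conversion can be sketched in a few lines of Python. The helper names `to_ascii_decimals` and `to_binary` are hypothetical, made up for this illustration; the real work is done by the built-in `str.encode("ascii")` and `format(n, "08b")`.

```python
def to_ascii_decimals(text):
    """Step 1: look up each character's ASCII code as a decimal number."""
    # encode("ascii") raises UnicodeEncodeError for any non-ASCII character;
    # iterating over the resulting bytes gives plain integers.
    return list(text.encode("ascii"))

def to_binary(decimals):
    """Step 2: write each decimal as an 8-digit (one-byte) binary string."""
    return [format(n, "08b") for n in decimals]

decimals = to_ascii_decimals("Vaidehi")
print(decimals)                       # [86, 97, 105, 100, 101, 104, 105]
print(" ".join(to_binary(decimals)))
# 01010110 01100001 01101001 01100100 01100101 01101000 01101001
```

Swapping in "vaidehi" changes only the first byte, from 01010110 (86) to 01110110 (118).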
Do you notice anything interesting in the binary representation of my name? Even if we didn't know how to convert from binary, there are two things that we can deduce just by looking at these numbers:
- Each character is represented in binary with 8 digits; that is to say, each character requires 8 bits or 1 byte of information to represent it.
- Each of these characters in binary starts with the number 0.
Which means that each of these characters really only requires 7 bits then, yes? That first digit seems to always be going unused, in every single one of the letters of my name. Seems like an awful waste of an entire bit to me.
Well, it turns out that I'm not the only one to have had this thought.
If we take a second look at that ASCII table, one thing becomes pretty clear: that's really not a lot of characters to work with. Where is the é? Or the ø? And what about æ? How will I ever represent smørbrød so that my computer can convert it to binary?
We already know that ASCII (as it was first created) only allowed for 128 possible values. In that case, the extra 0 at the beginning of each ASCII-converted binary byte in my name makes a bit more sense: you only need 7 bits to represent 128 different possibilities. (Remember those powers of 2? 2 to the power of 7 is 128, so 7 bits used together can represent exactly 128 possibilities.) So, that first 0 effectively goes unused.
But what if we didn't leave that first bit, that first digit, completely unused? What would happen?
Well, let's do the math:
7 bits is the same as 2 to the power of 7. Or, 128 possibilities.
8 bits is the same as 2 to the power of 8. Or, 256 possibilities.
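Here's that math as a tiny Python sketch, plus a check of the “leading 0” observation from earlier: every original ASCII code fits in 7 bits, so its 8-bit form always starts with 0.

```python
# n bits can hold 2 ** n distinct patterns.
for bits in (7, 8):
    print(bits, "bits ->", 2 ** bits, "possibilities")
# 7 bits -> 128 possibilities
# 8 bits -> 256 possibilities

# Every original ASCII code (0-127) needs at most 7 bits...
assert all(code.bit_length() <= 7 for code in range(128))
# ...so padded out to a full byte, the leftmost bit is always 0.
assert all(format(code, "08b")[0] == "0" for code in range(128))

# 128 is the first value that needs the 8th bit.
print(format(127, "08b"), format(128, "08b"))   # 01111111 10000000
```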
And it was exactly that math that led to the ASCII encoding scheme being extended! Here's what it looks like:
The extended ASCII table added another 128 possibilities to the original ASCII encoding scheme, and even left some room for more character possibilities in the future! If you think about it, this is pretty cool considering that we can pack in double the encodings with just one extra bit. Bits are powerful, my friends.
Before we go memorizing the ASCII table, it's worth mentioning that things can be encoded in many different ways! Sure, ASCII is a very popular encoding scheme, and you'll probably see it a lot, in part because it's easy to recognize by its leading 0s. However, beware: there are other ways to encode characters, too! Not everything is encoded using ASCII, but one thing is for certain: all encoding schemes simplify how we convert between characters and binary by adding a layer of rules, an abstraction, right in between.
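To make “other ways to encode characters” concrete, here's a small Python sketch comparing Latin-1 with UTF-8, a widely used encoding that this post doesn't otherwise cover and that appears here only as an illustration. The same character can turn into different bytes, and reading bytes with the wrong set of rules produces garbled text.

```python
word = "ø"

latin1 = word.encode("latin-1")   # one byte under Latin-1
utf8 = word.encode("utf-8")       # two bytes under UTF-8
print(latin1, utf8)               # b'\xf8' b'\xc3\xb8'

# Decoding UTF-8 bytes with Latin-1 rules garbles the text:
print(utf8.decode("latin-1"))     # Ã¸

# Plain ASCII text produces identical bytes under all three schemes,
# because both of the others keep ASCII's first 128 codes unchanged.
assert "Vaidehi".encode("ascii") == "Vaidehi".encode("latin-1") == "Vaidehi".encode("utf-8")
```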
So 128 and 256 are both pretty rad numbers, and it's good to know where they might show up. But they were really just setting the stage for another number, probably one of the most important in software, that's hidden in all the little nooks and crannies of the web and even within your own machine.
The number I'm talking about is the number 16. And, hopefully by the time you get to the end of this post, you'll love it just as much as I do.
We know that encoding is one form of abstraction between specific characters and their binary translations. But there's another abstraction that's commonly used in computer science, and it comes from another number system: the base 16 number system, or hexadecimals.
Similar to base 10 and base 2, the base 16 number system has (you guessed it!) 16 possible digits per place, and the digits go like this:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
I'm serious. Really! If you wanted to represent 12 items in base 16, you'd say “I have C items over here” (and, I guess, hope that the person you were talking to knew hexadecimals, too). It might seem a little bit odd at first, but once you get used to it, the power of hexadecimals becomes clear.
In order to represent the decimal number 205 in binary (base 2), we need 8 digits. But, to represent it in hexadecimals (base 16), we only need 2 digits. The rule of thumb here is that the higher the base, the fewer place values you need in order to represent a number.
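Here's a quick Python check of that example and of the rule of thumb. `format()` and `int()` are built-ins; `count_digits` is a hypothetical helper written just for this sketch, which counts how many times you can divide by the base before reaching 0.

```python
n = 205
print(format(n, "b"))   # 11001101
print(format(n, "X"))   # CD
print(format(n, "d"))   # 205

def count_digits(n, base):
    """Count the place values needed to write a positive integer n in a base."""
    digits = 0
    while n > 0:
        n //= base      # drop the lowest place value
        digits += 1
    return digits

for base in (2, 10, 16):
    print(f"base {base}: {count_digits(205, base)} digits")
# base 2: 8 digits
# base 10: 3 digits
# base 16: 2 digits

# And back again: int() takes the base as its second argument.
assert int("11001101", 2) == int("CD", 16) == 205
```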
If you think about it, it makes sense: if we only have 2 possible options per place (for example, in binary, when we can choose between 0 and 1), we need more digits to represent a higher number, because the possibilities per place value are very limited. On the other hand, if we have many possible options per place (like in hexadecimals, when we have 0–F), we have more possibilities per place value, so we can represent larger numbers without having to add another digit.
Okay, so we need fewer digits. I guess that's cool. But why is that important?
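It all comes back to this: 8 binary digits equal a byte, and because each hexadecimal digit stands for exactly 4 bits, that same byte can always be written with just 2 hexadecimal digits. If we can represent a byte with fewer digits, we can write out a lot more information using far fewer characters. (The computer still stores the same bits either way; hexadecimals just make them much easier for us to read.)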
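A small Python sketch of why 2 hex digits always cover exactly one byte: each hex digit maps to one 4-bit pattern, so a byte splits cleanly into two halves. The last lines reuse my name from earlier, using the built-in `bytes.hex()`.

```python
# Each hex digit corresponds to exactly one 4-bit pattern.
for digit in "0123456789ABCDEF":
    print(digit, format(int(digit, 16), "04b"))

# So any byte splits into two 4-bit halves, i.e. two hex digits.
byte = 0b11001101                         # 205 from the example above
high, low = byte >> 4, byte & 0b1111      # top 4 bits, bottom 4 bits
print(format(high, "X"), format(low, "X"))   # C D

# "Vaidehi" as hex: 14 hex digits instead of 56 binary digits.
name = "Vaidehi".encode("ascii")
print(name.hex())                         # 56616964656869
print(len(name.hex()), "hex digits vs", len(name) * 8, "binary digits")
```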
This is exactly what makes hexadecimals so powerful. And that's why, if we look in the right places, we can see them everywhere, out and about and in the wild of the internet.
If you've ever built a website, worked with a designer, or inspected a webpage, you probably have seen a hex code or two.
A hex code is used to specify a color on a website. It can be used inline in the HTML of a webpage or, more commonly, within the CSS (the stylesheet) of a webpage.
Interestingly, hex codes aren't the only way to specify colors; they're just one notation. We can also specify the color of something on a webpage or application by using the rgb() notation. The two are equivalent and will always give us the same color; only the syntax for writing the color is slightly different.
Both of these notations specify a color as amounts of red, green, and blue. The computer adds together however much red, green, and blue we specify, and the combination of those three colors renders whatever color we want on the screen.
Okay, okay, what does this have to do with hexadecimals though? The answer to that is: well, apparently, everything!
Let's take a look at the relationship between hex codes and hexadecimals by unpacking one of my favorite colors: Medium Candy Apple Red!
The hex code for this shade of red is #EC152E. We could also represent this in the rgb model as rgb(236, 21, 46). If they're functionally equivalent, and will always give us the same color at the end of the day, how are those two ways of writing the same color connected?
I won't get into the how of converting hexadecimals into decimals (it's exactly the same as converting from binary, but with many more possible digits per place), but this is essentially the relationship between the two: each consecutive pair of digits in the hex code is converted into a decimal number in order to represent it in rgb format.
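That pair-by-pair relationship is easy to sketch in Python. `hex_to_rgb` and `rgb_to_hex` are hypothetical helper names for this example, and this version only accepts the full 6-digit form (CSS also allows shorter and longer hex forms, which it doesn't handle).

```python
def hex_to_rgb(hex_code):
    """Convert a 6-digit hex color like '#EC152E' into an (r, g, b) tuple."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected 6 hex digits, e.g. '#EC152E'")
    # Each consecutive pair of hex digits is one byte: red, green, blue.
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 6, 2))

def rgb_to_hex(r, g, b):
    """Go the other way: each 0-255 value becomes two uppercase hex digits."""
    return "#{:02X}{:02X}{:02X}".format(r, g, b)

print(hex_to_rgb("#EC152E"))     # (236, 21, 46)
print(rgb_to_hex(236, 21, 46))   # #EC152E
```

The `:02X` format pads with a leading zero, so a small value like 9 still takes up its full two digits ("09").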
And remember how we learned earlier that we can represent a single byte (8 binary digits) with just 2 digits in hexadecimal? Well, that's important here, too, because it means that a single hex code contains an entire color value but only takes up 3 bytes (or 24 bits) of information.
If you've ever wondered why the rgb color model only allows values between 0 and 255, now you know the reason why! It's because each of those numbers in rgb is one byte, or 8 bits. We already know that 8 bits results in 256 possible combinations, so now the logic behind the numerical limitation per color value (between 0–255) should hopefully be a bit more clear.
But the coolest thing about hexadecimals used to represent color, I think, is that with just 6 hexadecimal digits we can account for over 16 million possible colors in the spectrum. Color me surprised, because I never knew that until now!
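The arithmetic behind “over 16 million” fits in a few lines of Python: 256 possible values per channel, three channels, which is the same as 16 to the power of 6 or 2 to the power of 24.

```python
per_channel = 16 ** 2          # two hex digits per channel -> 256 values
total = per_channel ** 3       # every combination of red, green, and blue
print(per_channel, total)      # 256 16777216
print(f"{total:,} colors")     # 16,777,216 colors

# Same number, three ways: 6 hex digits, 3 bytes, or 24 bits.
assert total == 16 ** 6 == 256 ** 3 == 2 ** 24
```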
Another place you might have encountered hexadecimals is in magic debug values, or hexspeak, which are really nothing more than hexadecimals that are written (or allocated) to memory when a program runs. A common use case for magic debug values is when a program crashes. These hexadecimals are so commonly used by developers that they've become “reserved” in a way; that is to say, they're conventionally used to signal to the programmer who is running or debugging the code: “Something went wrong!”
There are lots of examples of these debug codes, but some of the more famous ones are “DEADBEEF”, “DEADC0DE”, and “D15EA5E”. And yes, if these remind you of leetspeak, you're not imagining it!
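Those “words” only work because every letter in them is a valid hex digit (with 0, 1, and 5 standing in for O, I, and S). Here's a Python sketch that checks that and shows each value as a number and as raw bytes. `is_valid_hex` is a hypothetical helper, and "HELLO" is thrown in as a counterexample.

```python
import string

def is_valid_hex(word):
    """True if every character is a hex digit (0-9, A-F, either case)."""
    return len(word) > 0 and all(ch in string.hexdigits for ch in word)

for word in ("DEADBEEF", "DEADC0DE", "D15EA5E", "HELLO"):
    if not is_valid_hex(word):
        print(word, "is not valid hex (H, L and O aren't hex digits)")
        continue
    value = int(word, 16)
    # Each of these fits in 32 bits, so show it as 4 bytes.
    print(word, value, value.to_bytes(4, "big").hex(" "))
# DEADBEEF 3735928559 de ad be ef  (first line of output)
```

`bytes.hex()` with a separator argument needs Python 3.8 or newer.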
You can also find hexadecimals in the formatting of IP addresses in the newest version of the protocol! Internet Protocol version 6 (IPv6) was developed in the mid-to-late nineties, when it became clear that we were very quickly going to run out of unique IP addresses for every device on the planet! IPv6 represents an IP address as eight groups of four hexadecimal digits (2 bytes, or 16 bits, in each group), separated by colons.
Just imagine converting that down into binary! That's exactly what happens in order for your machine to process each bit, and it all happens inconceivably fast. I don't know about you, but thinking about that makes my head spin and I feel super grateful to have the simple abstraction of the number 16.
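If you'd like to see what “converting that down into binary” looks like, Python's standard-library `ipaddress` module can do it. The address below is a hypothetical example from 2001:db8::/32, the block reserved for documentation; it's not an address from the original post.

```python
import ipaddress

# Hypothetical example address from the documentation-only range.
addr = ipaddress.IPv6Address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

print(addr.exploded)     # 2001:0db8:85a3:0000:0000:8a2e:0370:7334
print(addr.compressed)   # 2001:db8:85a3::8a2e:370:7334

# The whole address is one 128-bit number; pad it to 128 binary digits.
bits = format(int(addr), "0128b")
groups = [bits[i:i + 16] for i in range(0, 128, 16)]

# Each group of 4 hex digits lines up with exactly 16 bits.
for hex_group, bin_group in zip(addr.exploded.split(":"), groups):
    print(hex_group, bin_group)
```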
If you enjoyed reading about hexes and magical numbers, check out these resources below to keep learning more!
- What every programmer absolutely, positively needs to know about encodings and character sets to work with text, David Zentgraf
- How does hexadecimal color work?, StackOverflow
- Hexadecimal Numbers, Peter Nayland
- Hexadecimal and character sets, BBC Bitesize
- Decimal to Hexadecimal, Khan Academy
This article was originally published on my Medium publication.