Original post on Kodumaro.
I’ve seen people struggle to understand the difference between binary and text.
In this context, binary and text are ways the data is represented.
A binary protocol uses bytes to represent data as directly as possible.
Let’s take the number 98 765. Converting from decimal to binary, one gets 11000000111001101₂ (17 bits).
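The conversion can be checked with a couple of lines of Python. This is only a quick sketch; the variable names are illustrative.

```python
# Convert 98 765 from decimal to its binary digits.
n = 98765
bits = format(n, "b")   # binary digits, without the "0b" prefix
print(bits)             # 11000000111001101
print(len(bits))        # 17
```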
C has a type named int that, on most current platforms, represents integers as 4-byte data. Regrouping the binary digits of 98 765 in 8-bit chunks (bytes), and completing the 4 bytes with leading zeros, gives 00000000 00000001 10000001 11001101.
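As a sketch of that regrouping in Python, assuming a 32-bit integer (the width is a parameter of this example, not something C guarantees):

```python
# Pad 98 765 to 32 bits and split the result into 8-bit chunks.
n = 98765
padded = format(n, "032b")                            # zero-filled to 32 digits
chunks = [padded[i:i + 8] for i in range(0, 32, 8)]   # four groups of 8 bits
print(" ".join(chunks))   # 00000000 00000001 10000001 11001101
```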
Counter-intuitively, the simplest way to write binary data isn’t binary, but hexadecimal. Every group of 4 bits maps directly to one hexadecimal digit, so every byte is written as 2 hexadecimal digits. For 98 765 stored in 4 bytes, we get 00 01 81 CD.
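The same bytes can be printed in hexadecimal from Python. This sketch assumes a 4-byte, most-significant-byte-first layout, matching the grouping used so far.

```python
# Write 98 765 as 4 bytes and show each byte as 2 hexadecimal digits.
n = 98765
raw = n.to_bytes(4, byteorder="big")   # 4 bytes, most significant first
hex_text = raw.hex(" ").upper()        # bytes.hex(sep) needs Python 3.8+
print(hex_text)                        # 00 01 81 CD
```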
The mathematical representation of numbers is big-endian, which means the most significant digit comes first. Network protocols conventionally use big-endian too (hence “network byte order”).
Most of today’s common processors (x86, and ARM in its usual configuration) are little-endian, which means the bytes are stored in the reverse order: CD 81 01 00.
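To see both byte orders side by side, here is a small Python sketch using the standard `struct` module. The variable names are illustrative, and the last line only reports what the machine running the code uses.

```python
import struct
import sys

n = 98765
big = struct.pack(">i", n)      # ">" = big-endian (network order), "i" = 4-byte int
little = struct.pack("<i", n)   # "<" = little-endian

print(big.hex(" "))      # 00 01 81 cd
print(little.hex(" "))   # cd 81 01 00
print(sys.byteorder)     # "little" or "big", depending on the machine
```

Both byte strings decode back to the same number as long as the reader uses the same order as the writer.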
Suppose we’re building a binary protocol. It needs to carry the data type in order to identify which kind of data follows.
For simplicity (something like SDXF), let’s use a 1-byte type tag. To indicate an integer, let’s use the number 2, which in network byte order gives 02 00 01 81 CD.
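A minimal sketch of such a tagged integer in Python follows. The tag value 2 comes from the text; the function names and the choice of a signed 4-byte big-endian payload are assumptions of this example.

```python
import struct

TYPE_INT = 2  # 1-byte type tag for integers, as chosen in the text

def encode_int(n: int) -> bytes:
    # ">" = network byte order, "B" = 1-byte tag, "i" = 4-byte signed int
    return struct.pack(">Bi", TYPE_INT, n)

def decode_int(buf: bytes) -> int:
    tag, value = struct.unpack(">Bi", buf)
    if tag != TYPE_INT:
        raise ValueError(f"unexpected type tag {tag}")
    return value

packet = encode_int(98765)
print(packet.hex(" "))      # 02 00 01 81 cd
print(decode_int(packet))   # 98765
```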
Text data are fully represented as strings. So the number 98 765 is represented as the characters needed to write the decimal number, '9', '8', '7', '6', '5', which are the bytes 39 38 37 36 35 in hexadecimal.
Now the system needs a way to tell where the string ends. C uses the null character as a terminator, giving 39 38 37 36 35 00.
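In Python, the text form is just the decimal digits encoded as ASCII. A short sketch, with illustrative names:

```python
# Represent 98 765 as the characters of its decimal form.
n = 98765
text = str(n).encode("ascii")   # one byte per decimal digit
print(text)                     # b'98765'
print(text.hex(" "))            # 39 38 37 36 35
```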
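A null-terminated string can be sketched in Python as follows. The helper names are hypothetical; the reader simply scans for the first zero byte.

```python
def to_c_string(n: int) -> bytes:
    # Decimal digits followed by the null terminator.
    return str(n).encode("ascii") + b"\x00"

def from_c_string(buf: bytes) -> int:
    end = buf.index(b"\x00")   # position of the terminator
    return int(buf[:end].decode("ascii"))

data = to_c_string(98765)
print(data.hex(" "))         # 39 38 37 36 35 00
print(from_c_string(data))   # 98765
```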
Pascal starts the string with 2 bytes telling the string length:
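A length-prefixed string could look like the Python sketch below. It follows the 2-byte prefix described in the text (some Pascal dialects use a single length byte instead); the big-endian order of the prefix and the helper names are assumptions of this example.

```python
import struct

def to_length_prefixed(n: int) -> bytes:
    digits = str(n).encode("ascii")
    # ">H" = 2-byte unsigned length, big-endian (an assumption here)
    return struct.pack(">H", len(digits)) + digits

def from_length_prefixed(buf: bytes) -> int:
    (length,) = struct.unpack_from(">H", buf)
    return int(buf[2:2 + length].decode("ascii"))

data = to_length_prefixed(98765)
print(data.hex(" "))                # 00 05 39 38 37 36 35
print(from_length_prefixed(data))   # 98765
```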
But, in text protocols, we cannot use non-text bytes (like 05₁₆), so we need to represent it as text too, using a separator (CR, for example) to distinguish the length from the data:
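The text-only variant can be sketched like this in Python, using CR as the separator as suggested above. The function names are hypothetical.

```python
def encode_text(n: int) -> bytes:
    digits = str(n)
    # Length as decimal text, then CR, then the digits themselves.
    return f"{len(digits)}\r{digits}".encode("ascii")

def decode_text(buf: bytes) -> int:
    length_text, _, rest = buf.partition(b"\r")
    length = int(length_text.decode("ascii"))
    return int(rest[:length].decode("ascii"))

data = encode_text(98765)
print(data)                # b'5\r98765'
print(data.hex(" "))       # 35 0d 39 38 37 36 35
print(decode_text(data))   # 98765
```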
Let’s put it all in perspective:
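One way to compare the representations discussed so far is to build each of them and count the bytes. This is a Python sketch; the labels and the big-endian 2-byte length prefix are assumptions made for illustration.

```python
import struct

n = 98765
digits = str(n).encode("ascii")

representations = {
    "binary, tagged int": struct.pack(">Bi", 2, n),
    "text, null-terminated": digits + b"\x00",
    "text, 2-byte length prefix": struct.pack(">H", len(digits)) + digits,
    "text, length + CR": str(len(digits)).encode("ascii") + b"\r" + digits,
}

sizes = {name: len(data) for name, data in representations.items()}
for name, data in representations.items():
    print(f"{name:28} {sizes[name]} bytes  {data.hex(' ')}")
```

For this number the tagged binary form takes 5 bytes, the null-terminated text 6, and both length-prefixed text forms 7.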
And what about strings? Over a binary protocol, strings are easily represented: one just needs a type flag, the string length, and the string itself.
Or, in a network transmission:
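A binary string field could be sketched in Python as below. Everything specific here is an assumption for illustration: the tag value 3, the 2-byte big-endian length, the UTF-8 encoding, and the sample string are not taken from the original post.

```python
import struct

TYPE_STRING = 3  # hypothetical type tag for strings

def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    # 1-byte tag, 2-byte length in network byte order, then the bytes.
    return struct.pack(">BH", TYPE_STRING, len(data)) + data

def decode_string(buf: bytes) -> str:
    tag, length = struct.unpack_from(">BH", buf)
    if tag != TYPE_STRING:
        raise ValueError(f"unexpected type tag {tag}")
    return buf[3:3 + length].decode("utf-8")

packet = encode_string("hello")
print(packet.hex(" "))         # 03 00 05 68 65 6c 6c 6f
print(decode_string(packet))   # hello
```

Because the length is known up front, the string can contain any byte, including quotes and null characters, without escaping.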
Over a text protocol, a string is usually enclosed by a delimiter character on both ends. The most common is the quotation mark (
I hope you can see how text representations are heavier than binary ones. Things get worse when considering the protocol headers.
Whenever possible, prefer binary representations over text.