ErrorGamer2000


# Binary File Formats Explained

When I first began researching binary file formats, I was met with a complete absence of human-friendly explanations anywhere online. All of the resources that I came across were full of unfamiliar technical terms that made me feel like I was reading a textbook written for someone with twice my IQ. Hence, this article. This is a summary of my research, explained in plain English so that you can understand it without the trouble that I had. Now, without further delay, let's get into the learning!

## All of Those Confusing Technical Terms

As I said, there were a number of technical terms that I encountered in my research that I had never heard of before. These made little sense to me at the time, and they took even more painful research to understand. Here is a list of those terms, explained in friendly ways:

• Binary - binary is a number system that has a base of `2`. That means that the only digits it uses are `0` and `1`.
• Bit - a bit is the smallest unit of data in a computer, consisting of nothing more than a single binary digit (a `0` or `1`).
• Byte - a byte is the next unit of data in a computer. A byte consists of `8` individual bits.
• Signed Integer - a signed integer is an integer (whole number) that is associated with a sign that declares whether the number is positive or negative. Basically, it is an integer with a `+` or `-` attached to it.
• Unsigned integer - an unsigned integer is an integer that is not associated with a sign (`+` or `-`), and is always considered to be zero or a positive value.
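To make these terms a little more concrete, here is a small sketch in Python (the examples in this article use Python, but the ideas carry over to any language). It converts a number to binary and back, and shows how many different values fit into a single byte:

```python
number = 5

# Binary is base 2: format() with "b" writes the number using only 0s and 1s.
binary_digits = format(number, "b")   # "101"
back_to_int = int(binary_digits, 2)   # parse the base-2 text back into 5

# Padding to 8 digits shows every bit of a single byte.
as_byte = format(number, "08b")       # "00000101"

# A byte has 8 bits, so it can hold 2**8 = 256 patterns (0 through 255).
BITS_PER_BYTE = 8
patterns_per_byte = 2 ** BITS_PER_BYTE

print(binary_digits, back_to_int, as_byte, patterns_per_byte)
# 101 5 00000101 256
```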

## What exactly is a file, anyway?

A file is a collection of bits that is stored by a computer, usually on a hard drive, SSD, or other storage device. Files are organized into bytes, and their size is measured by the number of bytes that they contain (Kilo*bytes*, Mega*bytes*, Giga*bytes*, etc.).
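You can see this directly by opening a file in binary mode, which hands you the raw bytes without turning them into text. A minimal Python sketch, assuming a throwaway temporary file, that writes four bytes and reads them back:

```python
import os
import tempfile

# Four bytes: the text "Hi!" followed by a newline character.
data = bytes([0x48, 0x69, 0x21, 0x0A])

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

with open(path, "rb") as f:  # "rb" = read in binary mode, no text decoding
    contents = f.read()

size_in_bytes = os.path.getsize(path)  # the file's size, counted in bytes
os.remove(path)

print(list(contents), size_in_bytes)  # [72, 105, 33, 10] 4
```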

## Data Types

Before we can truly begin working with binary file encoding, we need to understand the types of data that can be stored within a file. The two most common types of data in a file are integers and strings.

### Integers

Integers are separated into two sub-types, signed and unsigned, as I explained at the beginning of the article.

Unsigned integers, thanks to the lack of a sign, can store a maximum value roughly twice that of a signed integer of the same size.

Note: unsigned integer is abbreviated as `Uint` in almost every programming situation.

Signed integers (`Int`s) are pretty much exactly the same, aside from their reduced maximum value and their ability to store negative numbers.
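The difference between the two is only in how the bits are interpreted. Nearly every computer stores signed integers using a scheme called two's complement, in which the highest bit being set means the number is negative. This short Python sketch reads the same byte as both a `Uint8` and an `Int8`:

```python
# One byte with every bit set, and one with only the highest bit set.
all_ones = bytes([0b11111111])
high_bit_only = bytes([0b10000000])

# The same bits, read without a sign (Uint8) and with one (Int8, two's complement).
all_ones_uint8 = int.from_bytes(all_ones, "big", signed=False)
all_ones_int8 = int.from_bytes(all_ones, "big", signed=True)
high_bit_uint8 = int.from_bytes(high_bit_only, "big", signed=False)
high_bit_int8 = int.from_bytes(high_bit_only, "big", signed=True)

print(all_ones_uint8, all_ones_int8)   # 255 -1
print(high_bit_uint8, high_bit_int8)   # 128 -128
```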

Both `Uint`s and `Int`s are found in different sizes in binary files, and are generally named based on how many bits they use to store their value. For example, `Uint8` and `Int8` both use `8` bits in the file, `Uint16` and `Int16` use `16` bits, and so on. These numbers have different value limits based on the number of bits that they use:

| Type | Minimum Value | Maximum Value |
| --- | --- | --- |
| `Uint8` | 0 | 255 |
| `Int8` | -128 | 127 |
| `Uint16` | 0 | 65,535 |
| `Int16` | -32,768 | 32,767 |
| `Uint32` | 0 | 4,294,967,295 |
| `Int32` | -2,147,483,648 | 2,147,483,647 |
| `Uint64` | 0 | 18,446,744,073,709,551,615 |
| `Int64` | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807 |
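You don't have to memorize this table. For an integer with `n` bits, a `Uint` ranges from `0` to `2**n - 1`, and a two's complement `Int` ranges from `-(2**(n - 1))` to `2**(n - 1) - 1`. The two helper functions below are illustrative names, not part of any library:

```python
def uint_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of an unsigned integer with `bits` bits."""
    return 0, 2 ** bits - 1


def int_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a two's complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Reproduce the table above.
for bits in (8, 16, 32, 64):
    print(f"Uint{bits}", *uint_range(bits))
    print(f"Int{bits}", *int_range(bits))
```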

These are not the only sizes that can be used, however, and you may sometimes encounter some oddball sizes. Here are a couple that I ran into:

| Type | Minimum Value | Maximum Value |
| --- | --- | --- |
| `Uint4` | 0 | 15 |
| `Uint24` | 0 | 16,777,215 |
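Oddball sizes usually have to be read by hand. Below is a sketch with two hypothetical helpers: one reads a `Uint24` from three bytes (assuming big endian byte order, which is explained in the Endianness section), and one splits a single byte into two `Uint4` values. Which half of the byte comes first is decided by each file format, so treat the order used here as an assumption:

```python
def read_uint24_be(data: bytes, offset: int = 0) -> int:
    """Read a big endian Uint24 (3 bytes) starting at `offset`."""
    return int.from_bytes(data[offset:offset + 3], "big")


def split_uint4(byte: int) -> tuple[int, int]:
    """Split one byte into two Uint4 values: (high 4 bits, low 4 bits)."""
    return byte >> 4, byte & 0x0F


print(read_uint24_be(bytes([0xFF, 0xFF, 0xFF])))  # 16777215
print(split_uint4(0xA7))                          # (10, 7)
```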

### Strings

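Strings, as you may already know, are ordered sets of text characters. When a string is stored in a binary file, it is converted into a set of bytes using a text encoding, most commonly `utf-8`. In `utf-8`, basic English letters, digits, and punctuation (the ASCII characters) take up exactly one byte each, stored as a `Uint8` character ID, while other characters such as accented letters or emoji take two to four bytes.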
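Python's built-in `str.encode` and `bytes.decode` show the difference between one-byte and multi-byte characters. Here the plain letter and the punctuation mark take one byte each, while the accented letter takes two:

```python
text = "H\u00e9!"  # "Hé!": three characters

encoded = text.encode("utf-8")
print(list(encoded))            # [72, 195, 169, 33]
print(len(text), len(encoded))  # 3 4

# Decoding with the same encoding gives the original string back.
decoded = encoded.decode("utf-8")
```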

## Endianness

Endianness is one of the more difficult concepts to grasp, so I'm going to take an example-based approach here. Let's say I have a variable that is a `Uint16` with the value of `255`. In order to store this variable in a binary file, it must first be converted into a set of bytes. Because our variable is a `Uint16`, taking up 16 bits of space, it requires two bytes in the file to store. But which of those two bytes comes first in the file? This is where endianness comes into play.

### High and Low Bytes

Let's convert our `Uint16` variable into binary, making sure to keep 16 digits: `0000000011111111`. Notice how the number is split evenly between `0`s and `1`s. Each of those groups is a single byte. The byte on the left (the `0`s) is the high byte, meaning it comes first in the normal order of the value. The byte on the right (the `1`s) is called the low byte. As you go across the bytes from left to right, the bytes go from higher to lower "order". So, in this number, the left byte is of a higher order than the right byte.
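Splitting a value into its high and low bytes takes only a bit shift and a mask. A minimal sketch using the same `Uint16` value of `255`:

```python
value = 255  # the Uint16 from the example above

high_byte = (value >> 8) & 0xFF  # shift the top 8 bits down, keep 8 bits
low_byte = value & 0xFF          # keep only the bottom 8 bits

print(format(high_byte, "08b"), format(low_byte, "08b"))  # 00000000 11111111

# Shifting the high byte back up and combining rebuilds the original value.
rebuilt = (high_byte << 8) | low_byte
```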

Now, there are two types of "endianness" that a number can be encoded in: big endian and little endian format. A number that is encoded in big endian format will have its bytes ordered from highest to lowest, meaning that the highest ("biggest") byte comes first. Numbers encoded in little endian format are ordered in the reverse of big endian numbers, meaning that the lowest ("littlest") byte comes first instead.

Returning to our `Uint16` variable, here is what the number will look like when in the file:

| Endianness | Byte 0 | Byte 1 |
| --- | --- | --- |
| little | 11111111 | 00000000 |
| big | 00000000 | 11111111 |
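Python's standard `struct` module can write a `Uint16` in either byte order (`<` means little endian, `>` means big endian, and `H` means an unsigned 16-bit integer). Reading the bytes back with the wrong byte order gives a completely different number, which is why a file format needs to state which byte order it uses:

```python
import struct

value = 255

little = struct.pack("<H", value)  # low byte first
big = struct.pack(">H", value)     # high byte first

print([format(b, "08b") for b in little])  # ['11111111', '00000000']
print([format(b, "08b") for b in big])     # ['00000000', '11111111']

# Reading little endian bytes as if they were big endian gives 0xFF00.
(wrong,) = struct.unpack(">H", little)
print(wrong)  # 65280
```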

## Generic File Format

From here on, we can only discuss the general pattern of binary file formats, so keep in mind that any particular file format you see does not need to follow what is discussed here.

In most binary files, the data is generally split into two main types of sections: the file header and binary data blocks.

### Data Blocks

The rest of a file, after the header, is generally devoted to data blocks. Each data block can be either a fixed or variable size, and will commonly have its own header that tells the program parsing the file how to use the data inside the block, as well as the length of the block if it is a variable size.
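To tie the file header and data blocks together, here is a sketch of a parser for a made-up format. Everything about the format is hypothetical and chosen only for illustration: the file starts with the 4-byte signature `DEMO`, followed by a `Uint8` version and a `Uint16` block count; each block then has a `Uint8` type and a `Uint32` length, followed by that many bytes of data. All multi-byte numbers are assumed to be little endian:

```python
import struct
from dataclasses import dataclass

MAGIC = b"DEMO"  # hypothetical signature for this made-up format


@dataclass
class Block:
    block_type: int
    data: bytes


def parse_demo(buf: bytes) -> tuple[int, list[Block]]:
    """Parse the hypothetical DEMO format and return (version, blocks)."""
    if buf[:4] != MAGIC:
        raise ValueError("not a DEMO file")

    # File header: Uint8 version, Uint16 block count ("<" = little endian, no padding).
    version, count = struct.unpack_from("<BH", buf, 4)
    offset = 4 + struct.calcsize("<BH")  # 4 + 3 = 7

    blocks = []
    for _ in range(count):
        # Block header: Uint8 type, Uint32 length of the data that follows.
        block_type, length = struct.unpack_from("<BI", buf, offset)
        offset += struct.calcsize("<BI")  # 5 bytes
        data = buf[offset:offset + length]
        if len(data) != length:
            raise ValueError("block runs past the end of the file")
        blocks.append(Block(block_type, data))
        offset += length
    return version, blocks


# Build a tiny sample file in memory: version 1, two blocks.
sample = (
    MAGIC
    + struct.pack("<BH", 1, 2)
    + struct.pack("<BI", 1, 5) + b"hello"
    + struct.pack("<BI", 2, 2) + b"\x01\x02"
)
version, blocks = parse_demo(sample)
print(version, blocks)
```

Because each block records its own length, a parser can skip over block types it doesn't understand simply by jumping ahead `length` bytes.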

I hope that I have successfully explained how binary file formatting works. If you have any questions or see an issue with a part of this article, feel free to drop a comment below and I will fix it as soon as I can. Happy Hacking!

## Feeling Generous?

I am a 17-year-old self-taught web developer trying to make a living while stuck with oppressive parents, and trying to find a way to pay for college while not being allowed to have a job. I would appreciate any donations.

## Not Going to Donate?

Even if you don't donate, simply liking this post or sharing it with anyone that might find it useful is a huge help.