Big-Endian vs Little-Endian

#webdev #tutorial

A single byte has no ordering problem — it is just eight bits. But the moment you store a number that needs more than one byte, the machine has to decide which byte goes first. That decision is called endianness, and it quietly underpins everything from file formats to TCP packets.

What the two orders actually mean

Take the 32-bit hexadecimal value 0x12345678. It is four bytes: 12, 34, 56, and 78. The byte 12 is the most significant (it carries the largest place value), and 78 is the least significant. The only question is which one lands at the lowest memory address.

Big-endian stores the most significant byte first, at the lowest address. Reading memory from low to high, you see the bytes in the same order you would write them on paper:

address:  +0   +1   +2   +3
big-end:   12   34   56   78

Little-endian stores the least significant byte first, at the lowest address. The bytes appear reversed relative to how you write the number:

address:  +0   +1   +2   +3
little:    78   56   34   12
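
To check which of the two layouts your own machine uses, you can copy a value's bytes out of memory and look at the one at the lowest address. The following is a minimal sketch; bytes_in_memory and host_is_little_endian are hypothetical helper names chosen for illustration, not standard functions.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
   lowest address first. memcpy sidesteps pointer-cast aliasing issues. */
void bytes_in_memory(uint32_t value, unsigned char out[4]) {
    memcpy(out, &value, sizeof value);
}

/* Returns 1 on a little-endian host, 0 otherwise.
   Assumes the host is one of the two orders described above. */
int host_is_little_endian(void) {
    unsigned char b[4];
    bytes_in_memory(0x12345678u, b);
    return b[0] == 0x78;   /* least significant byte at the lowest address */
}
```

On an x86-64 machine the buffer should come back as 78 56 34 12; on a big-endian machine it should come back as 12 34 56 78.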

Both layouts represent the exact same value. Nothing is "backwards" in a moral sense — they are just two conventions, like driving on the left or the right. The names come from Jonathan Swift's Gulliver's Travels, where two factions go to war over which end of a boiled egg to crack first. Danny Cohen borrowed the metaphor in a 1980 paper to describe exactly this kind of arbitrary-but-consequential disagreement.

Where it matters in practice

For most code you will never type the word "endianness." Within one running program, the CPU reads and writes its own native order consistently, so the layout is invisible. The trouble starts when bytes cross a boundary and a different reader has to interpret them.

The big split is between hardware and the network. The dominant desktop and server architectures, x86 and x86-64, are little-endian, and most ARM chips run little-endian too (ARM cores can be configured for either order, but in practice almost everything you touch is little-endian). Internet protocols, however, standardized on big-endian decades ago. The convention is so entrenched that big-endian is simply called network byte order. In IP, TCP, and UDP headers, the addresses, port numbers, and length fields all travel most-significant byte first.

So a little-endian machine sending a 16-bit port number over the wire must flip the bytes, and a little-endian receiver must flip them back. The sockets API (POSIX <arpa/inet.h>, or Winsock on Windows) gives you named helpers for this, so you never have to think about which direction to swap:

#include <arpa/inet.h>               // POSIX; Winsock provides the same names on Windows
uint32_t net  = htonl(host_value);   // host  -> network (big-endian)
uint32_t host = ntohl(received);     // network -> host
// htons/ntohs do the same for 16-bit values

On a big-endian host these functions are no-ops; on a little-endian host they perform the swap. Calling them unconditionally keeps your code portable without special-casing the architecture.
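
As a rough illustration of those helpers in use, here is a sketch that places a port number into a packet buffer and reads it back. It assumes a POSIX system for <arpa/inet.h>; put_port and get_port are hypothetical names, not part of any library.

```c
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Write a 16-bit port into buf[0..1] in network byte order. */
void put_port(unsigned char *buf, uint16_t host_port) {
    uint16_t net = htons(host_port);  /* swaps only if the host needs it */
    memcpy(buf, &net, sizeof net);    /* bytes are now in wire order */
}

/* Read a 16-bit port back out of buf[0..1]. */
uint16_t get_port(const unsigned char *buf) {
    uint16_t net;
    memcpy(&net, buf, sizeof net);
    return ntohs(net);                /* network -> host */
}
```

For port 8080 (0x1F90) the buffer should hold 1F 90 whichever kind of host ran the code, which is the whole point of converting at the boundary.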

Endianness only matters when bytes are reinterpreted by something other than the CPU that wrote them — saved to a binary file, sent over a network, or shared with a different architecture. As long as a value lives and dies inside one machine's memory, you can stay blissfully unaware. Bugs appear when you serialize raw memory on one box and read it back on another with the opposite order.
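
To make that failure concrete: a reader with the opposite byte order sees the value with its bytes reversed. The sketch below models that reversal with shifts and masks; swap32 is a hypothetical helper written for this example.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. This is what a reader
   with the opposite byte order effectively sees. */
uint32_t swap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}
```

Here swap32(0x12345678) yields 0x78563412, a perfectly valid number that happens to be wrong. Nothing crashes, which is why this class of bug tends to go unnoticed until the data is used.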

The same applies to file formats. Some, like classic Windows BMP, are little-endian; others, like PNG and many network-oriented formats, are big-endian. A well-designed format simply documents its order and sticks to it, which is why robust serialization code reads and writes fields one byte at a time with explicit shifts rather than dumping a struct straight to disk.
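
The byte-at-a-time approach can be sketched as a pair of decoders, one per order. The function names are hypothetical; the point is that the shifts spell out the format's byte order, so the result does not depend on the host.

```c
#include <stdint.h>

/* Decode 4 bytes stored least-significant byte first
   (the order BMP uses for its header fields). */
uint32_t read_u32_le(const unsigned char *p) {
    return  (uint32_t)p[0]        |
           ((uint32_t)p[1] <<  8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Decode 4 bytes stored most-significant byte first
   (the order PNG uses for its chunk lengths). */
uint32_t read_u32_be(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}
```

Given the bytes 78 56 34 12, read_u32_le returns 0x12345678 and read_u32_be returns 0x78563412 on any machine, because the code never looks at how the host lays out its own integers.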

How to stay safe

The reliable habit is to never assume the reader shares your byte order. When you control the format, pick an order, write it down, and convert explicitly at the edges. Most languages provide tools: Python's struct module takes a format prefix (> for big-endian, < for little-endian, ! for network order), GCC and Clang expose byte-swap builtins such as __builtin_bswap32, and Rust's integer types have methods like to_be_bytes. The cost of a byte swap is negligible; the cost of silent corruption that only shows up across architectures is not.
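
Here is one way that habit might look in C for a format you control. Everything in this sketch is an assumption made for illustration: the record struct, its two fields, and the choice of big-endian for the 6-byte wire layout.

```c
#include <stdint.h>

/* Hypothetical record, used only for illustration. */
struct record {
    uint16_t version;
    uint32_t length;
};

enum { RECORD_WIRE_SIZE = 6 };  /* 2 + 4 bytes, no padding on the wire */

/* Encode in the documented order: big-endian, version then length. */
void record_encode(const struct record *r, unsigned char out[RECORD_WIRE_SIZE]) {
    out[0] = (unsigned char)((r->version >> 8) & 0xFF);
    out[1] = (unsigned char)(r->version & 0xFF);
    out[2] = (unsigned char)((r->length >> 24) & 0xFF);
    out[3] = (unsigned char)((r->length >> 16) & 0xFF);
    out[4] = (unsigned char)((r->length >> 8) & 0xFF);
    out[5] = (unsigned char)(r->length & 0xFF);
}

/* Decode the same layout back into host-order fields. */
void record_decode(const unsigned char in[RECORD_WIRE_SIZE], struct record *r) {
    r->version = (uint16_t)(((unsigned)in[0] << 8) | in[1]);
    r->length  = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
                 ((uint32_t)in[4] <<  8) |  (uint32_t)in[5];
}
```

Encoding field by field also avoids writing struct padding to disk, which is a separate portability hazard of dumping the struct directly.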