Barcodes for Developers: What the Stripes Actually Encode

#webdev #beginners #programming #tutorial

Every barcode you've ever seen is just a short string, usually a number, rendered as a pattern of lines. That's it. There's no image recognition, no machine learning, no computer vision involved in reading a standard barcode. A scanner shines a light, measures the reflections, decodes the widths of the bars and gaps into digits, and looks up those digits in a database. The intelligence is in the database, not the barcode.

But the encoding schemes are more interesting than most developers realize. Here's what's actually happening inside those stripes.

UPC and EAN: the ones on products

The barcode on a can of soda is almost certainly UPC-A (12 digits, used in North America) or EAN-13 (13 digits, used everywhere else). UPC-A is actually a subset of EAN-13 with a leading zero.
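As a minimal sketch of that relationship, the helper below pads a 12-digit UPC-A string into its 13-digit EAN-13 form. The name upcAToEan13 is made up for this example and does not come from any library.

```javascript
// Hypothetical helper: convert a 12-digit UPC-A string to its EAN-13 form.
function upcAToEan13(upc) {
  if (!/^\d{12}$/.test(upc)) {
    throw new Error("UPC-A must be exactly 12 digits");
  }
  // The leading zero is the only difference between the two numbers.
  return "0" + upc;
}

console.log(upcAToEan13("012345678905")); // "0012345678905"
```

The final digit carries over unchanged, because a leading zero adds nothing to the weighted sum used for the check digit.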

An EAN-13 barcode like 5901234123457 encodes:

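First 3 digits: GS1 prefix (590 = GS1 Poland, the office that issued the number, not necessarily where the product was made)
Next 4-5 digits: manufacturer code (the exact length varies by manufacturer)
Next 4-5 digits: product code (takes whatever digits the manufacturer code leaves)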
Last digit: check digit

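The check digit is derived from a weighted sum of the other 12 digits, using alternating weights of 1 and 3 starting from the left: it is whatever value brings that sum up to the next multiple of 10. This catches every single-digit error and most adjacent transposition errors.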

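function ean13CheckDigit(digits) {
  // digits: a string holding at least the first 12 digits of the code
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    // Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit
    sum += parseInt(digits[i], 10) * (i % 2 === 0 ? 1 : 3);
  }
  // Distance from the sum up to the next multiple of 10 (0 if already there)
  return (10 - (sum % 10)) % 10;
}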
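A check-digit function is most useful as part of a validator. The sketch below repeats the weighted-sum rule so it runs on its own and adds isValidEan13, a hypothetical helper name, to compare the computed digit against the 13th digit of a scanned code.

```javascript
// Same weighted-sum rule as above, repeated so this snippet is self-contained.
function ean13CheckDigit(digits) {
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(digits[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return (10 - (sum % 10)) % 10;
}

// Hypothetical helper: true when the 13th digit matches the computed check digit.
function isValidEan13(code) {
  return /^\d{13}$/.test(code) && ean13CheckDigit(code) === Number(code[12]);
}

console.log(isValidEan13("5901234123457")); // true
console.log(isValidEan13("5901234123458")); // false, wrong check digit
console.log(isValidEan13("9501234123457")); // false, first two digits swapped
```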

The barcode itself uses a binary encoding. Each digit is represented by seven modules (black or white units). The left half uses two different encoding schemes (odd and even parity), and the pattern of odd/even encodings encodes the first digit. This is an elegant space-saving trick: the first digit is implicit in the encoding pattern of the other six left-side digits rather than being explicitly drawn.
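To make the implicit first digit concrete, here is a sketch that expands a 13-digit code into its modules, where "1" is a dark module and "0" is a light one. The two tables are the standard EAN-13 left-hand ("L") digit patterns and the first-digit parity table, written here with L for odd parity and G for even parity; the function and constant names are made up for this example, and the guard patterns (101, 01010, 101) come from the EAN-13 symbol layout rather than from the text above.

```javascript
// Left-hand odd-parity ("L") patterns for digits 0-9, seven modules each.
const L_CODES = [
  "0001101", "0011001", "0010011", "0111101", "0100011",
  "0110001", "0101111", "0111011", "0110111", "0001011",
];

// Which parity each of the six left-side digits uses, indexed by the first digit.
const PARITY = [
  "LLLLLL", "LLGLGG", "LLGGLG", "LLGGGL", "LGLLGG",
  "LGGLLG", "LGGGLL", "LGLGLG", "LGLGGL", "LGGLGL",
];

const invert = (bits) => bits.split("").map((b) => (b === "0" ? "1" : "0")).join("");
const reverse = (bits) => bits.split("").reverse().join("");

// Hypothetical helper: returns the 95 modules of an EAN-13 symbol as a string.
function ean13Modules(code) {
  if (!/^\d{13}$/.test(code)) throw new Error("expected 13 digits");
  const parity = PARITY[Number(code[0])]; // the first digit is never drawn
  let out = "101"; // start guard
  for (let i = 0; i < 6; i++) {
    const l = L_CODES[Number(code[i + 1])];
    // Even-parity ("G") patterns are the inverted L pattern read backwards.
    out += parity[i] === "L" ? l : reverse(invert(l));
  }
  out += "01010"; // center guard
  for (let i = 7; i < 13; i++) {
    out += invert(L_CODES[Number(code[i])]); // right-hand patterns are inverted L
  }
  return out + "101"; // end guard
}

console.log(ean13Modules("5901234123457").length); // 95
```

A decoder works the other way round: it recognizes which of the six left-side digits used even parity and looks that pattern up to recover the first digit.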

Code 128: the general-purpose barcode

For inventory systems, shipping labels, and internal applications, Code 128 is the standard. It can encode all 128 ASCII characters, including lowercase letters and symbols, which makes it far more versatile than UPC/EAN.

Code 128 has three character sets (A, B, C) and can switch between them mid-barcode:

Set A: uppercase letters, digits, control characters
Set B: uppercase and lowercase letters, digits, common symbols
Set C: pairs of digits (00-99), making it efficient for long numeric strings

A clever encoder will switch between sets to minimize barcode length. For example, if you're encoding ABC123456, it might use Set B for ABC and then switch to Set C for 12, 34, 56, so the six digits take three symbols instead of six. Set C only works on whole pairs, so an odd-length run of digits leaves one digit to encode in Set A or B.

Start_B  A  B  C  Switch_C  12  34  56  Check  Stop
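One way to sketch the set-switching idea is a deliberately simple encoder that stays in Set B and only moves to Set C for an even-length run of four or more digits at the end of the string. Real encoders handle digit runs anywhere in the data; code128Symbols is a hypothetical name, and the numeric values (Start B = 104, Start C = 105, and 99 for Code C, the symbol that switches from Set B to Set C) are taken from the Code 128 symbol table rather than from the text above.

```javascript
const START_B = 104;
const START_C = 105;
const CODE_C = 99; // switch symbol when currently in Set B

// Hypothetical helper: returns symbol values, without check character or stop.
function code128Symbols(text) {
  // Longest even-length run of digits at the end of the string.
  const digitRun = text.match(/\d*$/)[0];
  const tailLength = digitRun.length - (digitRun.length % 2);
  const useC = tailLength >= 4; // shorter runs are not worth a switch
  const head = useC ? text.slice(0, text.length - tailLength) : text;
  const tail = useC ? text.slice(text.length - tailLength) : "";

  const symbols = [];
  if (useC && head.length === 0) {
    symbols.push(START_C); // all digits: start directly in Set C
  } else {
    symbols.push(START_B);
    for (const ch of head) {
      const code = ch.charCodeAt(0);
      if (code < 32 || code > 126) throw new Error("character not in Set B");
      symbols.push(code - 32); // Set B value = ASCII code minus 32
    }
    if (useC) symbols.push(CODE_C);
  }
  for (let i = 0; i < tail.length; i += 2) {
    symbols.push(Number(tail.slice(i, i + 2))); // one symbol per digit pair
  }
  return symbols;
}

console.log(code128Symbols("ABC123456")); // [104, 33, 34, 35, 99, 12, 34, 56]
```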

The check character in Code 128 is a modulo-103 weighted sum: the start character's value, plus each following character's value multiplied by its position (1-indexed). This makes the checksum position-dependent, so it catches transposed characters as well as most insertion and deletion errors.
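A short sketch of that weighted sum, assuming the input is an array of Code 128 symbol values with the start character first. Per the Code 128 rules the start character counts once, and the first data symbol gets weight 1. The function name is hypothetical; the sample values are Start B (104), A, B, C in Set B (33, 34, 35), the switch to Set C (99), and the pairs 12, 34, 56.

```javascript
// Hypothetical helper: check character value for a list of symbol values.
function code128Checksum(symbolValues) {
  let sum = symbolValues[0]; // start character, counted once
  for (let i = 1; i < symbolValues.length; i++) {
    sum += symbolValues[i] * i; // position weight starts at 1
  }
  return sum % 103;
}

console.log(code128Checksum([104, 33, 34, 35, 99, 12, 34, 56])); // 23
```

Swapping two neighboring symbols changes the total because each one is multiplied by a different weight, which is why the checksum notices reordering.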

Code 39: the simple one

Code 39 is older and less dense than Code 128, but it has one advantage: it's self-checking. Each character is encoded independently without reference to adjacent characters, and the encoding is designed so that a misread character doesn't produce a valid different character. This means Code 39 doesn't technically require a check digit, though one is often added.

Code 39 can encode digits, uppercase letters, and a handful of symbols (-, ., space, $, /, +, %). Each character uses five bars and four gaps, three of which are wide (hence the name: 3 of 9 elements are wide).

*CODE39*

The asterisk is used as a start/stop character and is not considered part of the data.
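As a sketch of how a label string might be prepared, the helpers below validate the data against the Code 39 character set, wrap it in asterisks, and optionally append a check character. The check scheme shown is the common modulo-43 one, which the text above does not describe: each character's value is its index in the 43-character set, and the check character is the one at index (sum mod 43). The helper names are made up for this example.

```javascript
// The 43 Code 39 characters in value order (0 = "0", 42 = "%").
const CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%";

// Hypothetical helper: modulo-43 check character for the given data.
function code39CheckChar(data) {
  let sum = 0;
  for (const ch of data) {
    const value = CODE39_CHARS.indexOf(ch);
    if (value === -1) throw new Error(`"${ch}" is not a Code 39 character`);
    sum += value;
  }
  return CODE39_CHARS[sum % 43];
}

// Hypothetical helper: add start/stop asterisks, plus the check character if asked.
function code39Frame(data, withCheck = false) {
  const check = code39CheckChar(data); // also rejects unsupported characters
  return "*" + data + (withCheck ? check : "") + "*";
}

console.log(code39Frame("CODE39")); // "*CODE39*"
console.log(code39Frame("CODE39", true)); // "*CODE39W*"
```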

ITF (Interleaved Two of Five)

This encoding is used mostly on corrugated shipping cartons, where it usually appears as ITF-14. You've seen it on the outside of plenty of cardboard boxes that come off a shipping truck.

ITF encodes pairs of digits by interleaving one digit in the bars and the next digit in the spaces between them. This makes it very space-efficient but requires an even number of digits.
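The interleaving is easier to see in code. In this sketch each digit maps to five elements, two of them wide (W) and three narrow (N), which is the standard two-of-five pattern table. The first digit of each pair supplies the bar widths and the second supplies the space widths. The helper name and the output notation (uppercase for bars, lowercase for spaces) are choices made for this example, and the start and stop patterns are left out.

```javascript
// Width pattern for each digit 0-9: five elements, exactly two of them wide.
const ITF_PATTERNS = [
  "NNWWN", "WNNNW", "NWNNW", "WWNNN", "NNWNW",
  "WNWNN", "NWWNN", "NNNWW", "WNNWN", "NWNWN",
];

// Hypothetical helper: data elements only, uppercase = bar, lowercase = space.
function itfElements(digits) {
  if (!/^\d+$/.test(digits) || digits.length % 2 !== 0) {
    throw new Error("ITF needs an even number of digits");
  }
  let out = "";
  for (let i = 0; i < digits.length; i += 2) {
    const bars = ITF_PATTERNS[Number(digits[i])];
    const spaces = ITF_PATTERNS[Number(digits[i + 1])].toLowerCase();
    for (let j = 0; j < 5; j++) {
      out += bars[j] + spaces[j]; // bar from the first digit, space from the second
    }
  }
  return out;
}

console.log(itfElements("12")); // "WnNwNnNnWw"
```

Encoders that receive an odd number of digits typically pad with a leading zero before calling something like this.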

Generating barcodes programmatically

In JavaScript, the most popular library is JsBarcode:

import JsBarcode from 'jsbarcode';

// UPC-A: the 12th digit must be the correct check digit (5 for 01234567890)
JsBarcode("#barcode", "012345678905", {
  format: "UPC",
  width: 2,
  height: 100
});

// Code 128
JsBarcode("#barcode", "Hello World", {
  format: "CODE128",
  width: 2,
  height: 100
});

In Python, python-barcode handles generation (its ImageWriter depends on Pillow):

import barcode
from barcode.writer import ImageWriter

ean = barcode.get('ean13', '5901234123457', writer=ImageWriter())
ean.save('barcode')

The key decision is choosing the right format:

Products sold at retail: UPC-A or EAN-13 (requires a GS1 membership for official codes)
Internal inventory: Code 128 (most flexible)
Simple alphanumeric labels: Code 39 (widest scanner compatibility)
Shipping cartons: ITF-14
Healthcare: Code 128 with GS1-128 application identifiers

Common mistakes

Insufficient quiet zones. Every barcode needs blank space on both sides (the "quiet zone"). UPC-A requires at least 9 module widths of white space. If the barcode is too close to other printed elements, scanners can't find the start/stop patterns.
Poor contrast. Barcodes work by measuring the reflectance difference between bars and spaces. Black on white is ideal. Dark blue on white works. Red bars on white fails completely because barcode scanners typically use red laser light, and red bars reflect red light just like the white spaces do.
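Scaling without maintaining proportions. Resizing a barcode image to a non-integer multiple of its original size forces the renderer to round or blur bar edges, which changes the bar widths relative to each other and can make the code unreadable. Squashing the height makes it harder for a scanner to cross every bar in one pass. If you need a larger barcode, scale by a whole number of pixels per module. Better yet, regenerate at the target size.
Wrong check digits. If you're constructing barcodes manually, an incorrect check digit usually means the barcode won't scan at all: most scanners verify UPC/EAN check digits themselves and discard reads that fail. Downstream systems such as point-of-sale software typically reject numbers with invalid check digits too.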
Using 1D barcodes when you need 2D. If you need to encode URLs, vCards, or more than about 20 characters of data, you want a QR code or Data Matrix, not a linear barcode. Linear barcodes are best for short numeric or alphanumeric identifiers.
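The quiet-zone and scaling points can be turned into a quick sizing check. This sketch assumes a UPC-A symbol is 95 modules wide (guard patterns plus twelve 7-module digits, a figure from the symbol layout rather than from the text above) with 9 modules of quiet zone on each side. The function names are hypothetical.

```javascript
const UPCA_SYMBOL_MODULES = 95; // 3 + 6*7 + 5 + 6*7 + 3
const UPCA_QUIET_ZONE_MODULES = 9; // required on each side

// Hypothetical helper: total width in pixels, quiet zones included.
function upcaTotalWidthPx(moduleWidthPx) {
  if (!Number.isInteger(moduleWidthPx) || moduleWidthPx < 1) {
    throw new Error("use a whole number of pixels per module");
  }
  return (UPCA_SYMBOL_MODULES + 2 * UPCA_QUIET_ZONE_MODULES) * moduleWidthPx;
}

// Hypothetical helper: largest whole-pixel module width that fits the space.
function largestModuleWidthPx(availableWidthPx) {
  return Math.floor(availableWidthPx / (UPCA_SYMBOL_MODULES + 2 * UPCA_QUIET_ZONE_MODULES));
}

console.log(upcaTotalWidthPx(2)); // 226
console.log(largestModuleWidthPx(400)); // 3
```

Picking the module width first and deriving the image size from it keeps every bar on a pixel boundary, which is the safe alternative to stretching a finished image.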

Barcodes vs QR codes

QR codes are two-dimensional. They encode data in a grid of black and white modules rather than in a single line of bars. A QR code can hold up to 4,296 alphanumeric characters at its largest size and lowest error-correction level. A Code 128 barcode becomes impractically long after about 20 characters.

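QR codes also have built-in error correction (Reed-Solomon encoding) that, at the highest correction level, lets them remain readable even if roughly 30% of the code is obscured. Linear barcodes have no error correction: a check digit can detect a bad read but can't repair it. Their only redundancy is bar height, so damage that runs across the full height of a bar can prevent scanning.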

Use linear barcodes when: you need maximum scanner compatibility, the data is short (under 20 characters), and industrial scanners are the primary reader. Use QR codes when: the data is longer, smartphones might be the reader, or error tolerance is important.

For quickly generating barcodes in any common format without setting up a library, I built a barcode generator at zovo.one/free-tools/barcode-generator that supports UPC, EAN, Code 128, Code 39, and ITF with adjustable dimensions.

Understanding barcode encoding isn't something most developers need daily, but when a scanning integration breaks or a label prints incorrectly, knowing what the bars actually represent makes debugging dramatically faster. It's one of those pieces of knowledge that pays for itself the first time you need it.

I'm Michael Lip. I build free developer tools at zovo.one. 350+ tools, all private, all free.