Eresh Gorantla

Posted on Mar 21

I Built a QR Code Encoder From Scratch in TypeScript — Here's How It Works

#javascript #typescript #webdev #programming

Try it yourself: devprix.dev/tools/qr-code-generator

This is part of DevPrix — 56 free developer tools that run entirely in your browser. No sign-up, no tracking, no server calls.

Most developers reach for a library when they need QR codes. I decided to implement the entire QR encoding pipeline from scratch — Reed-Solomon error correction, Galois Field arithmetic, mask pattern optimization, and all. Here's a deep dive into what it took.

The full implementation is of TypeScript with zero dependencies, running entirely in the browser at devprix.dev/tools/qr-code-generator.

What a QR Code Actually Is

A QR code is a 2D barcode that encodes data in a grid of black and white squares called "modules." But there's a lot more going on than just turning data into dots. The encoding pipeline has six stages:

Data analysis — determine what you're encoding
Bit stream encoding — convert data to a binary stream
Error correction — add redundancy using Reed-Solomon codes
Matrix construction — place structural patterns and data
Masking — apply patterns to improve readability
Format information — encode metadata about the QR code itself

Let's walk through each one.

Stage 1: Version Selection

QR codes come in versions 1 through 40. Version 1 is 21x21 modules, and each version adds 4 modules per side (Version 2 = 25x25, Version 10 = 57x57, etc.). The version determines how much data you can store.

Each version also has four error correction levels:

L (Low) — recovers ~7% damage
M (Medium) — recovers ~15% damage
Q (Quartile) — recovers ~25% damage
H (High) — recovers ~30% damage

Higher error correction means less data capacity but more resilience. My implementation supports versions 1-10, which handles up to ~420 characters at the lowest error correction — more than enough for URLs, WiFi credentials, and contact cards.

The encoder automatically picks the smallest version that fits the data:

function chooseVersion(dataLen: number, ecLevel: number): number {
  for (let v = 0; v < VERSION_CAPACITY.length; v++) {
    if (dataLen <= VERSION_CAPACITY[v][ecLevel]) return v + 1;
  }
  return -1; // data too long
}

Stage 2: Bit Stream Encoding

Data gets converted to a binary bit stream. The stream starts with a 4-bit mode indicator (I use byte mode = 0100), followed by a character count indicator (8 bits for versions 1-9), then the actual data bytes.

After the data, the stream gets:

A terminator — up to 4 zero bits
Byte padding — zeros to reach a byte boundary
Fill padding — alternating 0xEC and 0x11 bytes until the stream reaches the exact capacity

That alternating padding pattern isn't random — it's part of the QR spec and ensures the bit stream doesn't accidentally form patterns that confuse scanners.

// Pad to exact capacity with alternating bytes
while (codewords.length < capacity) {
  codewords.push(padToggle ? 0xec : 0x11);
  padToggle = !padToggle;
}

Stage 3: Reed-Solomon Error Correction

This is where it gets mathematically interesting. Reed-Solomon codes add redundancy so the QR code can be read even if parts are damaged or obscured. The math happens in GF(256) — a Galois Field with 256 elements.

Galois Field Arithmetic

Normal arithmetic doesn't work for error correction because we need every operation to stay within 0-255 (one byte). Galois Fields solve this by redefining multiplication using logarithm tables.

First, I precompute exponential and logarithm lookup tables using the primitive polynomial 0x11d:

const GF_EXP = new Uint8Array(512);
const GF_LOG = new Uint8Array(256);

let x = 1;
for (let i = 0; i < 255; i++) {
  GF_EXP[i] = x;
  GF_LOG[x] = i;
  x = x << 1;
  if (x >= 256) x ^= 0x11d; // reduce by primitive polynomial
}
// Extend table for easy wraparound
for (let i = 255; i < 512; i++) GF_EXP[i] = GF_EXP[i - 255];

Now multiplication becomes addition of logarithms:

function gfMul(a: number, b: number): number {
  if (a === 0 || b === 0) return 0;
  return GF_EXP[(GF_LOG[a] + GF_LOG[b]) % 255];
}

This is elegant — we've turned expensive multiplication into a table lookup and an addition. The extended GF_EXP table (512 entries) avoids modulo operations during polynomial math.

Building the Generator Polynomial

The Reed-Solomon generator polynomial is built iteratively. For n error correction codewords, we multiply (x - a^0)(x - a^1)...(x - a^(n-1)) where a = 2 in GF(256):

function rsGenPoly(degree: number): Uint8Array {
  let gen = new Uint8Array([1]);
  for (let i = 0; i < degree; i++) {
    const next = new Uint8Array(gen.length + 1);
    for (let j = 0; j < gen.length; j++) {
      next[j] ^= gen[j];
      next[j + 1] ^= gfMul(gen[j], GF_EXP[i]);
    }
    gen = next;
  }
  return gen;
}

Encoding

The actual encoding is polynomial long division in GF(256). We divide the data polynomial by the generator polynomial, and the remainder becomes our error correction codewords:

function rsEncode(data: Uint8Array, ecCount: number): Uint8Array {
  const gen = rsGenPoly(ecCount);
  const result = new Uint8Array(ecCount);
  for (let i = 0; i < data.length; i++) {
    const coeff = data[i] ^ result[0];
    result.copyWithin(0, 1);
    result[ecCount - 1] = 0;
    for (let j = 0; j < ecCount; j++) {
      result[j] ^= gfMul(gen[j + 1], coeff);
    }
  }
  return result;
}

Block Interleaving

For larger QR codes, data is split into multiple blocks, each encoded independently with its own error correction. The blocks are then interleaved — codewords are taken round-robin from each block. This distributes errors across blocks, so localized damage doesn't wipe out a single block's worth of data.

Stage 4: Matrix Construction

The QR matrix is built in layers:

Finder Patterns

Three 7x7 squares at the top-left, top-right, and bottom-left corners. These are how scanners locate and orient the QR code. Each has a concentric pattern: 7x7 dark border, 5x5 white, 3x3 dark center.

Alignment Patterns

Starting from version 2, smaller 5x5 patterns appear at calculated positions. These help scanners correct for perspective distortion (like scanning at an angle).

Timing Patterns

Alternating dark/light modules along row 6 and column 6, connecting the finder patterns. These establish the grid spacing so scanners know module positions.

Data Placement

Data bits are placed in a zigzag pattern, starting from the bottom-right corner. The path moves in column pairs (two columns wide), alternating between upward and downward passes. It skips the timing pattern column (column 6) and all reserved areas:

// Zigzag right-to-left, alternating up/down
let bitIdx = 0;
for (let right = size - 1; right >= 1; right -= 2) {
  if (right === 6) right = 5; // skip timing column
  for (let vert = 0; vert < size; vert++) {
    const r = upward ? size - 1 - vert : vert;
    for (const c of [right, right - 1]) {
      if (!reserved[r][c] && bitIdx < bits.length) {
        matrix[r][c] = bits[bitIdx++];
      }
    }
  }
  upward = !upward;
}

Stage 5: Masking — The Optimization Problem

Here's where QR encoding becomes an optimization problem. Raw data placement can create patterns that confuse scanners — large uniform areas, patterns that look like finder squares, or uneven dark/light ratios.

The spec defines 8 mask patterns, each a mathematical function that determines whether to flip a module:

Mask 0: (row + col) % 2 === 0         — checkerboard
Mask 1: row % 2 === 0                  — horizontal stripes
Mask 2: col % 3 === 0                  — vertical stripes
Mask 3: (row + col) % 3 === 0         — diagonal
Mask 4: (row/2 + col/3) % 2 === 0     — block grid
Mask 5: (r*c)%2 + (r*c)%3 === 0       — product-based
Mask 6: ((r*c)%2 + (r*c)%3) % 2 === 0 — product variation
Mask 7: ((r+c)%2 + (r*c)%3) % 2 === 0 — mixed

Masks are only applied to data areas — structural patterns are preserved.

Penalty Scoring

Each masked result is scored with four penalty rules:

Rule 1: Consecutive modules — 5+ same-color modules in a row/column scores 3 + (count - 5). This penalizes long runs that blur together.

Rule 2: 2x2 blocks — Each 2x2 block of the same color scores 3. Large uniform areas are bad for scanning.

Rule 3: Finder-like patterns — The sequence 1011101 0000 (or reversed) scores 40 per occurrence. These look like finder patterns and confuse the position detection.

Rule 4: Dark balance — Deviation from 50% dark modules is penalized. The formula rounds the dark percentage to the nearest 5% boundaries and penalizes based on distance from 50%.

// Rule 4: Dark module balance
const darkCount = matrix.flat().filter(Boolean).length;
const total = size * size;
const pct = (darkCount * 100) / total;
const prev5 = Math.floor(pct / 5) * 5;
const next5 = prev5 + 5;
penalty += Math.min(Math.abs(prev5 - 50), Math.abs(next5 - 50)) / 5 * 10;

The encoder evaluates all 8 masks and picks the one with the lowest total penalty. This brute-force approach guarantees the optimal mask for readability.

Stage 6: Format Information

The last step encodes the error correction level and mask pattern as a 15-bit BCH code, placed at two locations around the top-left finder pattern. This is how scanners know which EC level and mask were used.

The format information is pre-computed as a lookup table (32 combinations: 4 EC levels x 8 masks), avoiding BCH calculation at runtime.

The Result

The complete pipeline — from a string like "https://devprix.dev" to a scannable QR matrix — runs in under 50ms in the browser. The output is rendered as SVG for crisp scaling at any size, with PNG export via Canvas API.

The tool also generates specialized payloads:

WiFi — WIFI:T:WPA;S:NetworkName;P:password;;
vCard — structured contact card format
Email — mailto: with subject and body
Phone — tel: format

What I Learned

Galois Field arithmetic is beautiful. Converting multiplication to table lookups via logarithms is one of those elegant mathematical tricks that feels like cheating. The entire GF(256) setup is 15 lines of code.

Penalty scoring is clever engineering. The four rules model human readability heuristics — too much uniformity, finder-like patterns, and dark/light imbalance all make codes harder to scan. Evaluating all 8 masks is brute force but guaranteed optimal.

The zigzag data placement is weird but purposeful. Moving in column pairs, alternating direction, skipping column 6 — it seems arbitrary until you realize it maximizes the distance between adjacent data bits, improving error resilience.

Zero dependencies was worth it. The implementation is 1,161 lines. A library would be smaller to import but adds supply chain risk, bundle size, and a black box. Building from scratch means I understand every bit of the output.

DEV Community