yinhao

Posted on Mar 31

How prefix-sum binary search makes text line-breaking O(log n) with zero allocation

#javascript #typescript #algorithms #performance

Most text layout implementations find line breaks by scanning forward character by character, accumulating widths until they exceed the container. That's O(n) per line and it allocates intermediate results along the way.

There's a better approach using prefix sums and binary search. I used it in a text layout engine I built called ZeroText, and the technique is general enough to be useful anywhere you need to partition a sequence by cumulative weight.

The naive approach

let x = 0;
for (let i = start; i < text.length; i++) {
  x += getWidth(text[i]);
  if (x > containerWidth) {
    // break at i-1, start new line
    break;
  }
}

This is O(n) per line. For a 10,000-character document with 500 lines, you're touching every character at least once per layout. And getWidth might itself allocate — Canvas measureText returns a new TextMetrics object every call. DOM-based measurement triggers synchronous reflow.

Prefix sums turn range queries into subtraction

Compute the prefix sum array in a single pass:

const prefixSum = new Float64Array(n);
prefixSum[0] = width(text[0]);
for (let i = 1; i < n; i++) {
  prefixSum[i] = prefixSum[i - 1] + width(text[i]);
}

Now the total width of any substring [a, b) is:

prefixSum[b - 1] - (a > 0 ? prefixSum[a - 1] : 0)

O(1) for any range. No scanning.

Binary search for line breaks

To find where a line starting at index start overflows container width W, you need the largest end where:

prefixSum[end] - prefixSum[start - 1] <= W

Rearrange:

prefixSum[end] <= W + prefixSum[start - 1]

The prefix sum array is monotonically non-decreasing (all widths >= 0), so this is a standard upper-bound binary search:

function findBreakPoint(prefixSum, start, width) {
  const target = width + (start > 0 ? prefixSum[start - 1] : 0);
  let lo = start;
  let hi = prefixSum.length - 1;

  while (lo < hi) {
    const mid = (lo + hi + 1) >>> 1;
    if (prefixSum[mid] <= target) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

Complexity: O(n) to build the prefix sum once, then O(log n) per line break. For that 10,000-character document with 500 lines: one linear pass plus ~500 binary searches of ~13 steps each. About 6,500 comparisons total for all line breaking.

Making it zero-allocation

The prefix sum array is the obvious allocation. Two techniques eliminate it:

1. Arena pool with cursor reset

Pre-allocate a Float64Array once, sized for the maximum expected text length:

const MAX_LEN = 8192;
const arena = {
  prefixSum: new Float64Array(MAX_LEN),
  cursor: 0,
};

function reset() {
  arena.cursor = 0; // no deallocation, no GC
}

On each layout call, write into the existing buffer. Between frames, reset the cursor. The same memory is reused indefinitely. For texts exceeding MAX_LEN, a single overflow buffer is lazily allocated and reused:

let overflow = null;

function getPrefixSum(len) {
  if (len <= MAX_LEN) return arena.prefixSum;
  if (!overflow || overflow.length < len) {
    overflow = new Float64Array(len); // happens once
  }
  return overflow;
}

2. ASCII fast-path glyph lookup

A perfect hash table maps codepoints to widths, but most English text is ASCII. A direct-index array for codepoints 0–127 skips hashing entirely:

const asciiCache = new Float32Array(128);

function getWidth(codepoint) {
  if (codepoint < 128) return asciiCache[codepoint];
  return hashTableLookup(codepoint);
}

No object creation. No string conversion. The hash table itself is a pair of typed arrays (keys + values) using open addressing.

Caching repeated layouts with FNV-1a

If the same text is laid out at the same width (common during scrolling, animation, or React re-renders), skip everything.

Hash the input codepoints directly as integers using FNV-1a:

function fnv1a(data, len, seed) {
  let h = 0x811c9dc5 ^ seed;
  for (let i = 0; i < len; i++) {
    h ^= data[i];
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

The key is fnv1a(codepoints, length, containerWidth) — a single 32-bit number. No string concatenation, no JSON.stringify, no object key.

The LRU cache is a standard doubly-linked list with a Map<number, Node>:

class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
    this.head = { prev: null, next: null };
    this.tail = { prev: null, next: null };
    this.head.next = this.tail;
    this.tail.prev = this.head;
  }

  get(key) {
    const node = this.map.get(key);
    if (!node) return null;
    this._moveToFront(node);
    return node.value;
  }

  put(key, value) {
    if (this.map.has(key)) {
      const node = this.map.get(key);
      node.value = value;
      this._moveToFront(node);
      return;
    }
    if (this.map.size >= this.capacity) {
      const lru = this.tail.prev;
      this._remove(lru);
      this.map.delete(lru.key);
    }
    const node = { key, value, prev: null, next: null };
    this._addToFront(node);
    this.map.set(key, node);
  }
}

Cache hit returns the previous layout result — an object pointing into the arena's Structure of Arrays (SoA) buffers. The hot path is: hash → Map.get → pointer swap → return. Measured at ~100ns.

Handling Unicode line break rules

The binary search finds where width overflows, but you can't break in the middle of a word. Unicode Standard Annex #14 defines line break classes for every codepoint.

A DFA classifies each codepoint:

const BreakClass = { SP: 0, BA: 1, HY: 2, ZW: 3, BK: 4, AL: 5 /* ... */ };

function getBreakClass(cp) {
  if (cp === 0x0020) return BreakClass.SP;    // space
  if (cp === 0x002D) return BreakClass.HY;    // hyphen
  if (cp === 0x200B) return BreakClass.ZW;    // zero-width space
  if (cp === 0x000A) return BreakClass.BK;    // newline
  // ... ranges for CJK, punctuation, etc.
  return BreakClass.AL;                        // alphabetic (default)
}

After the binary search finds the overflow point, scan backwards to the nearest valid break:

let breakPos = overflowIndex;
while (breakPos > lineStart) {
  const cls = getBreakClass(text[breakPos]);
  if (cls === SP || cls === BA || cls === HY || cls === ZW) {
    break;
  }
  breakPos--;
}

This backward scan is bounded by average word length (~5 characters in English), making it effectively O(1). For CJK text, every character is a valid break point, so it terminates immediately.

Forced line breaks

Before width-based breaking, scan for \n, \r, or \r\n:

let forcedBreak = -1;
for (let i = start; i < len; i++) {
  if (text[i] === 0x000A || text[i] === 0x000D) {
    forcedBreak = i;
    break;
  }
}

If a newline appears within the available width, break there. If width overflows before reaching the newline, do normal width-based wrapping. Newline characters get zero width in the prefix sum so they don't affect measurements.

CR+LF pairs are consumed as a single break:

if (text[forcedBreak] === 0x000D && text[forcedBreak + 1] === 0x000A) {
  start = forcedBreak + 2; // skip both
}

Putting it all together

The full layout function:

function layoutText(text, containerWidth, glyphTable) {
  const len = text.length;
  const prefixSum = getPrefixSum(len);

  // 1. Build prefix sums (O(n), zero-alloc)
  let cum = 0;
  for (let i = 0; i < len; i++) {
    cum += getWidth(glyphTable, text[i]);
    prefixSum[i] = cum;
  }

  // 2. Find line breaks
  const lines = [];
  let start = 0;
  while (start < len) {
    // Check for forced breaks first
    let forced = scanForNewline(text, start, len);

    // Binary search for width-based break (O(log n))
    let end = findBreakPoint(prefixSum, start, containerWidth);

    // Resolve: use whichever comes first
    if (forced >= 0 && forced <= end) {
      end = forced;
    } else {
      // Scan back for word boundary (O(1) amortized)
      end = findWordBreak(text, start, end);
    }

    lines.push({ start, end });
    start = end + 1;
  }

  return lines;
}

Results

On a typical paragraph (~200 characters, ~10 lines):

Metric	Value
Cold layout	~5.6μs
Hot layout (cached)	~100ns
Bundle size	~5KB minzipped
GC pauses	0

The prefix-sum technique isn't new — it appears in database query engines, computational geometry, and competitive programming. But applying it to text layout, combined with arena allocation and numeric-keyed caching, produces something measurably faster than the scan-and-accumulate approach that most implementations use.

You can try an interactive demo that runs each algorithm step in the browser, or read the source.

DEV Community