Building a word-alignment tool with no database and making image exports that match the screen

#svelte #showdev #webdev #typescript

I make small tools for linguistics and conlanging on the side. A while back I built one called Word Aligner. It draws which word matches which between a sentence and its translation, with curved connectors, and you can stack extra rows for a gloss or an IPA transcription.

In the conlang community people post word-by-word alignments of their languages all the time, and there was no easy way to make them. Folks were lining up arrows in Paint or PowerPoint. I wanted a page where you click two words and a connector appears, so I built one. It caught on in the conlang subreddit, and it turned out language teachers and linguists wanted the same thing.

This post is about two decisions that shaped the codebase: keeping all state in the URL, and getting exports to look exactly like the preview. The second one sent me further down the font rabbit hole than I expected.

The stack, briefly: SvelteKit with Svelte 5, TypeScript, Tailwind v4, and Flowbite for UI. It runs on adapter-node. Most pages are static.

No database. The diagram lives in the URL

There are no accounts and no backend storage. The entire project (the lines of text, the links between words, the per-line fonts, the colors) is encoded into the page URL after every edit. Open the link and you get the same diagram back. That is also the share feature: there is nothing else to share.

I like this for a free tool. No login, no storage cost, no table of other people's sentences to worry about. The privacy story writes itself, because nothing is stored on a server I control.

The encoding has three steps. First I build a compact form of the state that only stores what differs from the defaults, with short keys and sorted fields and rounded floats. Then I deflate it. Then I base64url it.
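
To make the first step concrete, here is a minimal sketch of default-diffing on a single style object. The `LineStyle` shape, the default values, and the short-key mapping are all assumptions made up for illustration; the tool's real `toCompactJSON` is not shown in this post.

```typescript
// Hypothetical per-line style and defaults, for illustration only.
interface LineStyle {
  font: string;
  size: number;
  color: string;
}

const DEFAULT_STYLE: LineStyle = { font: 'Inter', size: 16, color: '#000000' };

// Assumed short keys for the wire format (not the tool's real mapping).
const SHORT_KEYS: Record<keyof LineStyle, string> = { font: 'f', size: 's', color: 'c' };

// Round floats so 16.000000001 and 16 serialize identically.
function round(n: number, places = 2): number {
  const factor = 10 ** places;
  return Math.round(n * factor) / factor;
}

// Keep only the fields that differ from the defaults, under short keys.
export function compactStyle(style: LineStyle): Record<string, string | number> {
  const out: Record<string, string | number> = {};
  // Sorted field order keeps the output stable from one edit to the next.
  const keys = (Object.keys(SHORT_KEYS) as (keyof LineStyle)[]).sort();
  for (const key of keys) {
    const value = style[key];
    const normalized = typeof value === 'number' ? round(value) : value;
    if (normalized !== DEFAULT_STYLE[key]) out[SHORT_KEYS[key]] = normalized;
  }
  return out;
}

// Inverse: anything absent falls back to the default.
export function expandStyle(compact: Record<string, string | number>): LineStyle {
  const f = compact['f'];
  const s = compact['s'];
  const c = compact['c'];
  return {
    font: typeof f === 'string' ? f : DEFAULT_STYLE.font,
    size: typeof s === 'number' ? s : DEFAULT_STYLE.size,
    color: typeof c === 'string' ? c : DEFAULT_STYLE.color,
  };
}
```

A style that matches the defaults serializes to an empty object, which is why a plain two-line diagram stays tiny before compression even runs.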

import { deflateSync, strToU8 } from 'fflate';

export function encodeState(state: AppStateV2): string {
  return deflateBase64url(toCompactJSON(state));
}

export function deflateBase64url(s: string): string {
  const bytes = deflateSync(strToU8(s), { level: 9 });
  return toBase64url(bytes); // '+' and '/' become '-' and '_'; '=' padding is stripped
}

The compaction matters more than the compression for short diagrams. A two-line alignment with a few links produces a tiny payload because almost nothing deviates from the defaults, so most fields are simply absent. Deflate then earns its keep on the big interlinear examples with custom fonts and many rows.
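
The `toBase64url` helper used above, and the `fromBase64url` used when decoding, are not shown in this post. One possible dependency-free version, written here as an assumption rather than the tool's actual code, encodes straight into the URL-safe alphabet instead of swapping characters after the fact:

```typescript
// URL-safe base64 alphabet: '-' and '_' take the place of '+' and '/'.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

export function toBase64url(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += ALPHABET.charAt((triple >> 18) & 63) + ALPHABET.charAt((triple >> 12) & 63);
    // Emit the third and fourth characters only when real input bytes back them,
    // which is equivalent to stripping the '=' padding.
    if (i + 1 < bytes.length) out += ALPHABET.charAt((triple >> 6) & 63);
    if (i + 2 < bytes.length) out += ALPHABET.charAt(triple & 63);
  }
  return out;
}

export function fromBase64url(s: string): Uint8Array {
  const out = new Uint8Array(Math.floor((s.length * 3) / 4));
  let buffer = 0; // holds up to 13 pending bits
  let bits = 0;
  let pos = 0;
  for (let i = 0; i < s.length; i++) {
    const value = ALPHABET.indexOf(s.charAt(i));
    if (value === -1) throw new Error('Invalid base64url character at index ' + i);
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[pos++] = (buffer >> bits) & 0xff;
    }
  }
  return out;
}
```

Throwing on a bad character fits a decoder that wraps the call in `try`/`catch` and turns any failure into `null`.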

Decoding has a guard I added after thinking about what a hostile link could do: inflate, but bail if the decompressed payload is over 2 MB, and return null on any parse error rather than throwing.

import { inflateSync, strFromU8 } from 'fflate';

const MAX_DECOMPRESSED_BYTES = 2 * 1024 * 1024;

export function inflateBase64url(s: string): string | null {
  try {
    const bytes = inflateSync(fromBase64url(s));
    // Check the byte count before decoding: a string's length counts UTF-16 code units, not bytes.
    if (bytes.length > MAX_DECOMPRESSED_BYTES) return null;
    return strFromU8(bytes);
  } catch {
    return null;
  }
}

The honest trade-off is URL length. A heavy diagram makes a long link. For the cases people actually share it stays well within what browsers and chat apps accept, and the compaction keeps the common case short, so I have not needed a fallback store. If I ever do, it slots in behind the same encode and decode functions.

Exports that match the preview

People export these diagrams into slides, papers, and worksheets, so an export that looks even slightly different from the screen is a bug report waiting to happen. The preview is SVG, and the SVG export is built from the same layout and the same link geometry that the preview uses. That part was easy.

PNG and PDF are where it got fiddly. PNG is the SVG drawn onto a canvas and read back as a blob:

const dataUrl = `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
const img = new Image();
// await onload, then drawImage onto a scaled canvas, then canvas.toBlob(...)
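
Filling in the commented steps, a rough browser-only sketch could look like the following. The function names, the `scale` default, and the error messages are illustrative assumptions; only the data URL format comes from the snippet above.

```typescript
// Build the same kind of data URL as in the snippet above.
export function svgToDataUrl(svg: string): string {
  return `data:image/svg+xml;charset=utf-8,${encodeURIComponent(svg)}`;
}

// Browser-only: load the SVG through an Image, draw it on a scaled canvas,
// and read the canvas back as a PNG blob.
export async function svgToPngBlob(
  svg: string,
  width: number,
  height: number,
  scale = 2, // assumed default, matching the 2x raster mentioned for PDF
): Promise<Blob> {
  const img = new Image();
  await new Promise<void>((resolve, reject) => {
    img.onload = () => resolve();
    img.onerror = () => reject(new Error('SVG failed to load as an image'));
    img.src = svgToDataUrl(svg);
  });

  const canvas = document.createElement('canvas');
  canvas.width = Math.round(width * scale);
  canvas.height = Math.round(height * scale);
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('2D canvas context unavailable');
  // Stretch the image to the scaled canvas so the PNG is crisp at 2x.
  ctx.drawImage(img, 0, 0, canvas.width, canvas.height);

  return new Promise<Blob>((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error('toBlob returned null'))),
      'image/png',
    );
  });
}
```

Because the image is drawn exactly once, right after `onload`, anything that has not finished loading inside the SVG by then is missing from the PNG.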

PDF I do as a raster page. The SVG goes to a canvas, the canvas to a PNG, and the PNG into a single-page jsPDF sized to the image. I tried the vector route first and gave up on it. The SVG-to-PDF libraries fought with Vite over CommonJS interop, and a raster page at 2x is good enough for what these diagrams are. I would rather ship the boring version that always works than babysit a build issue for a feature nobody asked to be vector.

The font problem

Here is the part that ate a weekend.

When you rasterize an SVG by loading it through an <img>, any @font-face with a data URL races the image decode. For Google Fonts as small woff2 files it usually wins, so I leave those embedded as woff2 data URLs and they render fine. For a font the user uploaded, often a custom conlang script or something the web does not ship, it usually loses. The first and only frame gets drawn in a fallback family, and the exported PNG looks wrong while the screen looked right.

The fix is to take the font out of the raster path entirely. Before rasterizing, I find every <text> element that uses an uploaded font and replace it with vector outlines, so the export no longer depends on the font loading at all.

Getting the outlines right meant two libraries doing two jobs:

harfbuzzjs does the shaping. HarfBuzz is the shaping engine inside Chrome and Firefox, so ligatures, contextual alternates, and right-to-left runs come out the way the preview shows them. I pass the usual feature set and let it tell me which glyph sits where.

const EXPORT_HB_FEATURES = 'kern,liga,rlig,clig,calt,ccmp';

opentype.js turns each shaped glyph id into a path with glyph.getPath. I tried using HarfBuzz's own glyphToPath and the text came out upside down, because HarfBuzz works in Y-up typographic coordinates and SVG is Y-down. Rather than flip everything, I let opentype.js produce the paths. It is pinned at 1.3.4, which is the version whose parsing and path output I trust here.
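
The hand-off between the two libraries is mostly arithmetic: walk the shaped glyphs, accumulate advances, and convert font units to SVG coordinates. Here is a minimal sketch of that step, assuming a shaper output with per-glyph advances and offsets in font units (the field names mirror the JSON that harfbuzzjs buffers produce, but treat the exact shape as an assumption). `placeGlyphs` and `PlacedGlyph` are hypothetical names.

```typescript
// One shaped glyph, in font units with Y pointing up (assumed shape).
interface ShapedGlyph {
  g: number;  // glyph id
  cl: number; // cluster: index back into the source text
  ax: number; // x advance
  ay: number; // y advance
  dx: number; // x offset
  dy: number; // y offset
}

// Where to draw a glyph, in SVG user units with Y pointing down.
interface PlacedGlyph {
  glyphId: number;
  x: number;
  y: number;
}

export function placeGlyphs(
  glyphs: ShapedGlyph[],
  originX: number,
  baselineY: number,
  fontSize: number,
  unitsPerEm: number,
): PlacedGlyph[] {
  const scale = fontSize / unitsPerEm;
  let penX = 0; // pen position in font units
  let penY = 0;
  const placed: PlacedGlyph[] = [];
  for (const glyph of glyphs) {
    placed.push({
      glyphId: glyph.g,
      x: originX + (penX + glyph.dx) * scale,
      // Font space is Y-up and SVG is Y-down, so vertical movement is subtracted.
      y: baselineY - (penY + glyph.dy) * scale,
    });
    penX += glyph.ax;
    penY += glyph.ay;
  }
  return placed;
}
```

Each placed glyph can then go to opentype.js, for example `font.glyphs.get(glyphId).getPath(x, y, fontSize)`, which takes a baseline position in Y-down coordinates. The only flip left to handle is the sign on the shaper's vertical offsets.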

If HarfBuzz fails to load for some reason, there is a fallback that outlines straight from opentype.js without proper shaping. It is worse for complex scripts, but it beats a broken export.

The result is that an uploaded font survives into the SVG, the PNG, and the PDF as exact shapes, and the file matches what you saw in the browser. That sounds small written down. It was the difference between the tool being usable for conlang scripts and not.

Odds and ends

The same SVG path also feeds a server-rendered preview image for social cards, using resvg. And there is a small HTTP API plus an MCP server, so an agent can generate a diagram from a phrase and get a link back. I hand-rolled the MCP side as plain JSON-RPC instead of pulling in the SDK, because it is one stateless tool and the SDK's transport assumed a Node request and response shape that SvelteKit does not hand me. That is probably its own post.
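
For a sense of how small a hand-rolled, stateless JSON-RPC handler can be, here is a sketch. The method names (`initialize`, `tools/list`, `tools/call`) and the error codes follow the MCP and JSON-RPC 2.0 specs; the tool name, its schema, the server name, and the `buildLink` placeholder are all invented for illustration and are not the tool's real API.

```typescript
type RpcId = string | number | null;

interface RpcRequest {
  jsonrpc: '2.0';
  id?: RpcId; // absent on notifications
  method: string;
  params?: Record<string, unknown>;
}

interface RpcResponse {
  jsonrpc: '2.0';
  id: RpcId;
  result?: unknown;
  error?: { code: number; message: string };
}

// Hypothetical tool definition; name and schema are placeholders.
const TOOL = {
  name: 'create_alignment',
  description: 'Build an alignment diagram from a phrase and its translation.',
  inputSchema: {
    type: 'object',
    properties: { source: { type: 'string' }, target: { type: 'string' } },
    required: ['source', 'target'],
  },
};

// Placeholder link builder; a real server would call its own state encoder here.
function buildLink(source: string, target: string): string {
  return `https://example.test/?d=${encodeURIComponent(JSON.stringify({ source, target }))}`;
}

export function handleRpc(req: RpcRequest): RpcResponse | null {
  // Notifications carry no id and get no response.
  if (req.id === undefined) return null;
  const id = req.id;
  const ok = (result: unknown): RpcResponse => ({ jsonrpc: '2.0', id, result });
  const fail = (code: number, message: string): RpcResponse => ({
    jsonrpc: '2.0',
    id,
    error: { code, message },
  });

  switch (req.method) {
    case 'initialize':
      return ok({
        protocolVersion: '2024-11-05',
        capabilities: { tools: {} },
        serverInfo: { name: 'aligner-sketch', version: '0.0.0' },
      });
    case 'tools/list':
      return ok({ tools: [TOOL] });
    case 'tools/call': {
      const params = req.params ?? {};
      const args = (params['arguments'] ?? {}) as Record<string, unknown>;
      const source = args['source'];
      const target = args['target'];
      if (params['name'] !== TOOL.name || typeof source !== 'string' || typeof target !== 'string') {
        return fail(-32602, 'Invalid params');
      }
      return ok({ content: [{ type: 'text', text: buildLink(source, target) }] });
    }
    default:
      return fail(-32601, 'Method not found');
  }
}
```

Since the handler is a pure function from request object to response object, it can sit behind any route that parses a JSON body, with no dependency on a particular request or response class.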

Where it is

The tool is at aligner.tinygods.dev if you want to poke at it. It is free and there is nothing to sign up for. Happy to answer questions about any of the above in the comments, the font part especially, since I could not find a clean writeup when I was stuck on it.