DEV Community

SEN LLC

Writing a Regex Parser and Tree Visualizer in ~400 Lines of Vanilla JS

A small tool that turns a regex like ^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$ into a color-coded nested tree you can actually read. Plus the old trick of letting the browser's RegExp do the heavy lifting.

Writing regex: manageable. Reading someone else's regex a month later: brutal. regex101 is great for testing behavior but doesn't really show you the shape of the pattern. regexper.com does, but it's starting to feel dated and doesn't quite fit my needs.

So I built a smaller version. Browser-only, zero dependencies, about 400 lines of JavaScript.

🔗 Live demo: https://sen.ltd/portfolio/regex-viz/
📦 GitHub: https://github.com/sen-ltd/regex-viz

The interesting part of the project is the "smart half-measure" at the center of it: parse the regex yourself for visualization, but execute it with the built-in RegExp. I'll walk through each piece.

Three files, three jobs

src/
├── parser.js     # regex string → AST
├── renderer.js   # AST → HTML
└── matcher.js    # built-in RegExp → match ranges

That's the whole engine. The rest is just DOM wiring and CSS.

Part 1: A recursive-descent parser

The subset of JavaScript's regex grammar that the tool handles is small enough that a plain recursive-descent parser covers it directly:

regex        = alternation
alternation  = sequence ('|' sequence)*
sequence     = quantified*
quantified   = atom quantifier?
quantifier   = ('?' | '*' | '+' | '{n[,m]}') '?'?
atom         = literal | charClass | group | anchor | predefined | escape
group        = '(' ('?:' | '?<name>')? alternation ')'
charClass    = '[' '^'? char* ']'

Each rule becomes one function in parser.js:

function parseAlternation(state) {
  const branches = [parseSequence(state)]
  while (peek(state) === '|') {
    state.pos++
    branches.push(parseSequence(state))
  }
  if (branches.length === 1) return branches[0]
  return { type: 'alternation', branches }
}

function parseSequence(state) {
  const children = []
  while (state.pos < state.input.length && peek(state) !== '|' && peek(state) !== ')') {
    children.push(parseQuantified(state))
  }
  if (children.length === 1) return children[0]
  return { type: 'sequence', children }
}

The detail I like most: if a sequence has exactly one child, return the child directly instead of wrapping it in a sequence node. Same for alternation. The AST stays small, and the renderer never has to special-case "degenerate" nodes with one child.
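To see the collapsing rule in action, here is a minimal runnable sketch. The two parse functions are the ones shown above; peek, the stand-in parseQuantified (literals and plain groups only, no quantifiers), the parse wrapper, and the node fields value and body are assumptions made for illustration, not the project's actual code.

```javascript
// Assumed helper: returns the current character, or undefined past the end.
function peek(state) {
  return state.input[state.pos]
}

// Stand-in for the real parseQuantified: handles literals and plain groups only.
function parseQuantified(state) {
  if (peek(state) === '(') {
    state.pos++ // consume (
    const body = parseAlternation(state)
    if (peek(state) !== ')') throw new Error('expected ) at position ' + state.pos)
    state.pos++ // consume )
    return { type: 'group', body }
  }
  return { type: 'literal', value: state.input[state.pos++] }
}

function parseAlternation(state) {
  const branches = [parseSequence(state)]
  while (peek(state) === '|') {
    state.pos++
    branches.push(parseSequence(state))
  }
  if (branches.length === 1) return branches[0] // collapse single branch
  return { type: 'alternation', branches }
}

function parseSequence(state) {
  const children = []
  while (state.pos < state.input.length && peek(state) !== '|' && peek(state) !== ')') {
    children.push(parseQuantified(state))
  }
  if (children.length === 1) return children[0] // collapse single child
  return { type: 'sequence', children }
}

// Hypothetical entry point.
function parse(input) {
  return parseAlternation({ input, pos: 0 })
}

console.log(JSON.stringify(parse('a'))) // {"type":"literal","value":"a"}
console.log(parse('ab|c').type)         // alternation
```

A single character comes back as a bare literal node, with no sequence or alternation wrapper around it, which is exactly what keeps the renderer free of one-child special cases.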

Character class parsing was the only slightly annoying bit: - is a literal in some positions and a range separator in others, escapes work differently inside [], and [^] matches any character even though it looks like a syntax error. The rule I settled on: treat - as a range separator only when the character after it is not ]. Works for every realistic case.

if (state.input[state.pos] === '-' && state.input[state.pos + 1] !== ']') {
  state.pos++ // consume -
  to = state.input[state.pos]
  // ...
}
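For context, here is how that check might sit inside a complete character-class loop. This is a sketch under assumptions: the function name parseCharClass, its (input, start) signature, and the node shape { type, negated, items } are all hypothetical, and escapes are kept as raw two-character strings rather than decoded.

```javascript
// Sketch: parses a character class body. `start` points just after the '['.
// Returns the node plus the index right after the closing ']'.
function parseCharClass(input, start) {
  let pos = start
  let negated = false
  if (input[pos] === '^') {
    negated = true
    pos++
  }
  const items = []
  while (pos < input.length && input[pos] !== ']') {
    let from = input[pos++]
    if (from === '\\') from += input[pos++] // keep escapes such as \d or \] as one unit
    // '-' separates a range only when something other than ']' follows it
    if (input[pos] === '-' && pos + 1 < input.length && input[pos + 1] !== ']') {
      pos++ // consume -
      let to = input[pos++]
      if (to === '\\') to += input[pos++]
      items.push({ type: 'range', from, to })
    } else {
      items.push({ type: 'char', value: from })
    }
  }
  if (input[pos] !== ']') throw new Error('unterminated character class')
  return { node: { type: 'charClass', negated, items }, end: pos + 1 }
}

// [a-z_-] → one range (a to z) followed by the literals _ and -
console.log(parseCharClass('[a-z_-]', 1).node.items.length) // 3
```

A leading - falls out of the same logic: it is read as `from`, and since the next character is not another -, it is stored as a plain character. [^] yields a negated class with no items, which reads naturally as "anything".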

Part 2: Don't do layout math, let CSS handle it

I almost reached for SVG. It's the "right" medium for a diagram. But then I realized: I'd have to compute the bounding box of every node, lay out children left-to-right or top-to-bottom, position them absolutely, and handle wrapping on narrow screens. That's a lot of code.

Instead, the renderer emits nested <div>s with class names, and CSS does the layout:

case 'sequence': {
  const inner = node.children.map(renderNode).join('')
  return `<div class="rv-node rv-sequence">
    <div class="rv-sequence-row">${inner}</div>
  </div>`
}
case 'alternation': {
  const parts = []
  for (let i = 0; i < node.branches.length; i++) {
    if (i > 0) parts.push('<div class="rv-or-label">OR</div>')
    parts.push(renderNode(node.branches[i]))
  }
  return `<div class="rv-node rv-alternation">
    <div class="rv-alternation-col">${parts.join('')}</div>
  </div>`
}

And the CSS:

.rv-sequence-row { display: flex; flex-direction: row; flex-wrap: wrap; gap: 4px; }
.rv-alternation-col { display: flex; flex-direction: column; gap: 6px; }
.rv-group { border: 1px solid var(--group-color); border-radius: 8px; }

Flexbox handles sequence-as-row, alternation-as-column, nesting, and wrapping on narrow viewports. Zero layout math in JavaScript.

Node coloring is done with border-left: 3px solid var(--literal-color) and different CSS variables per type. Swapping the theme later is one CSS file change.
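The excerpt above only shows the two container cases. As a rough sketch of what leaf and group cases could look like, assuming class names such as rv-literal and rv-group-label and node fields value, name, and body (none of which are confirmed by the excerpt), the key point is that pattern text has to be HTML-escaped before it goes into the markup:

```javascript
// Assumed helper: the article calls escapeHtml elsewhere but does not show it.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Hypothetical leaf and group cases; the per-type class is what CSS hooks its color onto.
function renderNode(node) {
  switch (node.type) {
    case 'literal':
      return `<div class="rv-node rv-literal">${escapeHtml(node.value)}</div>`
    case 'group': {
      const label = node.name
        ? `<div class="rv-group-label">${escapeHtml(node.name)}</div>`
        : ''
      return `<div class="rv-node rv-group">${label}${renderNode(node.body)}</div>`
    }
    default:
      throw new Error('unhandled node type: ' + node.type)
  }
}

console.log(renderNode({ type: 'literal', value: '<' }))
// <div class="rv-node rv-literal">&lt;</div>
```

Because the type only ever shows up as a class name, the JavaScript never needs to know which color a node gets.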

Part 3: Let the browser execute it

This is where the project gets cheap in the best way. The matcher is this short:

export function matchAll(pattern, flags, testString) {
  const effectiveFlags = flags.includes('g') ? flags : flags + 'g'
  let re
  try {
    re = new RegExp(pattern, effectiveFlags)
  } catch (e) {
    return { ok: false, error: e.message }
  }
  const results = []
  for (const m of testString.matchAll(re)) {
    const start = m.index
    const text = m[0]
    if (text.length === 0) continue  // zero-width match → skip
    results.push({ start, end: start + text.length, text })
  }
  return { ok: true, matches: results }
}

Parsing the regex myself means I can show its structure. Executing it myself would mean writing an NFA/DFA engine, handling backtracking, and almost certainly being slower and buggier than V8's. The two jobs are independent, so I just... don't do the second one.

Two subtleties worth noting:

  1. Force the g flag before calling matchAll. Without it, String.prototype.matchAll throws a TypeError. Users don't always remember to include it when their only goal is "highlight every match".
  2. Skip zero-width matches. Patterns like ^, $, (?=foo) match without consuming any characters. matchAll steps past empty matches on its own, so it won't hang the way a hand-written exec loop can, but an empty match has nothing to highlight and would only produce an empty <mark>. continue-ing on text.length === 0 is the simplest fix.
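Both points are easy to check in a console. The sketch below is illustrative only: execAll is a hypothetical helper, not part of the project, included to show what a hand-rolled exec loop needs in order to cope with zero-width matches.

```javascript
// 1. matchAll rejects a regex without the g flag.
let threwWithoutG = false
try {
  'ab'.matchAll(/a/)
} catch (e) {
  threwWithoutG = e instanceof TypeError
}
console.log(threwWithoutG) // true

// 2. matchAll advances past empty matches by itself.
const lookaheadHits = [...'abab'.matchAll(/(?=b)/g)].map(m => m.index)
console.log(lookaheadHits) // [ 1, 3 ]

// Hypothetical manual loop: here the lastIndex bump is our job.
function execAll(re, str) {
  if (!re.global) throw new TypeError('execAll needs the g flag')
  const out = []
  let m
  while ((m = re.exec(str)) !== null) {
    out.push({ index: m.index, text: m[0] })
    if (m[0].length === 0) re.lastIndex++ // without this, an empty match repeats forever
  }
  return out
}

console.log(execAll(/(?=b)/g, 'abab').map(m => m.index)) // [ 1, 3 ]
```

Using matchAll means the lastIndex bookkeeping stays inside the engine, which is one more piece of code the matcher does not have to own.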

Highlighting matches

Given the match ranges, producing highlighted HTML takes a single pass over the test string:

export function highlight(testString, matches) {
  if (!matches || matches.length === 0) return escapeHtml(testString)
  const out = []
  let pos = 0
  for (const m of matches) {
    if (m.start > pos) out.push(escapeHtml(testString.slice(pos, m.start)))
    out.push(`<mark>${escapeHtml(testString.slice(m.start, m.end))}</mark>`)
    pos = m.end
  }
  if (pos < testString.length) out.push(escapeHtml(testString.slice(pos)))
  return out.join('')
}

The assumption here is that matches are non-overlapping and sorted by position, which matchAll gives us for free.
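If match ranges ever came from somewhere other than matchAll, that assumption could be checked up front. The guard below is a hypothetical addition, not something the project contains; assertHighlightable and its error messages are made up for illustration.

```javascript
// Hypothetical guard: verifies the preconditions highlight() relies on.
function assertHighlightable(matches, length) {
  let prevEnd = 0
  for (const m of matches) {
    if (m.start < prevEnd) throw new RangeError(`match at ${m.start} overlaps or is out of order`)
    if (m.end <= m.start) throw new RangeError(`match at ${m.start} is empty or inverted`)
    if (m.end > length) throw new RangeError(`match at ${m.start} runs past the input`)
    prevEnd = m.end
  }
  return matches
}

// Ranges built the same way the matcher builds them.
const sample = 'a1 b22 c333'
const ranges = [...sample.matchAll(/\d+/g)].map(m => ({
  start: m.index,
  end: m.index + m[0].length,
  text: m[0],
}))

assertHighlightable(ranges, sample.length) // passes silently
console.log(ranges.map(r => r.text)) // [ '1', '22', '333' ]
```

For ranges produced by matchAll the guard never fires, so leaving it out of the real code is a reasonable trade.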

Tests

51 cases run with node --test, grouped into three files for parser / renderer / matcher. No third-party test framework, no dependencies; Node's built-in runner is enough. A lot of the parser tests are one-liners:

test('quantifier {3,5}', () => {
  const ast = okAst('a{3,5}')
  assert.equal(ast.min, 3)
  assert.equal(ast.max, 5)
})

The renderer tests check that the right CSS classes appear in the output HTML (smoke-level, not pixel-perfect). That's enough: if the classes are right, CSS does the rest.

What it doesn't handle

  • Lookahead / lookbehind ((?=...), (?!...), (?<=...), (?<!...))
  • Unicode property escapes (\p{Letter})
  • Backreferences (\1, \2)

All three still run through the RegExp matcher, so matching and highlighting work; they just aren't represented in the tree. Parsing each one would require a small extension to parseAtom/parseEscape. On my list if I get requests.
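To give a sense of how small those extensions could be, here is a rough sketch covering lookarounds and numeric backreferences. Everything in it is hypothetical: the helper names readGroupPrefix and readBackreference, the kind labels, and the skip and length fields are invented for illustration, group names are limited to ASCII identifiers, and Unicode property escapes and \k<name> are left out.

```javascript
// Hypothetical: classifies the syntax that opens a group. `pos` points at '('.
// `skip` is how many characters the opener occupies, including the '('.
function readGroupPrefix(input, pos) {
  const rest = input.slice(pos + 1)
  if (rest.startsWith('?:')) return { kind: 'non-capturing', skip: 3 }
  if (rest.startsWith('?=')) return { kind: 'lookahead', negated: false, skip: 3 }
  if (rest.startsWith('?!')) return { kind: 'lookahead', negated: true, skip: 3 }
  if (rest.startsWith('?<=')) return { kind: 'lookbehind', negated: false, skip: 4 }
  if (rest.startsWith('?<!')) return { kind: 'lookbehind', negated: true, skip: 4 }
  const named = /^\?<([A-Za-z_$][\w$]*)>/.exec(rest) // ASCII names only (simplification)
  if (named) return { kind: 'named', name: named[1], skip: named[0].length + 1 }
  return { kind: 'capturing', skip: 1 }
}

// Hypothetical: recognizes \1, \2, ... at `pos`. \0 is a NUL escape, not a backreference.
function readBackreference(input, pos) {
  const m = /^\\([1-9]\d*)/.exec(input.slice(pos))
  return m ? { type: 'backreference', index: Number(m[1]), length: m[0].length } : null
}

console.log(readGroupPrefix('(?<year>\\d{4})', 0)) // { kind: 'named', name: 'year', skip: 8 }
console.log(readGroupPrefix('(?<=a)b', 0).kind)     // lookbehind
console.log(readBackreference('\\2', 0).index)      // 2
```

The lookbehind checks run before the named-group check so that (?<= and (?<! are never mistaken for the start of a group name.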

Why rebuild a thing that already exists

Same answer as every portfolio project: fitting the whole codebase in my head is worth more than feature completeness. 400 lines for a regex visualizer means I can reason about every edge case, explain the whole thing in a blog post, and trust that nothing surprising is happening. Every extra abstraction I skip is one less thing a reader has to unpack.

Closing

This is entry #3 in a 100+ portfolio series by SEN LLC.

Feedback, bug reports, and nasty regexes that break it are all welcome.