SEN LLC

Posted on May 15

Reimplementing path-to-regexp in 100 Lines — Why /users/:id? Almost Never Works the Way You Expect

#javascript #express #regex #webdev

app.get('/users/:id', ...) is one of those one-liners every Node developer types a hundred times before wondering what it actually does. The answer: Express hands the string to path-to-regexp, which compiles it to a regex along the lines of /^\/users\/([^/]+)$/. A simplified version of that pipeline fits in 100 lines of vanilla JS. Reimplementing it surfaces the subtle bugs you've probably hit at least once: the :id? modifier that fails to absorb the leading slash, the regex-meta character that wasn't escaped, the inline regex with an unbalanced paren. The result is a browser-only Express route tester with 23 unit tests pinning the boundaries.

[Screenshot: regex-route UI, dark theme. The pattern input shows `/users/:id(\d+)` with a small status line below it.]

🌐 Demo: https://sen.ltd/portfolio/regex-route/
📦 GitHub: https://github.com/sen-ltd/regex-route

Why rewrite path-to-regexp

The standard answer is "you don't need to, just use Express." That's correct until the day you hit one of these:

/users/:id matches /users/42/extra in your mental model, but it actually doesn't — :id is exactly one segment.
/users/:id? doesn't match /users in your test, because you forgot that the optional ? modifier has to absorb the leading slash too.
You upgraded Express 4 → 5, which moved from path-to-regexp 0.1.x to 8.x, and now some of your route patterns parse differently.

Walking through the parser by hand makes all three much easier to reason about. And the parser is small enough to actually finish.

A 4-state direct-style parser

The supported grammar:

Syntax	Meaning
`/foo/bar`	Static segments; regex meta chars are escaped.
`/users/:id`	Named param; matches one non-slash segment.
`/users/:id?`	Named param, optional. The leading `/` is absorbed too.
`/users/:id(\d+)`	Named param constrained to an inline regex.
`/files/*`	Wildcard; captures the rest of the path.
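
To make the table concrete, here is a hand-written sketch of the regex each row might compile to. The regexes for `/users/:id` and `/users/:id?` are the ones quoted elsewhere in this post; the other three are my assumption of what the compiler produces, and the `examples` array and `capture` helper are names invented for this illustration.

```javascript
// Hand-written regexes for each row of the grammar table.
// Illustrative only: the real compiler output may differ in detail.
const examples = [
  { pattern: "/foo/bar",         regex: /^\/foo\/bar$/ },
  { pattern: "/users/:id",       regex: /^\/users\/([^/]+)$/ },
  { pattern: "/users/:id?",      regex: /^\/users(?:\/([^/]+))?$/ },
  { pattern: "/users/:id(\\d+)", regex: /^\/users\/(\d+)$/ },
  { pattern: "/files/*",         regex: /^\/files\/(.*)$/ },
];

// Return the list of captured groups, or null when the path does not match.
function capture(regex, path) {
  const m = regex.exec(path);
  return m === null ? null : m.slice(1);
}

console.log(capture(examples[1].regex, "/users/42"));       // one capture: "42"
console.log(capture(examples[1].regex, "/users/42/extra")); // null: :id is one segment
console.log(capture(examples[3].regex, "/users/abc"));      // null: \d+ rejects letters
console.log(capture(examples[4].regex, "/files/a/b.txt"));  // one capture: "a/b.txt"
```

The second call is the first surprise from the list above: `[^/]+` stops at the next slash, so a single named param never spans two segments.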

Walking the pattern character-by-character:

while (i < pattern.length) {
  const ch = pattern[i];

  if (ch === ":") {
    // Named param: ':name', ':name(regex)', ':name?', or combinations.
    // ...
  } else if (ch === "*") {
    keys.push({ name: "wild", modifier: "*", custom: null });
    re += "(.*)";
    i++;
  } else if (REGEX_META.includes(ch)) {
    // Static text containing a regex meta char — escape it so the
    // generated regex still matches literally.
    re += "\\" + ch;
    i++;
  } else if (ch === "/") {
    re += "\\/";    // optional for engine, but makes the printed regex copy/pasteable
    i++;
  } else {
    re += ch;
    i++;
  }
}

Two points worth calling out:

Always escape regex meta chars in static text. /foo.bar without escaping the . would silently match /fooXbar. This kind of bug usually shows up months later, when an unexpected URL hits an unexpected handler.
Always write \/, not /, in the generated regex. It's identical as far as the engine is concerned, but it lets me print the regex to the UI status line and have it be copy-pasteable into a fresh new RegExp(...) call or a /.../ literal.
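
A minimal sketch of the first point, assuming a particular set of meta characters (the post does not list the contents of REGEX_META, so the string below is my guess) and a hypothetical `escapeStatic` helper that applies the same escaping rules as the loop above to a purely static fragment:

```javascript
// Assumed meta-character set; the post's actual REGEX_META is not shown.
const REGEX_META = ".+*?^$()[]{}|\\";

// Escape a static path fragment so it matches literally.
// Slashes are escaped too, purely so the printed regex is copy-pasteable.
function escapeStatic(text) {
  let out = "";
  for (const ch of text) {
    if (REGEX_META.includes(ch)) out += "\\" + ch;
    else if (ch === "/") out += "\\/";
    else out += ch;
  }
  return out;
}

const unescaped = new RegExp("^/foo.bar$");
const escaped = new RegExp("^" + escapeStatic("/foo.bar") + "$");

console.log(unescaped.test("/fooXbar")); // true: the dot matches any character
console.log(escaped.test("/fooXbar"));   // false: the dot is now literal
console.log(escaped.test("/foo.bar"));   // true
```

Note that this helper escapes `*` as well, which is only appropriate for text already known to be static; in the parser loop the wildcard branch runs before the meta-character branch.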

The `:id?` optional-segment trap

The naïve implementation makes /users/:id? into ^\/users\/([^/]+)?$. That matches /users/ (with trailing slash) but not /users. Almost certainly not what the user wanted.

The fix: when you see the ? modifier, walk back and absorb the preceding \/ into the optional group:

if (modifier === "?") {
  if (re.endsWith("\\/")) {
    re = re.slice(0, -2) + `(?:\\/(${seg}))?`;
  } else {
    re += `(${seg})?`;
  }
}

Now /users/:id? becomes ^\/users(?:\/([^/]+))?$, which correctly matches both /users and /users/42. Pinned in the tests:

test("compilePath: :param? makes the segment + leading slash optional", () => {
  const c = compilePath("/users/:id?");
  assert.equal(matchPath(c, "/users").params.id, null);
  assert.equal(matchPath(c, "/users/42").params.id, "42");
});

This single mistake likely accounts for a good share of the "why isn't my optional route working" Stack Overflow questions about Express.
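
To see the difference side by side, the sketch below builds both regexes and runs them against the three interesting paths. `appendOptional` is a hypothetical wrapper around the snippet shown earlier in this section, not a function from the repository.

```javascript
// Hypothetical wrapper around the '?' handling shown above.
// 're' is the regex source built so far, 'seg' the pattern for one segment.
function appendOptional(re, seg) {
  if (re.endsWith("\\/")) {
    // Pull the preceding slash into the optional group.
    return re.slice(0, -2) + `(?:\\/(${seg}))?`;
  }
  return re + `(${seg})?`;
}

const naive = new RegExp("^\\/users\\/" + "([^/]+)?" + "$");
const fixed = new RegExp(appendOptional("^\\/users\\/", "[^/]+") + "$");

for (const path of ["/users", "/users/", "/users/42"]) {
  console.log(path, "naive:", naive.test(path), "fixed:", fixed.test(path));
}
// naive accepts "/users/" and "/users/42" but rejects "/users".
// fixed accepts "/users" and "/users/42" but rejects "/users/".
```

The fixed form trades one edge for another: a bare trailing slash no longer matches. Whether `/users/` should match is a separate decision that this sketch does not address.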

Inline regex with paren balance

:id(\d+) is a parameter constrained to a custom regex. Naïve indexOf(")") breaks the moment someone writes a nested group like :date((\d{4})-(\d{2})). Use a depth counter and respect backslash escapes:

export function findMatchingParen(s, start) {
  if (s[start] !== "(") return -1;
  let depth = 0;
  for (let i = start; i < s.length; i++) {
    if (s[i] === "\\") { i++; continue; }   // skip the escaped char
    if (s[i] === "(") depth++;
    else if (s[i] === ")") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

The if (s[i] === "\\") { i++; continue; } line is the one that gets forgotten: without it, :id(\)) would be parsed as ending at the escaped \) instead of at the real closing paren. Tested:

test("findMatchingParen respects backslash escapes", () => {
  // (\)) — string length 4. Inner \) is escaped; the trailing ) closes.
  assert.equal(findMatchingParen("(\\))", 0), 3);
});
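
Here is one way the named-param branch of the parser could use the paren matcher. The post elides that branch, so `readParam`, its return shape, and the set of characters allowed in a name are all assumptions of mine; `findMatchingParen` is copied from above so the sketch runs on its own.

```javascript
// Copied from the post so this block is self-contained.
function findMatchingParen(s, start) {
  if (s[start] !== "(") return -1;
  let depth = 0;
  for (let i = start; i < s.length; i++) {
    if (s[i] === "\\") { i++; continue; } // skip the escaped char
    if (s[i] === "(") depth++;
    else if (s[i] === ")") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

// Hypothetical helper: read ':name', ':name(regex)' and a trailing '?'
// starting at pattern[i] === ':'. Returns where parsing should resume.
function readParam(pattern, i) {
  let j = i + 1;
  while (j < pattern.length && /[A-Za-z0-9_]/.test(pattern[j])) j++;
  const name = pattern.slice(i + 1, j);
  if (name === "") throw new Error(`missing parameter name at ${i}`);

  let custom = null;
  if (pattern[j] === "(") {
    const close = findMatchingParen(pattern, j);
    if (close === -1) throw new Error(`unbalanced paren at ${j}`);
    custom = pattern.slice(j + 1, close); // text between the outer parens
    j = close + 1;
  }

  let modifier = "";
  if (pattern[j] === "?") { modifier = "?"; j++; }
  return { name, custom, modifier, next: j };
}

const nested = "/log/:date((\\d{4})-(\\d{2}))";
const open = nested.indexOf("(");
console.log(nested.slice(open + 1, nested.indexOf(")"))); // naive: cut short at the first ')'
console.log(readParam(nested, 5).custom);                 // depth-aware: the full inner regex
```

The naive slice stops inside the first nested group, while the depth counter returns everything between the outermost pair.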

Stripping query and hash before matching

/users/42?include=author should match /users/:id because it does in Express. But path-to-regexp itself doesn't strip the query; Express does that for you by matching routes against the parsed pathname only. For a standalone tester, we have to do the stripping ourselves before running the match:

const hashPos = url.indexOf("#");
const noHash = hashPos === -1 ? url : url.slice(0, hashPos);
const qPos = noHash.indexOf("?");
const pathPart = qPos === -1 ? noHash : noHash.slice(0, qPos);
const query = qPos === -1 ? "" : noHash.slice(qPos + 1);

The stripped query (stored without its leading ?) goes into the result object so the UI can show it on a separate line, e.g. "query: ?include=author", without affecting whether the match succeeded.
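
Wrapped into a function, the slicing above could look like this. `splitUrl` is a hypothetical name for this sketch; the field names follow the result shape listed in the API section.

```javascript
// Hypothetical wrapper around the slicing shown above.
// The fragment is removed first, so a '?' inside it is never
// mistaken for the start of the query string.
function splitUrl(url) {
  const hashPos = url.indexOf("#");
  const noHash = hashPos === -1 ? url : url.slice(0, hashPos);
  const qPos = noHash.indexOf("?");
  return {
    pathPart: qPos === -1 ? noHash : noHash.slice(0, qPos),
    query: qPos === -1 ? "" : noHash.slice(qPos + 1), // no leading '?'
  };
}

console.log(splitUrl("/users/42?include=author#bio"));
// pathPart is "/users/42", query is "include=author"
console.log(splitUrl("/users/42#a?b"));
// the '?' sits inside the fragment, so query is ""
```

The order matters: stripping the query before the hash would treat `a?b` in the second example as a query.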

URL-decoding captured values, safely

/users/%E5%B1%B1%E7%94%B0 is the percent-encoding for /users/山田. The captured group is the raw %E5%B1%B1%E7%94%B0, and the user wants to see the kanji. Run it through decodeURIComponent — but guard against malformed input:

try {
  params[key.name] = decodeURIComponent(raw);
} catch {
  // Lone '%' or other malformed percent-encoding raises URIError.
  // Don't fail the match; surface the raw value instead.
  params[key.name] = raw;
}

decodeURIComponent throws URIError on invalid sequences. Without the catch, anyone who pastes /users/foo%bar into the URL field would see the entire results table blow up.

test("matchPath: malformed percent-encoding returns the raw value", () => {
  const c = compilePath("/users/:name");
  const r = matchPath(c, "/users/foo%bar");
  assert.ok(r !== null);
  // decodeURIComponent("foo%bar") throws URIError, so the raw capture comes back unchanged.
  assert.equal(r.params.name, "foo%bar");
});
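
The same guard, pulled out as a standalone function for illustration. `safeDecode` is a name I made up for this sketch; the post applies the try/catch inline.

```javascript
// Decode a captured path segment, falling back to the raw text
// when the percent-encoding is malformed.
function safeDecode(raw) {
  try {
    return decodeURIComponent(raw);
  } catch {
    return raw;
  }
}

console.log(safeDecode("%E5%B1%B1%E7%94%B0")); // 山田
console.log(safeDecode("foo%bar"));            // foo%bar (%ba is not valid UTF-8 on its own)
console.log(safeDecode("100%"));               // 100% (lone percent sign)
```

Note that `foo%bar` fails even though `%ba` looks like a well-formed escape: the byte it encodes cannot start a UTF-8 sequence, so `decodeURIComponent` rejects it.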

The full API in five exports

// route.js (~100 lines)
export class CompileError extends Error { /* with pos */ }
export function findMatchingParen(s, start) { /* depth-tracking */ }
export function compilePath(pattern) { /* → {regex, keys, source, generated} */ }
export function matchPath(compiled, url) { /* → {url, pathPart, query, params} | null */ }
export function testRoute(pattern, url) { /* compile + match shortcut, no throws */ }

script.js wires the DOM: pattern input → debounce 80 ms → compile → re-render the results table. URL textarea → split on newline → match each line. The output is a side-by-side educational table that makes the route → regex mapping obvious.

TL;DR

The static parts of route patterns need regex-meta escaping. /foo.bar matching /fooXbar is a classic silent bug.
:id? has to absorb the leading slash into the optional group, otherwise /users won't match.
Inline-regex paren balance needs a depth counter and backslash-escape handling.
Strip ?query and #fragment before matching; surface the query separately.
Always wrap decodeURIComponent in a try/catch — malformed percent-encoding throws.

Source: https://github.com/sen-ltd/regex-route — MIT, ~350 lines of JS, 23 unit tests, no build step, zero runtime dependencies.

🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.
