SEN LLC

Posted on Jun 15

Explaining SQL in Plain English with a Recursive-Descent Parser — and Why Logical Execution Order Is the Real Lesson

#javascript #sql #parsing #webdev

"What does this SQL actually do?" — you could paste it into an LLM, but then it won't work offline, it ships your schema to a third party, and it drifts between runs. I wrote a deterministic explainer instead: a hand-written tokenizer + recursive-descent parser for a subset of SQL SELECT, turning the AST into a plain-language, step-by-step explanation (Japanese or English). Two implementation hinges: (1) expressing operator precedence — AND binds tighter than OR — through the recursive-descent call hierarchy, and (2) narrating the explanation in logical execution order (FROM → WHERE → GROUP BY → SELECT), not written order. The second is what makes SQL's "weird rules" finally click.

🌐 Demo: https://sen.ltd/portfolio/sql-explainer/
📦 GitHub: https://github.com/sen-ltd/sql-explainer

Why a hand-written parser

To merely explain a query you could paste it into ChatGPT. But that means no offline use, sending possibly-sensitive schemas to a third party, and non-deterministic output. A small deterministic parser fixes all three — and for a SELECT subset, it's about 200 lines.

Three layers

tokenizer.js → parser.js → explainer.js

tokenizer: source → token stream (keywords, identifiers, strings, numbers, operators)
parser: tokens → AST (recursive descent)
explainer: AST → ordered natural-language steps

Each layer is DOM-free, so all 35 tests run under Node.

Tokenizer: SQL lexing gotchas

Keywords get uppercased and tagged; identifiers keep their original case:

const word = sql.slice(i, j);
if (KEYWORDS.has(word.toUpperCase())) {
  tokens.push({ type: "kw", value: word.toUpperCase() });
} else {
  tokens.push({ type: "ident", value: word }); // preserve case
}

SQL-specific traps worth handling:

Escaped quotes in strings: 'O''Brien' → O'Brien (two single quotes = one literal quote)
Quoted identifiers: "select" and `from` are identifiers even though they're reserved words
Comments: -- line and /* */ block

if (sql[j] === "'" && sql[j + 1] === "'") { val += "'"; j += 2; continue; }

Parser: precedence as call hierarchy

The heart of recursive descent: operator precedence becomes the function call depth. AND binds tighter than OR, so OR is the outer level and AND the inner:

parseCondition() { return this.parseOr(); }

parseOr() {
  let left = this.parseAnd();
  while (this.isKw("OR")) {
    this.next();
    left = { type: "or", left, right: this.parseAnd() };
  }
  return left;
}

parseAnd() {
  let left = this.parseNot();
  while (this.isKw("AND")) {
    this.next();
    left = { type: "and", left, right: this.parseNot() };
  }
  return left;
}

parseNot() {
  if (this.isKw("NOT")) { this.next(); return { type: "not", operand: this.parseNot() }; }
  return this.parsePredicate();
}

parseOr calls parseAnd calls parseNot. That descending depth is the precedence. Verified:

test("AND binds tighter than OR", () => {
  const ast = parse("SELECT * FROM t WHERE a=1 OR b=2 AND c=3");
  // parses as a=1 OR (b=2 AND c=3)
  assert.equal(ast.where.type, "or");
  assert.equal(ast.where.right.type, "and");
});

The OR node sits on top with the AND nested in its right branch. Parentheses become explicit group nodes in parsePredicate, overriding precedence.

The predicate level also handles SQL's special comparisons:

parseComparison() {
  const left = this.parseExpr();
  if (this.isKw("IS")) { /* IS [NOT] NULL */ }
  let negated = this.isKw("NOT") ? (this.next(), true) : false;
  if (this.isKw("IN"))      { /* [NOT] IN (...) */ }
  if (this.isKw("LIKE"))    { /* [NOT] LIKE pattern */ }
  if (this.isKw("BETWEEN")) { /* [NOT] BETWEEN a AND b */ }
  // plain = < > <= >= <> !=
}

Note the NOT lookahead feeding NOT IN / NOT LIKE / NOT BETWEEN, while IS NOT NULL is handled separately (the NOT comes after IS).

Explainer: logical execution order

The headline. SQL is written in one order and executed in another:

written:  SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY → LIMIT
logical:  FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

SELECT runs third from last. The explainer narrates in this logical order:

test("full query yields all clauses in logical order", () => {
  const ast = parse(`SELECT country, COUNT(*) FROM users
    JOIN accounts ON users.id = accounts.user_id
    WHERE active = 1 GROUP BY country HAVING COUNT(*) > 10
    ORDER BY country LIMIT 5`);
  assert.deepEqual(explain(ast, "en").map((s) => s.clause),
    ["FROM", "JOIN", "WHERE", "GROUP BY", "HAVING", "SELECT", "ORDER BY", "LIMIT"]);
});

Why this is the real lesson: logical order explains SQL's "weird rules":

You can't use a SELECT alias in WHERE — WHERE (3) runs before SELECT (6), so the alias doesn't exist yet.
Filter aggregates with HAVING, not WHERE — WHERE (3) runs before grouping (4); the aggregate isn't computed yet.
You can use a SELECT alias in ORDER BY — ORDER BY (7) runs after SELECT (6).

The tool isn't really "explain this SQL" — it's "internalize SQL's execution model."

AST → prose

The condition tree is stringified recursively, branching on language because word order differs:

function condText(c, lang) {
  switch (c.type) {
    case "and": return lang === "ja"
      ? `(${condText(c.left, lang)}) かつ (${condText(c.right, lang)})`
      : `(${condText(c.left, lang)}) AND (${condText(c.right, lang)})`;
    case "compare": return lang === "ja"
      ? `${exprText(c.left, lang)} が ${exprText(c.right, lang)} ${OP_JA[c.op]}`
      : `${exprText(c.left, lang)} ${OP_EN[c.op]} ${exprText(c.right, lang)}`;
    // ...
  }
}

Aggregates become words too: COUNT(*) → "count", AVG(price) → "the average of price".

Errors: no silent partial parse

If parsing fails, throw ParseError. A half-interpreted query producing a wrong explanation is worse than an error:

test("missing FROM throws", () => assert.throws(() => parse("SELECT *"), ParseError));
test("trailing junk throws", () => assert.throws(() => parse("SELECT * FROM t garbage"), ParseError));
test("non-select throws", () => assert.throws(() => parse("DELETE FROM t"), ParseError));

The key is expectEnd() — verifying every token was consumed catches trailing garbage like SELECT * FROM t garbage.

Architecture

tokenizer.js ← SQL lexer
parser.js    ← recursive-descent SELECT parser → AST (DOM-free)
explainer.js ← AST → logical-order steps, ja/en (DOM-free)
app.js       ← UI glue

35 tests across all three layers.

Try it

Demo: https://sen.ltd/portfolio/sql-explainer/
GitHub: https://github.com/sen-ltd/sql-explainer

Pick the "multiple JOIN + compound condition" example and toggle Japanese/English. A nested condition like (u.active = 1 OR u.role = 'admin') AND p.published IS NOT NULL becomes prose while preserving the parenthesized precedence.

Takeaways

You don't need an LLM to explain SQL. A tokenizer + recursive-descent parser for a SELECT subset is ~200 lines.
Operator precedence = call hierarchy. The depth of parseOr → parseAnd → parseNot is the binding strength.
Narrate in logical execution order (FROM → WHERE → GROUP BY → SELECT → ORDER BY) and SQL's weird rules become obvious.
Don't silent-partial-parse. Throw, and use expectEnd() to catch trailing junk.
Separate lexer / parser / semantics so each layer stays DOM-free and testable.

This is OSS portfolio #263 from SEN LLC (Tokyo). https://sen.ltd/portfolio/

DEV Community