SEN LLC

Parsing Prisma Schema By Hand: A 300-Line CLI That Emits Mermaid ER Diagrams

Prisma has no official ER diagram generator. The popular third-party one pulls Puppeteer to rasterise a PNG. I wanted a diagram I could paste into a README. So I wrote a recursive descent parser for schema.prisma and had it emit Mermaid text. Here's what I learned about why Prisma's schema language is pleasantly parser-friendly — and how the whole thing fits in a few hundred lines of TypeScript with zero runtime dependencies.

📦 GitHub: https://github.com/sen-ltd/prisma-erd

*(prisma-erd terminal screenshot)*

The problem

Prisma is the TypeScript ecosystem's default "you already have a schema, now what" tool. Its schema.prisma file defines models, scalar fields, enums, and relations in a small domain-specific language. You'd expect an ER diagram generator in the box. There isn't one.

The well-known community plugin, prisma-erd-generator, solves this by booting Puppeteer, asking Mermaid to render, and writing out a PNG. That works. But it's a lot of machinery for what should be a text transformation, and it's painful in CI (apt-get install chromium, cached Docker layers, the usual story). More importantly: the output is a raster image. You can't diff it, you can't grep it, and you can't paste it into a README.

GitHub, GitLab, dev.to, Obsidian, Notion, and any modern Markdown renderer all speak Mermaid natively. If I hand GitHub this:

```mermaid
erDiagram
  User ||--o{ Post : author
```

GitHub renders the diagram inline. No image host, no PNG, no Puppeteer. Just text.

So what I actually want is a CLI that reads schema.prisma and prints Mermaid. That's it. And once I started sketching it, it turned into something I keep recommending as a "learn to write parsers" exercise.

Why Prisma's schema is pleasant to parse

Parsing is a thing people think is hard. Sometimes it is. But Prisma's schema language happens to land in the sweet spot where recursive descent — the simplest possible parsing strategy — just works. A few reasons:

  1. Block-structured. Everything lives inside model X { ... }, enum X { ... }, datasource X { ... }, or generator X { ... }. The outermost grammar is a loop over these blocks.
  2. No expression evaluation needed. Prisma's attribute arguments contain expressions like @default(autoincrement()) or @relation(fields: [authorId], references: [id]), but for an ER diagram, I don't have to evaluate them. I just need to find @relation and look at whether it has fields: in it. Raw text matching on captured argument strings is fine.
  3. Whitespace is meaningful enough to be a cue, loose enough to be forgiving. Fields are newline-terminated, but attribute arguments can span lines. That's exactly the kind of thing a hand-written tokenizer handles gracefully.
  4. The identifier set is small. Keywords like model, enum, datasource, generator are just identifiers that the parser checks for contextually. There's no precedence table. There's no operator grammar. There is, delightfully, no pattern matching syntax.

If you've ever bounced off Bison or tree-sitter documentation, Prisma is a good place to just write the parser yourself and learn why it works.
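For concreteness, here's a minimal schema (a made-up example, not one of the repo's fixtures) containing every top-level block kind the parser has to recognise:

```prisma
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

generator client {
  provider = "prisma-client-js"
}

enum Role {
  USER
  ADMIN
}

model User {
  id    Int    @id @default(autoincrement())
  email String @unique
  role  Role   @default(USER)
}
```

The outer loop only ever sees four or five keywords; everything else is nested inside braces.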

Step one: a tiny lexer

The lexer's only job is to turn the source text into a stream of { kind, value, line, col } tokens. Here's the core loop:

```typescript
while (i < n) {
  const ch = source[i]!;

  // Line comments
  if (ch === '/' && source[i + 1] === '/') {
    while (i < n && source[i] !== '\n') i++;
    continue;
  }

  // Newlines are significant — they terminate a field declaration.
  if (ch === '\n') { push('newline', '\n', line, col); i++; line++; col = 1; continue; }

  // @@attribute vs @attribute
  if (ch === '@') {
    if (source[i + 1] === '@') { push('double_at', '@@', line, col); i += 2; col += 2; }
    else { push('at', '@', line, col); i++; col++; }
    continue;
  }

  if (ch === '"')  { /* read string until closing quote */ }
  if (isIdentStart(ch)) { /* read ident until non-ident-continuation */ }
  // ...punctuation, numbers, errors
}
```

Two things worth pointing out.

Newlines are tokens. Most lexers skip whitespace. This one keeps \n as a real token, because a field declaration like email String @unique is terminated by a newline, not a semicolon. If I collapsed whitespace, the parser would have no way to tell where one field ends and the next begins. By contrast, whitespace inside attribute arguments is fine to collapse, because the parser captures those as opaque text.

@ and @@ are different token kinds. In Prisma, @id attaches to a single field, while @@id([a, b]) declares a composite primary key at the model level. Distinguishing them at the lexer layer means the parser never has to look ahead.

The lexer is about 150 lines of straight-line TypeScript. No regex engines, no backtracking, no libraries. Every throw carries line/column coordinates so error messages point users to the bad character.
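The `isIdentStart` check referenced in the loop isn't shown in the excerpt. A plausible reconstruction (my sketch, not necessarily the repo's exact code) is just two character-class tests plus a scan:

```typescript
// Sketch of the identifier character tests the lexer excerpt relies on.
// Prisma identifiers are ASCII letters, digits, and underscores, and
// must not start with a digit.
function isIdentStart(ch: string): boolean {
  return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch === '_';
}

function isIdentCont(ch: string): boolean {
  return isIdentStart(ch) || (ch >= '0' && ch <= '9');
}

// Reading an identifier is then a straight scan from a known start:
function readIdent(source: string, start: number): string {
  let i = start + 1;
  while (i < source.length && isIdentCont(source[i]!)) i++;
  return source.slice(start, i);
}
```

No regex engine needed; each character is classified exactly once.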

Step two: recursive descent for blocks

The parser is a bit longer — around 250 lines — but it's still just a set of functions that call each other.

```typescript
export function parse(source: string): Schema {
  const cursor = new Cursor(tokenize(source));
  const models: Model[] = [];
  const enums: EnumDecl[] = [];

  while (cursor.peek().kind !== 'eof') {
    cursor.eatNewlines();
    const t = cursor.peek();
    if (t.kind === 'eof') break;
    if (t.kind !== 'ident') throw new ParseError(`unexpected ${t.kind} at top level`, t.line, t.col);
    switch (t.value) {
      case 'model':      models.push(parseModel(cursor)); break;
      case 'enum':       enums.push(parseEnum(cursor));  break;
      case 'datasource':
      case 'generator':
      case 'type':
      case 'view':       skipNamedBlock(cursor);          break;
      default:           throw new ParseError(`unknown top-level keyword "${t.value}"`, t.line, t.col);
    }
  }
  return { models, enums };
}
```

That's the entire top-level grammar. Loop over tokens; each keyword dispatches to a helper. skipNamedBlock is a special case: datasource and generator are interesting to Prisma but not to me, so I consume their tokens until I see the matching } and drop everything in between.
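The skip itself is just brace counting. A standalone sketch of the idea, using a plain token array instead of the post's Cursor class (names here are illustrative):

```typescript
// Illustrative token shape; the real lexer's tokens also carry line/col.
type Tok = { kind: string; value: string };

// Consumes tokens from `pos` through the matching '}' of a named block
// like `generator client { ... }`, returning the index just past it.
function skipNamedBlock(toks: Tok[], pos: number): number {
  pos += 2; // keyword + name
  while (toks[pos] && toks[pos]!.kind !== 'lbrace') pos++; // find '{'
  let depth = 0;
  for (; pos < toks.length; pos++) {
    if (toks[pos]!.kind === 'lbrace') depth++;
    else if (toks[pos]!.kind === 'rbrace' && --depth === 0) return pos + 1;
  }
  throw new Error('unterminated block');
}
```

Because the lexer never emits braces inside strings or comments, depth counting alone is enough; no understanding of the block's contents is required.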

parseModel is where it gets interesting:

```typescript
function parseModel(cursor: Cursor): Model {
  const kw = cursor.next();                    // 'model'
  const nameTok = cursor.expect('ident', 'model name');
  cursor.expect('lbrace');
  const model: Model = { name: nameTok.value, fields: [], attributes: [], line: kw.line };

  for (;;) {
    cursor.eatNewlines();
    const t = cursor.peek();
    if (t.kind === 'rbrace') { cursor.next(); return model; }
    if (t.kind === 'eof')    throw new ParseError('unexpected EOF inside model', t.line, t.col);
    if (t.kind === 'double_at') { model.attributes.push(parseAttribute(cursor)); continue; }
    if (t.kind === 'ident')     { model.fields.push(parseField(cursor)); continue; }
    throw new ParseError(`unexpected ${t.kind} inside model body`, t.line, t.col);
  }
}
```

The loop is simple: eat leading blank lines, look at the next token, decide which branch. Closing brace? Done. Double-at? It's a block attribute like @@index([a, b]). Ident? It's a field declaration starting with its name.

A field is <name> <type>[?|[]] <attributes...>. Parsing it is mechanical: read an identifier, read another identifier for the type, check for ? and [] in either order, then consume @attribute clauses until a newline ends the declaration.
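The "either order" modifier check is the only part with any subtlety. A standalone sketch of that scan, working on raw characters rather than the real parser's tokens:

```typescript
// Sketch of the modifier scan that runs after the type identifier.
// Accepts '?' and '[]' in either order, as described in the post.
function typeModifiers(afterType: string): { optional: boolean; list: boolean } {
  let optional = false;
  let list = false;
  let i = 0;
  for (;;) {
    if (afterType[i] === '?') { optional = true; i++; continue; }
    if (afterType[i] === '[' && afterType[i + 1] === ']') { list = true; i += 2; continue; }
    break; // anything else (attribute, newline, space) ends the modifiers
  }
  return { optional, list };
}
```

Everything after the modifiers is attribute clauses, which the field parser consumes until the terminating newline.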

The one fun trick is how I handle attribute arguments. Prisma attributes can contain anything — nested arrays, quoted strings, function calls, commas inside brackets. Rather than implement full expression parsing, I use a balanced-paren capture that just reads tokens until the depth hits zero, writing them into a flat string:

```typescript
function captureBalanced(cursor: Cursor): string {
  const open = cursor.expect('lparen');
  let depth = 1;
  const parts: string[] = [];
  while (depth > 0) {
    const t = cursor.next();
    if (t.kind === 'eof')    throw new ParseError('unterminated attribute args', open.line, open.col);
    if (t.kind === 'lparen') { depth++; parts.push('('); continue; }
    if (t.kind === 'rparen') { depth--; if (depth === 0) break; parts.push(')'); continue; }
    if (t.kind === 'string') { parts.push(JSON.stringify(t.value)); continue; }
    parts.push(t.value);
  }
  return parts.join(' ').replace(/\s+/g, ' ').trim();
}
```

So @relation(fields: [authorId], references: [id]) becomes the string fields : [ authorId ] , references : [ id ]. Later, the analyzer pattern-matches on that with a simple regex. It doesn't need to understand "authorId is a field name that references User.id" — it just needs to know that fields: is present, which tells it this side owns the foreign key. That's the whole signal.

Is this elegant? No. Is this right? Also no — a real parser would produce a structured attribute argument tree. But the escape hatch buys me enormous simplicity for a diagram generator, which only cares about three attributes total: @id, @relation, and @unique. Full AST construction would triple the parser size for features I don't use.
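The pattern match on the captured string is as blunt as it sounds. A sketch of the idea (the repo's exact regex may differ):

```typescript
// Decide which side of a relation owns the foreign key by looking for
// `fields :` in the flattened @relation argument string. The capture
// inserts spaces between tokens, hence the \s* before the colon.
function ownsForeignKey(relationArgs: string): boolean {
  return /\bfields\s*:/.test(relationArgs);
}
```

A named relation like `@relation("PostAuthor")` has no `fields:` clause on the non-owning side, so the test is false there, which is exactly the signal the analyzer needs.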

Step three: relation pairing

The analyzer takes the parsed AST and builds an ER graph. Models become nodes. Fields that reference other models become edges. The tricky part is that Prisma represents each relation twice — once on each side:

```prisma
model User {
  posts Post[]          // <-- one side
}
model Post {
  author   User @relation(fields: [authorId], references: [id])  // <-- other side
  authorId Int
}
```

I don't want two edges in my diagram. I want one, labelled correctly, with the right cardinality. So I walk every model field that references another model, collect them all into a RawRelation[], and then pair them up:

```typescript
for (let i = 0; i < raw.length; i++) {
  if (used.has(i)) continue;
  const a = raw[i]!;
  let matched = -1;
  for (let j = i + 1; j < raw.length; j++) {
    if (used.has(j)) continue;
    const b = raw[j]!;
    if (b.field.type !== a.modelName) continue;
    if (a.field.type !== b.modelName) continue;
    if (a.relationName !== b.relationName) continue;   // support named relations
    matched = j;
    break;
  }
  used.add(i);
  if (matched >= 0) { used.add(matched); pairs.push({ a, b: raw[matched]! }); }
  else              { pairs.push({ a }); }  // dangling — referenced model is filtered out
}
```

Once paired, I know which side owns the FK (the one with fields: in its @relation attribute), so I make that the "from" of the edge. Each side's cardinality comes from the field: a list is "many", an optional scalar is "zero-or-one", a required scalar is "one".
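The cardinality rule in that last sentence is a three-way switch. As a sketch:

```typescript
type Cardinality = 'one' | 'zero-or-one' | 'many';

// A list field means "many"; an optional field means "zero-or-one";
// a required single field means exactly "one".
function cardinalityOf(field: { list: boolean; optional: boolean }): Cardinality {
  if (field.list) return 'many';
  if (field.optional) return 'zero-or-one';
  return 'one';
}
```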

Implicit many-to-many is the special case where both sides are lists. Prisma handles those with an invisible join table, and Mermaid has notation for it:

```typescript
const manyToMany = leftCard === 'many' && rightCard === 'many';
```

That single line buys me correct rendering for Post[] <-> Tag[] patterns.

Mermaid output

With the graph in hand, emitting Mermaid is a formatting exercise:

```typescript
const CARDINALITY_LEFT = { 'one': '||', 'zero-or-one': '|o', 'many': '}o' };
const CARDINALITY_RIGHT = { 'one': '||', 'zero-or-one': 'o|', 'many': 'o{' };

function formatEdge(edge: RelationEdge): string {
  const left = CARDINALITY_LEFT[edge.fromCardinality];
  const right = CARDINALITY_RIGHT[edge.toCardinality];
  return `  ${edge.from} ${left}--${right} ${edge.to} : ${edge.label}`;
}
```

Mermaid's ER crow's-foot syntax is pleasingly symmetric: || is "exactly one", o| is "zero or one", o{ is "zero or many". The left and right glyphs mirror each other, which means the same cardinality reads as a different pair depending on which side of the edge it's on. That's why there are two tables. It took me one debugging session to stop flipping them.
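To see the mirroring concretely, here is the formatEdge logic restated as a self-contained snippet and run on both orientations of the same one-to-many relation:

```typescript
// Self-contained restatement of the two lookup tables and the formatter,
// plus a usage example showing the mirrored glyphs.
type Card = 'one' | 'zero-or-one' | 'many';
const LEFT: Record<Card, string>  = { 'one': '||', 'zero-or-one': '|o', 'many': '}o' };
const RIGHT: Record<Card, string> = { 'one': '||', 'zero-or-one': 'o|', 'many': 'o{' };

function edge(from: string, fc: Card, to: string, tc: Card, label: string): string {
  return `  ${from} ${LEFT[fc]}--${RIGHT[tc]} ${to} : ${label}`;
}

// One User, many Posts, written from either side:
edge('User', 'one', 'Post', 'many', 'author');  // "  User ||--o{ Post : author"
edge('Post', 'many', 'User', 'one', 'author');  // "  Post }o--|| User : author"
```

The same "many" cardinality renders as `}o` on the left of the edge but `o{` on the right; that flip is exactly what costs a debugging session.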

Models become blocks with field names, types, and optional PK/FK/UK/NULL markers:

```mermaid
erDiagram
  Post {
    Int id PK
    String title
    Int authorId FK
  }
  User {
    Int id PK
    String email UK
  }

  Post }o--|| User : author
```

One Mermaid gotcha: the field type column can't contain ? or [], because Mermaid's own parser rejects those characters. So the formatter strips modifier punctuation from the rendered type and encodes nullability via the separate NULL marker. Subtle, but catches you once.
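The stripping itself is a one-liner. A sketch:

```typescript
// Strip Prisma's modifier punctuation before emitting a Mermaid type
// column; Mermaid's ER parser rejects '?' and '[]' in that position.
// Nullability is carried separately via the NULL marker instead.
function mermaidType(prismaType: string): string {
  return prismaType.replace(/[?\[\]]/g, '');
}
```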

Testing something pure

The whole pipeline is pure functions, which makes it an unusually nice thing to test. No mocks, no fixtures that mutate state, no "run a CLI and capture stdout". I wrote 45 vitest tests across the layers:

  • Lexer (8): model header tokens, @ vs @@, list and optional modifiers, string escapes, comments, line tracking, error cases.
  • Parser (6): blog schema model extraction, field type modifiers, relation attribute capture, dotted attribute names (@db.VarChar), datasource skipping, parse error line/col.
  • Analyzer (11): 1:1, 1:N, and M:N detection, edge deduplication, enum fields as scalar-like, PK/FK detection, --include/--exclude filters, join-table pattern (two 1:N relations), deterministic sort order.
  • Formatters (8): Mermaid header, crow's-foot notation for 1:N, }o--o{ for M:N, --no-types flag, type sanitisation, DOT output, JSON round-trip.
  • Main (12): argument parsing for each flag, exit codes for missing file / bad format / parse error / success, --help and --version, --include filtering via CLI.

Every test is fast because every stage is pure. Full test run is under 300 ms. That speed matters for the parser-tinkering loop — you change a lexer rule, hit enter, immediately see 45 green dots or a useful failure.

Tradeoffs (the honest list)

This tool is deliberately limited. Things it doesn't do:

  • It doesn't evaluate attribute expressions. @default(now()) is parsed as an opaque string. If you want Prisma's own semantic analysis you need the real Prisma CLI. For a diagram, I don't need to know what now() returns.
  • It doesn't support multi-file schemas. Prisma has a preview feature where a schema folder can split models across files. The CLI takes a single file. Fixing this is a day of work: resolve // imports, stitch files together, run the same parser. I haven't needed it yet.
  • It doesn't render referential actions in the diagram. onDelete: Cascade is in the JSON output but not in the Mermaid output, because Mermaid's ER syntax has no good place for it. If you care, use --format json and render your own.
  • It doesn't attempt error recovery. If your schema has a typo, you get one error at the first failure point and the process exits. A real linter would keep going and report everything; a diagram generator doesn't need to.

Those are intentional. Every one of them could be added — none of them are worth the complexity for a tool that fits on one laptop screen.

Try it in 30 seconds

```bash
git clone https://github.com/sen-ltd/prisma-erd
cd prisma-erd
docker build -t prisma-erd .
docker run --rm -v "$PWD/tests/fixtures:/work" prisma-erd /work/blog.prisma
```

That prints a Mermaid ER diagram for the test fixture. Pipe it to a file, wrap it in triple-backtick mermaid fences, paste it into your README. The diagram renders on GitHub.

```bash
# Filter to a subset of your models:
docker run --rm -v "$PWD/tests/fixtures:/work" prisma-erd /work/blog.prisma --include User,Post

# Get structured JSON for a custom renderer:
docker run --rm -v "$PWD/tests/fixtures:/work" prisma-erd /work/blog.prisma --format json

# Graphviz DOT if you want a rasterised image after all:
docker run --rm -v "$PWD/tests/fixtures:/work" prisma-erd /work/blog.prisma --format dot | dot -Tpng > erd.png
```

The runtime image is 136 MB, non-root, and has zero runtime dependencies beyond Node's built-ins. The parser is ~300 lines. The whole project is under a thousand lines of TypeScript including tests.

What I took away

Writing a parser for a real DSL from scratch is a healthy exercise for a working TypeScript engineer. Prisma's schema is small enough to fit in your head, structured enough that recursive descent handles it, and useful enough that the result is something you'll actually run. It's not a toy. It talks to a production ORM schema you probably already have in front of you.

And the next time someone tells you parsing is hard, push back gently: parsing a full programming language is hard. Parsing a well-behaved configuration DSL is a weekend project. Writing your own gives you a level of understanding of the language you simply don't get from using the official tools.

GitHub: https://github.com/sen-ltd/prisma-erd — MIT, zero runtime deps, 45 tests, paste the Mermaid straight into your README.
