Sualeh Fatehi

Posted on May 26 • Edited on Jun 2

SchemaSpy vs SchemaCrawler - Which Database Documentation Tool is Right for You?

#database #devops #sql #opensource

Maintainer insights on CI/CD and linting

Both SchemaSpy and SchemaCrawler are free, open-source tools for documenting and analysing relational databases over JDBC. Both have been around for over 20 years. Both can generate entity-relationship diagrams. Yet the two tools are more different than they look.

Disclosure: I work on SchemaCrawler, so take this with appropriate scepticism. I have tried to represent SchemaSpy fairly.

What SchemaSpy Does Best

SchemaSpy's primary strength is its interactive HTML report. After a single run, you get a navigable website: clickable table pages, hyperlinked foreign keys, anomaly reports, and embedded ER diagrams for every table. It is exactly the kind of output you hand to a non-technical stakeholder, a consultant, or a new team member who needs to understand the data model quickly.

SchemaSpy also detects implied relationships - potential foreign keys that are not formally declared in the schema. It provides an orphan table page that surfaces tables with no relationships. These are genuinely useful for legacy databases.
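To make the idea of an implied relationship concrete, here is a minimal Python sketch of one plausible heuristic: a non-key column is a candidate foreign key when its name and type match the primary key of exactly one other table, and no foreign key is declared for it. This is an illustration only, not SchemaSpy's actual algorithm; the `Column` type, the table names, and the `declared` set are all made up for the example.

```python
from typing import NamedTuple


class Column(NamedTuple):
    table: str
    name: str
    type: str
    is_primary_key: bool = False


def implied_relationships(columns, declared):
    """Return (table, column, referenced_table) guesses for undeclared foreign keys.

    Assumed heuristic: match a non-key column to a primary key column
    with the same name and type that is owned by exactly one other table.
    """
    pk_owners = {}
    for c in columns:
        if c.is_primary_key:
            pk_owners.setdefault((c.name, c.type), []).append(c.table)

    implied = []
    for c in columns:
        owners = pk_owners.get((c.name, c.type), [])
        # Skip key columns and ambiguous names such as a generic "id".
        if c.is_primary_key or len(owners) != 1:
            continue
        candidate = (c.table, c.name, owners[0])
        if owners[0] != c.table and candidate not in declared:
            implied.append(candidate)
    return implied


# Hypothetical legacy schema with one declared and one missing foreign key.
columns = [
    Column("customers", "customer_id", "INTEGER", True),
    Column("customers", "name", "VARCHAR"),
    Column("orders", "order_id", "INTEGER", True),
    Column("orders", "customer_id", "INTEGER"),
    Column("invoices", "invoice_id", "INTEGER", True),
    Column("invoices", "order_id", "INTEGER"),
]
declared = {("invoices", "order_id", "orders")}

implied = implied_relationships(columns, declared)
print(implied)
```

A name-and-type match is only a hint, so results like these are best treated as candidates for a human to review.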

If your goal is a shareable, browsable report that looks great in a browser, SchemaSpy delivers.

What SchemaCrawler Does Best

SchemaCrawler's strength is everything a developer needs before and after the report: searching, diffing, linting, scripting, and integration.

Diff-able text output

SchemaCrawler's "schema" command produces clean, structured text output - not HTML. Run it against production and staging, diff the outputs in git, and see exactly what changed. This is the foundation of schema change tracking in CI/CD.
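As a rough sketch of what that comparison looks like, the Python below diffs two schema snapshots with the standard library's `difflib`. The snapshot text is invented for the example and is not SchemaCrawler's real output format; in practice you would read the two saved text files instead of using inline strings.

```python
import difflib

# Hypothetical snapshots standing in for text output saved from two environments.
production = """\
PUBLIC.BOOKS.AUTHORS  [table]
  ID        INTEGER NOT NULL
  LASTNAME  VARCHAR(20)
"""
staging = """\
PUBLIC.BOOKS.AUTHORS  [table]
  ID        INTEGER NOT NULL
  LASTNAME  VARCHAR(40)
  EMAIL     VARCHAR(255)
"""


def schema_diff(old: str, new: str) -> list[str]:
    """Return unified-diff lines between two schema snapshots."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="production.txt",
            tofile="staging.txt",
            lineterm="",
        )
    )


changes = schema_diff(production, staging)
print("\n".join(changes))
```

An empty result means the two schemas match, which makes this easy to turn into a pass/fail check in a pipeline.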

Schema lint

The "lint" command catches design problems automatically: missing primary keys, nullable columns in unique constraints, redundant indices, tables with no relationships, and more. The lints can be extended to enforce your organization's rules, such as specific naming conventions.
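To show the kind of rule a linter applies, here is a small self-contained Python sketch of two of the checks named above: tables without a primary key and tables with no relationships. It runs over a toy in-memory model and is not how SchemaCrawler implements its lints; the `Table` class and the sample tables are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: list[str]
    primary_key: list[str] = field(default_factory=list)
    # Maps a foreign key column to the table it references.
    foreign_keys: dict[str, str] = field(default_factory=dict)


def lint(tables: list[Table]) -> list[str]:
    """Report tables with no primary key and tables with no relationships."""
    referenced = {target for t in tables for target in t.foreign_keys.values()}
    findings = []
    for t in tables:
        if not t.primary_key:
            findings.append(f"{t.name}: no primary key")
        if not t.foreign_keys and t.name not in referenced:
            findings.append(f"{t.name}: no relationships")
    return findings


# Hypothetical schema: audit_log has neither a key nor any relationship.
tables = [
    Table("authors", ["id", "name"], primary_key=["id"]),
    Table(
        "books",
        ["id", "author_id"],
        primary_key=["id"],
        foreign_keys={"author_id": "authors"},
    ),
    Table("audit_log", ["message"]),
]

findings = lint(tables)
print("\n".join(findings))
```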

Grep - regex search across the entire schema

--grep-tables and --grep-columns let you search all tables, columns, stored procedures, triggers, and foreign keys by regular expression. Find every column referencing a concept across a 500-table database in a single command. Combine it with --parents and --children to pull the related tables automatically.
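The idea behind a column grep with related tables can be sketched in a few lines of Python. This is a toy model rather than SchemaCrawler's implementation: the schema, the `PARENTS` map, and the choice to match with `re.search` on the bare column name are all assumptions made for the example.

```python
import re

# Hypothetical schema: table name -> column names.
COLUMNS = {
    "customers": ["id", "email", "name"],
    "orders": ["id", "customer_id", "total"],
    "order_items": ["id", "order_id", "sku"],
    "newsletter": ["id", "subscriber_email"],
}
# Hypothetical foreign keys: child table -> parent tables.
PARENTS = {"orders": {"customers"}, "order_items": {"orders"}}


def grep_columns(pattern: str) -> set[str]:
    """Return tables that have at least one column matching the regex."""
    regex = re.compile(pattern)
    return {
        table
        for table, columns in COLUMNS.items()
        if any(regex.search(column) for column in columns)
    }


def with_related(tables: set[str], parents: int = 0, children: int = 0) -> set[str]:
    """Expand a table set by the given number of parent and child generations."""
    result = set(tables)
    frontier = set(tables)
    for _ in range(parents):
        frontier = {p for t in frontier for p in PARENTS.get(t, set())}
        result |= frontier
    frontier = set(tables)
    for _ in range(children):
        frontier = {c for c, ps in PARENTS.items() if ps & frontier}
        result |= frontier
    return result


hits = grep_columns(r"email")
print(sorted(with_related(hits, children=1)))
```

Starting from the tables that mention an email, one generation of children pulls in `orders`, which is the sort of expansion that makes a search result useful on a large schema.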

Multiple output formats

Text, HTML, JSON, CSV, Markdown, and ER diagrams (via Graphviz). The Markdown output is useful for documentation-as-code; the JSON output is useful for tooling.

Schema extension with PlantUML and dbdiagram.io

SchemaCrawler can generate output in PlantUML and dbdiagram.io formats directly from your live database. This means you can start from what is actually in the database and then edit the diagram to model proposed additions or changes - something neither SchemaSpy nor most ERD tools support directly.

Scripting - Python, JavaScript

--command=script runs a script against live schema metadata. Generate custom reports, validate naming conventions, transform output - without writing a Java application.
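For a feel of what a custom report script does, here is a minimal Python sketch that renders a Markdown data dictionary. It is deliberately standalone: a plain dictionary stands in for the schema metadata the tool would supply, and the `markdown_report` function and the sample tables are hypothetical, not part of SchemaCrawler.

```python
def markdown_report(schema: dict[str, list[tuple[str, str]]]) -> str:
    """Render one Markdown section per table, listing each column and its type."""
    lines = []
    for table in sorted(schema):
        lines.append(f"## {table}")
        lines.append("")
        lines.append("| Column | Type |")
        lines.append("| --- | --- |")
        for column, column_type in schema[table]:
            lines.append(f"| {column} | {column_type} |")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical metadata: table name -> (column name, column type) pairs.
schema = {
    "books": [("id", "INTEGER"), ("title", "VARCHAR")],
    "authors": [("id", "INTEGER"), ("name", "VARCHAR")],
}

report = markdown_report(schema)
print(report)
```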

Full Java API

SchemaCrawler is a JDBC metadata API. Embed it in a Java application and work with tables, columns, indexes, foreign keys, and routines as Java objects. SchemaSpy has no public API.

GitHub Actions integration

There is an official SchemaCrawler GitHub Action in the marketplace. Run lint, diff, and schema documentation generation as part of any CI/CD workflow. SchemaSpy has no equivalent.

Feature Comparison

Capability	SchemaCrawler	SchemaSpy
Interactive HTML report	✅	✅
Clickable navigation between tables	✅	✅
ER diagrams	✅	✅
Diff-able text output	✅	❌
Extensible schema lint / design checks	✅	❌
Grep / regex search across schema	✅	❌
Markdown, JSON, CSV output	✅	❌
PlantUML and dbdiagram.io output	✅	❌
Scripting (Python, JS, Groovy)	✅	❌
Java API	✅	❌
GitHub Actions integration	✅	❌
Implied relationship detection	✅	✅
Orphan table detection	✅	✅

Decision Guide

Choose SchemaSpy if…

Your primary output is a shareable, interactive HTML report for non-technical stakeholders
You want clickable navigation between related tables out of the box
You need implied/virtual foreign key detection for a legacy schema with missing FK declarations

Choose SchemaCrawler if…

You need to track schema changes in version control - diff text output between environments
You want to catch design problems automatically - schema lint in CI
You need to search across a large schema - find all tables or columns matching a pattern
You are building schema checks into a CI/CD pipeline - GitHub Actions integration
You need output in Markdown, JSON, or CSV as well as HTML
You want to model future schema designs in PlantUML or dbdiagram.io, starting from your live database
You want to write scripts that process schema metadata programmatically
You are building a Java application that needs database metadata as objects

Can You Use Both?

Yes. They serve genuinely different workflows.

Use SchemaSpy to generate the stakeholder-facing HTML report. Use SchemaCrawler for diff, lint, and grep in your development and CI/CD workflow. The two tools are not competitors - they complement each other.

Try SchemaCrawler

The full documentation is at schemacrawler.com. The source is at github.com/schemacrawler/SchemaCrawler.

Top comments (8)

adriens • May 30

Currently added SC to a development pipeline and storytelling I'm working on:

Sualeh Fatehi • May 30

@adriens SchemaCrawler also generates Mermaid ER diagrams which can be directly embedded into GitHub Markdown.

adriens • May 30

Maybe implement pure LaTeX-like figure exports too?

adriens • May 30

Excellent. My current use case is to embed the chart into an R-driven Quarto notebook

adriens • May 30

github.com/schemacrawler/SchemaCra...

adriens • May 27

... and linting is so efficient too

adriens • May 27

Sure it's the best, and super well maintained! About to use it live on our CI to continuously attach schemas as assets 🤩

Mykola Kondratiuk • Jun 2

used SchemaSpy on a legacy Oracle DB once. it's surprisingly useful when you inherit a database with zero documentation - that HTML report made the handoff readable. hadn't given SchemaCrawler API mode much thought before this.
