I've watched agents query `information_schema` over and over, spending 4-6 turns just to figure out which tables exist, what columns they have, and how they join. On a 500-table database, the full DDL is around 93,000 tokens, yet most questions touch only 3-5 tables. On one complex multi-table join, I measured a 64% token reduction simply by giving the agent the schema upfront.

I built dbdense to fix this.
## What it does
dbdense is a three-step offline pipeline: extract, compile, serve.
**Extract** connects to your database once and snapshots the schema into a portable JSON file (`ctxexport.json`): tables, columns, types, primary keys, foreign keys, indexes -- everything an LLM needs to write correct queries.
**Compile** turns that snapshot into two artifacts:
- A **lighthouse** -- a compact table map (~4K tokens for 500 tables). It looks like this:

  ```
  T:users|J:orders,sessions
  T:orders|E:payload,shipping|J:payments,shipments,users
  T:payments|J:orders
  ```

  Every table, its FK neighbors, and embedded docs. 23x smaller than full DDL. This stays in the agent's context so it always knows what's available.
- **Full DDL** -- standard `CREATE TABLE` statements with constraints, rendered on demand only for the specific tables the agent asks about.
**Serve** (optional) exposes the lighthouse as an MCP resource and the DDL via an MCP `slice` tool. The agent reads the map, picks the tables it needs, and gets back just those definitions.
After the extract, everything runs locally. The compiled artifacts are plain text you can commit to your repo. No database connection needed at runtime.
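Because the lighthouse is plain text with a regular `T:`/`J:`/`E:` structure, it's trivial to work with outside an agent too. Here's a minimal Python sketch (my own illustration, not part of dbdense) that parses the line format shown above into a lookup table:

```python
def parse_lighthouse(text: str) -> dict:
    """Parse lighthouse entries like 'T:orders|E:payload|J:payments,users'
    into {table: {"joins": [...], "embedded": [...]}}."""
    tables = {}
    for entry in text.split():
        name, joins, embedded = None, [], []
        for part in entry.split("|"):
            key, _, value = part.partition(":")
            if key == "T":          # table name
                name = value
            elif key == "J":        # FK join neighbors
                joins = value.split(",")
            elif key == "E":        # embedded documents / JSONB fields
                embedded = value.split(",")
        if name:
            tables[name] = {"joins": joins, "embedded": embedded}
    return tables


lighthouse = (
    "T:users|J:orders,sessions "
    "T:orders|E:payload,shipping|J:payments,shipments,users "
    "T:payments|J:orders"
)
print(parse_lighthouse(lighthouse)["orders"]["joins"])
# ['payments', 'shipments', 'users']
```

An agent answering "which tables can I join `orders` against?" only needs this map, not the full DDL.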
## No credentials in the agent runtime
The export step is the only step that touches the database. After that, compile and serve work from the local snapshot. Your production database credentials never need to be in the agent's environment. The tool works offline and air-gapped.
## The numbers
I ran an agentic benchmark: n=3, same 5 questions, same seeded Postgres database (20K+ rows, 8 tables), same model (Claude Sonnet 4). One arm had only a Postgres MCP tool. The other had the same tool plus dbdense schema context injected into the prompt.
| Metric | Without schema context | With dbdense | Delta |
|---|---|---|---|
| Correct answers | 13/15 | 13/15 | equal |
| Avg turns | 4.1 | 2.2 | -46% |
| Tokens per run | 285,922 | 187,603 | -34% |
Same accuracy. 34% fewer tokens. 46% fewer turns.
The savings scale with query complexity. On simple single-table filters, both arms performed about the same. On a complex multi-table join, the baseline agent spent 6+ turns querying information_schema to discover the schema. dbdense answered in 2 turns, using 64% fewer tokens for that query.
The two wrong answers (both on the same question, in both arms) returned identical incorrect results, pointing to question ambiguity rather than a schema context issue.
## Sidecar enrichment
Databases lie by omission. A column named `status` with type `text` tells the LLM nothing about what values are valid. The agent either guesses or wastes a `SELECT DISTINCT` turn to find out.
dbdense supports a `dbdense.yaml` sidecar file where you annotate columns with descriptions and enum values:

```yaml
entities:
  payments:
    fields:
      status:
        values: ["pending", "authorized", "paid", "failed", "refunded"]
  orders:
    fields:
      status:
        description: "Order lifecycle status."
        values: ["pending", "confirmed", "shipped", "delivered", "cancelled"]
```
These annotations merge into the compiled DDL as inline SQL comments. The LLM sees `-- Values: pending, authorized, paid, failed, refunded` right next to the column definition. No extra queries needed.
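Concretely, the compiled DDL for the `payments` table above might then look something like this (an illustrative sketch -- every column besides `status` is my assumption, not actual dbdense output):

```sql
CREATE TABLE payments (
    id bigint NOT NULL,       -- hypothetical column
    order_id bigint NOT NULL, -- hypothetical column
    status text NOT NULL      -- Values: pending, authorized, paid, failed, refunded
);
```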
This also works for documenting JSONB structures, MongoDB embedded documents, or anything else the raw schema doesn't capture.
## What it doesn't do
The snapshot is static. If your schema changes, re-run export. This is intentional -- schemas are stable; questions change.
The slice tool still depends on the LLM picking the right tables from the lighthouse. dbdense reduces the context problem; it doesn't solve table selection for the model.
It's not a `pg_dump --schema-only` replacement. The renderer covers columns, PKs, FKs, NOT NULL, defaults, unique constraints, and indexes, but skips triggers, RLS policies, and custom types.
## Try it
```shell
go install github.com/valkdb/dbdense/cmd/dbdense@latest

dbdense export --driver postgres --db "postgres://user:pass@localhost:5432/mydb" --schemas public
dbdense compile --mode lighthouse --in ctxexport.json --out lighthouse.txt
dbdense compile --in ctxexport.json --out schema.sql
```
You now have two files: a lighthouse map and full DDL. Point your agent at them. If you use Claude Code, `dbdense init-claude` writes the MCP config for you.
The project is open source at github.com/valkdb/dbdense.