I've watched agents query `information_schema` over and over, spending 4-6 turns just to figure out which tables exist, what columns they have, and how they join. On a 500-table database, the full DDL is around 93,000 tokens, yet most questions touch only 3-5 tables. On one complex multi-table join, I measured a 64% token reduction simply by giving the agent the schema upfront.

I built dbdense to fix this.
## What it does
dbdense is a three-step offline pipeline: extract, compile, serve.
**Extract** connects to your database once and snapshots the schema into a portable JSON file (`ctxexport.json`): tables, columns, types, primary keys, foreign keys, indexes -- everything an LLM needs to write correct queries.
**Compile** turns that snapshot into two artifacts:
- A **lighthouse** -- a compact table map (~4K tokens for 500 tables). It looks like this:

  ```
  T:users|J:orders,sessions
  T:orders|E:payload,shipping|J:payments,shipments,users
  T:payments|J:orders
  ```

  Every table, its FK neighbors, and embedded docs. 23x smaller than full DDL. This stays in the agent's context so it always knows what's available.
- **Full DDL** -- standard `CREATE TABLE` statements with constraints, rendered on demand only for the specific tables the agent asks about.
**Serve** (optional) exposes the lighthouse as an MCP resource and the DDL via an MCP `slice` tool. The agent reads the map, picks the tables it needs, and gets back just those definitions.
After the extract, everything runs locally. The compiled artifacts are plain text you can commit to your repo. No database connection needed at runtime.
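Because the lighthouse is plain text with a regular `T:`/`J:`/`E:` structure, it's trivial to work with outside an agent too. Here's a minimal Python sketch (my own illustration, not part of dbdense) that parses the line format shown above into a lookup table:

```python
def parse_lighthouse(text: str) -> dict:
    """Parse lighthouse entries like 'T:orders|E:payload|J:payments,users'
    into {table: {"joins": [...], "embedded": [...]}}."""
    tables = {}
    for entry in text.split():
        name, joins, embedded = None, [], []
        for part in entry.split("|"):
            key, _, value = part.partition(":")
            if key == "T":          # table name
                name = value
            elif key == "J":        # FK join neighbors
                joins = value.split(",")
            elif key == "E":        # embedded documents / JSONB fields
                embedded = value.split(",")
        if name:
            tables[name] = {"joins": joins, "embedded": embedded}
    return tables


lighthouse = (
    "T:users|J:orders,sessions "
    "T:orders|E:payload,shipping|J:payments,shipments,users "
    "T:payments|J:orders"
)
print(parse_lighthouse(lighthouse)["orders"]["joins"])
# ['payments', 'shipments', 'users']
```

An agent answering "which tables can I join `orders` against?" only needs this map, not the full DDL.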
## No credentials in the agent runtime
The export step is the only step that touches the database. After that, compile and serve work from the local snapshot. Your production database credentials never need to be in the agent's environment. The tool works offline and air-gapped.
## The numbers
I ran an agentic benchmark: n=3, same 5 questions, same seeded Postgres database (20K+ rows, 8 tables), same model (Claude Sonnet 4). One arm had only a Postgres MCP tool. The other had the same tool plus dbdense schema context injected into the prompt.
| Metric | Without schema context | With dbdense | Delta |
|---|---|---|---|
| Correct answers | 13/15 | 13/15 | equal |
| Avg turns | 4.1 | 2.2 | -46% |
| Tokens per run | 285,922 | 187,603 | -34% |
Same accuracy. 34% fewer tokens. 46% fewer turns.
The savings scale with query complexity. On simple single-table filters, both arms performed about the same. On a complex multi-table join, the baseline agent spent 6+ turns querying information_schema to discover the schema. dbdense answered in 2 turns, using 64% fewer tokens for that query.
The two wrong answers (both on the same question, in both arms) returned identical incorrect results, pointing to question ambiguity rather than a schema context issue.
## Sidecar enrichment
Databases lie by omission. A column named `status` with type `text` tells the LLM nothing about what values are valid. The agent either guesses or wastes a `SELECT DISTINCT` turn to find out.
dbdense supports a `dbdense.yaml` sidecar file where you annotate columns with descriptions and enum values:

```yaml
entities:
  payments:
    fields:
      status:
        values: ["pending", "authorized", "paid", "failed", "refunded"]
  orders:
    fields:
      status:
        description: "Order lifecycle status."
        values: ["pending", "confirmed", "shipped", "delivered", "cancelled"]
```
These annotations merge into the compiled DDL as inline SQL comments. The LLM sees `-- Values: pending, authorized, paid, failed, refunded` right next to the column definition. No extra queries needed.
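Concretely, the compiled DDL for the `payments` table above might then look something like this (an illustrative sketch -- every column besides `status` is my assumption, not actual dbdense output):

```sql
CREATE TABLE payments (
    id bigint NOT NULL,       -- hypothetical column
    order_id bigint NOT NULL, -- hypothetical column
    status text NOT NULL      -- Values: pending, authorized, paid, failed, refunded
);
```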
This also works for documenting JSONB structures, MongoDB embedded documents, or anything else the raw schema doesn't capture.
## What it doesn't do
The snapshot is static. If your schema changes, re-run export. This is intentional -- schemas are stable; questions change.
The slice tool still depends on the LLM picking the right tables from the lighthouse. dbdense reduces the context problem; it doesn't solve table selection for the model.
It's not a `pg_dump --schema-only` replacement. The renderer covers columns, PKs, FKs, NOT NULL, defaults, unique constraints, and indexes, but skips triggers, RLS policies, and custom types.
## Try it
```shell
go install github.com/valkdb/dbdense/cmd/dbdense@latest

dbdense export --driver postgres --db "postgres://user:pass@localhost:5432/mydb" --schemas public
dbdense compile --mode lighthouse --in ctxexport.json --out lighthouse.txt
dbdense compile --in ctxexport.json --out schema.sql
```
You now have two files: a lighthouse map and full DDL. Point your agent at them. If you use Claude Code, `dbdense init-claude` writes the MCP config for you.
The project is open source at github.com/valkdb/dbdense.