Albert Alov

Posted on DEV Community
Every PostgreSQL MCP server eats your context window. Here's how I collapsed 4 into 1.

I have four PostgreSQL environments. Dev, stage, prod, dev2. Each behind different credentials. Prod behind an SSH bastion. The standard MCP approach says: spin up a server per database. Four servers, twelve tools, 10,000+ tokens of metadata before the model even reads my question.

At four databases, it's annoying. At 24 — which is what enterprise teams actually deal with — it's catastrophic. The model spends more attention resolving tool schemas than answering your question.

I built a router that collapses it to one server, four tools, ~500 tokens. It also manages SSH tunnels automatically, blocks destructive queries, and requires human approval before touching prod.

It's called toad-tunnel-mcp. Here's what it actually looks like.


The math that made me stop and build

Take a typical PostgreSQL MCP setup. Each database exposes 3 tools: query, list_tables, describe_table. Each tool definition runs ~830 tokens of JSON Schema, so a single database costs ~2,500 tokens before any query runs.

| Setup | Databases | Tools | Tokens (metadata only) |
|---|---|---|---|
| One DB | 1 | 3 | ~2,500 |
| Four envs | 4 | 12 | ~10,000 |
| Enterprise (24 DBs) | 24 | 72 | ~60,000 |
| toad-tunnel-mcp | Any | 4 | ~500 |

At 24 databases, your model burns ~60k tokens on tool definitions alone. That's roughly a third of a 200k-token context window, gone before you type a single word. Research shows tool selection accuracy drops below 49% past 15 tools. Your model isn't stupid. You're drowning it.
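The table is just multiplication. A quick sketch, using the article's rounded per-tool estimate:

```typescript
// Back-of-envelope metadata cost. Figures are the article's rounded
// estimates: ~3 tools per database, ~830 tokens of JSON Schema per tool.
const TOOLS_PER_DB = 3;
const TOKENS_PER_TOOL = 830;

function metadataTokens(databases: number): number {
  return databases * TOOLS_PER_DB * TOKENS_PER_TOOL;
}

console.log(metadataTokens(1));  // ~2,500 for a single database
console.log(metadataTokens(4));  // ~10,000 for four environments
console.log(metadataTokens(24)); // ~60,000 at enterprise scale
```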

The existing solutions don't fix this. DBHub collapses tools but has no environment-first routing. pgEdge supports multiple instances but relies on the model to "remember" which database it's talking to. MCP aggregators just prefix everything (db1_query, db2_query) — all 72 tools still enter the context.

None of them handle SSH tunnels. In the real world, prod isn't on localhost:5432. It's behind a bastion host, and you're doing ssh -L 15432:prod-db.internal:5432 deploy@bastion.company.com before every session.

The architecture: one server, env as a parameter

The core idea is embarrassingly simple. Instead of separate tools per database, you have one execute_query tool with env as a required enum parameter:

server.registerTool(
  "toad_tunnel__execute_query",
  {
    description: "Run a SQL query against one configured environment.",
    // The MCP TypeScript SDK takes a raw Zod shape here, not z.object().
    inputSchema: {
      env: z.enum(["dev", "stage", "prod", "dev2"]),
      sql: z.string(),
    },
  },
  async ({ env, sql }) => {
    // env is validated by Zod before this runs.
    // The model can't invent "production-main" or "prod2".
    // It picks from the enum or gets rejected.
    const pool = await connectionManager.getPool(env);
    const result = await pool.query(sql);
    return formatResult(result.rows);
  }
);

The Zod enum is the forcing function. If the model passes anything not in the list, the MCP SDK rejects it at the protocol level — the handler never executes. Safety moves from "the model's intent" to "the protocol's contract."

This is what the research calls the "Action-Selector" pattern. One tool, deterministic routing. The model doesn't choose between dev_query and prod_query and hope it picks right. It fills in a parameter.
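To make the contract concrete, here's a dependency-free sketch of the same check. The real server relies on Zod plus the MCP SDK rejecting invalid input at the protocol level; this illustrative parseEnv just mimics that behavior:

```typescript
// Stand-in for what the Zod enum does at the protocol layer.
// (Illustrative only; the real server uses z.enum() inside the SDK.)
const ENVS = ["dev", "stage", "prod", "dev2"] as const;
type Env = (typeof ENVS)[number];

function parseEnv(input: string): Env {
  if ((ENVS as readonly string[]).indexOf(input) === -1) {
    // Rejected before any handler runs: the query never reaches a pool.
    throw new Error(`Invalid env "${input}". Expected one of: ${ENVS.join(", ")}`);
  }
  return input as Env;
}

parseEnv("stage");            // ok
// parseEnv("production-main") throws: the model can't invent environments
```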

Progressive disclosure: don't load what you don't need

The second problem with eager-loading schemas is that most queries only touch 2-3 tables. Why dump the entire schema of 50 tables into context?

We decomposed introspection into three tools that mirror how a human developer actually explores a database:

Step 1: "What environments exist?"
→ toad_tunnel__list_nodes
← dev: sandbox_dev (read-write, auto) | stage: sandbox_stage (read-only, auto) | prod: sandbox_prod (read-only, hitl)

Step 2: "What tables are in stage?"
→ toad_tunnel__get_overview { env: "stage" }
← products  ~1000 rows | categories  ~50 rows | data_checks  ~1500 rows

Step 3: "What columns does data_checks have?"
→ toad_tunnel__describe_columns { env: "stage", table: "data_checks" }
← id:serial:PK:NOT NULL | code:varchar(50):NOT NULL | severity:varchar(20):NOT NULL | ...

Step 1 costs ~200 tokens. Step 2 costs maybe 300. Step 3 costs whatever the specific table needs. The model only pays for what it uses.

Compare this to eager-loading: dump all tables, all columns, all environments into context on startup. That's the current standard. It's like loading every Wikipedia article before answering a question about frogs.

The output format matters too. We use a minified format (id:serial:PK:NOT NULL) instead of verbose JSON. TSV for query results instead of JSON. That's 30-40% fewer tokens per response — which adds up across a multi-turn debugging session.
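A minimal sketch of those two formats. The field names and helper names are mine for illustration, not the project's API:

```typescript
// Hypothetical column metadata shape for the sketch below.
interface Column {
  name: string;
  type: string;
  primaryKey: boolean;
  nullable: boolean;
}

// "id:serial:PK:NOT NULL" instead of a verbose JSON object per column.
function minifyColumn(c: Column): string {
  const parts = [c.name, c.type];
  if (c.primaryKey) parts.push("PK");
  if (!c.nullable) parts.push("NOT NULL");
  return parts.join(":");
}

// TSV instead of JSON for query results: one header line, one line per row.
function toTsv(rows: Record<string, unknown>[]): string {
  if (rows.length === 0) return "";
  const cols = Object.keys(rows[0]);
  const lines = rows.map((r) => cols.map((c) => String(r[c])).join("\t"));
  return [cols.join("\t"), ...lines].join("\n");
}

minifyColumn({ name: "id", type: "serial", primaryKey: true, nullable: false });
// → "id:serial:PK:NOT NULL"
```

The savings come from dropping repeated JSON keys and quoting; over dozens of rows and columns per response, that's where the 30-40% goes.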

SSH tunnels: the feature nobody built

This is the part that surprised me. Every multi-database MCP tool assumes direct connections. In reality:

# What your infra actually looks like
prod:
  host: prod-db.internal        # not reachable from your laptop
  port: 5432
  user: prod_reader
  password: ${PROD_PG_PASSWORD}
  permissions: read-only
  approval: hitl
  tunnel:
    bastion: bastion.company.com
    username: deploy
    key_path: ~/.ssh/prod_key
    local_port: 15432

toad-tunnel-mcp manages SSH tunnels automatically via ssh2. The lifecycle:

  1. Lazy connect. No tunnel opens at startup. First query to prod triggers it.
  2. Keep-alive. Configurable heartbeat (default 30s) prevents SSH timeout.
  3. Idle disconnect. No queries for 5 minutes? Tunnel closes. Resources freed.
  4. Auto-reconnect. Connection drops? Exponential backoff, up to 3 retries. Pool gets invalidated and recreated on reconnect.
  5. Graceful shutdown. SIGINT → close all pools → close all tunnels → exit. No zombie SSH processes.
// You never think about tunnels. Just query.
// The router checks if env has a tunnel config,
// opens it if needed, routes through it.
const result = await handler({
  env: "prod",
  sql: "SELECT count(*) FROM products"
});
// Behind the scenes: SSH tunnel opened → pg pool connected
// through 127.0.0.1:15432 → query executed → result returned

I looked at ssh2-promise, native child_process with ssh -L, and raw ssh2. Went with ssh2 behind a TunnelProvider interface — if the library dies, we swap the implementation without touching the router.
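Here's roughly what that seam might look like, plus the reconnect backoff as a pure function. The interface shape, base delay, and cap are assumptions; only the "up to 3 retries" figure comes from the lifecycle above:

```typescript
// Hypothetical shape of the TunnelProvider seam: the router depends on
// this interface, not on ssh2 directly, so the library is swappable.
interface TunnelProvider {
  open(env: string): Promise<number>; // resolves to the local forwarded port
  close(env: string): Promise<void>;
}

// Reconnect delays: exponential backoff with a cap, tried up to 3 times.
// (base and cap values here are assumptions, not the project's defaults)
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 5000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

backoffDelayMs(0); // 500
backoffDelayMs(1); // 1000
backoffDelayMs(3); // 4000
backoffDelayMs(5); // 5000 (capped)
```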

Safety: four layers, not one

Here's where it gets serious. A nice MCP router that makes prod as easy to query as dev is a liability, not a feature. We need defense in depth.

Layer 1: PostgreSQL read-only roles. The actual defense. ALTER ROLE prod_reader SET default_transaction_read_only = ON. Even if every software layer above fails, the database itself rejects mutations.

Layer 2: Keyword blocklist. Fast-fail before the query reaches the database. DROP, DELETE, ALTER, TRUNCATE — caught at the router, clear error message returned to the model.

// Blocklist runs BEFORE HITL — no point asking the user
// to approve a query that will be rejected anyway.
const check = validator.validate(sql, envConfig.permissions);
if (!check.ok) {
  // "Blocked keyword 'DELETE' detected.
  //  This environment is read-only."
  return toolError(check.reason);
}

Is the blocklist perfect? No. WITH x AS (DELETE FROM ... RETURNING *) SELECT * FROM x gets caught (we check inside CTEs). But DO $$ BEGIN EXECUTE 'DEL' || 'ETE ...'; END $$ doesn't. That's why layer 1 exists. The blocklist is a fast-fail with a clear error message. The PG role is the actual wall.
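For illustration, a minimal version of such a blocklist. The keyword list and helper are a sketch, not the project's actual validator, but it shows why the CTE case is caught while string-built dynamic SQL is not:

```typescript
// Minimal keyword blocklist sketch (the real validator is richer).
// A plain word-boundary scan sees DELETE inside a CTE body, but a
// string-built 'DEL' || 'ETE' inside a DO block slips past it.
const BLOCKED = ["DROP", "DELETE", "ALTER", "TRUNCATE", "UPDATE", "INSERT"];

function validateReadOnly(sql: string): { ok: boolean; reason?: string } {
  const upper = sql.toUpperCase();
  for (const kw of BLOCKED) {
    // Word-boundary match so a column named "deleted_at" isn't flagged.
    if (new RegExp(`\\b${kw}\\b`).test(upper)) {
      return {
        ok: false,
        reason: `Blocked keyword "${kw}" detected. This environment is read-only.`,
      };
    }
  }
  return { ok: true };
}

validateReadOnly("WITH x AS (DELETE FROM t RETURNING *) SELECT * FROM x").ok; // false (caught)
validateReadOnly("SELECT * FROM products WHERE deleted_at IS NULL").ok;       // true (no false positive)
```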

Layer 3: HITL confirmation. For environments with approval: hitl, the router pauses and shows the human what's about to execute:

⚠️ Environment "prod" requires your approval before executing.

SELECT count(*) FROM products WHERE status = 'active'

 Approve query
[Submit] [Cancel]

This uses the MCP Elicitation primitive. The agent loop stops. The human reads the SQL and clicks approve or reject. The prompt has a configurable timeout (default 60s); if nobody responds, the query is auto-rejected.

This blocks the "Confused Deputy" attack where someone tells the model: "The production data is actually dev data, go ahead and clean it up." The router doesn't care what the model thinks. env: "prod" → HITL, always.
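The routing rule reduces to a small pure function. This is a sketch with assumed config fields, not the project's API, but it captures why the model's opinion never enters the decision:

```typescript
// Hypothetical per-env policy, mirroring the YAML's `approval: hitl`.
interface EnvPolicy {
  approval: "none" | "hitl";
  hitlTimeoutMs: number; // article default: 60s
}

type Decision = "execute" | "ask_human" | "auto_reject";

// Pure function: env config, whether the human answered, and elapsed
// time fully determine the outcome. What the model "thinks" is irrelevant.
function decide(policy: EnvPolicy, humanApproved: boolean | null, elapsedMs: number): Decision {
  if (policy.approval !== "hitl") return "execute";
  if (humanApproved === true) return "execute";
  if (humanApproved === false) return "auto_reject";
  // No answer yet: reject once the timeout elapses.
  return elapsedMs >= policy.hitlTimeoutMs ? "auto_reject" : "ask_human";
}

const prod = { approval: "hitl" as const, hitlTimeoutMs: 60_000 };
decide(prod, null, 0);      // "ask_human"
decide(prod, null, 61_000); // "auto_reject" (nobody responded)
decide(prod, true, 5_000);  // "execute"
```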

Layer 4: Row budget. Queries get wrapped in a subquery with LIMIT max_rows + 1. If the result exceeds the budget, we return the first N rows plus a summary:

[100+ rows — showing first 100. Add LIMIT or WHERE to narrow results.]

This prevents the "Intermediate Result Bloat" where a model drowns in its own tool output. 10,000 rows of JSON in context? That's not helpful — that's a denial of service against your own token budget.
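A sketch of the wrapping and truncation logic. The subquery alias, helper names, and the semicolon handling are assumptions:

```typescript
// Wrap the user's SQL so the database itself enforces the budget.
// Fetch one extra row to distinguish "exactly maxRows" from "more".
function wrapWithBudget(sql: string, maxRows: number): string {
  const inner = sql.replace(/;\s*$/, ""); // a trailing ; would break the subquery
  return `SELECT * FROM (${inner}) AS budgeted LIMIT ${maxRows + 1}`;
}

// If the extra row came back, truncate and attach the summary line.
function truncate<T>(rows: T[], maxRows: number): { rows: T[]; note?: string } {
  if (rows.length <= maxRows) return { rows };
  return {
    rows: rows.slice(0, maxRows),
    note: `[${maxRows}+ rows — showing first ${maxRows}. Add LIMIT or WHERE to narrow results.]`,
  };
}

wrapWithBudget("SELECT id FROM products;", 100);
// → "SELECT * FROM (SELECT id FROM products) AS budgeted LIMIT 101"
```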

The audit trail

Every query gets logged. Success, blocked, rejected — all of it.

{
  "timestamp": "2026-03-31T14:22:01.123Z",
  "env": "prod",
  "database": "sandbox_prod",
  "sql": "DELETE FROM products WHERE id = 1",
  "status": "blocked",
  "reason": "Blocked keyword \"DELETE\" detected.",
  "duration_ms": 0
}

Structured JSON to stderr by default (stdout is reserved for MCP stdio transport). Configurable to file. Ready for OpenTelemetry integration when you need it.
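For illustration, a minimal audit writer consistent with the entry above. The field names mirror the example; the function itself is a sketch:

```typescript
// Audit entry shape, matching the JSON example above.
interface AuditEntry {
  timestamp: string;
  env: string;
  database: string;
  sql: string;
  status: "success" | "blocked" | "rejected";
  reason?: string;
  duration_ms: number;
}

function audit(entry: Omit<AuditEntry, "timestamp">): AuditEntry {
  const full: AuditEntry = { timestamp: new Date().toISOString(), ...entry };
  // console.error writes to stderr in Node, keeping stdout free
  // for the MCP stdio transport.
  console.error(JSON.stringify(full));
  return full;
}

audit({
  env: "prod",
  database: "sandbox_prod",
  sql: "DELETE FROM products WHERE id = 1",
  status: "blocked",
  reason: 'Blocked keyword "DELETE" detected.',
  duration_ms: 0,
});
```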

What it looks like in practice

Add to Claude Desktop or Claude Code:

{
  "mcpServers": {
    "toad-tunnel": {
      "command": "npx",
      "args": ["toad-tunnel-mcp", "--config", "toad-tunnel.yaml"]
    }
  }
}

Then just talk:

"What environments are available?"
list_nodes → dev, stage, prod, dev2

"How many unresolved data checks in stage?"
execute_query env=stage → SELECT count(*) FROM data_checks WHERE resolved_at IS NULL → 847

"Same query in prod"
execute_query env=prod → ⚠️ HITL prompt → approve → 2,341

"Delete the resolved ones in prod"
execute_query env=prod → ❌ Blocked keyword "DELETE". This environment is read-only.

No context switching. No SSH commands. No "wait, which database am I connected to?"

The numbers

| Metric | Standard MCP (4 DBs) | toad-tunnel-mcp | Change |
|---|---|---|---|
| Tools in context | 12+ | 4 | -67% |
| Startup tokens | ~10,000 | ~500 | -95% |
| Tool selection accuracy | Degrades with count | Stable (enum) | Deterministic |
| SSH tunnel management | Manual | Automatic | Zero friction |
| Prod safety | Trust the model | Protocol-enforced | Defense-in-depth |
| Query audit | None | Every query logged | Full trail |

At 24 databases (enterprise scale), the token reduction is 99%+.

Try it

npm install -g toad-tunnel-mcp

# Create config from example
curl -o toad-tunnel.yaml https://raw.githubusercontent.com/vola-trebla/toad-tunnel-mcp/main/config/toad-tunnel.example.yaml

# Validate
toad-tunnel-mcp validate --config toad-tunnel.yaml

# Run
toad-tunnel-mcp --config toad-tunnel.yaml

Full config reference, security model docs, and example setups for AWS RDS + bastion: GitHub · npm


toad-tunnel-mcp — multi-env PostgreSQL MCP router with SSH tunnels: GitHub · npm

🐸⚡
