Suraj Goyal

Posted on May 28

I built an MCP server that gives AI persistent memory of your SQL database

#python #ai #mcp #showdev

A while ago I tried to build a local coding assistant. I downloaded Qwen3, fired it up on my MacBook with 16GB of RAM, and within a day realized the output quality was nowhere close to Claude or GPT-5. The model could fit. It just couldn't compete.

So I changed the question.

If I can't make the model smarter on my hardware, can I make what I feed it smarter?

Where the tokens actually go

I started watching where my Claude / Cursor / Copilot sessions actually spent their tokens. The surprise: most of it wasn't reasoning. It was lookup.

Every fresh chat about my company's database re-discovered the same things:

What does status = 3 mean? (cancelled)
How does orders join to users? (orders.user_id → users.id)
What's that cryptic JobStatus enum? (a dozen integer codes nobody remembers)

The model figured it out, the session ended, and tomorrow it figured it out again. Same tokens, same latency, every single time. The expensive part of working with an AI wasn't the thinking — it was re-teaching it things it had already learned yesterday.

There's a lot of attention right now on trimming AI output tokens (talk like a caveman, strip the pleasantries, etc.). But in my workflow the bigger leak was on the input side: paying full token cost every session to re-establish context that never changed.

"Memory" isn't a feature, it's an architecture question

AI clients are starting to bolt on "memory" features. But they're proprietary, opaque, and locked to one tool. Claude's memory doesn't help Cursor. Cursor's doesn't help Copilot. You can't inspect it, you can't share it with a teammate, and you can't diff it.

What I actually wanted was an explicit, inspectable, shareable context layer that any AI client could read deterministically — same answer every time, same file my team could hand off.

I picked the highest re-learn cost in my world to start with: SQL databases.

Enter amnesic

amnesic is an open-source MCP server that gives any AI client persistent semantic memory of your SQL databases. The name is ironic — it's anything but amnesic. It remembers.

You (or the AI) annotate a table or column once:

db_annotate(
    table="orders",
    column="status",
    column_description="Order lifecycle state",
    enum_values={"1": "pending", "2": "shipped", "3": "cancelled", "4": "delivered"},
)

…and it's stored in a local SQLite file. Every future db_get_schema call merges those annotations back into the response — across sessions, across AI clients, forever:

You:  How many cancelled orders this month?
AI:   [calls db_get_schema("orders")]
      → status column: enum {"3": "cancelled", ...}
      [writes correct SQL immediately, no re-discovery]
      SELECT COUNT(*) FROM orders WHERE status = 3 AND ...

No re-explaining. No wasted turns. The annotation persisted.

The technical decisions I'd defend

A few choices that might interest people building similar tools:

SQLite FTS5 over a vector DB

I started with ChromaDB for search — "find the table that handles payments." Then I ripped it out. SQLite's built-in FTS5 with BM25 ranking covers the "find the right table/column" use case at zero dependency cost. No embeddings, no model download, no external service. For a tool that's supposed to be a lightweight local layer, pulling in a 50MB+ vector stack was the wrong trade.

db_search("payment")
# → ranked: orders.payment_method, consumerpayments table, ...
#   all from a local FTS5 index, no network, no embeddings

Two-layer read-only enforcement

amnesic connects to production databases, so the AI must never be able to mutate anything. Two independent layers:

Static SQL analysis — reject anything that isn't SELECT / WITH; catch write keywords, SELECT ... INTO OUTFILE, and writes smuggled inside CTEs.
Transaction rollback — every query runs inside a transaction that's immediately rolled back. Even if a write slipped past layer 1, nothing commits.

Belt and suspenders. The AI shouldn't be able to drop your table even by accident.

One SQLite file per connection

Schema cache + annotations + FK relationship graph + the FTS5 index all live in one SQLite file per database connection. Portable, inspectable, chmod 600. Want to hand your accumulated knowledge to a teammate? It's a single file.

Data minimization as a side effect

A nice property fell out of the design: a well-annotated schema means the AI answers most questions from the local knowledge file — without ever querying the database. "What does status=3 mean?" resolves from the annotation. "How do orders join users?" resolves from the FK graph. That's measurably less row data leaving your machine than a "naked" SQL MCP that runs SELECT DISTINCT status FROM orders every time it's confused.

What it's not

It doesn't make the model smarter.
It doesn't do natural-language-to-SQL.
It's not a replacement for execution-focused MCP servers — those handle query execution and live introspection well. amnesic's only job is the persistence/annotation layer I couldn't find in any of them.

Try it

pip install amnesic
amnesic init          # interactive setup wizard

Then add it to your AI client's mcp.json and restart. Works with PostgreSQL, MySQL, MSSQL, and SQLite. MIT-licensed, on PyPI, and registered on the official Linux Foundation MCP Registry.

GitHub: github.com/SurajKGoyal/amnesic

The takeaway

Not every AI problem needs a smarter model. Sometimes the win is an external context layer that's deterministic, inspectable, and shared — so the model never has to learn the same thing twice.

I'd love feedback, especially on the read-only enforcement — that's the part that has to be bulletproof. Issues and PRs welcome.

Top comments (7)

xulingfeng • May 28

The "Memory isn't a feature, it's an architecture question" line hit hard. I've been building a local-first memory system (MemBridge, SQLite-based) and came to the same conclusion — bolting memory onto the client doesn't scale when you have multiple agents that need to share context.

One question: how do you handle schema drift? If the SQL schema changes, does the MCP server re-discover the structure or do you maintain a mapping layer?

Suraj Goyal • May 28

Thanks! And yeah — the multi-agent sharing problem is what pushed me there too. Client-bolted memory means every tool re-implements it and none of them can read each other's. A file on disk that any MCP client reads is the only thing that scales across Claude + Cursor + whatever's next.

On schema drift — honest answer: partially handled today, fully addressed in the next release.

Today (v0.1.x):

Schema is cached per connection. db_get_schema serves from cache; you pass force_refresh=True to re-fetch from the live DB when you know it changed. So drift detection is currently manual — the caller decides when to refresh.
Annotations are keyed by fully-qualified name + column. If a column gets renamed or dropped, nothing breaks — but the annotation orphans (stays in the store, no longer matches live schema). No auto-migration.
v0.2 (already scoped):

db_detect_drift(connection) — diffs stored annotations against the live schema and surfaces orphaned annotations (annotated column no longer exists) + undocumented tables (in schema, never annotated).
db_deprecate(...) — soft-delete so the AI sees a deprecation note instead of silently stale data.
I deliberately didn't build auto-migration of annotations (e.g. "column renamed A→B, move the annotation") — the rename-vs-drop-and-add ambiguity isn't safely inferable without human intent. Drift detection + explicit deprecate felt like the honest boundary.

Genuinely curious how MemBridge handles it — do you snapshot the schema and diff, or version the mapping layer? The orphan-vs-rename problem is the part I keep going back and forth on.

xulingfeng • May 28

That's exactly the pain point — every tool reinventing its own memory layer with zero interoperability. We went the same route with MemBridge: a shared SQLite file that both our agents read from directly. It's surprisingly elegant once you stop trying to bolt memory onto each agent separately.

Curious what the next release schema drift handling looks like — optimistic merge or something more explicit?

Suraj Goyal • May 28

explicit, not optimistic merge — drift gets detected and surfaced, then a human (or the AI with visible intent) decides whether to re-annotate or deprecate. I didn't want silent auto-migration quietly corrupting the one thing the tool's supposed to keep correct.

The full v0.2 plan lives in the issues tab — db_detect_drift and db_deprecate are scoped there: github.com/SurajKGoyal/amnesic/issues

I've got a direction I'm fairly settled on, but if you landed on something cleaner I'd genuinely want to hear it — drop a comment on the issue. The last round of questions here already turned into a real fix, so I'm taking these seriously. 🙂

xulingfeng • May 28

That "client-bolted memory" line hits home — we ran into exactly that pain running two Hermes agents that couldn't read each other's context. The "file on disk that any MCP tool can read" is basically what we converged on too: a shared SQLite store with unified entity extraction and trust scoring. The hidden win was that having a single source of truth also made debugging way easier — no more "which agent remembered what" guesswork. Did you run into any write-contention issues with multiple agents accessing the same store?

Suraj Goyal • May 28

Ha — "which agent remembered what" guesswork is exactly the mess a single source of truth eliminates. Good to know you landed on the same shape.

On write contention: the store was already built for concurrency — SQLite in WAL mode so reads never block, and a per-process lock serializing writes within a process. Where your question caught me was the cross-process case: two MCP clients (say Claude + Cursor) pointed at the same knowledge file, both writing the same instant. I'd been leaning on writes being rare enough not to collide — but "rare" isn't "handled," and a multi-agent setup like yours is exactly where it'd surface.

So I added a busy_timeout — a contending writer now queues up to 5s for the lock instead of failing with SQLITE_BUSY. Shipped it as v0.1.15 a few minutes ago. Honestly your question is what made me close that gap; I'd been hand-waving it as "writes are infrequent." Appreciate the nudge.

The workload still does most of the work: annotations are tiny and occasional — you annotate a column once, not in a hot loop — so the timeout rarely even has to engage. A constant-write store like yours leans on it far harder.

Trust scoring is the piece I don't have and keep circling back to — how are you weighting it? If it's worth comparing notes, the design discussions live in the issues tab: github.com/SurajKGoyal/amnesic/issues

DentistEmailList • Jun 1

Good⚡️