I built an MCP server that gives AI persistent memory of your SQL database

Suraj Goyal — Thu, 28 May 2026 03:58:56 +0000

A while ago I tried to build a local coding assistant. I downloaded Qwen3, fired it up on my MacBook with 16GB of RAM, and within a day realized the output quality was nowhere close to Claude or GPT-5. The model could fit. It just couldn't compete.

So I changed the question.

If I can't make the model smarter on my hardware, can I make what I feed it smarter?

Where the tokens actually go

I started watching where my Claude / Cursor / Copilot sessions actually spent their tokens. The surprise: most of it wasn't reasoning. It was lookup.

Every fresh chat about my company's database re-discovered the same things:

What does status = 3 mean? (cancelled)
How does orders join to users? (orders.user_id → users.id)
What's that cryptic JobStatus enum? (a dozen integer codes nobody remembers)

The model figured it out, the session ended, and tomorrow it figured it out again. Same tokens, same latency, every single time. The expensive part of working with an AI wasn't the thinking — it was re-teaching it things it had already learned yesterday.

There's a lot of attention right now on trimming AI output tokens (talk like a caveman, strip the pleasantries, etc.). But in my workflow the bigger leak was on the input side: paying full token cost every session to re-establish context that never changed.

"Memory" isn't a feature, it's an architecture question

AI clients are starting to bolt on "memory" features. But they're proprietary, opaque, and locked to one tool. Claude's memory doesn't help Cursor. Cursor's doesn't help Copilot. You can't inspect it, you can't share it with a teammate, and you can't diff it.

What I actually wanted was an explicit, inspectable, shareable context layer that any AI client could read deterministically — same answer every time, same file my team could hand off.

I picked the highest re-learn cost in my world to start with: SQL databases.

Enter amnesic

amnesic is an open-source MCP server that gives any AI client persistent semantic memory of your SQL databases. The name is ironic — it's anything but amnesic. It remembers.

You (or the AI) annotate a table or column once:

db_annotate(
    table="orders",
    column="status",
    column_description="Order lifecycle state",
    enum_values={"1": "pending", "2": "shipped", "3": "cancelled", "4": "delivered"},
)

…and it's stored in a local SQLite file. Every future db_get_schema call merges those annotations back into the response — across sessions, across AI clients, forever:

You:  How many cancelled orders this month?
AI:   [calls db_get_schema("orders")]
      → status column: enum {"3": "cancelled", ...}
      [writes correct SQL immediately, no re-discovery]
      SELECT COUNT(*) FROM orders WHERE status = 3 AND ...

No re-explaining. No wasted turns. The annotation persisted.

The technical decisions I'd defend

A few choices that might interest people building similar tools:

SQLite FTS5 over a vector DB

I started with ChromaDB for search — "find the table that handles payments." Then I ripped it out. SQLite's built-in FTS5 with BM25 ranking covers the "find the right table/column" use case at zero dependency cost. No embeddings, no model download, no external service. For a tool that's supposed to be a lightweight local layer, pulling in a 50MB+ vector stack was the wrong trade.

db_search("payment")
# → ranked: orders.payment_method, consumerpayments table, ...
#   all from a local FTS5 index, no network, no embeddings

Two-layer read-only enforcement

amnesic connects to production databases, so the AI must never be able to mutate anything. Two independent layers:

Static SQL analysis — reject anything that isn't SELECT / WITH; catch write keywords, SELECT ... INTO OUTFILE, and writes smuggled inside CTEs.
Transaction rollback — every query runs inside a transaction that's immediately rolled back. Even if a write slipped past layer 1, nothing commits.

Belt and suspenders. The AI shouldn't be able to drop your table even by accident.

One SQLite file per connection

Schema cache + annotations + FK relationship graph + the FTS5 index all live in one SQLite file per database connection. Portable, inspectable, chmod 600. Want to hand your accumulated knowledge to a teammate? It's a single file.

Data minimization as a side effect

A nice property fell out of the design: a well-annotated schema means the AI answers most questions from the local knowledge file — without ever querying the database. "What does status=3 mean?" resolves from the annotation. "How do orders join users?" resolves from the FK graph. That's measurably less row data leaving your machine than a "naked" SQL MCP that runs SELECT DISTINCT status FROM orders every time it's confused.

What it's not

It doesn't make the model smarter.
It doesn't do natural-language-to-SQL.
It's not a replacement for execution-focused MCP servers — those handle query execution and live introspection well. amnesic's only job is the persistence/annotation layer I couldn't find in any of them.

Try it

pip install amnesic
amnesic init          # interactive setup wizard

Then add it to your AI client's mcp.json and restart. Works with PostgreSQL, MySQL, MSSQL, and SQLite. MIT-licensed, on PyPI, and registered on the official Linux Foundation MCP Registry.

GitHub: github.com/SurajKGoyal/amnesic

The takeaway

Not every AI problem needs a smarter model. Sometimes the win is an external context layer that's deterministic, inspectable, and shared — so the model never has to learn the same thing twice.

I'd love feedback, especially on the read-only enforcement — that's the part that has to be bulletproof. Issues and PRs welcome.

DEV Community: Suraj Goyal