A while ago I tried to build a local coding assistant. I downloaded Qwen3, fired it up on my MacBook with 16GB of RAM, and within a day realized the output quality was nowhere close to Claude or GPT-5. The model could fit. It just couldn't compete.
So I changed the question.
If I can't make the model smarter on my hardware, can I make what I feed it smarter?
Where the tokens actually go
I started watching where my Claude / Cursor / Copilot sessions actually spent their tokens. The surprise: most of it wasn't reasoning. It was lookup.
Every fresh chat about my company's database re-discovered the same things:
- What does
status = 3mean? (cancelled) - How does
ordersjoin tousers? (orders.user_id → users.id) - What's that cryptic
JobStatusenum? (a dozen integer codes nobody remembers)
The model figured it out, the session ended, and tomorrow it figured it out again. Same tokens, same latency, every single time. The expensive part of working with an AI wasn't the thinking — it was re-teaching it things it had already learned yesterday.
There's a lot of attention right now on trimming AI output tokens (talk like a caveman, strip the pleasantries, etc.). But in my workflow the bigger leak was on the input side: paying full token cost every session to re-establish context that never changed.
"Memory" isn't a feature, it's an architecture question
AI clients are starting to bolt on "memory" features. But they're proprietary, opaque, and locked to one tool. Claude's memory doesn't help Cursor. Cursor's doesn't help Copilot. You can't inspect it, you can't share it with a teammate, and you can't diff it.
What I actually wanted was an explicit, inspectable, shareable context layer that any AI client could read deterministically — same answer every time, same file my team could hand off.
I picked the highest re-learn cost in my world to start with: SQL databases.
Enter amnesic
amnesic is an open-source MCP server that gives any AI client persistent semantic memory of your SQL databases. The name is ironic — it's anything but amnesic. It remembers.
You (or the AI) annotate a table or column once:
db_annotate(
table="orders",
column="status",
column_description="Order lifecycle state",
enum_values={"1": "pending", "2": "shipped", "3": "cancelled", "4": "delivered"},
)
…and it's stored in a local SQLite file. Every future db_get_schema call merges those annotations back into the response — across sessions, across AI clients, forever:
You: How many cancelled orders this month?
AI: [calls db_get_schema("orders")]
→ status column: enum {"3": "cancelled", ...}
[writes correct SQL immediately, no re-discovery]
SELECT COUNT(*) FROM orders WHERE status = 3 AND ...
No re-explaining. No wasted turns. The annotation persisted.
The technical decisions I'd defend
A few choices that might interest people building similar tools:
SQLite FTS5 over a vector DB
I started with ChromaDB for search — "find the table that handles payments." Then I ripped it out. SQLite's built-in FTS5 with BM25 ranking covers the "find the right table/column" use case at zero dependency cost. No embeddings, no model download, no external service. For a tool that's supposed to be a lightweight local layer, pulling in a 50MB+ vector stack was the wrong trade.
db_search("payment")
# → ranked: orders.payment_method, consumerpayments table, ...
# all from a local FTS5 index, no network, no embeddings
Two-layer read-only enforcement
amnesic connects to production databases, so the AI must never be able to mutate anything. Two independent layers:
-
Static SQL analysis — reject anything that isn't
SELECT/WITH; catch write keywords,SELECT ... INTO OUTFILE, and writes smuggled inside CTEs. - Transaction rollback — every query runs inside a transaction that's immediately rolled back. Even if a write slipped past layer 1, nothing commits.
Belt and suspenders. The AI shouldn't be able to drop your table even by accident.
One SQLite file per connection
Schema cache + annotations + FK relationship graph + the FTS5 index all live in one SQLite file per database connection. Portable, inspectable, chmod 600. Want to hand your accumulated knowledge to a teammate? It's a single file.
Data minimization as a side effect
A nice property fell out of the design: a well-annotated schema means the AI answers most questions from the local knowledge file — without ever querying the database. "What does status=3 mean?" resolves from the annotation. "How do orders join users?" resolves from the FK graph. That's measurably less row data leaving your machine than a "naked" SQL MCP that runs SELECT DISTINCT status FROM orders every time it's confused.
What it's not
- It doesn't make the model smarter.
- It doesn't do natural-language-to-SQL.
- It's not a replacement for execution-focused MCP servers — those handle query execution and live introspection well. amnesic's only job is the persistence/annotation layer I couldn't find in any of them.
Try it
pip install amnesic
amnesic init # interactive setup wizard
Then add it to your AI client's mcp.json and restart. Works with PostgreSQL, MySQL, MSSQL, and SQLite. MIT-licensed, on PyPI, and registered on the official Linux Foundation MCP Registry.
GitHub: github.com/SurajKGoyal/amnesic
The takeaway
Not every AI problem needs a smarter model. Sometimes the win is an external context layer that's deterministic, inspectable, and shared — so the model never has to learn the same thing twice.
I'd love feedback, especially on the read-only enforcement — that's the part that has to be bulletproof. Issues and PRs welcome.
Top comments (2)
The "Memory isn't a feature, it's an architecture question" line hit hard. I've been building a local-first memory system (MemBridge, SQLite-based) and came to the same conclusion — bolting memory onto the client doesn't scale when you have multiple agents that need to share context.
One question: how do you handle schema drift? If the SQL schema changes, does the MCP server re-discover the structure or do you maintain a mapping layer?
Thanks! And yeah — the multi-agent sharing problem is what pushed me there too. Client-bolted memory means every tool re-implements it and none of them can read each other's. A file on disk that any MCP client reads is the only thing that scales across Claude + Cursor + whatever's next.
On schema drift — honest answer: partially handled today, fully addressed in the next release.
Today (v0.1.x):
Schema is cached per connection. db_get_schema serves from cache; you pass force_refresh=True to re-fetch from the live DB when you know it changed. So drift detection is currently manual — the caller decides when to refresh.
Annotations are keyed by fully-qualified name + column. If a column gets renamed or dropped, nothing breaks — but the annotation orphans (stays in the store, no longer matches live schema). No auto-migration.
v0.2 (already scoped):
db_detect_drift(connection) — diffs stored annotations against the live schema and surfaces orphaned annotations (annotated column no longer exists) + undocumented tables (in schema, never annotated).
db_deprecate(...) — soft-delete so the AI sees a deprecation note instead of silently stale data.
I deliberately didn't build auto-migration of annotations (e.g. "column renamed A→B, move the annotation") — the rename-vs-drop-and-add ambiguity isn't safely inferable without human intent. Drift detection + explicit deprecate felt like the honest boundary.
Genuinely curious how MemBridge handles it — do you snapshot the schema and diff, or version the mapping layer? The orphan-vs-rename problem is the part I keep going back and forth on.