DEV Community: Sean Markwei

You probably don't need a vector database for agent memory

Sean Markwei — Sat, 25 Jul 2026 00:47:39 +0000

Every guide to giving an AI agent memory starts the same way. Pick an embedding model, stand up a vector database, chunk your data, tune your retrieval. I assumed I would do exactly that. Then, before I wrote a single line of it, I looked hard at the problem I was actually trying to solve, and I could not find the part that needed any of it.

This is the reasoning that led me to build a memory API for agents with no vector database, no embeddings, and no model in the loop. I am also going to tell you exactly when that decision is wrong, because it often is, and you should know which side of the line you are on before you copy anyone's architecture, including mine.

The question nobody was asking

Here is what agents actually kept failing at. Not "find me something similar in meaning to this." They failed at "remember the specific thing I was told last session."

Take a coding agent, the case most people reading this have felt. The agent that helped you on Monday has no idea on Tuesday that you already decided to use Postgres, that the deploy step runs through a specific script, that the client's name is spelled a particular way. Every session starts from zero. You re-explain the same context every morning.

That is not a search problem. There is nothing fuzzy about it. You know the exact thing you want back and you know what to call it. It is a key, a value, and the ability to read it later, unchanged.

Vector search answers a different question: "what are the things most similar in meaning to this query." That is a genuinely hard and genuinely useful capability. It is also not what "remember that we chose Postgres for this project" needs. That needs store under a name, get it back by that name, later, reliably.

Two kinds of memory that get blurred together

Once I separated them, the whole design fell out.

There is semantic memory: retrieval over a large body of text where you do not know the exact item you want, only roughly what it is about. "What did the design doc say about rate limits?" You want the relevant passage even if you cannot name it. This is what embeddings and vector databases are for, and they are very good at it.

Then there is named memory: facts and state you can point at. "The user prefers metric units." "This project deploys on Fridays." "The last invoice number was 1043." You are not searching by meaning. You stored something specific and you want it back by name.

A lot of what people call "agent memory" is the second kind wearing the first kind's clothes. The reflex is to reach for semantic search because that is what the tutorials show, when the actual need is a reliable place to put named facts and read them back.

When a vector database is the right call

This is the important part, and it is the reason I can write the rest of this honestly.

If your problem is retrieval over an unstructured corpus, you want embeddings. If an agent needs to answer questions about a hundred documents it has never been told how to index, if recall has to be fuzzy, if "find me the relevant thing" is the whole job, then a vector database is the correct tool and a key-value store is the wrong one. Do not let a simplicity pitch talk you out of the right architecture. If that is your shape, close this tab and go set up your embeddings. You will be happier.

The mistake is not using vector search. The mistake is defaulting to it for a problem that was never semantic to begin with.

What you get back by not adding one

Deciding my problem was named memory, not semantic memory, took an entire category of machinery off the table. No embedding step on every write. No index to tune. No re-embedding when you change models. No debugging why a retrieval that should have been obvious ranked third. No second piece of infrastructure to run and pay for.

What is left is boring in the best way. You name a key, you store a value, you read it back exactly as you left it. It has a TTL if you want facts to expire on their own. You can list what an agent knows and do a literal text search over it. That is close to the entire surface area.
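To make that surface area concrete, here is a minimal in-memory sketch in Python. It is an illustration under assumptions, not AgentRAM's implementation: the class name `NamedMemory`, its method names, and the injectable `clock` parameter are all hypothetical, and "literal text search" is taken to mean a case-insensitive substring match.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class NamedMemory:
    """Hypothetical sketch: exact keys, optional TTL, literal search."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # key -> (value, expiry time, or None for "never expires")
        self._items: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, key: str, value: str, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazily drop expired facts on read
            return None
        return value

    def keys(self) -> List[str]:
        # Route through get() so expired entries are filtered out.
        return sorted(k for k in list(self._items) if self.get(k) is not None)

    def search(self, text: str) -> List[str]:
        # Literal substring match on keys and values; no ranking, no embeddings.
        needle = text.lower()
        return [
            k for k in self.keys()
            if needle in k.lower() or needle in (self.get(k) or "").lower()
        ]


# Usage with a fake clock, so the TTL behaviour is visible without waiting.
now = [0.0]
memory = NamedMemory(clock=lambda: now[0])
memory.put("db_choice", "Postgres")
memory.put("deploy_freeze", "until Friday", ttl_seconds=60)
print(memory.get("db_choice"))      # the value comes back unchanged
now[0] = 61.0
print(memory.get("deploy_freeze"))  # None once the TTL has passed
```

Every read is either an exact hit or a miss, which is what makes a store like this easy to inspect.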

And there is a point about trust hiding in that simplicity. This is memory. It is the thing your agent believes about the world. I would rather that be a store I can inspect completely and reason about with certainty than a similarity ranking I have to interrogate. For this specific job, the boring inspectable version is not a compromise. It is a feature.

The turn: where this beats your own Postgres table

The honest objection at this point is: fine, if it is just keys and values, why not a table in the Postgres I already run? For a single agent, you should. That is not a business, that is a CREATE TABLE.
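For the single-agent case, the whole thing really can be one table. The sketch below uses Python's built-in `sqlite3` as a stand-in for the Postgres you already run, so it is runnable anywhere. The table name, column names, and the `remember` / `recall` helpers are assumptions made up for illustration. The `ON CONFLICT ... DO UPDATE` upsert is written in a form that Postgres also accepts.

```python
import sqlite3

# SQLite in memory stands in for an existing Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE agent_memory (
        agent_id   TEXT NOT NULL,
        key        TEXT NOT NULL,
        value      TEXT NOT NULL,
        updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (agent_id, key)
    )
    """
)


def remember(agent_id: str, key: str, value: str) -> None:
    # Upsert: writing an existing key overwrites the old value.
    with conn:
        conn.execute(
            """
            INSERT INTO agent_memory (agent_id, key, value) VALUES (?, ?, ?)
            ON CONFLICT (agent_id, key)
            DO UPDATE SET value = excluded.value, updated_at = CURRENT_TIMESTAMP
            """,
            (agent_id, key, value),
        )


def recall(agent_id: str, key: str):
    # Exact lookup by name; returns None when nothing was stored.
    row = conn.execute(
        "SELECT value FROM agent_memory WHERE agent_id = ? AND key = ?",
        (agent_id, key),
    ).fetchone()
    return row[0] if row else None


remember("coder", "db_choice", "Postgres")
print(recall("coder", "db_choice"))
```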

It changes the moment you have more than one agent that needs to read another agent's writes. A planner hands off to an executor. A research agent leaves findings for a writer agent. Now you are not storing memory, you are sharing state across processes, and you are suddenly building the boring hard parts yourself: namespacing so agents do not clobber each other, scoping so the right agents see the right memory, a permission model, key management. That is the part worth not writing again. Shared memory across agents, handed to you, is the actual reason to reach for a service here rather than a column in a table you already have.
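To show why those boring hard parts add up, here is a deliberately small Python sketch of namespacing plus a read/write permission check. It is a toy under assumptions: `SharedMemory`, `grant`, and the agent and namespace names are hypothetical, and a real service would also need key management, persistence, and concurrency control, none of which appear here.

```python
from typing import Dict, Optional, Set


class SharedMemory:
    """Hypothetical sketch: namespaces plus per-agent read/write grants."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}  # namespace -> {key: value}
        self._readers: Dict[str, Set[str]] = {}     # namespace -> agents allowed to read
        self._writers: Dict[str, Set[str]] = {}     # namespace -> agents allowed to write

    def grant(self, namespace: str, agent_id: str, *, write: bool = False) -> None:
        # Every grant allows reading; writing must be requested explicitly.
        self._data.setdefault(namespace, {})
        self._readers.setdefault(namespace, set()).add(agent_id)
        if write:
            self._writers.setdefault(namespace, set()).add(agent_id)

    def write(self, agent_id: str, namespace: str, key: str, value: str) -> None:
        if agent_id not in self._writers.get(namespace, set()):
            raise PermissionError(f"{agent_id} may not write to {namespace}")
        self._data[namespace][key] = value

    def read(self, agent_id: str, namespace: str, key: str) -> Optional[str]:
        if agent_id not in self._readers.get(namespace, set()):
            raise PermissionError(f"{agent_id} may not read {namespace}")
        return self._data[namespace].get(key)


# A research agent leaves findings for a writer agent.
shared = SharedMemory()
shared.grant("findings", "researcher", write=True)
shared.grant("findings", "writer")  # read-only
shared.write("researcher", "findings", "rate_limits", "see design doc, section 4")
print(shared.read("writer", "findings", "rate_limits"))
```

Even this toy has three pieces of state to keep consistent, which is the argument for not rebuilding it in every project.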

Durability and TTLs are commodity. Shared state with the access model already solved is not.

What it deliberately is not, and what is next

It does not do semantic search, and it will not pretend to. It does not read your documents and decide what matters. You direct what gets remembered. If you want meaning-based retrieval over a corpus, this is the wrong layer and I will tell you so.

The one thing I know it is missing, and I only know because someone pushed me on it publicly, is a lifecycle state on memories: a way to mark a fact as retired rather than deleted, with a link to what replaced it, so an agent can tell which memories are still load-bearing and which are just history. That is a temporal problem, not a semantic one, which is exactly why it belongs in a store like this without dragging in embeddings. It is the next real thing I am building.
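One possible shape for that lifecycle state, sketched in Python. This is speculation about an unbuilt feature, not a description of it: the `status` and `superseded_by` fields, the `LifecycleStore` class, and its methods are all invented here to illustrate "retired rather than deleted, with a link to what replaced it."

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Memory:
    key: str
    value: str
    status: str = "active"               # "active" or "retired"
    superseded_by: Optional[str] = None  # key of the memory that replaced this one


class LifecycleStore:
    """Hypothetical sketch of retire-with-replacement semantics."""

    def __init__(self) -> None:
        self._items: Dict[str, Memory] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = Memory(key, value)

    def retire(self, old_key: str, new_key: str) -> None:
        # Keep the old fact as history, but point at its replacement.
        if new_key not in self._items:
            raise KeyError(new_key)
        old = self._items[old_key]
        old.status = "retired"
        old.superseded_by = new_key

    def current(self, key: str) -> Memory:
        # Follow replacement links until reaching a memory that is still active.
        seen = set()
        memory = self._items[key]
        while memory.status == "retired" and memory.superseded_by is not None:
            if memory.key in seen:
                raise ValueError("replacement links form a cycle")
            seen.add(memory.key)
            memory = self._items[memory.superseded_by]
        return memory

    def active_keys(self) -> List[str]:
        return sorted(k for k, m in self._items.items() if m.status == "active")


store = LifecycleStore()
store.put("db_choice_v1", "MySQL")
store.put("db_choice_v2", "Postgres")
store.retire("db_choice_v1", "db_choice_v2")
print(store.current("db_choice_v1").value)  # resolves to the replacement
print(store.active_keys())                  # only load-bearing memories
```

Nothing here needs similarity: telling history from load-bearing facts is a matter of a status flag and a pointer.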

If your agent memory need is genuinely semantic, use a vector database. If it is named facts and shared state, you may have been reaching for far more machinery than the problem asked for. That was the whole realization, and I built the tool I wished I had found instead of the one every tutorial pointed me at.

I would rather be told where this reasoning breaks than agreed with, so if you see the hole, say so.

How I built AgentRAM: a memory API for AI agents without a vector DB

Sean Markwei — Thu, 28 May 2026 18:18:36 +0000

I'm a solo developer in Accra, Ghana, and I just shipped my first real product. It's called AgentRAM (agentram.dev), and it's a memory API for AI agents. This is the build story and the stack.

The problem I kept seeing

Over the last year, AI agents have gone from research toys to actual things people ship. But every agent that needs to remember anything across sessions runs into the same wall: where does the memory go?

The existing answers all felt heavy for what they were doing:

Mem0, Zep, and Letta want you to set up embedding pipelines and vector databases. Powerful for RAG-style semantic search, but overkill if you just need "remember that user X likes dark mode."
OpenAI's Assistants API memory is locked to their platform and billed per-token, which means costs are unpredictable as conversation length grows.
Rolling your own with Postgres or Redis works, but it's a real chunk of infrastructure to maintain for each agent project, including auth, multi-tenancy, TTLs, and an HTTP layer.

I wanted something that handled the 70% case ("remember this fact about this agent") without the 100% solution's setup cost. So I built it.

What AgentRAM actually does

One HTTP call to store. One to retrieve. Scoped by agent ID, with optional TTLs and shared namespaces.

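# Store a memory (the body is JSON, so say so explicitly)
curl -X POST https://api.agentram.dev/memory \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"my-agent","key":"user_pref","value":"dark mode"}'

# Retrieve it (same API key header as the write)
curl "https://api.agentram.dev/memory?agent_id=my-agent&key=user_pref" \
  -H "x-api-key: YOUR_KEY"
# {"value":"dark mode"}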

That's the whole interaction model. No embeddings, no vector similarity, no semantic chunking, no token accounting. Just durable key-value memory scoped per agent.
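The same two calls can be sketched in Python with only the standard library. This is not an official SDK. It mirrors the curl example and assumes the API takes a JSON body with a `Content-Type: application/json` header, expects the `x-api-key` header on reads as well as writes, and returns JSON. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.agentram.dev"


def build_store_request(api_key: str, agent_id: str, key: str, value: str):
    # POST /memory with a JSON body, mirroring the curl example.
    body = json.dumps({"agent_id": agent_id, "key": key, "value": value}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/memory",
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def build_retrieve_request(api_key: str, agent_id: str, key: str):
    # GET /memory?agent_id=...&key=... with the values URL-encoded.
    query = urllib.parse.urlencode({"agent_id": agent_id, "key": key})
    return urllib.request.Request(
        f"{BASE_URL}/memory?{query}",
        method="GET",
        headers={"x-api-key": api_key},
    )


def send(request):
    # Performs the network call; needs a real key, so it is not run here.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


store_request = build_store_request("YOUR_KEY", "my-agent", "user_pref", "dark mode")
print(store_request.get_method(), store_request.full_url)
```

Building the request separately from sending it keeps the shape of the call inspectable without touching the network.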

Other endpoints fill in the practical needs: list all memories for an agent, full-text search across them, shared namespaces so multiple agents can read from a common pool, and atomic credit-based usage tracking so cost is predictable.

Why not just Redis or Postgres?

This is the question I keep wrestling with, so let me be honest about it.

If you're already running infrastructure for your product, you should absolutely just add a memory table to your existing database. AgentRAM isn't for you.

But for everyone else, and there are a lot of "everyone else" right now in the vibe-coding and agent-prototyping era, AgentRAM removes some real friction:

No new infra to provision
No auth layer to write for your agents
No HTTP wrapper to build around your DB
No multi-tenant logic to get right
Built-in TTLs, search, and shared namespaces
Predictable per-operation pricing (no per-token surprises)

It's a Twilio-style argument: yes, you could roll your own SMS gateway, but for most people, paying per-message is cheaper than the time cost.

The stack

Nothing exotic, all standard boring tech:

API: Node.js with Express, deployed on Railway. Auto-deploys from GitHub.
Database: Supabase (Postgres), with atomic credit-update logic so concurrent payments and reads don't race.
Payments: Paystack. I'm in Ghana, where Stripe doesn't operate yet, although Stripe acquired Paystack in 2020. Paystack handles cards globally plus mobile money and Apple Pay, so coverage is actually broader than Stripe alone for some users.
Email: Resend, with Cloudflare Email Routing for inbound on hello@agentram.dev.
DNS: Cloudflare, with WAF and rate limiting.
Frontend: Static HTML on Netlify, with Cabinet Grotesk self-hosted. No framework, no build step, just hand-written HTML and CSS.

The whole thing is six HTML pages, one Node server, one Supabase project. It's not a lot. That's intentional.
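The "atomic credit-update" idea from the database entry is worth a sketch, because it is the one place in this stack where a race can cost money. The version below is an assumption about the general technique, not the production code: it uses Python's `sqlite3` in place of Supabase Postgres, and the `accounts` table and function names are hypothetical. The point is that the balance check and the decrement happen in a single `UPDATE`, and top-ups are relative rather than read-then-write.

```python
import sqlite3

# SQLite in memory stands in for the Supabase Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, credits INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('acct_1', 2)")
conn.commit()


def spend_credit(account_id: str, cost: int = 1) -> bool:
    # Check and decrement in one statement, so two concurrent requests cannot
    # both pass a separate "are there credits left?" read and overspend.
    with conn:
        cursor = conn.execute(
            "UPDATE accounts SET credits = credits - ? WHERE id = ? AND credits >= ?",
            (cost, account_id, cost),
        )
    return cursor.rowcount == 1  # False: unknown account or not enough credits


def top_up(account_id: str, amount: int) -> None:
    # Relative update: adds to whatever the balance is at write time.
    with conn:
        conn.execute(
            "UPDATE accounts SET credits = credits + ? WHERE id = ?",
            (amount, account_id),
        )


print(spend_credit("acct_1"), spend_credit("acct_1"), spend_credit("acct_1"))
```

The third call fails because the balance is already zero, and the balance never goes negative.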

The pricing model: credits, not subscriptions

After agonising over this, I went with credit-based pricing instead of a monthly subscription.

1,000 free credits on signup, no card required
1 credit per operation (read, write, delete, search all count as one)
Top-ups: $5 for 50,000 ops, $15 for 200,000 ops, $40 for 600,000 ops
Founding member tier: $249 one-time for 500,000 ops plus 20% off all future top-ups, for as long as the account is active

Why credits over subscriptions:

Aligns with how AI agent usage actually varies (bursty, unpredictable)
No "wasted" subscription months for users who weren't building that month
No churn anxiety on my side
The unit price is easy to reason about: 1¢ per 100 operations at the $5 Starter top-up
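The unit price can be checked with a few lines of arithmetic. The sketch below uses only the three top-up prices listed above. The dictionary and function names are just for illustration.

```python
from fractions import Fraction

# Top-up tiers as listed above: label -> (price in USD, operations).
TOP_UPS = {"$5": (5, 50_000), "$15": (15, 200_000), "$40": (40, 600_000)}


def cents_per_100_ops(price_usd: int, ops: int) -> Fraction:
    # USD -> cents (x100), then scale to a batch of 100 operations (x100).
    return Fraction(price_usd * 100 * 100, ops)


for label, (price, ops) in TOP_UPS.items():
    print(f"{label}: {float(cents_per_100_ops(price, ops)):.2f} cents per 100 ops")
# $5: 1.00 cents per 100 ops
# $15: 0.75 cents per 100 ops
# $40: 0.67 cents per 100 ops
```

So the $5 top-up works out to exactly 1¢ per 100 operations, and the larger top-ups are cheaper per operation.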

The downside is it's slightly weirder to project revenue against. But I'd rather have a model my users actually feel good about.

The "did it actually work" moment

I shipped this today. The full deployment took:

Pushing the API to Railway
Pointing api.agentram.dev at Railway via Cloudflare CNAME
Deploying the static site to Netlify
Pointing agentram.dev at Netlify via Cloudflare CNAME
Verifying Let's Encrypt SSL provisioned on both domains

Then I made my first real $5 test charge to my own account. Watched the credit count tick from 1,000 to 51,000. Confirmation email landed. The whole pipeline worked end to end.

That's the moment that justifies all the work that came before.

What I'm building next

The build is done. Now the harder thing: distribution. Things I'm planning over the next few weeks:

MCP server wrapper. Anthropic's Model Context Protocol is becoming the standard for how AI tools discover and use external services. An MCP server for AgentRAM means Claude Desktop and Cline users can add persistent memory with one config line.
LangChain memory backend. Implement BaseMemory so AgentRAM works as a drop-in memory layer for any LangChain agent.
LlamaIndex memory module and AutoGen / CrewAI integrations for the same reason.
Official SDKs for Python and TypeScript so the curl examples become idiomatic library calls.

What I'd love feedback on

Genuinely, not as a marketing-friendly closer. If you've built agents that need memory, I want to know:

Does the API surface feel complete enough, or is something missing?
What would make you pick this over rolling your own with Postgres?
Which integration would actually move the needle for you?
What would make you suspicious of this as a solo dev's project?

agentram.dev if you want to poke at it. 1000 free credits if you want to try the API. Comments welcome here, or email me directly at hello@agentram.dev.