Your AI agent isn’t slow. Your database is.

We gave LLMs superpowers but left our Postgres schemas in survival mode.

Intro: The quiet bottleneck nobody wants to debug

Everyone thinks their AI agent is slow because “the model is overloaded.”

It’s never the model.

It’s your database wheezing in the corner.

Over the last year, we’ve all become part-time AI architects. One minute you’re shipping a normal SaaS app, the next you’re wiring up embeddings from OpenAI, gluing things together with LangChain, maybe using Supabase because it feels nice and modern. Suddenly you’ve got a RAG pipeline and you’re calling it an “agent.”

And it works. Technically.

But then the latency creeps in. Responses take longer. Costs climb. Logs get noisy. You start blaming OpenAI. Or rate limits. Or “LLMs being unpredictable.”

Meanwhile, your Postgres schema still looks like a 2016 CRUD tutorial.

We gave LLMs superpowers but left our database in survival mode.

It’s like dropping a Ferrari engine into a car with bicycle tires. Or asking a librarian to find a book in a library where everything is stacked randomly on the floor. The brain is brilliant. The storage system is chaos.

I learned this the hard way.

The first time I deployed a RAG feature, I was convinced the embedding API was slow. I tweaked prompt sizes. Reduced tokens. Even considered switching models. Then I finally ran EXPLAIN ANALYZE.

Sequential scan.

On a table that had grown embarrassingly large.

It wasn’t OpenAI. It wasn’t pgvector. It was me. And my missing index.
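If you take one habit from this story, make it this one: prefix your agent’s hot queries with EXPLAIN ANALYZE. A minimal sketch, assuming a documents table with hypothetical user_id and embedding columns:

```sql
-- Table and column names are illustrative; the point is the plan, not the schema.
EXPLAIN ANALYZE
SELECT id, content
FROM documents
WHERE user_id = $1
ORDER BY embedding <-> $2
LIMIT 5;
-- If the plan starts with "Seq Scan on documents", you just found your bottleneck.
```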

If you haven’t read the Supabase post on Postgres best practices for AI agents, do it. Then read the official Postgres indexing docs:
https://www.postgresql.org/docs/current/indexes.html

And if you’re using vectors without understanding how pgvector indexing works:
https://github.com/pgvector/pgvector

Also, OpenAI’s embeddings docs are worth bookmarking before you build your third half-broken retrieval system.

TL;DR

Your AI agent doesn’t feel slow because the model is bad. It feels slow because your database wasn’t designed for thinking, searching, filtering, and remembering at scale. Schema design, indexing, and retrieval strategy matter more than the model switch you’re considering.

Your schema was not designed for thinking

Most of us learned database design in the CRUD era.

Users table.
Posts table.
Comments table.
Foreign keys.
Call it a day.

That design works beautifully when humans are clicking buttons and filtering dashboards.

It falls apart the second you introduce an AI agent.

Because agents don’t query like humans.

They don’t say,

“Give me posts from last week sorted by likes.”

They say,

“What did the user mention about payment failures across their last three sessions, and is that related to the refund policy update?”

That’s not a single SELECT with a neat WHERE clause. That’s multi-hop reasoning over structured data, semi-structured logs, and unstructured text.

And if your schema wasn’t built for that, you’ll feel it.

The CRUD schema vs the agent schema

Traditional app schema:

  • Highly normalized
  • Optimized for transactional integrity
  • Clear entity boundaries
  • Minimal duplication

Agent-ready schema:

  • Often partially denormalized
  • Embeddings stored alongside data
  • Metadata-heavy
  • Built for retrieval patterns, not just writes

You suddenly need:

  • A documents table with embedding vector
  • A chunking strategy (which, by the way, is a whole discipline on its own)
  • Metadata columns for filtering (user_id, source, created_at, tags)
  • Possibly an event log table for long-term memory
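Put together, an agent-ready documents table might look something like this. A sketch, not a prescription — it assumes the pgvector extension and 1536-dimensional embeddings (the size OpenAI’s text-embedding-3-small produces):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id         bigserial PRIMARY KEY,
  user_id    bigint NOT NULL,          -- who this chunk belongs to
  source     text,                     -- where it came from
  tags       text[],                   -- cheap metadata filtering
  content    text NOT NULL,            -- the chunk itself
  embedding  vector(1536),             -- pgvector column
  created_at timestamptz NOT NULL DEFAULT now()
);
```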

And then someone says, “Let’s just throw it all in JSONB.”

Ah yes. The universal escape hatch.

I’ve done it. You’ve done it. We all have a data JSONB column somewhere that started as a shortcut and ended as a lifestyle choice.

JSONB is powerful. The Postgres docs show how to index it properly. But if you don’t add the right GIN index, you’ve basically built a black hole. Your agent will query it. Postgres will nod politely and scan the entire thing.

You won’t notice at first. Then your table hits a few hundred thousand rows.

Then you notice.
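The escape hatch is at least indexable. A sketch, assuming a hypothetical data jsonb column — note that jsonb_path_ops only accelerates containment (@>) queries:

```sql
-- Without this, every containment query scans the whole table.
CREATE INDEX idx_documents_data
  ON documents USING gin (data jsonb_path_ops);

-- Containment queries like this one can now use the index:
SELECT id FROM documents WHERE data @> '{"source": "zendesk"}';
```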

Embeddings aren’t just “another column”

If you’re using pgvector (and if you’re on Postgres, you probably are), you’re no longer just storing rows. You’re storing high-dimensional space.

That means:

  • You need to understand vector index types (ivfflat, hnsw)
  • You need to think about distance metrics
  • You need to batch inserts properly
  • You need to consider vacuuming and maintenance
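The two index types look like this in SQL — a sketch, assuming the documents table has a vector embedding column and you want cosine distance:

```sql
-- ivfflat: fast to build, approximate, needs a `lists` parameter tuned to table size
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- hnsw: slower to build, generally better speed/recall tradeoff at query time
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops);
```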

The pgvector repo explains indexing clearly:
https://github.com/pgvector/pgvector

But most people don’t read it. They paste example code from a blog post and hope for the best.

I remember building my first RAG table. I shoved embeddings into a column, skipped the index because “it’s just a prototype,” and moved on. It worked perfectly… until it didn’t.

The agent wasn’t slow because embeddings are heavy. It was slow because every query was effectively asking Postgres to walk across Skyrim instead of fast traveling.

Schema design is not sexy. It doesn’t get you upvotes on Hacker News.

But if you’re building AI agents, your schema is no longer plumbing.

It’s the nervous system.

Indexes are your real AI accelerator

Everyone on tech Twitter is arguing about models.

GPT-4 vs Claude vs open weights.
MoE vs dense.
Context window size flexing.

Meanwhile, your database is doing a sequential scan like it’s 2005.

If you’ve never run EXPLAIN ANALYZE on your agent queries, I’m going to say something slightly mean but true:

You’re guessing.

And guessing is expensive.

The lie we tell ourselves

When latency spikes, we say:

“The embedding API must be slow.”

“Maybe the LLM is overloaded.”

“Vector search is just expensive.”

Sometimes that’s true.

But most of the time?

You forgot an index.

The official Postgres docs on indexing are not thrilling bedtime reading, but they are pure power:
https://www.postgresql.org/docs/current/indexes.html

Understanding the difference between:

  • B-tree (great for equality, ranges)
  • GIN (great for JSONB, full-text)
  • Vector indexes like ivfflat in pgvector

…is the difference between “AI feels magical” and “why is this taking six seconds?”

Indexes are fast travel points

Think of indexes like fast travel in Skyrim.

Without them, your database walks from one edge of the map to the other every time you ask a question.

With them, it teleports directly to the village you care about.

For AI agents, you often need:

  • A vector index on your embedding column
  • A B-tree index on user_id
  • Possibly a composite index on (user_id, created_at)
  • Partial indexes for filtered subsets

And if you’re mixing vector search with metadata filters, things get even more interesting.

The pgvector docs explain how to create an ivfflat index properly:
https://github.com/pgvector/pgvector

But here’s the catch nobody talks about:

If you create the index but never tune the number of lists properly for your dataset size, performance still degrades.

It’s not magic. It’s math.
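The math is spelled out in the pgvector README: start with lists ≈ rows / 1000 (and sqrt(rows) past about a million rows), then trade recall for speed at query time with probes. A sketch:

```sql
-- e.g. for a ~500k-row table
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 500);

-- probes defaults to 1; raising it improves recall at the cost of latency
SET ivfflat.probes = 10;
```

Rebuild the index as the table grows — a lists value tuned for 50k rows is wrong at 5 million.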

The most humbling optimization of my career

I once spent days tweaking prompts because our agent felt “sluggish.”

Reduced context.
Switched models.
Adjusted chunk sizes.

Nothing helped.

Then I looked at the query plan.

Sequential scan on a 500k-row embeddings table.

No index.

I added a single vector index and a B-tree on user_id.

Response time dropped dramatically.

It felt like discovering a cheat code.

And the embarrassing part? We were about to upgrade hardware. More CPU. Bigger instance. More cost.

Instead, we just… read the query plan.

Hybrid search is where people really mess up

If you’re doing proper RAG, you’re probably mixing:

  • Vector similarity
  • Metadata filtering
  • Maybe full-text search

That means you’re stacking operators in one query.

If any part of that query isn’t indexed correctly, the whole thing suffers.
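A sketch of what “stacking operators” looks like in practice, assuming documents also carries a tsvector column named fts (hypothetical):

```sql
SELECT id, content
FROM documents
WHERE user_id = $1
  AND fts @@ websearch_to_tsquery('english', $2)  -- keyword filter
ORDER BY embedding <=> $3                          -- cosine distance ranking
LIMIT 10;
-- Every clause here wants its own index: a B-tree on user_id,
-- a GIN index on fts, and an hnsw/ivfflat index on embedding.
```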

There’s a Reddit thread floating around where someone complains that pgvector is “slow at scale.” You scroll down far enough and someone inevitably asks:

“Did you index it?”

Silence.

Indexes are not glamorous. They don’t get you VC funding. They don’t show up in demo videos.

But they are your real AI accelerator.

You can swap models ten times.

Or you can open EXPLAIN ANALYZE once and change everything.

Agent memory is a database problem, not a prompt problem

Whenever an agent hallucinates, the first instinct is to tweak the prompt.

“Be more factual.”
“Use only provided context.”
“Think step by step.”

We treat prompts like spellbooks.

But most hallucinations I’ve seen in production weren’t prompt failures.

They were retrieval failures.

RAG (retrieval-augmented generation) is not magic. It’s plumbing. You embed content, store it, retrieve the “most relevant” chunks, and feed them into the model.

If the wrong chunks come back, the model will confidently build a castle on sand.

And you’ll blame the castle.

Garbage retrieval in, garbage reasoning out

Imagine giving someone puzzle pieces from three different puzzles and asking them to assemble a single picture.

They’ll do their best.
They’ll sound confident.
But the result will be… weird.

That’s what happens when:

  • Your chunks are too large
  • Your metadata is missing
  • Your filters are weak
  • Your similarity threshold is poorly tuned

OpenAI’s embeddings docs are clear about how embeddings work conceptually:
https://platform.openai.com/docs/guides/embeddings

But they don’t design your schema for you.

That’s your job.

Chunking is architecture, not formatting

When I first built a RAG pipeline, I chunked documents arbitrarily by character count.

Because that’s what every quick-start example does.

It worked fine in tests.

Then real users showed up.

The agent started referencing half-sentences. Mixing sections. Pulling outdated policy fragments because they were semantically similar but contextually wrong.

The fix wasn’t a better prompt.

It was:

  • Smaller, semantically coherent chunks
  • Metadata columns like document_version, section_type
  • Filtering by recency
  • Hybrid search (vector + keyword)

pgvector supports combining vector search with metadata filters directly in SQL. That’s powerful but only if you index those metadata columns properly.

Without that, you’re basically asking your database to search everything and then filter everything.

That scales… badly.
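The version of that fix in SQL, with the assumed metadata columns from above (a chunks table, document_version, and a recency cutoff — all hypothetical names):

```sql
-- Index the metadata you filter on, or the filter becomes a scan.
CREATE INDEX ON chunks (document_version, created_at DESC);

SELECT id, content
FROM chunks
WHERE document_version = $1                      -- only the current policy text
  AND created_at > now() - interval '90 days'    -- recency filter
ORDER BY embedding <=> $2
LIMIT 5;
```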

Memory design is where agents feel “smart”

Good agent memory feels intentional.

It prioritizes:

  • Recent interactions
  • High-signal content
  • User-specific context
  • Clean separation between users

Bad memory feels like a goldfish.

It vaguely remembers something. Not sure what. But it tries anyway.

The funny part? We’ll spend hours tuning system prompts but maybe five minutes designing the memory_entries table.

That’s backwards.

Your database decides what the model sees.

The model just reacts.

If your retrieval layer is messy, your agent’s “intelligence” is capped before it even starts generating text.

Memory is not a prompt hack.

It’s a database architecture decision.

And once you realize that, you stop arguing about wording and start redesigning tables.

Multi-tenant AI systems will expose your bad database habits

Everything works beautifully…

Until you add more users.

Your single-user AI agent is adorable. Fast. Responsive. Almost magical.

Then you launch.

Suddenly every user has:

  • Their own embeddings
  • Their own chat history
  • Their own memory store
  • Their own retrieval queries

And your database quietly starts sweating.

This is where multi-tenancy stops being a theoretical architecture concept and becomes a survival skill.

The naive pattern

Most first versions look like this:

SELECT *
FROM documents
WHERE user_id = $1
ORDER BY embedding <-> $2
LIMIT 5;

Seems fine.

But under load?

If user_id isn’t indexed properly, or if your vector index doesn’t play nicely with your filter, Postgres has to juggle both.

And juggling doesn’t scale.

Supabase’s row-level security (RLS) model is actually elegant:
https://supabase.com/docs/guides/auth/row-level-security

But RLS doesn’t magically fix poor indexing. It enforces isolation. It doesn’t optimize queries.
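For reference, the isolation part is only a few lines. A minimal sketch in Supabase style — auth.uid() is Supabase-specific and returns a uuid, so this assumes user_id is a uuid column:

```sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Each user can only read their own rows; Postgres enforces it on every query.
CREATE POLICY "users read own documents"
  ON documents FOR SELECT
  USING (auth.uid() = user_id);
```

Note that the policy predicate runs on every row the planner considers — which is exactly why the indexing above still matters.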

If every user triggers heavy vector searches at once, you’ll discover something new:

Connection pooling matters.

PgBouncer exists for a reason:
https://www.pgbouncer.org/
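Before reaching for a bigger instance, it’s worth checking what your connections are actually doing:

```sql
-- Count connections by state.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
-- A pile of "idle" connections is a hint that transaction-mode pooling
-- (e.g. PgBouncer's pool_mode = transaction) would help more than hardware.
```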

I once ran a beta where everything seemed stable in testing. Then a small wave of users logged in simultaneously.

Connections spiked.
Queries queued.
Response times crawled.

Nothing was “broken.” We just underestimated how many concurrent vector queries hit the same table.

It felt like turning a single-player game into an MMO overnight.

Embeddings multiply faster than you think

Here’s the quiet danger:

Every new document → multiple chunks
Every chunk → one embedding
Every embedding → another row

Ten users become thousands of rows quickly.
Hundreds of users become millions.

And now your storage, indexes, and vacuum cycles matter.

If you’re not thinking about:

  • Index maintenance
  • Autovacuum tuning
  • Archiving old embeddings
  • Caching frequent queries

…you’re building a time bomb.
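Defusing part of it can be as simple as per-table storage parameters. A sketch — the thresholds here are illustrative, not recommendations:

```sql
-- Vacuum and analyze the hot embeddings table more aggressively than the
-- global defaults (0.2 / 0.1), so dead tuples don't pile up under churn.
ALTER TABLE documents SET (
  autovacuum_vacuum_scale_factor  = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);
```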

Cost is a database story too

We talk about AI costs like they’re purely model-driven.

But slow queries mean:

  • Longer open connections
  • Higher compute usage
  • Bigger instances
  • More scaling layers

Sometimes the cheapest optimization isn’t a smaller model.

It’s a smarter schema.

Multi-tenant AI systems don’t just test your prompts.

They expose your database habits.

And if those habits were built in the “just ship it” era, production will let you know.

The boring database just became the main character

For years, databases were background characters.

Necessary.
Important.
But not exciting.

The cool stuff was in the frontend. Then it was in cloud infra. Then it was in ML. Now it’s in AI agents.

But here’s the twist nobody expected:

The boring layer is back in charge.

Because AI agents don’t just compute. They retrieve. They filter. They remember. They isolate users. They manage context. They operate on history.

And all of that runs through your database.

Postgres, in particular, has quietly evolved into something kind of wild:

  • JSONB for semi-structured data
  • Full-text search
  • Vector search via pgvector
  • Row-level security
  • Extensions for almost everything

It’s no longer just “a relational database.”

It’s becoming an AI-native data engine.

And that’s slightly controversial, because everyone wants a new shiny “AI database.”

But most AI startup problems I’ve seen recently?

Not model problems.

Data layout problems.
Index problems.
Isolation problems.
Query planning problems.

The model is the brain.

Your database is the nervous system.

If the nervous system is miswired, the brain can’t save you.

What’s next

We’re entering a phase where:

  • Backend engineers matter again
  • Query planners are suddenly cool
  • Schema design decisions shape user experience directly
  • Infra literacy separates stable products from hype demos

And I kind of love that.

Because the real skill shift isn’t “learn prompt engineering.”

It’s:

  • Understand retrieval
  • Understand indexing
  • Understand concurrency
  • Understand isolation

The devs who win in the agent era won’t just know how to call an API.

They’ll know how to design systems that let intelligence actually flow.

And maybe that’s the quiet plot twist of this whole AI wave:

The database didn’t get replaced.

It got promoted.

Helpful resources

If you want to go deeper without relying on vibes:

  • Postgres indexing docs: https://www.postgresql.org/docs/current/indexes.html
  • pgvector: https://github.com/pgvector/pgvector
  • OpenAI embeddings guide: https://platform.openai.com/docs/guides/embeddings
  • Supabase row-level security: https://supabase.com/docs/guides/auth/row-level-security
  • PgBouncer: https://www.pgbouncer.org/