Dexmac

The RAG Illusion: Why PostgreSQL Beats Vector Search for Most AI Applications

A Contrarian View on the Most Hyped Technology in AI Infrastructure


"The best solution to a problem is often realizing you don't have the problem."


The Uncomfortable Question

Before you spin up another Pinecone instance, embed another million documents, and debug another "why didn't it retrieve the right chunk?" issue, ask yourself:

Does my data actually need semantic search, or am I using RAG because everyone else is?

After building a multi-agent AI system for narrative generation, we discovered something counterintuitive: replacing our RAG pipeline with PostgreSQL didn't just simplify our architecture—it made the output better.

This article explains why, and when you should consider the same.


A Brief History: Why RAG Exists

RAG (Retrieval-Augmented Generation) was born from necessity. In 2022-2023, context windows were tiny:

GPT-3.5:     4,096 tokens  (~3,000 words)
GPT-4:       8,192 tokens  (~6,000 words)
Claude 1:    9,000 tokens  (~7,000 words)

If your knowledge base was larger than ~5,000 words, you had a problem. You couldn't fit it in context. RAG was the elegant solution:

  1. Chunk your documents into small pieces
  2. Embed each chunk into a vector
  3. Store vectors in a specialized database
  4. Retrieve the top-k most "similar" chunks to the query
  5. Inject those chunks into the prompt
  6. Generate a response based on the retrieved context
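
In code, the whole pattern fits in a few lines. A minimal sketch, with embed() standing in for any embedding API (hypothetical here) and a plain list as the "vector store":

import numpy as np

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query_vec: np.ndarray, store: list, top_k: int = 5):
    """Return the top-k chunks by cosine similarity. store: (text, vector) pairs."""
    scored = [
        (text, float(np.dot(query_vec, vec)
                     / (np.linalg.norm(query_vec) * np.linalg.norm(vec))))
        for text, vec in store
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]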

Brilliant. A necessity became a pattern, the pattern became a best practice, and the best practice became... an assumption.


The World Changed, The Assumption Didn't

Fast forward to December 2025:

GPT-5.2:           400,000 tokens (~300,000 words)
Claude Opus 4.5:   200,000 tokens (~150,000 words)
Claude Sonnet 4.5: 1,000,000 tokens (beta, tier 4+)
Gemini 3 Pro:      1,000,000 tokens (~750,000 words)

That's not a typo. One million tokens. You can fit entire codebases, complete documentation sets, or full novels in a single context window.

Yet we're still chunking, embedding, and retrieving as if it's 2022.


The Hidden Costs of RAG

RAG isn't free. It comes with costs that are rarely discussed in the "RAG is amazing!" tutorials.

1. Chunking Destroys Context

ORIGINAL DOCUMENT:
"The patient presented with symptoms consistent with Type 2 diabetes, 
including elevated blood glucose levels (HbA1c: 8.2%). However, given 
the patient's age (12) and rapid onset, we conducted additional testing 
which revealed GAD65 antibodies, confirming Type 1 diabetes instead."

AFTER CHUNKING (500 token chunks):
Chunk 1: "The patient presented with symptoms consistent with Type 2 
          diabetes, including elevated blood glucose levels (HbA1c: 8.2%)."

Chunk 2: "However, given the patient's age (12) and rapid onset, we 
          conducted additional testing which revealed GAD65 antibodies, 
          confirming Type 1 diabetes instead."

Query: "What type of diabetes does the patient have?"

If your retrieval returns only Chunk 1, your LLM confidently answers "Type 2 diabetes."

Wrong.

The correction was in Chunk 2, which may not have been "semantically similar" enough to the query to make the top-k cut.

2. Embedding Loses Nuance

Embeddings compress meaning into a fixed-length vector of roughly 1,500 floating point numbers (1,536 dimensions for OpenAI's text-embedding-3-small, for example). They're remarkably good at capturing gist, but they lose:

  • Negation: "This is good" and "This is not good" often have similar embeddings
  • Specificity: "Apple Inc." and "apple fruit" may cluster together
  • Relationships: "A loves B" vs "B loves A" look identical to embeddings
  • Recency: A 2020 policy and a 2024 update might have similar embeddings
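
These failure modes are easy to check against your own model. A quick probe, assuming the sentence-transformers package and the all-MiniLM-L6-v2 model (any embedding model can be swapped in):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [
    ("This is good", "This is not good"),  # negation
    ("A loves B", "B loves A"),            # relationship direction
]
for a, b in pairs:
    vec_a, vec_b = model.encode([a, b])
    # Opposite or reversed meanings often score surprisingly close to 1.0
    print(f"{a!r} vs {b!r}: {float(util.cos_sim(vec_a, vec_b)):.3f}")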

3. Top-K is Arbitrary

Why retrieve the top 5 chunks? Why not 3? Or 10? Or 50?

The answer is usually: "because that's what the tutorial did."

Top-k retrieval means:

  • You might miss the 6th most relevant chunk (which was the one you needed)
  • You definitely include the 5th chunk (even if it's barely relevant)
  • There's no way to know if you got the right ones

4. "Lost in the Middle"

Research on long-context models ("Lost in the Middle," Liu et al., 2023) shows that LLMs pay less attention to information in the middle of long contexts. If your most relevant chunk ends up sandwiched between less relevant ones, the model might ignore it.

5. The Verification Problem

Here's the question nobody wants to answer:

If I still need to verify the LLM's response (because RAG retrieval might have missed something), what exactly did I gain?

For low-stakes applications, approximate answers are fine. For legal, medical, financial, or any domain requiring accuracy, you're back to manual verification anyway.


The Alternative: Structured State Management

What if, instead of treating your knowledge as "documents to search," you treated it as "state to manage"?

This is the key insight: For most applications, you don't need semantic search. You need structured queries.

The PostgreSQL Approach

-- Instead of: "Find chunks about customer complaints from Q3"
-- You write: 

SELECT 
    c.complaint_id,
    c.description,
    c.resolution,
    cu.name as customer_name,
    p.name as product_name
FROM complaints c
JOIN customers cu ON c.customer_id = cu.id
JOIN products p ON c.product_id = p.id
WHERE c.created_at BETWEEN '2024-07-01' AND '2024-09-30'
  AND c.status = 'resolved'
ORDER BY c.severity DESC
LIMIT 20;

This query is:

  • Deterministic: Same query, same results
  • Debuggable: EXPLAIN shows exactly what happened
  • Complete: You get ALL matching records, not "top-k similar"
  • Relational: You can join customers, products, resolutions in one query
  • Fast: Milliseconds, not seconds

Full Context Injection

Now here's the key: once you have your structured data, you don't "retrieve" it into a tiny context. You dump it entirely into the prompt.

def build_context(scene_requirements):
    """Build complete context for this scene."""
    # Unpack the scene spec (keys shown here are illustrative)
    characters = scene_requirements["characters"]
    location = scene_requirements["location"]
    current_chapter = scene_requirements["chapter"]

    context = db.query("""
        SELECT json_build_object(
            'characters', (
                SELECT json_agg(c.*) 
                FROM characters c 
                WHERE c.name = ANY(%s)
            ),
            'location', (
                SELECT row_to_json(l.*) 
                FROM locations l 
                WHERE l.name = %s
            ),
            'recent_events', (
                SELECT json_agg(e.*) 
                FROM events e 
                WHERE e.chapter >= %s - 1
            ),
            'relationships', (
                SELECT json_agg(r.*) 
                FROM relationships r 
                WHERE r.character_a = ANY(%s) 
                  AND r.character_b = ANY(%s)
            )
        )
    """, [characters, location, current_chapter, characters, characters])

    return context  # ~2,000-10,000 tokens, COMPLETE and COHERENT

This isn't retrieval. It's state serialization.
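
Turning that state into a prompt is a formatting step, not a search step. A minimal sketch (build_context is the function above; the prompt wording is illustrative):

import json

def build_prompt(scene_requirements, instructions):
    """Serialize the complete world state directly into the prompt."""
    context = build_context(scene_requirements)
    return (
        "Here is the COMPLETE current state of the story world:\n\n"
        + json.dumps(context, indent=2, default=str)
        + f"\n\nTask: {instructions}"
    )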


A Real-World Example: The Book Generator

We built a system that generates novels using multiple AI agents (more on this in a follow-up article). The "world" of a typical story includes:

STORY WORLD COMPONENTS:
- 3-5 characters:      ~500 tokens each  = 2,500 tokens
- 5-10 locations:      ~200 tokens each  = 2,000 tokens  
- 20-30 events:        ~100 tokens each  = 3,000 tokens
- Rules and timeline:                    = 1,000 tokens
────────────────────────────────────────────────────────
TOTAL:                                   ~8,500 tokens

Our context window: 200,000-1,000,000 tokens (depending on model).

We could fit the entire world 20-100 times over. Why were we using RAG?

The RAG Version (What We Started With)

class SharedRAG:
    def query(self, question: str, top_k: int = 5):
        # Embed the question
        query_embedding = self.embed(question)

        # Search vector store
        results = self.vector_store.similarity_search(
            query_embedding, 
            top_k=top_k
        )

        return results

Problems we encountered:

  • "Tell me about the café" returned café description + random events that mentioned coffee
  • Character relationships were split across chunks, often incomplete
  • The same location description was retrieved repeatedly (no frequency control)
  • Timeline was scrambled (similarity ≠ chronology)

The PostgreSQL Version (What We Switched To)

from typing import List

class CoherentMemory:
    def get_scene_context(self, chapter: int, scene: int, 
                          characters: List[str], location: str):
        """Return ONLY what's needed for this specific scene."""

        return {
            'location': self._get_location_details(location, already_narrated=True),
            'characters': self._get_characters(characters),
            'relationships': self._get_relationships_between(characters),
            'known_events': self._get_what_characters_know(characters),
            'recent_events': self._get_events_since(chapter - 1),
        }

Results:

  • Location details are complete and not duplicated
  • Only relationships between present characters are included
  • Events are filtered by what each character actually knows
  • Natural frequency control through already_narrated flags
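
For a sense of what those helpers look like, here is a sketch of one of them, assuming a psycopg connection on self.db (the tables match the schema below):

    def _get_relationships_between(self, characters: List[str]):
        """Fetch only the relationships among characters present in this scene."""
        with self.db.cursor() as cur:
            cur.execute("""
                SELECT ca.name, cb.name, cr.relationship_type, cr.description
                FROM character_relationships cr
                JOIN characters ca ON ca.id = cr.character_a
                JOIN characters cb ON cb.id = cr.character_b
                WHERE ca.name = ANY(%s) AND cb.name = ANY(%s)
            """, (characters, characters))
            return [
                {"a": a, "b": b, "type": rel_type, "description": desc}
                for a, b, rel_type, desc in cur.fetchall()
            ]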

The Schema

-- 16 tables for complete narrative state management

-- Characters and their attributes
CREATE TABLE characters (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) UNIQUE,
    full_name VARCHAR(200),
    age INT,
    occupation VARCHAR(100),
    background TEXT,
    cognitive_style VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE character_traits (
    character_id INT REFERENCES characters(id),
    trait VARCHAR(100),
    PRIMARY KEY (character_id, trait)
);

CREATE TABLE character_secrets (
    id SERIAL PRIMARY KEY,
    character_id INT REFERENCES characters(id),
    secret TEXT,
    known_by INT[] DEFAULT '{}'  -- Array of character IDs who know
);

-- Relationships (bidirectional)
CREATE TABLE character_relationships (
    character_a INT REFERENCES characters(id),
    character_b INT REFERENCES characters(id),
    relationship_type VARCHAR(50),
    description TEXT,
    PRIMARY KEY (character_a, character_b)
);

-- Locations with usage tracking (frequency control!)
CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) UNIQUE,
    description TEXT,
    atmosphere TEXT,
    usage_count INT DEFAULT 0,
    max_usage INT DEFAULT 4
);

CREATE TABLE location_details (
    id SERIAL PRIMARY KEY,
    location_id INT REFERENCES locations(id),
    detail TEXT,
    narrated BOOLEAN DEFAULT FALSE  -- Has this been used in the story?
);

-- Events with character knowledge tracking
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    chapter INT,
    scene INT,
    description TEXT,
    location_id INT REFERENCES locations(id),
    resolved BOOLEAN DEFAULT FALSE
);

CREATE TABLE event_participants (
    event_id INT REFERENCES events(id),
    character_id INT REFERENCES characters(id),
    role VARCHAR(50),  -- 'witness', 'actor', 'mentioned'
    PRIMARY KEY (event_id, character_id)
);

-- Who knows what? Automatic knowledge tracking
CREATE TABLE character_knowledge (
    character_id INT REFERENCES characters(id),
    event_id INT REFERENCES events(id),
    learned_in_chapter INT,
    PRIMARY KEY (character_id, event_id)
);

-- Style tracking (integrated repetition control)
CREATE TABLE style_patterns (
    id SERIAL PRIMARY KEY,
    pattern_type VARCHAR(50),
    pattern_text TEXT,
    chapter INT,
    usage_count INT DEFAULT 0,
    max_per_chapter INT DEFAULT 2
);
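
A nice property of this schema: knowledge propagation becomes a single statement. A sketch of the post-event bookkeeping (cur is a psycopg cursor):

def record_event_knowledge(cur, event_id: int, chapter: int):
    """Every participant of an event now 'knows' it."""
    cur.execute("""
        INSERT INTO character_knowledge (character_id, event_id, learned_in_chapter)
        SELECT ep.character_id, ep.event_id, %s
        FROM event_participants ep
        WHERE ep.event_id = %s
        ON CONFLICT DO NOTHING
    """, (chapter, event_id))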

Key Queries

Get scene context (coherent segmentation):

-- Everything needed for Chapter 2, Scene 3, with Alice and Bob in the Library

SELECT json_build_object(
    'location', (
        SELECT json_build_object(
            'name', l.name,
            'description', l.description,
            'atmosphere', l.atmosphere,
            'unused_details', (
                SELECT json_agg(ld.detail)
                FROM location_details ld
                WHERE ld.location_id = l.id AND ld.narrated = FALSE
            )
        )
        FROM locations l WHERE l.name = 'Library'
    ),
    'characters', (
        SELECT json_agg(json_build_object(
            'name', c.name,
            'occupation', c.occupation,
            'traits', (SELECT array_agg(trait) FROM character_traits WHERE character_id = c.id),
            'cognitive_style', c.cognitive_style
        ))
        FROM characters c WHERE c.name IN ('Alice', 'Bob')
    ),
    'relationship', (
        SELECT json_build_object(
            'type', cr.relationship_type,
            'description', cr.description
        )
        FROM character_relationships cr
        WHERE (cr.character_a = 1 AND cr.character_b = 2)
           OR (cr.character_a = 2 AND cr.character_b = 1)
    ),
    'what_they_know', (
        SELECT json_agg(DISTINCT e.description)
        FROM events e
        JOIN character_knowledge ck ON e.id = ck.event_id
        WHERE ck.character_id IN (1, 2)
    )
) AS scene_context;

Check for overused patterns:

-- Before generating, check what patterns to avoid

SELECT pattern_text 
FROM style_patterns 
WHERE chapter = 2 
  AND usage_count >= max_per_chapter;

-- Returns: ["shoulders slumped", "heart raced"]
-- These get injected as "AVOID:" instructions
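
Wiring this into generation takes a few lines of glue (a sketch; get_conn() is a hypothetical connection factory):

def avoid_instructions(chapter: int) -> str:
    """Build an AVOID instruction from this chapter's overused patterns."""
    with get_conn().cursor() as cur:
        cur.execute("""
            SELECT pattern_text FROM style_patterns
            WHERE chapter = %s AND usage_count >= max_per_chapter
        """, (chapter,))
        patterns = [row[0] for row in cur.fetchall()]
    if not patterns:
        return ""
    return "AVOID these overused phrasings: " + "; ".join(patterns)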

Update after scene generation:

-- Mark location details as narrated
UPDATE location_details 
SET narrated = TRUE 
WHERE location_id = 5 AND detail LIKE '%dusty shelves%';

-- Record that Alice now knows about the secret
INSERT INTO character_knowledge (character_id, event_id, learned_in_chapter)
VALUES (1, 42, 2);

-- Increment location usage
UPDATE locations 
SET usage_count = usage_count + 1 
WHERE name = 'Library';

The Comparison

| Aspect | RAG | PostgreSQL |
| --- | --- | --- |
| Setup complexity | High (embeddings, vector DB) | Medium (schema design) |
| Query type | Semantic similarity | Exact relational |
| Results | Top-k approximate | Complete, exact |
| Debugging | Hard ("why this chunk?") | Easy (EXPLAIN) |
| Relationships | Manual, in text | Native JOINs |
| Frequency control | Add-on | Native (usage_count) |
| Timeline | Lost in embeddings | ORDER BY timestamp |
| Knowledge tracking | Manual | Automatic with FKs |
| Cost | Vector DB + embeddings | Standard PostgreSQL |
| Best for | Unstructured exploration | Structured applications |

When RAG Still Makes Sense

I'm not saying RAG is useless. It has legitimate use cases:

✅ Use RAG When:

  1. Data is truly unstructured — Chat logs, free-form notes, scraped web pages with no consistent format

  2. Volume exceeds context — You have 10GB of documents and even the best chunking gives you 50MB of potentially relevant content

  3. Discovery is the goal — "Find me something related to X" where you don't know what you're looking for

  4. Approximate is acceptable — Recommendations, inspiration, brainstorming assistance

  5. Data changes constantly — News feeds, social media, where re-indexing is expensive

❌ Skip RAG When:

  1. Data has natural structure — Products, customers, transactions, policies, documentation

  2. Total data fits in context — Most knowledge bases under 100k tokens

  3. Precision matters — Legal, medical, financial, compliance

  4. Relationships are important — "Which customers bought X and also complained about Y?"

  5. You need auditability — "Why did the AI say this?" requires deterministic retrieval


The Hybrid Approach

You don't have to choose. Many systems benefit from both:

class HybridMemory:
    def __init__(self):
        self.postgres = PostgresMemory()  # Structured state
        self.rag = RAGMemory()            # Unstructured fallback

    def get_context(self, query, scene_requirements):
        # First: get structured context (deterministic)
        structured = self.postgres.get_scene_context(**scene_requirements)

        # Second: if there's room, add relevant unstructured content
        # (MAX_CONTEXT, count_tokens, and merge are assumed helpers)
        remaining_tokens = MAX_CONTEXT - count_tokens(structured)

        if remaining_tokens > 1000:
            unstructured = self.rag.search(query, max_tokens=remaining_tokens)
            return merge(structured, unstructured)

        return structured

The key insight: Structured first, RAG for gaps.


Implementation Checklist

If you're considering PostgreSQL over RAG, here's a practical checklist:

1. Analyze Your Data

  • [ ] Can it be structured into entities and relationships?
  • [ ] Does it have natural categories, types, or hierarchies?
  • [ ] Are there temporal aspects (timeline, versions, updates)?
  • [ ] Do you need to track "who knows what"?

2. Estimate Token Budget

  • [ ] Total structured data: _____ tokens
  • [ ] Available context window: _____ tokens
  • [ ] Ratio: If data < 50% of context, skip RAG
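
A quick way to fill in those blanks, assuming the tiktoken package (cl100k_base approximates recent OpenAI tokenizers; other providers count slightly differently, so treat the result as an estimate):

import tiktoken

def fits_without_rag(serialized_state: str, context_window: int) -> bool:
    """Apply the 50% rule of thumb from the checklist above."""
    enc = tiktoken.get_encoding("cl100k_base")
    data_tokens = len(enc.encode(serialized_state))
    return data_tokens < context_window // 2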

3. Design Schema

  • [ ] Core entities identified
  • [ ] Relationships mapped
  • [ ] Usage tracking added (for frequency control)
  • [ ] Knowledge tracking added (for multi-agent scenarios)

4. Build Context Serializer

  • [ ] Function to dump relevant state as JSON/text
  • [ ] Filters for scene/query relevance
  • [ ] Token counting to stay within limits

5. Test Determinism

  • [ ] Same query → same results?
  • [ ] Can you explain why each piece of context was included?
  • [ ] Edge cases handled (empty results, too many results)?

Conclusion: Think Before You Embed

RAG was a brilliant solution to the context length problem of 2022. In 2025, with 200k+ token context windows, it's often a solution in search of a problem.

Before building another RAG pipeline, ask:

  1. Does my data fit in context? If yes, just include it all.
  2. Is my data structured? If yes, use SQL queries, not semantic search.
  3. Do I need precision or exploration? Precision → PostgreSQL. Exploration → RAG.
  4. Am I using RAG because it's right, or because it's trendy?

The best architecture is the one that solves your actual problem with minimal complexity. Sometimes that's a sophisticated RAG pipeline with reranking and hybrid search. Sometimes it's a PostgreSQL database with good indexes and a JSON serializer.

Know the difference.


What's Next

In the next article, I'll show how we applied this PostgreSQL-based approach to build a multi-agent book generation system where AI characters have their own consciousness, memories, and voices. The coherent memory system described here is what allows characters to know things, remember events, and maintain consistent relationships across an entire novel.

Spoiler: When we switched from RAG to PostgreSQL, our characters stopped repeating themselves and started having coherent conversations.

Stay tuned.


If you found this useful, the code is open source: [GitHub link]


Appendix: Quick Reference

PostgreSQL Extensions for AI Workloads

-- Full-text search (built-in)
CREATE INDEX idx_fts ON documents 
USING GIN(to_tsvector('english', content));

-- Vector similarity (pgvector extension)
CREATE EXTENSION vector;
CREATE INDEX idx_embedding ON documents 
USING ivfflat (embedding vector_cosine_ops);

-- JSON querying (built-in)
SELECT data->>'name' FROM entities 
WHERE data @> '{"type": "character"}';
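
If you do go hybrid inside Postgres, the pgvector Python bindings keep the similarity side short too. A sketch, assuming psycopg 3 and the pgvector package (connection string, table, and column names are illustrative):

import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=app")
register_vector(conn)  # teaches psycopg to adapt numpy arrays to vector columns

query_embedding = np.random.rand(1536).astype(np.float32)  # stand-in for a real embedding
rows = conn.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s LIMIT 5",  # <=> is cosine distance
    (query_embedding,),
).fetchall()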

When to Use What

┌───────────────────────────────────────────────────────┐
│                  DECISION FLOWCHART                   │
├───────────────────────────────────────────────────────┤
│                                                       │
│   Does your data have structure?                      │
│           │                                           │
│     ┌─────┴─────┐                                     │
│    YES          NO                                    │
│     │            │                                    │
│     ▼            ▼                                    │
│  PostgreSQL   Can you add structure?                  │
│     ▲            │                                    │
│     │      ┌─────┴─────┐                              │
│     │     YES          NO                             │
│     │      │            │                             │
│     └──────┘            ▼                             │
│                    Does it fit in context?            │
│                         │                             │
│                   ┌─────┴─────┐                       │
│                  YES          NO                      │
│                   │            │                      │
│                   ▼            ▼                      │
│              Full dump      RAG                       │
│              (no search)   (with all its tradeoffs)   │
│                                                       │
└───────────────────────────────────────────────────────┘

Cost Comparison (Rough Estimates)

| Solution | Monthly Cost | Setup Time | Maintenance |
| --- | --- | --- | --- |
| Pinecone + OpenAI Embeddings | $70-500 | 1-2 weeks | Ongoing |
| Weaviate Cloud | $50-300 | 1-2 weeks | Ongoing |
| PostgreSQL on RDS | $20-100 | 2-3 days | Minimal |
| PostgreSQL + pgvector | $20-100 | 3-5 days | Minimal |
| SQLite (local) | $0 | 1 day | None |
