A Contrarian View on the Most Hyped Technology in AI Infrastructure
"The best solution to a problem is often realizing you don't have the problem."
The Uncomfortable Question
Before you spin up another Pinecone instance, embed another million documents, and debug another "why didn't it retrieve the right chunk?" issue, ask yourself:
Does my data actually need semantic search, or am I using RAG because everyone else is?
After building a multi-agent AI system for narrative generation, we discovered something counterintuitive: replacing our RAG pipeline with PostgreSQL didn't just simplify our architecture—it made the output better.
This article explains why, and when you should consider the same.
A Brief History: Why RAG Exists
RAG (Retrieval-Augmented Generation) was born from necessity. In 2022-2023, context windows were tiny:
GPT-3.5: 4,096 tokens (~3,000 words)
GPT-4: 8,192 tokens (~6,000 words)
Claude 1: 9,000 tokens (~7,000 words)
If your knowledge base was larger than ~5,000 words, you had a problem. You couldn't fit it in context. RAG was the elegant solution:
- Chunk your documents into small pieces
- Embed each chunk into a vector
- Store vectors in a specialized database
- Retrieve the top-k most "similar" chunks to the query
- Inject those chunks into the prompt
- Generate a response based on the retrieved context
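In code, the whole loop is only a few steps. Here is a minimal sketch; embed, vector_store, and llm are placeholders standing in for whatever embedding model, vector database, and chat model you use, not any specific library's API.

def chunk(text: str, size: int = 500) -> list[str]:
    """Step 1: split documents into roughly fixed-size pieces (here, by word count)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def rag_answer(question: str, documents: list[str]) -> str:
    chunks = [c for doc in documents for c in chunk(doc)]
    vectors = [embed(c) for c in chunks]                        # Step 2: embed
    vector_store.add(chunks, vectors)                           # Step 3: store
    top_chunks = vector_store.search(embed(question), top_k=5)  # Step 4: retrieve
    prompt = ("Answer using only this context:\n"
              + "\n---\n".join(top_chunks)                      # Step 5: inject
              + f"\n\nQuestion: {question}")
    return llm.complete(prompt)                                 # Step 6: generate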
Brilliant. A necessity became a pattern, the pattern became a best practice, and the best practice became... an assumption.
The World Changed, The Assumption Didn't
Fast forward to December 2025:
GPT-5.2: 400,000 tokens (~300,000 words)
Claude Opus 4.5: 200,000 tokens (~150,000 words)
Claude Sonnet 4.5: 1,000,000 tokens (beta, tier 4+)
Gemini 3 Pro: 1,000,000 tokens (~750,000 words)
That's not a typo. One million tokens. You can fit entire codebases, complete documentation sets, or full novels in a single context window.
Yet we're still chunking, embedding, and retrieving as if it's 2022.
The Hidden Costs of RAG
RAG isn't free. It comes with costs that are rarely discussed in the "RAG is amazing!" tutorials.
1. Chunking Destroys Context
ORIGINAL DOCUMENT:
"The patient presented with symptoms consistent with Type 2 diabetes,
including elevated blood glucose levels (HbA1c: 8.2%). However, given
the patient's age (12) and rapid onset, we conducted additional testing
which revealed GAD65 antibodies, confirming Type 1 diabetes instead."
AFTER CHUNKING (500 token chunks):
Chunk 1: "The patient presented with symptoms consistent with Type 2
diabetes, including elevated blood glucose levels (HbA1c: 8.2%)."
Chunk 2: "However, given the patient's age (12) and rapid onset, we
conducted additional testing which revealed GAD65 antibodies,
confirming Type 1 diabetes instead."
Query: "What type of diabetes does the patient have?"
If your retrieval returns only Chunk 1, your LLM confidently answers "Type 2 diabetes."
Wrong.
The correction was in Chunk 2, which may not have been "semantically similar" enough to the query to make the top-k cut.
2. Embedding Loses Nuance
Embeddings compress meaning into ~1,500 floating point numbers. They're remarkably good at capturing gist, but they lose:
- Negation: "This is good" and "This is not good" often have similar embeddings
- Specificity: "Apple Inc." and "apple fruit" may cluster together
- Relationships: "A loves B" vs "B loves A" look identical to embeddings
- Recency: A 2020 policy and a 2024 update might have similar embeddings
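The negation point is easy to verify against whatever embedding model you use. A minimal sketch, assuming the OpenAI embeddings API; any other embed() function can be swapped in:

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embed("The treatment is effective."),
             embed("The treatment is not effective.")))
# Typically prints a high similarity score, even though the meanings are opposite.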
3. Top-K is Arbitrary
Why retrieve the top 5 chunks? Why not 3? Or 10? Or 50?
The answer is usually: "because that's what the tutorial did."
Top-k retrieval means:
- You might miss the 6th most relevant chunk (which was the one you needed)
- You definitely include the 5th chunk (even if it's barely relevant)
- There's no way to know if you got the right ones
4. "Lost in the Middle"
Research shows that LLMs pay less attention to information in the middle of long contexts. If your most relevant chunk ends up sandwiched between less relevant ones, the model might ignore it.
5. The Verification Problem
Here's the question nobody wants to answer:
If I still need to verify the LLM's response (because RAG retrieval might have missed something), what exactly did I gain?
For low-stakes applications, approximate answers are fine. For legal, medical, financial, or any domain requiring accuracy, you're back to manual verification anyway.
The Alternative: Structured State Management
What if, instead of treating your knowledge as "documents to search," you treated it as "state to manage"?
This is the key insight: For most applications, you don't need semantic search. You need structured queries.
The PostgreSQL Approach
-- Instead of: "Find chunks about customer complaints from Q3"
-- You write:
SELECT
    c.complaint_id,
    c.description,
    c.resolution,
    cu.name AS customer_name,
    p.name AS product_name
FROM complaints c
JOIN customers cu ON c.customer_id = cu.id
JOIN products p ON c.product_id = p.id
WHERE c.created_at BETWEEN '2024-07-01' AND '2024-09-30'
  AND c.status = 'resolved'
ORDER BY c.severity DESC
LIMIT 20;
This query is:
- Deterministic: Same query, same results
- Debuggable: EXPLAIN shows exactly what happened
- Complete: You get ALL matching records, not "top-k similar"
- Relational: You can join customers, products, resolutions in one query
- Fast: Milliseconds, not seconds
Full Context Injection
Now here's the key: once you have your structured data, you don't "retrieve" it into a tiny context. You dump it entirely into the prompt.
def build_context(scene_requirements):
    """Build complete context for this scene."""
    # Unpack the scene requirements (characters, location, chapter)
    characters = scene_requirements["characters"]
    location = scene_requirements["location"]
    current_chapter = scene_requirements["chapter"]

    context = db.query("""
        SELECT json_build_object(
            'characters', (
                SELECT json_agg(c.*)
                FROM characters c
                WHERE c.name = ANY(%s)
            ),
            'location', (
                SELECT row_to_json(l.*)
                FROM locations l
                WHERE l.name = %s
            ),
            'recent_events', (
                SELECT json_agg(e.*)
                FROM events e
                WHERE e.chapter >= %s - 1
            ),
            'relationships', (
                SELECT json_agg(r.*)
                FROM relationships r
                WHERE r.character_a = ANY(%s)
                  AND r.character_b = ANY(%s)
            )
        )
    """, [characters, location, current_chapter, characters, characters])

    return context  # ~2,000-10,000 tokens, COMPLETE and COHERENT
This isn't retrieval. It's state serialization.
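Everything downstream of that query is plain string assembly. A sketch of the hand-off to the model (the instruction text here is illustrative):

import json

def build_prompt(context: dict, instruction: str) -> str:
    """Serialize the complete scene state into the prompt; there is no retrieval step."""
    return (
        "You are writing a scene. The COMPLETE current state of the story is:\n\n"
        + json.dumps(context, indent=2, ensure_ascii=False)
        + "\n\n" + instruction
    )

# prompt = build_prompt(build_context(scene_requirements),
#                       "Write Chapter 2, Scene 3 from Alice's point of view.")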
A Real-World Example: The Book Generator
We built a system that generates novels using multiple AI agents (more on this in a follow-up article). The "world" of a typical story includes:
STORY WORLD COMPONENTS:
- 3-5 characters: ~500 tokens each = 2,500 tokens
- 5-10 locations: ~200 tokens each = 2,000 tokens
- 20-30 events: ~100 tokens each = 3,000 tokens
- Rules and timeline: = 1,000 tokens
────────────────────────────────────────────────────────
TOTAL: ~8,500 tokens
Our context window: 200,000-1,000,000 tokens (depending on model).
We could fit the entire world 20-100 times over. Why were we using RAG?
The RAG Version (What We Started With)
class SharedRAG:
    def query(self, question: str, top_k: int = 5):
        # Embed the question
        query_embedding = self.embed(question)
        # Search vector store
        results = self.vector_store.similarity_search(
            query_embedding,
            top_k=top_k
        )
        return results
Problems we encountered:
- "Tell me about the café" returned café description + random events that mentioned coffee
- Character relationships were split across chunks, often incomplete
- The same location description was retrieved repeatedly (no frequency control)
- Timeline was scrambled (similarity ≠ chronology)
The PostgreSQL Version (What We Switched To)
from typing import List

class CoherentMemory:
    def get_scene_context(self, chapter: int, scene: int,
                          characters: List[str], location: str):
        """Return ONLY what's needed for this specific scene."""
        return {
            'location': self._get_location_details(location, already_narrated=True),
            'characters': self._get_characters(characters),
            'relationships': self._get_relationships_between(characters),
            'known_events': self._get_what_characters_know(characters),
            'recent_events': self._get_events_since(chapter - 1),
        }
Results:
- Location details are complete and not duplicated
- Only relationships between present characters are included
- Events are filtered by what each character actually knows
- Natural frequency control through already_narrated flags
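The helper methods are thin wrappers around SQL against the schema below. As an illustration, _get_relationships_between might look roughly like this (a sketch; it assumes db.query returns rows as dicts):

def _get_relationships_between(self, characters: list[str]) -> list:
    """Only relationships where BOTH endpoints are present in the scene."""
    return self.db.query("""
        SELECT ca.name AS character_a, cb.name AS character_b,
               cr.relationship_type, cr.description
        FROM character_relationships cr
        JOIN characters ca ON ca.id = cr.character_a
        JOIN characters cb ON cb.id = cr.character_b
        WHERE ca.name = ANY(%s)
          AND cb.name = ANY(%s)
    """, [characters, characters])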
The Schema
-- 16 tables for complete narrative state management
-- Characters and their attributes
CREATE TABLE characters (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) UNIQUE,
    full_name VARCHAR(200),
    age INT,
    occupation VARCHAR(100),
    background TEXT,
    cognitive_style VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE character_traits (
    character_id INT REFERENCES characters(id),
    trait VARCHAR(100),
    PRIMARY KEY (character_id, trait)
);

CREATE TABLE character_secrets (
    id SERIAL PRIMARY KEY,
    character_id INT REFERENCES characters(id),
    secret TEXT,
    known_by INT[] DEFAULT '{}'  -- Array of character IDs who know
);

-- Relationships (bidirectional)
CREATE TABLE character_relationships (
    character_a INT REFERENCES characters(id),
    character_b INT REFERENCES characters(id),
    relationship_type VARCHAR(50),
    description TEXT,
    PRIMARY KEY (character_a, character_b)
);

-- Locations with usage tracking (frequency control!)
CREATE TABLE locations (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) UNIQUE,
    description TEXT,
    atmosphere TEXT,
    usage_count INT DEFAULT 0,
    max_usage INT DEFAULT 4
);

CREATE TABLE location_details (
    id SERIAL PRIMARY KEY,
    location_id INT REFERENCES locations(id),
    detail TEXT,
    narrated BOOLEAN DEFAULT FALSE  -- Has this been used in the story?
);

-- Events with character knowledge tracking
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    chapter INT,
    scene INT,
    description TEXT,
    location_id INT REFERENCES locations(id),
    resolved BOOLEAN DEFAULT FALSE
);

CREATE TABLE event_participants (
    event_id INT REFERENCES events(id),
    character_id INT REFERENCES characters(id),
    role VARCHAR(50),  -- 'witness', 'actor', 'mentioned'
    PRIMARY KEY (event_id, character_id)
);

-- Who knows what? Automatic knowledge tracking
CREATE TABLE character_knowledge (
    character_id INT REFERENCES characters(id),
    event_id INT REFERENCES events(id),
    learned_in_chapter INT,
    PRIMARY KEY (character_id, event_id)
);

-- Style tracking (integrated repetition control)
CREATE TABLE style_patterns (
    id SERIAL PRIMARY KEY,
    pattern_type VARCHAR(50),
    pattern_text TEXT,
    chapter INT,
    usage_count INT DEFAULT 0,
    max_per_chapter INT DEFAULT 2
);
Key Queries
Get scene context (coherent segmentation):
-- Everything needed for Chapter 2, Scene 3, with Alice and Bob in the Library
SELECT json_build_object(
    'location', (
        SELECT json_build_object(
            'name', l.name,
            'description', l.description,
            'atmosphere', l.atmosphere,
            'unused_details', (
                SELECT json_agg(ld.detail)
                FROM location_details ld
                WHERE ld.location_id = l.id AND ld.narrated = FALSE
            )
        )
        FROM locations l WHERE l.name = 'Library'
    ),
    'characters', (
        SELECT json_agg(json_build_object(
            'name', c.name,
            'occupation', c.occupation,
            'traits', (SELECT array_agg(trait) FROM character_traits WHERE character_id = c.id),
            'cognitive_style', c.cognitive_style
        ))
        FROM characters c WHERE c.name IN ('Alice', 'Bob')
    ),
    'relationship', (
        SELECT json_build_object(
            'type', cr.relationship_type,
            'description', cr.description
        )
        FROM character_relationships cr
        WHERE (cr.character_a = 1 AND cr.character_b = 2)
           OR (cr.character_a = 2 AND cr.character_b = 1)
    ),
    'what_they_know', (
        SELECT json_agg(DISTINCT e.description)
        FROM events e
        JOIN character_knowledge ck ON e.id = ck.event_id
        WHERE ck.character_id IN (1, 2)
    )
) AS scene_context;
Check for overused patterns:
-- Before generating, check what patterns to avoid
SELECT pattern_text
FROM style_patterns
WHERE chapter = 2
AND usage_count >= max_per_chapter;
-- Returns: ["shoulders slumped", "heart raced"]
-- These get injected as "AVOID:" instructions
Update after scene generation:
-- Mark location details as narrated
UPDATE location_details
SET narrated = TRUE
WHERE location_id = 5 AND detail LIKE '%dusty shelves%';
-- Record that Alice now knows about the secret
INSERT INTO character_knowledge (character_id, event_id, learned_in_chapter)
VALUES (1, 42, 2);
-- Increment location usage
UPDATE locations
SET usage_count = usage_count + 1
WHERE name = 'Library';
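These write-backs naturally run together after each generated scene. A sketch of that step as a single transaction, assuming psycopg2; the function name and arguments are illustrative:

import psycopg2

def record_scene_outcome(conn, narrated_detail_ids, learned_events, location_name, chapter):
    """Apply all post-scene state updates in one transaction."""
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.execute(
            "UPDATE location_details SET narrated = TRUE WHERE id = ANY(%s)",
            (narrated_detail_ids,))
        for character_id, event_id in learned_events:
            cur.execute(
                """INSERT INTO character_knowledge (character_id, event_id, learned_in_chapter)
                   VALUES (%s, %s, %s) ON CONFLICT DO NOTHING""",
                (character_id, event_id, chapter))
        cur.execute(
            "UPDATE locations SET usage_count = usage_count + 1 WHERE name = %s",
            (location_name,))

# record_scene_outcome(psycopg2.connect("dbname=story"), [12, 13], [(1, 42)], 'Library', chapter=2)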
The Comparison
| Aspect | RAG | PostgreSQL |
|---|---|---|
| Setup complexity | High (embeddings, vector DB) | Medium (schema design) |
| Query type | Semantic similarity | Exact relational |
| Results | Top-k approximate | Complete exact |
| Debugging | Hard ("why this chunk?") | Easy (EXPLAIN) |
| Relationships | Manual in text | Native JOINs |
| Frequency control | Add-on | Native (usage_count) |
| Timeline | Lost in embeddings | ORDER BY timestamp |
| Knowledge tracking | Manual | Automatic with FKs |
| Cost | Vector DB + embeddings | Standard PostgreSQL |
| Best for | Unstructured exploration | Structured applications |
When RAG Still Makes Sense
I'm not saying RAG is useless. It has legitimate use cases:
✅ Use RAG When:
Data is truly unstructured — Chat logs, free-form notes, scraped web pages with no consistent format
Volume exceeds context — You have 10GB of documents and even the best chunking gives you 50MB of potentially relevant content
Discovery is the goal — "Find me something related to X" where you don't know what you're looking for
Approximate is acceptable — Recommendations, inspiration, brainstorming assistance
Data changes constantly — News feeds, social media, where re-indexing is expensive
❌ Skip RAG When:
Data has natural structure — Products, customers, transactions, policies, documentation
Total data fits in context — Most knowledge bases under 100k tokens
Precision matters — Legal, medical, financial, compliance
Relationships are important — "Which customers bought X and also complained about Y?"
You need auditability — "Why did the AI say this?" requires deterministic retrieval
The Hybrid Approach
You don't have to choose. Many systems benefit from both:
class HybridMemory:
    def __init__(self):
        self.postgres = PostgresMemory()  # Structured state
        self.rag = RAGMemory()            # Unstructured fallback

    def get_context(self, query, scene_requirements):
        # First: get structured context (deterministic)
        structured = self.postgres.get_scene_context(**scene_requirements)

        # Second: if there's room, add relevant unstructured content
        remaining_tokens = MAX_CONTEXT - count_tokens(structured)
        if remaining_tokens > 1000:
            unstructured = self.rag.search(query, max_tokens=remaining_tokens)
            return merge(structured, unstructured)

        return structured
The key insight: Structured first, RAG for gaps.
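count_tokens and MAX_CONTEXT are left abstract above. One common way to implement the counter is with tiktoken; a sketch, with the encoding name and window size as assumptions you should match to your own model:

import json
import tiktoken

MAX_CONTEXT = 200_000  # set to your model's actual context window

_encoding = tiktoken.get_encoding("cl100k_base")  # choose the encoding for your model

def count_tokens(payload) -> int:
    """Rough token count for a dict or string before it goes into the prompt."""
    text = payload if isinstance(payload, str) else json.dumps(payload, ensure_ascii=False)
    return len(_encoding.encode(text))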
Implementation Checklist
If you're considering PostgreSQL over RAG, here's a practical checklist:
1. Analyze Your Data
- [ ] Can it be structured into entities and relationships?
- [ ] Does it have natural categories, types, or hierarchies?
- [ ] Are there temporal aspects (timeline, versions, updates)?
- [ ] Do you need to track "who knows what"?
2. Estimate Token Budget
- [ ] Total structured data: _____ tokens
- [ ] Available context window: _____ tokens
- [ ] Ratio: If data < 50% of context, skip RAG
3. Design Schema
- [ ] Core entities identified
- [ ] Relationships mapped
- [ ] Usage tracking added (for frequency control)
- [ ] Knowledge tracking added (for multi-agent scenarios)
4. Build Context Serializer
- [ ] Function to dump relevant state as JSON/text
- [ ] Filters for scene/query relevance
- [ ] Token counting to stay within limits
5. Test Determinism
- [ ] Same query → same results?
- [ ] Can you explain why each piece of context was included?
- [ ] Edge cases handled (empty results, too many results)?
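The first item in that last group can literally be a two-line test. A sketch, assuming pytest and the CoherentMemory class from earlier:

def test_scene_context_is_deterministic():
    memory = CoherentMemory()
    scene = dict(chapter=2, scene=3, characters=["Alice", "Bob"], location="Library")
    first = memory.get_scene_context(**scene)
    second = memory.get_scene_context(**scene)
    assert first == second  # same query, same context, every time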
Conclusion: Think Before You Embed
RAG was a brilliant solution to the context length problem of 2022. In 2025, with 200k+ token context windows, it's often a solution in search of a problem.
Before building another RAG pipeline, ask:
- Does my data fit in context? If yes, just include it all.
- Is my data structured? If yes, use SQL queries, not semantic search.
- Do I need precision or exploration? Precision → PostgreSQL. Exploration → RAG.
- Am I using RAG because it's right, or because it's trendy?
The best architecture is the one that solves your actual problem with minimal complexity. Sometimes that's a sophisticated RAG pipeline with reranking and hybrid search. Sometimes it's a PostgreSQL database with good indexes and a JSON serializer.
Know the difference.
What's Next
In the next article, I'll show how we applied this PostgreSQL-based approach to build a multi-agent book generation system where AI characters have their own consciousness, memories, and voices. The coherent memory system described here is what allows characters to know things, remember events, and maintain consistent relationships across an entire novel.
Spoiler: When we switched from RAG to PostgreSQL, our characters stopped repeating themselves and started having coherent conversations.
Stay tuned.
If you found this useful, the code is open source: [GitHub link]
Appendix: Quick Reference
PostgreSQL Extensions for AI Workloads
-- Full-text search (built-in)
CREATE INDEX idx_fts ON documents
USING GIN(to_tsvector('english', content));
-- Vector similarity (pgvector extension)
CREATE EXTENSION vector;
CREATE INDEX idx_embedding ON documents
USING ivfflat (embedding vector_cosine_ops);
-- JSON querying (built-in)
SELECT data->>'name' FROM entities
WHERE data @> '{"type": "character"}';
When to Use What
┌─────────────────────────────────────────────────────────────┐
│ DECISION FLOWCHART │
├─────────────────────────────────────────────────────────────┤
│ │
│ Does your data have structure? │
│ │ │
│ ┌─────┴─────┐ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ PostgreSQL Can you add structure? │
│ ▲ │ │
│ │ ┌─────┴─────┐ │
│ │ YES NO │
│ │ │ │ │
│ └──────┘ ▼ │
│ Does it fit in context? │
│ │ │
│ ┌─────┴─────┐ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ Full dump RAG │
│ (no search) (with all its tradeoffs) │
│ │
└─────────────────────────────────────────────────────────────┘
Cost Comparison (Rough Estimates)
| Solution | Monthly Cost | Setup Time | Maintenance |
|---|---|---|---|
| Pinecone + OpenAI Embeddings | $70-500 | 1-2 weeks | Ongoing |
| Weaviate Cloud | $50-300 | 1-2 weeks | Ongoing |
| PostgreSQL on RDS | $20-100 | 2-3 days | Minimal |
| PostgreSQL + pgvector | $20-100 | 3-5 days | Minimal |
| SQLite (local) | $0 | 1 day | None |