Threshika Vijayakumar

Posted on Jun 10

The Day I Realized RAG Isn't an AI Problem

#ai #rag #llm #machinelearning

When I first started learning Retrieval-Augmented Generation (RAG), I thought the hardest part would be understanding Large Language Models.

I was wrong.

I thought I would spend most of my time:

Choosing the best LLM
Writing better prompts
Tweaking model parameters

Instead, I ended up spending most of my time thinking about search.

And that's when something clicked:

RAG isn't primarily an AI problem.

It's a search problem.

Let me explain.

The Mental Model Most Beginners Have

When we first interact with ChatGPT or any AI assistant, we imagine something like this:

Question
   ↓
  AI
   ↓
Answer

Simple, right?

Ask a question.

Get an answer.

But once you start building applications with your own data, this model breaks.

Fast.

My First "Wait... Why Is This Wrong?" Moment

Imagine asking an AI:

"What happened in yesterday's IPL match?"

Or:

"What's the latest version of this framework?"

Or:

"What does page 42 of this PDF say?"

The model might answer confidently.

The problem?

It may not actually know.

Why?

Because LLMs don't magically know everything.

They only know what was in their training data, up to a cutoff date.

Anything newer than that, or private to you (like your own PDF), never reached the model.

And that's exactly the problem RAG tries to solve.

What I Thought RAG Was

When I first heard about RAG, I imagined something extremely complicated.

Maybe:

Multiple AI models
Complex reasoning systems
Fancy prompt engineering tricks

But after digging deeper, I realized RAG is surprisingly simple.

The biggest surprise wasn't the "Generation" part.

It was the "Retrieval" part.

In a RAG system, the answer isn't generated immediately. The system first searches for relevant information, gathers context, and only then asks the LLM to generate a response.

This was the moment I started seeing RAG as a search problem rather than an AI problem.

The Library Analogy That Made Everything Click 📚

Imagine walking into a library with one million books.

You ask:

"How do black holes affect time?"

What would a librarian do?

Probably not this:

Read 1,000,000 books
      ↓
Find answer

Instead:

Find relevant books
      ↓
Open relevant pages
      ↓
Read only what's needed
      ↓
Answer

That's basically RAG.

The system first finds relevant information.

Then the LLM uses that information to generate an answer.
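To make that second step concrete, here is a minimal sketch of how retrieved passages might be handed to the LLM. The function name `build_prompt`, the prompt wording, and the example passages are my own assumptions for illustration, not a standard API.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user's question into one prompt."""
    # Number each passage so the model (and a human reader) can refer to it.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n"
        "If the context is not enough, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages that a retrieval step might have returned.
passages = [
    "Time passes more slowly in stronger gravitational fields.",
    "Near a black hole's event horizon, gravitational time dilation becomes extreme.",
]
prompt = build_prompt("How do black holes affect time?", passages)
print(prompt)
```

The model never sees the million books. It only sees the handful of passages that retrieval decided were worth reading.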

Why Traditional Search Isn't Enough

Let's say a document contains:

Electric vehicles are becoming more popular.

Now imagine the user searches:

Why are battery-powered cars growing in popularity?

Keyword matching struggles: the two sentences share almost no meaningful words.

Humans don't.

We instantly understand both sentences are talking about the same thing.

Machines need help understanding that connection.
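You can see the gap with a few lines of Python. This is only a rough sketch of naive keyword matching: the `keywords` helper and the tiny stop-word list are assumptions made up for this example, and real keyword search engines are more sophisticated than this.

```python
import re

# A tiny, hand-picked stop-word list for this example only.
STOP_WORDS = {"are", "why", "in", "the", "a", "is", "of"}


def keywords(text: str) -> set[str]:
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return {w for w in words if w not in STOP_WORDS}


document = "Electric vehicles are becoming more popular."
query = "Why are battery-powered cars growing in popularity?"

shared = keywords(document) & keywords(query)
print(shared)  # → set()
```

Not a single meaningful word overlaps, even though both sentences are about the same topic.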

This is where embeddings enter the story.

The Concept That Changed Everything: Embeddings

Embeddings sounded scary when I first heard the term.

In reality, the idea is beautiful.

We convert text into numbers.

Something like:

"car"
      ↓
[0.12, -0.55, 0.89, ...]

"vehicle"
      ↓
[0.15, -0.51, 0.91, ...]

The exact numbers don't matter.

What matters is this:

Similar meanings produce similar vectors.

Which means:

car
vehicle
automobile

end up close together in vector space.

Now the machine can search by meaning instead of exact words.

That's huge.
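"Close together" is usually measured with something like cosine similarity. Here is a small sketch using the toy vectors from above, cut down to three dimensions. The `banana` vector is a number I invented to stand in for an unrelated word, so treat the values as illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions or more.
car = [0.12, -0.55, 0.89]
vehicle = [0.15, -0.51, 0.91]
banana = [-0.80, 0.30, 0.10]  # made-up vector for an unrelated word

print(cosine_similarity(car, vehicle))  # close to 1.0
print(cosine_similarity(car, banana))   # much lower
```

In a real system the vectors would come from an embedding model rather than being typed by hand, but the comparison step looks the same.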

The Most Underrated Part of RAG: Chunking

When people talk about RAG, they usually talk about:

OpenAI
Gemini
Claude
Vector Databases

But one of the most important decisions happens before any of that.

Chunking

Imagine storing an entire 200-page book as a single document.

A user asks about one sentence.

Good luck retrieving that precisely.

Instead we split content into smaller chunks:

Document
   ↓
Chunk 1
Chunk 2
Chunk 3
Chunk 4
...

Now retrieval becomes much more precise.

One thing I've learned:

Bad chunking can destroy a RAG system.

Even when everything else is configured correctly.
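As a starting point, here is a minimal sketch of one common approach: fixed-size chunks of words with a small overlap, so a sentence cut at a boundary still appears whole in a neighbouring chunk. The function name `chunk_words` and the default sizes are assumptions for illustration; good values depend on your documents.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Twelve placeholder words so the overlap is easy to see.
text = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(text, size=5, overlap=2):
    print(chunk)
```

Splitting on sentences, paragraphs, or headings instead of raw word counts is often a better fit, but the trade-off is the same: chunks small enough to be precise, large enough to keep their context.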

So Why Do We Need Vector Databases?

After creating embeddings, we need somewhere to store them.

That's where vector databases come in.

Traditional databases answer questions like:

SELECT * FROM users
WHERE name = 'John';

Vector databases answer questions like:

Find content most similar
to this question

That's a completely different problem.

And it's what makes semantic search possible.
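To show what "find content most similar" means in practice, here is a brute-force sketch that scans every stored vector and keeps the best matches. It is a stand-in, not how any particular vector database works internally: real ones typically rely on approximate nearest-neighbor indexes so they don't have to compare against everything. The vectors and the `top_k` helper are made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, store, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hand-written toy vectors; a real system would get these from an embedding model.
store = [
    ("Electric vehicles are becoming more popular.", [0.9, 0.1, 0.0]),
    ("Black holes bend space and time.", [0.0, 0.2, 0.9]),
    ("Charging stations are spreading across cities.", [0.8, 0.3, 0.1]),
]
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of the battery-powered cars question

for score, text in top_k(query_vec, store, k=2):
    print(f"{score:.3f}  {text}")
```

There is no `WHERE` clause with an exact match here. The result is a ranking, and the top of that ranking is what gets passed to the LLM.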

Popular options include:

PostgreSQL + pgvector
Pinecone
Weaviate
Qdrant
Milvus

What Actually Happens Inside a RAG Pipeline?

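Here's the simplified flow:

Documents
   ↓
Split into chunks
   ↓
Create embeddings
   ↓
Store in vector database
   ↓
User question → embedding
   ↓
Retrieve most similar chunks
   ↓
LLM generates answer from retrieved context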

Notice something?

The LLM appears near the end.

Most of the work happens before generation.
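Here is a rough end-to-end sketch of that ordering, with the embedding function and the LLM passed in as plain callables. Everything named here (`build_index`, `answer`, `toy_embed`, `fake_llm`) is hypothetical. In particular, `toy_embed` is just a word counter and `fake_llm` only echoes its prompt, so the example runs without any external service.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str], embed: Callable[[str], Vector]) -> list[tuple[str, Vector]]:
    """Ingestion: embed every chunk once and keep it next to its text."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def answer(question, index, embed, generate, k=2):
    """Query time: embed, retrieve, build a prompt, and only then generate."""
    q_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


# Stand-ins so the sketch runs on its own.
VOCAB = ["electric", "vehicles", "black", "holes", "time", "popular"]


def toy_embed(text: str) -> Vector:
    """Word-count vector over a tiny vocabulary; NOT a real semantic embedding."""
    words = text.lower().replace(".", "").replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


def fake_llm(prompt: str) -> str:
    """Placeholder for a real model call; just echoes the prompt it received."""
    return prompt


index = build_index(
    ["Electric vehicles are becoming more popular.", "Black holes slow down time nearby."],
    toy_embed,
)
print(answer("Why are electric vehicles popular?", index, toy_embed, fake_llm, k=1))
```

Count the lines: the call to `generate` is the very last step, and everything before it is search.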

The Biggest Lesson From My RAG Journey

When I started learning RAG, I thought:

Better model = better answers

Now I think:

Better retrieval = better answers

Because even the most powerful model can't answer questions if the relevant information never reaches it.

That's why experienced engineers spend so much time improving:

Chunking
Embeddings
Search quality
Metadata filtering
Reranking

Answer quality often depends more on retrieval than on generation.
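Two of those items, metadata filtering and reranking, are easy to picture in code. This is a sketch under my own assumptions: the `Chunk` class, the sample data, and the `word_overlap` scorer are invented for illustration. In practice the second-pass scorer is often a dedicated reranking model, not a word counter.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float  # similarity score from the first-pass vector search
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(chunks, **required):
    """Keep only chunks whose metadata matches every required key/value."""
    return [c for c in chunks if all(c.metadata.get(k) == v for k, v in required.items())]


def rerank(chunks, query, scorer, top_n=2):
    """Re-order candidates with a second, usually more expensive, scorer."""
    return sorted(chunks, key=lambda c: scorer(query, c.text), reverse=True)[:top_n]


def word_overlap(query: str, text: str) -> int:
    """Hypothetical stand-in scorer: counts shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Made-up candidates, as if returned by a first-pass vector search.
candidates = [
    Chunk("Refunds are issued within 14 days.", 0.81, {"year": 2024}),
    Chunk("Refund policy for 2019 orders.", 0.84, {"year": 2019}),
    Chunk("Shipping takes 3 to 5 days.", 0.79, {"year": 2024}),
    Chunk("Refunds can take 14 days to appear on your statement.", 0.77, {"year": 2024}),
]

current = filter_by_metadata(candidates, year=2024)
best = rerank(current, "how many days do refunds take", word_overlap, top_n=2)
for chunk in best:
    print(chunk.text)
```

Notice that the chunk with the highest first-pass score (the 2019 policy) never reaches the model, because the filter removed it before ranking even started.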

Final Thoughts

The most surprising thing I've learned about RAG is that it changed the way I think about AI systems.

I used to believe the intelligence lived entirely inside the model.

Now I realize a huge part of the intelligence comes from finding the right information at the right time.

And that's why I no longer see RAG as just an AI technique.

I see it as a search problem that happens to use AI.

And honestly?

That realization taught me more about modern AI than any prompt engineering tutorial ever did.

💡 What's the most surprising thing you've learned while building or learning RAG? I'd love to hear your experience in the comments.

DEV Community