LLM, RAG, AGENT And MCP

🧠 Modern AI Systems – A Practical, End-to-End Mental Model

Goal: Understand how LLMs, RAG, AI Agents, and MCP fit together to build real production AI systems, not demos.

Modern AI is not one model.
It is a system of responsibilities.

LLM   → understands & reasons with language
RAG   → retrieves correct knowledge
Agent → decides & performs actions
MCP   → standardizes context & tools

Each layer exists because the previous one cannot solve real-world problems alone.


1️⃣ LLM (Large Language Model) – The Brain

What an LLM actually is (no marketing)

An LLM is a probabilistic model trained to predict:

“What token comes next?”

That’s it.

It does not:

  • think like a human
  • reason independently
  • “know” facts by default

It predicts patterns extremely well.


Mental model (important)

🧠 A very powerful autocomplete engine

Example:

"The capital of France is ___"

→ “Paris”

Not because it understands geography,
but because that sequence appears frequently in training data.


What LLMs are good at

✅ Natural language understanding
✅ Writing & summarization
✅ Translation
✅ Code generation
✅ Reasoning within provided context


Critical weaknesses

❌ Hallucination (confidently wrong answers)
❌ No access to private/company data
❌ No long-term memory
❌ Knowledge cutoff
❌ Cannot take actions (no APIs, no workflows)

👉 LLMs alone are not usable in production systems.

This leads to the next layer.


2️⃣ RAG (Retrieval-Augmented Generation) – Giving the Brain Memory

Problem RAG solves

“How can the LLM answer questions using our private, up-to-date data without retraining it?”


Core idea (simple)

Instead of:

User → LLM → Answer (may hallucinate)

We do:

User → Retrieve relevant data → LLM → Answer grounded in data

LLM = language & reasoning
RAG = knowledge retrieval


How RAG works (step by step)

1. Prepare data

  • Split documents into chunks
  • Convert text → vectors (embeddings)
  • Store in a vector database

2. User asks a question

  • Question → vector

3. Similarity search

  • Retrieve the most relevant chunks

4. Prompt the LLM

SYSTEM:
Answer only using the provided context.

CONTEXT:
[retrieved documents]

USER:
What is the refund policy?

5. LLM answers

  • Grounded
  • Auditable
  • Much lower hallucination risk
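
To make steps 2–5 concrete, here is a minimal TypeScript sketch of the query path. The embed, vectorStore, and callLlm helpers are hypothetical placeholders for your embedding model, vector database client, and LLM provider:

// Hypothetical helpers standing in for real SDK calls
declare function embed(text: string): Promise<number[]>;
declare const vectorStore: { search(v: number[], opts: { topK: number }): Promise<{ text: string }[]> };
declare function callLlm(messages: { role: "system" | "user"; content: string }[]): Promise<string>;

async function answerWithRag(question: string): Promise<string> {
  // 2. Question → vector
  const queryVector = await embed(question);

  // 3. Similarity search: fetch the most relevant chunks
  const chunks = await vectorStore.search(queryVector, { topK: 5 });

  // 4. Prompt the LLM with the retrieved context only
  const context = chunks.map((c) => c.text).join("\n---\n");
  return callLlm([
    { role: "system", content: "Answer only using the provided context. If the answer is not there, say you don't know." },
    { role: "user", content: `CONTEXT:\n${context}\n\nQUESTION:\n${question}` },
  ]);
}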

Why RAG is mandatory in real systems

Without RAG:

  • AI makes things up
  • Legal and compliance risk
  • Users lose trust

With RAG:

  • Accurate answers
  • Data can be updated anytime
  • No model retraining

RAG tooling ecosystem

Embedding models

  • OpenAI embeddings
  • Cohere
  • SentenceTransformers
  • BGE / E5 / Instructor

Vector databases

  • Pinecone
  • Weaviate
  • Qdrant
  • Milvus
  • FAISS
  • MongoDB Atlas Vector Search
  • Elasticsearch (vector)

Frameworks

  • LangChain
  • LlamaIndex
  • Haystack

RAG limitation

RAG can:
✔ answer questions

RAG cannot:
❌ decide what to do
❌ call APIs
❌ run workflows

That requires the next layer.


3️⃣ AI Agent – Giving the Brain Hands & Goals

Problem agents solve

“I don’t just want answers – I want the AI to do things.”

Example:

“Check my order, see if it’s delayed, open a ticket, notify me.”

This is multi-step work.


What an AI Agent is

An AI Agent =

LLM
+ tools
+ memory
+ decision loop

Core agent loop (critical concept)

1. Observe (input & state)
2. Reason (LLM)
3. Choose action
4. Execute tool
5. Observe result
6. Repeat until goal achieved

This is often called a ReAct loop (Reason + Act).
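
A minimal sketch of that loop in TypeScript. The llm.decide call and the tools registry are hypothetical stand-ins for your model call and tool implementations; the useful part is the shape of the loop and the hard step limit:

// Hypothetical stand-ins for the LLM call and the available tools
declare const llm: {
  decide(history: string[]): Promise<
    | { type: "final_answer"; text: string }
    | { type: "tool_call"; tool: string; args: Record<string, unknown> }
  >;
};
declare const tools: Record<string, (args: Record<string, unknown>) => Promise<string>>;

async function runAgent(goal: string, maxSteps = 8): Promise<string> {
  const history = [`GOAL: ${goal}`];

  for (let step = 0; step < maxSteps; step++) {
    // 2. Reason: ask the LLM what to do next, given everything observed so far
    const decision = await llm.decide(history);

    // 3–4. Choose the action and execute the tool
    if (decision.type === "final_answer") return decision.text;
    const observation = await tools[decision.tool](decision.args);

    // 5. Observe the result and loop again
    history.push(`ACTION: ${decision.tool}(${JSON.stringify(decision.args)})`, `OBSERVATION: ${observation}`);
  }
  return "Stopped: max steps reached before the goal was completed.";
}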


Example agent tools

Tool               Purpose
search_docs        RAG search
get_order_status   Backend API
create_ticket      CRM / Support
send_email         Notification
write_db           Memory / Logging

When to use (and not use) agents

Use agents when:

  • Multi-step reasoning is required
  • Tools must be orchestrated
  • Decisions depend on outcomes

Do NOT use agents for:

  • Static FAQs
  • Simple Q&A
  • Single-step tasks

Agents are:

  • Slower
  • More expensive
  • Harder to debug

Agent tooling ecosystem

Frameworks

  • LangChain Agents
  • OpenAI Assistants
  • AutoGen
  • CrewAI
  • Semantic Kernel

Execution

  • REST / gRPC
  • Function calling
  • Webhooks

Memory

  • Redis
  • PostgreSQL
  • Vector databases
  • In-memory stores

4️⃣ MCP (Model Context Protocol) – The Nervous System

This is architecture-level, not prompt engineering.


The scaling problem MCP solves

As systems grow:

  • Prompts duplicated everywhere
  • Tools defined inconsistently
  • Context assembled differently per service
  • Agents break when tools change
  • Models tightly coupled to apps

This becomes prompt spaghetti 🍝


What MCP is (plain English)

MCP is a protocol that standardizes how models:

  • discover tools
  • receive context
  • access resources

Think of it as:

📡 A REST API for LLM context and capabilities


Mental model

LLM / Agent
   ↓
MCP Server
   ↓
Tools | Data | Memory | Capabilities

The model does not guess what it can do.
It discovers capabilities via MCP.
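
As a rough illustration, each capability an MCP server exposes is a named tool described by a JSON Schema, which clients can list at runtime. The exact wire format is defined by the MCP spec; the shape below is only indicative:

// Indicative shape of a tool definition an MCP server could advertise
const getOrderStatusTool = {
  name: "get_order_status",
  description: "Look up the current status of an order by its ID",
  inputSchema: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
};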


Why MCP matters

With MCP:

  • Clean architecture
  • Tool discoverability
  • Model-agnostic systems
  • Easier testing & maintenance

Without MCP:

  • Hidden coupling
  • Fragile agents
  • Hard-to-replace models

MCP ecosystem

  • OpenAI MCP
  • Anthropic MCP
  • Custom MCP servers
  • Integrations: databases, filesystems, APIs, GitHub

5️⃣ Real-World Project – End-to-End System

Project: AI Customer Support Assistant (E-commerce)


Requirements

  • Answer policy questions
  • Check order status
  • Handle refunds
  • Escalate to humans when needed

Architecture

Chat UI
  ↓
Backend API (e.g. NestJS)
  ↓
MCP Server
  ↓
Agent
  ↓
RAG + Business Tools

Component responsibilities

LLM

  • Language understanding & reasoning

RAG

  • Product docs
  • Refund & shipping policies
  • FAQs in vector DB

Agent

  • Decide when to search
  • Call order APIs
  • Create tickets
  • Escalate issues

MCP

  • Defines tools:

    • search_knowledge_base
    • get_order_status
    • create_support_ticket
  • Provides clean, consistent context


Example user flow

User:

“My order hasn’t arrived. What should I do?”

Agent:

  1. Retrieve shipping policy (RAG)
  2. Call order status API
  3. Evaluate delay
  4. Decide next action
  5. Respond or open ticket

No hallucination
No hardcoded prompts
Fully scalable


🧠 Final Mental Model (memorize this)

LLM   → understands language
RAG   → retrieves truth
Agent → performs actions
MCP   → organizes everything

🎯 Target Project (One Project, Many Levels)

AI Knowledge & Action Assistant for a Company

  • Answers questions from internal docs
  • Can take actions (create tickets, generate reports)
  • Safe, auditable, scalable

Stack:

  • Frontend: Next.js
  • Backend: NestJS
  • AI: LLM + RAG + Agents + MCP
  • Infra: Docker, Env-based config

PHASE 0 – Mental Model (Day 0)

Before writing code, understand this flow:

UI → API → AI Core → Tools → Result → UI

Everything you build later fits somewhere here.

If you don’t know where a piece belongs, don’t code it.


PHASE 1 – LLM Basics (Beginner)

⏱ Time: 1–2 days
🎯 Goal: “I can talk to an LLM safely via backend”


1.1 What you build

A simple chat API:

POST /chat
{
  "message": "Explain SOLID principles"
}

Response:

LLM text

1.2 Architecture (minimal but correct)

Next.js
  ↓
NestJS Controller
  ↓
AI Service
  ↓
LLM Provider (OpenAI / Gemini / Claude)
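
A minimal sketch of that stack in NestJS, assuming the official openai Node SDK (the model name and system prompt are placeholders; a Gemini or Claude SDK slots into AiService the same way):

// Minimal Phase 1 surface area (controller + service shown together for brevity)
import { Body, Controller, Injectable, Post } from "@nestjs/common";
import OpenAI from "openai";

@Injectable()
export class AiService {
  // The API key lives only on the backend, read from the environment
  private client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  async chat(message: string): Promise<string> {
    const res = await this.client.chat.completions.create({
      model: "gpt-4o-mini", // placeholder model name
      messages: [
        { role: "system", content: "You are a concise technical assistant." },
        { role: "user", content: message },
      ],
      max_tokens: 500, // basic token limit
    });
    return res.choices[0]?.message?.content ?? "";
  }
}

@Controller("chat")
export class ChatController {
  constructor(private readonly ai: AiService) {}

  @Post()
  async chat(@Body("message") message: string) {
    return { reply: await this.ai.chat(message) };
  }
}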

1.3 Key lessons here (VERY important)

✅ Backend owns AI calls

Never call LLM directly from Next.js.

Why:

  • API key security
  • Rate limiting
  • Observability
  • Cost control

✅ Prompt ≠ Message

Start separating:

  • system prompt
  • user prompt

This prepares you for agents later.
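
For example, keep the system prompt as a named constant and build the message list in one place, instead of concatenating strings ad hoc (illustrative sketch):

// One place that decides what the model is told, separate from what the user typed
const SYSTEM_PROMPT = "You are a support assistant. Be concise and never invent data.";

function buildMessages(userMessage: string) {
  return [
    { role: "system" as const, content: SYSTEM_PROMPT },
    { role: "user" as const, content: userMessage },
  ];
}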


1.4 Common beginner mistakes

❌ Hardcoding API keys
❌ No timeout handling
❌ No token limits
❌ Trusting LLM output blindly


Exit criteria

✔ You can explain what an LLM can & cannot do
✔ You understand tokens & costs
✔ You never expose LLM keys to frontend


PHASE 2 – RAG (Intermediate Foundation)

⏱ Time: 3–5 days
🎯 Goal: “My AI answers using MY data”


2.1 What you add

  • Document ingestion
  • Embeddings
  • Vector search

2.2 Architecture upgrade

User Question
  ↓
Vector Search
  ↓
Relevant Chunks
  ↓
LLM (context injected)
  ↓
Answer + Sources

2.3 What you actually build

Backend

  • /documents/upload
  • /documents/index
  • /ask

Storage

  • Raw files (S3 / local)
  • Vector DB

2.4 Chunking (don’t skip this)

Bad chunking = bad AI.

Rules:

  • 300–800 tokens per chunk
  • Overlap ~10–20%
  • Keep semantic meaning intact
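
A naive version of that chunker, as a sketch. It counts characters as a rough proxy for tokens; a real pipeline would use a tokenizer and prefer splitting on headings or paragraphs so meaning stays intact:

// Naive overlapping chunker (character counts approximate token counts)
function chunkText(text: string, chunkSize = 2000, overlap = 300): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}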

2.5 Prompt discipline (critical)

Your prompt should say:

“Answer ONLY using provided context.
If missing, say you don’t know.”

This single rule prevents 80% of hallucinations.


Common RAG failures

❌ Stuffing too much context
❌ No metadata filtering
❌ No source citation
❌ Treating vector DB as magic


Exit criteria

✔ AI answers correctly from internal docs
✔ Hallucination rate is low
✔ You can swap vector DB without rewriting logic


PHASE 3 – Structured AI Core (Pre-Agent)

⏱ Time: 2–3 days
🎯 Goal: “AI logic is modular and testable”


3.1 Why this phase exists

If you jump straight to agents:

💥 You will create an un-debuggable mess

So first: structure the AI core.


3.2 Introduce these concepts

  • Prompt templates
  • Output schemas (JSON)
  • AI “use cases”

Example:

AnswerQuestionUseCase
SummarizeDocUseCase
ExtractTasksUseCase

Each one:

  • Has input
  • Has prompt
  • Has expected output
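
A sketch of what one such use case can look like, assuming the zod library for the output schema and a hypothetical callLlm wrapper around your provider:

import { z } from "zod";

declare function callLlm(prompt: string): Promise<string>; // hypothetical provider wrapper

// Output schema: the only shape the rest of the system accepts
const TaskList = z.object({
  tasks: z.array(z.object({ title: z.string(), dueDate: z.string().optional() })),
});

export async function extractTasksUseCase(docText: string) {
  // Input + prompt template
  const prompt = `Extract action items from the document below as JSON matching {"tasks":[{"title":"...","dueDate":"..."}]}.\n\n${docText}`;

  // Expected output: parse + validate, so bad responses fail loudly (and can be retried)
  const raw = await callLlm(prompt);
  return TaskList.parse(JSON.parse(raw));
}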

3.3 This unlocks later

  • Tool calling
  • Agents
  • Validation
  • Retries

Exit criteria

✔ AI responses are structured
✔ You can validate outputs
✔ You can test AI logic without UI


PHASE 4 – AI Agents (Action Layer)

⏱ Time: 4–7 days
🎯 Goal: “AI can plan and act, not just talk”


4.1 What changes conceptually

From:

Request → LLM → Response

To:

Goal → Think → Act → Observe → Repeat

4.2 What you build

Agent with:

  • Goal
  • Memory
  • Tool registry
  • Stop conditions

4.3 Example Agent

Goal:

“Create a weekly report and open tasks”

Tools:

  • search_docs
  • create_jira_ticket
  • generate_markdown

4.4 Critical safety rules

  • Max steps
  • Max tokens
  • Tool allowlist
  • Read-only vs write tools

This is non-negotiable in production.
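
One way to make those rules explicit is a small limits object that the agent loop consults before every tool call (names below are placeholders, not a framework API):

// Safety limits the agent loop checks on every step
interface AgentLimits {
  maxSteps: number;
  maxTotalTokens: number;
  allowedTools: string[]; // tool allowlist
  writeTools: string[];   // tools with side effects that need approval
}

const limits: AgentLimits = {
  maxSteps: 10,
  maxTotalTokens: 20_000,
  allowedTools: ["search_docs", "generate_markdown", "create_jira_ticket"],
  writeTools: ["create_jira_ticket"],
};

function assertToolAllowed(tool: string, humanApproved: boolean): void {
  if (!limits.allowedTools.includes(tool)) throw new Error(`Tool not in allowlist: ${tool}`);
  if (limits.writeTools.includes(tool) && !humanApproved) throw new Error(`Approval required for: ${tool}`);
}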


Common agent failures

❌ Infinite loops
❌ Too much autonomy
❌ No human approval
❌ No logs


Exit criteria

✔ Agent completes tasks reliably
✔ You can stop it at any time
✔ Every action is logged


PHASE 5 – MCP (Production Tooling Layer)

⏱ Time: 3–5 days
🎯 Goal: “Safe, scalable tool integration”


5.1 What MCP gives you

  • Tool discovery
  • Strong schemas
  • Permission control
  • Replaceable tools

5.2 Architecture

Agent
 ↓
MCP Client
 ↓
MCP Server
 ↓
Tool Implementations
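
Conceptually, the client side of that chain does two things: discover tools, then call them by name with schema-checked arguments. The interface below is illustrative only; the actual MCP SDK API differs in details:

// Conceptual MCP client (illustrative interface, not the real SDK)
interface McpClient {
  listTools(): Promise<{ name: string; description: string; inputSchema: object }[]>;
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

async function runWithMcp(client: McpClient) {
  const tools = await client.listTools(); // tool discovery, no hardcoding
  console.log(tools.map((t) => t.name));  // e.g. ["search_docs", "create_jira_ticket"]
  return client.callTool("search_docs", { query: "refund policy" });
}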

5.3 Why MCP matters in production

Without MCP:

  • Hardcoded tools
  • Unsafe execution
  • Tight coupling

With MCP:

  • Clean contracts
  • Auditing
  • Enterprise-ready

Exit criteria

✔ Tools are schema-defined
✔ Permissions are enforced
✔ Agents can’t “invent” tools


PHASE 6 – Production Hardening

⏱ Time: ongoing
🎯 Goal: “This won’t wake me up at 3AM”


6.1 Mandatory production features

🔐 Security

  • API auth
  • Tool permissions
  • Input sanitization

📊 Observability

  • Prompt logs
  • Token usage
  • Agent step traces

💰 Cost control

  • Token budgets
  • Rate limits
  • Model tiers

6.2 Human-in-the-loop

For risky actions:

  • Show plan
  • Ask approval
  • Then execute
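
A sketch of that gate: the agent proposes an action, a human approves or rejects it, and only then does anything execute (the tool registry and approval callback are hypothetical):

declare const tools: Record<string, (args: Record<string, unknown>) => Promise<unknown>>;

interface ProposedAction {
  tool: string;
  args: Record<string, unknown>;
  reason: string; // the "plan" shown to the human
}

async function executeWithApproval(
  action: ProposedAction,
  requestApproval: (a: ProposedAction) => Promise<boolean>,
) {
  // 1–2. Show the plan and wait for a decision
  if (!(await requestApproval(action))) return { status: "rejected" as const };

  // 3. Only then execute
  const result = await tools[action.tool](action.args);
  return { status: "executed" as const, result };
}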

PHASE 7 – Scaling & Multi-Agent

⏱ Time: advanced
🎯 Goal: “AI team, not AI bot”

Examples:

  • Planner agent
  • Executor agent
  • Reviewer agent

Each has one responsibility.


FINAL MENTAL MODEL (Memorize This)

Phase 1: LLM → Brain
Phase 2: RAG → Knowledge
Phase 3: Structure → Discipline
Phase 4: Agent → Action
Phase 5: MCP → Safety
Phase 6: Production → Survival
