
Khush Panchal

Posted on • Originally published at Medium

What even is an AI Agent?… Isn’t it just an API Call?

If you’ve been hearing “agents will do everything” and thinking…

“Isn’t this just a GPT API with a fancy wrapper?”

You’re not alone.

Here’s the simple truth:

  • An LLM (like GPT-5) is like a very powerful text-predicting brain.
  • An Agent is the system you build around that brain.

Let’s simplify what happens behind the scenes. This is a beginner’s guide, so we’ll skip the jargon and explain how things work in layman’s terms.

LLM — The “black box” that answers in English

Let’s start super basic.

An LLM (Large Language Model) is trained to do one main thing:

Given some text, predict what text should come next.

It doesn’t “think” the way humans do. It predicts the next pieces of text using patterns it learned from a vast amount of reading. That’s literally how large language models are trained.

Let’s understand this a bit more with an example.

What happens when you ask: “How does a Car Engine work?”

In the simplest possible story:

  • Your sentence is split into tokens (numbers — computers love numbers).
  • The model uses a huge set of learned numbers (the “black box” — the LLM) to predict the next token.

In short, this LLM is using Machine Learning behind the scenes.

It is fed with lots of data (scraped from the internet over the years), learns patterns from that data, and when a new set of numbers is given, it predicts the best possible next response as numbers — which are eventually converted back to text again.

That’s it.

Mini LLM pseudo code

def mini_llm(prompt_text):  
    # 1) Break text into tokens (tiny chunks) because computers work on numbers  
    token_ids = tokenize(prompt_text)  

    # 2) Predict next token again and again  
    while not stop_condition(token_ids):  
        next_id = black_box_predict_next(token_ids)  # "best next number" (probability-based)  
        token_ids.append(next_id)  

    # 3) Convert numbers back to readable text  
    return to_text(token_ids)

This “predict next token” idea, and the repeated loop, is a standard way to describe how modern text-generation systems (LLMs) produce outputs.
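To make the "predict next token" loop concrete, here is a runnable toy version. It uses word-level bigram counts from a tiny hard-coded corpus instead of a trained neural network, so every name here (`follows`, `tiny_generate`, the corpus itself) is an illustration, not how a real LLM is built; only the loop structure matches.

```python
import random

# Toy "training data": the model only ever sees this one sentence.
corpus = "the car engine burns fuel and the engine turns the wheels".split()

# "Training": learn which word tends to follow which (bigram patterns).
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, []).append(nxt)

def predict_next(tokens):
    # "Best next token" = a token seen after the last one during training.
    candidates = follows.get(tokens[-1])
    return random.choice(candidates) if candidates else None

def tiny_generate(prompt, max_tokens=5):
    tokens = prompt.split()
    for _ in range(max_tokens):
        nxt = predict_next(tokens)
        if nxt is None:  # stop condition: nothing was ever seen after this token
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(tiny_generate("the engine"))
```

A real LLM replaces the bigram lookup with a neural network over billions of learned numbers, but the generation loop (predict, append, repeat until a stop condition) is the same shape.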

Some proprietary model examples are GPT-5, Gemini 2.5 Pro, Claude Opus 4.5, and so on.

Now say you take any model and ask:

“What’s happening today in India?”

It will not be able to answer properly on its own.

As we discussed, a model is something pre-trained with lots of data, and now it understands how to answer questions based on that knowledge. But if you ask the model about today or any current information, it will not reliably know the answer.

So how do we solve that?

Train the model daily?

Issue — Training big models is crazy expensive (like millions of dollars).

That’s why a pure pretrained model may confidently answer “how engines work” but struggle with “what happened today”.

But then how is it that when you type the same query in ChatGPT, it is able to answer properly?

And this is where RAG comes in.

RAG in one line: Search + Attach + Answer

RAG stands for:

Retrieval‑Augmented Generation

  • Retrieval = go and fetch the right info (like searching)
  • Augmented = attach that info to your question
  • Generation = write the final answer

They don’t “retrain the brain”.

They connect the brain to fresh information through external sources.

A beginner mental model:

RAG turns the AI from “closed book exam” into an “open book exam”.

Now if you ask “what’s today’s news?”, the system can call a News API, get today’s news, and then silently update the actual prompt sent to the LLM by attaching the data it received from the News API.

Now the LLM knows what to answer, and you see the response you actually wanted.

That’s RAG — passing an external source of information to the LLM.

Mini RAG pseudo code (simple version)

def mini_rag(question):  
    # Step 1: Retrieval (search some knowledge source)  
    context = search_news(question)  # returns response from News API  

    # Step 2: Augment (attach context to the question)  
    prompt = f"""  
    Use the CONTEXT to answer the QUESTION.  

    CONTEXT:  
    {context}  

    QUESTION:  
    {question}  
    """  

    # Step 3: Generation (write the answer)  
    return mini_llm(prompt)
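The RAG pseudo code above can be run end-to-end with a couple of stand-ins. Here `fake_news_index` replaces a real News API and `fake_llm` replaces a real model call; both are hypothetical names for this sketch, not real services.

```python
# Stand-in for a News API: a tiny hard-coded "index" of headlines.
fake_news_index = {
    "india": "Hypothetical headline: monsoon arrives early in Kerala.",
}

def search_news(question):
    # Step 1: Retrieval, done as a naive keyword lookup instead of an API call.
    for keyword, headline in fake_news_index.items():
        if keyword in question.lower():
            return headline
    return "No matching headlines found."

def fake_llm(prompt):
    # Stand-in for the LLM: it just echoes the context it was given.
    context = prompt.split("CONTEXT:")[1].split("QUESTION:")[0].strip()
    return f"Based on the provided context: {context}"

def mini_rag(question):
    context = search_news(question)                       # Retrieval
    prompt = (                                            # Augmentation
        f"Use the CONTEXT to answer the QUESTION.\n\n"
        f"CONTEXT:\n{context}\n\nQUESTION:\n{question}\n"
    )
    return fake_llm(prompt)                               # Generation

print(mini_rag("What's happening today in India?"))
```

Swap `search_news` for a real API call and `fake_llm` for a real model call, and the flow is unchanged: retrieve, attach, generate.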

But now a big question:

How does the AI system know that it has to call search_news()?

What if the user asks “What’s the weather in India today?”

This time you need a weather API, not a news API.

This is where MCP comes into the picture.

MCP: A protocol for tools, like HTTP is a protocol for web APIs

In normal software, we have a simple idea:

HTTP is a standard way for apps to talk to servers.

Because of HTTP, your app can call 1,000 different APIs in a predictable way.

MCP (Model Context Protocol) tries to do something similar for AI systems:

A standard way for an AI app to connect to external tools and data sources.

It was introduced as an open standard to make secure, two-way connections between AI apps and tools/data.

MCP is the “common language” between your agent and your tools.

So how does MCP help us solve the problem?

MCP has a structure where each MCP server carries a description of what it does. We call that metadata.

For example, a News MCP might have:

{
  "name": "news.latest",
  "description": "Get today's latest headlines for a topic"
}

When called, this News MCP internally uses a news API to fetch the data and returns the response.

So now the question changes to:

How does the LLM know which MCP to call?

So the general flow becomes:

When we ask “What’s today’s news?”, the AI system silently adds the metadata of all available MCPs to the prompt request sent to the LLM and then asks the LLM: if you don’t know the answer directly, choose the correct MCP that does know.

And the LLM, based on its strength in understanding English and descriptions, can correctly identify which MCP should be called for which question.

Then we can call search_news(), and again go back to the LLM — this time with the news information — and perform our RAG.
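The “attach all tool metadata, let the model choose” step above can be sketched like this. `TOOL_METADATA`, `build_prompt`, and `pick_tool` are made-up names for this illustration, and `pick_tool` fakes the LLM’s decision with keyword matching; a real system sends the descriptions to the model and lets it choose.

```python
# Metadata for every available MCP tool, as described in the article.
TOOL_METADATA = [
    {"name": "news.latest", "description": "Get today's latest headlines for a topic"},
    {"name": "weather.current", "description": "Get the current weather for a location"},
]

def build_prompt(question):
    # Silently add the metadata of all available tools to the prompt.
    tool_lines = "\n".join(f"- {t['name']}: {t['description']}" for t in TOOL_METADATA)
    return (
        f"Available tools:\n{tool_lines}\n\n"
        f"If you don't know the answer directly, reply with the name of the tool to call.\n\n"
        f"Question: {question}"
    )

def pick_tool(question):
    # Stand-in for the LLM reading the descriptions and choosing a tool.
    if "weather" in question.lower():
        return "weather.current"
    if "news" in question.lower():
        return "news.latest"
    return None

print(pick_tool("What's the weather in India today?"))  # weather.current
```

The point is that tool choice is driven entirely by those plain-English descriptions, which is exactly the part LLMs are good at.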

MCP vs RAG (the clean difference) — RAG is the idea: retrieve info, attach it, answer. MCP is the standard connection method to tools and data.

Agent: the backend “manager” that makes automation real

Now we reach the main hook of this blog:

An agent is not the brain. It’s the system.

The LLM is the “brain that writes”.

The Agent is the system that:

  • reads your request,
  • decides if a tool is needed,
  • calls that tool,
  • and then asks the LLM to write the final reply using tool results.

This “tool-calling flow” is explicitly a multi-step conversation:

send tools → model chooses tool → we execute tool → send tool output → model produces final response

Big picture architecture: apps → backend agent → tools/LLM


This matches how tool calling is designed in modern LLM APIs: the application (backend) sits in the middle, executes tools, and feeds results back to the model.

The simple “latest news today” flow you send to the LLM

User question:

“What’s the latest news today in India?”

We’ll show the exact path, and what data is sent at each step.

Key detail: the model first receives the tool menu, not the tool result

When your backend contacts the LLM initially, it sends:

  • the question,
  • tool descriptions (what tools exist, what they do, what inputs they accept)

Step-by-step sequence diagram


The “model asks for a tool, app executes it, then model answers” pattern is exactly how tool calling works.

Example pseudo code

TOOLS = [  
  {  
    "name": "news.latest",  
    "description": "Get today's latest headlines for a topic",  
    "inputs": {"query": "text"}  
  }  
]  

def agent_answer(question):  
    # 1) Ask the LLM with the tool descriptions (menu)  
    first = llm_call(  
        prompt=question,  
        tools=TOOLS  
    )  

    # 2) If LLM says: "use the news tool", backend runs it  
    if first.type == "tool_request" and first.tool_name == "news.latest":  
        news = mcp_call("news.latest", {"query": first.args["query"]})  

        # 3) Ask LLM again, now including the fresh news data  
        final = llm_call(  
            prompt=f"Question: {question}\nFresh news data: {news}\nExplain simply.",  
            tools=[]  
        )  
        return final.text  

    # If no tool needed, return what model wrote directly  
    return first.text

Again: the LLM doesn’t “browse”.

Your backend agent does the tool calls, then the model writes the final message.

One more tiny example: “2 + 2”

Here’s the beginner-friendly punchline:

LLMs are great at language, but for guaranteed maths, you can use a Calculator MCP.

So your agent can do:

  • If question looks like maths → use Calculator MCP
  • Then ask the LLM to explain nicely.

This is the same tool calling loop, just with a different tool.
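Here is a runnable toy of that routing decision. `looks_like_maths`, `calculator_mcp`, and `agent` are illustrative names, and a real agent would let the LLM make the routing choice and phrase the final answer; this sketch hard-codes both for clarity.

```python
import re

def looks_like_maths(question):
    # Crude check: does the question contain something like "2 + 2"?
    return bool(re.search(r"\d\s*[-+*/]\s*\d", question))

def calculator_mcp(expression):
    # Stand-in for a Calculator MCP; eval() is fine for a toy demo only.
    return eval(expression)

def agent(question):
    if looks_like_maths(question):
        expr = re.search(r"\d+\s*[-+*/]\s*\d+", question).group()
        result = calculator_mcp(expr)
        # A real agent would now ask the LLM to phrase this nicely.
        return f"The answer is {result}."
    return "No tool needed; answer directly with the LLM."

print(agent("What is 2 + 2?"))  # The answer is 4.
```

The benefit: the arithmetic is guaranteed correct because a tool computed it, while the LLM only handles the language around it.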

Calendar agent

A calendar agent works because you give the backend:

  • a calendar tool (via MCP),
  • and the LLM can choose it based on the tool description.

So:

“Schedule a meeting tomorrow 5 pm”

becomes:

  • model asks to call calendar.create_event,
  • backend calls it,
  • model confirms.

That “agent loop with tools + optional human approvals” is also a core pattern described in agent systems.
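The three calendar steps above can be sketched in the same shape as the news agent. Everything here is hypothetical: `mcp_call`, `fake_model_decision`, and the `calendar.create_event` tool are stand-ins, and the “tomorrow 17:00” parse is hard-coded where a real LLM would extract it from the request.

```python
created_events = []

def mcp_call(tool_name, args):
    # Stand-in for calling a Calendar MCP server.
    if tool_name == "calendar.create_event":
        created_events.append(args)
        return {"status": "created", "event": args}
    raise ValueError(f"Unknown tool: {tool_name}")

def fake_model_decision(request):
    # A real LLM would parse the request; we hard-code the parsed result.
    return {"tool": "calendar.create_event",
            "args": {"title": "Meeting", "when": "tomorrow 17:00"}}

def calendar_agent(request):
    decision = fake_model_decision(request)                 # model asks to call the tool
    result = mcp_call(decision["tool"], decision["args"])   # backend calls it
    return f"Done: {result['event']['title']} at {result['event']['when']}"  # model confirms

print(calendar_agent("Schedule a meeting tomorrow 5 pm"))
```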

The simplest takeaway

If you remember only one thing, remember this:

  • LLM = Machine learning brain that predicts text (token-by-token)
  • RAG = Give the brain fresh notes before it speaks
  • MCP = A protocol that plugs tools into your AI system
  • Agent = Your backend system that connects brain + tools + loop

That’s why “agents” are not just an API call.

They are system design.

And once you see that, all those “agent agent agent” discussions stop sounding like magic — and start sounding like engineering.

Contact Me: LinkedIn | Twitter

Happy coding! ✌️
