<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vansh Uttam</title>
    <description>The latest articles on DEV Community by Vansh Uttam (@vansh_uttam).</description>
    <link>https://dev.to/vansh_uttam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2475056%2F312108ff-9ebd-4141-b9cf-47376e11d771.jpg</url>
      <title>DEV Community: Vansh Uttam</title>
      <link>https://dev.to/vansh_uttam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vansh_uttam"/>
    <language>en</language>
    <item>
      <title>🎯MCP vs Direct API Calls</title>
      <dc:creator>Vansh Uttam</dc:creator>
      <pubDate>Thu, 12 Feb 2026 19:42:01 +0000</pubDate>
      <link>https://dev.to/vansh_uttam/mcp-vs-direct-api-calls-587f</link>
      <guid>https://dev.to/vansh_uttam/mcp-vs-direct-api-calls-587f</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t57zabh22tzlwvbjnq6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8t57zabh22tzlwvbjnq6.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
MCP is designed to abstract the complexity of traditional APIs (which were built for developers not for AI models).&lt;br&gt;
While in direct API Calls we have to hardcode every possible interaction. e.g we have to explicitly define every endpoint and parameter in advance. And a lot of integration burden for every individual API (e.g REST, GraphQL).&lt;/p&gt;

&lt;p&gt;🤔What is MCP used for?&lt;/p&gt;

&lt;p&gt;We use MCP when building AI agents and applications that need dynamic, autonomous, context-aware interactions with external tools and data. For example, when you ask an LLM "what is the weather in California?", it has no real-time data, so it makes an external tool call to a weather API, fetches the result, and serves it back to us.&lt;/p&gt;
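
&lt;p&gt;That weather flow can be sketched in plain Python. Everything here (the get_weather tool, the dispatch function) is a hypothetical stand-in to show the shape of a tool-call round trip, not a real MCP SDK:&lt;/p&gt;

```python
# Minimal sketch of an LLM tool-call round trip (hypothetical, no real SDK).
# The model asks for a tool by name; the host looks it up and returns the result.

TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21, "condition": "sunny"},
}

def handle_tool_call(name, arguments):
    """Dispatch a model-requested tool call to the matching local tool."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"error": f"unknown tool: {name}"}
    return tool(**arguments)

# The model cannot answer "what is the weather in California?" from its
# training data, so it emits a tool call; the host executes it:
result = handle_tool_call("get_weather", {"city": "California"})
```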

&lt;p&gt;🧩MCP solves the following problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Dynamic Tool Discovery: It allows AI agents to dynamically discover and understand available tools and their capabilities at runtime.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Standardization: It provides a single, unified protocol for AI-tool communication, acting as a universal connector, much like USB-C. Once an AI agent understands MCP, it can potentially use any MCP-compliant service, which reduces the integration burden on developers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context and State Management: It supports stateful sessions and bidirectional context streaming, allowing AI agents to maintain conversation history and build upon previous interactions. Traditional APIs, by contrast, are stateless, requiring developers to manually manage and pass all necessary context with each independent request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enhanced Security: Direct API access for AI models can be risky, exposing sensitive API keys and potentially allowing models to make unintended or malicious requests. MCP acts as a controlled layer that abstracts raw credentials and network details away from the AI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simplified Workflow: MCP enables the creation of powerful, high-level tools that abstract complex, multi-step operations into a single AI command.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
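
&lt;p&gt;The first point, dynamic tool discovery, is the key difference from hardcoded integrations. Below is a rough sketch of the idea using a made-up ToolServer class; a real MCP server exposes its tool list over JSON-RPC, which this does not implement:&lt;/p&gt;

```python
# Sketch of runtime tool discovery (illustrative only; the real MCP wire
# protocol is JSON-RPC based and is not implemented here).

class ToolServer:
    """A server that can describe its own tools at runtime."""
    def __init__(self):
        self._tools = []

    def register(self, name, description, params):
        self._tools.append({"name": name, "description": description, "params": params})

    def list_tools(self):
        # An agent calls this first, then decides which tool fits the task.
        return list(self._tools)

server = ToolServer()
server.register("get_weather", "Current weather for a city", {"city": "string"})
server.register("convert_currency", "Convert an amount between currencies",
                {"amount": "number", "from": "string", "to": "string"})

# The agent discovers capabilities instead of having them hardcoded:
available = [t["name"] for t in server.list_tools()]
```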

&lt;p&gt;📝Important: Real-world systems often use a hybrid approach. MCP is optimized for AI-specific tasks, but it offers less fine-grained control, is a poorer fit for workflows that must be predictable and stable (e.g. payment processing or user authentication), and adds latency because of its multi-step workflow involving reasoning and context exchange.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>llm</category>
      <category>api</category>
    </item>
    <item>
      <title>How Caching Helps in LLM Applications</title>
      <dc:creator>Vansh Uttam</dc:creator>
      <pubDate>Thu, 12 Feb 2026 19:37:09 +0000</pubDate>
      <link>https://dev.to/vansh_uttam/how-caching-helps-in-llm-application-1e2a</link>
      <guid>https://dev.to/vansh_uttam/how-caching-helps-in-llm-application-1e2a</guid>
      <description>&lt;p&gt;🤔What is caching?&lt;/p&gt;

&lt;p&gt;Caching is the technique of storing frequently accessed data in a temporary, high-speed storage layer (e.g. Redis). It avoids recomputing the same request on the server and reduces latency.&lt;/p&gt;

&lt;p&gt;🤝How does it help with LLM API calls?&lt;/p&gt;

&lt;p&gt;A traditional API call typically costs a database fetch or some computation on the server.&lt;/p&gt;

&lt;p&gt;An LLM API call, by contrast, is priced in “tokens”: the request we send as a client is billed as input tokens, and the response we receive from the model is billed as output tokens.&lt;/p&gt;

&lt;p&gt;We pay per token, which gets really expensive when users request the same query again and again. So we implement caching: store each frequently asked query along with its response.&lt;/p&gt;

&lt;p&gt;💭Here is an example:&lt;/p&gt;

&lt;p&gt;User A - “What are some good places to visit in Japan?”&lt;/p&gt;

&lt;p&gt;User B - “I want to visit Japan. What are some good spots?”&lt;/p&gt;

&lt;p&gt;Both prompts are semantically similar, so it makes sense to cache the response once and serve it multiple times.&lt;/p&gt;
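
&lt;p&gt;Serving one cached answer for semantically similar prompts is usually done with embeddings. The sketch below substitutes a crude word-overlap (Jaccard) score for a real embedding model, just to show the shape of the cache lookup; call_llm is a stub standing in for the real API call:&lt;/p&gt;

```python
# Semantic cache sketch. Jaccard word overlap stands in for embedding
# similarity here; a production system would compare embedding vectors.

def similarity(a, b):
    """Crude word-overlap score in [0, 1]; a stand-in for cosine similarity."""
    wa = set(a.lower().split())
    wb = set(b.lower().split())
    return len(wa.intersection(wb)) / len(wa.union(wb))

CACHE = []  # list of (prompt, response) pairs

def call_llm(prompt):
    # Stubbed model response; in reality this is the expensive, billed call.
    return "Kyoto, Tokyo and Osaka are popular."

def cached_answer(prompt, threshold=0.4):
    for stored_prompt, response in CACHE:
        if similarity(prompt, stored_prompt) >= threshold:
            return response  # cache hit: no LLM call, no tokens billed
    response = call_llm(prompt)  # cache miss: pay for the call once
    CACHE.append((prompt, response))
    return response
```

&lt;p&gt;A real system would also need an eviction policy (e.g. a TTL in Redis) and a carefully tuned similarity threshold.&lt;/p&gt;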

&lt;p&gt;Caching reduces not only your API bill but also latency: a cache hit typically returns in 10-20 ms, while generating a fresh response can take 3-5 seconds.&lt;/p&gt;

&lt;p&gt;🦾Best use cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;FAQ bots&lt;/li&gt;
&lt;li&gt;RAG with repeated queries&lt;/li&gt;
&lt;li&gt;Fixed-prompt generation&lt;/li&gt;
&lt;li&gt;Educational/learning apps&lt;/li&gt;
&lt;li&gt;Anywhere facts are involved&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;🫸When to avoid Caching?&lt;/p&gt;

&lt;p&gt;Although caching helps with optimization, it is not always a good idea to cache every response. In some situations we strictly avoid caching:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Legal/medical context&lt;/li&gt;
&lt;li&gt;User-specific data&lt;/li&gt;
&lt;li&gt;Personalized outputs&lt;/li&gt;
&lt;li&gt;Responses involving real-time data (e.g. stock prices, currency conversion rates)&lt;/li&gt;
&lt;li&gt;Where creativity is the priority (temperature ≠ 0)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But how do we distinguish one request from another?&lt;/p&gt;

&lt;p&gt;We use hashing. Several important parameters need to be passed into the hash function, among them the temperature (a measure of randomness), the model, and the prompt.&lt;/p&gt;
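
&lt;p&gt;One way to build such a key, as a minimal sketch (the parameter set here is illustrative, not exhaustive):&lt;/p&gt;

```python
import hashlib
import json

def cache_key(model, temperature, prompt):
    """Deterministic cache key: same (model, temperature, prompt) -> same key."""
    payload = json.dumps(
        {"model": model, "temperature": temperature, "prompt": prompt},
        sort_keys=True,  # stable ordering so equal inputs always hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

&lt;p&gt;The key can then be used directly as a Redis key, with the cached response as its value.&lt;/p&gt;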

&lt;p&gt;🤔What about chats involving multiple interactions? In that case we hash the full conversation history.&lt;/p&gt;

&lt;p&gt;💲Here is a short analysis of caching vs no caching:&lt;/p&gt;

&lt;p&gt;Suppose each API call costs $0.0001&lt;/p&gt;

&lt;p&gt;Without caching:&lt;/p&gt;

&lt;p&gt;100,000 identical prompts = $10 😵‍💫&lt;/p&gt;

&lt;p&gt;With caching:&lt;/p&gt;

&lt;p&gt;100,000 identical prompts = $0.0001 😎&lt;/p&gt;
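
&lt;p&gt;The arithmetic behind those numbers, spelled out:&lt;/p&gt;

```python
cost_per_call = 0.0001  # dollars per API call, from the example above
requests = 100_000      # identical prompts

without_cache = requests * cost_per_call  # every request hits the API
with_cache = 1 * cost_per_call            # only the first request does
```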

&lt;p&gt;📝Note: Caching comes with tradeoffs, so it is better to analyze carefully whether to use it, or to use a hybrid approach instead.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>redis</category>
    </item>
  </channel>
</rss>
