Yaswanth bhuvanagiri

I Turned Any REST API into an AI-Powered Chatbot (Without Writing a Backend Twice)

“What if you could take any boring REST API… and talk to it like ChatGPT, without touching its backend code?”

I hacked this together over a weekend. No rewrites, no breaking changes — just a lightweight MCP server in Python sitting between my APIs and a Streamlit chatbot. The result? I can now chat with my APIs in plain English as if they were humans.

In this article, I’ll show you how I built it, step by step, so you can do the same — even if your backend is Java, Node, or something else entirely.

✨ TL;DR
Problem: Teams want AI features quickly but can’t rework existing services.
Solution: Add a small Python MCP server that registers REST endpoints as callable tools for an LLM — no upstream changes.
Outcome: AI access to any API within minutes, incremental rollout, and a central place for security and governance.

🤔 Why this is worth your 30 minutes
You’ve already got APIs running, and rewriting them for AI feels like starting over. Skip that frustration and put a thin adapter in front of them instead:

The LLM decides what tool to call.
The MCP server calls your existing API.
The LLM turns the JSON into a natural-language answer.
💡 Result: No refactor, no downtime, just a clean conversational layer.
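
To make that loop concrete, here is an illustrative exchange (the user, fields, and values are made up):

User: “Is user 3 on the pro plan?”
LLM → tool call: get_user(id=3)
MCP server → GET /users/3 → {"id": 3, "name": "Ada", "plan": "pro"}
LLM → “Yes, user 3 (Ada) is on the pro plan.”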

✅ Why this works (MCP)
The Model Context Protocol (MCP) is quickly becoming the standard way to let LLMs call tools and services safely. Instead of embedding business logic into your chatbot, you expose tools over MCP and let the AI call them when needed.

Think of it as turning APIs into “functions” that an LLM can use.

Many production systems are stable and important but weren’t designed for direct integration with LLMs. Rather than rewriting, the MCP server wraps those services so an LLM can safely call them. This gives you a fast path to AI features while preserving control over authentication, privacy, and reliability.

🧭 The approach — MCP Server
The MCP server acts as a thin adapter between an LLM and your APIs:

Register each REST endpoint as a tool in the server.
Expose the tools via a WebSocket JSON-RPC interface.
The LLM requests a tool call (name + args) → server calls the API → returns structured JSON → LLM converts to human text.
Key benefits: zero upstream changes, incremental adoption, centralized access control and redaction.
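
On the wire, a single round trip is just a JSON-RPC 2.0 request and response. With the get_user tool from the example below (the name and plan fields are illustrative):

Request from the chatbot client:

{"jsonrpc": "2.0", "id": 1, "method": "get_user", "params": {"id": 3}}

Response from the MCP server after it calls the upstream API:

{"jsonrpc": "2.0", "id": 1, "result": {"id": 3, "name": "Ada", "plan": "pro"}}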

🎨 Flowchart — How it all connects
Here’s the bird’s-eye view of how requests flow through the system:

Request flow through the MCP system
🏛️ Architecture — The modern view
This is the more detailed breakdown showing where the governance and safety layers sit:

🧩 Minimal MCP server (concise example)
A compact Python server that demonstrates the core pattern. In real use, you should add auth, validation, logging, and error handling.

# mcp_server.py — minimal MCP server
import asyncio
import json
import requests
import websockets
from typing import Any, Dict

TOOLS: Dict[str, Any] = {}

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

BASE_URL = "http://localhost:8080"  # point to your Spring Boot / backend

@tool
def get_user(id: int):
    r = requests.get(f"{BASE_URL}/users/{id}", timeout=5)
    if r.status_code == 404:
        return {"error": "not_found", "message": "User not found"}
    r.raise_for_status()
    # Optionally redact/summarize fields here
    return r.json()

@tool
def list_users():
    r = requests.get(f"{BASE_URL}/users", timeout=5)
    r.raise_for_status()
    return r.json()

async def handler(ws):
    async for raw in ws:
        req = json.loads(raw)
        method = req.get("method")
        params = req.get("params", {})
        if method in TOOLS:
            try:
                result = TOOLS[method](**params)
                resp = {"jsonrpc": "2.0", "id": req.get("id"), "result": result}
            except Exception as e:
                resp = {"jsonrpc": "2.0", "id": req.get("id"), "error": str(e)}
        else:
            resp = {"jsonrpc": "2.0", "id": req.get("id"), "error": f"method_not_found: {method}"}
        await ws.send(json.dumps(resp))

async def main():
    print("🚀 MCP server running at ws://0.0.0.0:8765")
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())

Extension idea: register a schema with each tool (JSON Schema / OpenAPI snippet) so the LLM can form correct arguments without trial and error.
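
A minimal sketch of that extension, reusing the TOOLS registry from mcp_server.py (the get_order endpoint here is hypothetical):

# schema_tools.py - sketch: attach a JSON Schema to each registered tool
from typing import Any, Dict

from mcp_server import TOOLS  # same registry the WebSocket handler dispatches on

TOOL_SCHEMAS: Dict[str, Dict[str, Any]] = {}  # tool name -> JSON Schema for its parameters

def tool_with_schema(schema: Dict[str, Any]):
    def decorator(fn):
        TOOLS[fn.__name__] = fn
        TOOL_SCHEMAS[fn.__name__] = schema
        return fn
    return decorator

@tool_with_schema({
    "type": "object",
    "properties": {"order_id": {"type": "number", "description": "Order ID"}},
    "required": ["order_id"],
})
def get_order(order_id: int):
    ...  # call GET /orders/{order_id}, mirroring get_user in mcp_server.py

With TOOL_SCHEMAS in place, the Streamlit agent can build its tools=[...] definitions from the server (for example via a list_tools method) instead of hard-coding them.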

💬 The Streamlit agent (two-pass pattern)
This two-pass pattern grounds the LLM in real tool output instead of letting it guess, and keeps the final responses polished.

Pass 1 — Plan: Provide system + user messages + tool metadata (names + param schemas). The LLM responds with either a direct answer or a tool call (function name + arguments).
Tool Execution: Server executes the tool and returns a concise JSON result.
Pass 2 — Synthesize: Feed the tool result back into the conversation as a tool role message. The LLM generates the final natural-language response referencing the actual data.
This also allows for safe clarification: if a tool call fails, the server returns an error object the LLM can use to ask the user a follow-up.

Pseudocode:

# Pseudocode: the real calls are shown in streamlit_agent.py below
first = client.chat.completions.create(model=model, messages=messages, tools=tool_defs)
msg = first.choices[0].message
if msg.tool_calls:
    tcall = msg.tool_calls[0]
    args = json.loads(tcall.function.arguments)
    result = call_mcp(tcall.function.name, args)
    final = client.chat.completions.create(
        model=model, messages=messages + [assistant_with_tool_call, tool_with_result]
    )
    display(final.choices[0].message.content)
else:
    display(msg.content)

Streamlit sends user messages to the LLM. If the model decides it needs live data, it returns a tool/function call. The frontend executes that via MCP, and then asks the model to produce the final reply.


# streamlit_agent.py — simplified sketch
import asyncio
import json
import streamlit as st
import websockets
from openai import OpenAI  # or whichever LLM client you use

client = OpenAI(api_key="...")

async def call_mcp(method, params):
    async with websockets.connect("ws://localhost:8765") as ws:
        req = {"jsonrpc":"2.0","id":1,"method":method,"params":params}
        await ws.send(json.dumps(req))
        return json.loads(await ws.recv())

st.title("Chat with your APIs")
if "messages" not in st.session_state: st.session_state.messages = []
msg = st.chat_input("Ask about users, orders, anything...")

if msg:
    st.session_state.messages.append({"role":"user","content":msg})
    # First LLM pass: let model plan or call tool
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=st.session_state.messages,
        tools=[{
            "type":"function",
            "function":{
                "name":"get_user",
                "description":"Fetch a user by ID",
                "parameters":{"type":"object","properties":{"id":{"type":"number"}},"required":["id"]}
            }
        }],
        tool_choice="auto"
    )
    assistant_msg = resp.choices[0].message
    # If model wants to call a tool
    if getattr(assistant_msg, "tool_calls", None):
        tcall = assistant_msg.tool_calls[0]
        args = json.loads(tcall.function.arguments)
        tool_result = asyncio.run(call_mcp(tcall.function.name, args))
        # Second LLM pass: synthesize a natural reply using tool output
        st.session_state.messages.append({"role":"assistant","content":None,"tool_calls":[tcall]})
        st.session_state.messages.append({"role":"tool","tool_call_id":tcall.id,"content":json.dumps(tool_result)})
        final = client.chat.completions.create(model="gpt-4o-mini", messages=st.session_state.messages)
        st.session_state.messages.append({"role":"assistant","content":final.choices[0].message.content})
    else:
        st.session_state.messages.append({"role":"assistant","content":assistant_msg.content})

# Render the conversation, skipping raw tool messages and empty tool-call placeholders
for m in st.session_state.messages:
    if m["role"] in ("user", "assistant") and m.get("content"):
        st.chat_message(m["role"]).write(m["content"])

🛡️ Practical tips (don’t skip)
Before exposing LLM → API access, ensure:

✅ Authentication & authorization — prevent unauthorized queries and ensure the model only sees data the user is allowed to access.
✅ Always do the second LLM pass — otherwise users see raw JSON.
✅ Redact sensitive fields before sending data back to the LLM.
✅ Validate input arguments with pydantic/jsonschema (see the sketch after this list).
✅ Add rate limiting & caching.
✅ Keep audit logs (tool names, args hashes, latencies).
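
A minimal sketch of the validation and redaction items, assuming pydantic v2 and hypothetical sensitive field names (ssn, password, email):

# guards.py - sketch: validate tool arguments and redact fields before the LLM sees them
from pydantic import BaseModel, ValidationError

from mcp_server import get_user  # the tool defined earlier

class GetUserArgs(BaseModel):
    id: int  # rejects missing or non-integer values

SENSITIVE_FIELDS = {"ssn", "password", "email"}  # adapt to your own payloads

def redact(payload: dict) -> dict:
    # Drop sensitive keys before the JSON is handed back to the model
    return {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}

def safe_get_user(raw_args: dict) -> dict:
    try:
        args = GetUserArgs(**raw_args)
    except ValidationError as e:
        # Structured error the LLM can turn into a clarification question
        return {"error": "invalid_arguments", "message": str(e)}
    return redact(get_user(args.id))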

⚠️ Common pitfalls & quick fixes
Problem: Model returns a malformed tool call.
Fix: Validate tool args server-side and return a structured error the model can surface as a clarification question.

Problem: Returning entire database rows (PII leak).
Fix: Use field-level redaction and only expose required fields or summaries.

Problem: Model makes too many calls and overloads the API.
Fix: Implement per-user and per-tool rate limits plus short-term caches for repeated reads (see the sketch below).
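
A minimal in-memory sketch of that fix, assuming a single server process (swap in Redis or similar for anything multi-instance):

# throttle.py - sketch: naive per-tool rate limit plus a short-lived read cache
import time
from functools import wraps

def throttled_cached(max_per_minute: int = 30, ttl_seconds: int = 15):
    calls: list = []   # timestamps of recent calls to this tool
    cache: dict = {}   # (tool name, sorted args) -> (timestamp, result)

    def decorator(fn):
        @wraps(fn)
        def wrapper(**params):
            now = time.time()
            calls[:] = [t for t in calls if now - t < 60]  # keep only the last minute
            key = (fn.__name__, tuple(sorted(params.items())))
            if key in cache and now - cache[key][0] < ttl_seconds:
                return cache[key][1]  # repeated read served from cache
            if len(calls) >= max_per_minute:
                return {"error": "rate_limited", "message": "Too many requests, try again shortly"}
            calls.append(now)
            result = fn(**params)
            cache[key] = (now, result)
            return result
        return wrapper
    return decorator

Stack it beneath @tool (i.e. @tool on top, @throttled_cached() below) so the registered tool is the throttled wrapper and the upstream API is only hit when allowed.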

📈 Extensions & scaling ideas
Tool discovery (GET /tools) — enable UIs to show available actions and param hints.
OpenAPI → tools pipeline — auto-generate tool definitions and schemas from existing specs so onboarding is faster (see the sketch after this list).
Multi-tenant routing — map tool calls to different backend clusters or tenants.
Caching layer — speed up repeated read queries (orders, product info).
Streaming & chunking — support long-running tasks or paginated data by streaming partial results back to the model.

⏱️ Quick 30-minute checklist
Clone or copy the MCP server code.
pip install websockets requests pydantic.
Add 2–3 read-only tools (users, orders, accounts).
Run mcp_server.py + Streamlit.
Ask “Show me user with id 3.”
Harden with auth, validation, and redaction.

🚀 Final takeaway
You don’t need to rebuild your backend for AI. By plugging an MCP server in front of your existing APIs, you unlock conversational access in minutes. It’s fast to ship, centrally governed, and leaves your existing services untouched.
