After using Cursor for over a year on my side projects, I was convinced AI was coming for our jobs at the speed of light. But the more I used it, the more I realised that beneath all the “magic,” AI is still just code running on someone else’s machine.
And as a developer, that thought wouldn’t leave me alone. I wanted a peek behind the curtain. I wanted to see the magic myself.
So, to really understand how Cursor works under the hood, I decided to build my own mini version from scratch.
In this post, I’ll walk you through how I built it, explain the logic behind tool calling, and share a few code snippets so you can try it yourself.
How AI Tool Calling Works
The core idea behind Cursor - and most AI agents - is surprisingly simple.
When you ask an AI agent to “create a file” or “run a command,” it doesn’t actually execute anything itself.
The model has no access to the system, no shell, no files.
Instead, the LLM’s job is to decide which tool to call and what parameters to pass.
Think of it as a smart middleman orchestrating a conversation between your request and real code:
User Query → LLM → Tool → Result → LLM → Final Answer
The LLM detects intent, formats a structured command, and a separate process (i.e., our code) handles the real execution.
Here’s what really happens under the hood:
- We give a natural-language instruction: “Write ‘Hello World’ to a file.”
- LLM parses that intent and determines it should use the FileWriter tool.
- It returns a structured JSON-like command, e.g.:
{
  "tool": "FileWriter",
  "input": "Hello World"
}
- Our backend (the “executor”) runs the actual function associated with that tool.
- The tool’s output is then sent back to the LLM, which turns it into a human-readable response — “File written successfully.”
So, what feels like “AI doing work” is really AI orchestrating work — reasoning about what needs to be done, while our code actually does it.
That’s the exact mechanism Cursor, Copilot, and most agent frameworks like LangChain or LangGraph use under the hood.
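Stripped down, the executor side of this handshake is just a dictionary lookup followed by a function call. Here’s a minimal sketch reusing the FileWriter example above (the output.txt path is just an assumption for illustration):
import json

# Hypothetical tool implementation for the FileWriter example above
def file_writer(text: str) -> str:
    with open("output.txt", "w") as f:
        f.write(text)
    return "File written successfully."

TOOLS = {"FileWriter": file_writer}

# Pretend this JSON command came back from the LLM
llm_reply = '{"tool": "FileWriter", "input": "Hello World"}'
command = json.loads(llm_reply)

# Look up the requested tool and execute it with the given input
result = TOOLS[command["tool"]](command["input"])
print(result)  # -> File written successfully.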
Setup
We’ll start by setting up our environment and preparing the tools.
Let’s begin by installing the required dependencies:
pip install -q -U openai python-dotenv requests
We’ll need an OpenAI key to talk to the LLM. If you prefer a free option, you can run models locally using Ollama — just note that performance will depend on your hardware.
We can store this key in a .env file or export it from the terminal; I recommend the .env method.
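For the .env route, a single line is enough, since the OpenAI client picks up OPENAI_API_KEY from the environment by default:
OPENAI_API_KEY=sk-your-key-here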
Now, let’s load these environment variables and initialize the OpenAI client:
from dotenv import load_dotenv
from openai import OpenAI
import json
import requests
import os
# Load environment variables into runtime
load_dotenv()
# Create a new OpenAI client
client = OpenAI()
This ensures our API key is securely loaded and we’re ready to talk to the model.
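If you want to make sure the plumbing works before building anything on top, a one-off request is the quickest check (I’m using gpt-4.1 here, the same model as the rest of the post, but any chat model will do):
# Quick sanity check: send a trivial prompt and print the reply
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)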
Defining the System Prompt
Before we let the LLM start reasoning, we need to set the stage.
The system prompt defines how the model should behave — almost like giving it a rulebook.
In this case, the prompt tells the model to follow a clear reasoning process, making sure it uses the correct step-by-step structure, adheres to the JSON format, and only calls tools during the designated action step.
SYSTEM_PROMPT = """
You are a helpful AI assistant specialized in resolving user queries through structured reasoning and tool use.
You operate in a five-step cycle:
Start → Plan → Action → Observe → Output
For the given user query and available tools, plan the step-by-step execution.
Based on the planning, select the relevant tool from the available tools and, based on the tool selection, perform an action to call the tool.
Step Definitions:
- Start: Read and understand the user's query carefully.
- Plan: Create a step-by-step plan: which tool(s) to use and the exact inputs.
- Action: Execute exactly one tool per step using proper input parameters.
- Observe: Wait for the tool's output and summarize it.
- Output: Generate the final user-facing answer.
Rules:
- Always follow the sequence: start → plan → action → observe → output
- Perform one step at a time and wait for the next input.
- Only Action steps may call a tool (exactly one tool per Action).
- Each step must follow the Output JSON Format.
- Be concise, structured, and analytical.
Output JSON Format:
{
"step": "string",
"content": "string",
"function": "The name of the tool used, if the step is 'action'",
"input": "The input parameter for the tool",
"output": "The output from the selected tool, if the step is 'observe'"
}
Available Tools:
- "get_weather": Takes a city name as input and returns the current weather for the city.
- "run_command": Takes a Linux command string, executes it, and returns the output.
Examples:
EXAMPLE 1: Ask for a location's weather
User Query: What's the weather in New York?
Step-by-step Output:
{"step":"start","content":"User wants the current weather in New York."}
{"step":"plan","content":"I should use the 'get_weather' from the available tools list with input 'new york' to fetch the current weather."}
{"step":"action","function":"get_weather","input":"new york"}
{"step":"observe","output":"12°C, cloudy"}
{"step":"output","content":"The weather in New York is 12°C and cloudy."}
EXAMPLE 2: Run Linux Command
User Query: Show me the list of files in the current directory.
Step-by-step Output:
{"step": "start", "content": "User wants to see the files in the current directory."}
{"step": "plan", "content": "I should use the 'run_command' tool with input 'ls' to list the files."}
{"step": "action", "function": "run_command", "input": "ls"}
{"step": "observe", "output": "['main.py', 'notes.txt', 'data.csv']"}
{"step": "output", "content": "The directory contains: main.py, notes.txt, and data.csv."}
"""
# Set the initial system message that defines how the LLM should operate
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
This prompt gives the LLM a defined reasoning cycle and a JSON structure to communicate with our code.
You can freely modify it to experiment with different behaviours; the possibilities are endless.
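Since everything downstream depends on the model actually sticking to this JSON contract, it can help to validate each parsed step before acting on it. The validate_step helper below is my own addition, not something the agent strictly needs:
# Keys each step type is expected to carry, per the prompt's Output JSON Format
REQUIRED_BY_STEP = {
    "start": {"content"},
    "plan": {"content"},
    "action": {"function", "input"},
    "observe": {"output"},
    "output": {"content"},
}

def validate_step(parsed: dict) -> bool:
    """Return True if the parsed reply has a known step and the keys that step needs."""
    step = parsed.get("step")
    if step not in REQUIRED_BY_STEP:
        return False
    return REQUIRED_BY_STEP[step].issubset(parsed.keys())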
Defining the Tools (Agent’s Action Layer)
Now that our LLM knows how to think, let’s give it some tools so it can actually perform actions.
We’ll define two simple tools and register them in a dictionary.
# Fetch current weather for a given city
def get_weather(city: str):
    url = f"https://wttr.in/{city}?format=%C+%t"
    response = requests.get(url)
    if response.status_code == 200:
        return f"The weather in {city} is {response.text}"
    return "Something went wrong"
# Execute a shell command on the host system and return its output
def run_command(cmd: str):
    result = os.popen(cmd).read()
    return result
available_tools = {
    "get_weather": get_weather,
    "run_command": run_command
}
The get_weather function calls an external API to fetch weather info, while run_command executes a Linux command and returns its output.
These represent the action layer of our agent.
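Before wiring them into the agent loop, you can sanity-check both tools directly from a Python shell (the exact outputs will of course vary):
# Call the tools directly, without the LLM in the middle
print(get_weather("london"))      # e.g. "The weather in london is Partly cloudy +8°C"
print(run_command("echo hello"))  # e.g. "hello"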
Orchestrating the Agent
Time to wire it all up. This is where the LLM and the tools actually start talking to each other in a structured loop.
# Outer loop: keep taking new user queries
while True:
    query = input("> ")
    messages.append({"role": "user", "content": query})

    # Inner loop: process the LLM's step-by-step reasoning cycle
    while True:
        # Talk to the LLM
        response = client.chat.completions.create(
            model="gpt-4.1",
            response_format={"type": "json_object"},
            messages=messages
        )

        # Add the assistant's response to the current context
        messages.append({"role": "assistant", "content": response.choices[0].message.content})

        # Parse the JSON response
        parsed_response = json.loads(response.choices[0].message.content)

        # If the step is plan, print the LLM's reasoning
        if parsed_response.get("step") == "plan":
            print(f"🧠: {parsed_response.get('content')}")
            continue

        # If the step is action, print the tool and input, then call the appropriate tool
        if parsed_response.get("step") == "action":
            tool_name = parsed_response.get("function")
            tool_input = parsed_response.get("input")
            print(f"🛠️: Calling tool: {tool_name} with input: {tool_input}")

            # If the selected tool is registered, call it and feed its output
            # back into the conversation as an observe step
            if tool_name in available_tools:
                output = available_tools[tool_name](tool_input)
                messages.append({"role": "assistant", "content": json.dumps({"step": "observe", "output": output})})
            continue

        # If the step is output, print the final response from the LLM
        if parsed_response.get("step") == "output":
            print(f"🤖: {parsed_response.get('content')}")
            break
The outer loop waits for a new user query and adds it to the message history.
The inner loop handles the LLM’s reasoning cycle — it sends the full message context to the model, parses the JSON response, and decides what to do next.
- If the step is plan, we print the model’s reasoning.
- If the step is action, we call the corresponding tool with the provided input and feed the tool’s output back into the conversation as an observe step.
- When the model reaches the output step, we print the final answer and break out of the inner loop.
This loop continues, allowing the agent to plan, act, observe, and respond — just like Cursor-style agentic workflows.
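A nice property of this design is how cheap it is to extend: registering another function in available_tools (and describing it in the system prompt’s “Available Tools” list) is all it takes. For example, a hypothetical write_file tool could handle file writing without shelling out:
# Hypothetical extra tool: write text to a fixed file instead of using run_command
def write_file(text: str):
    with open("output.txt", "w") as f:
        f.write(text)
    return "Wrote output.txt"

# Register it alongside the existing tools; remember to also describe it
# in SYSTEM_PROMPT's "Available Tools" section so the model knows it exists.
available_tools["write_file"] = write_file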
Testing the Agent
Now that everything is wired together, let’s look at how the agent behaves with real queries.
Below are two examples that showcase tool-calling, multi-step reasoning, and the final output.
Fetching weather & writing to a file
I asked the agent to fetch the weather for a city and then write the result to a weather.txt file.
The LLM plans the steps, calls the get_weather tool, observes the output, and finally writes the result to a file using the run_command tool.
This demonstrates how the agent uses external tools to perform tasks beyond simple text generation.
Creating a todo app
The previous example felt pretty basic, so next I asked the agent to create a basic HTML/CSS/JS Todo App.
It breaks the task down, generates the code, and outputs the completed files.
Once generated, I opened the output locally to see what the AI built.
Pretty awesome, right?
This shows how tool-calling and structured reasoning can handle small, practical coding tasks with ease.
Wrapping Up
Building this mini agent made one thing clear: Cursor’s “magic” is really structured reasoning plus a straightforward tool-calling loop. The LLM plans the steps and picks the tools, our code executes them, and the result is a seamless experience that goes beyond plain text generation.
This small project taught me a few things:
- A good system prompt can define a full reasoning workflow.
- LLMs don’t do the tasks; they orchestrate them.
- Simple functions become powerful “tools” when paired with the model.
- Real-world tasks like fetching data, running commands, or generating code are just extensions of the same idea.
Turns out you don’t need the whole agentic-AI machinery to feel like something’s happening — a few Python functions plus an LLM doing backflips is enough to summon a tiny wizard of your own!


