Your LLM just picked up a calculator and a search engine. Here's how.
Day 1 of PromptFromZero showed you the ReAct loop: Thought → Action → Observation, repeat until done. It looked clean. It was also a bit of a lie — the "tool calls" in that demo were hardcoded fakes. The model didn't actually call anything.
Day 7 fixes that. We wire real tools into the loop, and suddenly the model can do math without hallucinating, look up live facts, send emails, read files — whatever function you hand it.
This is the pattern behind Perplexity, Cursor's agent mode, and most of the "agentic" products you've seen in the last two years. It's surprisingly simple once you see the seams.
The problem with a naked LLM
Ask a language model to multiply 23 × 47 and it'll probably get it right. Ask it to multiply 7,391 × 8,847 and it starts making stuff up. Ask it what the population of France is right now and it'll give you a number from its training data, which is already months or years stale.
This isn't a bug — it's the architecture. An LLM is a frozen text predictor. It has no running process, no clock, no internet. It can only predict the next token based on patterns it memorised during training.
ReAct + Tools is the patch.
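To make the arithmetic point concrete, here is a minimal sketch of the kind of exact computation a calculator tool does. The helper name `multiplyExact` is illustrative and not part of the series' code; it uses BigInt so the result stays exact even past JavaScript's safe-integer range.

```javascript
// Illustrative helper (an assumption, not from the original demo):
// exact integer multiplication, returned as a string.
// BigInt avoids the precision loss that Number hits above 2^53 - 1.
function multiplyExact(a, b) {
  return (BigInt(a) * BigInt(b)).toString();
}

console.log(multiplyExact(23, 47));     // prints 1081
console.log(multiplyExact(7391, 8847)); // prints 65388177
```

Code gives the same answer every time, which is exactly what a token predictor cannot promise.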
The format that makes it work
You give the model a system prompt that establishes three things:
- The Thought label — the model must always write its reasoning before acting
- The Action format — a specific string pattern your code can parse
- The tool list — what tools exist, what they do, how to call them
Here's a minimal system prompt:
You are a ReAct agent.
Available tools:
- calculator[expr] — evaluates a math expression, e.g. calculator[23*47]
- search[query] — looks up a factual question, e.g. search[population of France]
For EVERY step, write:
Thought: <your reasoning>
Action: <tool_name>[<input>]
When you have the final answer, write:
Final Answer: <your answer>
Never skip the Thought. Never write Action and Final Answer in the same step.
That's it. The model now knows it's supposed to reason out loud, pick a tool, and format the call in a way your code can intercept.
A worked trace: "What is 23 × 47 + the population of France?"
Here's what a well-prompted model produces, step by step:
Thought: I need two things — the result of 23 × 47 (math) and the
population of France (a fact I should look up). I'll start
with the calculation.
Action: calculator[23*47]
Observation: 1081
Thought: Got 1081. Now I need the population of France.
Action: search[population of France]
Observation: France has a population of approximately 68 million (2024).
Thought: Population ≈ 68,000,000. Now add 1081 + 68000000.
Action: calculator[1081 + 68000000]
Observation: 68001081
Final Answer: 23 × 47 = 1,081. France's population is ~68,000,000.
Together: 68,001,081.
Notice what happened: the model chose which tool to use based on the nature of the sub-problem. Math → calculator. Fact → search. It even chained three tool calls before it was ready to answer. The loop kept running because your code kept injecting Observations.
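Because every Observation comes from code, the calculator steps in a trace can be re-checked after the fact. The sketch below does that for an abbreviated copy of the trace above. The names `recompute` and `checkCalculatorSteps` are hypothetical, and the evaluator accepts only plain arithmetic.

```javascript
// Abbreviated copy of the trace above (Thought lines shortened).
const trace = `Thought: Start with the calculation.
Action: calculator[23*47]
Observation: 1081
Thought: Now I need the population of France.
Action: search[population of France]
Observation: France has a population of approximately 68 million (2024).
Thought: Add the two numbers.
Action: calculator[1081 + 68000000]
Observation: 68001081`;

// Re-evaluate an expression; anything other than digits, spaces,
// and basic operators is rejected before evaluation.
function recompute(expr) {
  if (!/^[\d\s+\-*\/().]+$/.test(expr)) {
    throw new Error(`unsupported expression: ${expr}`);
  }
  return String(Function(`"use strict"; return (${expr});`)());
}

// Find each calculator Action with the Observation that follows it,
// then compare the logged value against a fresh computation.
function checkCalculatorSteps(text) {
  const pattern = /Action:\s*calculator\[(.+?)\]\s*Observation:\s*(.+)/g;
  return [...text.matchAll(pattern)].map(([, expr, logged]) => ({
    expr,
    logged: logged.trim(),
    ok: recompute(expr) === logged.trim(),
  }));
}

const checks = checkCalculatorSteps(trace);
console.log(checks.every((c) => c.ok)); // prints true
```

Search observations cannot be verified this way, since they depend on an external source, but they can at least be logged alongside the query that produced them.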
The loop in code
Here's the skeleton. It works with any LLM that takes a messages array:
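// Assumes llm(), mySearchAPI(), SYSTEM_PROMPT, and question are defined elsewhere.
// Top-level await needs an ES module; otherwise wrap the loop in an async function.

// Pull the next tool call out of the model's reply, or detect the final answer.
function parseAction(text) {
  const m = text.match(/Action:\s*(\w+)\[(.+?)\]/);
  if (m) return { tool: m[1], input: m[2] };
  if (/Final Answer:/i.test(text)) return { tool: "done" };
  return null;
}

// Run the requested tool and return its result as a string.
async function runTool(tool, input) {
  if (tool === "calculator") {
    // Model output is untrusted: only evaluate plain arithmetic.
    if (!/^[\d\s+\-*\/().]+$/.test(input)) return "invalid expression";
    try {
      return String(Function(`return (${input})`)());
    } catch {
      return "calculation error";
    }
  }
  if (tool === "search") {
    return await mySearchAPI(input); // real or mock
  }
  return "unknown tool";
}

const messages = [{ role: "user", content: question }];
let answer = null;

// Hard cap of 8 steps so a confused model cannot loop forever.
for (let step = 0; step < 8; step++) {
  const reply = await llm(SYSTEM_PROMPT, messages);
  messages.push({ role: "assistant", content: reply });

  const action = parseAction(reply);
  if (!action || action.tool === "done") {
    // Same case-insensitive match as parseAction; stays null if no answer was given.
    answer = reply.match(/Final Answer:\s*(.+)/is)?.[1] ?? null;
    break;
  }

  // Feed the tool result back so the model can continue reasoning.
  const obs = await runTool(action.tool, action.input);
  messages.push({ role: "user", content: `Observation: ${obs}` });
}
console.log(answer);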
Three things to notice:
- The model doesn't call your function. It writes text. You parse the text. You run the function. You inject the result. The model just speaks; your code acts.
- The step cap is non-negotiable. Without step < 8, a confused model loops forever and burns your API budget.
- Adding a tool is trivial. Add one line to the system prompt and one branch in runTool. The loop doesn't change at all.
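A variation on the if-branches is a lookup table, where adding a tool becomes a single entry. This is a sketch, not the series' code: the `tools` object, the `wordcount` tool, and the mock search are all hypothetical, and the functions are synchronous for brevity.

```javascript
// Hypothetical registry: each entry pairs a prompt description with a function.
const tools = {
  calculator: {
    description: "calculator[expr]: evaluates a math expression",
    run: (input) => {
      // Only plain arithmetic is evaluated; anything else is rejected.
      if (!/^[\d\s+\-*\/().]+$/.test(input)) return "invalid expression";
      return String(Function(`"use strict"; return (${input});`)());
    },
  },
  search: {
    description: "search[query]: looks up a factual question",
    // Mock result; a real version would call a search API here.
    run: (input) => `no results for "${input}"`,
  },
};

// Adding a tool is one new entry. Nothing else changes.
tools.wordcount = {
  description: "wordcount[text]: counts the words in a piece of text",
  run: (input) => {
    const trimmed = input.trim();
    return String(trimmed ? trimmed.split(/\s+/).length : 0);
  },
};

// The tool list for the system prompt comes from the same registry.
function toolList() {
  return Object.values(tools).map((t) => `- ${t.description}`).join("\n");
}

// Dispatch by name. The own-property check stops names like
// "constructor" from resolving to inherited Object members.
function runTool(name, input) {
  const known = Object.prototype.hasOwnProperty.call(tools, name);
  return known ? tools[name].run(input) : "unknown tool";
}

console.log(runTool("wordcount", "the quick brown fox")); // prints 4
```

The trade-off is one more layer of indirection in exchange for never touching the dispatch code again.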
Why Thought before every Action matters
You might be tempted to drop the Thought requirement to save tokens. Don't.
The Thought forces the model to commit to a plan before picking a tool. Without it, models tend to call tools reflexively and then rationalise the result after. With it, they decide what they need, then ask for it. That tends to improve accuracy, and the traces are much easier to debug.
It also means you get a human-readable log of the agent's reasoning for free, which matters a lot when something goes wrong in production.
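One way to get that log is to walk the messages array after a run. The sketch below assumes the message shape used in the loop (a role plus a content string, with observations sent as user messages starting with "Observation:"). The `formatTrace` name and the output format are hypothetical.

```javascript
// Turn a finished conversation into one log line per Thought,
// Action, Observation, and Final Answer.
function formatTrace(messages) {
  const lines = [];
  let step = 0;
  for (const { role, content } of messages) {
    if (role === "assistant") {
      step += 1;
      // Thought runs until the next Action or Final Answer label, or end of text.
      const thought = content.match(/Thought:\s*([\s\S]*?)(?=\n(?:Action|Final Answer):|$)/);
      const action = content.match(/Action:\s*(\w+\[.+?\])/);
      const final = content.match(/Final Answer:\s*([\s\S]+)/);
      if (thought) lines.push(`[step ${step}] thought: ${thought[1].trim()}`);
      if (action) lines.push(`[step ${step}] action: ${action[1]}`);
      if (final) lines.push(`[step ${step}] final: ${final[1].trim()}`);
    } else if (content.startsWith("Observation:")) {
      const value = content.slice("Observation:".length).trim();
      lines.push(`[step ${step}] observation: ${value}`);
    } else {
      lines.push(`[question] ${content}`);
    }
  }
  return lines.join("\n");
}

// Sample conversation, written by hand for illustration.
const sampleMessages = [
  { role: "user", content: "What is 23 x 47?" },
  { role: "assistant", content: "Thought: This is arithmetic.\nAction: calculator[23*47]" },
  { role: "user", content: "Observation: 1081" },
  { role: "assistant", content: "Final Answer: 1081" },
];

const log = formatTrace(sampleMessages);
console.log(log);
```

A model that skips the Thought shows up in this log as a step with an action but no thought line, which makes that kind of failure easy to spot.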
When should you use this pattern?
Use it when:
- The question requires live data (prices, weather, recent news)
- The question requires reliable arithmetic on large numbers
- The answer requires multiple sub-steps that each need different capabilities
- You want to give users a verifiable trace of how the answer was produced
Skip it (and just use plain prompting) when:
- The question is simple Q&A and the model already knows the answer from training
- Latency is critical — every tool call adds a round trip
- You need deterministic output — agents introduce non-determinism
The interactive demo
This series ships every technique as a live interactive page. Day 7's demo lets you pick one of three questions ("How many seconds in 3 days, divided by 7?", "GDP of Japan minus South Korea?", and the example above), then animates every step of the loop — Thought, Action with a tool badge, Observation, repeat — until the Final Answer pops up.
The tools are mocked (no real API keys needed in the browser), but the loop logic is real. You can see exactly what the agent "says" at each step and which tool it reaches for.
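For a sense of what mocking can look like, here is a self-contained sketch that runs the loop offline against canned tools and a scripted stand-in for the model. It is not the demo's actual source: `MOCK_FACTS`, `mockTools`, `scriptedModel`, and `runAgent` are hypothetical names, everything is synchronous for brevity, and the scripted model ignores observations and just replays fixed replies.

```javascript
// Canned search data taken from the worked trace; not a live lookup.
const MOCK_FACTS = {
  "population of france": "France has a population of approximately 68 million (2024).",
};

const mockTools = {
  calculator: (expr) => {
    // Only plain arithmetic is evaluated.
    if (!/^[\d\s+\-*\/().]+$/.test(expr)) return "invalid expression";
    return String(Function(`"use strict"; return (${expr});`)());
  },
  search: (query) => MOCK_FACTS[query.toLowerCase()] ?? "no result",
};

// Stand-in for the LLM: returns pre-written replies in order.
function scriptedModel(replies) {
  let i = 0;
  return () => replies[i++] ?? "Final Answer: (script exhausted)";
}

// The same Thought, Action, Observation loop, with a step cap.
function runAgent(question, llm, tools, maxSteps = 8) {
  const messages = [{ role: "user", content: question }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = llm(messages);
    messages.push({ role: "assistant", content: reply });
    const m = reply.match(/Action:\s*(\w+)\[(.+?)\]/);
    if (!m) {
      const final = reply.match(/Final Answer:\s*([\s\S]+)/i);
      return { answer: final ? final[1].trim() : null, messages };
    }
    const known = Object.prototype.hasOwnProperty.call(tools, m[1]);
    const obs = known ? tools[m[1]](m[2]) : "unknown tool";
    messages.push({ role: "user", content: `Observation: ${obs}` });
  }
  return { answer: null, messages }; // step cap reached
}

const result = runAgent(
  "What is 23 x 47 + the population of France?",
  scriptedModel([
    "Thought: Start with the math.\nAction: calculator[23*47]",
    "Thought: Now the population.\nAction: search[population of France]",
    "Thought: Add them.\nAction: calculator[1081 + 68000000]",
    "Final Answer: 68,001,081",
  ]),
  mockTools
);
console.log(result.answer); // prints 68,001,081
```

Swapping `scriptedModel` for a real API call and `mockTools` for real functions is the only change needed to go live, which is also why a setup like this works well as a unit test.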
Check it out at dev48v.infy.uk/prompt/day7-react-tools.html — and click through UNDERSTAND for the step-by-step breakdown, then BUILD for the full code scaffolding you can drop into a real project today.
What's next
Day 8 covers RAG rerank — once you've retrieved your top-K documents from a vector search, how do you reorder them so the actually relevant ones float to the top? A cross-encoder model reads each candidate pair and scores relevance more precisely than cosine similarity can. Same retrieved set, much sharper answers.
Follow dev48v on dev.to so you don't miss it.