jacobjerryarackal

“Prompt Engineering Is Enough” Is Wrong – Here’s What I Had to Add

I used to believe the hype.

I thought that if I just wrote better prompts (clearer instructions, few‑shot examples, chain‑of‑thought), I could make any LLM do whatever I wanted. I spent weeks refining prompts, tweaking wording, and adding “think step by step” like it was magic.

Then I tried to build something useful.

I asked the model to check the weather and tell me if I needed an umbrella. The response was confident and completely wrong. It hallucinated the forecast based on its training cut‑off. No real data, just made‑up facts wrapped in perfect English.

The prompt was excellent. The model was powerful. The result was useless.

That’s when I realised the uncomfortable truth: prompt engineering is not enough.

I went back and added the one thing that actually fixed it: tools.

I gave the model a simple weather tool and forced it to use a basic OBSERVE → THINK → ACT loop. Nothing fancy. Just three steps every single time.

Here’s what happened.

First loop – OBSERVE

The model receives my request: “Check the weather in Kochi and tell me if I need an umbrella today.” It sees it has access to a tool called get_weather(location).

Second loop – THINK

Instead of guessing, it reasons out loud: “I don’t have current weather data. I should use the tool to get real information.”

Third loop – ACT

It calls the tool in clean JSON:

```json
{
  "tool": "get_weather",
  "arguments": {
    "location": "Kochi, Kerala"
  }
}
```

The tool returns actual data. The model goes back to THINK mode, sees the result (“light rain expected”), and only then generates the final answer: “Yes, take an umbrella.”

No hallucination. No made‑up weather. Just one tool + one loop.
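The whole pattern fits in a few lines of Python. Here is a minimal sketch of the OBSERVE → THINK → ACT loop; `call_model` and `get_weather` are stand-ins I wrote for illustration (a real agent would call an actual LLM API and a real weather API), not any library's interface:

```python
import json

def get_weather(location):
    # Stub tool: in a real agent this would hit a live weather API.
    return {"location": location, "forecast": "light rain expected"}

TOOLS = {"get_weather": get_weather}

def call_model(messages):
    # Stand-in for a real LLM call. It decides to use the tool when no
    # observation exists yet, and answers once one does.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather",
                "arguments": {"location": "Kochi, Kerala"}}
    return {"answer": "Yes, take an umbrella."}

def agent_loop(user_request, max_steps=5):
    # OBSERVE: the request (and later, tool results) become messages.
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        decision = call_model(messages)          # THINK
        if "tool" in decision:                   # ACT
            result = TOOLS[decision["tool"]](**decision["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
            continue                             # back to THINK with new data
        return decision["answer"]
    raise RuntimeError("agent did not finish in time")

print(agent_loop("Check the weather in Kochi. Do I need an umbrella?"))
```

The `continue` is the whole trick: every tool result is fed back in as a new observation before the model is allowed to answer.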

I tested it on something harder: building a small React login page from scratch. With pure prompting, the model produced broken, outdated code and confidently told me it was correct. After adding file_system and run_command tools plus the same OBSERVE‑THINK‑ACT loop, it actually listed the directory, read the existing package.json, wrote proper components, fixed its own bugs, and shipped working code across eight loops.
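Those two tools don't need to be complicated. Here is a rough sketch of what my file_system and run_command tools look like; the names and argument shapes are my own choices, not a standard:

```python
import pathlib
import subprocess

def file_system(action, path, content=None):
    """Tiny file tool: list a directory, read a file, or write a file."""
    p = pathlib.Path(path)
    if action == "list":
        return sorted(child.name for child in p.iterdir())
    if action == "read":
        return p.read_text()
    if action == "write":
        p.write_text(content)
        return f"wrote {len(content)} chars to {path}"
    raise ValueError(f"unknown action: {action}")

def run_command(command, timeout=60):
    """Run a shell command and return what the model needs to observe."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return {"exit_code": proc.returncode,
            "stdout": proc.stdout[-2000:],   # truncated to keep context small
            "stderr": proc.stderr[-2000:]}
```

The important detail is that run_command returns stderr and the exit code. That is what lets the model see its own compile errors in the next THINK step and fix them, instead of insisting the code works.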

The model was the same. The prompts were similar. The only difference was that I stopped treating the LLM as a magic oracle and started treating it as a brain that needs hands.

I also added MCP (Model Context Protocol), the simple protocol everyone now calls “USB‑C for AI.” It made plugging in new tools ridiculously easy. No custom glue code. Just declare the tool once, and the agent knows exactly how to call it.
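Concretely, an MCP tool declaration is just a name, a human-readable description, and a JSON Schema for the arguments. This is roughly the shape a server advertises for the weather tool above (a hand-written sketch based on my understanding of the protocol, not copied from the spec):

```python
import json

# Roughly what an MCP server advertises for one tool: a name, a
# description, and a JSON Schema describing the expected arguments.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get the current weather for a location.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name, e.g. 'Kochi, Kerala'",
            },
        },
        "required": ["location"],
    },
}

# The agent can render this straight into the model's native
# tool-calling format -- no per-tool glue code needed.
print(json.dumps(GET_WEATHER_TOOL, indent=2))
```

Because the declaration is data rather than code, adding a tenth tool costs exactly as much as adding the first one.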

The change was night and day.

I stopped wasting time writing longer and longer prompts. I started adding tools and a reliable loop instead. The results went from “impressively wrong” to “actually useful.”

The lesson for 2026 is brutally simple: prompt engineering is table stakes, not the complete solution. If your LLM keeps hallucinating, forgetting tasks, or failing at real work, stop tweaking the prompt. Give it proper tools and a structured loop to use them.

The model is the brain. Tools are the hands. Without hands, even the smartest brain is stuck guessing.

I no longer believe “prompt engineering is enough.”

I now know exactly what I have to add.
