Behram

AGENTS.md vs. Skills: How We Refactored OpenClaw to Fix AI Hallucinations

I bet everyone has had this experience.

You ask your AI to use the new Gemini 3.0 Pro model, and it argues with you: "That model is invalid, I will use 1.5 Pro instead."
Or you are working on a Next.js project, and the AI keeps debating you, insisting on the old getStaticProps syntax when you are clearly using the App Router.

It is exhausting. You enforce rules, you add docs, you install MCP servers, you build custom "Skills"... and it still hallucinates. You feel like you are just piling rule after rule on top of a broken foundation.

I was stuck in this loop for weeks. I built complex "Research Skills" designed to force the AI to be smart, but they just turned into black boxes. I pushed a button, the AI disappeared into a script, and it came back with wrong answers (like telling me a £716 visa cost £70k).

Then, last week, I saw an article that solved everything.

Vercel's AI team published research that completely flipped my perspective. They found that simply splitting your project knowledge into an index (a plain markdown file) versus Skills (executable code) changed the game.

I immediately tried it on my OpenClaw agent. I deleted my complex "Black Box" skills and replaced them with a simple AGENTS.md index.

The result? It worked perfectly. The hallucinations stopped. The "syntax debates" ended. Here is why—and how you can do it too.

The Vercel Wake-Up Call

Vercel's AI SDK team published research testing this exact problem on coding agents.

They compared two methods for teaching an AI about Next.js 16:

  1. Skills (Tools): Giving the AI a tool to "Look up documentation."
  2. Context (AGENTS.md): Just putting the documentation index in a markdown file in the root directory.

The results were brutal:

  • Skills: 53% Pass Rate. (The AI often forgot to use the tool, or used it wrong).
  • Context (AGENTS.md): 100% Pass Rate.

Why? Because Skills require a decision. The AI has to stop and think, "Should I check the docs?" Often, it gets lazy and guesses.
Context is passive. The instructions are just there. The AI doesn't have to choose to be smart; it has no choice but to see the map.
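To make the difference concrete, here is a rough sketch of the two wirings. The names (lookup_nextjs_docs, buildSystemPrompt) are illustrative, not any specific framework's API: the point is that the skill path only runs if the model decides to call it, while the context path is injected on every turn with no decision involved.

```typescript
import { promises as fs } from "node:fs";

// Approach 1: a "Skill". The model has to *decide* to call this tool,
// so the docs are only consulted when it thinks they are needed.
export const lookupDocsTool = {
  name: "lookup_nextjs_docs",
  description: "Look up Next.js 16 documentation for a given topic.",
  run: async (topic: string): Promise<string> => {
    // Placeholder: fetch or grep the real docs here.
    return `Docs excerpt for: ${topic}`;
  },
};

// Approach 2: "Context". The index is read once and prepended to every
// request, so the model sees it whether or not it "wants" to look.
export async function buildSystemPrompt(): Promise<string> {
  const agentsMd = await fs.readFile("AGENTS.md", "utf8");
  return `${agentsMd}\n\nYou are a coding agent working in this repository.`;
}
```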

Refactoring OpenClaw: The "Hands vs. Brains" Split

We took this data and immediately refactored our entire agent stack. We realized we were making a fundamental architecture mistake.

We were building Skills for things that should have been Context.

The Old Way (Black Box)

  • Task: "Research this."
  • Mechanism: Call Tool: Research_Skill().
  • Reality: The AI offloads thinking to a hidden script. It stops being an intelligence and becomes a button-pusher.
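For reference, our old black-box skill looked roughly like this. This is a from-memory sketch, not the actual OpenClaw code, and webSearch and summarise stand in for the real calls; the point is that every decision that mattered happened inside the function, invisible to the model and to me.

```typescript
// The old "Black Box" pattern (illustrative sketch). The agent calls one
// tool; all searching, filtering, and summarising happens inside a script
// it cannot see, so it can neither explain nor sanity-check the answer.

// Stand-ins for the real search / LLM calls the skill wrapped.
async function webSearch(query: string): Promise<string[]> {
  return [`result for "${query}"`]; // hidden search strategy lived here
}
async function summarise(snippets: string[]): Promise<string> {
  return snippets.join(" "); // hidden synthesis step lived here
}

export async function researchSkill(topic: string): Promise<string> {
  const results = await webSearch(topic);
  const topResults = results.slice(0, 3); // hidden relevance filter
  return summarise(topResults); // the agent just relays whatever comes back
}
```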

The New Way (The Hybrid Stack)

We split our architecture into two distinct layers: Hands and Brains.

1. Brains (AGENTS.md + docs/)

This is for Knowledge, Rules, and Logic.
We deleted the Research.ts skill entirely. In its place, we added a simple file: docs/research.md.

```markdown
# Research Protocol
1. **Source of Truth:** Always check official docs (.gov, .org) first.
2. **Citation:** You must link every claim.
3. **Limit:** Max 5 searches per topic.
```

In AGENTS.md (the file the AI always sees), we just added one line:

For research tasks, READ docs/research.md first.

2. Hands (skills/)

This is for Execution only.
We kept skills for things the AI physically cannot do with its brain:

  • git (Running terminal commands)
  • whatsapp (Sending API requests)
  • remindctl (Talking to macOS)
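What stays in skills/ is deliberately thin. As an illustrative sketch (the runGit name and shape are mine, not OpenClaw's actual skill interface), an execution-only skill just does the thing and returns the output; all the judgement about what to run lives with the model and the markdown.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const exec = promisify(execFile);

// Execution-only "hands" skill (illustrative shape): run a git command in a
// repo and return its output. No embedded knowledge, no hidden reasoning.
export async function runGit(args: string[], cwd: string): Promise<string> {
  const { stdout } = await exec("git", args, { cwd });
  return stdout.trim();
}

// Example: await runGit(["status", "--short"], "/path/to/repo");
```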

The Result: Transparency

Now, when I ask: "Research the cost of a UK Global Talent Visa."

  1. The AI reads AGENTS.md.
  2. It sees the rule: "Read docs/research.md."
  3. It reads the protocol: "Check official sources."
  4. I see it work: I watch it generate the search query site:gov.uk global talent visa fee.
  5. It returns: "The application fee is £716. Note: Some consultants charge £70k, but that is a service fee, not the visa cost."

It worked. Not because I wrote better code, but because I stopped trying to code the thinking process.

The Guide: When to use what?

If you are building an AI agent (with Cursor, Claude, or OpenClaw), stop building complex tool chains for everything. Use this heuristic:

| Requirement | Use This | Why? |
| --- | --- | --- |
| "I need you to know X" | AGENTS.md | Knowledge should be passive. Don't make the AI "search" for your coding style. |
| "I need you to follow process Y" | docs/Y.md | Rules belong in markdown. They are easier to edit and easier for the AI to read. |
| "I need you to touch Z" | Skill | If it needs an API key or a CLI command, wrap it in a tool. |

Start Small: The "Agile" Agent

Don't over-engineer. Start with a single AGENTS.md file in your root.

  • Add your project structure.
  • Add your preferred tech stack.
  • Add a link to your docs.
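A minimal starter could look something like this (the structure, stack, and paths below are placeholders; swap in your own project's details):

```markdown
# AGENTS.md

## Project structure
- `src/app/` – Next.js App Router pages and layouts
- `src/lib/` – shared utilities
- `docs/` – task protocols the agent should read before specific jobs

## Tech stack
- Next.js (App Router only; never suggest getStaticProps)
- TypeScript, strict mode

## Protocols
- For research tasks, READ `docs/research.md` first.
```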

Watch your agent's IQ double overnight. The best tool you can give your AI isn't a Python script; it's a good README.
