Harnesses aren't supposed to be static
Most AI agent setups treat the harness -- the instructions, constraints, and tool configurations that govern agent behavior -- as a fixed artifact. You write AGENTS.md once, deploy it, and move on.
But what if the agent could improve its own harness?
I dismissed this idea for months -- it sounded like the kind of thing that looks great in a blog post and falls apart the moment you try it. Then I read Meta's actual implementation. It's not magic. It's a for-loop with a diff tool and a surprisingly short prompt.
This isn't a thought experiment anymore. In March 2026, Meta Research published "HyperAgents" -- a framework where agents read their own source code, identify improvements, generate patches, and update themselves. After hundreds of iterations, these agents independently built persistent memory systems, performance tracking, and modular architecture. Nobody told them to. They figured out they needed those capabilities and built them.
What Meta's HyperAgents actually did
The HyperAgents paper introduces a distinction that matters: task agents solve problems, while meta agents modify the task agent's code and behavior. In HyperAgents, both live in a single editable program. The agent can modify not just its task-solving logic, but also the improvement mechanism itself.
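As a mental model -- my paraphrase, not Meta's actual code -- the single-program setup looks roughly like this toy mock:

# Toy mock of the single editable program -- my paraphrase, not Meta's code
class Program:
    def __init__(self, source):
        self.source = source                      # the agent's own editable code

    def solve(self, task):                        # task agent: solves problems
        return f"result for {task}"

    def propose_patch(self, results):             # meta agent: drafts a change to its own source
        return self.source + ["# heuristic learned from results"]

def improvement_loop(program, tasks, iterations):
    for _ in range(iterations):
        results = [program.solve(t) for t in tasks]
        program.source = program.propose_patch(results)  # even propose_patch itself is editable
    return program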
Here's what happened when researchers let this run:
Across four domains -- coding, paper review, robotics reward design, and Olympiad-level math grading -- the agents consistently improved their own performance over time, outperforming both static baselines and earlier self-improving systems.
The transfer finding was the surprise. Improvement strategies learned in one domain (robotics, paper review) transferred successfully to a completely novel domain (Olympiad math grading), scoring imp@50 = 0.630. The agent didn't just get better at one task. It learned how to get better -- and that meta-skill carried over.
Emergent capabilities appeared. Without being instructed to do so, the agents independently invented persistent memory systems and automated performance tracking. The agents discovered they needed these capabilities and built them from scratch.
The core limitation of hand-crafted meta-agents, as the paper states: "They can only improve as fast as humans can design and maintain them." HyperAgents removes that bottleneck.
The practical version: a 4-step cycle you can run today
Full HyperAgents-style self-modification is still mostly in the lab. But the underlying pattern -- observe failures, propose improvements, merge approved changes -- is something you can implement right now.
Step 1: Run the task and log failures
import json
from datetime import datetime

def log_failure(task, error, harness_file):
    # Append one failure record as a JSON line for later pattern analysis
    failure = {
        "task": task,
        "error": error,
        "harness_file": harness_file,
        "timestamp": datetime.now().isoformat(),
    }
    with open("harness/memory/failures.jsonl", "a") as f:
        f.write(json.dumps(failure) + "\n")
Every time the agent fails -- wrong output, crashed tool call, timeout -- log it. Don't try to fix it in real time. Just collect the data.
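For example, you can wrap task execution in a try/except. The names here (run_task, the task dict shape) are illustrative stand-ins for your own runner:

# Hypothetical wrapper -- run_task and the task dict are stand-ins for your own runner
def run_with_logging(task):
    try:
        return run_task(task)
    except Exception as e:
        log_failure(
            task=task["name"],
            error=str(e),
            harness_file=task.get("harness_file", "AGENTS.md"),
        )
        raise  # the failure is logged, not swallowed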
Step 2: Analyze failure patterns
At the end of each week, feed the failure log to an LLM and ask for patterns:
def suggest_improvements(failures):
    # `llm` stands in for whatever client you already use (OpenAI, Anthropic, local)
    prompt = f"""
Analyze these agent failure patterns.
Suggest specific constraints to add to AGENTS.md.

Failures:
{json.dumps(failures, indent=2)}

Output format:
- Constraint to add (exact wording)
- Target file (AGENTS.md or skill file name)
- Reasoning
"""
    return llm.generate(prompt)
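Reading the log back is straightforward since it's one JSON object per line. A minimal driver for the weekly run, assuming the path from Step 1:

import json

def weekly_analysis(log_path="harness/memory/failures.jsonl"):
    # Load every logged failure and ask for harness improvement proposals
    with open(log_path) as f:
        failures = [json.loads(line) for line in f if line.strip()]
    return suggest_improvements(failures)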
Step 3: Human review and approval
- Monday: Weekly failure analysis → auto-generate improvement proposals
- Tuesday: Human reviews proposals → approve or reject each one
- Wednesday: Approved changes merge into AGENTS.md
- Thursday onward: Agent operates under improved harness
The human stays in the loop for direction-setting. The agent handles the grunt work of identifying what went wrong and drafting fixes. Think of it as code review where the junior engineer never gets tired and never takes it personally when you reject their PR.
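To make the approval step concrete, one option is a proposals file that the reviewer marks up, plus a merge script that applies only approved entries. This is a sketch -- the proposals.jsonl layout and its status field are my assumptions, not a standard:

import json

def merge_approved(proposals_path="harness/memory/proposals.jsonl"):
    # Apply only human-approved proposals to AGENTS.md; everything else is ignored
    with open(proposals_path) as f:
        proposals = [json.loads(line) for line in f if line.strip()]
    approved = [p for p in proposals if p.get("status") == "approved"]
    with open("AGENTS.md", "a") as f:
        for p in approved:
            f.write("\n- " + p["constraint"] + "\n")
    return len(approved)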
Step 4: Measure and repeat
Track whether the approved changes actually reduce failures. If a new constraint causes more problems than it solves, revert it. The cycle is designed to be self-correcting.
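A simple check is to bucket the failure log by ISO week and compare counts before and after a change merges. A sketch built on the timestamps from Step 1:

import json
from datetime import datetime

def failures_per_week(log_path="harness/memory/failures.jsonl"):
    # Count failures per ISO week so a merged change can be judged against prior weeks
    counts = {}
    with open(log_path) as f:
        for line in f:
            if not line.strip():
                continue
            ts = datetime.fromisoformat(json.loads(line)["timestamp"])
            week = ts.strftime("%G-W%V")
            counts[week] = counts.get(week, 0) + 1
    return counts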
What the agent should never modify
Self-improvement has hard boundaries:
- Strategic direction. "Should we pivot from Python to Rust?" is a human decision.
- Quality criteria. What counts as "good output" must be human-defined.
- Ethical boundaries. What the agent is and isn't allowed to do is not up for self-modification.
Self-improvement means increasing accuracy within an established direction. Changing the direction itself is a human job. You wouldn't want the AI unilaterally deciding "Let's drop our main product line and build something else."
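Mechanically, this can be enforced with a denylist checked before any proposal even reaches human review. The file names below are illustrative examples, not a convention:

# Illustrative guard -- file names are examples, not a convention
PROTECTED_FILES = {"ETHICS.md", "QUALITY_CRITERIA.md", "STRATEGY.md"}

def is_reviewable(proposal):
    # Proposals that target human-only files never enter the review queue
    return proposal.get("target_file") not in PROTECTED_FILES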
The Hermes Agent result
Meta isn't alone in this space. Hermes Agent v0.10, published in April 2026 and accepted as an ICLR 2026 Oral paper (GEPA framework), demonstrated that self-improving agents can generate 20+ specialized skills autonomously and achieve 40% faster task completion. The mechanism: the agent observes its own execution traces, identifies recurring patterns, and packages them into reusable skills.
This maps directly to how senior engineers work. You don't write the same boilerplate twice. You notice the pattern, extract it into a utility, and move on. The difference is that the agent does this automatically, continuously, at machine speed.
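Hermes' exact mechanism isn't spelled out here, but the trace-to-skill idea can be approximated by counting repeated action sequences and promoting frequent ones to skill candidates. A rough sketch; the trace format and the threshold are assumptions:

from collections import Counter

def extract_skill_candidates(traces, min_count=5):
    # Count repeated 3-step action sequences across execution traces
    ngrams = Counter()
    for trace in traces:  # assumed format: each trace is a list of action names
        for i in range(len(trace) - 2):
            ngrams[tuple(trace[i:i + 3])] += 1
    # Frequent sequences become skill candidates for packaging and human review
    return [seq for seq, n in ngrams.most_common() if n >= min_count]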
Where this lands in 6 months
Agents that improve their own harnesses will outperform agents that don't, given enough iterations. The gap compounds over time -- each improvement makes the next improvement easier.
If you're running production agents today, you probably already have a version of this cycle -- you just call it "fixing the prompt after it broke in prod at 2 AM." The 4-step approach makes it systematic instead of reactive.
My recommendation: start logging failures today. Even without the auto-improvement loop, a structured failure log is the foundation for every upgrade path -- manual, semi-automated, or fully autonomous.
The self-evolving agent patterns in this article are covered in depth in Harness Engineering: From Using AI to Controlling AI -- including lifecycle management, hooks, feedback loops, and the full framework for building systems that control AI agents.