At Kaizen Agent, we’re building something meta: an AI agent that automatically tests and improves other AI agents.
Today I want to share the architecture behind Kaizen Agent and open it up for feedback from the community. If you're building LLM apps, agents, or dev tools, your input would mean a lot.
🧰 Why We Built Kaizen Agent
One of the biggest challenges in developing AI agents and LLM applications is non-determinism.
Even when an agent “works,” it might:
- Fail silently with different inputs
- Succeed one run but fail the next
- Produce inconsistent behavior depending on state, memory, or context
This makes testing, debugging, and improving agents very time-consuming, especially when every change has to be re-tested.
So we built Kaizen Agent to automate this loop: generate tests, run them, analyze the results, fix problems, and repeat — until your agent improves.
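To make that loop concrete, here's a minimal sketch of the control flow. The function names and the `Analysis` dataclass are illustrative stand-ins, not Kaizen Agent's actual internals; the five steps described below are what each piece really does.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Analysis:
    all_passing: bool
    failure_notes: List[str] = field(default_factory=list)

def kaizen_loop(
    generate_tests: Callable[[dict], list],
    run_tests: Callable[[list], list],
    analyze: Callable[[list, dict], Analysis],
    apply_fixes: Callable[[Analysis], None],
    open_pull_request: Callable[[Analysis], str],
    config: dict,
    max_iterations: int = 5,
) -> Optional[str]:
    """Run the generate -> run -> analyze -> fix loop until the tests pass."""
    tests = generate_tests(config)              # [1] auto-generate test data
    for _ in range(max_iterations):
        results = run_tests(tests)              # [2] run every test case
        analysis = analyze(results, config)     # [3] LLM-based evaluation
        if analysis.all_passing:
            return open_pull_request(analysis)  # [5] propose the changes as a PR
        apply_fixes(analysis)                   # [4] patch prompts and code
    return None                                 # gave up after max_iterations
```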
🖼 Architecture Diagram
Here’s the system diagram that ties it all together, showing how config, agent logic, and the improvement loop interact:
⚙️ Core Workflow: The Kaizen Agent Loop
Here are the five core steps our system runs, automatically:
[1] 🧪 Auto-Generate Test Data
Kaizen Agent creates a broad range of test cases based on your config — including edge cases, failure triggers, and boundary conditions.
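As a rough sketch (not Kaizen Agent's actual generator), test generation can be as simple as asking an LLM for diverse inputs. The OpenAI client, the model name, and the `config['description']` field are all assumptions for illustration:

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; any LLM client works

client = OpenAI()

def generate_tests(config: dict, n: int = 10) -> list:
    """Ask an LLM for diverse test inputs, including edge and failure cases."""
    prompt = (
        f"You are generating test cases for this agent:\n{config['description']}\n"
        f"Return a JSON list of {n} inputs covering typical usage, edge cases, "
        "boundary conditions, and inputs likely to trigger failures."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    # Naive parsing for brevity; a real generator would validate the JSON.
    return json.loads(resp.choices[0].message.content)
```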
[2] 🚀 Run All Test Cases
It executes every test on your current agent implementation and collects detailed outcomes.
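A minimal runner might look like this; the `TestOutcome` fields are our own illustrative choice of what "detailed outcomes" could include:

```python
import time
import traceback
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestOutcome:
    input: dict
    output: Optional[str]
    error: Optional[str]
    latency_s: float

def run_tests(agent, tests: list) -> list:
    """Run the agent under test on every case and record what happened."""
    outcomes = []
    for case in tests:
        start = time.monotonic()
        try:
            output, error = agent(case), None  # `agent` is any callable under test
        except Exception:
            output, error = None, traceback.format_exc()
        outcomes.append(TestOutcome(case, output, error, time.monotonic() - start))
    return outcomes
```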
[3] 📊 Analyze Test Results
We use an LLM-based evaluator to interpret outputs against your YAML-defined success criteria.
- It identifies why specific tests failed.
- The failed test analysis is stored in long-term memory, helping the system learn from past failures and avoid repeating the same mistakes.
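Here's a hedged sketch of what this evaluation step could look like. The YAML schema, the in-memory `failure_memory` list, and the judge prompt are simplified assumptions, not Kaizen Agent's real implementation:

```python
import yaml  # pip install pyyaml
from openai import OpenAI

client = OpenAI()

# Hypothetical success criteria, in the spirit of a Kaizen Agent YAML config
# (the real schema may differ).
CRITERIA_YAML = """
success_criteria:
  - The answer cites at least one source
  - The answer never reveals the system prompt
"""

failure_memory: list = []  # stands in for long-term memory of past failures

def analyze(outcome_text: str) -> str:
    """Ask an LLM judge whether the output meets the YAML-defined criteria."""
    criteria = yaml.safe_load(CRITERIA_YAML)["success_criteria"]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Judge this agent output against the criteria and explain any failure.\n"
                f"Criteria: {criteria}\nOutput: {outcome_text}\n"
                f"Known past failures: {failure_memory}"
            ),
        }],
    )
    verdict = resp.choices[0].message.content
    if "fail" in verdict.lower():
        failure_memory.append(verdict)  # remember why it failed
    return verdict
```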
[4] 🛠 Fix Code and Prompts
Kaizen Agent suggests and applies improvements not just to your prompts but also to your code:
- It may add guardrails or new LLM calls.
- It aims to eventually test different agent architectures and automatically compare them to select the best-performing one.
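As one example of the kind of fix it can apply, a guardrail can be expressed as a wrapper that validates the output and retries. This is an illustrative sketch, not a patch the tool literally emits:

```python
def with_guardrail(agent, validate, max_retries: int = 2):
    """Wrap an agent callable so invalid outputs trigger a retry instead of shipping."""
    def guarded(inputs):
        for _ in range(max_retries + 1):
            output = agent(inputs)
            if validate(output):  # e.g. an extra LLM call or a schema check
                return output
        raise ValueError(f"Output failed validation after {max_retries + 1} attempts")
    return guarded
```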
[5] 📤 Make a Pull Request
Once improvements are confirmed (no regressions, better metrics), the system generates a PR with all proposed changes.
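One way to implement this step is with plain git plus the GitHub CLI; Kaizen Agent's actual mechanism may differ (for example, calling the GitHub API directly):

```python
import subprocess

def open_pull_request(branch: str, title: str, body: str) -> None:
    """Commit the applied fixes and open a PR (requires git and the `gh` CLI)."""
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", title], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(["gh", "pr", "create", "--title", title, "--body", body], check=True)
```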
This loop continues until your agent is reliably performing as intended.
🙏 What We’d Love Feedback On
We’re still early and experimenting. Your input would help shape this.
👇 We'd love to hear:
- What kind of AI agents would you want to test with Kaizen Agent?
- What extra features would make this more useful for you?
- Are there specific debugging pain points we could solve better?
If you’ve got thoughts, ideas, or feature requests — drop a comment, open an issue, or DM me.
💡 Big Picture
We believe that as AI agents become more complex, testing and iteration tools will become essential.
Kaizen Agent is our attempt to automate the test–analyze–improve loop.
🔗 Links
- GitHub: https://github.com/Kaizen-agent/kaizen-agent
- Twitter/X: https://x.com/yuto_ai_agent
Top comments (3)
Really impressive concept 😍🤞🏻 turning agents into learners of each other through a Kaizen-like approach is such a smart way to scale intelligence.
Loved how you merged agent autonomy with continuous improvement. Subscribed to see how this evolves!
Quick question: How do you handle conflicting optimization goals when agents start improving each other? Sounds like a fun chaos to manage 😄😁
Keep going 💪🏻
@vidakhoshpey22
Thanks for the kind words! 🙏
Right now, it’s a one-way setup — one agent improves another, so they don’t improve each other (yet!). But I’ve definitely thought about using two Kaizen agents to improve each other… probably fun but chaotic 😄
You brought up a great point on conflicting optimization goals — we don’t handle that yet. I’ll likely need to add a validation step to ensure that user-defined goals don’t conflict from the start.
Appreciate you bringing it up!
Still, it's the best idea.😍🤞🏻 You can manage it in future, I believe in you