At Kaizen Agent, we’re building something meta: an AI agent that automatically tests and improves other AI agents.
Today I want to share the architecture behind Kaizen Agent and open it up for feedback from the community. If you're building LLM apps, agents, or dev tools, your input would mean a lot.
🧰 Why We Built Kaizen Agent
One of the biggest challenges in developing AI agents and LLM applications is non-determinism.
Even when an agent “works,” it might:
- Fail silently with different inputs
- Succeed one run but fail the next
- Produce inconsistent behavior depending on state, memory, or context
This makes testing, debugging, and improving agents very time-consuming, especially when every change has to be re-tested.
So we built Kaizen Agent to automate this loop: generate tests, run them, analyze the results, fix problems, and repeat — until your agent improves.
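To make that loop concrete, here's a minimal sketch of the control flow. The function names and the `Analysis` dataclass are illustrative stand-ins, not Kaizen Agent's actual internals; the five steps described below are what each piece really does.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Analysis:
    all_passing: bool
    failure_notes: List[str] = field(default_factory=list)

def kaizen_loop(
    generate_tests: Callable[[dict], list],
    run_tests: Callable[[list], list],
    analyze: Callable[[list, dict], Analysis],
    apply_fixes: Callable[[Analysis], None],
    open_pull_request: Callable[[Analysis], str],
    config: dict,
    max_iterations: int = 5,
) -> Optional[str]:
    """Run the generate -> run -> analyze -> fix loop until the tests pass."""
    tests = generate_tests(config)              # [1] auto-generate test data
    for _ in range(max_iterations):
        results = run_tests(tests)              # [2] run every test case
        analysis = analyze(results, config)     # [3] LLM-based evaluation
        if analysis.all_passing:
            return open_pull_request(analysis)  # [5] propose the changes as a PR
        apply_fixes(analysis)                   # [4] patch prompts and code
    return None                                 # gave up after max_iterations
```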
🖼 Architecture Diagram
Here’s the system diagram that ties it all together, showing how config, agent logic, and the improvement loop interact:
⚙️ Core Workflow: The Kaizen Agent Loop
Here are the five core steps our system runs, automatically:
[1] 🧪 Auto-Generate Test Data
Kaizen Agent creates a broad range of test cases based on your config — including edge cases, failure triggers, and boundary conditions.
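As a rough sketch (not Kaizen Agent's actual generator), test generation can be as simple as asking an LLM for diverse inputs. The OpenAI client, the model name, and the `config['description']` field are all assumptions for illustration:

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; any LLM client works

client = OpenAI()

def generate_tests(config: dict, n: int = 10) -> list:
    """Ask an LLM for diverse test inputs, including edge and failure cases."""
    prompt = (
        f"You are generating test cases for this agent:\n{config['description']}\n"
        f"Return a JSON list of {n} inputs covering typical usage, edge cases, "
        "boundary conditions, and inputs likely to trigger failures."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    # Naive parsing for brevity; a real generator would validate the JSON.
    return json.loads(resp.choices[0].message.content)
```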
[2] 🚀 Run All Test Cases
It executes every test on your current agent implementation and collects detailed outcomes.
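A minimal runner might look like this; the `TestOutcome` fields are our own illustrative choice of what "detailed outcomes" could include:

```python
import time
import traceback
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestOutcome:
    input: dict
    output: Optional[str]
    error: Optional[str]
    latency_s: float

def run_tests(agent, tests: list) -> list:
    """Run the agent under test on every case and record what happened."""
    outcomes = []
    for case in tests:
        start = time.monotonic()
        try:
            output, error = agent(case), None  # `agent` is any callable under test
        except Exception:
            output, error = None, traceback.format_exc()
        outcomes.append(TestOutcome(case, output, error, time.monotonic() - start))
    return outcomes
```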
[3] 📊 Analyze Test Results
We use an LLM-based evaluator to interpret outputs against your YAML-defined success criteria.
- It identifies why specific tests failed.
- The failed test analysis is stored in long-term memory, helping the system learn from past failures and avoid repeating the same mistakes.
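Here's a hedged sketch of what this evaluation step could look like. The YAML schema, the in-memory `failure_memory` list, and the judge prompt are simplified assumptions, not Kaizen Agent's real implementation:

```python
import yaml  # pip install pyyaml
from openai import OpenAI

client = OpenAI()

# Hypothetical success criteria, in the spirit of a Kaizen Agent YAML config
# (the real schema may differ).
CRITERIA_YAML = """
success_criteria:
  - The answer cites at least one source
  - The answer never reveals the system prompt
"""

failure_memory: list = []  # stands in for long-term memory of past failures

def analyze(outcome_text: str) -> str:
    """Ask an LLM judge whether the output meets the YAML-defined criteria."""
    criteria = yaml.safe_load(CRITERIA_YAML)["success_criteria"]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "Judge this agent output against the criteria and explain any failure.\n"
                f"Criteria: {criteria}\nOutput: {outcome_text}\n"
                f"Known past failures: {failure_memory}"
            ),
        }],
    )
    verdict = resp.choices[0].message.content
    if "fail" in verdict.lower():
        failure_memory.append(verdict)  # remember why it failed
    return verdict
```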
[4] 🛠 Fix Code and Prompts
Kaizen Agent suggests and applies improvements not just to your prompts but also to your code:
- It may add guardrails or new LLM calls.
- It aims to eventually test different agent architectures and automatically compare them to select the best-performing one.
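As one example of the kind of fix it can apply, a guardrail can be expressed as a wrapper that validates the output and retries. This is an illustrative sketch, not a patch the tool literally emits:

```python
def with_guardrail(agent, validate, max_retries: int = 2):
    """Wrap an agent callable so invalid outputs trigger a retry instead of shipping."""
    def guarded(inputs):
        for _ in range(max_retries + 1):
            output = agent(inputs)
            if validate(output):  # e.g. an extra LLM call or a schema check
                return output
        raise ValueError(f"Output failed validation after {max_retries + 1} attempts")
    return guarded
```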
[5] 📤 Make a Pull Request
Once improvements are confirmed (no regressions, better metrics), the system generates a PR with all proposed changes.
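One way to implement this step is with plain git plus the GitHub CLI; Kaizen Agent's actual mechanism may differ (for example, calling the GitHub API directly):

```python
import subprocess

def open_pull_request(branch: str, title: str, body: str) -> None:
    """Commit the applied fixes and open a PR (requires git and the `gh` CLI)."""
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", title], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(["gh", "pr", "create", "--title", title, "--body", body], check=True)
```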
This loop continues until your agent is reliably performing as intended.
🙏 What We’d Love Feedback On
We’re still early and experimenting. Your input would help shape this.
👇 We'd love to hear:
- What kind of AI agents would you want to test with Kaizen Agent?
- What extra features would make this more useful for you?
- Are there specific debugging pain points we could solve better?
If you’ve got thoughts, ideas, or feature requests — drop a comment, open an issue, or DM me.
💡 Big Picture
We believe that as AI agents become more complex, testing and iteration tools will become essential.
Kaizen Agent is our attempt to automate the test–analyze–improve loop.
🔗 Links
- GitHub: https://github.com/Kaizen-agent/kaizen-agent
- Twitter/X: https://x.com/yuto_ai_agent
Top comments (3)
Really impressive concept 😍🤞🏻 turning agents into learners of each other through a Kaizen-like approach is such a smart way to scale intelligence.
Loved how you merged agent autonomy with continuous improvement. Subscribed to see how this evolves!
Quick question: How do you handle conflicting optimization goals when agents start improving each other? Sounds like a fun chaos to manage 😄😁
Keep going 💪🏻
@vidakhoshpey22
Thanks for the kind words! 🙏
Right now, it’s a one-way setup — one agent improves another, so they don’t improve each other (yet!). But I’ve definitely thought about using two Kaizen agents to improve each other… probably fun but chaotic 😄
You brought up a great point on conflicting optimization goals — we don’t handle that yet. I’ll likely need to add a validation step to ensure that user-defined goals don’t conflict from the start.
Appreciate you bringing it up!
Still, it's the best idea.😍🤞🏻 You can manage it in future, I believe in you