At Kaizen Agent, we're building something meta: an AI agent that automatically tests and improves other AI agents.
Today I want to share the architecture behind Kaizen Agent and open it up for feedback from the community. If you're building LLM apps, agents, or dev tools, your input would mean a lot.
Why We Built Kaizen Agent
One of the biggest challenges in developing AI agents and LLM applications is non-determinism.
Even when an agent "works," it might:
- Fail silently with different inputs
- Succeed one run but fail the next
- Produce inconsistent behavior depending on state, memory, or context
This makes testing, debugging, and improving agents very time-consuming, especially when every change has to be re-tested.
So we built Kaizen Agent to automate this loop: generate tests, run them, analyze the results, fix problems, and repeat until your agent improves.
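To make the loop concrete, here's a minimal Python sketch of the orchestration. Everything in it is illustrative: the callables are placeholders you would wire up yourself, not Kaizen Agent's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Minimal sketch of the improvement loop; all callables are placeholders,
# not Kaizen Agent's real internals.
@dataclass
class ImprovementLoop:
    generate_tests: Callable[[], list]        # [1] build test cases from config
    run_tests: Callable[[list], list]         # [2] execute them against the agent
    find_failures: Callable[[list], list]     # [3] LLM-based analysis of results
    apply_fixes: Callable[[list], None]       # [4] patch prompts and code
    open_pull_request: Callable[[], None]     # [5] ship the accumulated changes
    max_iterations: int = 5

    def run(self) -> None:
        tests = self.generate_tests()
        for _ in range(self.max_iterations):
            failures = self.find_failures(self.run_tests(tests))
            if not failures:                  # everything passes: propose the changes
                self.open_pull_request()
                return
            self.apply_fixes(failures)        # otherwise improve and re-run
```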
Architecture Diagram
Here's the system diagram that ties it all together, showing how config, agent logic, and the improvement loop interact:
Note: due to dev.to's image compression, click here to view the full-resolution diagram for better clarity.
Core Workflow: The Kaizen Agent Loop
Here are the five core steps the system runs automatically:
[1] Auto-Generate Test Data
Kaizen Agent creates a broad range of test cases based on your config, including edge cases, failure triggers, and boundary conditions.
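As a rough illustration of how a config can drive test generation (the field names below are hypothetical, not Kaizen Agent's real YAML schema), the config can be turned into a prompt that asks an LLM for diverse cases:

```python
# Hypothetical config fields; the actual Kaizen Agent schema may differ.
example_config = {
    "agent_description": "Summarizes customer support tickets",
    "inputs": ["ticket_text"],
    "success_criteria": "Summary is at most 3 sentences and factually grounded",
}

def build_test_generation_prompt(config: dict) -> str:
    """Turn an agent config into an LLM prompt asking for diverse test cases."""
    return (
        "Generate test cases for the following agent.\n"
        f"Agent: {config['agent_description']}\n"
        f"Input fields: {', '.join(config['inputs'])}\n"
        f"Success criteria: {config['success_criteria']}\n"
        "Return a JSON list covering typical inputs, edge cases, "
        "boundary conditions, and likely failure triggers."
    )

print(build_test_generation_prompt(example_config))
```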
[2] Run All Test Cases
It executes every test on your current agent implementation and collects detailed outcomes.
[3] Analyze Test Results
We use an LLM-based evaluator to interpret outputs against your YAML-defined success criteria.
- It identifies why specific tests failed.
- The analysis of each failed test is stored in long-term memory, so the system learns from past failures and avoids repeating the same mistakes (see the sketch below).
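Here's a minimal sketch of an LLM-as-judge evaluation step with a simple failure memory. `llm_call` stands in for whatever chat-completion client you use, and the JSON verdict format is an assumption, not the evaluator Kaizen Agent actually ships.

```python
import json
from typing import Callable

failure_memory: list[dict] = []  # long-term record of analyzed failures

def evaluate_output(llm_call: Callable[[str], str], criteria: str,
                    test_input: str, agent_output: str) -> dict:
    """Ask an LLM judge whether the output meets the success criteria."""
    prompt = (
        "Decide whether the agent output satisfies the success criteria.\n"
        f"Criteria:\n{criteria}\n\nInput:\n{test_input}\n\nOutput:\n{agent_output}\n\n"
        'Reply with JSON only: {"passed": true or false, "reason": "..."}'
    )
    verdict = json.loads(llm_call(prompt))
    if not verdict.get("passed"):
        # remember why it failed so later iterations can avoid the same mistake
        failure_memory.append({"input": test_input, "reason": verdict.get("reason")})
    return verdict
```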
[4] Fix Code and Prompts
Kaizen Agent suggests and applies improvements not only to prompts but also to your code:
- It may add guardrails or new LLM calls (one example is sketched after this list).
- Longer term, it aims to test different agent architectures and compare them automatically to select the best-performing one.
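One generic example of the kind of guardrail such a fix might introduce (this wrapper is illustrative, not code Kaizen Agent actually emits): a structural check with a retry that catches malformed outputs.

```python
import json
from typing import Callable

def call_with_json_guardrail(agent: Callable[[str], str],
                             user_input: str, max_retries: int = 2) -> dict:
    """Retry with a clarifying instruction until the agent returns valid JSON."""
    prompt = user_input
    for _ in range(max_retries + 1):
        raw = agent(prompt)
        try:
            return json.loads(raw)  # guardrail: output must parse as JSON
        except json.JSONDecodeError:
            prompt = user_input + "\n\nRespond with valid JSON only."
    raise ValueError("Agent never produced valid JSON")
```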
[5] Make a Pull Request
Once improvements are confirmed (no regressions, better metrics), the system generates a PR with all proposed changes.
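A simplified version of that gate could compare per-test pass/fail results before and after the changes; the data shapes here are assumptions for illustration.

```python
def should_open_pr(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """Open a PR only if nothing regressed and the overall pass count improved."""
    regressions = [name for name, passed in before.items()
                   if passed and not after.get(name, False)]
    improved = sum(after.values()) > sum(before.values())
    return not regressions and improved
```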
This loop continues until your agent is reliably performing as intended.
What We'd Love Feedback On
We're still early and experimenting. Your input would help shape this.
We'd love to hear:
- What kind of AI agents would you want to test with Kaizen Agent?
- What extra features would make this more useful for you?
- Are there specific debugging pain points we could solve better?
If you've got thoughts, ideas, or feature requests, drop a comment, open an issue, or DM me.
Big Picture
We believe that as AI agents become more complex, testing and iteration tools will become essential.
Kaizen Agent is our attempt to automate the test → analyze → improve loop.
Links
- GitHub: https://github.com/Kaizen-agent/kaizen-agent
- Twitter/X: https://x.com/yuto_ai_agent
Top comments (3)
Really impressive concept! Turning agents into learners of each other through a Kaizen-like approach is such a smart way to scale intelligence.
Loved how you merged agent autonomy with continuous improvement. Subscribed to see how this evolves!
Quick question: how do you handle conflicting optimization goals when agents start improving each other? Sounds like a fun chaos to manage.
Keep going!
@vidakhoshpey22
Thanks for the kind words!
Right now, it's a one-way setup: one agent improves another, so they don't improve each other (yet!). But I've definitely thought about using two Kaizen agents to improve each other... probably fun but chaotic.
You brought up a great point on conflicting optimization goals; we don't handle that yet. I'll likely need to add a validation step to ensure that user-defined goals don't conflict from the start.
Appreciate you bringing it up!
Still, it's the best idea! You can manage it in the future, I believe in you.