Why AI-Generated Code Needs a Quality Layer
When we started RLabs, we believed in a simple promise: let AI agents write code faster. But we quickly discovered a harder truth—speed without structure creates disasters.
I've watched Claude and GPT produce dozens of lines of plausible-looking code that falls apart the moment you try to run it. Hallucinated imports. Inconsistent architecture. Missing error handling. Functions that assume globals exist. Async/await patterns that deadlock. It's not the models' fault. They're trained to continue patterns, not to architect systems.
The first instinct was predictable: "Let's just prompt harder." Ask the agent to use a specific pattern. Add examples to the context. Tell it to validate its own output. Some of this helps, but it doesn't scale. Every project needs slightly different rules. Every team has different conventions. And most critically, checking your own work is not the same as building within constraints.
We built AgentGuard to solve this.
The Problem Isn't the Agent. It's the Contract.
When developers write code without structure, we catch mistakes in code review. When agents write code, that review step usually doesn't exist. So we decided: lock down the structure before the agent even starts typing.
AgentGuard is an MCP (Model Context Protocol) server that guides AI agents through a five-step refinement process:
- Skeleton — Agent creates the file structure and function signatures without implementation. Catches architectural mistakes early.
- Contracts and Wiring — Agent specifies types, dependencies, and interfaces. You verify they match your system before any logic exists.
- Logic — Only now does the agent implement the actual code. But it's constrained by the contracts above—no surprise dependencies, no weird globals, no architectural detours.
- Challenge Criteria — Agent generates self-review criteria against quality standards. Built-in validation before a single test runs.
- Validate — Final check: syntax, lint, types, imports. Does the structure match the contract?
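The step ordering above is the core constraint: logic cannot be written before contracts exist. A minimal sketch of what enforcing that might look like, assuming a hypothetical `RefinementSession` tracker (the names `Step` and `submit` are illustrative, not AgentGuard's actual API):

```python
from enum import Enum, auto

class Step(Enum):
    # The five refinement steps, in the order they must run.
    SKELETON = auto()
    CONTRACTS = auto()
    LOGIC = auto()
    CHALLENGE = auto()
    VALIDATE = auto()

class RefinementSession:
    """Tracks the agent's progress and refuses out-of-order work."""
    ORDER = list(Step)  # enum members iterate in definition order

    def __init__(self):
        self.completed = []

    def submit(self, step, artifact):
        expected = self.ORDER[len(self.completed)]
        if step is not expected:
            # The agent tried to skip ahead, e.g. writing logic
            # before contracts are pinned down.
            raise ValueError(f"expected {expected.name}, got {step.name}")
        self.completed.append((step, artifact))

    @property
    def done(self):
        return len(self.completed) == len(self.ORDER)
```

The point of a gate like this is that "no surprise dependencies" stops being a prompt-level request and becomes a hard precondition: a `LOGIC` submission is rejected until `SKELETON` and `CONTRACTS` have been accepted.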
All of this runs locally, without calling an LLM server. AgentGuard doesn't generate code. It structures the work.
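Because the final Validate step checks syntax and imports rather than generating anything, it can run entirely on the local machine. A rough sketch of that idea using only the Python standard library (this is an illustration of local validation, not AgentGuard's actual implementation):

```python
import ast
import importlib.util

def validate_source(source: str) -> list[str]:
    """Return a list of problems found in a generated Python module."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []
    # Flag imports that don't resolve in the current environment --
    # the classic "hallucinated import" failure mode.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                top = alias.name.split(".")[0]
                if importlib.util.find_spec(top) is None:
                    problems.append(f"unresolved import: {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            top = node.module.split(".")[0]
            if importlib.util.find_spec(top) is None:
                problems.append(f"unresolved import: {node.module}")
    return problems
```

Nothing here needs an API key or a network call, which is exactly why this class of check is cheap to run after every step.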
Archetypes: Reusable Quality Patterns
Not every team needs the same structure. A hexagonal architecture API looks completely different from an ADR documentation system, which looks different from a Go-to-Market campaign builder.
We created archetypes (architectural templates) for this. An archetype is a set of templates + prompts that define how each of the five steps should work for your specific context.
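Conceptually, an archetype is just structured data: one prompt-plus-template bundle per step. A hypothetical sketch of what a hexagonal-API archetype could look like (the `HEXAGONAL_API` dict and `prompt_for` helper are illustrative, not the marketplace format):

```python
# Hypothetical archetype: per-step instructions the agent receives.
HEXAGONAL_API = {
    "name": "hexagonal-api",
    "steps": {
        "skeleton": "Lay out ports, adapters, and domain modules; signatures only.",
        "contracts": "Declare the interface for each port before any logic exists.",
        "logic": "Implement adapters strictly against the declared ports.",
        "challenge": "List review criteria, e.g. no adapter imports another adapter.",
        "validate": "Check that imports flow inward: adapters -> ports -> domain.",
    },
}

def prompt_for(archetype: dict, step: str) -> str:
    """Render the step-specific instructions for a given archetype."""
    return f"[{archetype['name']}] {archetype['steps'][step]}"
```

Swapping the archetype swaps every step's instructions at once, which is what makes the same five-step process work for an API, an ADR system, or a campaign builder.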
We ship 61+ archetypes on our marketplace at agentguard.rlabs.cl/marketplace. Hexagonal API. ADR. Event-driven systems. FastAPI backend. CQRS patterns. Each one encodes architectural knowledge into a repeatable pattern your agent can follow consistently.
When you run AgentGuard with an archetype, the agent doesn't see generic instructions. It sees the exact structure your team cares about. The validation it runs matches your conventions.
Real Numbers, Honest State
We've been live for a few months:
- 878+ downloads/month on PyPI
- 9 stars on GitHub (we're not a viral hit, and we're okay with that)
- Used by teams running production workloads on the five-step system
- In active development — improvements to the refinement process every sprint
This isn't a wrapper around ChatGPT. AgentGuard doesn't touch your API keys or call external models. It runs locally. It works with Claude, GPT, Cline, or any agent that speaks MCP.
The Next Step
If you've felt friction between "AI-generated code" and "production-ready code," AgentGuard was built for that gap.
```shell
pip install rlabs-agentguard
```
No API keys. No setup ceremony. Then explore archetypes for your use case at agentguard.rlabs.cl/marketplace.
We built this because we got tired of code that looked right until it didn't. Structure matters more than speed.