DEV Community

Romeo Ricci
Romeo Ricci

Posted on

I built an autonomous coding agent that argues with itself before touching your code

Most AI coding tools have the same fundamental problem: they start writing code immediately. No verification, no pushback, no evidence. Just vibes.
I spent 6 months building something different. MAESTRO runs an adversarial debate between two frontier AI models before executing anything.
How it works

You describe a task
Two AI models — GPT-5.4 as proposer, Claude Sonnet as challenger — debate the approach. Each reads your actual files using a structural knowledge graph (Memgraph + Tree-sitter AST parsing). Each cites evidence. Each challenges the other's assumptions.
The debate produces a specification: exact files to modify, constraints, test expectations, risk level
You read it and approve it
Only then does execution begin — inside an isolated Firecracker microVM
Nothing commits until it passes 16 deterministic safety checks and a structured review

The stack

FastAPI backend, Temporal for workflow orchestration
Firecracker microVMs for isolated execution
Memgraph + Tree-sitter for structural code graph (AST-derived, deterministic)
FAISS + HyDE for semantic search
PostgreSQL for episodic memory — MAESTRO gets smarter on a project the more it works on it
Claude executor: Haiku for simple tasks, Sonnet for medium, Opus for complex

Benchmark results
95+% success rate on 25 missions across Python backend tasks. When it succeeds, it scores 95-100/100. Average mission time: 4 minutes for simple tasks, 7 minutes for complex ones.
The 13% failure rate is mostly false positives in safety checks and transient API issues — actively working on pushing to 95%+.
What makes it different from just using Claude Code directly
The PM-Agent debate is the moat. Claude Code has no challenger. It has no structural understanding of your codebase before proposing changes. It has no episodic memory of what failed last time on your project.
MAESTRO knows your codebase from the AST, not from a README. It remembers every mission. It argues with itself before touching anything.
Looking for beta testers
I'm looking for 10 senior developers or CTOs at startups working on Python or TypeScript backends. Free trial, direct access from me.
Sign up at maestro-orchestrator.dev
Happy to answer questions in the comments.

Top comments (0)