innermost47

Posted on Feb 19

I Built a Thing That Builds Things: Tres Comas Scrum

#agents #ai #llm #python

A few days ago I started wondering what would happen if you gave three LLM agents a Jira board and told them to figure it out. The answer is: chaos, then surprisingly coherent Python code, then more chaos.

This is the story of Tres Comas Scrum — a project where a CEO agent, a Coder agent, and a Tester agent run actual Scrum sprints to build an agentic framework from scratch. No human in the loop. Just vibes and OpenRouter API calls.

The Setup

The architecture is dead simple:

CEO holds the product backlog, plans sprints, reviews deliveries and decides when the project is "done"
Coder receives a ticket, reads the existing codebase, and delivers code in XML format
Tester wakes up every 4 sprints, inspects what was built, and generates new stories based on what's broken or missing

Each delivery gets tested in an isolated sandbox (bwrap on Linux) before being written to disk. If tests fail, the Coder gets the error and tries again — up to 3 times.

That's it. No orchestration framework, no LangChain, no AutoGen. Just a SQLite database, a few hundred lines of Python, and a lot of time.sleep(4) to stay under the rate limit.

What Went Wrong (A Highlight Reel)

Oh where to start.

The Coder kept writing `python inside XML tags. Every. Single. Sprint. We added a regex to strip markdown backticks. He kept adding them. We added it to the system prompt with a visual example. He still did it sometimes. At this point I think it's personal.

code.py is a reserved name. Our test runner was named code.py and Python kept importing the stdlib code module instead. Renamed it to test_runner.py. Two hours of debugging for a filename change.

sys.exit() makes bwrap angry. The sandbox intercepted pytest's exit call and threw a fit. Removed the sys.exit() wrapper. Solved instantly.

The Coder delivered 0 files and the tests passed. This one was beautiful in a terrible way. No delivery = no test files = the runner returned success: True. Added a check: no files = failure. Added a check: no test files = failure. The Coder has been slightly more motivated since then.

argparse isn't in the stdlib imports list. Our system strips external imports before inlining code into the test runner. Turns out we forgot about half of the stdlib. Added argparse, subprocess, random, zipfile, gzip, statistics, decimal... the list kept growing.

What Went Right

The CEO figured out the right build order on its own. It started with foundations (Agent, LLM, Memory, Tools), then the run loop, then inter-agent communication, then the tools library, then docs. That's exactly what a senior developer would do. Nobody told it to do that.

The Tester's feedback at Sprint 8 was genuinely useful:

"Tests pass but you can't actually use the framework in practice. The bus pattern is implemented but agents don't use it. No CLI, no examples, no error handling."

The CEO turned that into 6 actionable stories. They were all fixed by Sprint 12.

By Sprint 12, the generated framework had:

An Agent class with run() and run_autonomous()
A MessageBus with pub/sub between agents
Persistent Memory with search and tagging
A Tools registry with 16 pre-configured tools
Error handling in the LLM provider
A ConfigManager that loads agents from YAML
A README

Not bad for something that started with zero lines of code.

The Meta Thing

The whole point of this experiment was to see if agents could build the very framework you'd use to build agents. They mostly did. The generated code has Pydantic V2 warnings everywhere and Message doesn't extend BaseModel so model_dump() will crash in production — but the architecture is sound.

The next step is to use that framework to rewrite this builder — agents orchestrated by their own creation, running on Ollama locally with qwen3:8b. Turtles all the way down.

Try It

The project is on GitHub: Tres Comas Scrum

You'll need Linux (bwrap for sandboxing), Python 3.10+, and an OpenRouter API key. The free tier is enough to run a full build — just set the model to arcee-ai/trinity-large-preview:free and wait.

Fair warning: it will take a few hours, the Coder will occasionally deliver empty XML, and you will at some point stare at an IndentationError that makes no sense. That's part of the experience.

Named after the Tres Comas tequila from Silicon Valley. Three agents, three commas. Russ Hanneman would not be impressed, but he'd probably try to invest anyway.

DEV Community