Orchestrated Multi-Agent Safety & Test Oversight - AKA "’O MASTO"

#ai #vibecoding #sdd #coding

I am building a small experiment inspired by Stripe Minions. Not related to OrKa. This is a different playground. But apparently I have a recurring problem: I do not trust agents enough to let them freely touch a codebase, and I do not trust humans enough to believe they will always review AI output properly when they are tired, rushed, or already late for another meeting.

So the question became simple. Can we automate small development tasks without pretending the coding agent is the adult in the room?

Because yes, AI can write code! We know that. Sometimes it writes useful code. Sometimes it writes code that looks clean, passes the first glance, and then you realize it quietly moved business logic into the wrong layer because it had “a better idea.” Classic junior developer energy, but with infinite confidence and no coffee breaks.

The interesting part of Stripe Minions, at least for me, is not that agents can open pull requests. The interesting part is the machinery around them. The task definition, the constraints, the review process, the checks, the fact that the agent is not just sitting there with a keyboard and divine permission to refactor your production system.

That is the part I want to explore!

In my experiment, GitHub access starts as read-only. The system can inspect the codebase, understand structure, look at existing patterns, and generate a candidate issue. But it cannot immediately modify anything. Before planning even starts, the task needs to pass a semantic gate: is it scoped, is it testable, is it clear enough, and is it safe enough to continue? Only after that does the workflow move into planning, architecture, execution, PR creation, review, and final merge validation.
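To make the flow concrete, here is a minimal Python sketch of the stage ordering and the semantic gate. It is an illustration, not the experiment's actual code. The stage names, the `CandidateIssue` fields and the rule that write access only begins at `EXECUTION` are assumptions. The four boolean flags stand in for whatever judgement (LLM-based or human) would really produce them.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stage names, in the order the workflow runs them.
    CANDIDATE_ISSUE = auto()
    SEMANTIC_GATE = auto()
    PLANNING = auto()
    ARCHITECTURE = auto()
    EXECUTION = auto()
    PR_CREATION = auto()
    REVIEW = auto()
    MERGE_VALIDATION = auto()
    MERGED = auto()


ORDER = list(Stage)

# Assumption: every stage before EXECUTION only needs read access to the repo.
READ_ONLY_STAGES = set(ORDER[: ORDER.index(Stage.EXECUTION)])


@dataclass
class CandidateIssue:
    title: str
    # In a real system these flags would come from a reviewer agent or a
    # human; here they are plain inputs.
    scoped: bool
    testable: bool
    clear: bool
    safe: bool


def semantic_gate(issue: CandidateIssue) -> list[str]:
    """Return the names of failed checks; an empty list means 'continue'."""
    checks = {
        "scoped": issue.scoped,
        "testable": issue.testable,
        "clear": issue.clear,
        "safe": issue.safe,
    }
    return [name for name, passed in checks.items() if not passed]


def can_write(stage: Stage) -> bool:
    """Repository writes are refused in every read-only stage."""
    return stage not in READ_ONLY_STAGES


def advance(stage: Stage, issue: CandidateIssue) -> Stage:
    """Move to the next stage, refusing to leave the semantic gate on failure."""
    if stage is Stage.SEMANTIC_GATE:
        failures = semantic_gate(issue)
        if failures:
            raise ValueError(f"rejected at semantic gate: {', '.join(failures)}")
    idx = ORDER.index(stage)
    # MERGED is terminal: advancing from it stays there.
    return ORDER[min(idx + 1, len(ORDER) - 1)]
```

Calling `advance` repeatedly on an issue that passes all four checks walks it from `CANDIDATE_ISSUE` to `MERGED`. An issue flagged as not scoped stops at `SEMANTIC_GATE` with a `ValueError`, before any stage where `can_write` would return `True`.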

I am calling this orchestration ’O MASTO.

In Neapolitan, ’o masto is the master craftsman. The person who looks at the work and decides if it is actually good enough. Not if it looks good in a demo. Not if the agent says it is done. Actually good enough.

In this experiment, it also stands for Orchestrated Multi-Agent Safety & Test Oversight. Yes, the acronym is a bit forced. No, I do not care. It makes me laugh, and naming things is half of software engineering anyway.

The idea is that ’O MASTO is the layer that does not trust the agent. It checks the task, the plan, the implementation, the PR, the tests, the regression risk, and the final merge conditions. It is not there to be impressed. It is there to say “no, this is not good enough, go back.”

That is the core idea I keep coming back to. The executor is not the boss. The reviewer is not the boss. The LLM is definitely not the boss. The gate is the boss!

I think this is where AI coding workflows need to go. Not toward bigger chat windows where we ask the model to “please be careful.” Toward systems that assume the model will be wrong sometimes and are designed to catch that before the damage reaches main.

AI will not remove engineering discipline. It will expose who actually had it.

If a project has no tests, no review culture, no stable patterns, no definition of done, and no clear ownership, an AI coding agent will not magically fix it. It will just produce chaos faster, with better formatting.

My bet is that the next serious layer of AI development tooling will be trust infrastructure. Not just generation. Validation. Rejection. Retry. Traceability. Merge control. Basically, old-school software engineering.

DEV Community
