AI coding agents are fast. Claude Code, Cursor, Copilot — they can generate hundreds of lines in seconds. But here's the uncomfortable truth we learned after testing five different multi-agent tools across five production projects:
Without governance, speed just amplifies mistakes.
This is the story of how we built TinySDLC — a minimal, open-source agent orchestrator that adds SDLC role discipline to AI coding. 8 roles, structured handoffs, separation of duties, security hardening — all local, zero external dependencies.
The Real Starting Point: A Skeptical Team and a Non-Coding CEO
Before we talk about architecture, let me be honest about where this started.
In May 2025, I had a problem. My development team at MTS was slow to adopt AI coding tools. They were skeptical — and frankly, they had a point. They thought AI-generated code was full of bugs. "More time fixing than coding," they said. They used ChatGPT and Gemini individually for quick prompts, but had no team-wide process.
I'm a CEO. Not a professional software developer — at least, not for the past 30 years. The last time I wrote code was 1994: Assembler and Borland C++ for my graduation thesis, a graphics application for designing electronic circuit boards. Object-oriented programming deeply shaped how I think about systems. Then I moved into management and never looked back. My team knew that. And when I pushed for AI adoption, they were polite but unconvinced: the boss isn't a real software engineer.
I brought in experts. Ran a Claude Code workshop. The team was still slow to change.
So I made a decision: I would learn it myself.
I started with Python. Then free tools — LM Studio, Ollama with Continue.dev. Then paid — GitHub Copilot, Cursor, Claude Code. Small apps first. Then I moved to real enterprise platforms: Bflow (bflow.vn — an ERP+BPM Platform for Vietnamese SMEs, built by my MTS team, launched Oct 2024), then evolving it into Bflow 2.0 (ERP+BPM+AI — Conversation-First with AI as a core pillar), and NQH-Bot (an AI-Powered Workforce Management platform for Vietnamese F&B market, at Nhat Quang Holding — my second startup).
The Crisis: 679 Mocks and 78% Failure
NQH-Bot was an AI-Powered Workforce Management platform for Vietnamese F&B market — auto-scheduling, multi-tenant SaaS, regional compliance — at Nhat Quang Holding, my second startup. We were using AI coding tools heavily. The speed was incredible.
Then we deployed to production.
- 679 out of ~900 implementations were mock code (placeholder
// TODO: implementpatterns) - 78% of production endpoints failed on real traffic
- 6 weeks of debugging to untangle what the AI had generated vs. what was actually working
The AI tools weren't broken. Our process was. We had no gates, no evidence capture, no structured review. The agents generated code, we skimmed it, and we shipped it.
My team's skepticism was validated — but for the wrong reason. The problem wasn't AI. The problem was ungoverned AI.
That crisis gave birth to what we now call the Zero Mock Policy — and eventually, to a complete governance framework.
What We Tried First: Five Multi-Agent Tools
Over the next months, we experimented with different multi-agent orchestration approaches:
| Tool | What It Did | Why It Wasn't Enough |
|---|---|---|
| TinyClaw | @mention-based agent routing | No governance loop, just routing |
| OpenClaw | Lane-based message queue + failover | Great infra, no quality gates |
| NanoBot | Tool-context isolation + shell guards | Security focused, not governance focused |
| PicoClaw | Lightweight single-agent wrapper | Too simple for team workflows |
| ZeroClaw | Output scrubbing + query classification | Post-hoc safety, not pre-hoc governance |
Each tool solved a piece of the puzzle. But none of them answered the fundamental question:
How do you ensure AI-generated code meets quality standards before it enters your codebase?
The Architecture: Role Discipline + Structured Handoffs
TinySDLC's architecture is built on two principles: separation of duties and structured handoffs.
The methodology (MTS-SDLC-Lite) defines the governance loop:
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│ Spec │────>│ Gate │────>│ Evidence │────>│ Approval │
│ (Define) │ │ (Validate) │ │ (Capture) │ │ (Sign-off) │
└────────────┘ └────────────┘ └────────────┘ └────────────┘
↑ │
└──────────────────── Feedback loop ────────────────────┘
TinySDLC enforces this loop through role constraints, not automated gates:
Role isolation: Each agent has a defined workspace, tool permissions, and scope. The coder can't approve its own output. The reviewer can't skip the tester. Separation of duties is structural, not optional.
Structured handoffs: Agents communicate through @agent: message mentions. Work flows from researcher → architect → coder → reviewer → tester with explicit handoff points. No silent pass-through.
Event logging: Every agent action is logged as a JSON event with correlation IDs — which agent did what, when, in response to what request. This gives you traceability, not just chat history:
{
"event": "handoff",
"from_role": "coder",
"to_role": "reviewer",
"correlation_id": "conv-a1b2c3",
"action": "submit_for_review",
"timestamp": "2026-02-18T14:32:01Z",
"message": "@reviewer: Auth service implementation ready for review"
}
Human checkpoints: The methodology defines when a human should review. TinySDLC provides the structure; your team provides the judgment.
Important distinction: TinySDLC is a minimal agent orchestrator extracted from a larger internal system. It provides structure and role discipline — real governance with zero infrastructure. It's a complete, standalone tool, not a crippled version of something else.
The 8 Agent Roles
TinySDLC defines 8 specialized roles, each with scoped permissions and responsibilities:
┌──────────────────────────────────────────────────────────┐
│ Governance Layer │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Researcher│ │ PM │ │ PJM │ │ Architect│ │
│ │ (Explore)│ │(Require) │ │ (Track) │ │ (Design) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Coder │ │ Reviewer │ │ Tester │ │ DevOps │ │
│ │(Generate)│ │ (Review) │ │ (Test) │ │ (Deploy) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
└──────────────────────────────────────────────────────────┘
Each role has:
- Defined tool permissions (what the agent can access — isolated workspaces)
- System prompt (what the agent's responsibilities are)
- Scope constraints (what the agent is NOT allowed to do — enforced separation of duties)
- Handoff responsibilities (which role receives its output next)
This isn't about restricting AI. It's about giving structure to a multi-agent workflow so that a reviewer can't be bypassed, a coder can't self-approve, and every handoff is explicit.
Key Design Decisions
1. Local-first, zero external dependencies
TinySDLC runs on your machine. File-based queue (incoming → processing → outgoing), no Redis, no Postgres, no cloud services. Install and run in under 5 minutes.
2. Multi-channel from day one
Discord, Telegram, WhatsApp, Zalo — your team works where they already are. Agents respond in the same channel. No context switching.
3. Security hardening built in
This came directly from our ZeroClaw experiments:
- Credential scrubbing: Agent output is scanned for leaked API keys, tokens, passwords before it reaches the channel
-
Environment variable scrubbing:
.envcontents never appear in agent responses - Input sanitization: 12 injection patterns blocked for external channel content
- Shell guards: 8 deny patterns + path traversal detection for any shell operations
4. Role constraints that enforce discipline
A reviewer can't approve their own code. A coder can't skip the review step. A tester can't deploy. These aren't suggestions — they're structural constraints in the agent definitions. This was a hard lesson from the NQH-Bot crisis: when governance is optional, it gets skipped.
The Methodology: MTS-SDLC-Lite
TinySDLC is the tool. But governance needs more than code — it needs a methodology.
MTS-SDLC-Lite is the community edition of our SDLC 6.1.0 framework. It's pure documentation:
- Core concepts: Design Thinking, Systems Thinking, 10-Stage Lifecycle
- Roles and teams: 4 team archetypes for different project sizes
- Playbooks: Step-by-step guides for common workflows
- Templates: Spec templates, gate checklists, evidence formats
- Case studies: Real examples from our production projects
It's tool-agnostic. Use it with Claude, GPT, Copilot, Cursor, or pen and paper. The methodology works regardless of which AI tool you choose.
What We Learned
After 12 iterations of the framework and 5 production projects, here are our key takeaways:
Governance is not overhead — it's insurance. The time spent on gates and evidence capture pays back 10x when something breaks in production and you need to trace the root cause.
AI doesn't need fewer rules — it needs better rules. The agents are eager to follow structure. Give them clear constraints and they produce better output than with vague "be careful" instructions.
Methodology outlives tools. We've switched AI providers three times. The SDLC framework hasn't changed. Invest in your process, not your tool vendor.
Not being an expert can be an advantage. Professional developers have ingrained habits — "this is how we've always done it." As a non-coding CEO, I had no muscle memory to override. No legacy patterns to defend. I was ready to learn whatever was new, because everything was new. Sometimes the beginner's mind sees what the expert's mind filters out. We are always programming in our lives — with AI today, anyone with design thinking, systems thinking, and domain knowledge can quickly experiment and turn ideas into products.
Start minimal. TinySDLC is deliberately small. You don't need a full enterprise platform to start governing AI output. You need a loop: Spec → Gate → Evidence → Approval.
What TinySDLC Does NOT Solve
Transparency matters more than polish. Here's what TinySDLC intentionally does not do:
- It does not guarantee code quality. It structures the workflow — the quality of output still depends on your AI provider and your prompts.
- It does not replace CI/CD or SAST. No automated test execution, no static analysis. Those belong in your pipeline, not your orchestrator.
- It does not eliminate bad architecture decisions. If your spec is wrong, governed agents will build the wrong thing — just more traceably.
- It adds structure, not intelligence. The agents are still AI. TinySDLC constrains how they interact, not what they think.
Governance is a constraint system, not a magic layer. TinySDLC makes multi-agent workflows auditable and disciplined — nothing more, nothing less.
Get Started
# The tool
git clone https://github.com/Minh-Tam-Solution/tinysdlc.git
cd tinysdlc && npm install && npm run build
./tinysdlc.sh start # Interactive setup wizard
# The methodology
git clone https://github.com/Minh-Tam-Solution/MTS-SDLC-Lite.git
Both repos are MIT licensed. Use them, fork them, improve them.
If you're building with AI coding agents and want to talk about governance approaches, find me on LinkedIn or open an issue on GitHub.
AI is fast. Governance must be faster.
— Tai Dang, CEO/Founder MTS & CEO Nhat Quang Holding
Top comments (0)