Last month I asked myself a question that wouldn't leave me alone: what if I could mass hire 37 specialists for my side projects without spending anything?
I work full-time as a technology lead. Like many of you, I have a graveyard of side projects that died somewhere between "great idea" and "I'll finish it this weekend." The problem was never the idea. It was bandwidth. Solo founders are expected to be developer, marketer, ops, legal, finance, and customer support all at once.
So I built Loki Mode, an open-source Claude Code skill that orchestrates 37 specialized AI agents to take a product requirements document (PRD) and autonomously build, deploy, and operate a complete product.
This is the story of how I built it and what I learned.
The Problem I Wanted to Solve
Most AI coding tools still require you to babysit every step. You prompt, wait, review, prompt again, fix the hallucination, prompt again. It's faster than coding from scratch, but you're still the bottleneck.
I wanted something different:
- Give it a PRD
- Walk away
- Come back to a deployed product
No hand-holding. No human in the loop for routine decisions.
Architecture: Why 37 Agents?
I started with a single autonomous agent. It worked for simple tasks but fell apart on anything complex. The context window would fill up, the agent would lose track of what it was doing, and quality degraded.
The solution was specialization. Instead of one agent trying to be everything, I created focused agents that only do one thing well:
- Engineering Swarm (8 agents): frontend, backend, database, mobile, API, QA, performance, infrastructure
- Operations Swarm (8 agents): devops, SRE, security, monitoring, incident response, release management, cost optimization, compliance
- Business Swarm (8 agents): marketing, sales, finance, legal, support, HR, investor relations, partnerships
- Data Swarm (3 agents): ML engineer, data engineer, analytics
- Product Swarm (3 agents): product manager, designer, technical writer
- Growth Swarm (4 agents): growth hacker, community, customer success, lifecycle marketing
- Review Swarm (3 agents): code reviewer, business logic reviewer, security reviewer
Each agent has a focused context, specific capabilities, and clear boundaries. The orchestrator coordinates them through a distributed task queue.
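To make that concrete, here is a minimal sketch of what an agent definition and the orchestrator's task queue could look like. The `AgentSpec`, `Task`, and `TaskQueue` names and shapes are my own illustration, not the actual types in the repo, and the real queue is persisted rather than in-memory:

```typescript
// Illustrative sketch only; names and shapes are assumptions, not the repo's actual types.
type Swarm =
  | "engineering" | "operations" | "business"
  | "data" | "product" | "growth" | "review";

interface AgentSpec {
  id: string;             // e.g. "eng-backend-01"
  role: string;           // e.g. "eng-backend"
  swarm: Swarm;
  capabilities: string[]; // what this agent is allowed to do
  contextFiles: string[]; // the focused context it is given
}

interface Task {
  id: string;
  role: string;           // which agent role should handle it
  description: string;
  status: "queued" | "in-progress" | "done" | "failed";
}

// A deliberately simple in-memory queue; the real system persists tasks to disk.
class TaskQueue {
  private tasks: Task[] = [];

  enqueue(task: Task): void {
    this.tasks.push({ ...task, status: "queued" });
  }

  // Hand the next queued task to an agent whose role matches it.
  claim(agent: AgentSpec): Task | undefined {
    const task = this.tasks.find(
      (t) => t.status === "queued" && t.role === agent.role,
    );
    if (task) task.status = "in-progress";
    return task;
  }
}
```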
The Parallel Code Review Pattern
This was the single biggest improvement to code quality. Instead of one reviewer, every piece of code goes through three specialized reviewers simultaneously:
```
IMPLEMENT → REVIEW (3 parallel) → AGGREGATE → FIX → RE-REVIEW → COMPLETE
               │
               ├─ code-reviewer (quality, patterns, maintainability)
               ├─ business-logic-reviewer (requirements, edge cases)
               └─ security-reviewer (vulnerabilities, auth issues)
```
Each reviewer returns a structured response:
```json
{
  "strengths": ["Well-structured modules", "Good test coverage"],
  "issues": [
    {
      "severity": "High",
      "description": "Missing input validation on user endpoint",
      "location": "src/api/users.js:45",
      "suggestion": "Add schema validation before processing"
    }
  ],
  "assessment": "FAIL"
}
```
The severity determines what happens next:
| Severity | Action |
|---|---|
| Critical/High/Medium | Block. Dispatch fix agent. Re-run ALL 3 reviewers. |
| Low | Add a `// TODO(review): ...` comment, continue |
| Cosmetic | Add a `// FIXME(nitpick): ...` comment, continue |
This catches issues that a single reviewer would miss. The business logic reviewer catches requirements gaps. The security reviewer catches vulnerabilities. The code reviewer catches maintainability issues.
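The dispatch-and-aggregate step is simple to express in code. A minimal sketch, assuming each reviewer is exposed as an async function returning the structured response above (the `Reviewer` type and `reviewInParallel` name are hypothetical, not the skill's actual API):

```typescript
interface ReviewIssue {
  severity: "Critical" | "High" | "Medium" | "Low" | "Cosmetic";
  description: string;
  location: string;
  suggestion: string;
}

interface ReviewResult {
  strengths: string[];
  issues: ReviewIssue[];
  assessment: "PASS" | "FAIL";
}

// Hypothetical reviewer interface; in Loki Mode these are Claude sub-agents.
type Reviewer = (diff: string) => Promise<ReviewResult>;

const BLOCKING = new Set(["Critical", "High", "Medium"]);

// Run all three reviewers in parallel, then decide what happens next by severity.
async function reviewInParallel(
  diff: string,
  reviewers: Reviewer[], // [codeReviewer, businessLogicReviewer, securityReviewer]
): Promise<{ blocked: boolean; fixes: ReviewIssue[]; annotations: ReviewIssue[] }> {
  const results = await Promise.all(reviewers.map((review) => review(diff)));
  const issues = results.flatMap((r) => r.issues);

  return {
    // Any Critical/High/Medium issue blocks the task, triggers a fix agent, and forces a full re-review.
    blocked: issues.some((i) => BLOCKING.has(i.severity)),
    fixes: issues.filter((i) => BLOCKING.has(i.severity)),
    // Low/Cosmetic issues become TODO/FIXME comments and the pipeline continues.
    annotations: issues.filter((i) => !BLOCKING.has(i.severity)),
  };
}
```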
Handling Failures: Circuit Breakers and Dead Letter Queues
Autonomous systems fail. The question is how they fail.
I implemented circuit breakers borrowed from distributed systems design:
```
CLOSED (normal) → failures++ → threshold reached → OPEN (blocking)
                                                     │
                                             cooldown expires
                                                     │
                                                     ▼
                                            HALF-OPEN (testing)
                                                     │
                                 success ◄───────────┴───────────► failure
                                    │                                 │
                                    ▼                                 ▼
                                 CLOSED                              OPEN
```
When an agent type fails repeatedly, the circuit breaker opens and stops sending work to that agent type. After a cooldown period, it enters half-open state and allows one test request. If that succeeds, normal operation resumes. If it fails, back to open.
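The breaker itself is only a few dozen lines. Here is a simplified sketch; the threshold and cooldown values are placeholders, not the ones the skill actually ships with:

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

// One breaker per agent type: trips after repeated failures, re-tests after a cooldown.
class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly threshold = 5,       // consecutive failures before opening (placeholder)
    private readonly cooldownMs = 60_000, // how long to stay open (placeholder)
  ) {}

  // Can we send work to this agent type right now?
  allow(): boolean {
    if (this.state === "OPEN" && Date.now() - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN"; // cooldown expired: allow a single test request
    }
    return this.state !== "OPEN";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.state === "HALF_OPEN" || this.failures >= this.threshold) {
      this.state = "OPEN";
      this.openedAt = Date.now();
    }
  }
}
```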
Tasks that still fail after retries go to a dead letter queue for manual review instead of blocking the entire system.
State Persistence: Surviving Rate Limits
Claude Code has rate limits. In the middle of building your startup, you might hit them. The system needed to survive this gracefully.
Every agent maintains its own state file:
```json
{
  "id": "eng-backend-01",
  "role": "eng-backend",
  "status": "active",
  "currentTask": "task-uuid",
  "tasksCompleted": 12,
  "lastCheckpoint": "2025-01-15T10:30:00Z"
}
```
Before every major operation, agents checkpoint their state. When the system resumes after a rate limit:
- Orchestrator reads its state file
- Scans all agent states for incomplete tasks
- Re-queues orphaned tasks
- Spawns replacement agents for failed ones
- Continues from where it left off
No lost work. No starting over.
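A sketch of the resume path, assuming state files like the one above live in a single directory. The directory layout, field names, and `planResume` helper are illustrative, not the skill's actual code:

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface AgentState {
  id: string;
  role: string;
  status: "active" | "idle" | "failed";
  currentTask: string | null;
  tasksCompleted: number;
  lastCheckpoint: string;
}

// Scan persisted agent states and work out what to re-queue and which agents to respawn.
function planResume(stateDir: string): { requeue: string[]; respawn: string[] } {
  const requeue: string[] = [];
  const respawn: string[] = [];

  for (const file of readdirSync(stateDir)) {
    if (!file.endsWith(".json")) continue;
    const state: AgentState = JSON.parse(readFileSync(join(stateDir, file), "utf8"));

    // An agent that was mid-task when the rate limit hit: its task is orphaned, re-queue it.
    if (state.status === "active" && state.currentTask) requeue.push(state.currentTask);

    // A failed agent needs a replacement spawned for its role.
    if (state.status === "failed") respawn.push(state.role);
  }
  return { requeue, respawn };
}
```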
The Anti-Hallucination Protocol
AI agents hallucinate. They claim packages exist that don't. They invent API endpoints. They assume syntax that doesn't compile.
Every agent follows a strict protocol:
| Category | Verification Method |
|---|---|
| Technical capabilities | Web search official docs |
| API usage | Read docs + test with real call |
| Package/dependency | Verify exists on registry |
| Syntax correctness | Execute code, don't assume |
| Performance claims | Benchmark with real data |
| Competitor features | Verify on their actual site |
The rule is simple: never assume, always verify. When uncertain, research first. If still uncertain, choose the conservative option and document the uncertainty.
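For dependencies, "verify it exists on the registry" can be as cheap as one HTTP check. A minimal sketch against the public npm registry; the `packageExists` helper is mine, not part of the skill:

```typescript
// Returns true only if the package (and optionally the exact version) is published on npm.
// Uses the global fetch available in Node 18+.
async function packageExists(name: string, version?: string): Promise<boolean> {
  const url = version
    ? `https://registry.npmjs.org/${name}/${version}`
    : `https://registry.npmjs.org/${name}`;
  const res = await fetch(url);
  return res.ok; // 200 means the package/version exists; 404 means it was hallucinated
}

// Example: reject an invented dependency before it ever reaches package.json.
// await packageExists("express")                 -> true
// await packageExists("some-made-up-package-xyz") -> false
```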
What I Would Do Differently
Start with fewer agents. 37 agents is a lot to coordinate. I would start with the core engineering swarm and add others incrementally.
Better observability. Debugging a multi-agent system is hard. I added logging everywhere but still sometimes struggle to understand why an agent made a particular decision.
More integration tests. Unit testing individual agents is straightforward. Testing the interactions between 37 agents is not.
Try It Yourself
The entire system is open source under the MIT license:
GitHub: https://github.com/asklokesh/claudeskill-loki-mode
To use it:
```bash
# Clone to your Claude Code skills directory
git clone https://github.com/asklokesh/claudeskill-loki-mode.git ~/.claude/skills/loki-mode

# Launch Claude Code with autonomous permissions
claude --dangerously-skip-permissions

# Say the magic words
> Loki Mode with PRD at ./docs/requirements.md
```
Fair warning: this requires `--dangerously-skip-permissions` because the agents need to execute code, create files, and make network requests autonomously. Understand what that means before you run it.
What's Next
I'm still iterating on this. Current areas of focus:
- Better agent coordination patterns
- Reducing token usage through smarter context management
- More deployment targets
- Improved monitoring dashboard
If you try it, let me know what breaks. Open an issue or find me on LinkedIn.
Building in public. One autonomous agent at a time.