Lokesh Mure

How I Built an Autonomous AI Startup System with 37 Agents Using Claude Code

Last month I asked myself a question that wouldn't leave me alone: what if I could mass hire 37 specialists for my side projects without spending anything?

I work full-time as a technology lead. Like many of you, I have a graveyard of side projects that died somewhere between "great idea" and "I'll finish it this weekend." The problem was never the idea. It was bandwidth. Solo founders are expected to be developer, marketer, ops, legal, finance, and customer support all at once.

So I built Loki Mode - an open source Claude Code skill that orchestrates 37 specialized AI agents to take a product requirements document and autonomously build, deploy, and operate a complete product.

This is the story of how I built it and what I learned.

The Problem I Wanted to Solve

Most AI coding tools still require you to babysit every step. You prompt, wait, review, prompt again, fix the hallucination, prompt again. It's faster than coding from scratch, but you're still the bottleneck.

I wanted something different:

  • Give it a PRD
  • Walk away
  • Come back to a deployed product

No hand-holding. No human in the loop for routine decisions.

Architecture: Why 37 Agents?

I started with a single autonomous agent. It worked for simple tasks but fell apart on anything complex. The context window would fill up, the agent would lose track of what it was doing, and quality degraded.

The solution was specialization. Instead of one agent trying to be everything, I created focused agents that only do one thing well:

Engineering Swarm (8 agents): frontend, backend, database, mobile, API, QA, performance, infrastructure

Operations Swarm (8 agents): devops, SRE, security, monitoring, incident response, release management, cost optimization, compliance

Business Swarm (8 agents): marketing, sales, finance, legal, support, HR, investor relations, partnerships

Data Swarm (3 agents): ML engineer, data engineer, analytics

Product Swarm (3 agents): product manager, designer, technical writer

Growth Swarm (4 agents): growth hacker, community, customer success, lifecycle marketing

Review Swarm (3 agents): code reviewer, business logic reviewer, security reviewer

Each agent has a focused context, specific capabilities, and clear boundaries. The orchestrator coordinates them through a distributed task queue.
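
To make that concrete, here is a rough TypeScript sketch of the shape of the thing. The type names and fields are mine for illustration, not the skill's actual internals:

type Swarm = "engineering" | "operations" | "business" | "data" | "product" | "growth" | "review";

interface AgentSpec {
  id: string;       // e.g. "eng-backend-01"
  role: string;     // e.g. "eng-backend"
  swarm: Swarm;
}

interface Task {
  id: string;
  role: string;     // which agent role should pick this up
  payload: string;  // the work description / prompt
  attempts: number;
}

class TaskQueue {
  private pending: Task[] = [];

  enqueue(task: Task): void {
    this.pending.push(task);
  }

  // Hand the next matching task to an agent of the given role.
  claim(role: string): Task | undefined {
    const idx = this.pending.findIndex(t => t.role === role);
    return idx >= 0 ? this.pending.splice(idx, 1)[0] : undefined;
  }
}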

The Parallel Code Review Pattern

This was the single biggest improvement to code quality. Instead of one reviewer, every piece of code goes through three specialized reviewers simultaneously:

IMPLEMENT → REVIEW (3 parallel) → AGGREGATE → FIX → RE-REVIEW → COMPLETE
                │
                ├─ code-reviewer (quality, patterns, maintainability)
                ├─ business-logic-reviewer (requirements, edge cases)
                └─ security-reviewer (vulnerabilities, auth issues)
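Dispatching the reviewers is plain concurrency. Roughly, in TypeScript (runReviewer here is a stand-in, not the skill's real API):

// Illustrative only: runReviewer stands in for spawning a reviewer agent and collecting its verdict.
declare function runReviewer(role: string, diff: string): Promise<unknown>;

async function reviewInParallel(diff: string): Promise<unknown[]> {
  // All three reviewers run concurrently; total wall time is bounded by the slowest one.
  return Promise.all([
    runReviewer("code-reviewer", diff),
    runReviewer("business-logic-reviewer", diff),
    runReviewer("security-reviewer", diff),
  ]);
}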

Each reviewer returns a structured response:

{
  "strengths": ["Well-structured modules", "Good test coverage"],
  "issues": [
    {
      "severity": "High",
      "description": "Missing input validation on user endpoint",
      "location": "src/api/users.js:45",
      "suggestion": "Add schema validation before processing"
    }
  ],
  "assessment": "FAIL"
}

The severity determines what happens next:

Severity               Action
Critical/High/Medium   Block. Dispatch fix agent. Re-run ALL 3 reviewers.
Low                    Add // TODO(review): ... comment, continue
Cosmetic               Add // FIXME(nitpick): ... comment, continue

This catches issues that a single reviewer would miss. The business logic reviewer catches requirements gaps. The security reviewer catches vulnerabilities. The code reviewer catches maintainability issues.
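
Here is a sketch of what that severity routing looks like in code. It mirrors the table above; the type and function names are mine, not the project's:

type Severity = "Critical" | "High" | "Medium" | "Low" | "Cosmetic";

interface ReviewIssue {
  severity: Severity;
  description: string;
  location: string;
  suggestion: string;
}

interface ReviewResult {
  strengths: string[];
  issues: ReviewIssue[];
  assessment: "PASS" | "FAIL";
}

const BLOCKING: Severity[] = ["Critical", "High", "Medium"];

// Decide what the orchestrator does with the three reviewers' combined output.
function nextAction(results: ReviewResult[]): "fix-and-rereview" | "annotate-and-continue" | "complete" {
  const issues = results.flatMap(r => r.issues);
  if (issues.some(i => BLOCKING.includes(i.severity))) {
    return "fix-and-rereview";       // dispatch a fix agent, then re-run ALL 3 reviewers
  }
  if (issues.length > 0) {
    return "annotate-and-continue";  // leave TODO(review) / FIXME(nitpick) comments
  }
  return "complete";
}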

Handling Failures: Circuit Breakers and Dead Letter Queues

Autonomous systems fail. The question is how they fail.

I implemented circuit breakers borrowed from distributed systems design:

CLOSED (normal) → failures++ → threshold reached → OPEN (blocking)
                                                        │
                                                   cooldown expires
                                                        │
                                                        ▼
                                                  HALF-OPEN (testing)
                                                        │
                                    success ◄───────────┴───────────► failure
                                       │                                  │
                                       ▼                                  ▼
                                    CLOSED                              OPEN

When an agent type fails repeatedly, the circuit breaker opens and stops sending work to that agent type. After a cooldown period, it enters half-open state and allows one test request. If that succeeds, normal operation resumes. If it fails, back to open.
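
A stripped-down version of that state machine in TypeScript (the threshold and cooldown values here are arbitrary illustration, not the skill's defaults):

type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 3, private cooldownMs = 60_000) {}

  canDispatch(): boolean {
    if (this.state === "OPEN" && Date.now() - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN";   // cooldown expired: allow one test request
    }
    return this.state !== "OPEN";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "CLOSED";        // test request succeeded: resume normal operation
  }

  recordFailure(): void {
    this.failures++;
    if (this.state === "HALF_OPEN" || this.failures >= this.threshold) {
      this.state = "OPEN";        // stop sending work to this agent type
      this.openedAt = Date.now();
    }
  }
}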

For tasks that fail even after retries, they go to a dead letter queue for manual review rather than blocking the entire system.
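
Conceptually the dead letter queue is just a parking lot for exhausted tasks. Something like this (illustrative names, arbitrary retry budget):

interface FailedTask {
  id: string;
  role: string;
  attempts: number;
}

const MAX_ATTEMPTS = 3;               // arbitrary retry budget for illustration
const deadLetter: FailedTask[] = [];  // parked here for manual review

function routeFailure(task: FailedTask, requeue: (t: FailedTask) => void): void {
  task.attempts += 1;
  if (task.attempts >= MAX_ATTEMPTS) {
    deadLetter.push(task);            // don't block the rest of the system
  } else {
    requeue(task);                    // transient failure: try again
  }
}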

State Persistence: Surviving Rate Limits

Claude Code has rate limits. In the middle of building your startup, you might hit them. The system needed to survive this gracefully.

Every agent maintains its own state file:

{
  "id": "eng-backend-01",
  "role": "eng-backend",
  "status": "active",
  "currentTask": "task-uuid",
  "tasksCompleted": 12,
  "lastCheckpoint": "2025-01-15T10:30:00Z"
}

Before every major operation, agents checkpoint their state. When the system resumes after a rate limit:

  1. Orchestrator reads its state file
  2. Scans all agent states for incomplete tasks
  3. Re-queues orphaned tasks
  4. Spawns replacement agents for failed ones
  5. Continues from where it left off

No lost work. No starting over.
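
The resume pass boils down to scanning checkpoint files and re-queuing anything that was in flight. A sketch using Node's fs module; the directory layout and helper names are assumptions, not the skill's actual format:

import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface AgentState {
  id: string;
  role: string;
  status: "active" | "idle" | "failed";
  currentTask: string | null;
  tasksCompleted: number;
  lastCheckpoint: string;
}

// Re-queue any task that an agent had claimed but never finished.
function recoverOrphanedTasks(stateDir: string, requeue: (taskId: string, role: string) => void): void {
  for (const file of readdirSync(stateDir)) {
    if (!file.endsWith(".json")) continue;
    const state: AgentState = JSON.parse(readFileSync(join(stateDir, file), "utf8"));
    if (state.currentTask && state.status !== "idle") {
      requeue(state.currentTask, state.role);   // orphaned work goes back on the queue
    }
  }
}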

The Anti-Hallucination Protocol

AI agents hallucinate. They claim packages exist that don't. They invent API endpoints. They assume syntax that doesn't compile.

Every agent follows a strict protocol:

Category                 Verification Method
Technical capabilities   Web search official docs
API usage                Read docs + test with real call
Package/dependency       Verify exists on registry
Syntax correctness       Execute code, don't assume
Performance claims       Benchmark with real data
Competitor features      Verify on their actual site

The rule is simple: never assume, always verify. When uncertain, research first. If still uncertain, choose the conservative option and document the uncertainty.
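
For the package check in particular, verification is one registry lookup away. A sketch assuming Node 18+ global fetch; the helper name is mine, the endpoint is the public npm registry:

// Returns true only if the package name actually resolves on the npm registry.
async function packageExists(name: string): Promise<boolean> {
  const res = await fetch(`https://registry.npmjs.org/${encodeURIComponent(name)}`);
  return res.ok;  // 200 means it exists; 404 means the agent made it up
}

// Usage inside an agent's verification step:
//   if (!(await packageExists("some-dependency"))) { choose a verified alternative }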

What I Would Do Differently

Start with fewer agents. 37 agents is a lot to coordinate. I would start with the core engineering swarm and add others incrementally.

Better observability. Debugging a multi-agent system is hard. I added logging everywhere but still sometimes struggle to understand why an agent made a particular decision.

More integration tests. Unit testing individual agents is straightforward. Testing the interactions between 37 agents is not.

Try It Yourself

The entire system is open source under the MIT license:

GitHub: https://github.com/asklokesh/claudeskill-loki-mode

To use it:

# Clone to your Claude Code skills directory
git clone https://github.com/asklokesh/claudeskill-loki-mode.git ~/.claude/skills/loki-mode

# Launch Claude Code with autonomous permissions
claude --dangerously-skip-permissions

# Say the magic words
> Loki Mode with PRD at ./docs/requirements.md

Fair warning: this requires --dangerously-skip-permissions because the agents need to execute code, create files, and make network requests autonomously. Understand what that means before you run it.

What's Next

I'm still iterating on this. Current areas of focus:

  • Better agent coordination patterns
  • Reducing token usage through smarter context management
  • More deployment targets
  • Improved monitoring dashboard

If you try it, let me know what breaks. Open an issue or find me on LinkedIn.


Building in public. One autonomous agent at a time.
