This is a submission for the OpenClaw Challenge.
What I Built
I built ClawForge, a multi-agent orchestration system that turns a single natural language prompt into a fully deployed production application. No human intervention after the initial message. No manual code review. No hand-written tests. Just one sentence, and five AI agents handle the rest.
The pipeline looks like this:
You: "Build me a URL shortener with click analytics"
                  │
      ┌───────────▼───────────┐
      │ 🧠 Architect Agent    │  Plans stack, schema, endpoints
      └───────────┬───────────┘
      ┌───────────▼───────────┐
      │ 💻 Coder Agent        │  Implements full project (11 files)
      └───────────┬───────────┘
      ┌───────────▼───────────┐
      │ 🔍 Reviewer Agent     │  Security audit, code quality gate
      └───────────┬───────────┘
      ┌───────────▼───────────┐
      │ 🧪 Tester Agent       │  Writes & runs test suite
      └───────────┬───────────┘
      ┌───────────▼───────────┐
      │ 🚀 Deployer Agent     │  Git init, GitHub push, deploy
      └───────────┬───────────┘
                  │
            Live URL + Repo
The first project it built, a URL shortener with click analytics, went from zero to a live deployed app in about 20 minutes. One Telegram message. That's the entire human input.
Live demo: URL Shortener
Source code: mamoor123/clawforge on GitHub
How I Used OpenClaw
ClawForge is built as an OpenClaw skill, a drop-in module that extends what your AI agent can do. Here's how the pieces fit together:
The Skill Layer
The orchestrator lives in skills/clawforge/SKILL.md. When OpenClaw sees a trigger phrase like "Build me..." or "ClawForge: [description]", it activates the pipeline. The skill file tells the agent how to coordinate; it doesn't write code itself. It's a conductor, not a musician.
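As a rough sketch, the trigger check is just pattern matching on the incoming message. The phrases below are the ones named above; the shell logic itself is my own illustration, not the skill's actual implementation:

```shell
# Match an incoming message against the skill's trigger phrases.
# The phrases come from the post; the matching logic is illustrative.
msg="Build me a URL shortener with click analytics"

case "$msg" in
  "Build me"*|"ClawForge:"*) action="run the ClawForge pipeline" ;;
  *)                         action="ignore" ;;
esac

echo "$action"
```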
Five Specialized Agents
Each agent has its own IDENTITY.md file, a focused persona with a checklist:
| Agent | What It Does | Output |
|---|---|---|
| 🧠 Architect | Stack selection, schema design, API planning | Full plan in clawforge-state.json |
| 💻 Coder | Implements everything in TypeScript | 11 production-ready source files |
| 🔍 Reviewer | Security audit, code quality, best practices | Pass/fail with specific issues |
| 🧪 Tester | Writes and runs a test suite | 10 tests, results + coverage |
| 🚀 Deployer | Git setup, GitHub push, live deployment | Repo URL + live URL |
Shared State Communication
All agents communicate through a single JSON file (clawforge-state.json). Each stage reads the previous agent's output and writes its own results. The orchestrator reads the state to decide what happens next โ including whether to loop back and retry on failure.
```shell
# The state manager that ties everything together
bash clawforge.sh init "Build a URL shortener"
bash clawforge.sh stage architect plan_complete
bash clawforge.sh review-result true ""
bash clawforge.sh test-result 10 10 0
bash clawforge.sh deploy-result https://live-url.com https://github.com/...
```
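Under assumed field names (prompt, stage, results; not necessarily the real schema), a stage update inside clawforge.sh could be a single jq filter applied through a temp file so a crash can't truncate the state:

```shell
# Sketch of a stage update: merge the architect's result into the shared
# state with jq, writing through a temp file so a crash cannot truncate it.
# Field names are assumptions, not ClawForge's documented schema.
STATE="${TMPDIR:-/tmp}/clawforge-state.json"

# init: seed the state for a new run
printf '{"prompt":"%s","stage":"initialized","results":{}}' \
  "Build a URL shortener" > "$STATE"

# stage update: record the architect's output and advance the pipeline
jq '.stage = "architect" | .results.architect = {"plan_complete": true}' \
  "$STATE" > "$STATE.tmp" && mv "$STATE.tmp" "$STATE"

jq -r '.stage' "$STATE"
```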
The Retry Loop
Here's where it gets interesting. If the Reviewer finds issues, the pipeline loops back to the Coder with specific fix instructions. Same with tests: if tests fail, the Coder gets the failure output and fixes the code. Up to 2 retries per stage. This isn't just "generate and pray"; there's a real feedback loop.
initialized → architect → coder → reviewer → tester → deployer → complete
                            ↑         │         │
                            └─────────┴─────────┘  (retry on failure, max 2x)
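The gate itself is a small loop. In this sketch, run_stage is a stand-in for invoking a real agent, faked here to fail twice before passing so the loop-back is visible:

```shell
# Retry gate sketch: run a stage, loop back up to max_retries times.
# run_stage stands in for a real agent call; it fails twice, then passes.
fails_left=2
run_stage() {
  if [ "$fails_left" -gt 0 ]; then
    fails_left=$((fails_left - 1))
    return 1   # quality gate failed: loop back to the coder
  fi
  return 0     # quality gate passed: move on to the next stage
}

attempts=0
max_retries=2
until run_stage reviewer; do
  attempts=$((attempts + 1))
  if [ "$attempts" -gt "$max_retries" ]; then
    echo "reviewer: giving up after $max_retries retries"
    break
  fi
  echo "reviewer failed; looping back to coder (retry $attempts)"
done
echo "retries used: $attempts"
```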
Demo
Here's what happened when I sent a single Telegram message:
"Build me a URL shortener with click analytics"
Architect output: Planned a TypeScript + Express + SQLite stack with nanoid for short codes. Defined 5 API endpoints, database schema, and the full file structure.
Coder output: 11 files: server, database setup, route handlers (shorten, redirect, analytics), frontend with dark UI, and tests. All in TypeScript with strict mode.
Reviewer output: Found 21 issues across security, correctness, and architecture. 6 high-severity bugs including a heredoc command injection vulnerability, JSON injection via newlines, and a bug where an empty field path would destroy the entire state file. The pipeline looped back to fix them.
Tester output: 10 tests, all passing. Covers URL creation, validation, redirects, click tracking, and error cases.
Deployer output: Git initialized, pushed to GitHub, deployed to a live URL.
Final result:
- 🟢 Live URL Shortener: fully functional
- 🟢 GitHub Repository: complete source
- 🟢 Audit Report: 21 issues documented
The URL shortener supports:
- Shorten any URL with 7-character codes
- Custom short codes for branded links
- Click tracking (referrer, user agent, IP, timestamp)
- Analytics dashboard with daily clicks and top referrers
- Full REST API with pagination
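The 7-character codes come from nanoid, whose default URL-safe alphabet has 64 characters. A plain-shell sketch can mimic the idea (the real app uses the nanoid npm package, not this):

```shell
# Generate a nanoid-style 7-character code from a 64-character
# URL-safe alphabet, one random byte per character.
ALPHABET='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-'
code=''
i=0
while [ "$i" -lt 7 ]; do
  n=$(od -An -N1 -tu1 /dev/urandom | tr -d ' ')   # one random byte, 0-255
  code="$code$(printf '%s' "$ALPHABET" | cut -c $(( (n % 64) + 1 )))"
  i=$((i + 1))
done
echo "$code"
```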
What I Learned
The Reviewer Agent Was the MVP
I expected the Coder to be the star. It wasn't. The Reviewer agent found real bugs that would have been production nightmares:
- Command injection via heredoc: the `init` function used an unquoted heredoc, meaning `$(rm -rf /)` in user input would execute. The Reviewer caught this.
- State destruction bug: passing an empty field path to the `update` command would turn the entire state file into a single string. Confirmed it: the state file literally becomes `"boom"`. Gone.
- Schema mismatch: the `init` command and `plan.sh` script initialized the same fields with different types (empty objects vs null). This would crash downstream `jq` queries.
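The heredoc issue is worth a closer look, because the fix is tiny. In this illustrative reproduction (not ClawForge's actual init code), user text expanded into an unquoted heredoc becomes part of a script body that a child shell then executes:

```shell
# Heredoc injection, reproduced in miniature (illustrative only).
USER_INPUT='hello $(echo INJECTED)'

# Unsafe: the parent shell expands $USER_INPUT into the heredoc, so the
# child shell receives, and executes, the embedded command substitution.
unsafe=$(sh <<EOF
echo "$USER_INPUT"
EOF
)

# Safe: quote the delimiter so the body stays literal, and pass the
# value as a positional argument so user input remains data, not code.
safe=$(sh -s -- "$USER_INPUT" <<'EOF'
echo "$1"
EOF
)

echo "unsafe: $unsafe"
echo "safe:   $safe"
```

The unsafe variant prints "hello INJECTED" because the injected command ran; the safe variant prints the input verbatim.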
These aren't hypothetical. They're bugs that a human code reviewer could easily miss.
AI Agents Need Guardrails, Not Just Prompts
The initial pipeline had no retry mechanism. The first run produced code that the Reviewer flagged, and without a loop, it would have shipped broken software. Adding the retry logic (max 2 attempts per stage) transformed the system from "generate once" to "iterate until quality gate passes."
Lesson: don't just prompt an AI to build something. Give it a feedback loop. The Reviewer-Tester-Coder cycle is what makes ClawForge actually produce reliable output.
The Architecture Was the Hardest Part
Getting five agents to coordinate through a shared state file sounds simple. It wasn't. The state schema needed to be consistent across all agents, error handling needed to work across stage boundaries, and the orchestrator needed to handle edge cases like "what if the Reviewer and Tester both fail on the first try?"
The audit report revealed issues I didn't anticipate โ like the lack of file locking when multiple agents could theoretically write to the state file simultaneously, or the fact that runtime state files were being committed to git.
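For the locking gap specifically, one portable guard would be a mkdir-based mutex around state writes, since mkdir is atomic. This is my own sketch, not something ClawForge currently ships:

```shell
# Mutex around state-file writes using atomic mkdir (a sketch of one
# possible fix for the audit's "no file locking" finding).
STATE="${TMPDIR:-/tmp}/clawforge-state.json"
LOCK="$STATE.lock"

while ! mkdir "$LOCK" 2>/dev/null; do
  sleep 0.1                    # another agent holds the lock; retry
done
trap 'rmdir "$LOCK"' EXIT      # always release, even on failure

# ...read-modify-write the shared state while holding the lock...
echo '{"stage":"tester"}' > "$STATE"

rmdir "$LOCK" && trap - EXIT   # release and clear the cleanup trap
```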
What Surprised Me Most
The Coder agent generated a dark-themed responsive frontend without being explicitly told to. The prompt was "URL shortener with click analytics" โ nothing about UI design. But the Architect planned a frontend, and the Coder delivered a clean dark UI with copy-to-clipboard functionality. The agents made design decisions I wouldn't have thought to specify.
Also surprising: the entire pipeline ran in about 20 minutes. From a cold start to a live deployed app. That's faster than most human developers can scaffold a project.
ClawForge is open source under MIT. Built on OpenClaw. Try it yourself: just say "Build me..." and see what happens.