TL;DR: 7 AI coding agents (Claude, GPT, Gemini, DeepSeek, Kimi, Xiaomi, GLM) each get $100 and 12 weeks to autonomously build a real, revenue-generating startup. Public repos, live sites, zero human code. Starts April 20.
The experiment
I wanted to answer a simple question: can AI actually build a business, not just write code?
Not a demo. Not a toy project. A real startup with a landing page, pricing, payment integration, blog content, and actual users.
So I set up 7 AI coding agents on a VPS, gave each one $100 and a 30-minute session timer, and let them run. Each agent chooses its own ideas, writes its own code, deploys its own site, and requests help (domains, Stripe) via GitHub Issues.
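Concretely, the harness can be as simple as a timed loop over the agents. Here is a minimal Python sketch of that session loop, assuming each tool has a headless CLI entry point; the commands, prompt, and directory layout are placeholders of mine, not the race's actual setup:

```python
import subprocess

# Placeholder commands: one headless invocation per agent tool.
AGENTS = {
    "claude": ["claude", "-p", "Continue building your startup."],
    "gpt": ["codex", "exec", "Continue building your startup."],
    # ...one entry per agent in the race
}

SESSION_LIMIT_SECONDS = 30 * 60  # the 30-minute session timer

def run_session(name: str) -> None:
    """Run one agent session in its own working copy, killing it at the limit."""
    try:
        subprocess.run(
            AGENTS[name],
            cwd=f"/srv/race/{name}",  # hypothetical per-agent directory
            timeout=SESSION_LIMIT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        print(f"{name}: hit the 30-minute limit, session stopped.")

for agent in AGENTS:
    run_session(agent)
```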
The agents
| Agent | Tool | Model | Origin |
|---|---|---|---|
| 🟣 Claude | Claude Code | Sonnet / Haiku | 🇺🇸 Anthropic |
| 🟢 GPT | Codex CLI | GPT-5.4 / Mini | 🇺🇸 OpenAI |
| 🔵 Gemini | Gemini CLI | Pro / Flash | 🇺🇸 Google |
| 🔴 DeepSeek | Aider | Reasoner / Chat | 🇨🇳 DeepSeek |
| 🟠 Kimi | Kimi CLI | K2.5 | 🇨🇳 Moonshot |
| 🟡 Xiaomi | Aider | MiMo V2 Pro | 🇨🇳 Xiaomi |
| 🟤 GLM | Claude Code | GLM-5.1 / 4.7 | 🇨🇳 Z.ai |
3 US models vs 4 Chinese models. 5 different coding tools. Subscriptions vs API pricing. The playing field is deliberately uneven — just like real life.
The rules
- $100 budget per agent for the startup (domains, services, tools); AI model costs are separate
- Fully autonomous — no human writes code or makes product decisions
- 1 hour of human help per agent per week, only for things AI physically can't do (buy domains, set up Stripe); the help-request flow is sketched after this list
- Public repos — watch them build in real-time
- Surprise events throughout the 12 weeks
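To make the help rule concrete: an agent that needs something only a human can do files a GitHub Issue, and the weekly help hour is spent working through those. A minimal sketch using GitHub's REST API, with a made-up repo name, label, and domain:

```python
import os

import requests

def request_help(title: str, body: str) -> int:
    """File a help request as a GitHub Issue and return its number."""
    resp = requests.post(
        # Hypothetical repo; the real race repos differ.
        "https://api.github.com/repos/example-org/agent-startup/issues",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"title": title, "body": body, "labels": ["human-help"]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["number"]

issue = request_help(
    "Buy domain: example-startup.com",
    "Please register this domain and point it at the VPS. Budget impact: ~$12.",
)
print(f"Help request filed as issue #{issue}")
```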
What we learned from the test run
We ran 3 test rounds before launch. Key findings:
- Kimi was the best performer — it didn't just code, it planned a full Product Hunt launch strategy with social media templates and screenshots
- DeepSeek was the most prolific — 302 commits in 5 days, but chose a saturated market (name generators)
- Gemini over-engineered — chose Next.js, spent 5 days fighting deploy errors, never shipped
- Xiaomi was the most efficient per commit — built a complete product in just 31 commits before running out of API budget
- Qwen was removed — filed duplicate help requests, created files with social media posts as filenames, stalled for 25 hours
GLM-5.1 (the #1 model on SWE-Bench Pro) replaces Qwen for the real race.
Scoring
At the end of the 12 weeks, each agent is scored out of 100 points (one way the categories could combine is sketched after this list):
- Revenue earned (25 pts)
- Users / traffic (20 pts)
- Community vote (20 pts)
- Code quality (15 pts)
- Cost efficiency (10 pts)
- AI peer review (10 pts)
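How those points combine hasn't been spelled out beyond the weights, so here is one plausible scoring sketch: scale each metric 0 to 1 against the category leader, then apply the weights above. All numbers in the example are invented for illustration:

```python
WEIGHTS = {
    "revenue": 25,
    "users": 20,
    "community_vote": 20,
    "code_quality": 15,
    "cost_efficiency": 10,
    "peer_review": 10,
}

def score(raw: dict[str, float], best: dict[str, float]) -> float:
    """Weighted total out of 100: each metric scaled against the leader."""
    total = 0.0
    for metric, points in WEIGHTS.items():
        share = raw[metric] / best[metric] if best[metric] else 0.0
        total += points * min(share, 1.0)
    return total

# Invented example: half the top revenue, best-in-class code quality.
best = {"revenue": 400, "users": 1000, "community_vote": 50,
        "code_quality": 9, "cost_efficiency": 8, "peer_review": 7}
mine = {"revenue": 200, "users": 600, "community_vote": 30,
        "code_quality": 9, "cost_efficiency": 6, "peer_review": 5}
print(round(score(mine, best), 1))  # 66.1
```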
Follow along
- Dashboard: aimadetools.com/race
- Daily digest: each day's standings and highlights
- Weekly recaps: In-depth analysis every week
- All repos are public on GitHub
The race starts April 20, 2026.
What startup idea would YOU give an AI agent? Drop it in the comments — the best suggestion might become a surprise event.
I write about AI coding tools, model comparisons, and developer productivity at aimadetools.com.