I'm running an experiment called The $100 AI Startup Race: 7 AI coding agents each get $100 and 12 weeks to build a real startup from scratch. No human coding. They autonomously pick a business idea, write code, deploy a live website, and try to get real users and revenue.
The agents: Claude, Codex, Gemini, Kimi, DeepSeek, Xiaomi (MiMo), and GLM.
Day 1 is done. Here's what happened.
The scoreboard
| Agent | Startup | Commits | Sessions | Blog Posts |
|---|---|---|---|---|
| Gemini | LocalLeads (local SEO) | 169 | 10 | 104 |
| DeepSeek | NameForge AI (name generator) | 91 | 10 | 0 |
| Kimi | SchemaLens / LogDrop | 58 | 5 | 9 |
| Codex | NoticeKit (GDPR notices) | 56 | 7 | 0 |
| Claude | PricePulse (pricing intel) | 53 | 3 | 11 |
| GLM | FounderMath (startup calculators) | 24 | 2 | 5 |
| Xiaomi | WaitlistKit (viral waitlists) | 16 | 3 | 1 |
Total: 467 commits, 7 live websites, 130 blog posts. In 24 hours.
Kimi forgot its own work
This is the story of the day.
Kimi's first session ran at 3 AM. It chose to build LogDrop, a log analysis tool. It created identity files, a backlog, landing pages, pricing, a blog, and even a working MVP with a JSON log parser, search, filters, and CSV export.
One problem: it put everything in a startup/ subfolder instead of the root directory.
The orchestrator gives agents their memory between sessions by reading PROGRESS.md from the root. When Kimi's second session started, there was no PROGRESS.md in root. The agent thought it was Day 1. It brainstormed a completely different idea. It built SchemaLens, a SQL schema diff tool, from scratch.
Kimi now has two half-built startups in the same repo. Its help request for LogDrop's domain is stuck in the subfolder where the orchestrator can't find it.
One wrong directory = total memory loss between sessions.
The agent didn't crash. It didn't throw an error. It just quietly forgot everything and started over with a different idea.
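A guard for this failure mode could look roughly like the sketch below. It is not the race's actual orchestrator code: `load_memory` and `MisplacedMemoryError` are hypothetical names, and it assumes the misplaced file is still called PROGRESS.md. The idea is to fail loudly when the memory file is missing from the root but exists in a subfolder, instead of treating the repo as brand new.

```python
from pathlib import Path
from typing import Optional

MEMORY_FILE = "PROGRESS.md"  # file name taken from the post


class MisplacedMemoryError(RuntimeError):
    """Raised when the memory file exists, but not in the repo root."""


def load_memory(repo_root: Path) -> Optional[str]:
    """Return the memory text, or None only for a genuinely fresh repo."""
    root_file = repo_root / MEMORY_FILE
    if root_file.is_file():
        return root_file.read_text(encoding="utf-8")

    # Look for stray copies in subfolders, skipping hidden dirs such as .git.
    strays = sorted(
        p for p in repo_root.rglob(MEMORY_FILE)
        if not any(part.startswith(".") for part in p.relative_to(repo_root).parts)
    )
    if strays:
        found = ", ".join(str(p.relative_to(repo_root)) for p in strays)
        raise MisplacedMemoryError(
            f"{MEMORY_FILE} is missing from the root but was found at: {found}"
        )
    return None  # no memory anywhere: this really is the first session
```

With a check like this, Kimi's second session would have stopped with an error pointing at startup/PROGRESS.md rather than starting over.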
Gemini wrote 104 blog posts
Gemini has 8 sessions per day (the most of any agent). By end of Day 1, LocalLeads had 104 blog posts on local SEO topics. Averaged over 24 hours, that is about one blog post every 14 minutes.
For comparison: Claude wrote 11. GLM wrote 5. Xiaomi wrote 1.
The question for the rest of the race: does quantity beat quality?
Codex burned 26 Vercel deployments
The orchestrator prompt said: "Your repo auto-deploys on every git push." This was meant as context. Codex read it as an instruction.
It ran git push after nearly every commit during its sessions. Each push triggered a Vercel deployment. By mid-afternoon, Codex had consumed 26 of the account's 100 daily deployments.
Lesson: with autonomous agents, every sentence in the prompt is a potential instruction. If you don't want them to do something, say so explicitly.
We fixed it with three changes:
- Prompt update: "Do NOT run git push. The orchestrator pushes after your session."
- A vercel.json change to disable preview deployments
- Commit squashing (all session commits become one before pushing)
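The commit-squashing step can be sketched as follows. This is a minimal illustration, not the orchestrator's real code: `squash_commands` and `squash_and_push` are hypothetical helpers, and `base_ref` is assumed to be the commit the branch pointed at when the session started. It relies only on standard git commands (`git reset --soft`, `git commit`, `git push`).

```python
import subprocess
from typing import List


def squash_commands(base_ref: str, message: str) -> List[List[str]]:
    """Build the git commands that collapse every commit after base_ref into one."""
    return [
        # Move the branch pointer back to base_ref but keep all changes staged.
        ["git", "reset", "--soft", base_ref],
        # Record the staged changes as a single commit.
        ["git", "commit", "-m", message],
        # One push per session, hence one deployment per session.
        ["git", "push", "origin", "HEAD"],
    ]


def squash_and_push(repo_dir: str, base_ref: str, message: str) -> None:
    """Run the commands in repo_dir, stopping at the first failure."""
    for cmd in squash_commands(base_ref, message):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Keeping the command list separate from the runner makes it easy to log or dry-run what the orchestrator is about to do. Note that `git commit` fails if the session produced no changes, so a real version would need to handle that case.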
GLM's quality approach
GLM only had 2 sessions but made them count. FounderMath already has three working calculators: a SAFE note calculator (all 4 YC SAFE types), a dilution calculator, and a runway calculator.
It also submitted the best help request of any agent: clear format, backup plans for each item, budget specified, priority levels, and even suggested the DNS record type for the domain.
What I learned on Day 1
- File conventions are critical for agent memory. One agent putting files in a subfolder caused total amnesia.
- Prompt wording is everything. Context gets interpreted as instructions.
- Shared deployment limits are a real constraint. 7 agents + 1 blog on one Vercel account = problems.
- Agents without web search picked generic ideas. The two agents running without web access (DeepSeek, Xiaomi) chose the most crowded markets.
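To put numbers on the shared deployment limit: 100 daily deployments split evenly across 8 projects (7 agents + 1 blog) leaves 12 each, so Codex's 26 was more than double a fair share. A per-project cap could be sketched like this. `DeployBudget` is a hypothetical helper, and the even split is an assumption, not how the race actually allocates deployments.

```python
from collections import Counter


class DeployBudget:
    """Track deployments per project against an even share of a daily limit."""

    def __init__(self, daily_limit: int, projects: int) -> None:
        # Integer division: any remainder stays unallocated as a safety margin.
        self.cap = daily_limit // projects
        self.used: Counter = Counter()

    def try_deploy(self, project: str) -> bool:
        """Return True and count the deployment if the project is under its cap."""
        if self.used[project] >= self.cap:
            return False
        self.used[project] += 1
        return True


budget = DeployBudget(daily_limit=100, projects=8)  # figures from the post
allowed = sum(budget.try_deploy("codex") for _ in range(26))
print(budget.cap, allowed)
```

The orchestrator would call `try_deploy` before each push and skip the push when it returns False.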
Follow along
Everything is public: code, costs, decisions, and progress.
- Live Dashboard
- Full Day 1 writeup
- GitHub repos (all 7 agent repos are public)
I'll be posting weekly recaps and daily highlights for the full 12 weeks. Would love to hear what you'd want to see tracked or compared.