This is a submission for the Google I/O Writing Challenge
My Gemini agent spent four weeks in last place.
1,259 commits. Broken imports across 32 files. Help requests about database tables it could have created itself. Endless bug loops.
Then I upgraded it to Gemini 3.5 Flash.
In 8 minutes, it diagnosed and fixed problems the old setup had failed to solve in weeks. Then it hit Google's quota wall.
This is the story of what happened next.
Context
This is Part 2 of my Gemini 3.5 Flash upgrade series. Part 1 covers the initial upgrade and first results.
I'm running The $100 AI Startup Race. 7 AI coding agents each get $100 and 12 weeks to autonomously build real startups. No human coding. The agents run on cron jobs, commit to GitHub, and deploy to Vercel.
On May 20 I upgraded the Gemini agent from a combo of 2.5 Pro (premium sessions) and 2.5 Flash (cheap sessions) to a single 3.5 Flash tier via Antigravity CLI. The model quality was incredible. The quota economics were brutal.
The Disappointment (May 20)
Session 1: The model fixed 32 broken API files in a single commit: imports, bcrypt swapped for bcryptjs on Vercel serverless, and Stripe instantiation. That was root-cause analysis the old model couldn't manage in 4 weeks. Then the 5h quota wall hit, after just 8 minutes of productive work.
Session 2: With --continue (skipping context reload), it built an email library, wrote tests, and fixed auth endpoints. 15 minutes. Then 5h quota again.
The math: Two sessions consumed 40% of the weekly quota. Projected total: ~68 minutes per week on the $20/month Pro plan.
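The May 20 sessions ran into two limits at once: a rolling 5-hour window that ended each session, and a weekly pool that two sessions had already cut by 40%. Here is a minimal sketch of that two-tier shape. It assumes both limits are rolling budgets of abstract "compute units". Google hasn't published its metering, so `DualQuota`, the units, and the toy limits are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class DualQuota:
    """Hypothetical two-tier quota: a rolling short window plus a rolling week.

    Usage is tracked in abstract "compute units"; real provider metering
    is not modeled here.
    """

    window_limit: float
    weekly_limit: float
    window: timedelta = timedelta(hours=5)
    week: timedelta = timedelta(days=7)
    events: List[Tuple[datetime, float]] = field(default_factory=list)

    def used_within(self, now: datetime, span: timedelta) -> float:
        # Sum the units spent inside the rolling span that ends at `now`.
        return sum(units for t, units in self.events if now - t < span)

    def can_spend(self, now: datetime, units: float) -> bool:
        # A call goes through only if it fits under BOTH limits.
        return (
            self.used_within(now, self.window) + units <= self.window_limit
            and self.used_within(now, self.week) + units <= self.weekly_limit
        )

    def spend(self, now: datetime, units: float) -> bool:
        # Record the usage and return True, or refuse and return False.
        if not self.can_spend(now, units):
            return False
        self.events.append((now, units))
        return True


if __name__ == "__main__":
    # Toy limits: two full windows use 40% of the weekly pool.
    quota = DualQuota(window_limit=10, weekly_limit=50)
    start = datetime(2024, 1, 1)
    calls = 0
    # One 1-unit call per minute until the short window refuses.
    while quota.spend(start + timedelta(minutes=calls), 1):
        calls += 1
    print(f"blocked by the 5h window after {calls} calls")
```

With these toy numbers the 5h window ends every session, but the weekly pool only allows five full windows before it becomes the binding limit.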
For context, here's what the other agents in my race get for similar money. These are not official provider limits; they are the effective autonomous runtime I measured in my specific setup:
| Agent | Plan cost | Weekly runtime |
|---|---|---|
| Claude | $20/mo | ~7 hours |
| Codex/GPT | $20/mo | ~21 hours |
| DeepSeek | $25/mo | ~21 hours |
| Gemini 3.5 Flash | $20/mo | ~68 minutes |
Best model quality in the race. Worst total compute time. The old 2.5 Flash/Pro setup gave me ~28 hours/week, but those 28 hours produced nothing but bug loops. Now I had a model that actually worked, but could barely run.
The Paradox
Here's what made it painful: the quality improvement was real. Not incremental, but transformational.
Old setup (2.5 Pro + 2.5 Flash combo, 28 hours/week):
- Wrote code with broken imports across 32 files
- Filed 3 help requests about "missing database tables"
- Never self-diagnosed the actual problem
- 1,259 commits over 4 weeks, last place in the race
New model (3.5 Flash, 68 minutes/week):
- Diagnosed the root cause in one pass (broken imports, not missing tables)
- Fixed all 32 files in a single commit
- Built a mock database layer, converted test infrastructure
- More useful output in 23 minutes than the old model produced in weeks
The bottleneck had shifted from intelligence to throughput. The model was finally good enough. The constraint was access.
Why Autonomous Agents Burn Quota Differently
For human coding, a model is an assistant. You ask, read, think, edit, and come back later.
For autonomous coding, the model is the runtime. It doesn't pause to think offline. Every file inspection, every failed test, every log check, every retry, every deployment verification consumes inference.
A human developer's session looks like: ask, think, edit, ask again, wait, test manually.
An autonomous agent's session looks like: plan, inspect, edit, test, fail, inspect logs, edit, retest, deploy, verify, repeat.
That changes the economics completely. A $20/month subscription can feel generous for a human developer and unusable for an autonomous agent, at the same time, on the same plan.
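To make the difference concrete, here is a toy count of model calls for the two session shapes above. The step names, the three-cycle retry count, and the rule that a human only spends inference when asking are assumptions for illustration. They are not measurements from the race.

```python
# Hypothetical step lists paraphrasing the two sessions described above.
HUMAN_SESSION = ["ask", "think", "edit", "ask", "wait", "test_manually"]
AGENT_CYCLE = ["plan", "inspect", "edit", "test", "fail", "inspect_logs",
               "edit", "retest", "deploy", "verify"]

# Assumption: a human only spends inference when asking the model;
# thinking, editing, waiting and manual testing happen offline.
HUMAN_INFERENCE_STEPS = {"ask"}


def inference_calls(steps, inference_steps=None, cycles=1):
    """Count model calls for `cycles` passes over `steps`.

    `inference_steps=None` means every step needs the model, which is the
    autonomous-agent case: the model is the runtime.
    """
    if inference_steps is None:
        per_pass = len(steps)
    else:
        per_pass = sum(1 for step in steps if step in inference_steps)
    return per_pass * cycles


if __name__ == "__main__":
    human = inference_calls(HUMAN_SESSION, HUMAN_INFERENCE_STEPS)
    # "repeat": assume the agent needs three fix/test/deploy cycles.
    agent = inference_calls(AGENT_CYCLE, cycles=3)
    print(f"human: {human} calls, agent: {agent} calls")
```

Even in this crude model, a single agent cycle costs five times the inference of a human session, and that is before any retries.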
The Response (May 21, 05:25 UTC)
Less than 36 hours after Google I/O. Within hours of the new quota system going live, users were reporting problems on Reddit and X: 4 prompts burning an entire 5-hour window, failed generations counting against quota, threads calling it a "bait and switch."
Then, at 5:25 AM UTC on May 21:
Varun Mohan (@_mohansolo): "An update: we're 3xing the rate limits for Gemini models across all paid tiers in Antigravity and resetting everyone's Gemini quota for the week. We understand some people hit their rate limits quickly and wanted to respond fast. Lots more to come and enjoy building!"
Logan Kilpatrick (@OfficialLoganK): "We just 3xed the rate limits across all tiers in Antigravity so that you can put 3.5 Flash through its paces even more, enjoy, and keep the feedback coming! :)"
And the key follow-up from Varun:
"In case it's not clear, the 3x is forever."
What I Actually Measured
My agent's cron job fired at 05:00 UTC, likely straddling the quota boost that landed around 05:25 UTC. The results:
Session 3 (05:00 UTC, partially on old quota, partially on new):
- 33 minutes of productive work
- 9 runs, 588 files changed
- Renamed the entire domain (`localleads.protolocalseogen.com`) across all generated SEO pages, fixed Stripe redirect URLs, and corrected ES Module syntax in API files
- Built a mock database layer (`db/mockDb.js`) with full CRUD operations
- Created a `lib/time-helpers.js` utility library
- Wrote test suites for signup, login, get-credits, assign, and generate-seo-pages
- Refactored 14 test files to use the new mock DB
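A mock database layer is an in-memory object that exposes the same create/read/update/delete surface as the real data layer, so tests can run without a live database. The agent's actual `db/mockDb.js` isn't reproduced here. What follows is a hypothetical Python analogue of the pattern, and the `MockDb` class and its method names are illustrative, not taken from the agent's code.

```python
import itertools
from typing import Any, Dict, Optional

Row = Dict[str, Any]


class MockDb:
    """Hypothetical in-memory stand-in for a database: table -> id -> row."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[int, Row]] = {}
        self._ids = itertools.count(1)

    def create(self, table: str, row: Row) -> Row:
        # Tables are created on first insert; ids are increasing integers.
        record = {**row, "id": next(self._ids)}
        self._tables.setdefault(table, {})[record["id"]] = record
        return dict(record)  # return a copy so callers can't mutate storage

    def read(self, table: str, row_id: int) -> Optional[Row]:
        record = self._tables.get(table, {}).get(row_id)
        return dict(record) if record is not None else None

    def update(self, table: str, row_id: int,
               changes: Row) -> Optional[Row]:
        record = self._tables.get(table, {}).get(row_id)
        if record is None:
            return None
        # Never let an update overwrite the primary key.
        record.update({k: v for k, v in changes.items() if k != "id"})
        return dict(record)

    def delete(self, table: str, row_id: int) -> bool:
        return self._tables.get(table, {}).pop(row_id, None) is not None

    def reset(self) -> None:
        # Call between tests so each test starts from an empty database.
        self._tables.clear()
        self._ids = itertools.count(1)
```

Calling `reset()` between tests is what lets many test files share one mock without leaking state into each other.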
Session 4 (07:07 UTC, fully on new quota):
- 29 minutes of productive work
- 8 runs, 34 files changed
- Converted all test mocks from ESM (`.js`) to CommonJS (`.cjs`) for Jest compatibility
- Fixed Babel and Jest configuration for the mixed ESM/CJS codebase
- Refactored the `execute-outreach`, `forgot-password-request`, `generate-seo-pages`, and `user-referral-data` tests
- Cleaned up `.env.test` and the email library
Two back-to-back sessions of ~30 minutes each. Together they used the full 5-hour window, so roughly 50 minutes of productive runtime per 5h refresh cycle.
The comparison:
| | Before boost (May 20) | After boost (May 21) |
|---|---|---|
| Runtime per 5h window | 8 minutes | ~50 minutes |
| Effective improvement | baseline | ~4-5x (announced 3x) |
| Productive output | 42 files fixed | 622 files changed, full test infra |
| Weekly projection | ~68 minutes | ~5+ hours |
Google announced 3x. I measured closer to 4-5x for autonomous agentic coding in my setup. I wouldn't treat that as a universal number yet. The difference likely comes from my measurement catching a weekly quota reset, the rate limit increase, and a different prompt mix all at the same time.
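The ~4-5x figure can be reproduced from the weekly projections in the comparison table. This is plain arithmetic on the numbers above, and it treats "~5+ hours" as a 300-minute lower bound, which is an assumption.

```python
# Figures taken from the weekly-projection row of the comparison table.
ANNOUNCED_MULTIPLIER = 3.0
WEEKLY_MINUTES_BEFORE = 68        # projected weekly runtime, May 20
WEEKLY_MINUTES_AFTER = 5 * 60     # "~5+ hours", treated as a lower bound

measured = WEEKLY_MINUTES_AFTER / WEEKLY_MINUTES_BEFORE
excess = measured / ANNOUNCED_MULTIPLIER

# prints: measured ~4.4x, about 1.5x the announced 3x
print(f"measured ~{measured:.1f}x, about {excess:.1f}x the announced 3x")
```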
The Insight
The feedback loop between AI providers and power users is now measured in hours, not months.
- Monday (May 19): Google launches new compute-based quota system at I/O
- Tuesday (May 20): Users hit walls, Reddit fills with complaints, my agent gets 68 min/week
- Wednesday (May 21, 5:25 AM): Google triples limits permanently and resets everyone's pool
That's a 36-hour turnaround from "this is broken for agents" to "fixed, permanently." For anyone building autonomous systems on top of subscription AI: the economics are volatile, but they're trending in your favor. The providers are watching usage patterns and adjusting in real time.
The Real Story: Quality × Time = Output
Here's what I'd tell any developer considering Gemini 3.5 Flash for agentic workflows:
The old model had unlimited time and did nothing useful with it. The new model has limited time and makes every minute count.
- 2.5 Pro + Flash combo: 28 hours/week → last place, stuck in bug loops
- 3.5 Flash (pre-boost): 68 min/week → more progress than 4 weeks of the old model
- 3.5 Flash (post-boost): 5+ hours/week → fully competitive, systematically building
Quality matters more than quantity. I'll take 5 hours of a model that diagnoses root causes, fixes 32 files in one pass, and builds proper test infrastructure over 28 hours of a model that files help requests about problems it created.
What's Next
The Gemini agent went from last place to having a real shot. The product (LocalSEOGen, a local SEO page generator for agencies) now has:
- Fixed API endpoints (32 files)
- Working auth flow
- Test infrastructure (mock DB, jest config, babel setup)
- Domain migration complete
Next sessions will focus on getting the Vercel deployment actually serving requests and pushing toward first revenue.
But the bigger takeaway isn't about my race. It's this:
The lesson from this week is not "Gemini needs more quota." The lesson is that autonomous agents turn model access into infrastructure. For human developers, Gemini 3.5 Flash on a $20 plan is a huge upgrade. For autonomous coding agents, it finally feels capable enough to matter. And that is exactly why the quota suddenly matters too.
Follow the race live at aimadetools.com/race. 7 agents, $100 each, 12 weeks, real startups.