Antigravity and Gemini3 Coding Test

Aaron Xie

Testing AI coding assistants with real-world tasks: ConnectOnion agent framework migration and frontend development

Project: github.com/openonion/connectonion

Conclusion

I've been coding for 5 hours this morning using Antigravity and Gemini3, and here's my conclusion.

First of all, here's my background for your reference:

  1. I've been using both Cursor and Claude Code for about 10 hours a day for the past 2 years.
  2. I've been a machine learning engineer for 7 years, and I've been writing agents since 2024.
  3. ConnectOnion (github.com/openonion/connectonion | docs.connectonion.com) - I created this agent framework.
  4. I'm a $400 Gemini Ultra user, and I also use the Gemini CLI sometimes.

And here's my conclusion:

  1. Antigravity is better than Cursor - for me, around 20% better.
  2. Gemini3 is better than Claude for long-term tasks - I ran into fewer instances of lazy coding.
  3. Gemini3 is not good at discussion and reasoning, but it's better at coding and following instructions.

Now, let's dive into the details of my coding experience over the past 5 hours.

Test 1: Code Review and Code Style

OAuth is complicated. Last week I tried to implement it with Claude Code and it didn't work, so this time I used Antigravity with Gemini3 to make it work.

First of all, I'm a $400 user, but it still didn't allocate me enough tokens, which is annoying. There are two tiers, Gemini 3 Pro High and Gemini 3 Pro Low, and both only lasted maybe 2 hours before telling me I'd reached my limit.

[Screenshot: the result after Antigravity fixed the OAuth issue]

As you can see, it fixed the problem and also wrote some tests, but when I reviewed the code, I found it still had the over-engineering problem: lots of unnecessary try-catch blocks.
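To show what I mean, here's an illustrative sketch (my own reconstruction in TypeScript, not Gemini3's actual output; the function names are made up) of the defensive wrapping I kept seeing, next to the simpler version I actually wanted:

```typescript
// Over-engineered: nested try-catch swallows errors and hides the real failure.
async function fetchTokenDefensive(url: string): Promise<string | null> {
  try {
    const res = await fetch(url);
    try {
      const body = await res.json();
      return body.access_token ?? null;
    } catch {
      return null; // a JSON parse failure silently becomes null
    }
  } catch {
    return null; // a network failure silently becomes null
  }
}

// What I wanted: let errors propagate; the caller decides how to handle them.
async function fetchToken(url: string): Promise<string> {
  const res = await fetch(url);
  const body = await res.json();
  return body.access_token;
}
```

The second version is shorter, easier to review, and doesn't collapse three distinct failure modes into one indistinguishable null.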

Test 2: Writing Frontend - Changing 15 Pages Simultaneously

For the second test, I challenged Antigravity with Gemini3 to work on a complex frontend task: updating 15 different pages at the same time. This is typically where AI coding assistants struggle because they need to maintain context across multiple files and ensure consistency.

The task involved refactoring a large web application's UI components, updating routing logic, and ensuring all pages maintained consistent styling and functionality. Here's what I observed:

The Good:

  • Gemini3 handled parallel file modifications surprisingly well
  • It maintained consistency across all 15 pages without me having to constantly remind it
  • The code changes were systematic and followed the same patterns across files
  • It didn't lose context or forget what it was doing halfway through

The Bad:

  • Token limits hit again - had to split the work into chunks
  • Sometimes it would over-engineer solutions with unnecessary abstractions
  • When I asked it to simplify, it did, but I had to be explicit

Overall Assessment:
For large-scale frontend refactoring, Gemini3 performed better than Claude in my experience. It's more persistent and doesn't give up on long tasks. However, you need to watch for over-engineering and be ready to ask for simpler solutions.
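To give a flavor of what "the same patterns across files" looked like, here's a hypothetical sketch of the kind of shared page descriptor that keeps 15 pages consistent. The names (PageConfig, registerPage) are mine, not from the actual project:

```typescript
// One shared shape for every page instead of 15 slightly different setups.
interface PageConfig {
  path: string;          // route the page is mounted on
  title: string;         // used for the document title and nav
  requiresAuth: boolean; // shared auth guard instead of per-page checks
}

const pages: PageConfig[] = [
  { path: "/dashboard", title: "Dashboard", requiresAuth: true },
  { path: "/settings",  title: "Settings",  requiresAuth: true },
  { path: "/login",     title: "Log in",    requiresAuth: false },
  // ...the remaining pages follow the exact same shape
];

// Consistency is enforced in one place, not re-implemented per page.
function registerPage(config: PageConfig): void {
  console.log(`mounting ${config.title} at ${config.path}`);
}

pages.forEach(registerPage);
```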

Test 3: Migrating ConnectOnion from Python to TypeScript

The third and most ambitious test was migrating my agent framework, ConnectOnion (github.com/openonion/connectonion), from Python to TypeScript. This is a real production codebase with complex agent orchestration logic, state management, and API integrations.

The Challenge:

  • ~5,000 lines of Python code
  • Complex async/await patterns
  • Custom decorators and metaclasses
  • Integration with multiple LLM providers

What Happened:
Gemini3's strength really showed here. It understood the Python codebase structure quickly and started generating TypeScript equivalents that actually made sense. Unlike Claude, which sometimes gets "lazy" on long migrations and starts cutting corners, Gemini3 maintained quality throughout.
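As a rough illustration of the shape of the output (Agent, Tool, and LLMProvider are placeholder names here, not ConnectOnion's real API), the translated code looked something like this:

```typescript
interface Tool {
  name: string;
  description: string;
  run(input: string): Promise<string>;
}

interface LLMProvider {
  complete(prompt: string): Promise<string>;
}

class Agent {
  constructor(
    private readonly provider: LLMProvider,
    private readonly tools: Tool[] = [],
  ) {}

  // Python's `async def run(self, task)` maps directly onto an async method,
  // so the control flow stays flat instead of nesting .then() callbacks.
  async run(task: string): Promise<string> {
    const toolList = this.tools
      .map((t) => `${t.name}: ${t.description}`)
      .join("\n");
    return this.provider.complete(`${task}\n\nAvailable tools:\n${toolList}`);
  }
}
```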

Key Observations:

  • Better at understanding Python idioms and translating them to TypeScript
  • Handled the async patterns correctly (no Promise hell)
  • Properly typed the agent interfaces and function signatures
  • Maintained the original architecture without unnecessary "improvements"

Problems:

  • Still had the over-engineering issue - added unnecessary error handling
  • Had to explicitly tell it: "don't add try-catch unless absolutely necessary"
  • Token limits forced me to do the migration in chunks (annoying for paid users)

Final Verdict:
For migration tasks, Gemini3 > Claude. It's more thorough and less likely to skip important details. But you need to be explicit about code style preferences, especially around error handling and keeping things simple.

Overall Conclusion

After 5 hours of intensive testing across three different scenarios:

  1. Antigravity as an IDE: Solid improvement over Cursor. The UX is cleaner, context management is better, and it doesn't slow down as much with large projects.

  2. Gemini3 as a coding model:

    • Best for: Long tasks, migrations, large-scale refactoring
    • Worst for: Discussion, explaining reasoning, brainstorming
    • Biggest issue: Over-engineering and unnecessary error handling

  3. Comparison to Claude Code:

    • Claude is better for collaboration and explaining concepts
    • Gemini3 is better for "shut up and code" tasks
    • Both have token limit issues that hurt paid users

My Recommendation: Use Gemini3 for implementation, Claude for design discussions. And always, always tell them to keep it simple and avoid unnecessary try-catch blocks.

Top comments (1)

Grant Wakes

Love this breakdown. The clear, side‑by‑side comparison of Antigravity/Gemini3 vs Cursor/Claude and the “Gemini for implementation, Claude for design” takeaway is super actionable. The emphasis on simplicity over over-engineering really lands.