DEV Community

Jim L

Gemini CLI vs Claude Code — Two Weeks of Terminal AI, Honest Impressions

I've been running both Gemini CLI and Claude Code in parallel for a couple of weeks now on real projects — not toy examples, not contrived benchmarks. A Next.js app, some Python data scripts, a refactoring job I'd been putting off. Here's what I actually think.


What these tools are

Both are AI assistants that live in your terminal. You open a directory, run a command, and start talking to an LLM that can read your files, write code, and execute things on your behalf.

Gemini CLI is Google's take on it. Backed by Gemini 2.5 Pro, with a context window somewhere around 1 million tokens. Claude Code is Anthropic's — uses Claude Sonnet 4, 200K context window.

The other big difference: Gemini CLI is free (something like 1,000 requests per day on the free tier). Claude Code requires either an API key (usage-based, roughly $3/M input tokens and $15/M output for Sonnet 4) or a subscription — the $20/mo Pro plan covers it, with the pricier Max tiers above that for heavy use.

Setup is straightforward for both:

# Gemini CLI
npm install -g @google/gemini-cli
gemini

# Claude Code
npm install -g @anthropic-ai/claude-code
claude

Gemini asks you to auth via Google account. Claude wants an API key or Max subscription. Neither requires much config beyond that.
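If you'd rather skip the interactive auth flows (say, for a CI box or a dotfiles setup), both tools also accept API keys from environment variables — a key from Google AI Studio for Gemini, one from the Anthropic Console for Claude:

```shell
# Gemini CLI can use an API key from Google AI Studio
# instead of the Google-account OAuth flow
export GEMINI_API_KEY="your-key-here"

# Claude Code picks up an Anthropic API key from the environment
export ANTHROPIC_API_KEY="your-key-here"
```

Note that the Gemini key route bills (or rate-limits) differently from the free Google-account tier, so check which quota you're actually on.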


The context window gap is real

1M vs 200K isn't just a number. I dropped a medium-sized codebase on Gemini and asked it to explain the overall architecture — how data flows from the API layer through to the database, where the business logic lives, that kind of thing. It handled it well. It could hold the whole thing in context.

Claude hits its limit on larger repos. You have to be more deliberate about which files you give it. That's not necessarily a dealbreaker — I'm used to being intentional with context — but it does mean Gemini wins for "explain this codebase to me" type tasks.
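"Being deliberate" in practice just means naming the files you care about in the prompt instead of letting the tool wander. A sketch of what that looks like (the paths are made up):

```shell
# Scope the question to specific files; -p runs Claude Code
# non-interactively and prints the answer to stdout.
# src/api/orders.ts and src/db/orders.ts are hypothetical paths.
claude -p "Explain how src/api/orders.ts and src/db/orders.ts interact, focusing on the shared types"
```

The narrower the question, the less of the 200K budget gets burned on files that don't matter.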

Where it gets interesting: even with 200K tokens, Claude's answers on architectural questions were often sharper. It seemed to understand patterns in the code better. Maybe that's just benchmark-speak, but my experience tracked with what the numbers suggest — Claude Code scores somewhere in the low-to-mid 70s on SWE-bench, Gemini is a bit lower, around the mid-60s.

Numbers matter less than the feeling of using it, though.


Simple tasks

Both work. Honestly, for "add a function that does X" or "fix this type error" — either tool gets you there in one or two rounds. I stopped trying to find a meaningful difference here.

Claude's output is slightly cleaner. Less boilerplate, better naming, code that feels like it actually fits the surrounding context. But "slightly cleaner" doesn't justify a cost premium on its own.

One thing I noticed: Claude is better at understanding existing patterns in a file before it writes new code. If your project uses a particular error handling convention, or if you have a consistent way of structuring components, Claude tends to pick that up and mirror it. Gemini will usually get the logic right but occasionally writes code that's technically correct but stylistically out of place — like it's generating from a template rather than reading the room.


Where it actually diverges: multi-file changes

This is where Claude Code pulls ahead, clearly.

I had a task: refactor a component that had grown too large, split it into smaller pieces, update all the import paths, and make sure the types still worked across files. Tedious. Exactly the kind of thing you want an AI to do.

Claude Code handled it almost right the first time. It tracked the dependencies, updated the imports correctly, caught a type that would have broken at compile time. One round of corrections and it was done.

Gemini struggled. Not catastrophically — it did most of the refactor — but it missed one cross-file dependency and the types were off in two places. I had to point it at specific files and re-explain what was broken. Still useful, but it took longer.

I've seen this pattern consistently. For changes that stay inside a single file, Gemini is fine. Anything that touches 3+ files with shared types or imports, Claude handles it more reliably.

The way I think about it: Gemini reasons well about what you describe to it. Claude seems to reason better about what exists and how the pieces connect. For code archaeology — "why does this behave this way?" — Gemini's 1M context is the better tool. For surgery — "change this without breaking everything else" — Claude's stronger model quality wins out.


Git integration

Claude Code has this built in. It can stage files, write commit messages, commit. Useful if you trust it. I mostly use it for the commit message drafting — it reads the diff and writes something reasonable, which saves me the usual 30 seconds of "ok what did I actually change here."
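You can get the same commit-message trick outside the interactive session by piping a diff into Claude Code's print mode — a sketch, not the built-in git workflow:

```shell
# Draft a commit message from the staged diff using Claude's
# non-interactive print mode (-p); read it before you commit
git diff --staged | claude -p "Write a concise commit message for this diff"
```
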

Gemini doesn't do this. It's a coding assistant, not a workflow assistant. That's a legit design choice, but it does mean you're switching back to your shell for git stuff.

Claude also has a --continue flag that lets it pick up where it left off across sessions — it maintains a kind of short-term memory for the project you're working in. Useful when a task spans more than one sitting. Gemini doesn't have an equivalent. Every session starts fresh.
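For reference, resuming looks like this (run from the same project directory):

```shell
# Pick up the most recent conversation where it left off
claude --continue

# Or choose a specific earlier session from an interactive list
claude --resume
```
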


What I actually use each one for now

Gemini CLI: exploring new codebases, quick questions about how something works, prototyping where I want to iterate fast and don't care if the output needs polish. Also good for cases where I genuinely can't afford to spend API credits.

Claude Code: any refactoring that touches multiple files, production code where I care about the output quality, anything where getting it wrong costs me time to fix.

There's also a third case: sometimes I run Gemini first to get oriented, then switch to Claude for the actual implementation. The 1M context makes it good for reconnaissance.
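That recon-then-surgery handoff can be scripted loosely, since Gemini CLI also has a non-interactive -p flag. The file name and prompts here are my own, not a prescribed workflow:

```shell
# Step 1: reconnaissance with Gemini's big context window
gemini -p "Summarize this repo's architecture and main data flows" > ARCH_NOTES.md

# Step 2: surgery with Claude, pointed at the notes
claude -p "Using ARCH_NOTES.md as background, refactor the order validation into its own module"
```
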


Genuine downsides

Gemini CLI: The output is sometimes verbose in a way that's hard to pin down. Like it explains more than you asked for, or wraps code in caveats that aren't necessary. Minor thing, but it adds friction when you're trying to move fast. Also, it occasionally misses cross-file dependencies in ways that are hard to predict — you don't know it's wrong until you run the code.

Claude Code: The context limit genuinely bites you on larger projects. And the pricing is real — if you're running it on a big refactor session and you're on API pricing, the token usage adds up faster than you expect. The $20/mo Pro plan is probably the right call if you use it daily, but that's a meaningful cost for a hobby project or solo developer, and heavy daily use can still bump into the plan's limits.

Neither tool is magic. Both will confidently give you wrong answers sometimes. (For what it's worth, I've written more elsewhere about evaluating AI coding tools and the tradeoffs involved.) The difference is mostly in how often they're wrong and how wrong they are.


If you're deciding which to try first

Start with Gemini CLI. It's free, the 1M context is genuinely impressive, and for most tasks it'll get you 80% of the way there. If you're working on production code or find yourself repeatedly cleaning up multi-file changes, try Claude Code for a week and see if the quality delta justifies the cost for your workflow.

My guess is: most developers will keep Gemini around as a free fallback and reach for Claude when it actually matters. If you want a broader comparison that includes Cursor, Copilot, and a few others, OpenAI Tools Hub has a solid rundown worth checking.

Anyway, that's been my experience. Both tools are moving fast — what I've written here might be out of date in a month.
