Developer Harsh for Composio

Posted on Jul 2 • Originally published at composio.dev

I burnt 10M tokens to compare Claude Code and Gemini CLI. Here's what I found out 🤯

#ai #productivity #bash #python

Introduction

Gemini CLI was recently launched, and the internet is talking about it. So, I thought, why not test it out myself?

In the past, I have tested similar CLI tools and found Claude Code to be amazing and a worthy benchmark. To test the limits of both, I had each of them build a CLI tool that integrates file tools and other apps via Composio.

In this blog, I will share my experience building with them so that you have a clear idea of which one is better, despite all the hype.

Let's start by looking at the prompt (single-shot PRD).

TL;DR

Overview: Compared Claude Code vs Gemini CLI using the same PRD to build an agentic CLI tool.
Speed: Claude finished faster (1h17m) with full autonomy, while Gemini needed manual nudging and retries.
Cost: Claude cost $4.80 with smooth execution; Gemini’s fragmented attempts pushed cost to $7.06.
Token Usage: Claude used fewer tokens efficiently with auto-compaction; Gemini consumed more without optimization.
Code Quality & UX: Claude delivered cleaner structure and smoother UX; Gemini was decent but less polished overall.

Prompt

The prompt is the same for both Claude Code and Gemini CLI. Check it out here. (basic prompt + some Gemini 2.5 magic :)

The important part of a prompt is giving the model a clear set of instructions, which is achieved by providing:

Objective - Overall goal
Core Technology - docs, resources & target audience
Project Specifications - high-level overview of the project
Folder Structure (very important)
Toolset Definition - which tools are required, with an explanation of each
Key Features - most important features
Development Milestones - break the project into parts, build them separately, and merge them in a coordinated way
Deliverables - what the agent needs to hand back to the user
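To make that structure concrete, here is a minimal sketch of how such a PRD prompt could be assembled. The section names come from the list above; the bodies are placeholders I made up for illustration, not the actual prompt used in this comparison, and `render_prd` is a hypothetical helper.

```python
# Hypothetical PRD skeleton. Section titles follow the list above;
# the bodies are placeholders, NOT the prompt used in this post.
PRD_SECTIONS = [
    ("Objective", "<overall goal of the project>"),
    ("Core Technology", "<docs, resources and target audience>"),
    ("Project Specifications", "<high-level overview of the project>"),
    ("Folder Structure", "<the exact directory tree you expect>"),
    ("Toolset Definition", "<each required tool and what it does>"),
    ("Key Features", "<the most important features>"),
    ("Development Milestones", "<ordered parts to build and merge>"),
    ("Deliverables", "<what the agent must hand back>"),
]


def render_prd(sections):
    """Join (title, body) pairs into one plain-text prompt, in order."""
    parts = [f"{title}\n{body}" for title, body in sections]
    return "\n\n".join(parts)


prompt = render_prd(PRD_SECTIONS)
print(prompt)
```

Keeping the sections in one ordered list makes it easy to feed the exact same prompt to both tools.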

Here is a snapshot of what the final product looks like:

CLI Agent with Claude Code + Composio

CLI Agent with Gemini CLI + Composio

However, as this is a battle of wits, I would like to address a few factors so you can make a more informed decision.

Speed

In terms of speed, Claude Code took the lead, completing the entire project in 1hr 17min, compared to Gemini CLI, which took 2hr 2min. This is the total API time.

Apart from that:

Claude Code did it in a single shot in auto mode, with no interference.
Gemini CLI took multiple tries, and several times I had to press ESC and give it extra context to nudge it in the right direction.

Claude Code Summary

Gemini CLI Summary

So, if you are prioritizing speed, Claude Code can be your go-to.

Next, let’s look at the cost.

Cost

In terms of cost, Claude spent a total of $4.80, while Gemini CLI consumed $7.06 across its three tries.

In case you were wondering, Gemini CLI's first try alone cost approximately $2.56 and left me with just a repository and broken code (milestones 4 and 5 remaining).

So, if we do the math:

Completing the remaining milestones (including the two additional tries and the mid-run context additions) cost another $4.50.
That is roughly what Claude spent to complete the entire project ($4.80).
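The arithmetic behind those numbers can be checked with a quick sketch. It uses only the figures quoted in this section and assumes the $2.56 is the cost of the first of Gemini CLI's three tries; the variable names are my own.

```python
from decimal import Decimal

# Figures quoted in this section (USD).
claude_total = Decimal("4.80")
gemini_total = Decimal("7.06")
# Assumption: $2.56 is what the first Gemini CLI try cost on its own.
gemini_first_try = Decimal("2.56")

# What the extra tries and nudges added on top of the first try.
gemini_remaining = gemini_total - gemini_first_try

print(f"Gemini CLI after the first try: ${gemini_remaining}")
print(f"Claude Code, whole project:     ${claude_total}")
print(f"Gemini CLI total vs Claude Code: {gemini_total / claude_total:.2f}x")
```

`Decimal` is used instead of floats so the dollar amounts subtract exactly.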

However, using Claude Code involves a hefty fee; on the other hand, Gemini CLI is generally free.

In case you want to use gemini-2.5-pro's massive context window within Claude Code, or vice versa, you can follow this process.

So, if you prioritise performance and quality over cost, go with Claude Code. Otherwise, go with Gemini CLI + manual context additions.

Now let’s look at token usage!

Tokens

Claude Code - Input & Output Tokens

Gemini CLI - Input & Output Tokens

In terms of tokens used:

Claude Code took a total of 260.8K input tokens and returned 69K output tokens, with 7.6M cache-read tokens (CLAUDE.md), with auto-compact enabled.
Gemini CLI took a total of 432K input tokens and returned 56.4K output tokens, with 8.5M cache-read tokens (GEMINI.md).

However, one thing I noticed while evaluating the tokens is that Gemini doesn’t use an auto-compaction mode, which may explain the higher usage. It can also cause API keys to hit their limits at times.

So, if you are concerned about efficient token usage, Claude Code is a great choice. However, for small projects in teams, Gemini CLI might be a good choice.
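To put those figures side by side, here is a small sketch that totals each tool's tokens and compares the input counts. The numbers are the ones listed above, and the dictionary layout is just my own way of organizing them.

```python
# Token counts as reported above (K = thousand, M = million).
usage = {
    "Claude Code": {"input": 260_800, "output": 69_000, "cache_read": 7_600_000},
    "Gemini CLI": {"input": 432_000, "output": 56_400, "cache_read": 8_500_000},
}

# Total tokens per tool: input + output + cache reads.
totals = {tool: sum(counts.values()) for tool, counts in usage.items()}

# How many times more input tokens Gemini CLI consumed than Claude Code.
input_ratio = usage["Gemini CLI"]["input"] / usage["Claude Code"]["input"]

for tool, total in totals.items():
    print(f"{tool}: {total:,} tokens in total")
print(f"Gemini CLI used {input_ratio:.2f}x the input tokens of Claude Code")
```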

Now let’s look at the quality of the generated code.

Quality

Claude Code

Gemini CLI

In terms of quality, both Claude Code and Gemini CLI were amazing.

Claude Code generated a production-ready codebase, with organized folders, a readme, tests, git and workspace files.
Gemini also generated a good codebase but lacked a structured layout for test files. It put them in the root folder along with some extra files (probably created while debugging issues).

You can check out the repo to learn more!

So, if you are serious about repository organization in production-grade settings, go for Claude Code. For small projects, prefer Gemini CLI.

Now let’s look at UX.

UX

Claude Code UI/UX

Gemini CLI UI/UX

Personally, Claude Code is my go-to because of its UX!

Claude Code

Provides a premium experience while generating code and performing evaluations.
I like its bash mode for quick checks and Ctrl+R to expand the generated output. Auto-compact can also be enabled to save tokens. I really enjoyed working with it.

On the contrary

Gemini CLI

Tries to mimic Claude Code but lacks the premium experience Claude provides.
I especially didn’t like its verbose generation (Ctrl+K can be applied), the lack of control over settings (it could keep the /command as a setting in the editor), the missing plan mode 💀, and a UI that feels a little buggy after the /clear command.

To conclude, if you demand a premium experience, go for Claude Code; for simple tasks, Gemini CLI is a good fit.

However, there is a caveat here!

Interesting Fact!

Initially, when I was working with Gemini, it got stuck on the test cases. Even after multiple nudges, the model wasn’t able to fix them. But I wanted to get it done.

After a bit of research, I learnt that Gemini CLI has a pipeline mode, invoked using gemini -p <prompt>, which works as a headless agent, and someone on Reddit used it to run Gemini CLI within Claude Code. So, I updated my CLAUDE.md to do the same.

The idea was simple → Wrap all the execution with the gemini -p command and tell Claude to do the same when performing task completions.

This way I was able to use the massive 1M+ context window of Gemini 2.5 Pro with Claude Code and get the work done in a single step, after 7 failed attempts earlier 😅.
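If you want to script this yourself, a minimal sketch of a wrapper around the headless mode could look like the one below. Only the gemini -p <prompt> invocation comes from this post; the function names and the timeout are hypothetical, and I am assuming the CLI writes its answer to stdout.

```python
import shutil
import subprocess


def build_gemini_command(prompt: str) -> list[str]:
    """Build the argv for one headless Gemini CLI call (gemini -p <prompt>)."""
    return ["gemini", "-p", prompt]


def ask_gemini(prompt: str, timeout_s: int = 600) -> str:
    """Run Gemini CLI headlessly and return whatever it printed.

    Assumption: the response is written to stdout. The timeout value
    is arbitrary and chosen only for illustration.
    """
    if shutil.which("gemini") is None:
        raise RuntimeError("gemini CLI not found on PATH")
    result = subprocess.run(
        build_gemini_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command here; call ask_gemini(...) to actually run it.
    print(build_gemini_command("Summarize the failing tests in this repo"))
```

Passing the prompt as a separate list item (instead of building one shell string) avoids quoting problems when the prompt contains spaces or quotes.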

So, who won?

Final Thoughts

Ofc it’s Claude Code 🥳.

Let me be clear about why:

In all categories except Output Quality, Claude Code performed way better than Gemini CLI.
The UX and the code generation flow were polished, smooth, and premium.
In fact, it was 80% autonomous: I started the agent and then went off to study.
It only needed a few permission approvals at the start for YOLO mode.
Above all, it is less frustrating and optimized for token usage.

On a final note, I would like to say:

I have been a huge fan of Google products, but being late and still releasing a merely on-par product didn’t feel right. I know Google can do much better and hope to see that reflected in the next version.

That said, I want to emphasize that both Claude Code and its competitors have immense potential and market relevance. However, it's crucial for users to handle them responsibly.

We're in the early days of truly intelligent coding assistants, and the landscape is evolving fast. Instead of picking sides, let's focus on thoughtful use, continuous learning, and giving constructive feedback.

The best is yet to come, and it will be shaped by how we as businesses choose to engage with these tools today.

Thanks for reading, see you in the next one.
Bye 👋