DEV Community

jidong

Claude vs GPT vs Gemini — how to cocktail them like a pro

“Which model is the best?”

The question itself is amateur.

The answer is always “depends on the task.”

Pros mix.

I’m building a Korean fortune-telling app (saju — four-pillar astrology) and a real estate analysis service as side projects.

I’ve been testing Claude, GPT, and Gemini across both.

Here’s why one model doesn’t cut it and how to mix them.


Three models, three completely different personalities

Benchmarks don’t show this.

You have to use them.

Claude follows instructions the most precisely.

Strict JSON schemas rarely break.

XML structure works well.

The downside is refusals.

GPT reads intent naturally.

You can be vague and it still lands.

It’s also strong for image generation.

Gemini has two weapons.

A massive context window.

And price.

Flash is cheap.


The price sheet you need to internalize

You can’t build a routing strategy without costs.

There’s roughly a 100x price gap between Claude Opus and Gemini Flash.

Same request.

100x different bill.
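To make that concrete, here's the back-of-envelope arithmetic. The per-million-token prices below are illustrative placeholders, not a live price sheet — check each provider's pricing page before relying on them.

```python
# Illustrative per-million-token prices (placeholders -- check the
# providers' current price sheets before trusting these numbers).
PRICE_PER_MTOK = {
    "claude-opus":  {"input": 15.00, "output": 75.00},
    "gemini-flash": {"input": 0.15,  "output": 0.60},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request."""
    p = PRICE_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Same request: 2,000 input tokens, 500 output tokens.
opus = request_cost("claude-opus", 2_000, 500)
flash = request_cost("gemini-flash", 2_000, 500)
print(f"Opus: ${opus:.4f}, Flash: ${flash:.4f}, ratio: {opus / flash:.0f}x")
```

Multiply that ratio by thousands of daily requests and the routing decision makes itself.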


How I’d route a fortune service

One-line daily fortunes go to Gemini Flash.

Basic analysis goes to Claude Haiku.

Deep interpretation goes to Claude Sonnet.

Image analysis goes to GPT.

Large-context tasks go to Gemini Pro.

In code it’s just branching.

Request type → pick model.
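That branching can be as simple as a dictionary lookup. The request-type keys and model names here are my own sketch — substitute the exact model IDs your SDK expects:

```python
# Map each request type to the cheapest model that handles it well.
# Model names are illustrative, not exact API identifiers.
ROUTES = {
    "daily_fortune":  "gemini-flash",   # one-liners, high volume
    "basic_analysis": "claude-haiku",
    "deep_reading":   "claude-sonnet",
    "image_analysis": "gpt-4o",
    "large_context":  "gemini-pro",     # long documents
}

def pick_model(request_type: str) -> str:
    # Unknown request types fall back to the cheapest model.
    return ROUTES.get(request_type, "gemini-flash")

print(pick_model("deep_reading"))   # claude-sonnet
print(pick_model("horoscope_v2"))   # gemini-flash (fallback)
```

The fallback matters: a new request type should fail cheap, not expensive.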


Different models eat different prompt formats

Claude likes XML.

GPT likes markdown.

Gemini handles both.

Sending the same prompt to all three degrades quality.

Maintain separate templates.
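One way to keep separate templates is a dict of format strings — XML-tagged for Claude, markdown sections for GPT. This is a sketch; the tag and section names are my own invention:

```python
# Per-model prompt templates: same task, different packaging.
TEMPLATES = {
    # Claude responds well to XML-tagged structure.
    "claude": (
        "<task>{task}</task>\n"
        "<context>{context}</context>\n"
        "<output_format>JSON only</output_format>"
    ),
    # GPT reads markdown sections naturally.
    "gpt": (
        "## Task\n{task}\n\n"
        "## Context\n{context}\n\n"
        "Respond in JSON only."
    ),
}
# Gemini handles either format; reuse the markdown template.
TEMPLATES["gemini"] = TEMPLATES["gpt"]

def render(model_family: str, task: str, context: str) -> str:
    return TEMPLATES[model_family].format(task=task, context=context)
```

One task definition, three renderings — no shared lowest-common-denominator prompt.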


Routing tools

LiteLLM unifies APIs.

OpenRouter centralizes access.

Helicone monitors cost, latency, and tokens.

Don’t choose by vibes.

Run the same test cases.

Measure accuracy, cost, speed.
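A minimal harness for that comparison just aggregates per-case results. The numbers below are fabricated stand-ins to show the shape of the report, not real benchmark data:

```python
from statistics import mean

def summarize(results):
    """results: list of dicts with 'correct' (bool), 'cost' ($), 'latency' (s)."""
    return {
        "accuracy": mean(int(r["correct"]) for r in results),
        "avg_cost": mean(r["cost"] for r in results),
        "avg_latency": mean(r["latency"] for r in results),
    }

# Stand-in results for one model on three shared test cases.
cases = [
    {"correct": True,  "cost": 0.002, "latency": 1.2},
    {"correct": True,  "cost": 0.003, "latency": 0.9},
    {"correct": False, "cost": 0.002, "latency": 1.5},
]
print(summarize(cases))
```

Run the same cases through every candidate model and compare the three numbers side by side.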


Price sheets are not the whole story.

Caching matters.

Anthropic’s system prompt caching can cut input costs massively.

Real cost is price times usage pattern.
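Anthropic's docs price cache reads at a fraction of the base input rate (around 10% at the time of writing, with a surcharge on cache writes — verify current numbers). Here's a quick model of "price times usage pattern" for a shared system prompt; the $3/Mtok rate and discount are assumptions:

```python
def monthly_input_cost(base_price_per_mtok: float,
                       system_tokens: int,
                       requests: int,
                       cache_read_discount: float = 0.10) -> tuple[float, float]:
    """Input cost of a shared system prompt, with and without caching.

    cache_read_discount is the cached-read price as a fraction of the base
    input price -- an assumption; check the provider's pricing page.
    Ignores the one-time cache-write surcharge for brevity.
    """
    uncached = requests * system_tokens * base_price_per_mtok / 1_000_000
    cached = uncached * cache_read_discount
    return uncached, cached

# 3,000-token system prompt, 100k requests/month, $3/Mtok input (illustrative).
full, cached = monthly_input_cost(3.0, 3_000, 100_000)
print(f"uncached: ${full:.0f}/mo, cached reads: ${cached:.0f}/mo")
```

Same price sheet, very different bill once the usage pattern repeats the same prefix.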

"Don’t call the professor for every question. A TA handles 80% of the work just fine."
