
TL;DR
If you want to skip to the conclusion, here’s a quick summary of the findings comparing OpenAI o3, Gemini 2.5, and OpenAI o4-Mini:...
Interesting. I've had better luck with o4-mini and o3 for coding. Granted, my work is very iterative and human-in-the-loop. Gemini tends to make a mess of things for me after round 1. o4-mini is my go-to, but o3 is for debugging. I still use Claude a lot for the tradeoff of speed.
My expectations are lower, though. As a first step, anything that runs and is roughly a game is pretty good. I'd expect to iterate more with more clarifying prompts. The fully autonomous thing seems like a pipe dream for now for anything beyond "implement this API".
Nicely said. I agree with you. o3 seems to be better at debugging stuff.
Nice comparison, Shrijal!
Appreciate you, Arindam 🫡
I'm loving all the hype around these AI models nowadays. Gemini 2.5 seems to be the one to go for.
It's the worst. Always hallucinates on a bigger codebase.
Does it? Haven't really used it on a bigger codebase. What's your typical usecase?
Not always. As said, choose the models based on your use case.
OpenAI o3 handles complex code well, while Gemini 2.5 is strong in logic and structure.
o4-Mini is fast and efficient, great for everyday coding tasks on the go.
Well said!
Love the zero-shot concept for testing. This really tests raw performance. But I don't think Gemini is a good choice for a bigger codebase; maybe it works for smaller projects like these. Go for o1, that's more than enough for most use cases. A jack of all trades.
Glad you liked it. I've heard good things about this model o1, and it definitely sounds great. I haven't tried it, though…
Honestly, I get super annoyed when I can't just pick one model for everything - feels like it's always tradeoffs. Makes me double check every AI response now, not gonna lie.
Yeah, same for me. Thank you for checking it out, Nevo! :)
I WANT TO JOIN
What exactly? 👀
Share your thoughts on the comments! 👇
Love this detailed comparison, Shrijal! 🤍
Thank you, Lara! I appreciate you taking the time to check out the blog post. 😊
Wow, amazing! But why did you include DeepSeek in the comparison?
Thank you, Satria, but where did I mention DeepSeek exactly?
Am I the only one still sticking with codellama? It just gets my job done. I've never had to look back.
I'm surprised. How do you use it? 👀 Maybe locally with Ollama?
lol yeah, local setup. For vscode, I use an extension. Here you can find it: marketplace.visualstudio.com/items...
🙌
Love this.
Thanks!
Thank you! I agree that the recent smaller models are really great. I love Gemini 1.5 Flash, and also Gemma 3 27B, a pretty small but solid recent model. Have you tried it out?
Impressive!
Thank you, @ricorizz ✌️