Hey everyone,
I’ve been experimenting with GPT-5.1 Codex-Max and Gemini 3 Pro side by side on real coding tasks and wanted to share what I found.
I ran the same three coding tasks with both models:
• Create a Ping Pong Game
• Implement Hexagon game logic with clean state handling
• Recreate a full UI in Next.js from an image
What stood out with Gemini 3 Pro:
Its multimodal coding ability is extremely strong. I dropped in a UI screenshot and it generated a Next.js layout that looked very close to the original: the spacing, structure, and components were all on point.
The Hexagon game logic was also more refined and required fewer fixes. It handled edge cases better, and the reasoning chain felt stable.
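To give a sense of what "edge cases" means here: on a hex board, the classic trap is neighbor lookups at the boundary. This is a minimal sketch of the kind of state handling I was testing for, using axial coordinates (my own illustration, not either model's actual output):

```ts
// Axial coordinates for a hex grid: q = column, r = row.
type Hex = { q: number; r: number };

// The six axial direction offsets around a hex.
const DIRECTIONS: Hex[] = [
  { q: 1, r: 0 }, { q: 1, r: -1 }, { q: 0, r: -1 },
  { q: -1, r: 0 }, { q: -1, r: 1 }, { q: 0, r: 1 },
];

// A hexagonal board of cells within a given radius of the origin.
class HexBoard {
  constructor(private radius: number) {}

  // Boundary check: a hex is on the board iff its distance from
  // the origin (max of the absolute cube coordinates) fits the radius.
  inBounds(h: Hex): boolean {
    const s = -h.q - h.r; // derived third cube coordinate
    return Math.max(Math.abs(h.q), Math.abs(h.r), Math.abs(s)) <= this.radius;
  }

  // Neighbor lookup that filters out off-board hexes --
  // the edge case both models had to get right.
  neighbors(h: Hex): Hex[] {
    return DIRECTIONS
      .map((d) => ({ q: h.q + d.q, r: h.r + d.r }))
      .filter((n) => this.inBounds(n));
  }
}

// A corner hex should report fewer than six neighbors.
const board = new HexBoard(2);
console.log(board.neighbors({ q: 2, r: 0 }).length); // 3, not 6
```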
Where GPT-5.1 Codex-Max did well:
Codex-Max is fast, and its step-by-step reasoning is very solid. It explained its approach clearly, stayed consistent through longer prompts, and handled debugging without losing context.
For the Ping Pong game, GPT actually did better. The output looked more polished, and the gameplay felt smoother. Its Hexagon game logic was nearly correct on the first attempt, and its refactoring suggestions made sense.
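On "smoother": for browser games like this, the difference usually comes down to whether movement is tied to the frame rate or scaled by elapsed time. Here's a quick sketch of the delta-time pattern I was looking for (my own shorthand, not either model's verbatim output):

```ts
// Frame-rate-independent movement: scale velocity by elapsed time
// so the ball moves at the same speed on any display refresh rate.
type Ball = { x: number; y: number; vx: number; vy: number };

const ball: Ball = { x: 400, y: 300, vx: 200, vy: 150 }; // px and px/sec

function step(b: Ball, dtMs: number, width: number, height: number): void {
  const dt = dtMs / 1000; // milliseconds -> seconds
  b.x += b.vx * dt;
  b.y += b.vy * dt;

  // Bounce off the top and bottom walls.
  if (b.y < 0 || b.y > height) b.vy = -b.vy;
  // (Paddle collisions and scoring would go here.)
}

// Standard browser loop: requestAnimationFrame hands us a timestamp,
// and we compute the delta from the previous frame ourselves.
let last = performance.now();
function frame(now: number): void {
  step(ball, now - last, 800, 600);
  last = now;
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```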
But in multimodal coding it struggled a bit: the UI recreation worked, but the result lacked polish and needed more follow-up prompts to get it visually correct.
Overall take:
Both models are strong coding assistants, but for these specific tests, Gemini 3 Pro felt more complete, especially for UI-heavy or multimodal tasks.
Codex-Max is great for deep reasoning and backend-style logic, but Gemini delivered cleaner, more production-ready output for the tasks I tried.
I recorded a full comparison if anyone wants to see the exact outputs side-by-side: