TildAlice

Posted on • Originally published at tildalice.io

Claude vs GPT-4o: Beginner Coding Tasks Benchmark Results

Claude Solved 87% of Beginner Tasks. GPT-4o Solved 91%.

Those are the headline numbers from 100 coding tasks designed for absolute beginners, and they're not the full story. The aggregate score flips depending on which beginner tasks you test. GPT-4o crushed string manipulation and basic data structures. Claude dominated anything requiring sustained reasoning across multiple functions or debugging broken code.

I built a test harness to run the same prompts through both models and measure correctness, not just speed or vibes. The goal: figure out which LLM a beginner should reach for when they're stuck on FreeCodeCamp, HackerRank, or their first Flask app. The results surprised me — and made me rethink how we talk about "beginner-friendly" AI.
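To make "measure correctness" concrete, here is a minimal sketch of the kind of pass/fail grader such a harness needs. It assumes each task supplies `(args, expected)` test cases and that the model's answer defines a function named `solution` — both naming conventions are assumptions for illustration, not the article's actual harness.

```python
def grade(candidate_source: str, cases) -> bool:
    """Exec model-generated code in a fresh namespace and check every case."""
    namespace = {}
    try:
        exec(candidate_source, namespace)   # run the generated code
        solution = namespace["solution"]    # assumed entry-point name
        return all(solution(*args) == expected for args, expected in cases)
    except Exception:
        return False                        # crashes and syntax errors count as failures

# Example: a string-reversal task graded against two cases.
reversal_cases = [(("abc",), "cba"), (("",), "")]
print(grade("def solution(s):\n    return s[::-1]", reversal_cases))  # True
```

Binary pass/fail per task is what makes an aggregate like "87% vs 91%" comparable across models: no partial credit, no judging style.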

Photo by ready made on Pexels

The Test Setup: 100 Tasks, Zero Hand-Holding

I pulled 100 tasks from common beginner sources: LeetCode Easy, Python for Everybody exercises, Real Python tutorials, and actual questions from r/learnprogramming. Each task got a single prompt with:


Continue reading the full article on TildAlice
