❄️ What I’m Doing
Every day I’ll solve the Advent of Code puzzle in:
- JavaScript
- Rust
- Python
Then I’ll ask a lineup of current AI coding models to produce their own solutions in the same three languages:
- GPT-5.1 Codex
- Gemini 3 Pro
- Composer-1
- Opus-4.5
- Sonnet-4.5
So for each puzzle, there will be:
- My human solutions (in three languages)
- Five AI solutions (also in three languages each)
It’s essentially a coding “showdown,” but not a competition. The real aim is to explore how different kinds of reasoning appear in code, how approaches vary from model to model, and how my own thinking compares.
Methodology
Not a lab-grade study, just a consistent, lightweight workflow so the comparison stays fair.
For each challenge, after solving the problem manually, I will drop the challenge text into a txt file in the prompts folder, prepended with a brief prompt:
You are a developer taking on the Advent of Code Challenge 2025.
Create a solution for this problem.
This puzzle has two parts, solve both in the same solution. The program output should just be the two answers on separate lines.
I'll aim to keep this prompt consistent across the days, varying only the output-format line as needed. I then drop the input file into the inputs folder. In Cursor, I select the model under test, @-mention the input file, the prompt file, and the target directory, then hit run. No MCPs, no MAX mode, nothing else that could introduce confounding variables or bloat the context.
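To make the setup concrete, here's roughly how the working folders fit together. The names below are purely illustrative (my best guess at a layout), not the actual repo structure:

```
aoc-2025/
├── prompts/            # day-NN.txt: puzzle text with the standard prompt prepended
├── inputs/             # day-NN.txt: the puzzle inputs
├── human/              # my solutions, one folder per day, in JS / Rust / Python
├── gpt-5.1-codex/      # one folder per model, mirroring the same per-day structure
├── gemini-3-pro/
├── composer-1/
├── opus-4.5/
├── sonnet-4.5/
└── run_solutions.py    # verification script (see below)
```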
I then run the run_solutions.py Python script to verify the outputs and review each model's "thinking". I'll improve the script and the reporting as I progress, but it works as a starting point.
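For the curious, here's a minimal sketch of what a verification runner along those lines could look like. It is not the real run_solutions.py: the answers folder, the stdin convention, and the per-day directory layout are all assumptions made for the example.

```python
#!/usr/bin/env python3
"""Rough sketch of a verification runner (illustrative, not the real script).

Assumed layout:
  <solver>/dayNN/solution.py / solution.js  (Rust assumed prebuilt as <solver>/dayNN/solution)
  inputs/dayNN.txt    puzzle input
  answers/dayNN.txt   two lines: the part 1 and part 2 answers
Each solution is assumed to read its input on stdin and print the two answers
on separate lines.
"""
from pathlib import Path
import subprocess
import sys

# How to invoke a solution based on its file extension ("" = a prebuilt binary).
RUNNERS = {".py": ["python3"], ".js": ["node"], "": []}

def run(solution: Path, day: str) -> list[str]:
    """Run one solution against the day's input and return its output lines."""
    with open(f"inputs/{day}.txt") as puzzle_input:
        result = subprocess.run(
            RUNNERS[solution.suffix] + [str(solution)],
            stdin=puzzle_input,
            capture_output=True,
            text=True,
            timeout=60,
        )
    return result.stdout.strip().splitlines()

def main() -> None:
    day = sys.argv[1] if len(sys.argv) > 1 else "day01"
    expected = Path(f"answers/{day}.txt").read_text().strip().splitlines()
    for solution in sorted(Path(".").glob(f"*/{day}/solution*")):
        if solution.suffix not in RUNNERS:
            continue  # skip anything we don't know how to invoke (e.g. .rs sources)
        actual = run(solution, day)
        verdict = "OK" if actual == expected else f"MISMATCH: got {actual}"
        print(f"{solution.parent.parent.name:>16}  {solution.name:<12}  {verdict}")

if __name__ == "__main__":
    main()
```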
Once I've verified the output, I add the model's directory to .cursorignore so its solutions don't end up in the context of later runs.
Note: each model solves the puzzle in all three languages in a single session and can reference its own prior solutions, e.g. it can start in Python and then translate that to JS and Rust. It would be interesting to see how the models stack up when restricted to one language at a time, and I may run that as a second iteration of this experiment. However, the prompt doesn't instruct the model to start in any particular language, and I can also reference my own solutions across languages, so I thought this setup would be interesting in itself.
🎁 Why This Experiment?
To understand how AI actually solves problems
Advent of Code puzzles are a perfect testbed: small enough to be self-contained (and not burn too many tokens/£!), but clever enough to require genuine reasoning and creativity. Watching how different models break them down is already proving fascinating, and analysing the end results may yield some interesting insights.
To improve my own fluency
Solving each puzzle three times in three different languages forces me to think more deeply about patterns, algorithms, and idioms. It's a great way to keep my skills sharp and explore languages beyond the ones I use day to day.
To observe differences in style and structure
- Where a model chooses brute force, I choose planning and analysis.
- Where I focus on solving the problem and don't worry about failure modes (since the code only ever runs against known inputs in a known environment), AI may be much more defensive and write more flexible code.
- Where AI might reach for third-party packages, I may avoid them and stick to language features and the standard library.

These contrasts say a lot about how "AI thinking" manifests in code.
To build a dataset of human + AI approaches
By the end of AoC, even with only 12 challenges this year, that's 12 puzzles × 3 languages × 6 solvers (me plus five models): over 200 solutions that all answer the same questions from different angles. That's an interesting resource in itself, and plenty to analyse!
🤖 What I’ll Be Sharing
As the month goes on, I’ll post insights on things like:
- patterns the AIs gravitate toward
- performance of the output code (hey, perf stats can be fun)
- common mistakes or blind spots
- whether models one-shot solutions or need some handholding
- places where models outperform my first instincts or develop more novel solutions
- language-specific quirks (Rust borrow checker vs AI… pray for it)
- what it feels like to “pair” with multiple coding models
- other random thoughts and musings on AI, Cursor and model nuances
The goal isn’t to crown a winner. It’s to understand the landscape of coding in 2025: where AIs shine, where they show their limitations, and how human and AI approaches complement each other. Will the models one-shot all the solutions, or will I end up re-prompting and "pair-programming"?
🌟 Follow Along
If you enjoy Advent of Code, programming language experiments, or the evolving relationship between developers and AI tooling, stick around. I’ll be posting reflections and curiosities throughout the month, followed by a round-up at the end of the challenges.
Here’s to a December full of puzzles, head-scratching, learning, and some very weird debugging moments.
Happy coding, and an even happier Advent. 🎅🔥