Sukriti Singh

I Made Two AI Models Read My Git Commits. It Got Uncomfortably Personal.

I ran a blind AI duel this week on VibeCode Arena by HackerEarth with a prompt that sounded a lot less revealing than it turned out to be.

The idea was simple: build a Developer Mood Analyzer.

User pastes recent git commit messages, the app scans them, assigns one mood — Chaotic, Caffeinated, In Denial, Silently Judging Everyone, Burnout Mode, or 10x Developer — and then throws back a result card with a big emoji and a one-line summary.
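For a rough sense of what could sit behind a tool like this, here is a minimal Python sketch of a keyword-based scorer. It is not the code either duel entry generated (the post does not show that). The keyword lists, the emoji, and the name `analyze_mood` are all made up for illustration, and the one-line summary from the result card is left out.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
# Dict order doubles as the tie-break order: earlier moods win ties.
MOOD_KEYWORDS = {
    "Chaotic": {"temp", "misc", "stuff", "again"},
    "Caffeinated": {"quick", "fast", "asap", "coffee"},
    "In Denial": {"works", "fine", "final", "should"},
    "Silently Judging Everyone": {"why", "who", "revert", "touch"},
    "Burnout Mode": {"hotfix", "later", "ok", "ugh"},
    "10x Developer": {"refactor", "tests", "docs", "feat"},
}

# Hypothetical emoji for the result card.
MOOD_EMOJI = {
    "Chaotic": "🌀",
    "Caffeinated": "☕",
    "In Denial": "🙈",
    "Silently Judging Everyone": "🧐",
    "Burnout Mode": "🔥",
    "10x Developer": "🚀",
}


def analyze_mood(commits):
    """Return (mood, emoji) for a list of commit messages, or None if nothing matches."""
    scores = Counter()
    for message in commits:
        # Count each keyword at most once per commit message.
        words = set(re.findall(r"[a-z0-9]+", message.lower()))
        for mood, keywords in MOOD_KEYWORDS.items():
            scores[mood] += len(words & keywords)

    # max() returns the first mood with the highest score, in dict order.
    best = max(MOOD_KEYWORDS, key=lambda mood: scores[mood])
    if scores[best] == 0:
        return None  # sparse or unrecognized input: no verdict rather than a guess
    return best, MOOD_EMOJI[best]
```

Calling `analyze_mood(["temp", "misc", "trying again"])` returns the Chaotic entry under these made-up keyword lists. Returning `None` for empty or unmatched input is one possible way to handle sparse input; a real version might prefer a default mood instead.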

Mostly just a fun little AI build challenge.

At least that was the plan.

VibeCode Arena runs these prompts side by side, so two AI models generate against the same instruction, and you compare them blind before the model names are revealed. In this duel, one side was Gemini 2.5 Flash, and the other was my own challenge entry.

What both models produced looked nearly the same in the preview, which was mildly suspicious. Same dark UI, same textarea, same oversized Analyze button, same general result card concept. It was polished enough, but not the kind of duel where you immediately think one side has obviously crushed the other.

So I did the only useful thing at that point and started trying to break them.

More specifically, I pasted in my own recent commit messages. These were not curated. This was just actual recent history:

fix: hotfix for the hotfix
changed stuff
It works now
misc updates
temp temp remove later
ok fine

[Screenshot showing the duel side by side.]

One of the analyzers thought for a second and labeled me Burnout Mode.

Its explanation:

You are one bad deploy away from quitting tech and starting a farm.

That felt a little too targeted for something that was only supposed to be scanning commit text.

I sent it to a few developer friends, mostly because I wanted confirmation that this thing was being unfair to everyone, not just me. Same pattern every time: they opened it, laughed once, and immediately started digging through their own git logs.

Nobody asked what prompt I used. Nobody asked which model built the better version. Nobody even really cared about the duel part.

They just wanted to know what mood they got.

One friend landed on Chaotic after feeding it a string of “temp”, “misc”, and “trying again” commits. Another got In Denial, which was deserved. Someone else got 10x Developer and became much more confident than the situation called for.

That was the moment I realized the hook here is not really the app. It is the fact that commit messages are way more revealing than developers think they are.

If you run “git log --oneline” right now and actually read the last ten entries, there is a good chance it looks less like disciplined software engineering and more like a low-grade distress signal.

Something along the lines of:

temp
why is this failing
quick fix maybe
final final final
do not touch
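If copying those lines by hand feels like too much effort, a small Python helper can collect them. This is a sketch that assumes `git` is on your PATH and that you run it against a real repository; the function names `parse_subjects` and `last_commit_subjects` are my own, not part of any tool mentioned here.

```python
import subprocess


def parse_subjects(raw):
    """Split raw `git log` output into non-empty, stripped subject lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def last_commit_subjects(count=10, repo="."):
    """Return the subject lines of the last `count` commits in `repo`."""
    result = subprocess.run(
        # %s is git's pretty-format placeholder for the commit subject line.
        ["git", "-C", repo, "log", "-n", str(count), "--pretty=format:%s"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails (e.g. not a repo)
    )
    return parse_subjects(result.stdout)
```

Printing `"\n".join(last_commit_subjects())` gives you a block of text ready to paste into the analyzer.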

We all write commit messages in whatever mental state we happen to be in when we are trying to ship, patch, undo, or quietly survive. Which means git history ends up recording two things at once: changes to the codebase, and a surprisingly honest trace of the person making those changes.

That second part is what this analyzer latches onto.

The AI duel itself got more interesting the longer I played with it. At first glance, both outputs looked interchangeable enough that I thought this would come down to tiny cosmetic preferences. But after enough weird inputs, edge cases, and absurd commit dumps, differences started showing up. One version was slightly better at reading intent from vague commit language. One handled sparse input without producing nonsense. One just felt cleaner in how it presented the result.

Nothing dramatic. Just enough to remind me that AI-generated apps can look almost identical in screenshots and still behave very differently once you stop admiring the preview and start using them like an impatient real person.

That is probably my favorite part of these blind duels now. They force a more honest kind of evaluation.

Since this turned out to be much more entertaining than expected, I posted it as an open challenge instead of leaving it as a disposable one-off. So now anyone can jump in, try the current analyzer, improve the prompt, redesign the interface, make the mood detection smarter, make it meaner, whatever — and submit a better version.

There is obvious room to push it further. Better roast lines. Terminal-looking visuals. Combo moods for truly cursed commit histories. Shareable result cards. Maybe some kind of developer stability score, which feels medically irresponsible but compelling.

Even in its current form, though, it does something very reliably:

People instantly want to paste in their own commits.

That is usually a good sign that you have accidentally built something with a real participation loop.

So if you feel like being judged by your own repository, paste in the last ten actual commits from a week where things were not going smoothly and see what it tells you.

Try the challenge here: https://vibecodearena.ai/duel/d42f49c8-d609-4096-9dec-1a9416557fe6

Also, I am genuinely curious what the worst recent commit message sitting in other people’s repos looks like, because I refuse to believe “ok fine” is even close to the bottom of this barrel.
