Sujal Gupta

Posted on May 23

I Fed React's Entire Hooks Transition History to Gemma 4. Here's What It Found That We Missed.

#devchallenge #gemmachallenge #gemma #git

Gemma 4 Challenge: Build With Gemma 4 Submission

"Which commit broke everything?"
Every developer who has inherited a legacy codebase has asked this. We just never had a good way to answer it.

The Problem That Started This

Six months ago I was debugging a production issue in a codebase I'd inherited. The bug had been there for a long time — I could tell because the workarounds had workarounds. But I couldn't figure out when it started, or what change introduced it.

I opened git log. 2,847 commits. Three years of history. Everything was in there — every decision, every mistake, every refactor — locked inside commit messages that ranged from "fix critical auth bug" to "stuff".

I needed a historian, not a search engine.

That's why I built CodeDNA.

What I Wanted to Know

The question I couldn't answer manually: When did this codebase's quality start degrading, and what caused it?

Standard git tools answer how much happened. Commit graphs show velocity. git blame shows who touched what. But none of them answer why a period was chaotic, or connect a March 2019 API change to a June 2019 bug cluster.

That connection requires reasoning across time — holding 180 commits in context and tracing causal chains between them. That's exactly what Gemma 4's Thinking Mode is designed for.

Why Gemma 4 Specifically

I want to be honest about this, because "I used an LLM" is not the same as "I used Gemma 4 intentionally."

Thinking Mode is the reason this project exists.

I tested the same prompt against several models during development. Standard instruction-tuned models produce summaries. They count keywords and report patterns. Gemma 4 with Thinking Mode reasons about patterns — it traces why a cluster of fix commits appeared after a specific API change, not just that they appeared.

The live reasoning stream in the UI is not a gimmick. It's the proof. When you paste your git log and watch the right panel stream Gemma's analysis in real time, you're watching it build the causal chain before it outputs the structured result. That's not post-hoc storytelling — that's the actual analysis process made visible.

128K context is the other prerequisite.

180 commits with file stat data is roughly 35,000–40,000 tokens of compressed history. The only way to detect that a March pivot caused a June bug storm is to have both in the same context window. Without 128K, you're forced into chunking — which destroys the causal chain entirely.
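To check whether a given log export is likely to fit, a back-of-the-envelope estimate is enough. The sketch below is not CodeDNA's code: it assumes roughly 4 characters per token (a common rule of thumb, not Gemma's actual tokenizer), and the function names and the 16,000-token reserve for instructions and output are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumed ~4 characters per token)."""
    return int(len(text) / chars_per_token)


def fits_in_context(text: str, context_tokens: int = 128_000,
                    reserve_tokens: int = 16_000) -> bool:
    """True if the text leaves `reserve_tokens` free for instructions and output."""
    return estimate_tokens(text) <= context_tokens - reserve_tokens


if __name__ == "__main__":
    # Stand-in for the contents of an exported history file.
    sample = "x" * 150_000
    print(estimate_tokens(sample), fits_in_context(sample))
```

For a real decision, count tokens with the tokenizer of the model you are calling; the heuristic only tells you whether you are close to the limit.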

Privacy is structural, not optional.

Your git history contains proprietary module names, security patch descriptions, unreleased feature branches, and often enough context to reverse-engineer your business logic. I built CodeDNA to run under your own API key with zero data retention. This isn't a feature toggle — it's the only way I could imagine a real engineering team actually using it with their private repos.

The Experiment: React's Hooks Transition

I chose the React repository's 2018–2019 Hooks transition period as my primary test case for one specific reason: any developer who knows React can verify the output in 2 minutes.

This is the verifiability test that every other project idea I considered failed. Financial anomaly detection? A judge would need domain expertise. CVE scanning? Knowledge cutoff problems. Food photo analysis? Blurry curry images break the demo. Git history? The raw commits are public. Anyone can check.

I fed Gemma 4 the commits from September 2018 through June 2019. 24 commits in the demo, roughly 180 in a fuller run. The Hooks era: one of the most architecturally significant transitions in any major open-source project.

Here's what it found:

What Gemma 4 Said About React's History

The milestone Gemma 4 identified first: A feature burst in July–September 2018 — Scheduler time-slicing infrastructure (Scheduler.js, 144 insertions in one commit), then React.lazy, Suspense, and createContext v2 added within 6 weeks of each other.

This is factually accurate. Any React developer recognizes this as the foundation-laying period before Hooks went public.

The milestone that surprised me: Gemma 4 flagged January–February 2019 as a stability → bug storm transition, citing ca53456 (fix for useRef) and cb54567 (fix for infinite useEffect loops) within days of the 16.8.0 release. It specifically noted that ReactFiberHooks.js had 8 modifications in this period versus 2 in the preceding stable phase.

I had to look this up to verify. It's correct. The Hooks release in 16.8.0 (February 6, 2019) was followed by a cluster of hotfixes addressing edge cases in the hooks implementation that weren't caught before release. The file-level evidence in the commit stats makes this visible — but only if you're looking across all 24 commits simultaneously, not one at a time.

The health score: 79/100. Breakdown: +15 for high commit message quality, +10 for clear refactor era visible in 2019-05, -10 for 21% bug-fix ratio, and a neutral note for concentrated churn in ReactFiberHooks.js. Every factor is displayed, with evidence. No black-box number.
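A transparent score of this kind can be modeled as a base value plus a list of signed, explained adjustments. The sketch below is illustrative only, not CodeDNA's scoring code. The `Factor` type and function names are hypothetical, and the base of 64 is an assumption inferred from the arithmetic (it is the value that yields 79 if the factors listed above are the only ones applied).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    delta: int   # signed adjustment to the score
    reason: str  # human-readable evidence shown next to the adjustment


def health_score(base: int, factors: list[Factor]) -> int:
    """Sum the base and all adjustments, clamped to the 0-100 range."""
    return max(0, min(100, base + sum(f.delta for f in factors)))


def render_breakdown(base: int, factors: list[Factor]) -> str:
    """Render every factor so the final number is never a black box."""
    lines = [f"base: {base}"]
    lines += [f"{f.delta:+d}  {f.reason}" for f in factors]
    lines.append(f"score: {health_score(base, factors)}/100")
    return "\n".join(lines)


ASSUMED_BASE = 64  # assumption: not stated in the article
REACT_FACTORS = [
    Factor(15, "high commit message quality"),
    Factor(10, "clear refactor era visible in 2019-05"),
    Factor(-10, "21% bug-fix ratio"),
    Factor(0, "concentrated churn in ReactFiberHooks.js (neutral note)"),
]

if __name__ == "__main__":
    print(render_breakdown(ASSUMED_BASE, REACT_FACTORS))
```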

The Prompt Engineering That Took the Longest

Getting Gemma 4 to produce specific insights — not corporate-speak — was the hardest part of this project. It took three major iterations.

Iteration 1 (bad): Asked the model to "tell me the story of this codebase." It produced beautifully written hallucinations. "The team worked hard to address technical debt during a difficult refactoring period." Sounds good. Means nothing. Not a single commit hash cited.

Iteration 2 (better): Added a list of forbidden phrases: "technical debt", "the team", "seems like", "likely indicates", "possibly", "perhaps". Insights became slightly more factual but still vague. The model would say "there were many fixes in this period" without specifying how many, which period, or what was being fixed.
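A forbidden-phrase list is also easy to enforce after the fact, for example to reject or retry an output that slips. This is a minimal sketch under that assumption, not a description of how CodeDNA applies the list; `find_forbidden` is a hypothetical helper.

```python
import re

# The phrase list quoted above.
FORBIDDEN_PHRASES = [
    "technical debt", "the team", "seems like",
    "likely indicates", "possibly", "perhaps",
]


def find_forbidden(text: str, phrases=FORBIDDEN_PHRASES) -> list[str]:
    """Return the forbidden phrases present in `text` (case-insensitive, whole words)."""
    found = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found


if __name__ == "__main__":
    draft = ("The team worked hard to address technical debt "
             "during a difficult refactoring period.")
    print(find_forbidden(draft))
```

As the next iteration shows, filtering wording removes the filler but does not by itself make the output specific.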

The breakthrough (Iteration 3): Injecting pre-computed metadata. Before any model call, my preprocessor now extracts:

A monthly commit histogram: 2019-03: 47 commits, 2019-02: 12 commits
Top changed files: ReactFiberHooks.js modified 8 times
Bug-fix ratio: 21%

When this metadata is in the prompt, the model's job changes from counting to interpreting. Instead of asking "were there a lot of fixes in early 2019?" I'm telling it "there were 14 fix commits in 6 weeks targeting ReactFiberHooks.js. What does that mean?" The insights became specific and verifiable almost immediately.
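The three metadata items can be computed from plain `git log --stat` output with a small parser. The following is a sketch, not CodeDNA's actual preprocessor. It assumes git's default English date format, and the keyword heuristic for what counts as a bug-fix commit is an assumption. `summarize_log` and `render_metadata` are hypothetical names.

```python
import re
from collections import Counter

MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}
# Matches git's default date line, e.g. "Date:   Wed Feb 6 10:00:00 2019 -0800"
DATE_RE = re.compile(r"^Date:\s+\w{3} (\w{3}) \d{1,2} [\d:]+ (\d{4})")
# Matches a --stat file line, e.g. " src/File.js | 12 ++++--"
STAT_RE = re.compile(r"^ (\S.*?)\s+\|\s+(?:\d+|Bin)")
# Assumed heuristic for "bug-fix commit"; tune it to your own conventions.
FIX_RE = re.compile(r"\b(fix(es|ed)?|bug|hotfix|revert)\b", re.IGNORECASE)


def summarize_log(log_text: str, top_n: int = 5) -> dict:
    """Extract a monthly histogram, top changed files, and bug-fix ratio."""
    monthly, files = Counter(), Counter()
    total = fixes = 0
    awaiting_subject = False  # True until the commit's first message line is seen
    for line in log_text.splitlines():
        if line.startswith("commit "):
            total += 1
            awaiting_subject = True
            continue
        date = DATE_RE.match(line)
        if date and date.group(1) in MONTHS:
            monthly[f"{date.group(2)}-{MONTHS[date.group(1)]:02d}"] += 1
            continue
        # Message lines are indented by four spaces; only the subject is checked.
        if awaiting_subject and line.startswith("    ") and line.strip():
            fixes += bool(FIX_RE.search(line))
            awaiting_subject = False
            continue
        stat = STAT_RE.match(line)
        if stat:
            files[stat.group(1)] += 1
    return {
        "total_commits": total,
        "monthly": dict(sorted(monthly.items())),
        "top_files": files.most_common(top_n),
        "bug_fix_ratio_pct": round(100 * fixes / total) if total else 0,
    }


def render_metadata(summary: dict) -> str:
    """Format the summary as plain lines that can be prepended to a prompt."""
    lines = ["Monthly commit histogram:"]
    lines += [f"  {month}: {count} commits" for month, count in summary["monthly"].items()]
    lines.append("Top changed files:")
    lines += [f"  {path} modified {count} times" for path, count in summary["top_files"]]
    lines.append(f"Bug-fix ratio: {summary['bug_fix_ratio_pct']}%")
    return "\n".join(lines)
```

Counting in code and leaving only interpretation to the model keeps the numbers in the final report traceable to the log itself.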

The two-step split was the second major insight. Asking Gemma 4 to produce flowing Thinking Mode prose and valid structured JSON in a single call did neither well. Splitting the work into Step 1 (the reasoning stream: clean markdown, no JSON) and Step 2 (JSON structuring from the reasoning trace) dramatically improved both. The live reasoning panel now shows actual analytical prose. The timeline data is reliably structured.
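The shape of that split can be sketched independently of any particular model API. In the example below the two model calls are passed in as plain callables, so `stream_reasoning`, `structure`, and `run_two_step` are hypothetical names rather than CodeDNA functions or a real client library. It follows the description above, where the second step consumes the trace produced by the first.

```python
import json
from typing import Callable, Iterable


def run_two_step(
    prompt: str,
    stream_reasoning: Callable[[str], Iterable[str]],
    structure: Callable[[str], str],
    on_chunk: Callable[[str], None] = print,
) -> dict:
    """Step 1 streams prose to the UI; step 2 turns the finished trace into JSON."""
    chunks = []
    for chunk in stream_reasoning(prompt):
        on_chunk(chunk)       # e.g. forward to the live reasoning panel
        chunks.append(chunk)
    trace = "".join(chunks)
    raw_json = structure(trace)  # second call: structuring only, no prose
    return json.loads(raw_json)  # raises ValueError if the model's JSON is invalid


if __name__ == "__main__":
    # Stub callables standing in for the two model calls.
    def fake_stream(_prompt: str):
        yield "Fix commits cluster "
        yield "after the 16.8.0 release."

    def fake_structure(trace: str) -> str:
        return json.dumps({"milestones": [{"type": "bug_storm", "evidence": trace}]})

    print(run_two_step("analyze", fake_stream, fake_structure))
```

Keeping each call to one output format means a JSON parse failure can be retried without re-running the reasoning step.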

What I Learned About Gemma 4's Limitations

Small repos produce low-confidence analysis. Under 50 commits, the patterns aren't there yet. CodeDNA handles this gracefully — it labels the output clearly as "micro-analysis" rather than inventing dramatic narratives. But the wow factor is absent.

Vague commit messages are the real enemy. A repo full of "fix", "update", "wip" commits gives Gemma very little signal to reason from. The model tries — it picks up on date clustering and file patterns — but the confidence is honest: "data_quality: low", "insufficient evidence for causal narrative". I considered trying to work around this but decided against it. Honest uncertainty is more valuable than confident fabrication.
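Both limitations can be surfaced before the model is called. The sketch below labels an input as "micro-analysis" under 50 commits and flags low data quality when most subjects are vague. It is an illustration, not CodeDNA's logic: the vague-word list, the 50% threshold, and the "full" and "ok" labels are assumptions.

```python
# Subjects treated as carrying no signal (assumed list, based on the examples above).
VAGUE_SUBJECTS = {"fix", "update", "wip", "stuff"}


def label_analysis(subjects: list[str], min_commits: int = 50,
                   max_vague_ratio: float = 0.5) -> dict:
    """Classify an input by size and by how many commit subjects are vague."""
    vague = sum(1 for s in subjects if s.strip().lower() in VAGUE_SUBJECTS)
    ratio = vague / len(subjects) if subjects else 1.0
    return {
        "mode": "micro-analysis" if len(subjects) < min_commits else "full",
        "data_quality": "low" if ratio > max_vague_ratio else "ok",
        "vague_ratio": round(ratio, 2),
    }


if __name__ == "__main__":
    print(label_analysis(["fix", "wip", "update", "Add login form"]))
```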

Model size vs. speed tradeoff is real. The 31B model produces noticeably richer reasoning. The 26B model is faster and more reliable on the free API tier. I defaulted to 26B primary with 31B fallback, and added OpenRouter as a second fallback layer with dynamic model discovery. For a solo developer on a laptop with no GPU, the API-first approach was the only realistic path.
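An ordered fallback chain like this can be sketched for a non-streaming call as follows. The providers are plain callables, so nothing here is a real client API, and `call_with_fallback` and `AllProvidersFailed` are hypothetical names, not CodeDNA's.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first success as (name, text)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


if __name__ == "__main__":
    def primary(_prompt: str) -> str:
        raise TimeoutError("rate limited")

    def fallback(_prompt: str) -> str:
        return "{}"

    # Labels only; the order mirrors the 26B-first, 31B-second setup described above.
    print(call_with_fallback("analyze", [("26B", primary), ("31B", fallback)]))
```

Returning the provider name alongside the text lets the UI show which model actually produced a result.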

The reasoning stream can't gracefully fall back mid-stream. If the primary model fails after the SSE connection is open, the stream errors rather than switching providers. The JSON structuring call (which runs in parallel) handles fallback correctly, but the stream panel may show an error while the timeline still renders successfully. I documented this honestly rather than hiding it.

The Moment That Made It Worth Building

There's a specific interaction that convinced me this tool has real value beyond the hackathon.

I ran CodeDNA on my own FundTrace project — a fraud detection system my team built for a hackathon earlier this year. 47 commits, 3 months.

Gemma 4 flagged a stability period followed by a pivot, then a concentrated feature burst in the final week before submission. The health score was 61/100. The main reason: 38% bug-fix ratio in the final 10 days.

That's exactly what happened. The last week was a scramble. We knew it at the time. Seeing it reflected back as a pattern — with specific commit references and a calculated ratio — was oddly clarifying. Not because it told us something we didn't know, but because it quantified what we felt.

That's the use case I want to build toward: not just forensics on old code, but a continuous mirror for engineering health.

Try It on Your Repo

git clone https://github.com/acchasujal/codeDNA.git
cd codeDNA/backend
pip install -r requirements.txt
cp .env.example .env
# Add your Google AI Studio key
uvicorn main:app --reload

# New terminal:
cd ../frontend && npm install && npm run dev

Then run this in any repository you're curious about:

git log --stat | head -3000 > my_history.txt

Upload the file. Click Analyze.

What would Gemma 4 find in your codebase?

Post your results in the comments — I'm genuinely curious what it finds in different projects and domains.

What's Next

This is v1. The things I want to build next:

GitHub URL input: analyze any public repo without manual log export
Trend alerts: "your current sprint's bug-fix ratio is 2x your baseline"
Team patterns: author-level analysis (with appropriate consent and privacy controls)
CI/CD integration: flag risky commit pattern spikes as part of a PR check

The core insight, that Gemma 4's Thinking Mode + 128K context makes this class of analysis possible, still has a lot of room to run.

GitHub:

acchasujal / codeDNA

CodeDNA — AI Codebase Archaeologist

Feed Gemma 4 your git history. Discover exactly when — and why — your codebase evolved.

Every codebase has a turning point. The moment before is clean commits and clear intent. The moment after is hotfixes, reverts, and growing entropy. CodeDNA finds it.

What It Does

Maps your codebase history with Gemma 4 — up to 400 commits, preprocessed and compressed for maximum analytical signal. The preprocessor extracts monthly commit histograms and per-file change frequency before sending to the model, so insights are grounded in observable data.

Returns a structured archaeological report — health score with transparent breakdown, milestone timeline (bug storms, refactors, pivots, feature bursts), and key metrics. Every claim cites a specific commit hash, date, or metadata value.

Streams Gemma 4's live reasoning — watch the Thinking Mode trace in real-time as the model identifies causal patterns across years of history. Verifiable: the…
