Sujal Gupta

Posted on May 22

CodeDNA: AI Codebase Archaeologist Built with Gemma 4 Thinking Mode

#devchallenge #gemmachallenge #gemma #webdev

Gemma 4 Challenge: Build With Gemma 4 Submission

You inherited this codebase 6 months ago. You can feel something went wrong around 2021. Bug reports spiked. Velocity dropped. The original authors left. The commit history has 3,000 entries — and every answer is in there.

Nobody has time to read 3,000 commits.

CodeDNA does.

What I Built

CodeDNA is an AI Codebase Archaeologist. You paste your git log, and Gemma 4 — using Thinking Mode — reconstructs the story of your codebase: bug storms, architectural pivots, refactor eras, feature bursts, and an overall health score with a transparent breakdown.

The output is 100% verifiable. You can check every milestone against your actual commit history. No hallucinated CVEs, no unverifiable financial claims — just pattern-extracted facts from structured text you already own.

GitHub:

acchasujal / codeDNA

CodeDNA — AI Codebase Archaeologist

Feed Gemma 4 your git history. Discover exactly when — and why — your codebase evolved.

Every codebase has a turning point. The moment before is clean commits and clear intent. The moment after is hotfixes, reverts, and growing entropy. CodeDNA finds it.

What It Does

Maps your codebase history with Gemma 4 — up to 400 commits, preprocessed and compressed for maximum analytical signal. The preprocessor extracts monthly commit histograms and per-file change frequency before sending to the model, so insights are grounded in observable data.

Returns a structured archaeological report — health score with transparent breakdown, milestone timeline (bug storms, refactors, pivots, feature bursts), and key metrics. Every claim cites a specific commit hash, date, or metadata value.

Streams Gemma 4's live reasoning — watch the Thinking Mode trace in real-time as the model identifies causal patterns across years of history. Verifiable: the…

View on GitHub

The Problem It Solves

You inherit a codebase. Something went wrong around late 2021 — you can feel it. Bug reports spiked, velocity dropped, the original authors left. The commit history has everything, but nobody has time to read 3,000 commits manually.

Traditional tools give you graphs of commit frequency. That tells you how much happened, not what happened or why one period was chaotic and another stable.

CodeDNA uses Gemma 4's Thinking Mode to reason across your entire commit history and surface the narrative that was always there.

Live Demo

The live demo in action: CodeDNA processing the React repository’s architectural transition history.

Core Features

Feature	Description
Animated timeline	Color-coded milestones — red = bug storm, yellow = refactor, green = pivot, blue = feature burst
Health score + breakdown	0–100 score with transparent factor table (not a black-box number)
Live Thinking Mode stream	Watch Gemma 4 reason step-by-step as it analyzes your history
Smart preprocessing	Caps at 180 commits, extracts monthly histograms and file hotspots before inference
Multi-provider fallback	Google AI Studio (26B → 31B) → OpenRouter (gemma-2-27b-it → gemma-3-12b-it → more)
Analysis caching	Same git log = instant results on repeat runs
Markdown export	Download a complete archaeological report
Messy commit handling	Detects vague history and gives honest, low-confidence analysis instead of hallucinating

Screenshots

The timeline builds milestone by milestone. Red = bug storm, yellow = refactor, green = pivot, blue = feature burst.

Health Score is never a black-box number. Every factor cites commit evidence.

The reasoning panel shows Gemma 4's step-by-step analysis as it happens. This is Thinking Mode — not post-hoc summarization.

Architecture

git log --stat (your paste or .txt upload)
        ↓
preprocessor.py
  → parse commits, build monthly histogram, extract file hotspots
  → metadata header injected: MONTHLY_COUNTS, TOP_CHANGED_FILES, BUG_FIX_RATIO
        ↓
Step 1: Reasoning Stream (REASONING_SYSTEM_PROMPT)
  → Gemma 4 Thinking Mode streams clean markdown report
  → Visible live in right panel
        ↓
Step 2: JSON Structuring (JSON_SYSTEM_PROMPT)
  → Separate Gemma call converts reasoning → typed AnalysisResult JSON
  → Pydantic v2 validates schema
        ↓
React UI
  → Health Score ring + breakdown table (center, always visible)
  → Animated vertical timeline (left)
  → Live reasoning stream (right)
  → Markdown export

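Map-reduce design (loosely named; in practice it is a sequential two-call pipeline): Splitting reasoning (Step 1) from JSON structuring (Step 2) keeps the Thinking Mode output as clean prose instead of bending it around schema enforcement constraints. Both the prose and the structured data improved once the two steps were separated.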
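
The split can be sketched in a few lines. This is a minimal illustration, not the repo's code: `call_model` is a hypothetical function standing in for whichever client talks to Gemma, the prompt strings are shortened placeholders, and `fake_model` exists only so the sketch runs offline.

```python
import json
from typing import Callable

# Placeholder prompts (assumptions). The real prompts are longer.
REASONING_SYSTEM_PROMPT = "Write a markdown report. No JSON."
JSON_SYSTEM_PROMPT = "Convert the report into AnalysisResult JSON. Output JSON only."

# Hypothetical signature: (system_prompt, user_text) -> model output text.
ModelCall = Callable[[str, str], str]


def analyze(git_log: str, call_model: ModelCall) -> tuple[str, dict]:
    """Run the reasoning call first, then structure its output in a second call."""
    report = call_model(REASONING_SYSTEM_PROMPT, git_log)  # Step 1: prose only
    raw_json = call_model(JSON_SYSTEM_PROMPT, report)      # Step 2: JSON only
    return report, json.loads(raw_json)


def fake_model(system_prompt: str, user_text: str) -> str:
    """Offline stand-in for a real model call."""
    if system_prompt == JSON_SYSTEM_PROMPT:
        return json.dumps({"health_score": 58, "milestones": []})
    return "## Overview\nPlaceholder report."
```

Because Step 2 only ever sees the Step 1 report, the second call never has to reason and format at the same time.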

Stack:

Backend: FastAPI + httpx async + SSE streaming
Frontend: React 18 + Vite + Tailwind CSS
LLM: Gemma 4 via Google AI Studio (primary) + OpenRouter (fallback)
State: In-memory + disk cache (no database)
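
For the cache, one plausible shape is a dictionary in memory backed by JSON files on disk, keyed by a hash of the pasted log. The sketch below assumes SHA-256 keys and a `AnalysisCache` class name of my own choosing; the repo may organize this differently.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


class AnalysisCache:
    """In-memory cache backed by JSON files, keyed by a hash of the git log."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self._memory = {}

    @staticmethod
    def key_for(git_log: str) -> str:
        # Identical input text always maps to the same key.
        return hashlib.sha256(git_log.encode("utf-8")).hexdigest()

    def get(self, git_log: str) -> Optional[dict]:
        key = self.key_for(git_log)
        if key in self._memory:
            return self._memory[key]
        path = self.cache_dir / f"{key}.json"
        if path.exists():
            result = json.loads(path.read_text(encoding="utf-8"))
            self._memory[key] = result  # warm the in-memory layer
            return result
        return None

    def put(self, git_log: str, result: dict) -> None:
        key = self.key_for(git_log)
        self._memory[key] = result
        path = self.cache_dir / f"{key}.json"
        path.write_text(json.dumps(result), encoding="utf-8")


# Usage: store one result in a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
cache = AnalysisCache(demo_dir)
cache.put("commit abc1234\n", {"health_score": 58})
```

A second `AnalysisCache` pointed at the same directory finds the stored result on disk, which is what makes repeat runs instant after a restart.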

Why Gemma 4 — Not "Just Any LLM"

This is the most important section for me to get right.

1. Thinking Mode for causal chain reasoning — not summarization

Standard completion models count keywords. Gemma 4's Thinking Mode traces why patterns emerged. When it sees 14 "fix" commits targeting ReactFiberHooks.js in a 3-week window after a large API change, it connects them causally — it doesn't just report a spike.

The live reasoning stream in the UI makes this directly observable. Judges (and users) can watch Gemma's chain-of-thought in real time. This is the intentional use criterion — not decorative AI, but AI whose reasoning process is the deliverable.

2. 128K context — the archaeology window

180 commits × ~200 tokens each = ~36K tokens of compressed history in one request. No chunking, no context loss, no multi-call stitching. Gemma 4 holds the full narrative arc in one reasoning window, which is the only way to detect multi-month causal patterns (e.g., a March 2019 API change causing a June 2019 bug cluster).
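
The budget arithmetic is easy to check. This sketch uses the figures from the paragraph above; the 8,000-token output reserve and the function names are assumptions added for illustration.

```python
# Figures from the text; tokens-per-commit is a rough estimate, not a measurement.
MAX_COMMITS = 180
TOKENS_PER_COMMIT = 200
CONTEXT_WINDOW = 128_000  # "128K" taken as 128,000 for this estimate


def estimate_prompt_tokens(commits: int, tokens_per_commit: int = TOKENS_PER_COMMIT) -> int:
    """Rough prompt size for a compressed history of `commits` commits."""
    return commits * tokens_per_commit


def fits_in_context(commits: int, reserve_for_output: int = 8_000) -> bool:
    """True if the estimated prompt plus an output reserve fits in the window."""
    return estimate_prompt_tokens(commits) + reserve_for_output <= CONTEXT_WINDOW


used = estimate_prompt_tokens(MAX_COMMITS)
share = used / CONTEXT_WINDOW  # fraction of the window the history occupies
```

At 180 commits the history uses well under a third of the window, which leaves room for the system prompt, the metadata header, and the response.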

3. Structured output drives the UI deterministically

The JSON schema is strict (Pydantic v2 validated). If Gemma returns valid JSON, the timeline renders. If not, the error is surfaced honestly. No post-processing guesswork.
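
To show what "valid or surfaced honestly" means in practice, here is a standard-library sketch of the validation step. The project uses Pydantic v2; dataclasses are swapped in here only to keep the example dependency-free. The field names are hypothetical, while the allowed milestone types and confidence levels are the ones listed elsewhere in this post.

```python
import json
from dataclasses import dataclass

ALLOWED_TYPES = {"bug_storm", "refactor", "pivot", "feature_burst", "stability"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


@dataclass
class Milestone:
    period: str       # e.g. "2019-01"
    type: str
    confidence: str
    evidence: str


@dataclass
class AnalysisResult:
    health_score: int
    milestones: list[Milestone]


def parse_analysis(raw: str) -> AnalysisResult:
    """Parse model output; raise ValueError rather than attempt a repair."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    score = data.get("health_score")
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError("health_score must be an integer from 0 to 100")
    items = data.get("milestones")
    if not isinstance(items, list):
        raise ValueError("milestones must be a list")
    milestones = []
    for item in items:
        try:
            milestone = Milestone(**item)
        except TypeError as exc:  # missing, extra, or non-mapping fields
            raise ValueError(f"bad milestone fields: {exc}") from exc
        if milestone.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown milestone type: {milestone.type}")
        if milestone.confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"unknown confidence: {milestone.confidence}")
        milestones.append(milestone)
    return AnalysisResult(health_score=score, milestones=milestones)
```

The UI only ever receives an `AnalysisResult` or an error message, never a half-parsed object.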

4. Privacy-first by design

Git history contains proprietary code, unreleased feature names, security patches, and competitive intelligence in commit messages. CodeDNA runs locally and sends your log only to the model provider, under your own API key. CodeDNA has no server of its own that retains data; the only copy it keeps is the analysis cache on your machine. This is not a UX choice; it's the only architecture engineering teams will actually trust with real repositories.

Demo: React Hooks Era (2018–2019)

I ran CodeDNA on React's public git history during the Hooks transition — one of the most architecturally significant periods in any major open-source project.

What Gemma 4 found:

2018-07: Feature burst — Scheduler time-slicing and Fiber pool infrastructure added (5 commits, Scheduler.js dominant)
2018-09–10: Pivot — React.lazy, Suspense, and createContext v2 introduced across 6 commits
2019-01–02: Stability → Bug storm — 4 rapid fixes for useRef and useEffect infinite loops following the 16.8.0 release
2019-05: Feature burst — useTransition, useDeferredValue, unstable_createRoot (5 commits, ReactFiberHooks.js dominant)

Health score: 58/100 — justified by 21% bug-fix ratio, two high-severity bug storms in 2019-01 and 2019-02, partially offset by clear feature burst eras and high commit message quality (83% of commits have descriptive messages ≥8 words).

Quick Start

# Clone
git clone https://github.com/acchasujal/codeDNA.git
cd codeDNA

# Backend
cd backend
pip install -r requirements.txt
cp .env.example .env
# Add your Google AI Studio key as GEMINI_API_KEY
uvicorn main:app --reload

# Frontend (new terminal)
cd ../frontend
npm install
npm run dev
# Opens http://localhost:5173

Get your git log:

# Any repo you have locally:
git log --stat | head -3000 > my_history.txt
# Upload the .txt file or paste directly

# React demo (what the screenshots use):
git clone https://github.com/facebook/react
cd react
git log --stat --after="2018-09-01" --before="2019-06-01" | head -3000 > react_hooks.txt

.env.example:

GEMINI_API_KEY=your_google_ai_studio_key_here
GEMMA_MODEL=models/gemma-4-26b-a4b-it
MAX_COMMITS=180
OPENROUTER_API_KEY=optional_for_fallback

Technical Highlights

Multi-provider fallback chain — At startup, CodeDNA queries the OpenRouter API to dynamically discover available Gemma models and builds a priority chain. Google AI Studio is primary; OpenRouter provides up to 9 additional Gemma models as fallback. The chain is logged at startup so you always know what's running.
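
The core of a fallback chain is a loop that tries providers in priority order and remembers why each one failed. The sketch below is an assumption about the general shape, not the repo's implementation: `Provider`, `call_with_fallback`, and the two stub providers are hypothetical names, and the dynamic OpenRouter discovery step is left out.

```python
from typing import Callable, Sequence

# Hypothetical provider type: takes a prompt, returns text, raises on failure.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


def call_with_fallback(prompt: str, chain: Sequence[tuple[str, Provider]]) -> tuple[str, str]:
    """Try each (name, provider) in order; return (name, output) of the first success."""
    errors = []
    for name, provider in chain:
        try:
            return name, provider(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Stub that always fails, standing in for a slow or unavailable provider."""
    raise TimeoutError("timed out")


def healthy_fallback(prompt: str) -> str:
    """Stub that always succeeds."""
    return "ok: " + prompt
```

Returning the provider name alongside the output makes it possible to log which model actually produced a given result.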

Preprocessor intelligence — Before any model call, the preprocessor extracts a MONTHLY_COMMIT_COUNTS histogram and TOP_CHANGED_FILES list from the raw git log. This ground-truth metadata is injected directly into the prompt, so Gemma cites real numbers ("commit count tripled to 47 in March 2019") rather than inferring from prose.
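
Here is a minimal sketch of that extraction, assuming the default `git log --stat` layout (a `Date:` line in git's default date format, and stat lines of the form ` path | N ++--`). The function name `summarize` and the sample log are made up for illustration; the real preprocessor.py does more.

```python
import re
from collections import Counter
from datetime import datetime

DATE_RE = re.compile(r"^Date:\s+(.+)$")
# Stat lines start with exactly one space; commit messages are indented by four.
STAT_RE = re.compile(r"^ (\S.*?)\s+\|\s+(?:\d+|Bin)")
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"  # git's default date layout


def summarize(git_log: str, top_n: int = 3) -> tuple[Counter, list[tuple[str, int]]]:
    """Return (commits per YYYY-MM, most frequently changed files)."""
    monthly = Counter()
    files = Counter()
    for line in git_log.splitlines():
        date_match = DATE_RE.match(line)
        if date_match:
            when = datetime.strptime(date_match.group(1).strip(), GIT_DATE_FORMAT)
            monthly[when.strftime("%Y-%m")] += 1
            continue
        stat_match = STAT_RE.match(line)
        if stat_match:
            files[stat_match.group(1)] += 1
    return monthly, files.most_common(top_n)


# Made-up two-commit log, built line by line so the indentation is explicit.
SAMPLE = "\n".join([
    "commit 1111111",
    "Author: A <a@example.com>",
    "Date:   Tue Feb 5 10:00:00 2019 -0800",
    "",
    "    Fix effect loop",
    "",
    " src/Hooks.js | 4 ++--",
    " 1 file changed, 2 insertions(+), 2 deletions(-)",
    "",
    "commit 2222222",
    "Author: A <a@example.com>",
    "Date:   Mon Jan 7 09:30:00 2019 -0800",
    "",
    "    Add scheduler test",
    "",
    " src/Hooks.js     | 2 +-",
    " src/Scheduler.js | 9 +++++++++",
    " 2 files changed, 10 insertions(+), 1 deletion(-)",
])
```

Counting in code and handing the totals to the model is what removes the miscounting problem described later in this post.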

Anti-fluff enforcement — The system prompt contains an explicit FORBIDDEN_PHRASES list ("technical debt", "the team", "seems like", "likely indicates", and 12 others). Every insight must cite a specific commit hash, date, file name, or count — or say "insufficient evidence."
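
The list lives in the system prompt, but the same phrases can also be checked on the way out. The sketch below is an optional output-side check that I am adding for illustration (the post describes only prompt-side enforcement); `find_forbidden` is a hypothetical helper, and the phrase list is the one from the reasoning prompt.

```python
import re

FORBIDDEN_PHRASES = [
    "technical debt", "the team", "engineers", "developers", "working hard",
    "prioritized", "decided to", "management", "business logic", "seems like",
    "appears to", "it looks like", "likely indicates", "possibly", "perhaps",
    "might have",
]


def find_forbidden(text: str) -> list[str]:
    """Return the forbidden phrases that occur in text (case-insensitive, whole words)."""
    found = []
    for phrase in FORBIDDEN_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found


fluff = (
    "This period reflects a time of organizational growth and technical maturity. "
    "The team worked hard to address accumulated complexity."
)
grounded = "2019-01 saw 4 commits touching ReactFiberHooks.js."
```

A non-empty result could trigger a retry or a visible warning instead of silently shipping the sentence.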

Honest confidence — Every milestone includes a confidence field (high | medium | low) with a justification sentence. Low-quality commit histories get a QUALITY_WARNING header and produce conservative, clearly-labeled micro-analyses rather than dramatic fabrications.
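
One way to decide when a history deserves a quality warning is to measure how many commit messages are too vague to interpret. The sketch below is an assumption about how such a check could work: the word list, the three-word minimum, and the 50% threshold are illustrative values, not the ones CodeDNA uses.

```python
# Illustrative vague-message vocabulary (assumption).
VAGUE_WORDS = {"fix", "wip", "update", "changes", "misc", "stuff", "tmp"}


def is_vague(message: str, min_words: int = 3) -> bool:
    """A message is vague if it is very short or made only of filler words."""
    words = [w.strip(".,!:") for w in message.lower().split()]
    return len(words) < min_words or all(w in VAGUE_WORDS for w in words)


def vague_ratio(messages: list[str]) -> float:
    """Fraction of messages judged vague; an empty history counts as fully vague."""
    if not messages:
        return 1.0
    return sum(is_vague(m) for m in messages) / len(messages)


def quality_label(messages: list[str], warn_above: float = 0.5) -> str:
    """Return "LOW" when most messages are vague, otherwise "HIGH"."""
    return "LOW" if vague_ratio(messages) > warn_above else "HIGH"
```

A "LOW" label is the signal to prepend a warning header and ask the model for conservative, clearly labeled output.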

The Reasoning System Prompt

The full prompt that drives Step 1 (the reasoning stream):

You are CodeDNA, a concise git-history analyst.
Produce a clean public report, not private reasoning.

Rules:
- Output markdown prose only. No JSON. No code fences.
- No meta-commentary, self-correction, planning notes, or internal monologue.
- Never write "wait", "I used", "the prompt says", or any phrase from this
  forbidden list: technical debt, the team, engineers, developers, working hard,
  prioritized, decided to, management, business logic, seems like, appears to,
  it looks like, likely indicates, possibly, perhaps, might have.
- Use only observable evidence from the metadata header and commit log.
- Cite commit hashes, dates/months, file names, commit counts, and ratios
  whenever making a claim.
- If evidence is thin, say "insufficient evidence" and name the missing signal.
  Do not invent intent, people, architecture, risk, or causality.
- Keep every sentence useful. Avoid repetition.

Format exactly:
## Overview
Two to three factual sentences covering commit count, date range,
most changed files or file types, and BUG_FIX_RATIO.

## Milestones
Four to eight bullets when evidence allows. Each bullet:
- **YYYY-MM** - type - concise evidence sentence with commit hash(es),
  changed file(s), and count(s).
  Allowed types: bug_storm, refactor, pivot, feature_burst, stability.

## Health Signals
Three bullets: one positive signal, one negative signal, one confidence note.
Each bullet must cite evidence.

## Churn Summary
One concise sentence naming the peak period and the files or commits behind it.

The Hardest Problem: Making Gemma Say Something Real

The biggest technical challenge wasn't the UI, the SSE streaming, or the fallback chain. It was getting Gemma 4 to produce specific, verifiable insights instead of confident-sounding nonsense.

Here's what the first version produced on a repo with commits like "fix navbar bug", "update readme", "refactor utils":

"This period reflects a time of organizational growth and technical maturity. The team worked hard to address accumulated complexity while balancing feature delivery with stability concerns."

That output is useless. It contains zero commit references, zero file names, zero numbers. A junior consultant could have written it without looking at the code. A judge would mark it dead on arrival.

It took three iterations to fix.

Iteration 1 — Forbidden phrases list.
Added an explicit blocklist to the system prompt:

FORBIDDEN PHRASES — never use these:
"technical debt", "the team", "engineers", "developers",
"working hard", "prioritized", "decided to", "management",
"seems like", "appears to", "it looks like", "likely indicates",
"possibly", "perhaps", "might have"

The output became less flowery but still vague: "There were many fixes in early 2019." How many? Which files? Which period exactly?

Iteration 2 — Mandatory evidence citation.
Added to the prompt: "Every milestone description must cite at least one commit hash, date/month, file name, count, or ratio. If you cannot cite evidence, write 'insufficient evidence' and stop."

Better, but Gemma was still counting commits itself — and sometimes miscounting.

Iteration 3 — Pre-computed metadata injection (the breakthrough).

Instead of asking Gemma to figure out what happened, I tell it what happened and ask it to interpret it.

The preprocessor now builds a metadata header before any model call:

# META: 180tot|180ana|Q:HIGH|Fx:21%|Vg:0%
# DATES: 2019-06-20..2018-07-02
# MONTHS: 2018-09:3,2018-10:3,2019-01:4,2019-02:2,2019-05:5,2019-06:2
# HOTSPOTS: ReactFiberHooks.js:8,Scheduler.js:5,package.json:4

Now instead of asking "were there a lot of fixes in early 2019?", I'm asking "given that commits spiked to 5 in 2019-05 and ReactFiberHooks.js was modified 8 times — what does that pattern indicate?"
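
Rendering the header is plain string formatting once the numbers exist. This sketch reproduces the four-line layout shown above from already-computed values; `build_header` and its parameter names are hypothetical.

```python
def build_header(
    total: int,
    analyzed: int,
    quality: str,
    fix_ratio: float,
    vague_ratio: float,
    newest: str,
    oldest: str,
    monthly: dict[str, int],
    hotspots: list[tuple[str, int]],
) -> str:
    """Render pre-computed statistics as the compact comment header."""
    months = ",".join(f"{month}:{count}" for month, count in sorted(monthly.items()))
    files = ",".join(f"{path}:{count}" for path, count in hotspots)
    return "\n".join([
        f"# META: {total}tot|{analyzed}ana|Q:{quality}|Fx:{fix_ratio:.0%}|Vg:{vague_ratio:.0%}",
        f"# DATES: {newest}..{oldest}",
        f"# MONTHS: {months}",
        f"# HOTSPOTS: {files}",
    ])


# Values copied from the header shown above.
header = build_header(
    total=180, analyzed=180, quality="HIGH", fix_ratio=0.21, vague_ratio=0.0,
    newest="2019-06-20", oldest="2018-07-02",
    monthly={"2018-09": 3, "2018-10": 3, "2019-01": 4, "2019-02": 2, "2019-05": 5, "2019-06": 2},
    hotspots=[("ReactFiberHooks.js", 8), ("Scheduler.js", 5), ("package.json", 4)],
)
```

The terse keys keep the header cheap in tokens while still giving the model exact numbers to quote.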

The model's job shifted from counting to interpreting. The output became:

"2019-01 through 2019-02 saw 6 commits (bf32345, ca53456, cb54567, cc55678, cd56789, ce57890) concentrated in ReactFiberHooks.js and ReactFiberBeginWork.js. ca53456 fixed an incorrect useRef identity across re-renders; cb54567 resolved an infinite useEffect loop triggered by object dependency comparison. The 16.8.0 release on 2019-02-06 (cd56789) was followed two days later by ce57890 — a hooks state regression fix, indicating at least one edge case reached production."

Every claim is checkable. Every hash is real. That's the difference.

The map-reduce split was the second breakthrough.

Asking Gemma 4 to simultaneously produce flowing Thinking Mode prose and valid JSON produces neither well. I split it:

Step 1 (stream): REASONING_SYSTEM_PROMPT. Output clean markdown only, no JSON, no schema constraints.
Step 2 (analyze): JSON_SYSTEM_PROMPT. Read the reasoning trace, output strict AnalysisResult JSON.

The reasoning panel now shows actual analytical prose. The timeline data is reliably structured. Both improved dramatically when separated.

Limitations (Honest)

Works best with 100–200 commits. Very large histories (1000+) need more aggressive preprocessing.
Commit message quality determines insight quality. A repo full of "fix", "wip", "update" commits will produce low-confidence analysis (CodeDNA tells you this clearly rather than inventing drama).
The reasoning stream uses the primary model; fallback models handle JSON structuring. If all Google models are slow, the stream may be empty — but the timeline will still render from the fallback result.
Currently runs locally only. Cloud deployment would require careful handling of API key security.

What's Next

Actual GitHub API integration (analyze any public repo by URL, no manual log export)
Branch comparison (main vs. feature branch health)
Team velocity metrics (authors per period, bus factor analysis)
CI/CD integration — run CodeDNA as a PR check to flag risky commit patterns

Built solo in 4 days for the Google Gemma 4 Challenge. Every commit in this repo is real — you can run CodeDNA on its own history.
