Ask your team what percentage of your production codebase was written by an AI last quarter. You'll get silence — not because nobody cares, but because there's no way to measure it.
We instrument everything else. Deployments, latency, error rates, test coverage. But code provenance? Nothing. Git blame still assumes a human wrote every line.
I built aigit to fix this.
The problem
As AI coding tools became part of my workflow, I noticed something uncomfortable: I had no visibility into the quality or longevity of AI-generated code versus hand-written code. Was AI code churning faster? Did it correlate with bug fixes? Which files were effectively AI-authored?
These aren't philosophical questions — they're engineering metrics that every team should be tracking.
How it works
Step 1: Session ingestion
Claude Code stores every session as JSONL under ~/.claude/projects//. Each assistant message contains either markdown fenced code blocks or — more importantly — tool_use blocks from Write and Edit calls. That's where the actual code written to disk lives.
Extract from both text responses AND Write/Edit tool calls
if block.get("type") == "tool_use" and block.get("name") in ("Write", "Edit"):
code_text = inp.get("content") or inp.get("new_string", "")
Step 2: Tiered fuzzy matching
Before hashing, code is normalized — comments stripped, whitespace collapsed, lowercased. Then matched against git diff hunks at three tiers:
Exact SHA-256 match → confidence 1.0 (verbatim copy-paste)
TLSH distance < 30 → confidence 0.9 (lightly reformatted)
TLSH distance < 100 → confidence 0.7 (substantially edited)
TLSH (Trend Micro Locality Sensitive Hash) is designed for fuzzy file matching — it measures structural similarity rather than exact content, which is exactly what you need when AI code gets tweaked before committing.
Step 3: Attribution overlay
Rather than rebuilding line provenance from scratch, aigit piggybacks on git blame --porcelain. It already tracks lines across renames, rebases, and cherry-picks. We just annotate its output:
$ aigit blame src/api/routes.py
4 a1b2c3d [claude 100%] def get_user(user_id: int):
5 a1b2c3d [claude 100%] return db.query(User).get(user_id)
6 f9e8d7c
7 f9e8d7c def delete_user(user_id: int):
8 f9e8d7c db.query(User).filter_by(id=user_id).delete()
$ aigit stats
src/api/routes.py 73% AI ████████████░░░░
src/core/engine.py 51% AI ████████░░░░░░░░
Repo-wide: 61% AI-attributed
What I found dogfooding it
I ran aigit on itself — the entire codebase was built in a single Claude Code session. Result: 89.8% AI-attributed across 2,171 lines. The 10.2% that wasn't AI-attributed were the lines I added manually to fix bugs the AI introduced. Which is itself an interesting metric.
**
Current limitations**
- Claude Code only — the provider architecture is pluggable, but Cursor and Copilot support isn't built yet
- Requires local session logs — tools that don't store sessions locally (Devin, cloud-based agents) can't be supported without an API
- Cold start — existing commits before you started using aigit won't be attributed
Install and try it
pip install getaigit
cd your-repo
aigit index
aigit blame src/yourfile.py
aigit stats
The attribution database lives at .aigit/attribution.db — commit it to share attribution data across your team.
GitHub:
Curious what metrics you'd want to see beyond AI% and churn rate — and whether you're seeing the same gap in your teams.
Top comments (0)