<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gokul P</title>
    <description>The latest articles on DEV Community by Gokul P (@gocoolp).</description>
    <link>https://dev.to/gocoolp</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3975028%2Fe4335aa7-df22-461a-abaa-c1ae051786b1.png</url>
      <title>DEV Community: Gokul P</title>
      <link>https://dev.to/gocoolp</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gocoolp"/>
    <language>en</language>
    <item>
      <title>I built a tool to measure how much of my codebase is AI-written — here's how it works</title>
      <dc:creator>Gokul P</dc:creator>
      <pubDate>Tue, 09 Jun 2026 01:57:57 +0000</pubDate>
      <link>https://dev.to/gocoolp/i-built-a-tool-to-measure-how-much-of-my-codebase-is-ai-written-heres-how-it-works-12hg</link>
      <guid>https://dev.to/gocoolp/i-built-a-tool-to-measure-how-much-of-my-codebase-is-ai-written-heres-how-it-works-12hg</guid>
      <description>&lt;p&gt;Ask your team what percentage of your production codebase was written by an AI last quarter. You'll get silence — not because nobody cares, but because there's no way to measure it.&lt;/p&gt;

&lt;p&gt;We instrument everything else. Deployments, latency, error rates, test coverage. But code provenance? Nothing. Git blame still assumes a human wrote every line.&lt;/p&gt;

&lt;p&gt;I built aigit to fix this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As AI coding tools became part of my workflow, I noticed something uncomfortable: I had no visibility into the quality or longevity of AI-generated code versus hand-written code. Was AI code churning faster? Did it correlate with bug fixes? Which files were effectively AI-authored?&lt;/p&gt;

&lt;p&gt;These aren't philosophical questions — they're engineering metrics that every team should be tracking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Session ingestion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code stores every session as JSONL under ~/.claude/projects//. Each assistant message contains either markdown fenced code blocks or — more importantly — tool_use blocks from Write and Edit calls. That's where the actual code written to disk lives.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Extract from both text responses AND Write/Edit tool calls
if block.get("type") == "tool_use" and block.get("name") in ("Write", "Edit"):
    inp = block.get("input", {})  # tool arguments (assumed to sit under "input")
    code_text = inp.get("content") or inp.get("new_string", "")
&lt;/code&gt;&lt;/pre&gt;
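
&lt;p&gt;For a fuller picture, here is a minimal sketch of how one session log could be walked end to end. It is not aigit's actual code: the record layout (a "message" object with "role" and a list of "content" blocks, tool arguments under "input") is an assumption about the JSONL format, and extract_snippets is a hypothetical helper name.&lt;/p&gt;

```python
import json
import re

# Built at runtime so this example can itself live inside a fenced block.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_snippets(jsonl_lines):
    """Yield code strings found in the assistant messages of one session log.

    Assumed layout: each line is a JSON object whose "message" has a "role"
    and a list of "content" blocks.
    """
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        message = record.get("message") or {}
        if message.get("role") != "assistant":
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if block.get("type") == "text":
                # Markdown fenced code inside a plain text response.
                for match in FENCED_BLOCK.finditer(block.get("text", "")):
                    yield match.group(1)
            elif block.get("type") == "tool_use" and block.get("name") in ("Write", "Edit"):
                # Write carries the whole file; Edit carries the replacement text.
                inp = block.get("input") or {}
                code_text = inp.get("content") or inp.get("new_string", "")
                if code_text:
                    yield code_text
```

&lt;p&gt;Passing an open file handle works, because the function only needs an iterable of lines.&lt;/p&gt;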

&lt;p&gt;&lt;strong&gt;Step 2: Tiered fuzzy matching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before hashing, the code is normalized: comments are stripped, whitespace is collapsed, and everything is lowercased. It is then matched against git diff hunks at three tiers:&lt;/p&gt;

&lt;p&gt;Exact SHA-256 match      → confidence 1.0  (verbatim copy-paste)&lt;br&gt;
TLSH distance &amp;lt; 30       → confidence 0.9  (lightly reformatted)&lt;br&gt;
TLSH distance &amp;lt; 100      → confidence 0.7  (substantially edited)&lt;/p&gt;

&lt;p&gt;TLSH (Trend Micro Locality Sensitive Hash) is designed for fuzzy matching: similar inputs produce digests with a small distance between them, so it scores similarity rather than exact equality. That is exactly what you need when AI code gets tweaked before committing.&lt;/p&gt;
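
&lt;p&gt;The tiers can be sketched roughly as follows. This is an illustration under stated assumptions, not aigit's implementation: normalize only understands "#" line comments, match_confidence is a hypothetical name, and the fuzzy distance is passed in as distance_fn (for example a thin wrapper around a TLSH library) so the sketch needs only the standard library.&lt;/p&gt;

```python
import hashlib
import re

# Distance limits and confidences copied from the tier table above.
FUZZY_TIERS = ((30, 0.9), (100, 0.7))


def normalize(code):
    """Strip '#' comments, collapse whitespace, lowercase.

    Naive on purpose: a '#' inside a string literal is also treated
    as the start of a comment.
    """
    without_comments = re.sub(r"#[^\n]*", "", code)
    return " ".join(without_comments.split()).lower()


def match_confidence(ai_code, committed_code, distance_fn=None):
    """Return 1.0, 0.9, 0.7, or None for an AI snippet versus a diff hunk."""
    a = normalize(ai_code)
    b = normalize(committed_code)
    digest_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
    digest_b = hashlib.sha256(b.encode("utf-8")).hexdigest()
    if digest_a == digest_b:
        return 1.0  # identical after normalization
    if distance_fn is None:
        return None  # no fuzzy matcher supplied
    distance = distance_fn(a, b)
    for limit, confidence in FUZZY_TIERS:
        if limit > distance:
            return confidence
    return None  # too different to attribute
```

&lt;p&gt;Keeping the distance function injectable also makes the thresholds easy to test without real fuzzy hashes.&lt;/p&gt;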

&lt;p&gt;&lt;strong&gt;Step 3: Attribution overlay&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Rather than rebuilding line provenance from scratch, aigit piggybacks on git blame --porcelain. It already tracks lines across renames, rebases, and cherry-picks. We just annotate its output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ aigit blame src/api/routes.py

  4 a1b2c3d  [claude 100%] def get_user(user_id: int):
  5 a1b2c3d  [claude 100%]     return db.query(User).get(user_id)
  6 f9e8d7c
  7 f9e8d7c                def delete_user(user_id: int):
  8 f9e8d7c                    db.query(User).filter_by(id=user_id).delete()
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;$ aigit stats

  src/api/routes.py     73% AI  ████████████░░░░
  src/core/engine.py    51% AI  ████████░░░░░░░░

Repo-wide: 61% AI-attributed
&lt;/code&gt;&lt;/pre&gt;
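
&lt;p&gt;To make the overlay idea concrete, here is a small sketch that parses git blame --porcelain text and prefixes each line with a tag. It assumes the documented porcelain layout (a header line with the 40-character commit hash and line numbers, then the source line prefixed by a tab). The attribution dictionary keyed by (commit, line number) is a hypothetical stand-in for however aigit really stores its data.&lt;/p&gt;

```python
import re

# Header: commit hash, original line, final line, optional group size.
HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$")


def parse_porcelain(text):
    """Return (final_line_number, commit_sha, line_text) tuples."""
    rows = []
    current = None
    for line in text.splitlines():
        if line.startswith("\t"):
            # A tab-prefixed line is the source text for the last header seen.
            if current is not None:
                sha, number = current
                rows.append((number, sha, line[1:]))
                current = None
            continue
        match = HEADER.match(line)
        if match:
            current = (match.group(1), int(match.group(3)))
        # Any other line (author, summary, filename, ...) is metadata; skip it.
    return rows


def annotate(rows, attribution):
    """Render blame rows, adding a tag where attribution data exists."""
    rendered = []
    for number, sha, text in rows:
        tag = attribution.get((sha, number), "")
        rendered.append(
            str(number).rjust(3) + " " + sha[:7] + "  " + tag.ljust(13) + " " + text
        )
    return rendered
```

&lt;p&gt;The text to parse could come from running git blame --porcelain on a file through subprocess. Because the tag column is padded to a fixed width, untagged lines stay aligned with tagged ones.&lt;/p&gt;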

&lt;p&gt;&lt;strong&gt;What I found dogfooding it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I ran aigit on itself, since the entire codebase was built in a single Claude Code session. Result: 89.8% AI-attributed across 2,171 lines. The remaining 10.2% consists of lines I added manually to fix bugs the AI introduced, which is itself an interesting metric.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current limitations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code only&lt;/strong&gt; — the provider architecture is pluggable, but Cursor and Copilot support isn't built yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requires local session logs&lt;/strong&gt; — tools that don't store sessions locally (Devin, cloud-based agents) can't be supported without an API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold start&lt;/strong&gt; — existing commits before you started using aigit won't be attributed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Install and try it&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install getaigit
cd your-repo
aigit index
aigit blame src/yourfile.py
aigit stats
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The attribution database lives at .aigit/attribution.db — commit it to share attribution data across your team.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/getaigit/getaigit" rel="noopener noreferrer"&gt;https://github.com/getaigit/getaigit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm curious what metrics you'd want to see beyond AI% and churn rate, and whether you're seeing the same gap in your teams.&lt;/p&gt;

</description>
      <category>git</category>
      <category>ai</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
