I built a CLI tool a few months ago that does one thing: it scans a codebase and estimates what percentage of the code was likely written by AI versus a human.
I called it ai-score. I was proud of it. I wrote a blog post, got some stars, moved on.
Last week I ran it on everything.
## The Tool
If you haven't seen it, ai-score works by running a set of pattern-analysis heuristics — structural regularity, comment-to-code ratios, naming consistency, docstring presence — tuned on code I labeled by hand. It's not perfect. It's not trying to be. But it catches the things I've noticed about my own AI-assisted code: the eerily even function lengths, the variable names that are always exactly descriptive enough, the error handling that covers every case but somehow feels hollow.
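To make "structural regularity" concrete: one way to score those eerily even function lengths is to measure how little function lengths vary within a file. This is a minimal sketch of my guess at that signal — the real internals of ai-score may differ:

```python
import ast
import statistics

def compute_structural_regularity(source: str) -> float:
    """Score how uniform function lengths are in a Python source file.

    Returns a value in [0.0, 1.0]: 1.0 means every function is the
    same length (eerily even); 0.0 means lengths vary widely.
    This is an illustrative sketch, not the actual ai-score code.
    """
    tree = ast.parse(source)
    lengths = [
        node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if len(lengths) < 2:
        return 0.0  # not enough functions to measure regularity
    # Coefficient of variation: spread of lengths relative to their mean.
    spread = statistics.stdev(lengths) / statistics.mean(lengths)
    return max(0.0, 1.0 - spread)  # low spread -> high regularity
```

Two ten-line functions in a row score a perfect 1.0; a two-liner next to a sprawling fifty-liner scores near zero.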
You can install it and run it on your own repos:
```bash
pip install ai-score
ai-score ./your-project
```
Source: github.com/LakshmiSravyaVedantham/ai-score
The output looks like this:
```
$ ai-score ./skill_building/backend
Scanning 23 files across 1,847 lines...

AI likelihood score: 81%

High-signal files:
  main.py          91%
  pipeline.py      88%
  rag/__init__.py  74%

Human-leaning files:
  requirements.txt 12%
  runtime.txt       8%
```
Simple. Brutal.
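For the curious: that percentage is just a weighted blend of the per-file heuristic signals. The shape looks roughly like this — the weights and signal names below are illustrative, not the actual values from the repo:

```python
# Hypothetical weights for combining heuristic signals (each 0.0-1.0).
# These numbers are for illustration only; ai-score's real weights differ.
WEIGHTS = {
    "structural_regularity": 0.35,
    "comment_to_code_ratio": 0.25,
    "naming_consistency": 0.25,
    "docstring_presence": 0.15,
}

def ai_likelihood(signals: dict[str, float]) -> int:
    """Collapse per-file signals into one AI-likelihood percentage."""
    total = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return round(100 * total)  # reported as a percentage, e.g. 81
```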
## What I Found
I ran it across six repos I've actively worked on over the last six months.
| Project | Files | Lines | AI Score |
|---|---|---|---|
| skill_building (RAG finance tutor) | 23 | 1,847 | 81% |
| Salestrend (AI video analytics) | 18 | 2,103 | 89% |
| api_scaffold (internal tool) | 9 | 614 | 71% |
| data_pipeline_utils | 14 | 988 | 76% |
| personal_portfolio_api | 11 | 731 | 83% |
| ai-score itself | 16 | 1,204 | 94% |
I stared at that last line for a while.
The tool I built to measure AI writing — the one I use as a portfolio piece, the one I explained in detail at that meetup, the one I feel most technically confident talking about — scored 94% on its own detector.
## The Part I Didn't Expect to Feel
I want to be honest here because I've seen a lot of takes about AI-assisted coding that land in two easy camps: "AI is cheating and your skills are rotting" or "AI is a superpower, ship faster, touch grass." I'm not making either argument.
What I felt was something quieter and harder to name.
I scrolled through main.py in the ai-score repo. I remember writing it. I remember the afternoon, I remember having two tabs of Claude open, I remember copy-pasting the pattern-matching logic and tweaking it. And looking at it now — the docstrings, the type hints on every function, the clean separation of concerns, the way the error messages are worded — I genuinely cannot tell which decisions were mine.
Did I choose to name that function compute_structural_regularity? Did Claude? Does it matter?
I think it matters a little. Not because the name is wrong — it's a good name — but because I want to know what I reach for when I'm thinking alone. And I've spent enough time in the passenger seat that I'm not sure I remember.
## The Specific Things That Got Me
It's not the big architectural decisions. Those feel like mine — I decided to use a heuristic approach instead of a model, I decided against a web UI, I designed the scoring weights myself (mostly).
It's the small things.
The variable names. Every single one is clean and accurate and professional. likelihood_threshold. file_metadata. token_pattern_score. They're all correct. None of them are the slightly weird names I used to write, the ones that made sense only to me, the ones that sometimes came out better and sometimes came out embarrassing but were always mine.
The tests. I have 34 tests across the ai-score repo. I can explain what every one of them tests. But I'm not sure I would have written tests for half of them unprompted. The discipline is real. The ownership feels partial.
The comments. They're good. They're clear. They're written by someone who was thinking about future readers. That person was me, but I was thinking through Claude's voice.
## What I'm Not Saying
I'm not saying I didn't build these things. I did. I made the calls, I debugged the failures, I deployed and re-deployed and argued with myself about tradeoffs. The work is mine.
And I'm not saying I'm going to stop. AI assistance is genuinely useful and the alternative isn't "more authentic code" — the alternative is slower code that might be worse.
But I've been trading something away without naming it, and running my own tool on my own repos finally named it for me.
There's a particular kind of thinking that happens when you're stuck. When you can't figure out what to name something, or you're not sure how to structure a module, or you don't know yet what the error handling should look like. That stuck-ness is uncomfortable. And it used to be where a lot of my judgment got built.
I've been bypassing it. Efficiently. With good results. And I notice something missing when I look at the output.
## What I'm Doing About It
Not much, honestly. I don't have a clean resolution here.
I'm trying to write the first draft of things myself more often — even when it's slow, even when it's worse. I'm treating Claude as a reviewer instead of a co-author on some things. I'm paying attention to which decisions feel like mine and which ones I let slide because the suggestion was good enough.
I don't know if that will change my scores. Probably not much. But I think it'll change something.
If you've run any kind of AI-detection on your own work — or if you've thought about it and decided not to — I'm curious what stopped you, or what you found.
Because I think a lot of us know, roughly, what the number would be. The question is whether we want to see it.