I built a baseline-aware Python code health tool for CI and AI-assisted coding
If you write Python with AI tools today, you’ve probably felt this already:
the code usually works, tests may pass, lint is green, but the structure gets worse in ways that are hard to notice
until the repository starts fighting back.
Not in one dramatic commit. More like this:
- the same logic gets rewritten in slightly different ways across multiple files;
- helper functions quietly grow until nobody wants to touch them;
- coupling increases one import at a time;
- framework callbacks look unused even when they are not;
- dead code accumulates because generated code tends to leave leftovers behind.
That is the problem space I built CodeClone for.
CodeClone 2.0.0b1 is the first version where the tool really matches the model I wanted from the beginning: not just
“find some clones,” but track structural code health over time, in CI, with a trusted baseline.
This post is an introduction to that version and the design choices behind it.
First: I know the ecosystem is not empty
I’m not pretending this is the first serious tool in this space.
There are already strong tools around adjacent problems:
- SonarQube / SonarCloud for broad code quality, governance, and quality gates
- PMD CPD as one of the classic copy/paste detectors
- jscpd for practical duplicate-code scanning across multiple languages
- Vulture for Python dead-code detection
- Radon / Xenon for complexity-related checks
- and newer tools like pyscn, which also move toward structural/code-health analysis for Python
That matters, because I don’t think useful tools should be framed as “everything before this was wrong.”
CodeClone is not trying to replace all of the above.
Its angle is narrower and, I think, pretty specific:
- structural duplication is a first-class signal;
- baseline-aware governance is the center of the workflow, not an extra feature;
- deterministic output is non-negotiable;
- and the UI/report layer is not allowed to invent conclusions the analysis engine did not produce.
If I had to summarize the difference in one sentence, it would be this:
CodeClone is built around separating accepted debt from new regressions.
That sounds simple, but it changes the entire shape of the tool.
Why I think this matters more now
AI coding assistants are genuinely useful. I use them. They speed things up.
But they also change the failure mode of a codebase.
The biggest risk is often not “the AI wrote something syntactically invalid.” That part is easy to catch.
The harder problem is that AI tools are very good at producing locally plausible code:
- one more handler,
- one more service method,
- one more variant of the same logic,
- one more utility that overlaps with three existing ones.
Each individual change looks reasonable.
The repository as a whole gets worse.
That is why I think structural analysis is especially useful for AI-assisted teams. If you are using Claude Code,
Cursor, Codex, or similar tools, the important question is often not:
“Is this code valid?”
but:
“Did this change make the repository structurally worse?”
That is exactly the question a baseline-aware tool can answer well.
What CodeClone focuses on
At the core, CodeClone analyzes Python projects and looks at structural signals such as:
- function clones
- block clones
- segment clones
- structural findings like duplicated branch families
- dead code
- complexity
- coupling
- cohesion
- dependency cycles
- a combined health score
The outputs come in multiple formats:
- HTML
- JSON
- Markdown
- SARIF
- Text
But they all come from a single canonical report document. That was important to me because I wanted consistency between
machine-readable outputs and the human-facing report.
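The "single canonical document, many renderers" idea can be sketched in a few lines. This is illustrative only (the function and field names here are made up, not CodeClone's API); the point is that every output format reads the same dict, so the machine-readable and human-facing views cannot drift apart.

```python
# Sketch: one canonical report dict, several renderers reading from it.
import json


def render(report: dict, fmt: str) -> str:
    """Render the same canonical report into a requested format."""
    if fmt == "json":
        # Deterministic key order keeps diffs and CI comparisons stable.
        return json.dumps(report, sort_keys=True, indent=2)
    if fmt == "markdown":
        lines = [f"# {report['title']}"]
        lines += [f"- {k}: {v}" for k, v in report["metrics"].items()]
        return "\n".join(lines)
    raise ValueError(f"unknown format: {fmt}")
```

Because no renderer computes anything on its own, the report layer cannot "invent conclusions" the analysis engine did not produce.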
The key idea: baseline-aware governance
This is the part I care about most.
A lot of code quality tools can tell you that your repository has problems. That is useful, but it is not enough for
real CI.
In a non-trivial codebase, there is usually historical debt:
- old duplication
- old complexity hotspots
- old dead code
- old architectural compromises
If a tool only says “you have 400 problems,” that doesn’t help much. Most teams will either ignore it or disable it.
CodeClone is designed around a different model:
- take the current state as a baseline;
- trust and validate that baseline explicitly;
- keep accepted debt visible;
- block new regressions.
That makes the tool much more usable in practice.
Instead of asking teams to become perfect overnight, it asks a much more realistic question:
“Did this branch make the codebase worse than the state we already accepted?”
That is the main reason I describe CodeClone as baseline-aware before I describe it as a clone detector.
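The gating model reduces to a simple idea: represent findings as stable fingerprints and fail only on ones that are not already in the accepted baseline. A minimal sketch (not CodeClone's actual data model):

```python
# Sketch of baseline-aware gating: accepted debt stays visible,
# but only findings absent from the baseline fail the gate.

def new_regressions(current: set[str], baseline: set[str]) -> set[str]:
    """Findings present now but absent from the accepted baseline."""
    return current - baseline
```

With this model, the "400 problems" of historical debt remain reported, but a branch only fails CI if it adds a fingerprint the baseline does not contain.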
What changed in 2.0.0b1
Version 2.0.0b1 is the point where that model became much more complete.
1. A real code-health model
CodeClone now computes a health score from multiple dimensions:
- clones
- complexity
- coupling
- cohesion
- dead code
- dependencies
- coverage
I did not want this to become a decorative “AI score.” The point is not the number by itself; the point is whether the score can be traced back to concrete structural reasons.
That is why the new HTML overview is built around:
- a health gauge
- KPI cards
- an executive summary
- source-scope breakdown
- a health profile chart
The goal is to answer not only “what failed?” but also “what should I look at first?”
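To make "traceable to concrete structural reasons" concrete, here is a hypothetical shape such a score could take. The dimension names follow the post; the weights and the formula are invented for illustration and are not CodeClone's actual scoring:

```python
# Hypothetical multi-dimensional health score: a weighted average of
# per-dimension scores in [0, 100]. Weights here are made up.
WEIGHTS = {
    "clones": 0.25, "complexity": 0.20, "coupling": 0.15,
    "cohesion": 0.15, "dead_code": 0.10, "dependencies": 0.10,
    "coverage": 0.05,
}


def health_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores; each dimension stays inspectable."""
    return sum(WEIGHTS[d] * s for d, s in dimension_scores.items())
```

The important property is not the formula but the decomposition: a drop in the number can always be attributed to a specific dimension, and from there to specific findings.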
2. Baseline became a first-class contract
In 2.0.0b1, baseline handling is no longer just a convenience file.
It is now a stricter contract with:
- trust semantics
- compatibility checks
- integrity fields
- deterministic payload handling
- unified clone + metrics baseline flow
That matters a lot in CI. If the baseline itself is not trustworthy, the entire gating story becomes shaky.
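One way a baseline file can carry an integrity field is to hash a canonical serialization of its payload. This is a sketch of the general technique, not CodeClone's actual baseline format:

```python
# Sketch: attach and verify an integrity digest over a canonical
# (sorted-keys, compact) JSON serialization of the baseline payload.
import hashlib
import json


def _digest(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def with_integrity(payload: dict) -> dict:
    """Wrap a baseline payload with its integrity digest."""
    return {"payload": payload, "sha256": _digest(payload)}


def is_trusted(doc: dict) -> bool:
    """Reject baselines whose payload no longer matches the digest."""
    return _digest(doc["payload"]) == doc["sha256"]
```

Canonical serialization matters here: without sorted keys and fixed separators, two semantically identical payloads could hash differently and the trust check would be noisy.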
3. Dead code arrived, but with explicit suppressions
Dead-code analysis is now part of the model, but I did not want to solve dynamic Python behavior with magic heuristics.
So for intentional runtime-driven cases, CodeClone uses explicit inline suppressions:
# codeclone: ignore[dead-code]
def handle_exception(exc: Exception) -> None:
    ...
or:
class Middleware:  # codeclone: ignore[dead-code]
    ...
That is a deliberate design choice.
I would rather have a local, visible policy mechanism than silently broaden the detector until it becomes hard to reason
about.
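Honoring such a marker is mechanically simple, which is part of why I prefer it to heuristics. The marker format below matches the post; the parsing code itself is only a sketch of how an analyzer might read it:

```python
# Sketch: recognize inline "# codeclone: ignore[rule-name]" suppressions
# on a source line. The regex is illustrative, not CodeClone's parser.
import re

SUPPRESS = re.compile(r"#\s*codeclone:\s*ignore\[(?P<rule>[\w-]+)\]")


def suppressed_rules(line: str) -> set[str]:
    """Return the set of rule names suppressed on this source line."""
    return {m.group("rule") for m in SUPPRESS.finditer(line)}
```

Because the suppression lives next to the code it affects, it shows up in diffs and code review, which is exactly the visibility I wanted.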
4. SARIF was added in 2.0.0b1
This is worth calling out explicitly because I do not want to misrepresent the release: SARIF is new in 2.0.0b1.
I wanted it to be useful beyond “technically yes, there is a SARIF file.”
So the current implementation is designed to work better with IDE/code-scanning workflows, including:
- %SRCROOT% anchoring for artifacts
- richer rule metadata
- location alignment
- baseline state for clone results when applicable
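For orientation, here is the shape of a single SARIF 2.1.0 result with those pieces in place. This is a hand-written example following the SARIF specification, not CodeClone's actual output; the rule id, path, and line numbers are made up:

```python
# A minimal SARIF 2.1.0 result object (field names per the SARIF spec).
sarif_result = {
    "ruleId": "codeclone/function-clone",  # hypothetical rule id
    "level": "warning",
    "message": {"text": "Function clone detected"},
    "locations": [{
        "physicalLocation": {
            "artifactLocation": {
                "uri": "src/utils.py",
                # uriBaseId anchors the artifact path to the repo root,
                # so code-scanning UIs can resolve it.
                "uriBaseId": "%SRCROOT%",
            },
            "region": {"startLine": 10, "endLine": 24},
        }
    }],
    # SARIF baseline states: "new", "unchanged", "updated", "absent".
    "baselineState": "new",
}
```

The `baselineState` field is what lets a code-scanning UI show the same accepted-debt-versus-new-regression split that the CLI gate enforces.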
5. Detection thresholds got more practical
The default thresholds are now more permissive than before.
That means CodeClone filters out less and analyzes more. For example:
- function-level min_loc was lowered from 15 to 10
- block thresholds were relaxed
- segment thresholds were relaxed
This does increase analysis volume, so it has performance implications. But it also makes the tool more honest. It stops politely ignoring a bunch of small-but-real structural issues.
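To make the min_loc change concrete: the value 10 (lowered from 15) is from the release; the filtering code around it is only a sketch of what such a threshold does in practice.

```python
# Sketch: a min_loc threshold gates which functions enter clone analysis.
MIN_LOC = 10  # 2.0.0b1 function-level default, lowered from 15


def candidate_functions(functions: dict[str, int]) -> list[str]:
    """Keep only functions large enough to be worth clone analysis."""
    return [name for name, loc in functions.items() if loc >= MIN_LOC]
```

A 12-line function would have been silently skipped under the old default of 15; now it is analyzed, which is the "stops politely ignoring small-but-real issues" trade-off in miniature.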
Why this is useful for AI-generated code
I want to be careful here, because “AI code quality” can turn into hand-wavy marketing really fast.
I am not claiming that CodeClone can detect whether a human or an LLM wrote a piece of code.
That is not the point.
The point is simpler:
AI-assisted development tends to amplify a certain class of structural problems:
- repeated patterns with small variations
- copy-pasted orchestration logic
- overgrown functions
- dead callback surfaces
- architecture drift that happens in many individually “reasonable” steps
CodeClone is a good fit for that environment because it is:
- structural rather than stylistic
- deterministic enough for CI
- baseline-aware, so it can focus on regression control
- explicit about suppressions instead of hiding runtime ambiguity behind heuristics
If your team ships a lot of AI-assisted code, the practical question is not “is AI bad?” It is:
“How do we keep the repository readable, stable, and governable while code is being produced faster?”
That is the problem I think CodeClone helps with.
What it is not
I think first posts do better when they are honest about scope, so here is the short version.
CodeClone is not:
- a replacement for SonarQube
- a style linter
- a security scanner
- a magic AI-code detector
- a claim that every other tool got the problem wrong
It is a Python-focused, baseline-aware, structural analysis tool with a strong CI orientation.
And yes, it is still beta.
Quick start
If you want to try the prerelease:
pip install --pre codeclone
or:
uv tool install --pre codeclone==2.0.0b1
Then:
codeclone .
codeclone . --html
codeclone . --ci
If you want to adopt the baseline workflow:
codeclone . --update-baseline
codeclone . --ci
Where to look next
- Docs: https://orenlab.github.io/codeclone/
- Live sample report: https://orenlab.github.io/codeclone/examples/report/
- PyPI: https://pypi.org/project/codeclone/
- GitHub: https://github.com/orenlab/codeclone
Closing thought
If I had to summarize CodeClone 2.0.0b1 in one line, it would be this:
It is the point where the project stopped being “just a clone detector” and became a baseline-aware structural quality
tool for Python CI.
That is the direction I wanted from the beginning.
And with AI-assisted development becoming normal, I think tools in this category are becoming more important, not less.
If this sounds useful, I would be glad to hear what breaks, what feels noisy, what you would want from the CI workflow,
and what kinds of repositories you would actually trust a tool like this on.