I built a baseline-aware Python code health tool for CI and AI-assisted coding
If you write Python with AI tools today, you’ve probably felt this already:
the code usually works, tests may pass, lint is green, but the structure gets worse in ways that are hard to notice
until the repository starts fighting back.
Not in one dramatic commit. More like this:
- the same logic gets rewritten in slightly different ways across multiple files;
- helper functions quietly grow until nobody wants to touch them;
- coupling increases one import at a time;
- framework callbacks look unused even when they are not;
- dead code accumulates because generated code tends to leave leftovers behind.
That is the problem space I built CodeClone for.
CodeClone 2.0.0b1 is the first version where the tool really matches the model I wanted from the beginning: not just
“find some clones,” but track structural code health over time, in CI, with a trusted baseline.
This post is an introduction to that version and the design choices behind it.
First: I know the ecosystem is not empty
I’m not pretending this is the first serious tool in this space.
There are already strong tools around adjacent problems:
- SonarQube / SonarCloud for broad code quality, governance, and quality gates
- PMD CPD as one of the classic copy/paste detectors
- jscpd for practical duplicate-code scanning across multiple languages
- Vulture for Python dead-code detection
- Radon / Xenon for complexity-related checks
- and newer tools like pyscn, which also move toward structural/code-health analysis for Python
That matters, because I don’t think useful tools should be framed as “everything before this was wrong.”
CodeClone is not trying to replace all of the above.
Its angle is narrower and, I think, pretty specific:
- structural duplication is a first-class signal;
- baseline-aware governance is the center of the workflow, not an extra feature;
- deterministic output is non-negotiable;
- and the UI/report layer is not allowed to invent conclusions the analysis engine did not produce.
If I had to summarize the difference in one sentence, it would be this:
CodeClone is built around separating accepted debt from new regressions.
That sounds simple, but it changes the entire shape of the tool.
Why I think this matters more now
AI coding assistants are genuinely useful. I use them. They speed things up.
But they also change the failure mode of a codebase.
The biggest risk is often not “the AI wrote something syntactically invalid.” That part is easy to catch.
The harder problem is that AI tools are very good at producing locally plausible code:
- one more handler,
- one more service method,
- one more variant of the same logic,
- one more utility that overlaps with three existing ones.
Each individual change looks reasonable.
The repository as a whole gets worse.
That is why I think structural analysis is especially useful for AI-assisted teams. If you are using Claude Code,
Cursor, Codex, or similar tools, the important question is often not:
“Is this code valid?”
but:
“Did this change make the repository structurally worse?”
That is exactly the question a baseline-aware tool can answer well.
What CodeClone focuses on
At the core, CodeClone analyzes Python projects and looks at structural signals such as:
- function clones
- block clones
- segment clones
- structural findings like duplicated branch families
- dead code
- complexity
- coupling
- cohesion
- dependency cycles
- a combined health score
The outputs come in multiple formats:
- HTML
- JSON
- Markdown
- SARIF
- Text
But they all come from a single canonical report document. That was important to me because I wanted consistency between
machine-readable outputs and the human-facing report.
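The "single canonical document, many renderers" idea can be sketched in a few lines. This is illustrative only (the function and field names here are made up, not CodeClone's API); the point is that every output format reads the same dict, so the machine-readable and human-facing views cannot drift apart.

```python
# Sketch: one canonical report dict, several renderers reading from it.
import json


def render(report: dict, fmt: str) -> str:
    """Render the same canonical report into a requested format."""
    if fmt == "json":
        # Deterministic key order keeps diffs and CI comparisons stable.
        return json.dumps(report, sort_keys=True, indent=2)
    if fmt == "markdown":
        lines = [f"# {report['title']}"]
        lines += [f"- {k}: {v}" for k, v in report["metrics"].items()]
        return "\n".join(lines)
    raise ValueError(f"unknown format: {fmt}")
```

Because no renderer computes anything on its own, the report layer cannot "invent conclusions" the analysis engine did not produce.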
The key idea: baseline-aware governance
This is the part I care about most.
A lot of code quality tools can tell you that your repository has problems. That is useful, but it is not enough for
real CI.
In a non-trivial codebase, there is usually historical debt:
- old duplication
- old complexity hotspots
- old dead code
- old architectural compromises
If a tool only says “you have 400 problems,” that doesn’t help much. Most teams will either ignore it or disable it.
CodeClone is designed around a different model:
- take the current state as a baseline;
- trust and validate that baseline explicitly;
- keep accepted debt visible;
- block new regressions.
That makes the tool much more usable in practice.
Instead of asking teams to become perfect overnight, it asks a much more realistic question:
“Did this branch make the codebase worse than the state we already accepted?”
That is the main reason I describe CodeClone as baseline-aware before I describe it as a clone detector.
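The gating model reduces to a simple idea: represent findings as stable fingerprints and fail only on ones that are not already in the accepted baseline. A minimal sketch (not CodeClone's actual data model):

```python
# Sketch of baseline-aware gating: accepted debt stays visible,
# but only findings absent from the baseline fail the gate.

def new_regressions(current: set[str], baseline: set[str]) -> set[str]:
    """Findings present now but absent from the accepted baseline."""
    return current - baseline
```

With this model, the "400 problems" of historical debt remain reported, but a branch only fails CI if it adds a fingerprint the baseline does not contain.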
What changed in 2.0.0b1
Version 2.0.0b1 is the point where that model became much more complete.
1. A real code-health model
CodeClone now computes a health score from multiple dimensions:
- clones
- complexity
- coupling
- cohesion
- dead code
- dependencies
- coverage
I did not want this to become a decorative “AI score.” The point is not the number by itself; the point is whether the score can be traced back to concrete structural reasons.
That is why the new HTML overview is built around:
- a health gauge
- KPI cards
- an executive summary
- source-scope breakdown
- a health profile chart
The goal is to answer not only “what failed?” but also “what should I look at first?”
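To make "traceable to concrete structural reasons" concrete, here is a hypothetical shape such a score could take. The dimension names follow the post; the weights and the formula are invented for illustration and are not CodeClone's actual scoring:

```python
# Hypothetical multi-dimensional health score: a weighted average of
# per-dimension scores in [0, 100]. Weights here are made up.
WEIGHTS = {
    "clones": 0.25, "complexity": 0.20, "coupling": 0.15,
    "cohesion": 0.15, "dead_code": 0.10, "dependencies": 0.10,
    "coverage": 0.05,
}


def health_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores; each dimension stays inspectable."""
    return sum(WEIGHTS[d] * s for d, s in dimension_scores.items())
```

The important property is not the formula but the decomposition: a drop in the number can always be attributed to a specific dimension, and from there to specific findings.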
2. Baseline became a first-class contract
In 2.0.0b1, baseline handling is no longer just a convenience file.
It is now a stricter contract with:
- trust semantics
- compatibility checks
- integrity fields
- deterministic payload handling
- unified clone + metrics baseline flow
That matters a lot in CI. If the baseline itself is not trustworthy, the entire gating story becomes shaky.
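One way a baseline file can carry an integrity field is to hash a canonical serialization of its payload. This is a sketch of the general technique, not CodeClone's actual baseline format:

```python
# Sketch: attach and verify an integrity digest over a canonical
# (sorted-keys, compact) JSON serialization of the baseline payload.
import hashlib
import json


def _digest(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def with_integrity(payload: dict) -> dict:
    """Wrap a baseline payload with its integrity digest."""
    return {"payload": payload, "sha256": _digest(payload)}


def is_trusted(doc: dict) -> bool:
    """Reject baselines whose payload no longer matches the digest."""
    return _digest(doc["payload"]) == doc["sha256"]
```

Canonical serialization matters here: without sorted keys and fixed separators, two semantically identical payloads could hash differently and the trust check would be noisy.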
3. Dead code arrived, but with explicit suppressions
Dead-code analysis is now part of the model, but I did not want to solve dynamic Python behavior with magic heuristics.
So for intentional runtime-driven cases, CodeClone uses explicit inline suppressions:
# codeclone: ignore[dead-code]
def handle_exception(exc: Exception) -> None:
    ...
or:
class Middleware:  # codeclone: ignore[dead-code]
    ...
That is a deliberate design choice.
I would rather have a local, visible policy mechanism than silently broaden the detector until it becomes hard to reason
about.
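Honoring such a marker is mechanically simple, which is part of why I prefer it to heuristics. The marker format below matches the post; the parsing code itself is only a sketch of how an analyzer might read it:

```python
# Sketch: recognize inline "# codeclone: ignore[rule-name]" suppressions
# on a source line. The regex is illustrative, not CodeClone's parser.
import re

SUPPRESS = re.compile(r"#\s*codeclone:\s*ignore\[(?P<rule>[\w-]+)\]")


def suppressed_rules(line: str) -> set[str]:
    """Return the set of rule names suppressed on this source line."""
    return {m.group("rule") for m in SUPPRESS.finditer(line)}
```

Because the suppression lives next to the code it affects, it shows up in diffs and code review, which is exactly the visibility I wanted.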
4. SARIF was added in 2.0.0b1
This is worth calling out explicitly because I do not want to misrepresent the release: SARIF is new in 2.0.0b1.
I wanted it to be useful beyond “technically yes, there is a SARIF file.”
So the current implementation is designed to work better with IDE/code-scanning workflows, including:
- %SRCROOT% anchoring for artifacts
- richer rule metadata
- location alignment
- baseline state for clone results when applicable
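For orientation, here is the shape of a single SARIF 2.1.0 result with those pieces in place. This is a hand-written example following the SARIF specification, not CodeClone's actual output; the rule id, path, and line numbers are made up:

```python
# A minimal SARIF 2.1.0 result object (field names per the SARIF spec).
sarif_result = {
    "ruleId": "codeclone/function-clone",  # hypothetical rule id
    "level": "warning",
    "message": {"text": "Function clone detected"},
    "locations": [{
        "physicalLocation": {
            "artifactLocation": {
                "uri": "src/utils.py",
                # uriBaseId anchors the artifact path to the repo root,
                # so code-scanning UIs can resolve it.
                "uriBaseId": "%SRCROOT%",
            },
            "region": {"startLine": 10, "endLine": 24},
        }
    }],
    # SARIF baseline states: "new", "unchanged", "updated", "absent".
    "baselineState": "new",
}
```

The `baselineState` field is what lets a code-scanning UI show the same accepted-debt-versus-new-regression split that the CLI gate enforces.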
5. Detection thresholds got more practical
The default thresholds are now more permissive than before.
That means CodeClone filters out less and analyzes more. For example:
- function-level min_loc was lowered from 15 to 10
- block thresholds were relaxed
- segment thresholds were relaxed
This does increase analysis volume, so it has performance implications. But it also makes the tool more honest. It stops politely ignoring a bunch of small-but-real structural issues.
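To make the min_loc change concrete: the value 10 (lowered from 15) is from the release; the filtering code around it is only a sketch of what such a threshold does in practice.

```python
# Sketch: a min_loc threshold gates which functions enter clone analysis.
MIN_LOC = 10  # 2.0.0b1 function-level default, lowered from 15


def candidate_functions(functions: dict[str, int]) -> list[str]:
    """Keep only functions large enough to be worth clone analysis."""
    return [name for name, loc in functions.items() if loc >= MIN_LOC]
```

A 12-line function would have been silently skipped under the old default of 15; now it is analyzed, which is the "stops politely ignoring small-but-real issues" trade-off in miniature.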
Why this is useful for AI-generated code
I want to be careful here, because “AI code quality” can turn into hand-wavy marketing really fast.
I am not claiming that CodeClone can detect whether a human or an LLM wrote a piece of code.
That is not the point.
The point is simpler:
AI-assisted development tends to amplify a certain class of structural problems:
- repeated patterns with small variations
- copy-pasted orchestration logic
- overgrown functions
- dead callback surfaces
- architecture drift that happens in many individually “reasonable” steps
CodeClone is a good fit for that environment because it is:
- structural rather than stylistic
- deterministic enough for CI
- baseline-aware, so it can focus on regression control
- explicit about suppressions instead of hiding runtime ambiguity behind heuristics
If your team ships a lot of AI-assisted code, the practical question is not “is AI bad?” It is:
“How do we keep the repository readable, stable, and governable while code is being produced faster?”
That is the problem I think CodeClone helps with.
What it is not
I think first posts do better when they are honest about scope, so here is the short version.
CodeClone is not:
- a replacement for SonarQube
- a style linter
- a security scanner
- a magic AI-code detector
- a claim that every other tool got the problem wrong
It is a Python-focused, baseline-aware, structural analysis tool with a strong CI orientation.
And yes, it is still beta.
Quick start
If you want to try the prerelease:
pip install --pre codeclone
or:
uv tool install --pre codeclone==2.0.0b1
Then:
codeclone .
codeclone . --html
codeclone . --ci
If you want to adopt the baseline workflow:
codeclone . --update-baseline
codeclone . --ci
Where to look next
- Docs: https://orenlab.github.io/codeclone/
- Live sample report: https://orenlab.github.io/codeclone/examples/report/
- PyPI: https://pypi.org/project/codeclone/
- GitHub: https://github.com/orenlab/codeclone
Closing thought
If I had to summarize CodeClone 2.0.0b1 in one line, it would be this:
It is the point where the project stopped being “just a clone detector” and became a baseline-aware structural quality
tool for Python CI.
That is the direction I wanted from the beginning.
And with AI-assisted development becoming normal, I think tools in this category are becoming more important, not less.
If this sounds useful, I would be glad to hear what breaks, what feels noisy, what you would want from the CI workflow,
and what kinds of repositories you would actually trust a tool like this on.