DEV Community

Kwansub Yun

Posted on • Originally published at flamehaven.space

AI-SLOP Detector v3.5.0 — Every Claim, Verified Against Source Code

I published a LinkedIn post about AI-SLOP Detector's self-calibration system and download numbers. Someone asked the reasonable question: "Can you actually back that up?"

Yes. Here's the source.

This isn't a feature announcement. It's a line-by-line audit of seven claims against the actual codebase. Every VERDICT links to a real file and real line numbers. The repo is public — go check it yourself.


What was claimed

| Claim | Verdict |
| --- | --- |
| Every scan is recorded | ✅ TRUE |
| Repeat scans become calibration signal | ✅ TRUE |
| Updates only when signal is strong enough | ✅ TRUE |
| Visible policy artifact (.slopconfig.yaml) | ✅ TRUE |
| Explicit numeric limits govern calibration | ✅ TRUE |
| Detects empty/stub/phantom/disconnected code | ✅ TRUE |
| ~1.4K downloads last week | ✅ TRUE |

All seven. No fabrications. No inflated numbers. Here's the proof.


Claim 1: "Every scan is recorded"

Source: src/slop_detector/history.py, lines 116–180

def record(self, file_analysis, git_commit=None, git_branch=None, project_id=None) -> None:

Auto-invoked on every CLI run. The only opt-out is --no-history. Each scan writes to SQLite at ~/.slop-detector/history.db and stores:

  • deficit_score, ldr_score, inflation_score, ddc_usage_ratio
  • n_critical_patterns, fired_rules
  • git_commit, git_branch, project_id

The schema is now at v5 and auto-migrates on startup, through every release from v2.9.0 to v3.5.0.

VERDICT: TRUE. The record() call is real. The schema is versioned. The behavior is not optional.
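To make the recording flow concrete, here is a minimal sketch of what an always-on scan recorder looks like. The column names and the ~/.slop-detector/history.db path mirror the post; the table layout, function name, and everything else are illustrative, not the project's actual implementation.

```python
import sqlite3
from pathlib import Path

# Default location described in the post; illustrative only.
DB_PATH = Path.home() / ".slop-detector" / "history.db"

def record_scan(db: sqlite3.Connection, file_path: str, scores: dict,
                git_commit=None, git_branch=None, project_id=None) -> None:
    """Append one scan result to the history table (hypothetical schema)."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS history (
               file_path TEXT, deficit_score REAL, ldr_score REAL,
               git_commit TEXT, git_branch TEXT, project_id TEXT,
               scanned_at TEXT DEFAULT CURRENT_TIMESTAMP)"""
    )
    db.execute(
        "INSERT INTO history (file_path, deficit_score, ldr_score, "
        "git_commit, git_branch, project_id) VALUES (?, ?, ?, ?, ?, ?)",
        (file_path, scores.get("deficit_score"), scores.get("ldr_score"),
         git_commit, git_branch, project_id),
    )
    db.commit()
```

The key property the claim depends on is append-only accumulation: every run inserts a row, so history grows monotonically unless the user opts out.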


Claim 2: "Every re-scan becomes signal"

Source: src/slop_detector/history.py, lines 221–246

def count_files_with_multiple_runs(self, project_id=None) -> int:
    # Only files scanned >= 2 times count as calibration events:
    # SELECT file_path FROM history GROUP BY file_path HAVING COUNT(*) >= 2

Source: src/slop_detector/ml/self_calibrator.py, lines 301–309

def _extract_events(self, project_id=None):
    rows = self._load_history(project_id=project_id)
    by_file = self._group_runs_by_file(rows)

Single-scan files produce no calibration events. Only repeat scans generate improvement or fp_candidate labels. The threshold is hardcoded in SQL, not assumed.

VERDICT: TRUE. The repeat-scan requirement is enforced at the query level, not in documentation.
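The query-level enforcement can be sketched end to end: the HAVING clause from the excerpt is the entire rule. A one-column hypothetical history table is enough to show it.

```python
import sqlite3

def count_files_with_multiple_runs(db: sqlite3.Connection) -> int:
    """Count files with >= 2 recorded runs (sketch of the query above)."""
    rows = db.execute(
        "SELECT file_path FROM history "
        "GROUP BY file_path HAVING COUNT(*) >= 2"
    ).fetchall()
    return len(rows)
```

A file scanned once simply never appears in the result set, which is what "single-scan files produce no calibration events" means in practice.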


Claim 3: "Updates only when the signal is strong enough"

Source: src/slop_detector/ml/self_calibrator.py, lines 37–54 (constants) and 251–262 (enforcement)

CONFIDENCE_GAP: float = 0.10   # min gap between #1 and #2 candidate
MIN_IMPROVEMENTS: int = 5       # improvement events required
MIN_FP_CANDIDATES: int = 5      # fp_candidate events required

Gate 1 — confidence gap check (line 251):

if result.confidence_gap < CONFIDENCE_GAP:
    result.status = "insufficient_data"
    result.message = (
        f"Confidence gap {result.confidence_gap:.4f} < {CONFIDENCE_GAP}. "
        f"Candidates are too close — need more history data for reliable calibration."
    )
    return result  # NO UPDATE APPLIED

Gate 2 — score delta check (line 262):

if current_score - winner_score < 0.02:
    result.status = "no_change"  # also does not apply

Two independent guards. Both must pass before any weight update applies.

VERDICT: TRUE. Ambiguous signal is rejected twice before touching configuration.
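The two gates compose into one short function. This is a sketch, not the project's code: the constant values and the two conditions come from the excerpts above, while the MIN_SCORE_DELTA name, the apply_gates wrapper, and the slimmed-down CalibrationResult are assumptions for illustration.

```python
from dataclasses import dataclass

CONFIDENCE_GAP = 0.10     # from the lines 37–54 excerpt
MIN_SCORE_DELTA = 0.02    # value from the line-262 excerpt; name assumed

@dataclass
class CalibrationResult:
    confidence_gap: float
    status: str = "pending"

def apply_gates(result: CalibrationResult,
                current_score: float, winner_score: float) -> CalibrationResult:
    if result.confidence_gap < CONFIDENCE_GAP:
        result.status = "insufficient_data"   # gate 1: winner too ambiguous
        return result
    if current_score - winner_score < MIN_SCORE_DELTA:
        result.status = "no_change"           # gate 2: improvement too small
        return result
    result.status = "ok"                      # both gates passed: update applies
    return result
```

Note the ordering: an ambiguous winner is rejected before the score delta is even considered, so a lucky-but-noisy candidate can never sneak through on margin alone.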


Claim 4: "Leaves behind a visible policy every time it changes"

Source: src/slop_detector/ml/self_calibrator.py, docstring, lines 17–18

Return CalibrationResult; optionally write to .slopconfig.yaml via --apply-calibration

When --apply-calibration is passed and status == "ok", optimal weights are written to .slopconfig.yaml. Plain-text YAML. Human-readable. Git-versionable. Every calibration change is a diff.

VERDICT: TRUE. The policy artifact is explicit. You can git blame it.
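For a sense of what "every calibration change is a diff" looks like, here is a minimal sketch of writing weights as plain YAML using only the standard library. The .slopconfig.yaml filename comes from the post; the weights key and serialization details are assumptions.

```python
from pathlib import Path

def write_slopconfig(weights: dict, path: Path = Path(".slopconfig.yaml")) -> str:
    """Serialize calibrated weights as plain YAML (hypothetical key layout)."""
    lines = ["weights:"]
    for name, value in sorted(weights.items()):   # stable order => clean diffs
        lines.append(f"  {name}: {value:.2f}")
    text = "\n".join(lines) + "\n"
    path.write_text(text)
    return text
```

Because keys are emitted in sorted order with fixed precision, two calibration runs that produce the same weights produce byte-identical files, and any real change shows up as a minimal git diff.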


Claim 5: "Explicit limits govern calibration"

Source: src/slop_detector/ml/self_calibrator.py, lines 37–54

MIN_W: float = 0.10             # minimum allowed weight per dimension
MAX_W: float = 0.65             # maximum allowed weight per dimension
MAX_PURITY_WEIGHT: float = 0.25 # purity ceiling
DOMAIN_TOLERANCE: float = 0.15  # max per-dimension deviation from domain anchor
DOMAIN_DRIFT_LIMIT: float = 0.25 # warn when optimal weight drifts this far
GRID_STEP: int = 20             # 0.05 increment resolution

No ML model. No learned bounds. Every constraint is a named constant with a comment explaining why it exists. The calibration space is a bounded grid, not an open optimization landscape.

VERDICT: TRUE. Every limit is auditable. Nothing is opaque.
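A bounded grid under these limits is small enough to enumerate exhaustively. The sketch below assumes three illustrative dimensions and uses the MIN_W, MAX_W, and 0.05-step values from the excerpt; the actual dimension set and any per-dimension constraints (purity ceiling, domain anchors) are omitted for brevity.

```python
from itertools import product

MIN_W, MAX_W, STEP = 0.10, 0.65, 0.05   # values from the constants excerpt

def candidate_grids(dims=("ldr", "ddc", "inflation")):
    """Yield every weight vector on the bounded grid that sums to 1.0."""
    lo, hi = round(MIN_W / STEP), round(MAX_W / STEP)
    steps = [round(k * STEP, 2) for k in range(lo, hi + 1)]
    for combo in product(steps, repeat=len(dims)):
        if abs(sum(combo) - 1.0) < 1e-9:   # weights must form a distribution
            yield dict(zip(dims, combo))
```

Because the space is a finite grid rather than a continuous landscape, the search can simply score every candidate: no gradients, no local optima, and every evaluated point is reproducible from the named constants.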


Claim 6: "Detects empty implementations, phantom dependencies, disconnected pipelines"

These are the three canonical defect patterns AI code generation produces at scale. Each has a dedicated module.

| Defect class | Implementation |
| --- | --- |
| Empty/stub functions | src/slop_detector/metrics/ldr.py — LDRCalculator detects pass, ..., raise NotImplementedError, TODO |
| Phantom/unused imports | src/slop_detector/metrics/hallucination_deps.py — AST-based import-vs-usage analysis via the HallucinatedDependency dataclass |
| Disconnected pipelines | src/slop_detector/metrics/ddc.py — DDC (Declared Dependency Completeness) usage ratio |
| Function clone clusters | src/slop_detector/patterns/python_advanced.py — Jensen-Shannon divergence on 30-dim AST histograms; JSD < 0.05 = clone |

The clone detection is worth noting. JSD on AST histograms catches structural duplication that string similarity misses entirely. LLMs produce a lot of this — same function logic, slightly renamed.

VERDICT: TRUE. Each defect class has a named module with a working implementation.
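The JSD comparison itself fits in a few lines of pure Python. This is a sketch of the general technique, not the project's implementation: the 30-dim histogram layout and the 0.05 threshold come from the table above, while the helper names are mine. It assumes both histograms are already normalized to sum to 1.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two normalized histograms."""
    def kl(a, b):
        # Kullback-Leibler divergence; zero-probability bins contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def looks_like_clone(hist_a, hist_b, threshold=0.05):
    """Flag two AST-node histograms as structural clones if JSD is tiny."""
    return jsd(hist_a, hist_b) < threshold
```

Identical distributions score exactly 0 and disjoint ones score 1 (in base 2), so a 0.05 cutoff demands near-identical AST shape regardless of how the identifiers were renamed.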


Claim 7: "~1.4K downloads in the past week"

Source: pypistats.org API (mirrors=false), queried 2026-04-15

last_week:  1,407  (mirrors excluded — actual pip install traffic)
last_month: 1,787
last_day:   83

"~1.4K" is within 0.5% of 1,407. Mirrors excluded means bot traffic is stripped — these are real install invocations.

VERDICT: TRUE. Verified against pypistats in real time. The number is not rounded up.
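Anyone can repeat this check: the pypistats.org recent-downloads endpoint returns a small JSON payload. The sketch below parses a sample payload offline rather than hitting the network; the numbers mirror the post, and the package name in the sample is an assumption.

```python
import json

# Sample payload shaped like pypistats.org's recent-downloads response;
# numbers from the post, package name assumed for illustration.
sample = json.loads("""{
  "data": {"last_day": 83, "last_week": 1407, "last_month": 1787},
  "package": "slop-detector",
  "type": "recent_downloads"
}""")

last_week = sample["data"]["last_week"]
# "~1.4K" holds if 1,400 is within 0.5% of the reported figure.
claim_ok = abs(last_week - 1400) / last_week < 0.005
```

The same arithmetic anyone would do by hand: 1,407 vs 1,400 is a 0.497% gap, so the rounded claim survives the check.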


Why this format exists

Most open-source project posts make claims. Few back them up with file paths and line numbers.

That gap is the same problem AI-SLOP Detector is built to close. AI-generated code makes claims too — functions that look complete, imports that look used, pipelines that look connected. Static analysis finds the gap between what the code says and what it does.

This post applies the same standard to the project's own marketing copy. If a claim can be verified, it should be. If it can't, it shouldn't be made.

The codebase is public: github.com/flamehaven01/AI-SLOP-Detector

Pull requests welcome. Audits welcome more.


Verified by static code analysis + pypistats API, 2026-04-15
