Kwansub Yun
We Accidentally Rewarded AI Spaghetti Code. Here is the Math We Used to Fix It: AI-Slop Detector v2.8.0

I'll be honest: I wasn't sure if I'd write this post.

Not because there isn't enough to say about v2.8.0 — there's plenty. But because every time I say "this release is the big one," I feel a little self-conscious about it.

So I'll just say this plainly: if your team is writing code with AI assistance, I genuinely think this tool is worth your time. Not because I built it, but because the problem it solves is real, and v2.8.0 finally solves it the right way.

Here's the full story.


1. The Problem We Keep Running Into

If you've been using AI coding assistants long enough, you know the smell of "AI Slop."

Deeply nested functions that do nothing. Unused imports of tensorflow just to look like an ML script. Docstrings packed with "state-of-the-art scalable synergistic microservices" — for a function that literally just returns None.

We built AI-Slop-Detector to hunt down and block these patterns automatically in CI/CD pipelines. But as AI models grew more capable, we found they were also getting better at hiding their slop. And worse — our own math was helping them do it.


2. 🛑 The Critical Flaw We Had to Fix

In earlier versions, the "Inflation-to-Code Ratio" (ICR) was supposed to penalize files that used too many buzzwords relative to their actual logic. The old formula looked like this:

# OLD Formula (v2.7.x)
inflation_score = (jargon_count * weight) / (avg_complexity + 1)

Notice the problem?

Because avg_complexity sat in the denominator, a massive, unreadable "God Function" with a cyclomatic complexity of 30 would reduce its own jargon penalty. We were mathematically rewarding AI for writing longer, more convoluted spaghetti code — because complexity was diluting the score.

That's not a minor bug. That's the engine working backwards.
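You can see the dilution by plugging numbers into the old formula. The jargon count and weight below are invented for illustration:

```python
# OLD v2.7.x behavior: complexity in the denominator dilutes the penalty.
def old_inflation_score(jargon_count, avg_complexity, weight=1.0):
    return (jargon_count * weight) / (avg_complexity + 1)

tidy = old_inflation_score(jargon_count=8, avg_complexity=2)           # ~2.67
god_function = old_inflation_score(jargon_count=8, avg_complexity=30)  # ~0.26
# Identical buzzword load, ~10x smaller penalty for the God Function.
```

The more convoluted the function, the smaller its jargon penalty — exactly backwards.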


3. 📐 The v2.8.0 Fix: Complexity as an Amplifier

In v2.8.0, we inverted the logic entirely. Complexity no longer dilutes the penalty — it multiplies it.

# NEW Formula (v2.8.0)
density = unjustified_jargon_count / max(logic_lines, 1)

# Complexity modifier: strictly >= 1.0, only goes up
complexity_modifier = max(1.0, 1.0 + (avg_complexity - 3.0) / 10.0)

inflation_score = min(density * complexity_modifier * 10.0, 10.0)

A function with complexity 13 now receives double the penalty for the same jargon as a simple function. The message to AI-generated code is clear:

Complex code must prove its worth through pure logic, not narrative padding.
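Concretely, running the new formula on two hypothetical functions with the same jargon density (the counts below are invented for illustration) shows complexity amplifying the score instead of shrinking it:

```python
def new_inflation_score(unjustified_jargon_count, logic_lines, avg_complexity):
    density = unjustified_jargon_count / max(logic_lines, 1)
    # Modifier is clamped at >= 1.0: complexity can only raise the penalty
    complexity_modifier = max(1.0, 1.0 + (avg_complexity - 3.0) / 10.0)
    return min(density * complexity_modifier * 10.0, 10.0)

simple = new_inflation_score(4, 40, avg_complexity=3)    # modifier 1.0 -> 1.0
god_fn = new_inflation_score(4, 40, avg_complexity=13)   # modifier 2.0 -> 2.0
```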


4. ⚔️ Live Demo: Feeding It the Worst Code I Could Write

To validate the new scoring logic, I crafted the most egregious piece of AI Slop I could: unused heavy imports, five levels of nesting, a bare except, a mutable default argument, and a docstring that reads like a startup pitch deck.

The victim (slop_test_sample.py):

import os
import sys
import json
import asyncio
import multiprocessing # Unused but heavy
import tensorflow as tf # Fake/Unused AI import

def optimize_synergistic_neural_blockchain_backend(data_payload):
    """
    This state-of-the-art serverless function utilizes a highly scalable,
    fault-tolerant architecture to perform deep learning semantic reasoning 
    on the latent space embedding. It is a robust, enterprise-grade solution.
    """
    # HACK: temporary fix for production
    if data_payload is None:
        pass

    try:
        for i in range(10):
            if i > 5:
                while True:
                    if i == 7:
                        # Deep nesting level 5!
                        print("Cutting-edge optimization complete.")
                        break
    except:
        # Bare except (Critical structural issue)
        pass

    return None

def _process_microservices_byzantine_fault():
    ... # Ellipsis placeholder

def dummy_func(items=[]): # Mutable default arg
    return items

Running the detector:

slop-detector slop_test_sample.py

Result: 100.0/100 CRITICAL_DEFICIT.

Let's break down exactly what it caught.


1) 📦 Fake Dependencies

Warnings:
- CRITICAL: Only 0.00% of imports used
- FAKE IMPORTS: tensorflow

tensorflow was imported to look like an ML script. It was never called. The detector doesn't just count unused imports — it specifically flags known heavyweight AI/ML libraries as "Fake Imports" when they're never invoked.
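A simplified version of that check is easy to sketch with the `ast` module. The heavyweight-library list here is my own guess, not the tool's real one:

```python
import ast

# Hypothetical heavyweight ML libraries; the real tool ships its own list.
HEAVY_ML_LIBS = {"tensorflow", "torch", "jax", "sklearn"}

def unused_imports(source: str) -> dict:
    """Map each unused import alias to the module it came from."""
    tree = ast.parse(source)
    imported, used = {}, set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported[alias.asname or alias.name.split(".")[0]] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                imported[alias.asname or alias.name] = node.module
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return {a: m for a, m in imported.items() if a not in used}

src = "import json\nimport tensorflow as tf\nprint(json.dumps({}))\n"
dead = unused_imports(src)                   # {'tf': 'tensorflow'}
fake = [m for m in dead.values()
        if m.split(".")[0] in HEAVY_ML_LIBS]  # ['tensorflow']
```

An unused import is a warning; an unused import of a known heavyweight ML library is treated as a fake credential.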


2) 🌳 AST-Based Deep Nesting Detection

Pattern Issues:
  L8 [HIGH] Function 'optimize_synergistic_..._backend' has nesting depth 5 (limit 4)

This is new in v2.8.0. Unlike standard linters that rely on brittle regex, the engine walks the actual Abstract Syntax Tree — try → for → if → while → if — and counts the cognitive depth directly. It doesn't just read the text; it understands the structure.
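The core idea is a short recursive walk over the tree. This is a minimal sketch of AST-based depth counting, not the tool's actual implementation, and the set of "nesting" node types is my assumption:

```python
import ast

# Statements that add a level of cognitive nesting (an assumed subset;
# the real tool may weigh node types differently).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)

def max_nesting(func: ast.FunctionDef) -> int:
    """Return the deepest nesting level inside a function's AST."""
    def depth(node, d=0):
        d += isinstance(node, NESTING_NODES)
        children = [depth(child, d) for child in ast.iter_child_nodes(node)]
        return max([d] + children)
    return depth(func)

tree = ast.parse('''
def f():
    try:
        for i in range(10):
            if i > 5:
                while True:
                    if i == 7:
                        break
    except Exception:
        pass
''')
print(max_nesting(tree.body[0]))  # try -> for -> if -> while -> if = 5
```

Because this counts structure rather than indentation, it can't be fooled by odd formatting or one-line compound statements.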


3) 🤥 Calling Out the Claims

CRITICAL QUESTIONS:
2. Jargon density is 5.4x normal. Is this documentation or sales copy?
4. Claims like 'scalable' have ZERO supporting evidence.
14. (Line 12) 'enterprise-grade' claim lacks: error handling, logging, integration tests.
    Only 14% of required evidence present.

This is the part I'm most proud of. The engine reads the docstring, extracts architectural claims like "scalable" or "enterprise-grade," then cross-references the AST to check for actual evidence — connection pooling, caching, logging, proper error handling. When those structures are absent, it generates specific review questions automatically.
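A toy version of that claim-vs-evidence cross-check might look like the following. The claim map and evidence heuristics are my own simplifications, not the tool's actual rules:

```python
import ast

# Toy claim -> required-evidence map; the real tool's requirements are richer.
CLAIM_EVIDENCE = {
    "scalable": {"caching", "connection pooling"},
    "enterprise-grade": {"error handling", "logging"},
}

def missing_evidence(source: str) -> dict:
    """For each claim found in the first function's docstring, return the
    evidence categories the AST fails to back up."""
    tree = ast.parse(source)
    fn = tree.body[0]
    doc = (ast.get_docstring(fn) or "").lower()
    found = set()
    for node in ast.walk(fn):
        # A try with typed handlers counts as real error handling
        if isinstance(node, ast.Try) and all(h.type for h in node.handlers):
            found.add("error handling")
        # Any reference to the logging module counts as logging
        if isinstance(node, ast.Name) and node.id == "logging":
            found.add("logging")
    return {claim: needed - found
            for claim, needed in CLAIM_EVIDENCE.items() if claim in doc}

src = '''
def handler(payload):
    """A scalable, enterprise-grade solution."""
    return None
'''
print(missing_evidence(src))
# Every claim comes back with its full evidence set missing.
```

When the gap between claim and evidence is nonzero, the engine turns it into a review question rather than silently scoring it, which is what makes the output actionable.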


4) 🚨 Structural Anti-Patterns

  L26 [CRITICAL] Bare except catches everything including SystemExit and KeyboardInterrupt
  L35 [CRITICAL] Mutable default argument - shared state bug
  L32 [HIGH] Empty function with only ... - placeholder not implemented

Classic patterns. The bare except silently swallows crashes, including SystemExit and KeyboardInterrupt. The items=[] default argument is a shared-state bug that has tripped up Python developers for decades. Both were caught immediately.
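If the mutable-default bug is unfamiliar, a two-call demonstration shows why the detector rates items=[] as critical:

```python
def dummy_func(items=[]):        # anti-pattern: the default list is created once
    items.append("x")
    return items

print(dummy_func())   # ['x']
print(dummy_func())   # ['x', 'x']  <- same list, mutated across calls

def fixed_func(items=None):      # the standard fix: sentinel + fresh list
    if items is None:
        items = []
    items.append("x")
    return items

print(fixed_func())   # ['x']
print(fixed_func())   # ['x']
```

The default list is evaluated once at function definition, so every call without an argument shares and mutates the same object.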


5. 🔮 Other Major Changes in v2.8.0

SR9 Project Aggregation
Project-level scoring is no longer a simple mean. The new formula is:

project_ldr = 0.6 * min(file_scores) + 0.4 * mean(file_scores)

This is our conservative SR9 Aggregation. In a project with 99 perfect files and 1 absolute garbage file, a simple average says everything is fine. SR9 drags the score down to expose the weakest link — because in production, the weakest link is all that matters.
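Here is that scenario run through the formula. The per-file scores are invented, and I'm assuming a 0–100 scale where higher means healthier:

```python
from statistics import mean

def sr9_aggregate(file_scores):
    # Conservative aggregation: 60% weight on the worst file, 40% on the mean
    return 0.6 * min(file_scores) + 0.4 * mean(file_scores)

scores = [95.0] * 99 + [10.0]           # 99 clean files, 1 garbage file
print(round(mean(scores), 2))           # 94.15 -- simple average says "fine"
print(round(sr9_aggregate(scores), 2))  # 43.66 -- SR9 exposes the weak link
```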

Function-Scoped Justification
Previously, importing torch at the top of any file was a "free pass" to use ML jargon everywhere. Now, jargon is only justified when the relevant import or decorator is within the same function's scope in the AST.
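A rough sketch of scope-aware justification follows; the jargon and library lists are assumptions for the example, and the real matching is certainly more nuanced:

```python
import ast

ML_LIBS = {"torch", "tensorflow"}                      # assumed library list
ML_JARGON = ("neural", "embedding", "deep learning")   # assumed jargon list

def jargon_report(source: str) -> dict:
    """Per function: does the docstring use ML jargon, and is an ML
    library actually referenced inside that same function's body?"""
    tree = ast.parse(source)
    report = {}
    for fn in ast.walk(tree):
        if not isinstance(fn, ast.FunctionDef):
            continue
        doc = (ast.get_docstring(fn) or "").lower()
        names = {n.id for n in ast.walk(fn) if isinstance(n, ast.Name)}
        report[fn.name] = {
            "jargon": any(term in doc for term in ML_JARGON),
            "justified": bool(names & ML_LIBS),
        }
    return report

src = '''
import torch

def train(model, data):
    """Runs a neural embedding pass."""
    return torch.no_grad()

def helper(x):
    """Neural-grade deep learning synergy."""
    return x + 1
'''
r = jargon_report(src)
# train: jargon is justified (torch referenced in-scope)
# helper: jargon with no in-scope ML usage -> flagged
```

The file-level `import torch` alone no longer whitelists `helper`; only `train`, which actually touches the library, earns its vocabulary.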

Optional ML Secondary Signal
A 16-dimensional feature vector (RandomForest/XGBoost) is now available as a secondary validation layer, fully sandboxed from the zero-dependency core so it doesn't slow down CI pipelines that don't need it.

188 Tests. Zero regressions.


6. A Genuine Recommendation

I try not to oversell things I've built. But I'll say this:

If your team uses Copilot, Cursor, Claude, or any AI assistant to generate code — and you're not reviewing the structural and semantic quality of that output automatically — you are accumulating debt that is harder to see than normal technical debt, because it looks like production code.

This tool won't catch everything. But it will catch the patterns that slip through even careful human review, the ones that only show up when you interrogate the AST directly.

pip install ai-slop-detector

If you try it and find something it misses — or catches incorrectly — I genuinely want to hear about it. That's how v2.8.0 got built.


7. Repository & Documentation

📦 VS Code Extension

Install directly from the marketplace:

https://marketplace.visualstudio.com/items?itemName=flamehaven.vscode-slop-detector


GitHub: flamehaven01 / AI-SLOP-Detector

Stop shipping AI slop. Detects empty functions, fake documentation, and inflated comments in AI-generated code. Production-ready.

Production-grade static analyzer for detecting AI-generated code quality issues with evidence-based validation.

Detects six critical categories of AI-generated code problems with actionable, context-aware questions.


Quick Navigation: Quick Start · What's New · Architecture · Math Models · Core Features · Configuration · CLI Usage · CI/CD Integration · Development


Quick Start

# Install from PyPI
pip install ai-slop-detector
# Analyze a single file
slop-detector mycode.py

# Scan entire project
slop-detector --project ./src

# With JS/TS support
pip install "ai-slop-detector[js]"
slop-detector --project ./src --js

# With ML secondary signal
pip install "ai-slop-detector[ml]"
slop-detector mycode.py --json   # ml_score included in output when model present

# CI/CD Integration (Soft mode - PR comments only)
slop-detector --project ./src --ci-mode soft --ci-report

# CI/CD Integration (Hard mode - fail build on issues)
slop-detector --project ./src --ci-mode hard --ci-report

# Generate JSON report
slop-detector mycode.py --json --output report.json


The tool is open for feedback—I'm actively iterating based on real-world usage.


What's the worst piece of AI-generated code you've seen slip into a real codebase? Drop it in the comments — I might add it to the test suite.
