DEV Community

SoftwareDevs mvpfactory.io

Posted on • Originally published at mvpfactory.io

AI-Generated Code Auditing: Build a Static Analysis Framework That Catches What LLMs Get Wrong

What We're Building

Let me show you a pattern I use in every project that involves AI-generated code: a repeatable audit framework. By the end of this workshop, you'll have a custom Detekt ruleset configured to catch the five most common anti-patterns LLMs produce, a review checklist calibrated for real codebases, and a CI gate that blocks the dangerous stuff before it ships.

I built this after auditing a 40K-line Android + Kotlin backend codebase that was predominantly AI-generated. The team shipped fast — and then two senior engineers spent six weeks untangling the result. This framework is what I wish they'd had from day one.

Prerequisites

  • A Kotlin or Android project with Gradle
  • Detekt added as a dependency (setup is covered below if you haven't added it yet)
  • Basic familiarity with CI/CD pipelines (GitHub Actions, GitLab CI, etc.)
  • Roughly 20 minutes

Step 1: Understand the Failure Patterns

Before we configure tooling, you need to know what you're scanning for. Here are the five anti-patterns LLMs produce with remarkable consistency across Cursor, Devin, and Claude Code outputs.

The one that burns you hardest is the Exception Black Hole. AI tools wrap operations in broad try-catch blocks that silently swallow failures. Here is the minimal example:

// What the AI generates — silent data corruption in production
fun syncOrders() {
    try {
        val orders = api.fetchOrders()
        database.insertAll(orders)
    } catch (e: Exception) {
        Log.e("Sync", "Failed", e)  // Logged and forgotten
    }
}

// What production code needs — propagate the failure to the caller
// (caveat: runCatching also traps CancellationException; rethrow it
// when this runs inside a cancellable coroutine)
suspend fun syncOrders(): Result<Unit> {
    return runCatching {
        val orders = api.fetchOrders()
        database.insertAll(orders)
    }
}

The other four: Copy-Paste Architecture (duplicate implementations across modules), Flat Dependency Graphs (bypassing DI with direct instantiation), Configuration Sprawl (hardcoded URLs and timeouts), and Over-Abstraction (unnecessary AbstractBaseStrategyFactoryProvider hierarchies). Every audit I run surfaces these same five.
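Configuration Sprawl is the quickest of the four to demonstrate. A minimal before/after sketch — class and field names here are illustrative, not from the audited codebase:

```kotlin
// Configuration Sprawl — the AI hardcodes environment details inline,
// and does it again slightly differently in the next class it generates
class OrderApi {
    private val baseUrl = "https://api.example.com/v1"  // hardcoded
    private val timeoutMs = 30_000L                      // hardcoded
}

// Fix: one injected config object, so every value lives in one place
data class ApiConfig(val baseUrl: String, val timeoutMs: Long)

class ConfiguredOrderApi(private val config: ApiConfig)
```

The MagicNumber and hardcoded-string checks in the next step flag exactly this pattern.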

Step 2: Configure the Detekt Ruleset

Create a file called detekt-ai-audit.yml in your project root. Here is the configuration that catches the highest-signal issues:

# detekt-ai-audit.yml
complexity:
  LongMethod:
    threshold: 30          # AI output tends toward sprawling single functions
  CyclomaticComplexMethod:
    threshold: 8
  TooManyFunctions:
    threshold: 15

style:
  MagicNumber:
    active: true           # surfaces hardcoded timeouts, ports, retry counts
  ForbiddenComment:
    values: ['TODO', 'FIXME', 'HACK']  # detekt 1.23+ renames this key to `comments`

exceptions:
  TooGenericExceptionCaught:
    active: true
  SwallowedException:
    active: true           # the Exception Black Hole detector

Now wire it into your build.gradle.kts:

detekt {
    config.setFrom("detekt-ai-audit.yml")
    buildUponDefaultConfig = true
    allRules = false
}

Run ./gradlew detekt and review what surfaces. In every AI-generated codebase I've audited, SwallowedException alone flags dozens of issues.
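If that first run surfaces hundreds of findings, Detekt's baseline feature lets you freeze the existing debt and fail the build only on new violations. A sketch of the extra line in build.gradle.kts:

```kotlin
detekt {
    config.setFrom("detekt-ai-audit.yml")
    buildUponDefaultConfig = true
    // Generate once with ./gradlew detektBaseline, commit the file,
    // and only violations introduced after that point fail the build.
    baseline = file("detekt-baseline.xml")
}
```

Treat the baseline as a debt ledger, not a dumping ground — schedule time to burn it down.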

Step 3: Add the DI Bypass Check

The Flat Dependency Graph pattern is subtle because the code compiles and runs fine. Here is the gotcha that will save you hours: search your codebase for ViewModels constructing their own dependencies.

// Flag this pattern in code review — always
class OrderViewModel : ViewModel() {
    private val repository = OrderRepository(ApiClient())
}

// Require this instead
class OrderViewModel @Inject constructor(
    private val repository: OrderRepository
) : ViewModel()

Run a quick scan with grep -rnE '= *[A-Za-z]*(Repository|ApiClient|Service)\(' --include="*.kt" . across your source tree. Expect some false positives from DI modules and tests, but any hit inside a ViewModel or UseCase class means you have work to do.
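That one-liner can become a CI gate of its own. A sketch that fails the build when ViewModel or UseCase files construct their own dependencies — the src/ path and file-naming conventions are assumptions, so adjust them to your module layout:

```shell
# Fail the build when ViewModel/UseCase files construct dependencies inline.
# Paths and naming conventions below are illustrative.
hits=$(grep -rnE '= *[A-Za-z]*(Repository|ApiClient|Service)\(' \
    --include='*ViewModel.kt' --include='*UseCase.kt' src/ 2>/dev/null | wc -l)
echo "DI bypass candidates: $hits"
[ "$hits" -eq 0 ] || { echo "Route these through DI before merging."; exit 1; }
```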

Step 4: Wire It Into CI

Add Detekt as a mandatory gate. In GitHub Actions:

- name: AI Code Audit
  run: ./gradlew detekt
  continue-on-error: false

This adds roughly 45 seconds to your pipeline. In my projects, it catches an average of 3.2 critical issues per week that would have reached production without it.
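For context, a complete workflow file might look like the sketch below. The file path and action versions are assumptions — pin whatever your project already uses:

```yaml
# .github/workflows/ai-code-audit.yml — illustrative sketch
name: AI Code Audit
on: pull_request
jobs:
  detekt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
      - name: AI Code Audit
        run: ./gradlew detekt
```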

The Review Checklist

Pin this to every PR that contains AI-generated code:

| Check | What to Look For | Severity |
| --- | --- | --- |
| Error propagation | Do catch blocks propagate or silently swallow? | Critical |
| Thread safety | Is shared mutable state protected in coroutine contexts? | Critical |
| Dependency injection | Are objects constructed inline or through DI? | High |
| Test quality | Do tests cover failure paths or only happy paths? | High |
| API surface area | Are internal details exposed publicly? | High |
| Duplication | Near-identical implementations across modules? | Medium |
| Configuration | Are URLs, timeouts, and flags hardcoded or in config? | Medium |
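The thread-safety row deserves a concrete picture. A minimal sketch of the fix, guarding shared mutable state with a kotlinx.coroutines Mutex — class and member names here are illustrative:

```kotlin
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// AI output often mutates a plain MutableMap from several coroutines.
// Wrapping every access in a Mutex serializes them safely.
class OrderCache {
    private val mutex = Mutex()
    private val orders = mutableMapOf<String, Int>()

    suspend fun put(id: String, qty: Int) = mutex.withLock {
        orders[id] = qty
    }

    // Return a defensive copy so callers never touch the guarded map
    suspend fun snapshot(): Map<String, Int> = mutex.withLock { orders.toMap() }
}
```

In review, any `var` or mutable collection reachable from more than one coroutine without a Mutex, atomic type, or confinement strategy is a Critical finding.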

Gotchas

  • AI-generated tests lie. A codebase can show 61% line coverage where every test is a happy-path assertion. Check that tests verify failure behavior, not just success.
  • Duplication compounds faster with AI. The tool has no memory of what it already built. Set a measurable threshold (I use <3% duplicated lines, measured with a copy-paste detector such as PMD's CPD, since Detekt doesn't ship a duplication rule) and treat violations as blocking.
  • Don't skip the ForbiddenComment rule. AI tools leave TODO and FIXME markers that never get resolved. These are deferred bugs.
  • Over-correcting is real. When you ask an AI to refactor duplication, it often creates deep abstraction hierarchies. Review refactoring PRs with extra scrutiny.
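The first gotcha is checkable in a few lines. A sketch of the kind of failure-path assertion to demand in AI-generated test suites — the failing fetch here is a stand-in for a real network call:

```kotlin
import java.io.IOException
import kotlinx.coroutines.runBlocking

// A failure-path check of the kind AI-generated suites usually omit:
// prove the error is surfaced, not just that the happy path works.
fun main() = runBlocking {
    val failingFetch: suspend () -> Unit = { throw IOException("timeout") }
    val result = runCatching { failingFetch() }
    check(result.isFailure) { "failure must propagate, not vanish" }
    check(result.exceptionOrNull() is IOException)
    println("failure path verified")
}
```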

Wrapping Up

The key insight: treat AI as an intern, not an architect. An intern writes code that a senior engineer reviews with specific, calibrated scrutiny. The checklist and ruleset above are that calibration.

Start with the Detekt config — it takes 15 minutes and pays for itself on the first PR it blocks. Then audit error handling paths across your project. Those silent catch blocks are where production incidents hide.

For deeper reading, check the Detekt rule documentation and SonarQube's Kotlin analyzer.
