Tetsuya Wakita
Enforcing Clean Architecture on AI-Generated Kotlin with Custom Detekt Rules

How custom detekt rules, specialized AI agents, and specification-driven development keep Clean Architecture intact — even when an LLM writes the code.


The moment an LLM broke the architecture

I asked an AI coding assistant to add a new use case to the application layer. It wrote the code in under a minute. It also added @Service to the class, imported org.springframework.stereotype.Service, and threw an IllegalArgumentException for invalid input.

Three architecture rules broken in a single generation.

LLMs are fast. They're also statistically inclined to produce the most common pattern they've seen — and in the Kotlin ecosystem, that's Spring annotations on everything, exceptions for error handling, and no concept of layer boundaries. Your architecture documentation might say "no Spring in application layer." The LLM has seen 100,000 Spring Boot examples that do exactly the opposite.

In the previous parts of this series, I built a Clean Architecture in Kotlin with Gradle module boundaries, Arrow-kt Either error handling, and CQRS — all enforced at compile time. Those guarantees held because humans followed the rules. Now the developer is an LLM. The question becomes: how do you make the architecture enforce itself on code the LLM generates?


Custom detekt rules: architecture as static analysis

Gradle modules prevent cross-layer dependencies — domain can't import from infrastructure because the dependency doesn't exist. But within the allowed imports, an LLM can still break the architecture in two fundamental ways:

  1. Importing forbidden libraries into pure layers (Spring annotations in domain/application)
  2. Throwing exceptions instead of returning Either<Error, T>

Both are valid Kotlin. Both compile. Both violate the architecture. Gradle can't catch them. So I wrote two custom detekt rules that can.

Rule 1: ForbiddenLayerImportRule — whitelist-based import enforcement

Instead of blacklisting specific imports (which the LLM can always find creative alternatives for), this rule whitelists what's allowed. Anything not on the list is rejected.

class ForbiddenLayerImportRule(config: Config) : Rule(config, "...") {

    // Root package of the project (placeholder: adjust to your codebase)
    private val projectBase = "com.example"

    private val commonAllowedPrefixes = listOf(
        "kotlin.", "java.",
        "arrow.core.", "arrow.fx.coroutines.",
        "kotlinx.coroutines.", "org.slf4j.",
    )

    override fun visitImportDirective(importDirective: KtImportDirective) {
        super.visitImportDirective(importDirective)

        val file = importDirective.containingKtFile
        if (file.virtualFilePath.contains("/test/")) return

        val packageName = file.packageFqName.asString()
        val importPath = importDirective.importedFqName?.asString() ?: return

        val layer = packageName.removePrefix("$projectBase.").substringBefore(".")
        val layerOwnPackages = when (layer) {
            "domain" -> listOf("$projectBase.domain.")
            "application" -> listOf("$projectBase.domain.", "$projectBase.application.")
            else -> return  // only enforce on domain and application
        }

        val allowed = commonAllowedPrefixes + layerOwnPackages

        if (allowed.none { importPath.startsWith(it) }) {
            report(Finding(
                Entity.from(importDirective),
                "Import '$importPath' is not allowed in $layer layer.",
            ))
        }
    }
}

The logic is deliberate:

  • Domain layer: can only import Kotlin stdlib, Arrow-kt, coroutines, SLF4J, and its own packages
  • Application layer: the same, plus domain layer packages
  • Test files: skipped — tests can import whatever they need (MockK, Kotest, etc.)
  • Other layers: not enforced — infrastructure and presentation have legitimate framework needs

When an LLM generates import org.springframework.stereotype.Service in a use case class, detekt fails the build with a clear message: "Import 'org.springframework.stereotype.Service' is not allowed in application layer." The LLM can read that message and fix its own output.

Why whitelist instead of blacklist? Because I can't anticipate every library an LLM might try to import. The whitelist is short and complete: Kotlin, Arrow, coroutines, SLF4J, and own-layer packages. Everything else is denied by default.
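
Stripped of the detekt plumbing, the check itself is a one-line deny-by-default prefix match. A dependency-free sketch of that logic (the com.example project base is a placeholder, not from the source):

```kotlin
// Deny-by-default: an import is allowed only if it matches a known-good prefix.
val commonAllowedPrefixes = listOf(
    "kotlin.", "java.",
    "arrow.core.", "arrow.fx.coroutines.",
    "kotlinx.coroutines.", "org.slf4j.",
)

fun isAllowed(importPath: String, layerOwnPrefixes: List<String>): Boolean =
    (commonAllowedPrefixes + layerOwnPrefixes).any { importPath.startsWith(it) }

fun main() {
    val domainOwn = listOf("com.example.domain.")  // placeholder project base
    println(isAllowed("arrow.core.Either", domainOwn))
    println(isAllowed("com.example.domain.order.OrderId", domainOwn))
    println(isAllowed("org.springframework.stereotype.Service", domainOwn))
}
```

Adding a new dependency to a pure layer is a deliberate act: extend the whitelist in one place, rather than hoping a blacklist anticipated the import.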

Rule 2: NoThrowOutsidePresentationRule — enforcing Either for all errors

In Part 1, every fallible operation returns Either<XxxError, T>. No exceptions. The return type is the complete error specification.

LLMs don't naturally write this way. They reach for throw IllegalArgumentException("invalid input") because that's what most Kotlin code does.
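
What the rule demands looks like this in practice. A sketch of validating an ID the Either way, using a minimal hand-rolled Either so the snippet compiles without Arrow (the real codebase uses arrow.core.Either and its raise DSL):

```kotlin
// Minimal stand-in for arrow.core.Either; dependency-free for illustration only.
sealed class Either<out L, out R> {
    data class Left<out L>(val value: L) : Either<L, Nothing>()
    data class Right<out R>(val value: R) : Either<Nothing, R>()
}

sealed interface OrderError {
    data class InvalidId(val value: Long) : OrderError
}

@JvmInline
value class OrderId private constructor(val value: Long) {
    companion object {
        // No throw: the failure case is part of the return type,
        // so every caller is forced to handle it.
        fun of(value: Long): Either<OrderError.InvalidId, OrderId> =
            if (value > 0) Either.Right(OrderId(value))
            else Either.Left(OrderError.InvalidId(value))
    }
}
```

The private constructor plus factory means an invalid OrderId cannot exist anywhere in the system, and no caller can forget the error path the way a forgotten try/catch forgets an exception.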

class NoThrowOutsidePresentationRule(config: Config) : Rule(config, "...") {

    // Root package of the project (placeholder: adjust to your codebase)
    private val projectBase = "com.example"

    private val forbiddenLayers = setOf("domain", "application", "infrastructure")

    override fun visitThrowExpression(expression: KtThrowExpression) {
        super.visitThrowExpression(expression)

        val file = expression.containingKtFile
        if (file.virtualFilePath.contains("/test/")) return

        val packageName = file.packageFqName.asString()
        val layer = packageName.removePrefix("$projectBase.").substringBefore(".")

        if (layer in forbiddenLayers) {
            report(Finding(
                Entity.from(expression),
                "throw detected in '$packageName'. Use Either<XxxError, T> instead.",
            ))
        }
    }
}

Three layers are covered: domain, application, and infrastructure. The only layer where throw is permitted is presentation — because Spring's exception handler infrastructure (ResponseStatusException, GraphQLException) requires it.

Together, these two rules turn architecture guidelines into compiler-level enforcement. The LLM writes code, the Kotlin compiler checks types, and detekt checks architecture. Violations are build errors, not code review comments.


From documentation to specification: SDD for LLMs

Part 2 introduced the idea that the application module is a direct translation of the product spec. Tests validate the implementation matches the spec. That workflow was manual — an engineer reads the requirements, writes the interfaces, then writes the tests.

For LLM-driven development, the spec needs to be machine-readable. Not a PDF. Not a Confluence page. A structured markdown file in the repository that an agent can parse directly.

The spec template

Every feature starts with a spec file in .claude/specs/:

# Feature Spec: [Feature Name]

## 1. Context
[Why this feature exists. Business problem.]

## 2. Domain Model Changes
### New Value Objects
- `OrderId` (@JvmInline value class): Long, must be positive
  - Invalid: `OrderError.InvalidId(value)` -> "Invalid order ID: $value"

### Domain Errors
sealed interface OrderError : DomainError:
| Variant      | When                  | Message format              |
|--------------|-----------------------|-----------------------------|
| InvalidId    | ID <= 0               | "Invalid order ID: $value"  |
| NotFound     | No record for this ID | "Order not found: $id"      |

## 3. Use Cases
### UC-1: Find Order by ID
Input: id (Long)
Happy path: validate -> find -> return Order
Error cases:
| Error              | Application Error Type           | HTTP |
|--------------------|----------------------------------|------|
| Invalid ID         | OrderFindByIdError.InvalidId     | 400  |
| Not found          | OrderFindByIdError.NotFound      | 404  |

## 4. Presentation
| Method | Path            | Success | Errors    |
|--------|-----------------|---------|-----------|
| GET    | /api/orders/{id}| 200     | 400, 404  |

The spec is structured enough for an LLM to extract: which value objects to create, what validation rules they have, what error types to define, what use cases to implement, and what HTTP status codes to return. Every field maps directly to a Clean Architecture construct.

This is the key connection from Part 2: the spec-to-code mapping that was implicit in the engineer's head is now explicit in a file. The LLM doesn't need to interpret requirements — it reads a table and generates the corresponding Kotlin class.
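
For example, the error table for UC-1 transcribes almost mechanically into Kotlin. A sketch using the spec's names (the toHttpStatus helper is an illustrative name, not from the source):

```kotlin
// Each row of the spec's error table becomes one sealed variant.
sealed interface OrderFindByIdError {
    data class InvalidId(val value: Long) : OrderFindByIdError
    data class NotFound(val id: Long) : OrderFindByIdError
}

// The HTTP column of the table, as an exhaustive mapping in the presentation layer.
// If the spec gains a row, the compiler forces this `when` to gain a branch.
fun OrderFindByIdError.toHttpStatus(): Int = when (this) {
    is OrderFindByIdError.InvalidId -> 400
    is OrderFindByIdError.NotFound -> 404
}
```

The exhaustive `when` is the point: a spec change that adds an error variant breaks the build until the HTTP mapping catches up, so spec and code cannot silently drift.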


Specialized agents: one agent per layer

Here's the central design insight: each agent should know only the rules of its own layer.

A domain-implementer agent that knows about Spring is an agent that might use Spring. An infrastructure-implementer that knows about domain logic is an agent that might put logic in the wrong place. The same principle that separates layers in the code separates agents in the pipeline.

The agent architecture

orchestrator (opus)
  ├── spec-writer (sonnet)      -> writes .claude/specs/[feature].md
  ├── test-designer (sonnet)    -> designs test cases from spec
  ├── tester (sonnet)           -> writes failing tests (RED)
  ├── domain-implementer (sonnet)
  ├── app-implementer (sonnet)
  ├── infra-implementer (sonnet)
  ├── presentation-implementer (sonnet)
  └── reviewer / linter / security-checker (parallel QA)

Each agent is a markdown file with explicit constraints. Here's the opening of the domain-implementer:

# Domain Implementer Agent

You implement the domain layer: entities, value objects, error types,
and repository interfaces.

## Constraints — STRICTLY ENFORCED

- **NO** Spring annotations (@Component, @Repository, @Service)
- **NO** JooQ, R2DBC, JDBC imports
- **NO** `throw` statements (use Either / raise() / ensure())
- **ONLY** allowed: Kotlin stdlib, Arrow-kt, SLF4J

The constraints mirror the detekt rules — but at the prompt level. The agent is told what it cannot do before it writes a single line. If it still violates, detekt catches it at build time.

Skills as domain knowledge

Each agent references skills — structured knowledge about specific patterns:

| Skill      | What it contains                                                        |
|------------|-------------------------------------------------------------------------|
| ca-kotlin  | Layer rules, module dependencies, what each layer can import            |
| fp-kotlin  | Immutability patterns, Either chaining, sealed error hierarchies        |
| tdd-kotlin | Test patterns per layer, MockK/Kotest conventions, property-based testing |
| sdd-spec   | Spec template, how to extract domain constructs from requirements       |
| arrow-kt   | Arrow-kt specific patterns: either { }, .bind(), mapLeft                |
| jooq-ddl   | jOOQ DSL patterns, DDL codegen, reactive Mono/Flux usage                |

The domain-implementer loads ca-kotlin and fp-kotlin. The infra-implementer loads ca-kotlin and jooq-ddl. The tester loads tdd-kotlin. Each agent gets the knowledge relevant to its layer — no more, no less.

Model selection: right model for the right task

Not all tasks need the same model. The orchestrator — which coordinates the entire pipeline, decides when to retry, and synthesizes results — runs on Opus (highest reasoning capability). The implementers and spec-writer run on Sonnet (faster, cheaper, sufficient for structured generation).

This isn't just cost optimization. It's about matching capability to task complexity. Writing a value object from a spec is pattern-matching. Deciding whether to re-run the implementation phase after a test failure is judgment.


The pipeline: from spec to shipped

Here's what a complete implementation cycle looks like using slash commands:

/spec order-api        -> spec-writer creates .claude/specs/order-api.md
/spec-review order-api -> spec-reviewer validates completeness
/test-design order-api -> test-designer creates test plan from spec
/impl order-api        -> orchestrator runs the full pipeline
/qa                    -> QA pipeline: build + lint + security + review

The /impl command is where the interesting engineering happens. It has to respect the dependency graph between layers while maximizing parallelism.

The dependency-aware parallelization

Phase 1 (sequential):  spec read -> domain-implementer -> app-implementer
Phase 2 (parallel):    infra-implementer | presentation-implementer | test-implementer
Phase 3 (parallel):    builder | linter | security-checker
Phase 4 (sequential):  code-reviewer -> documenter

Domain must complete before application — use cases depend on domain entities, value objects, and repository interfaces. Application must complete before infrastructure, presentation, and tests — all three consume the use case interfaces but don't depend on each other.

This maps directly to the Clean Architecture dependency graph. The implementation order follows the dependency direction: domain -> application -> (infrastructure | presentation | tests). The architecture that makes code modular also makes the build pipeline parallelizable.
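
The phase ordering can be sketched without any agent framework: plain threads stand in for agent spawns, and the join point models the phase boundary (all names here are illustrative):

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue

// Dependency-free sketch of the /impl phase ordering; threads stand in
// for agent spawns, and joining them models the phase boundary.
fun runPipeline(): List<String> {
    val completed = ConcurrentLinkedQueue<String>()
    fun runAgent(name: String) { completed.add(name) }  // placeholder for a real agent call

    // Phase 1: sequential, because application consumes domain's types
    runAgent("domain-implementer")
    runAgent("app-implementer")

    // Phase 2: parallel, all three consume the use case interfaces
    // but do not depend on each other
    listOf("infra-implementer", "presentation-implementer", "test-implementer")
        .map { name -> Thread { runAgent(name) }.also(Thread::start) }
        .forEach(Thread::join)

    return completed.toList()
}

fun main() = println(runPipeline())
```

Within phase 2 the completion order is nondeterministic, and that is exactly what the architecture guarantees is safe.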

What the orchestrator actually does

The orchestrator is the only agent that doesn't write code. It reads the spec, creates a layer sketch (mapping spec elements to Kotlin classes), spawns agents in the correct order, reads test results, and decides whether to retry.

### Phase 0 — Analyze
- Read the spec
- Extract and output a Layer Sketch:
  - Domain entity (fields + types)
  - Value objects (backing type + validation)
  - Repository interface (method signatures)
  - Use cases (interface name + execute signature)
- List every file to create before spawning any agent

### Phase 1 — Red (tester first, alone)
Spawn only the tester agent. Wait for completion.

### Phase 2 — Green (implementer reads tests)
Spawn implementer alone. Wait for completion.

### Phase 3 — Verify + Review (parallel)
Spawn reviewer, linter, security-checker simultaneously.
Run: ./gradlew test koverVerify detekt

### Phase 4 — Synthesize
If tests are red, re-spawn implementer with failure output.
Cap iterations at 3.

The "red -> green -> verify" loop is the SDD + TDD workflow from Part 2 — automated. Tests are written before implementation. Implementation is written to pass the tests. The architecture rules are checked after every cycle. The orchestrator doesn't trust the implementer — it verifies.
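
Phase 4's capped retry is the only control flow the orchestrator needs. A dependency-free sketch, with runTests and respawnImplementer as hypothetical stand-ins for the orchestrator's real actions:

```kotlin
// Capped retry loop: run the verification suite, feed failures back to the
// implementer, and give up after a fixed number of iterations.
fun greenLoop(
    maxIterations: Int = 3,
    runTests: () -> Boolean,
    respawnImplementer: (attempt: Int) -> Unit,
): Boolean {
    repeat(maxIterations) { attempt ->
        if (runTests()) return true      // green: verified, stop
        respawnImplementer(attempt + 1)  // red: re-spawn with the failure output
    }
    return false                         // cap hit: escalate to a human instead of looping
}
```

The cap matters: without it, an implementer that keeps misreading a failing test would burn tokens indefinitely.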


Hooks: the last line of defense

Claude Code supports hooks — shell commands that trigger on specific events. Two hooks close the loop between the LLM and the architecture rules:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{
          "type": "command",
          "command": "cd $CLAUDE_PROJECT_DIR && ./gradlew ktlintFormat 2>&1 | tail -60",
          "timeout": 30
        }]
      }
    ],
    "Stop": [
      {
        "hooks": [{
          "type": "command",
          "command": "cd $CLAUDE_PROJECT_DIR && ./gradlew detekt 2>&1 | tail -60",
          "timeout": 120
        }]
      }
    ]
  }
}

PostToolUse on Write/Edit: Every time the LLM creates or modifies a file, ktlint auto-formats it. No style drift. No formatting discussions. The code matches the project style before the LLM's next turn.

Stop: When the LLM finishes its work, detekt runs automatically. If the custom rules detect a throw in the domain layer or a Spring import in the application layer, the violation is surfaced immediately. The LLM sees the failure output and can self-correct.

This creates a feedback loop: write -> auto-format -> continue -> finish -> architecture check -> fix if violated. The LLM operates within the enforcement boundary in real time.

In Part 3, I proved that swapping the database changed zero files in domain or application. In Part 4, CQRS actually simplified the domain by removing read responsibilities that didn't belong there. Hooks guarantee those architectural properties hold not just for those one-time changes — but for every code change an LLM makes, continuously.


The enforcement stack

Here's the complete picture of how architecture rules are enforced across the stack:

| Layer                          | Mechanism                      | What it prevents                                |
|--------------------------------|--------------------------------|-------------------------------------------------|
| Gradle modules                 | Compile-time dependency graph  | domain importing from infrastructure            |
| ForbiddenLayerImportRule       | AST-level import whitelist     | Spring/JPA/framework imports in pure layers     |
| NoThrowOutsidePresentationRule | AST-level throw detection      | Exceptions in domain/application/infrastructure |
| Kover 80% threshold            | Line coverage gate             | Untested error paths                            |
| Agent constraints              | Prompt-level rules             | Agent writing code outside its layer            |
| Skills                         | Domain knowledge injection     | Agent using wrong patterns for its layer        |
| PostToolUse hook               | Auto-format on every write     | Style drift                                     |
| Stop hook                      | Detekt on every completion     | Architecture violations in final output         |

Each layer catches what the layer above might miss. The agent prompt says "no throw." If the LLM ignores it, detekt catches it. If detekt isn't run, Kover might catch the untested exception path. Defense in depth — the same principle that makes Clean Architecture resilient makes the LLM pipeline resilient.


The honest tradeoffs

This setup has real costs:

Initial investment is high. 18 agent definitions, 7 skills, 6 commands, 2 custom detekt rules, and a hook configuration. That's a significant upfront cost before a single feature is implemented through the pipeline.

Agent definitions need maintenance. When the project's patterns evolve — say, migrating from jOOQ to Exposed — every agent and skill that references jOOQ needs updating. The knowledge isn't centralized in one place; it's distributed across agent prompts and skill files.

Model costs add up. Running an Opus orchestrator that spawns multiple Sonnet agents per feature isn't cheap. For a solo developer on a side project, the cost-per-feature may not justify the automation. For a team shipping multiple features per sprint, the math changes.

The pipeline is only as good as the spec. An ambiguous or incomplete spec produces ambiguous or incomplete code. The automation amplifies whatever you feed it. Garbage spec, garbage implementation — faster.

What makes the cost worth it

The payoff is in the second feature, not the first. Once the agents, skills, and rules exist, adding a new feature follows the same path: write a spec, run /impl, review the output. The architectural guarantees from Parts 1-4 are maintained automatically. No code review needed to catch Spring imports in the domain layer. No manual checks for exception-based error handling. The pipeline enforces what the architecture demands.

For teams with multiple engineers (including AI agents) contributing to the same codebase, the enforcement stack is the difference between architecture that degrades over time and architecture that holds.


What this series built

Five articles. One architecture. Database migration, new protocols, CQRS optimization, LLM-generated code — each article applied a different pressure. The architecture held every time. Not because engineers were careful, but because the constraints were structural.

Clean Architecture was designed to make software maintainable by humans. It turns out the same constraints — explicit dependencies, typed errors, pure layers — are exactly what LLMs need to write code safely. The architecture didn't change. The developer did.

The full source is on GitHub: https://github.com/wakita181009/clean-architecture-kotlin/tree/v4
