
A living spec, a few guardrails, and an architecture the model isn't allowed to break.
Describe a feature in plain English — get it designed, built, tested, and reviewed across Android + iOS, with Clean Architecture enforced, not hoped for.
Here's the catch nobody warns you about when you generate features with an AI: each one looks fine on its own. The first screen is clean. So is the second. By the fifth, the app has two different state conventions, a repository quietly reaching into a ViewModel, and a folder of screens that don't agree on how anything is laid out. No single prompt was wrong. The codebase drifted anyway, because nothing held the architecture in place between prompts.
You don't fix that with a better prompt. You fix it by making the structure a constraint the model has to satisfy, not a suggestion it's free to reinterpret tomorrow.
That's what KMPilot is: a template I built to hold AI to exactly that. I put it to work on a real app: Kickoff26, a 2026 World Cup companion built feature by feature, design first. This post is what it taught me about keeping AI-generated code in shape.
Why AI-generated code drifts
Four things go wrong when you build an app one prompt at a time.
Patterns drift. Prompt one produces a UiState sealed class. Prompt five "improves" on it, and now two state conventions live in the same app.
Layers leak. The fastest path from A to B is often a shortcut through a layer that wasn't meant to know about the other. The model takes it, because the shortcut compiles.
Tests evaporate. They're the first casualty of "just make it work," and nobody notices until there's nothing left to run.
The design never matches the screen. A mockup has 24dp corners and a specific tinted header. What ships is close-ish. Multiply that across twenty screens and the app stops looking designed.
None of these are intelligence problems. They're memory and enforcement problems. The model has no durable record of how this codebase does things, and nothing stops it from doing them differently today.
The model was never the problem. Nothing was holding its work in place.
The pattern: spec-driven, applied to KMP
Spec-Driven Development isn't something I came up with. Tools like GitHub's spec-kit and OpenSpec have already made it popular for general codebases: write the spec first and make it the artifact the AI builds from, instead of planning in a chat that scrolls away. The idea is simple: put the contract in writing, keep it beside the code, make the code answer to it. What I did was aim it at one domain, Kotlin Multiplatform, with three parts:
- The architecture is the constraint. Clean Architecture, the same shape for every feature, and non-negotiable. Not by politeness, either: a hook physically blocks raw edits to feature code, so the model can't quietly reshape it. It executes inside the structure.
- The design is an input, not an afterthought. A screen begins as an approved mockup, and the mockup's tokens flow into the code.
- A living spec.md is the contract. One per feature, versioned, updated as the code changes. It's the memory the model reads before it touches anything.
Under the hood, KMPilot is a set of skills and agents running on top of Claude Code. That's what built every screen in Kickoff26.
How a feature actually gets built
Take the Matches tab: a group-stage browser plus a knockout bracket, Round of 32 to the Final. No single prompt built it. It came together as a short sequence of skills, each with one job, each leaving behind an artifact the next one reads.
It starts with the design, with /ui-designer. You describe the screen in plain language:
/ui-designer Matches — a tab that switches between a group-stage fixtures list, filterable by matchday and group, and a knockout bracket running from the Round of 32 to the Final
It drives Stitch (Google's AI design tool) over an MCP connection: it generates a mockup, and you refine it by talking to it (make the live badge red, tighten the bracket spacing) until it's right. Approving it is where the interesting part happens. The skill pulls the finished screen back from Stitch and runs it through a token-extraction script that lifts every color, radius, font, and spacing value straight from the design instead of eyeballing them, then downloads the exact icons and images the mockup uses. All of it lands in a blueprint: the design captured as explicit Compose instructions, down to a negative goal difference turning error red and the bracket's connectors becoming a Canvas. The screen leaves Stitch as a contract, not a screenshot, so the next step builds it token-for-token, not approximately.
The shipped Matches screen: Group Stage / Knockout, matchday and group filters, real flags and scores. Every color, radius, font, and spacing value came from the Stitch mockup, not from eyeballing it.
Then the build, with /creating-kmp-feature. This skill is the heart of the system. It already has the design, from the blueprint; what you hand it is the data contract, the API endpoint and the shape that comes back:
/creating-kmp-feature matches — fixtures from GET /get/games, where each game has home_team_id, away_team_id, local_date, stadium_id, matchday, type, finished and time_elapsed
From there it works in stages, stopping for your sign-off between each:
- It turns the request into a short PRD (what the feature does, its screens, its data, the edge cases), then waits.
- Once you approve, it breaks the PRD into discrete tasks (data layer, UI, wiring), then waits again.
- Only after that second confirmation does it hand the tasks to specialized agents that run in parallel, each owning one layer:
  - data: a serializable model for that JSON, the repository, the Ktor call
  - ui: the ViewModel, the Compose screen and its components
  - integration: dependency injection, navigation, the Gradle wiring
  - platform: per-platform code behind a shared interface, only when a feature reaches for a device capability (GPS, camera, biometrics)
Matches is plain network, so the first three covered it. Because each agent owns a separate layer, they never collide, and what comes back isn't a sketch you finish by hand. It's a complete, wired feature module, laid out the same way every feature is:
feature/matches/
├── data/
│ ├── model/MatchesDtos.kt # @Serializable, mirrors the API JSON
│ └── repository/ # MatchesRepository + Impl
├── presentation/
│ ├── MatchesViewModel.kt
│ ├── MatchesUiModel.kt # one state container
│ ├── ui/
│ │ ├── MatchesScreen.kt # screen + screen-root, nothing else
│ │ ├── MatchesUtils.kt
│ │ ├── motion/MatchesMotion.kt
│ │ └── components/ # 20 files — one composable each
│ │ ├── SegmentedControl.kt
│ │ ├── MatchCard.kt
│ │ ├── BracketColumn.kt
│ │ └── …
│ └── navigation/MatchesNavigation.kt
└── di/MatchesModules.kt # Koin module
Thirty-two files across data, presentation, and DI, every one generated from a single design and a single build command, laid out exactly like every other feature in the app. (Browse it on GitHub.)
Predictable structure means you review behavior, not boilerplate.
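To make "one state container" concrete, here is a minimal sketch of what a file like MatchesUiModel.kt could hold. Only dataState = Success and the Group Stage / Knockout tabs come from this post; DataState, MatchesTab, the filter fields, and the helper functions are assumptions for illustration, not the code in the repo.

```kotlin
// Hypothetical sketch: names and fields are illustrative, not copied from Kickoff26.

// Loading / success / error modelled once, so every screen reads it the same way.
sealed interface DataState {
    object Loading : DataState
    object Success : DataState
    data class Error(val message: String) : DataState
}

enum class MatchesTab { GROUP_STAGE, KNOCKOUT }

// The single state container the screen renders from.
data class MatchesUiModel(
    val dataState: DataState = DataState.Loading,
    val activeTab: MatchesTab = MatchesTab.GROUP_STAGE,
    val selectedMatchday: Int? = null, // null = no matchday filter
    val selectedGroup: String? = null, // null = no group filter
)

// State changes are pure copies of the previous state, which keeps them easy to test.
fun MatchesUiModel.selectTab(tab: MatchesTab): MatchesUiModel = copy(activeTab = tab)

fun MatchesUiModel.loaded(): MatchesUiModel = copy(dataState = DataState.Success)
```

With a single immutable container, a second state convention has nowhere to hide: any new field or state shows up as a diff to this one file.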
But the module is only half of what /creating-kmp-feature produces. Alongside it the skill writes a spec.md and stores it outside the feature tree, at .claude/docs/matches/spec.md, versioned and committed with the project. That spec is the feature's memory, structured rather than freeform: a metadata header (version, status, dates), the feature's goals and non-goals, a table of design decisions with the rationale and the alternatives rejected, and requirements written as GIVEN / WHEN / THEN scenarios. A trimmed slice of the Matches spec:
# Matches Specification
Version 1.2.2 · Status: Active · Updated 2026-06-16
## Design Decisions
| Decision | Choice | Rationale |
| --- | --- | --- |
| Re-filter, no refetch | cache games/teams, recompute in memory | a chip tap shouldn't hit the API |
## Requirement: Group Stage fixtures browsing
The system SHALL let users browse group-stage fixtures, filtered by matchday and group.
Scenario: filters narrow the list
- GIVEN dataState = Success and the Group Stage tab is active
- WHEN the user picks a different matchday or group chip
- THEN MatchesDto.dateSections MUST be recomputed from cached data
- AND if the result is empty, EmptyContent MUST render
## Last Updated
- 2026-06-16 v1.2.2 Knockout tab: render the real 16/8/4/2/1 bracket
- 2026-06-15 v1.2.1 Live detection: API field is "live", not "playing"
- 2026-06-15 v1.2.0 Add Persian (fa) locale — 22 strings translated
That's the short version. The full matches/spec.md lives in the repo. The version number and the dated changelog make each feature's history trackable at a glance.
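The "Re-filter, no refetch" decision and its scenario can be sketched in a few lines. This is a rough illustration under stated assumptions, not the feature's actual code: Game, DateSection, MatchesFilter, and the group field are hypothetical names, and localDate is assumed to be a string that sorts chronologically.

```kotlin
// Hypothetical sketch: Game, DateSection, and the group field are illustrative names.

data class Game(
    val id: Int,
    val localDate: String, // assumed to sort chronologically as a plain string
    val matchday: Int,
    val group: String,     // assumed field; the post does not show how groups are encoded
)

data class DateSection(val date: String, val games: List<Game>)

// Holds the games fetched once; every chip tap re-filters this list in memory.
class MatchesFilter(private val cachedGames: List<Game>) {

    // null means "no filter" for that dimension.
    fun dateSections(matchday: Int?, group: String?): List<DateSection> =
        cachedGames
            .filter { matchday == null || it.matchday == matchday }
            .filter { group == null || it.group == group }
            .groupBy { it.localDate }
            .entries
            .sortedBy { it.key }
            .map { DateSection(it.key, it.value) }
}

// An empty result is what would trigger the spec's EmptyContent branch.
fun shouldShowEmptyContent(sections: List<DateSection>): Boolean = sections.isEmpty()
```

Nothing in the class can reach the network, so the "a chip tap shouldn't hit the API" rationale holds by construction rather than by convention.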
Changing it later, with /modifying-kmp-feature. Once a feature exists you never hand-edit it; you describe the change:
/modifying-kmp-feature matches — add a "Live" filter chip that shows only in-progress matches
It reads the spec first, plans the change against the decisions already recorded there, and edits the feature through the same agents, never by hand: a hook physically blocks raw edits to files under feature/. When it's done it writes the spec back: a new version, and a fresh dated line in that changelog. Because it starts from the spec, it builds on the existing design instead of relitigating it, and the code and the spec are never updated apart, so neither drifts from the other.
Asking the model nicely isn't enforcement. Blocking the write is.
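The rule behind that hook is small enough to sketch. KMPilot's real hook is not shown in this post, so everything below is an assumption: the Verdict type, the function names, the idea that a "feature" directory segment marks protected code, and the viaFeatureAgent flag (how the real hook tells an agent edit from a raw one isn't described here).

```kotlin
// Hypothetical sketch of the rule only; KMPilot's actual hook is not shown in this post.

enum class Verdict { ALLOW, BLOCK }

// Normalise separators and resolve "." / ".." so "x/../feature/..." can't slip past the check.
fun normalise(path: String): List<String> {
    val parts = mutableListOf<String>()
    for (segment in path.replace('\\', '/').split('/')) {
        when (segment) {
            "", "." -> Unit
            ".." -> if (parts.isNotEmpty()) parts.removeAt(parts.lastIndex)
            else -> parts.add(segment)
        }
    }
    return parts
}

// Block a write when the target file sits inside a directory named "feature",
// unless the edit came through the skill's agents (passed in as a flag here).
fun checkWrite(path: String, viaFeatureAgent: Boolean): Verdict {
    val directories = normalise(path).dropLast(1) // drop the file name itself
    val underFeature = "feature" in directories
    return if (underFeature && !viaFeatureAgent) Verdict.BLOCK else Verdict.ALLOW
}
```

The point of the sketch is the shape of the decision: it depends on the path and the route the edit took, never on what the model was told in the prompt.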
The remaining skills are gates, run the same way. /verify-ui matches re-checks the built screen against the design tokens, /feature-test matches writes the test suite (fixtures, repository, ViewModel, UI, and an end-to-end pass), and /feature-review matches audits the result against the architecture rules. Any of them can hand the work back. That's the core loop; the full skill catalog covers the rest.
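As an illustration of what a gate like /verify-ui has to decide, here is a minimal sketch of a token comparison. It assumes tokens can be represented as name-to-value strings; the token names, TokenMismatch, diffTokens, and gatePasses are all hypothetical and say nothing about how the real skill stores or checks tokens.

```kotlin
// Hypothetical sketch: token names and the string-valued representation are assumptions.

data class TokenMismatch(val token: String, val expected: String, val actual: String?)

// Compare tokens taken from the approved design with the values found in the built screen.
// A token missing from the build is reported with actual = null.
fun diffTokens(design: Map<String, String>, built: Map<String, String>): List<TokenMismatch> =
    design.entries
        .filter { (name, expected) -> built[name] != expected }
        .map { (name, expected) -> TokenMismatch(name, expected, built[name]) }

// The gate passes only when nothing differs; otherwise the work goes back.
fun gatePasses(mismatches: List<TokenMismatch>): Boolean = mismatches.isEmpty()
```

A check like this turns "close-ish" into a list of named differences, which is what lets a gate hand the work back with something specific to fix.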
The honest part
I didn't trust this at first. For weeks I half-expected to open the project and find the usual AI sprawl: three ways of doing the same thing, a layer quietly leaking into another, tests I'd end up writing myself anyway. It never showed up. The closest I came to a mess was my own: I'd let each feature keep its own copy of the network layer, and by the fourth one the duplication was impossible to ignore. That's normally the kind of cleanup you keep putting off, because it touches everything. Here I described the change once, the specs told me exactly what each feature had decided and why, and it was done in an afternoon without breaking a thing.
It's young: Kickoff26 is still marked "under development," and KMPilot has rough edges I haven't sanded. But after months of watching AI-assisted codebases rot in fast-forward, the part I keep coming back to is that this one hasn't. Nothing drifted. That was the whole point.
Try it
If you write Kotlin Multiplatform and you've watched a codebase drift under AI-generated code, the pattern is worth stealing even without the template. Make the architecture a constraint. Make the design an input. Give the model a living spec to read.
KMPilot is the version I actually use, MIT-licensed, one command to start:
curl -fsSL https://raw.githubusercontent.com/ThisIsSadeghi/KMPilot/main/install.sh \
| bash -s <MyApp> <com.acme.myapp>
Repo and the full pipeline: github.com/ThisIsSadeghi/KMPilot. If the idea resonates, a star is the cheapest way to tell me to keep building it.

