DEV Community

Stoyan Minchev

I built a 126K-line Android app with AI — here is the workflow that actually works

Most developers trying AI coding tools hit the same wall. They open a chat, type "build me a todo app," get something that looks right, and then spend 3 hours fixing the mess. They try again with a bigger project and it falls apart faster. They conclude AI coding is overhyped.

I had the same experience. Then I changed my approach — not the tool, the process around it.

Over 4 months I built How Are You?!, a safety-critical Android app that monitors elderly people living alone. 126,000 lines of Kotlin. 144 versions. 130 test files. 3 languages. Solo developer with zero Kotlin experience when I started. The entire codebase was AI-generated — I never wrote Kotlin manually.

This article is not about the app. It is about the workflow that made this possible.

Why most people fail with AI coding

Two reasons:

  1. Expectations are wrong. People expect to describe a feature in plain English and get production code. That works for a function. It does not work for a system. AI is not a replacement for engineering — it is an amplifier. If your input is vague, the output is vague.

  2. No structure around the AI. They open a chat, prompt, get code, paste it, prompt again. There is no architecture. No shared context. No accumulated knowledge. Every conversation starts from zero.

The fix is not better prompting. It is better engineering process — with the AI as a participant.

Step 1: Architecture before code (BMAD)

Before writing a single line of code, I used BMAD (a structured methodology for AI-assisted development) to create:

  • Product Requirements Document — what the app does, who it is for, what the constraints are
  • Architecture document — module boundaries, layer responsibilities, error handling patterns, data flow
  • Project context — coding standards, naming conventions, DO/DON'T lists

This took about a week. It felt slow. It was the most valuable week of the entire project.

Why? Because every conversation with the AI after that point had a shared foundation. The AI was not guessing what my app looked like — it knew. Module boundaries were defined. Error handling was standardized. The AI could generate code that fit into a real system because the system was documented.

Without architecture docs, AI generates code that looks correct in isolation but conflicts with everything else. You spend all your time merging inconsistent outputs instead of building features.

Step 2: CLAUDE.md — the constitution

Claude Code loads a CLAUDE.md file from your project root at the start of every conversation. This is the most important file in my repository.

Mine contains:

  • Module boundaries enforced by Gradle (which module can import what)
  • Core patterns (all use cases return Result<T>, ViewModels expose StateFlow, never GlobalScope)
  • Critical DON'Ts — a condensed list of rules that came from production bugs
  • Subsystem quick reference — a table pointing to detailed rules for each area (AlarmManager, sensors, AI, email, billing, GPS, permissions)

Every rule in that file exists because I violated it once and something broke. The file grows with the project.

This is the key insight: CLAUDE.md turns one-time lessons into permanent constraints. The AI never forgets a rule I put there. I forget constantly.
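The `Result<T>` convention above can be sketched in a few lines of plain Kotlin. The class name and the alert-sending lambda here are hypothetical, invented for illustration — not code from the actual app:

```kotlin
// Sketch of the "all use cases return Result<T>" rule from CLAUDE.md.
// SendAlertUseCase is a made-up example; the real app's use cases differ.
class SendAlertUseCase(private val send: (String) -> Unit) {
    // runCatching converts any thrown exception into Result.failure,
    // so callers never need try/catch at the call site.
    operator fun invoke(message: String): Result<Unit> = runCatching {
        require(message.isNotBlank()) { "alert message must not be blank" }
        send(message)
    }
}
```

Callers branch on `isSuccess`/`isFailure` (or `fold`) instead of catching exceptions, which keeps error handling uniform across every module the AI touches.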

Step 3: Living documentation with start/stop commands

I built custom slash commands that bookend every development session:

/howareyou-start — loads the developer briefing, critical rules, release notes, and current version. The AI reads everything before I write a single prompt. It takes 30 seconds and prevents 80% of the mistakes I used to make.

/howareyou-stop — updates release notes, archives old entries, updates CRITICAL_DONTS.md with any new lessons, updates the developer briefing, bumps the version, commits, and pushes.

The documentation is never stale because updating it is part of the release process, not a separate task. I do not update docs manually. The AI does it as part of shipping.

This creates a flywheel: better docs -> better AI output -> fewer bugs -> lessons captured -> better docs.
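Claude Code custom slash commands are plain Markdown files under `.claude/commands/`, where the file body becomes the prompt. A start command might look roughly like this — the file contents and document paths are illustrative, not the author's actual setup:

```markdown
<!-- .claude/commands/howareyou-start.md (illustrative sketch) -->
Before doing anything else, read:
- docs/DEVELOPER_BRIEFING.md
- docs/CRITICAL_DONTS.md
- docs/RELEASE_NOTES.md (latest entry only)

Report the current version from the build file, then summarize
any open items from the briefing in five bullets or fewer.
```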

Step 4: Concrete technical specs

When I need a new feature, I do not say "add travel detection." I use BMAD's tech spec workflow to produce a document that specifies:

  • Exact state machine (HOME -> DAY_1 -> TRAVELING -> TRIP_ENDED)
  • Database schema changes (table names, column types, indexes)
  • Which existing classes are affected and how
  • Edge cases and error handling
  • What tests to write

The spec is 2-5 pages. Writing it takes 30 minutes with BMAD's guided conversation. It saves hours of back-and-forth with the AI during implementation and eliminates the "it generated something but it does not fit" problem.

The rule: if I cannot describe the feature precisely enough for a spec, I am not ready to build it. I brainstorm first (also with the AI), then spec, then build.
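As a sketch, the state machine in such a spec can reduce to an enum and a pure transition function. The transition rules below are my assumptions for illustration — the spec's actual rules are more detailed:

```kotlin
// Illustrative travel-detection state machine (HOME -> DAY_1 -> TRAVELING -> TRIP_ENDED).
// Transition conditions here are guesses for demonstration, not the app's real logic.
enum class TravelState { HOME, DAY_1, TRAVELING, TRIP_ENDED }

// Advance the state once per day based on whether the user is away from home.
fun next(state: TravelState, awayFromHome: Boolean): TravelState = when (state) {
    TravelState.HOME       -> if (awayFromHome) TravelState.DAY_1 else TravelState.HOME
    TravelState.DAY_1      -> if (awayFromHome) TravelState.TRAVELING else TravelState.HOME
    TravelState.TRAVELING  -> if (awayFromHome) TravelState.TRAVELING else TravelState.TRIP_ENDED
    TravelState.TRIP_ENDED -> TravelState.HOME
}
```

Writing the spec at this level of precision means the AI implements the transitions you defined, not the ones it guessed.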

Step 5: Brainstorming sessions

I use BMAD brainstorming for everything — not just code. Pricing strategy. UX decisions. Marketing approaches. Whether to support SMS notifications or stick with email.

The pattern: open a session, describe the problem, let the AI challenge my assumptions. I keep the transcripts. Some of my best architectural decisions came from brainstorming sessions where the AI pointed out an edge case I had not considered.

Step 6: Automated audits that run weekly

My app has to survive Android OEM battery killers (Samsung, Xiaomi, Honor, OPPO — they all kill background apps differently). These OEMs ship updates constantly that can break my compatibility layer.

I built two audit commands, plus a third that combines them:

/howareyou-oem-audit — searches the web for recent OEM changelog entries and breaking changes, then scans my codebase for affected areas and proposes fixes.

/howareyou-gps-audit — does the same for GPS and location API changes (FusedLocationProvider updates, OEM GPS power management changes).

/howareyou-full-audit — runs both in parallel and produces a combined report with a prioritized action plan.

I run these weekly. They have caught breaking changes before they hit my users — Samsung silently resetting battery optimization exemptions after OTA updates, Honor changing wakelock tag whitelisting behavior, Google deprecating location API parameters.

This is the kind of thing that would take a human developer hours of manual searching. The AI does it in minutes and maps the findings directly to my source code.

Step 7: One-command publishing

/howareyou-build-test    → builds signed release AAB
/howareyou-publish-testingMode  → uploads to Google Play internal + closed testing

From "the code is ready" to "testers have the update" in under 5 minutes, without leaving the terminal. No browser, no Play Console clicking.

Step 8: Infrastructure monitoring

I use 6 Google Cloud projects for Gemini API key rotation (each project gets 10K free requests/day — 60K total). Things break. Billing gets disabled. Keys expire.

/howareyou-monitor — checks all 6 shards, reports which are healthy, which failed, and why.

/howareyou-fix-billing — automatically re-links disabled shards to the shared billing account.

These are not development tasks. They are operational tasks that I handle from the same terminal where I write code.
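The key-rotation idea can be sketched in plain Kotlin. The quota figure comes from the article (10K requests/day per project); the rotation logic itself is a minimal assumption of mine, not the app's implementation:

```kotlin
// Minimal sketch of rotating API keys across quota-limited "shards".
// KeyRotator and its strategy are illustrative, not the app's actual code.
class KeyRotator(private val keys: List<String>, private val dailyQuota: Int = 10_000) {
    private val used = IntArray(keys.size)

    // Return the next key with remaining quota, or null if every shard is exhausted.
    fun nextKey(): String? {
        val i = used.indices.firstOrNull { used[it] < dailyQuota } ?: return null
        used[i]++
        return keys[i]
    }
}
```

A real version would also persist counters across process restarts and reset them at the provider's quota boundary.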

Step 9: Code reviews with a second model

After implementing a feature, I run a code review using BMAD's adversarial review workflow. It is configured to find 3-10 specific problems in every review — it never says "looks good." It checks:

  • Architecture compliance (are module boundaries respected?)
  • Test coverage (are edge cases tested?)
  • Security (any hardcoded keys? SQL injection? XSS?)
  • Performance (unnecessary allocations? missing indexes?)
  • Consistency with project patterns

This catches things I miss because I have been staring at the code for hours. The adversarial framing is important — a review that always approves is useless.

Step 10: Lessons learned as a living document

Every production bug becomes a rule in CRITICAL_DONTS.md. The file is organized by subsystem:

  • AlarmManager: never call setAlarmClock() more than 3x/day (Honor flags you)
  • Sensor: always flush FIFO and discard stale readings (Honor rebases timestamps)
  • Email: per-recipient sends, never batch (Resend delivery tracking breaks)
  • GPS: full priority fallback chain, never trust a single getCurrentLocation() call

There are 50+ rules in that file. Each one has a version number (when it was added) and a rationale (why it matters). The AI reads this file at the start of every session via the /howareyou-start command.
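The GPS rule — never trust a single `getCurrentLocation()` call — amounts to a priority fallback chain. Here is a framework-free sketch; the priority names mirror Google's `Priority` constants, and the request lambda stands in for a real `FusedLocationProviderClient` call:

```kotlin
// Framework-free sketch of a GPS priority fallback chain.
// The enum mirrors Google's Priority constants; the request function is a
// stand-in for FusedLocationProviderClient.getCurrentLocation().
enum class LocationPriority { HIGH_ACCURACY, BALANCED_POWER_ACCURACY, LOW_POWER, PASSIVE }

data class Fix(val lat: Double, val lon: Double)

// Try each priority in descending order; return the first fix, or null if all fail.
fun locateWithFallback(request: (LocationPriority) -> Fix?): Fix? =
    LocationPriority.values().firstNotNullOfOrNull(request)
```

The point of encoding it this way is that the fallback order lives in one place, so the AI cannot regenerate a code path that quietly depends on a single high-accuracy call succeeding.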

This is the most underrated part of the workflow. Most developers keep lessons in their head. Heads forget. Files do not.

The daily workflow

Here is what a typical development day looks like:

  1. /howareyou-start — AI loads all context (30 seconds)
  2. Describe the task — with a tech spec if it is a feature, or a bug description if it is a fix
  3. AI implements — I review the diff, run tests
  4. Iterate — usually 1-3 rounds
  5. /howareyou-stop — docs updated, version bumped, committed, pushed
  6. /howareyou-publish-testingMode — testers have the update

I ship multiple versions per day with this flow. Not because I rush — because the overhead between "code works" and "testers have it" is near zero.

What this is NOT

  • It is not "no-code". If you know the language, review the output and correct it where needed; over time, those small fixes become rarer. You still need to understand the architecture and make the design decisions yourself.
  • It is not effortless. The workflow took months to build. The documentation is extensive.
  • It is not magic. The AI makes mistakes. The difference is that mistakes are caught by the process (tests, reviews, rules, audits) instead of by users.

The numbers

  • 126,000 lines of Kotlin across 398 files
  • 45,000 lines of tests across 130 files
  • 144 versions shipped
  • 3 languages (English, Bulgarian, German)
  • 50+ production lessons captured in CRITICAL_DONTS.md
  • 14 months from zero Kotlin experience to production app on Google Play
  • 9 custom commands automating the full development lifecycle
  • 0 lines of Kotlin written manually by me

The takeaway

AI coding tools are not magic code generators. They are force multipliers for engineering process. If your process is "open chat, type prompt, hope for the best," you will be disappointed.

If your process is "document the architecture, define the rules, automate the lifecycle, capture every lesson, review everything adversarially" — the AI becomes unreasonably effective.

The investment is not in better prompts. It is in better engineering.


The app is How Are You?! — AI safety monitoring for elderly parents. It will be released soon. The code workflow described here uses Claude Code with BMAD. Both are tools I use daily and genuinely recommend.
