I spent months building a complete document scanner app in Kotlin with Jetpack Compose. 110 files. 21,000+ lines of code. Along the way I hit problems that no tutorial prepared me for. Here are the hard parts and how I solved them.
1. CameraX Frame Stability Detection
The "auto-capture" feature sounds simple: detect when the document is steady and snap. In reality, you need frame-to-frame stability analysis. My approach: compute the root-mean-square (RMS) difference between consecutive preview frames. If the RMS stays below a threshold for N consecutive frames, the document is stable; any frame above the threshold resets the count. The key insight: sample every 10th pixel. Processing every pixel kills frame rate. Sampling keeps roughly 95% of the accuracy at about 10% of the cost.
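Here is a minimal sketch of that idea in plain Kotlin. It assumes each frame arrives as a tightly packed luminance buffer (for example, a copy of the Y plane with row padding removed). The names `sampledRms` and `StabilityDetector`, the threshold of 4.0, and the 15-frame window are hypothetical placeholders, not values from the app.

```kotlin
import kotlin.math.sqrt

/** RMS difference between two same-sized luminance buffers, reading every [step]th byte. */
fun sampledRms(prev: ByteArray, curr: ByteArray, step: Int = 10): Double {
    require(prev.size == curr.size) { "Frames must have the same size" }
    require(step > 0) { "step must be positive" }
    if (prev.isEmpty()) return 0.0
    var sumSq = 0.0
    var count = 0
    var i = 0
    while (i < prev.size) {
        // Bytes are signed in Kotlin; mask to recover the 0..255 luminance value.
        val d = (prev[i].toInt() and 0xFF) - (curr[i].toInt() and 0xFF)
        sumSq += (d * d).toDouble()
        count++
        i += step
    }
    return sqrt(sumSq / count)
}

/** Reports "stable" once enough consecutive frames differ by less than [threshold]. */
class StabilityDetector(
    private val threshold: Double = 4.0,   // hypothetical value, tune per device
    private val requiredFrames: Int = 15,  // hypothetical value for "N"
    private val step: Int = 10
) {
    private var previous: ByteArray? = null
    private var stableCount = 0

    fun onFrame(luma: ByteArray): Boolean {
        val prev = previous
        previous = luma.copyOf() // camera buffers are usually reused, so keep a copy
        if (prev == null || prev.size != luma.size) {
            stableCount = 0
            return false
        }
        // One noisy frame resets the streak.
        stableCount = if (sampledRms(prev, luma, step) < threshold) stableCount + 1 else 0
        return stableCount >= requiredFrames
    }
}
```

One caveat with a fixed stride over a flat buffer: if the row width is a multiple of the step, the samples line up in vertical columns. That is usually fine for motion detection, but a step that does not divide the width spreads the samples more evenly.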
2. Invisible OCR Text Layer in PDFs
ML Kit gives you OCR text, but positioning it correctly inside a PDF so it is selectable but invisible? That is where tutorials stop and real engineering begins. ML Kit returns bounding boxes for each text block. You map those coordinates from image space to PDF page space, then draw the text with zero opacity at the exact positions. Result: a PDF that looks like a scanned image but has fully searchable, selectable text underneath.
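The coordinate mapping can be sketched as a pure function. This version assumes the scanned image is scaled uniformly to fit the page and centered, and that the PDF canvas uses a top-left origin measured in points. `Box` and `imageBoxToPage` are hypothetical names. How the invisible text is then drawn depends on the PDF library and is not shown here.

```kotlin
/** Axis-aligned rectangle; units are image pixels or PDF points depending on context. */
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

/**
 * Maps an OCR bounding box from image pixels to page points, assuming the image
 * is drawn with uniform "fit" scaling and centered on the page.
 */
fun imageBoxToPage(
    box: Box,
    imageWidth: Int,
    imageHeight: Int,
    pageWidth: Float,
    pageHeight: Float
): Box {
    require(imageWidth > 0 && imageHeight > 0) { "Image dimensions must be positive" }
    // Uniform scale so the whole image fits on the page without distortion.
    val scale = minOf(pageWidth / imageWidth, pageHeight / imageHeight)
    // Leftover space is split evenly on both sides (letterboxing).
    val offsetX = (pageWidth - imageWidth * scale) / 2f
    val offsetY = (pageHeight - imageHeight * scale) / 2f
    return Box(
        left = offsetX + box.left * scale,
        top = offsetY + box.top * scale,
        right = offsetX + box.right * scale,
        bottom = offsetY + box.bottom * scale
    )
}
```

The height of the mapped box is a reasonable starting point for the text size, so the hidden text covers about the same area as the printed words and selection highlights land in the right place.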
3. Image Enhancement Without OpenCV
Most tutorials say "just use OpenCV." But adding OpenCV means 30MB+ of native libraries. For a scanner app, you only need a few operations. I implemented Otsu's thresholding for B&W conversion and convolution kernel sharpening in pure Kotlin. No native libraries. Works on every device.
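For reference, Otsu's method fits in a few dozen lines. This sketch works on an array of grayscale values in the 0..255 range. Converting a bitmap to grayscale is left out, and `otsuThreshold` and `binarize` are hypothetical names rather than the app's actual functions.

```kotlin
/** Returns the Otsu threshold for 0..255 grayscale values: pixels above it count as foreground. */
fun otsuThreshold(gray: IntArray): Int {
    require(gray.isNotEmpty()) { "Need at least one pixel" }
    val hist = IntArray(256)
    for (v in gray) hist[v.coerceIn(0, 255)]++

    val total = gray.size
    var sumAll = 0.0
    for (i in 0..255) sumAll += i.toDouble() * hist[i]

    var sumBg = 0.0
    var weightBg = 0
    var bestVariance = -1.0
    var best = 0
    for (t in 0..255) {
        weightBg += hist[t]
        if (weightBg == 0) continue          // no background pixels yet
        val weightFg = total - weightBg
        if (weightFg == 0) break             // everything is background from here on
        sumBg += t.toDouble() * hist[t]
        val meanBg = sumBg / weightBg
        val meanFg = (sumAll - sumBg) / weightFg
        // Between-class variance (unnormalized); the best threshold maximizes it.
        val between = weightBg.toDouble() * weightFg * (meanBg - meanFg) * (meanBg - meanFg)
        if (between > bestVariance) {
            bestVariance = between
            best = t
        }
    }
    return best
}

/** Maps each pixel to 0 (black) or 255 (white) using the given threshold. */
fun binarize(gray: IntArray, threshold: Int): IntArray =
    IntArray(gray.size) { if (gray[it] > threshold) 255 else 0 }
```

A single global threshold works best on evenly lit pages. Scans with strong shadows may need the threshold computed per tile instead, which is a variation on the same histogram loop.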
4. RevenueCat Paywall with A/B Testing
I implemented 4 paywall variants controlled by Firebase Remote Config: comparison table, demo video, single price (no choice paralysis), and urgency countdown. The A/B test selection happens at app launch, and RevenueCat handles the actual subscription logic. This separation means you can test pricing psychology without touching payment code.
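The selection step can be as small as mapping a remote string to an enum with a safe fallback. In this sketch the string keys and the default variant are hypothetical. In the app the raw value would come from Firebase Remote Config (for example via `getString` on a parameter of your choosing), which is left out so the snippet runs on its own.

```kotlin
/** The four paywall layouts; the string keys are illustrative, not the app's real values. */
enum class PaywallVariant(val key: String) {
    COMPARISON_TABLE("comparison_table"),
    DEMO_VIDEO("demo_video"),
    SINGLE_PRICE("single_price"),
    URGENCY_COUNTDOWN("urgency_countdown");

    companion object {
        /**
         * Resolves the remote value to a variant. Unknown, blank, or missing
         * values fall back to [default] so a bad config never breaks the paywall.
         */
        fun fromRemoteValue(
            raw: String?,
            default: PaywallVariant = COMPARISON_TABLE
        ): PaywallVariant {
            val normalized = raw?.trim()?.lowercase()
            return values().firstOrNull { it.key == normalized } ?: default
        }
    }
}
```

Resolving the variant once at launch and passing it down keeps the UI choice independent of the purchase flow: the paywall screen only switches layout on the enum, and the purchase call stays the same for every variant.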
5. PDF Encryption Done Right
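AES-256-GCM encryption with PBKDF2 key derivation at 120K iterations. Not 1000. Not 10000. 120K, a figure OWASP has recommended for PBKDF2-HMAC-SHA512. The recommended count depends on the hash function and has risen over time, so check the current OWASP Password Storage Cheat Sheet before copying the number.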
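A minimal sketch of that scheme using only the standard `javax.crypto` APIs. It treats the PDF as an opaque byte array and writes salt, IV, and ciphertext into one blob. The `PdfCrypto` object, the blob layout, the 16-byte salt, and the choice of HMAC-SHA512 are assumptions, since the post does not spell them out. Only AES-256-GCM, PBKDF2, and the 120K iteration count come from the text.

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.PBEKeySpec
import javax.crypto.spec.SecretKeySpec

object PdfCrypto {
    private const val ITERATIONS = 120_000 // from the post
    private const val SALT_BYTES = 16      // assumed
    private const val IV_BYTES = 12        // standard GCM nonce size
    private const val TAG_BITS = 128

    private fun deriveKey(password: CharArray, salt: ByteArray): SecretKeySpec {
        val spec = PBEKeySpec(password, salt, ITERATIONS, 256)
        try {
            // The HMAC-SHA512 variant is an assumption; the post only says "PBKDF2".
            val factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512")
            return SecretKeySpec(factory.generateSecret(spec).encoded, "AES")
        } finally {
            spec.clearPassword() // wipes the spec's internal copy of the password
        }
    }

    /** Returns salt + IV + ciphertext (the GCM tag is appended to the ciphertext). */
    fun encrypt(plain: ByteArray, password: CharArray): ByteArray {
        val random = SecureRandom()
        val salt = ByteArray(SALT_BYTES).also { random.nextBytes(it) }
        val iv = ByteArray(IV_BYTES).also { random.nextBytes(it) } // fresh IV per file
        val cipher = Cipher.getInstance("AES/GCM/NoPadding")
        cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt), GCMParameterSpec(TAG_BITS, iv))
        return salt + iv + cipher.doFinal(plain)
    }

    /** Throws if the password is wrong or the blob was modified (GCM tag mismatch). */
    fun decrypt(blob: ByteArray, password: CharArray): ByteArray {
        val header = SALT_BYTES + IV_BYTES
        require(blob.size >= header + TAG_BITS / 8) { "Blob too short" }
        val salt = blob.copyOfRange(0, SALT_BYTES)
        val iv = blob.copyOfRange(SALT_BYTES, header)
        val cipher = Cipher.getInstance("AES/GCM/NoPadding")
        cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt), GCMParameterSpec(TAG_BITS, iv))
        return cipher.doFinal(blob, header, blob.size - header)
    }
}
```

Storing the salt and IV next to the ciphertext is safe because neither is secret. What matters is that both are freshly random for every file, so the same password never reuses a key and IV pair.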
The Stack
Kotlin 2.0, Jetpack Compose, Material 3, CameraX 1.4, ML Kit (Document Scanner + Text Recognition + Barcode), Room 2.6, Hilt DI, RevenueCat 8.3, Firebase (Analytics + Crashlytics + Remote Config), WorkManager, Coil, Lottie.
What I Learned
Building the app was the fun part. The real challenge? Getting anyone to look at it. I packaged the entire codebase as a template on Gumroad so other Android devs can skip the 3-month build and ship in days.
If you are building a scanner app, a document management tool, or anything with CameraX + ML Kit, this might save you real time.
What is the hardest "simple-sounding" feature you have had to build? Drop it in the comments.