<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stoyan Minchev</title>
    <description>The latest articles on DEV Community by Stoyan Minchev (@stoyan_minchev).</description>
    <link>https://dev.to/stoyan_minchev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837616%2Fd4e8aa4b-1bf2-4fba-8492-c70bffc2e5f8.jpg</url>
      <title>DEV Community: Stoyan Minchev</title>
      <link>https://dev.to/stoyan_minchev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stoyan_minchev"/>
    <language>en</language>
    <item>
      <title>A single Kotlin lambda silently broke my app for 21 hours - and I only found the bug because someone crossed a border</title>
      <dc:creator>Stoyan Minchev</dc:creator>
      <pubDate>Sun, 12 Apr 2026 20:38:19 +0000</pubDate>
      <link>https://dev.to/stoyan_minchev/a-single-kotlin-lambda-silently-broke-my-app-for-21-hours-and-i-only-found-the-bug-because-3mgj</link>
      <guid>https://dev.to/stoyan_minchev/a-single-kotlin-lambda-silently-broke-my-app-for-21-hours-and-i-only-found-the-bug-because-3mgj</guid>
      <description>&lt;p&gt;I build a safety-critical Android app that monitors elderly people living alone. It watches their phone 24/7 — motion, GPS, screen activity — and emails their family when something looks wrong. No buttons to press, no wearable to charge. Install it on grandma's phone and forget about it.&lt;/p&gt;

&lt;p&gt;I took a trip from Bulgaria to Romania in early April to test the app in real conditions and have a small vacation with my family. I drove across the Danube at the Vidin - Calafat bridge. Everything was working fine. Then at 14:55, the app went completely silent.&lt;/p&gt;

&lt;p&gt;Not crashed. Not killed by the OS. Silent.&lt;/p&gt;

&lt;p&gt;For the next &lt;strong&gt;21 hours and 42 minutes&lt;/strong&gt;, the motion sensor recorded 682 events. The GPS hardware was acquiring satellite fixes with 11-meter accuracy. The app was running, awake, doing its job. But not a single location reached the database.&lt;/p&gt;

&lt;p&gt;The next morning, the AI looked at the last known position — a border crossing — and the 12-hour data gap, and did what it was designed to do: it sent an URGENT alert. Except I was fine. I was in Craiova, 200km away, sleeping in a hotel. The alert was anchored to a stale coordinate from the previous afternoon.&lt;/p&gt;

&lt;p&gt;I spent two days tracing this. The root cause was one line of Kotlin.&lt;/p&gt;

&lt;h2&gt;
  
  
  The interface that lies to you
&lt;/h2&gt;

&lt;p&gt;Android's &lt;code&gt;Geocoder&lt;/code&gt; class converts GPS coordinates into street addresses. On API 33+, there's an async callback version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="n"&gt;geocoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getFromLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;longitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;addresses&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="c1"&gt;// do something with the result&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That trailing lambda is Kotlin's SAM (Single Abstract Method) conversion. It looks clean. It compiles. It works perfectly — until it doesn't.&lt;/p&gt;

&lt;p&gt;The interface behind this lambda is &lt;code&gt;Geocoder.GeocodeListener&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;GeocodeListener&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onGeocode&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@NonNull&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Address&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@Nullable&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;errorMessage&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See that second method? &lt;code&gt;onError&lt;/code&gt; has a &lt;strong&gt;default empty implementation&lt;/strong&gt;. When you use a SAM lambda, Kotlin only implements the single abstract method — &lt;code&gt;onGeocode&lt;/code&gt;. The default &lt;code&gt;onError&lt;/code&gt; stays empty.&lt;/p&gt;

&lt;p&gt;So what happens when geocoding fails? Network timeout. No roaming data after crossing a border. Play Services killed by the OEM battery manager. Any of a dozen things that go wrong on real Android devices in real countries.&lt;/p&gt;

&lt;p&gt;The framework calls &lt;code&gt;onError()&lt;/code&gt;. The empty default runs. Nothing happens. &lt;strong&gt;The continuation is never resumed. The coroutine hangs forever.&lt;/strong&gt;&lt;/p&gt;
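
&lt;p&gt;The trap can be reproduced without Android. What follows is a minimal sketch, not the real framework code: &lt;code&gt;LookupListener&lt;/code&gt; and &lt;code&gt;lookup()&lt;/code&gt; are hypothetical stand-ins for &lt;code&gt;GeocodeListener&lt;/code&gt; and the geocoder call, written as a Kotlin &lt;code&gt;fun interface&lt;/code&gt; so the same lambda syntax applies.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Hypothetical stand-in for a listener whose error hook defaults to a no-op.
fun interface LookupListener {
    fun onResult(value: String)
    fun onError(message: String?) {} // default implementation swallows the error
}

// Hypothetical callback API: reports either a result or an error, never both.
fun lookup(fail: Boolean, listener: LookupListener) {
    if (fail) listener.onError("network unreachable") else listener.onResult("Main Street 1")
}

fun main() {
    val received = mutableListOf&amp;lt;String&amp;gt;()
    // SAM-converted lambda: only onResult is implemented.
    val listener = LookupListener { received += it }
    lookup(fail = false, listener = listener)
    lookup(fail = true, listener = listener) // the error lands in the empty default
    println(received) // prints [Main Street 1]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The failed lookup leaves no trace at all: no exception, no log line, no entry in the list. Anything waiting on that callback waits forever.&lt;/p&gt;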

&lt;h2&gt;
  
  
  Why it killed everything, not just geocoding
&lt;/h2&gt;

&lt;p&gt;If the geocoder had hung in isolation, it would have been a minor bug — one address lookup fails, you move on. But my code looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="n"&gt;processLocationMutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withLock&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reverseGeocode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;longitude&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// hangs here&lt;/span&gt;
    &lt;span class="nf"&gt;insertLocationData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;processLocationMutex&lt;/code&gt; exists for a good reason. Four independent systems can trigger a GPS write at the same time — the stillness detector, the periodic scheduler, the force probe, and the area stability detector. Without the mutex, they race on the stationarity filter and insert duplicate rows that defeat the drive-through filtering logic.&lt;/p&gt;

&lt;p&gt;But when &lt;code&gt;reverseGeocode()&lt;/code&gt; hung, the mutex was held forever. Every subsequent GPS fix from every trigger path called &lt;code&gt;processLocation()&lt;/code&gt;, tried to acquire the mutex, and suspended behind a coroutine that would never wake up.&lt;/p&gt;

&lt;p&gt;No exception. No crash. No log entry. Just a growing queue of frozen coroutines, each holding a perfectly good satellite fix that would never reach the database.&lt;/p&gt;
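
&lt;p&gt;The pile-up is easy to reproduce in isolation. This is a rough sketch that assumes the &lt;code&gt;kotlinx.coroutines&lt;/code&gt; library is on the classpath; the never-resumed continuation stands in for the hung geocoder call, and the 500 ms timeout exists only so the demo terminates.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

fun main() = runBlocking {
    val mutex = Mutex()

    // First caller takes the lock, then suspends on a callback that never fires.
    val stuck = launch {
        mutex.withLock {
            suspendCancellableCoroutine&amp;lt;Unit&amp;gt; { /* never resumed */ }
        }
    }
    yield() // let the first coroutine run until it holds the lock

    // Every later caller now waits for the lock behind the stuck holder.
    val second = withTimeoutOrNull(500L) {
        mutex.withLock { "inserted" }
    }
    println(second) // prints null: the second caller never got the lock

    stuck.cancel() // cancelling the holder releases the lock and ends the demo
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without the timeout around the second caller, it would sit there indefinitely, which is exactly what every GPS fix in my app did.&lt;/p&gt;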

&lt;p&gt;The motion sensor kept firing. The GPS kept acquiring. The diagnostic logs show two successful HIGH_ACCURACY fixes at 21:37 and 21:38 — 11-meter accuracy, acquired in 2.5 seconds — both of which entered &lt;code&gt;processLocation()&lt;/code&gt; and silently queued behind the hung mutex holder from 7 hours earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The only recovery was killing the process
&lt;/h2&gt;

&lt;p&gt;At 12:19 the next day — almost 22 hours after the hang started — I force-stopped the app from Android settings. The process died. The singleton mutex died with it. On restart, everything worked again.&lt;/p&gt;

&lt;p&gt;But by then, the damage was done. The AI had already sent a false URGENT alert based on 12-hour-old coordinates. And a weekly re-calibration job had run during the trip, learning the border crossing drive-through as a "frequent location," which caused a cascade of further false alerts over the following days.&lt;/p&gt;

&lt;p&gt;One hung lambda. One stale coordinate. Days of downstream consequences.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix has three layers
&lt;/h2&gt;

&lt;p&gt;I don't trust single fixes for problems that can kill 21 hours of data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1: Explicit object, both methods implemented.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;listener&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;object&lt;/span&gt; &lt;span class="err"&gt;: &lt;/span&gt;&lt;span class="nc"&gt;Geocoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GeocodeListener&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onGeocode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MutableList&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Address&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;hasResumed&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;continuation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isActive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;hasResumed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
            &lt;span class="n"&gt;continuation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;formatAddress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addresses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;firstOrNull&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errorMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;hasResumed&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;continuation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isActive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;hasResumed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
            &lt;span class="n"&gt;continuation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SAM conversion. Both callbacks resume the continuation. The &lt;code&gt;hasResumed&lt;/code&gt; flag guards against the race where both fire, or either fires after timeout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2: Hard timeout ceiling.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nf"&gt;withTimeoutOrNull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10_000L&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;suspendCancellableCoroutine&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;continuation&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;geocoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getFromLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;longitude&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if some future Android version adds a third callback method with another empty default, &lt;code&gt;withTimeoutOrNull&lt;/code&gt; cancels the wait after 10 seconds and the call returns &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;
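
&lt;p&gt;For readers who want to see how the first two layers fit together, here is one possible wiring. It is a sketch under assumptions, not my production code: &lt;code&gt;LookupListener&lt;/code&gt; and &lt;code&gt;lookupOrNull()&lt;/code&gt; are hypothetical names standing in for the Android types so the example runs on a plain JVM with &lt;code&gt;kotlinx.coroutines&lt;/code&gt;, and it uses an &lt;code&gt;AtomicBoolean&lt;/code&gt; for the resume guard on the assumption that callbacks may arrive on different threads.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlinx.coroutines.*
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.coroutines.resume

// Hypothetical stand-in for Geocoder.GeocodeListener.
interface LookupListener {
    fun onResult(value: String)
    fun onError(message: String?) {}
}

// Wraps any callback-style lookup so that the caller always gets an answer:
// the value, null on error, or null once the timeout expires.
suspend fun lookupOrNull(timeoutMs: Long, start: (LookupListener) -&amp;gt; Unit): String? =
    withTimeoutOrNull(timeoutMs) {
        suspendCancellableCoroutine&amp;lt;String?&amp;gt; { continuation -&amp;gt;
            val hasResumed = AtomicBoolean(false) // only the first callback wins
            start(object : LookupListener {
                override fun onResult(value: String) {
                    if (hasResumed.compareAndSet(false, true)) continuation.resume(value)
                }
                override fun onError(message: String?) {
                    if (hasResumed.compareAndSet(false, true)) continuation.resume(null)
                }
            })
        }
    }

fun main() = runBlocking {
    println(lookupOrNull(1_000L) { it.onResult("Main Street 1") }) // Main Street 1
    println(lookupOrNull(1_000L) { it.onError("no network") })     // null
    println(lookupOrNull(200L) { /* callback never fires */ })      // null, after the timeout
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;All three failure shapes (success, reported error, silence) come back to the caller as an ordinary return value, so nothing upstream can hang on them.&lt;/p&gt;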

&lt;p&gt;&lt;strong&gt;Layer 3: Geocoding moved outside the mutex.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Geocoding is slow and can hang — never inside the mutex&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reverseGeocodingService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reverseGeocode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Only the database insert is protected (50ms critical section, not 10s+)&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;acquired&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;withTimeoutOrNull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60_000L&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;processLocationMutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withLock&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;insertLocationData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mutex timeout is a tripwire. If something else wedges the lock in the future, we log a diagnostic error and drop the fix rather than queuing forever.&lt;/p&gt;
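
&lt;p&gt;The snippet above stops at &lt;code&gt;acquired&lt;/code&gt;, so here is a hedged sketch of what handling the tripwire could look like. &lt;code&gt;insertWithTripwire()&lt;/code&gt; is a hypothetical helper, the log call is a plain &lt;code&gt;System.err&lt;/code&gt; placeholder, and &lt;code&gt;kotlinx.coroutines&lt;/code&gt; is assumed. Note that the timeout covers both waiting for the lock and the insert itself.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Returns true if the insert ran, false if the lock could not be used in time.
suspend fun insertWithTripwire(mutex: Mutex, timeoutMs: Long, insert: suspend () -&amp;gt; Unit): Boolean {
    val acquired = withTimeoutOrNull(timeoutMs) {
        mutex.withLock { insert() }
    }
    if (acquired == null) {
        // Placeholder for a real diagnostic log entry.
        System.err.println("Lock not usable within $timeoutMs ms, dropping this fix")
        return false
    }
    return true
}

fun main() = runBlocking {
    val mutex = Mutex()

    mutex.lock() // simulate a wedged lock holder
    println(insertWithTripwire(mutex, 200L) { println("never runs") }) // false
    mutex.unlock()

    println(insertWithTripwire(mutex, 200L) { println("row inserted") }) // row inserted, then true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Dropping one fix is a deliberate trade: a single missing row is recoverable, a queue that never drains is not.&lt;/p&gt;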

&lt;h2&gt;
  
  
  What I actually learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SAM conversion is not a convenience. It's a contract you didn't read.&lt;/strong&gt; When you write a trailing lambda, you're implementing one method and accepting the defaults for everything else. If those defaults are no-ops, you've written code that silently drops errors. The compiler won't warn you. The IDE won't flag it. It works perfectly until it doesn't.&lt;/p&gt;

&lt;p&gt;The scary part is that &lt;code&gt;GeocodeListener&lt;/code&gt; isn't unusual. Android has many callback types whose error hooks do nothing unless you override them. &lt;code&gt;WebViewClient&lt;/code&gt; is a class rather than an interface, but &lt;code&gt;onReceivedError()&lt;/code&gt; has the same kind of do-nothing default. With &lt;code&gt;MediaPlayer&lt;/code&gt;, errors go to a separate &lt;code&gt;OnErrorListener&lt;/code&gt; that is easy to forget to register. Every SAM-converted lambda on an interface with default methods is a potential silent failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mutexes amplify hangs into outages.&lt;/strong&gt; A 10-second geocoding timeout would have been invisible — one null address, one row without a street name, nobody notices. But a mutex turned a local hang into a system-wide 21-hour data loss. If you're using a mutex to serialize writes, the critical section should contain only writes. Anything that touches the network, the filesystem, or a third-party service belongs outside the lock.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silent failures are worse than crashes.&lt;/strong&gt; If the geocoder had thrown an exception, I would have found it in the first hour. Instead, it hung — producing no error, no log, no crash report. The only evidence was the absence of data in a database table. In a safety-critical app that monitors whether elderly people are still moving, silence is the most dangerous failure mode there is.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The app is called "How Are You?! Senior Safety". It will be released soon, once I am confident that no bad surprises remain. Have you ever been bitten by a default interface method you didn't know existed?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>android</category>
      <category>kotlin</category>
      <category>programming</category>
      <category>debug</category>
    </item>
    <item>
      <title>I built a 126K-line Android app with AI — here is the workflow that actually worked for me</title>
      <dc:creator>Stoyan Minchev</dc:creator>
      <pubDate>Sun, 29 Mar 2026 08:58:53 +0000</pubDate>
      <link>https://dev.to/stoyan_minchev/i-built-a-126k-line-android-app-with-ai-here-is-the-workflow-that-actually-works-2llj</link>
      <guid>https://dev.to/stoyan_minchev/i-built-a-126k-line-android-app-with-ai-here-is-the-workflow-that-actually-works-2llj</guid>
      <description>&lt;p&gt;Most developers trying AI coding tools hit the same wall. They open a chat, type "build me a todo app," get something that looks right, and then spend 3 hours fixing the mess. They try again with a bigger project and it falls apart faster. They conclude AI coding is overhyped.&lt;/p&gt;

&lt;p&gt;I had the same experience. Then I changed my approach — not the tool, the process around it.&lt;/p&gt;

&lt;p&gt;Over 4 months I built &lt;a href="https://howareu.app" rel="noopener noreferrer"&gt;How Are You?!&lt;/a&gt;, a safety-critical Android app that monitors elderly people living alone. 126,000 lines of Kotlin. 144 versions. 130 test files. 3 languages. Solo developer with zero Kotlin experience when I started. The entire codebase was AI-generated — I never wrote Kotlin manually.&lt;/p&gt;

&lt;p&gt;This article is not about the app. It is about the workflow that made this possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why most people fail with AI coding
&lt;/h2&gt;

&lt;p&gt;Two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Expectations are wrong.&lt;/strong&gt; People expect to describe a feature in plain English and get production code. That works for a function. It does not work for a system. AI is not a replacement for engineering — it is an amplifier. If your input is vague, the output is vague.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No structure around the AI.&lt;/strong&gt; They open a chat, prompt, get code, paste it, prompt again. There is no architecture. No shared context. No accumulated knowledge. Every conversation starts from zero.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix is not better prompting. It is better engineering process — with the AI as a participant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Architecture before code (BMAD)
&lt;/h2&gt;

&lt;p&gt;Before writing a single line of code, I used &lt;a href="https://docs.bmad-method.org/" rel="noopener noreferrer"&gt;BMAD&lt;/a&gt; (a structured methodology for AI-assisted development) to create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Product Requirements Document&lt;/strong&gt; — what the app does, who it is for, what the constraints are&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture document&lt;/strong&gt; — module boundaries, layer responsibilities, error handling patterns, data flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project context&lt;/strong&gt; — coding standards, naming conventions, DO/DON'T lists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This took about a week. It felt slow. It was the most valuable week of the entire project.&lt;/p&gt;

&lt;p&gt;Why? Because every conversation with the AI after that point had a shared foundation. The AI was not guessing what my app looked like — it knew. Module boundaries were defined. Error handling was standardized. The AI could generate code that fit into a real system because the system was documented.&lt;/p&gt;

&lt;p&gt;Without architecture docs, AI generates code that looks correct in isolation but conflicts with everything else. You spend all your time merging inconsistent outputs instead of building features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: CLAUDE.md — the constitution
&lt;/h2&gt;

&lt;p&gt;Claude Code loads a &lt;code&gt;CLAUDE.md&lt;/code&gt; file from your project root at the start of every conversation. This is the most important file in my repository.&lt;/p&gt;

&lt;p&gt;Mine contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Module boundaries&lt;/strong&gt; enforced by Gradle (which module can import what)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core patterns&lt;/strong&gt; (all use cases return &lt;code&gt;Result&amp;lt;T&amp;gt;&lt;/code&gt;, ViewModels expose &lt;code&gt;StateFlow&lt;/code&gt;, never &lt;code&gt;GlobalScope&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical DON'Ts&lt;/strong&gt; — a condensed list of rules that came from production bugs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subsystem quick reference&lt;/strong&gt; — a table pointing to detailed rules for each area (AlarmManager, sensors, AI, email, billing, GPS, permissions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every rule in that file exists because I violated it once and something broke. The file grows with the project.&lt;/p&gt;

&lt;p&gt;This is the key insight: &lt;strong&gt;CLAUDE.md turns one-time lessons into permanent constraints.&lt;/strong&gt; The AI never forgets a rule I put there. I forget constantly.&lt;/p&gt;
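
&lt;p&gt;To make one of those core patterns concrete, here is a small sketch of a use case that returns &lt;code&gt;Result&amp;lt;T&amp;gt;&lt;/code&gt;. The class name and the parsing rule are hypothetical, and the sketch assumes the standard library's &lt;code&gt;kotlin.Result&lt;/code&gt;; a project could just as well define its own result type.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Hypothetical use case; the name and the parsing rule are illustrative only.
class ParseVersionCodeUseCase {
    // Failures come back as Result.failure instead of escaping as exceptions.
    operator fun invoke(raw: String): Result&amp;lt;Int&amp;gt; = runCatching { raw.trim().toInt() }
}

fun main() {
    val parse = ParseVersionCodeUseCase()

    println(parse("144").getOrNull()) // 144

    // The caller has to decide what happens on each branch.
    parse("oops")
        .onSuccess { println("parsed $it") }
        .onFailure { println("rejected: ${it.message}") }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A rule like this is easy for the AI to follow and easy to check in review, because a use case that throws or returns a bare value stands out immediately.&lt;/p&gt;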

&lt;h2&gt;
  
  
  Step 3: Living documentation with start/stop commands
&lt;/h2&gt;

&lt;p&gt;I built custom slash commands that bookend every development session:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/howareyou-start&lt;/code&gt;&lt;/strong&gt; — loads the developer briefing, critical rules, release notes, and current version. The AI reads everything before I write a single prompt. It takes 30 seconds and prevents 80% of the mistakes I used to make.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/howareyou-stop&lt;/code&gt;&lt;/strong&gt; — updates release notes, archives old entries, updates CRITICAL_DONTS.md with any new lessons, updates the developer briefing, bumps the version, commits, and pushes.&lt;/p&gt;

&lt;p&gt;The documentation is never stale because updating it is part of the release process, not a separate task. I do not update docs manually. The AI does it as part of shipping.&lt;/p&gt;

&lt;p&gt;This creates a flywheel: better docs -&amp;gt; better AI output -&amp;gt; fewer bugs -&amp;gt; lessons captured -&amp;gt; better docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Concrete technical specs
&lt;/h2&gt;

&lt;p&gt;When I need a new feature, I do not say "add travel detection." I use BMAD's tech spec workflow to produce a document that specifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact state machine (HOME -&amp;gt; DAY_1 -&amp;gt; TRAVELING -&amp;gt; TRIP_ENDED)&lt;/li&gt;
&lt;li&gt;Database schema changes (table names, column types, indexes)&lt;/li&gt;
&lt;li&gt;Which existing classes are affected and how&lt;/li&gt;
&lt;li&gt;Edge cases and error handling&lt;/li&gt;
&lt;li&gt;What tests to write&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The spec is 2-5 pages. Writing it takes 30 minutes with BMAD's guided conversation. It saves hours of back-and-forth with the AI during implementation and eliminates the "it generated something but it does not fit" problem.&lt;/p&gt;
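
&lt;p&gt;As an illustration of how precise the state machine bullet can get, it might be pinned down like this. This is a sketch under an assumption: it treats the four states as a strictly linear sequence in the order listed above, while the real spec may allow other edges. The &lt;code&gt;transition()&lt;/code&gt; helper is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;enum class TravelState { HOME, DAY_1, TRAVELING, TRIP_ENDED }

// Assumption: each state may only advance to the next one in the listed order.
val allowedTransitions: Map&amp;lt;TravelState, TravelState&amp;gt; = mapOf(
    TravelState.HOME to TravelState.DAY_1,
    TravelState.DAY_1 to TravelState.TRAVELING,
    TravelState.TRAVELING to TravelState.TRIP_ENDED,
)

// Rejects any edge the table does not list, so illegal jumps fail loudly.
fun transition(from: TravelState, to: TravelState): Result&amp;lt;TravelState&amp;gt; =
    if (allowedTransitions[from] == to) Result.success(to)
    else Result.failure(IllegalStateException("illegal transition $from -&amp;gt; $to"))

fun main() {
    println(transition(TravelState.HOME, TravelState.DAY_1).getOrNull())    // DAY_1
    println(transition(TravelState.HOME, TravelState.TRIP_ENDED).isFailure) // true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Writing the transitions as data makes the spec testable: every edge that is not in the table is a bug by definition.&lt;/p&gt;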

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; if I cannot describe the feature precisely enough for a spec, I am not ready to build it. I brainstorm first (also with the AI), then spec, then build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Brainstorming sessions
&lt;/h2&gt;

&lt;p&gt;I use BMAD brainstorming for everything — not just code. Pricing strategy. UX decisions. Marketing approaches. Whether to support SMS notifications or stick with email.&lt;/p&gt;

&lt;p&gt;The pattern: open a session, describe the problem, let the AI challenge my assumptions. I keep the transcripts. Some of my best architectural decisions came from brainstorming sessions where the AI pointed out an edge case I had not considered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Automated audits that run weekly
&lt;/h2&gt;

&lt;p&gt;My app has to survive Android OEM battery killers (Samsung, Xiaomi, Honor, OPPO — they all kill background apps differently). These OEMs ship updates constantly that can break my compatibility layer.&lt;/p&gt;

&lt;p&gt;I built three audit commands:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/howareyou-oem-audit&lt;/code&gt;&lt;/strong&gt; — searches the web for recent OEM changelog entries and breaking changes, then scans my codebase for affected areas and proposes fixes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/howareyou-gps-audit&lt;/code&gt;&lt;/strong&gt; — does the same for GPS and location API changes (FusedLocationProvider updates, OEM GPS power management changes).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/howareyou-full-audit&lt;/code&gt;&lt;/strong&gt; — runs both in parallel and produces a combined report with a prioritized action plan.&lt;/p&gt;

&lt;p&gt;I run these weekly. They have caught breaking changes before they hit my users — Samsung silently resetting battery optimization exemptions after OTA updates, Honor changing wakelock tag whitelisting behavior, Google deprecating location API parameters.&lt;/p&gt;

&lt;p&gt;This is the kind of thing that would take a human developer hours of manual searching. The AI does it in minutes and maps the findings directly to my source code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 7: One-command publishing
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/howareyou-build-test    → builds signed release AAB
/howareyou-publish-testingMode  → uploads to Google Play internal + closed testing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From "the code is ready" to "testers have the update" in under 5 minutes, without leaving the terminal. No browser, no Play Console clicking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 8: Infrastructure monitoring
&lt;/h2&gt;

&lt;p&gt;I use 6 Google Cloud projects for Gemini API key rotation (each project gets 10K free requests/day — 60K total). Things break. Billing gets disabled. Keys expire.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/howareyou-monitor&lt;/code&gt;&lt;/strong&gt; — checks all 6 shards, reports which are healthy, which failed, and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/howareyou-fix-billing&lt;/code&gt;&lt;/strong&gt; — automatically re-links disabled shards to the shared billing account.&lt;/p&gt;

&lt;p&gt;These are not development tasks. They are operational tasks that I handle from the same terminal where I write code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 9: Code reviews with a second model
&lt;/h2&gt;

&lt;p&gt;After implementing a feature, I run a code review using BMAD's adversarial review workflow. It is configured to find 3-10 specific problems in every review — it never says "looks good." It checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture compliance (are module boundaries respected?)&lt;/li&gt;
&lt;li&gt;Test coverage (are edge cases tested?)&lt;/li&gt;
&lt;li&gt;Security (any hardcoded keys? SQL injection? XSS?)&lt;/li&gt;
&lt;li&gt;Performance (unnecessary allocations? missing indexes?)&lt;/li&gt;
&lt;li&gt;Consistency with project patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This catches things I miss because I have been staring at the code for hours. The adversarial framing is important — a review that always approves is useless.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 10: Lessons learned as a living document
&lt;/h2&gt;

&lt;p&gt;Every production bug becomes a rule in &lt;code&gt;CRITICAL_DONTS.md&lt;/code&gt;. The file is organized by subsystem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AlarmManager:&lt;/strong&gt; never call &lt;code&gt;setAlarmClock()&lt;/code&gt; more than 3x/day (Honor flags you)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensor:&lt;/strong&gt; always flush FIFO and discard stale readings (Honor rebases timestamps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email:&lt;/strong&gt; per-recipient sends, never batch (Resend delivery tracking breaks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPS:&lt;/strong&gt; full priority fallback chain, never trust a single &lt;code&gt;getCurrentLocation()&lt;/code&gt; call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are 50+ rules in that file. Each one has a version number (when it was added) and a rationale (why it matters). The AI reads this file at the start of every session via the &lt;code&gt;/howareyou-start&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;This is the most underrated part of the workflow. Most developers keep lessons in their head. Heads forget. Files do not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The daily workflow
&lt;/h2&gt;

&lt;p&gt;Here is what a typical development day looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;/howareyou-start&lt;/code&gt; — AI loads all context (30 seconds)&lt;/li&gt;
&lt;li&gt;Describe the task — with a tech spec if it is a feature, or a bug description if it is a fix&lt;/li&gt;
&lt;li&gt;AI implements — I review the diff, run tests&lt;/li&gt;
&lt;li&gt;Iterate — usually 1-3 rounds&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/howareyou-stop&lt;/code&gt; — docs updated, version bumped, committed, pushed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/howareyou-publish-testingMode&lt;/code&gt; — testers have the update&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I ship multiple versions per day with this flow. Not because I rush — because the overhead between "code works" and "testers have it" is near zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is NOT
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It is not "no-code." If you know the language, it is worth reviewing the output and correcting it where needed. Over time, fewer small fixes are needed. It is always good to understand the architecture and to make the design decisions yourself.&lt;/li&gt;
&lt;li&gt;It is not effortless. The workflow took months to build. The documentation is extensive.&lt;/li&gt;
&lt;li&gt;It is not magic. The AI makes mistakes. The difference is that mistakes are caught by the process (tests, reviews, rules, audits) instead of by users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;126,000 lines&lt;/strong&gt; of Kotlin across 398 files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;45,000 lines&lt;/strong&gt; of tests across 130 files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;144 versions&lt;/strong&gt; shipped&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 languages&lt;/strong&gt; (English, Bulgarian, German)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;50+ production lessons&lt;/strong&gt; captured in CRITICAL_DONTS.md&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4 months&lt;/strong&gt; from zero Kotlin experience to production app on Google Play&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9 custom commands&lt;/strong&gt; automating the full development lifecycle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 lines&lt;/strong&gt; of Kotlin written manually by me&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;AI coding tools are not magic code generators. They are force multipliers for engineering process. If your process is "open chat, type prompt, hope for the best," you will be disappointed.&lt;/p&gt;

&lt;p&gt;If your process is "document the architecture, define the rules, automate the lifecycle, capture every lesson, review everything adversarially" — the AI becomes unreasonably effective.&lt;/p&gt;

&lt;p&gt;The investment is not in better prompts. It is in better engineering.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The app is &lt;a href="https://howareu.app" rel="noopener noreferrer"&gt;How Are You?!&lt;/a&gt; — AI safety monitoring for elderly parents. It will be released soon. The code workflow described here uses &lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; with &lt;a href="https://docs.bmad-method.org/" rel="noopener noreferrer"&gt;BMAD&lt;/a&gt;. Both are tools I use daily and genuinely recommend.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
      <category>devops</category>
    </item>
    <item>
      <title>What Android OEMs do to background apps, and the 11 layers I built to survive it</title>
      <dc:creator>Stoyan Minchev</dc:creator>
      <pubDate>Mon, 23 Mar 2026 12:02:33 +0000</pubDate>
      <link>https://dev.to/stoyan_minchev/what-android-oems-do-to-background-apps-and-the-11-layers-i-built-to-survive-it-28bb</link>
      <guid>https://dev.to/stoyan_minchev/what-android-oems-do-to-background-apps-and-the-11-layers-i-built-to-survive-it-28bb</guid>
      <description>&lt;p&gt;I spent over a year building a safety monitoring app that runs 24/7 on elderly parents' phones. If it gets killed, nobody gets alerted when something goes wrong. That constraint forced me into the deepest, most frustrating corners of Android background execution.&lt;/p&gt;

&lt;p&gt;This article covers what I learned about how Samsung, Xiaomi, Honor, OPPO, and Vivo actively kill background apps, why the standard Android approach is nowhere near sufficient, and the 11-layer recovery architecture I ended up building. I will also cover two related problems that surprised me: GPS hardware that silently stops working, and accelerometer data that lies about its age.&lt;/p&gt;

&lt;p&gt;126,000 lines of Kotlin, 125+ versions, solo developer. The app is called &lt;a href="https://howareu.app" rel="noopener noreferrer"&gt;How Are You?!&lt;/a&gt; — it learns an elderly person's daily routine over 7 days, then monitors around the clock and emails the family if something seems wrong. But this article is about the engineering, not the product.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem: Android wants your app dead
&lt;/h2&gt;

&lt;p&gt;Stock Android already makes continuous background work difficult. Doze mode, App Standby, background execution limits — Google has been tightening the screws since Android 6. A foreground service with &lt;code&gt;REQUEST_IGNORE_BATTERY_OPTIMIZATIONS&lt;/code&gt; is the standard answer.&lt;/p&gt;

&lt;p&gt;That is necessary. It is nowhere near sufficient.&lt;/p&gt;

&lt;p&gt;OEMs add their own proprietary battery management on top of stock Android, and they are far more aggressive. Here is what I encountered on the devices I tested:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Samsung&lt;/strong&gt; maintains a "Sleeping Apps" list. If your app has no foreground activity for 3 days, Samsung kills it. OTA updates silently reset your battery optimization exemption. The user opted you out of optimization? Samsung un-opted you after the update.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Xiaomi (MIUI/HyperOS)&lt;/strong&gt; kills background services aggressively and resets autostart permissions after OTA updates. Your app was whitelisted? Not anymore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Honor and Huawei&lt;/strong&gt; have PowerGenie, which monitors how often your app wakes the system. Call &lt;code&gt;setAlarmClock()&lt;/code&gt; more than about 3 times per day and you get flagged as "frequently wakes your system." They also have HwPFWService, which kills apps holding wakelocks longer than 60 minutes with non-whitelisted tags.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OPPO (ColorOS)&lt;/strong&gt; has "Sleep standby optimization" that freezes apps during the hours the phone detects the user is sleeping. A safety monitoring app for elderly people needs to run &lt;em&gt;especially&lt;/em&gt; during sleep hours — that is when falls and medical events go unnoticed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vivo (Funtouch OS)&lt;/strong&gt; has "AI sleep mode" that does the same thing.&lt;/p&gt;

&lt;p&gt;Each manufacturer found a different way to kill you. No single workaround survives all of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The answer: 11 layers of recovery
&lt;/h2&gt;

&lt;p&gt;The core insight is that no single mechanism is reliable across all OEMs and all device states. The answer is redundancy — each layer catches the failures of the layers above it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Foreground service with START_STICKY
&lt;/h3&gt;

&lt;p&gt;The foundation. &lt;code&gt;startForeground()&lt;/code&gt; with a persistent notification. The notification channel must use &lt;code&gt;IMPORTANCE_MIN&lt;/code&gt; — not &lt;code&gt;IMPORTANCE_DEFAULT&lt;/code&gt; or higher. Why? OEMs auto-grant &lt;code&gt;POST_NOTIFICATIONS&lt;/code&gt; on higher importance channels, bypassing the user's notification settings and making your persistent notification visible. &lt;code&gt;IMPORTANCE_MIN&lt;/code&gt; keeps it silent while &lt;code&gt;startForeground()&lt;/code&gt; still gives your process elevated priority.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;START_STICKY&lt;/code&gt; tells the system to restart the service after a kill. But "restart" can take minutes or never happen on aggressive OEMs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: onDestroy recovery scheduling
&lt;/h3&gt;

&lt;p&gt;When the system kills your service, &lt;code&gt;onDestroy()&lt;/code&gt; fires (most of the time). That gives you a window of roughly 50ms; use it to schedule everything that will bring you back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onDestroy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onDestroy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nc"&gt;ServiceWatchdogReceiver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scheduleWithBackup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;MotionSnapshotReceiver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This schedules both the AlarmManager chain and the motion snapshot chain. If &lt;code&gt;onDestroy()&lt;/code&gt; does not fire (force-stop, OEM kill without callback), the other layers cover it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: AlarmManager watchdog chain
&lt;/h3&gt;

&lt;p&gt;A self-chaining &lt;code&gt;setExactAndAllowWhileIdle()&lt;/code&gt; alarm at 15-minute intervals during active use. When it fires, it checks whether the service is alive and restarts it if not.&lt;/p&gt;

&lt;p&gt;The interval adapts to power state: 15 minutes when active, 30 minutes when idle, 60 minutes during deep sleep. This matters for OEM battery scoring — more frequent alarms get flagged.&lt;/p&gt;
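&lt;p&gt;A minimal sketch of that adaptive interval, assuming a hypothetical &lt;code&gt;PowerState&lt;/code&gt; enum and helper names that are not taken from the app's code. Keeping the mapping in one pure function means the alarm receiver only has to ask when to re-arm:&lt;/p&gt;

```kotlin
// Hypothetical power states; the names are assumptions, the intervals come from the text above.
enum class PowerState { ACTIVE, IDLE, DEEP_SLEEP }

// Watchdog interval in milliseconds for each power state.
fun watchdogIntervalMs(state: PowerState): Long = when (state) {
    PowerState.ACTIVE -> 15 * 60_000L      // 15 minutes
    PowerState.IDLE -> 30 * 60_000L        // 30 minutes
    PowerState.DEEP_SLEEP -> 60 * 60_000L  // 60 minutes
}

// Trigger time for the next self-chained alarm, on the same clock as nowMs.
fun nextTriggerAtMs(nowMs: Long, state: PowerState): Long =
    nowMs + watchdogIntervalMs(state)
```

&lt;p&gt;The receiver would pass the result of &lt;code&gt;nextTriggerAtMs()&lt;/code&gt; to &lt;code&gt;setExactAndAllowWhileIdle()&lt;/code&gt; each time it re-arms itself.&lt;/p&gt;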

&lt;p&gt;Important: never use &lt;code&gt;Handler.postDelayed()&lt;/code&gt; as a replacement for AlarmManager. Handlers do not fire during CPU deep sleep. I learned this the hard way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: WorkManager periodic watchdog
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;PeriodicWorkRequest&lt;/code&gt; at 15-minute intervals that does the same thing — checks the service and restarts if needed. WorkManager survives service kills and uses JobScheduler under the hood, which OEMs are more reluctant to interfere with.&lt;/p&gt;

&lt;p&gt;But there is a subtle trap: &lt;code&gt;ExistingPeriodicWorkPolicy.KEEP&lt;/code&gt; silently discards new requests if a worker is already enqueued, even if the existing one has a stale timer from hours ago. And &lt;code&gt;REPLACE&lt;/code&gt; resets the countdown every time you call &lt;code&gt;schedule()&lt;/code&gt;. The solution: query &lt;code&gt;getWorkInfosForUniqueWork()&lt;/code&gt; first and only schedule when the worker is not already enqueued.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;workInfos&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workManager&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getWorkInfosForUniqueWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WATCHDOG_WORK_NAME&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;await&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;isEnqueued&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workInfos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;WorkInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;State&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ENQUEUED&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;WorkInfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;State&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RUNNING&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;isEnqueued&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;workManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueueUniquePeriodicWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;WATCHDOG_WORK_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;ExistingPeriodicWorkPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;KEEP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;watchdogRequest&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 5: Boot recovery
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;BOOT_COMPLETED&lt;/code&gt;, &lt;code&gt;LOCKED_BOOT_COMPLETED&lt;/code&gt;, &lt;code&gt;QUICKBOOT_POWERON&lt;/code&gt;, and &lt;code&gt;MY_PACKAGE_REPLACED&lt;/code&gt; receivers that re-establish the service and all alarm chains after reboot or app update.&lt;/p&gt;

&lt;p&gt;Some OEMs reset permissions after OTA updates. OnePlus, Samsung, Xiaomi, Redmi, and POCO all do this. You need to detect the OTA and re-prompt the user for battery optimization exemption.&lt;/p&gt;
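&lt;p&gt;One possible way to detect the OTA, sketched here as an assumption rather than the app's actual method, is to persist the system build fingerprint and compare it after every boot. On a device, the current value would come from &lt;code&gt;Build.FINGERPRINT&lt;/code&gt; and the exemption flag from &lt;code&gt;PowerManager.isIgnoringBatteryOptimizations()&lt;/code&gt;; the enum and function names below are hypothetical:&lt;/p&gt;

```kotlin
// Hypothetical outcomes of the post-boot check.
enum class PostBootAction { NONE, RECORD_FINGERPRINT, REPROMPT_BATTERY_EXEMPTION }

// Pure decision logic so it can be unit-tested without a device.
// storedFingerprint: value persisted on the previous run (null on first run).
// currentFingerprint: on a device, android.os.Build.FINGERPRINT.
// isExempt: on a device, PowerManager.isIgnoringBatteryOptimizations(packageName).
fun decidePostBootAction(
    storedFingerprint: String?,
    currentFingerprint: String,
    isExempt: Boolean
): PostBootAction = when {
    storedFingerprint == null -> PostBootAction.RECORD_FINGERPRINT   // first run, nothing to compare
    storedFingerprint == currentFingerprint -> PostBootAction.NONE   // same system build
    isExempt -> PostBootAction.RECORD_FINGERPRINT                    // OTA happened, exemption survived
    else -> PostBootAction.REPROMPT_BATTERY_EXEMPTION                // OTA happened, exemption was reset
}
```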

&lt;h3&gt;
  
  
  Layer 6: SyncAdapter for process priority
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ContentResolver.addPeriodicSync()&lt;/code&gt; gives your process elevated priority through the sync framework. OEMs are reluctant to kill sync adapter processes because the sync framework is a system concept — killing it could break contacts, calendar, and email sync.&lt;/p&gt;

&lt;p&gt;This is a ~1-hour periodic callback that checks service health. It will not bring you back fast, but it is extremely hard for OEMs to suppress.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 7: AlarmClock safety net
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;setAlarmClock()&lt;/code&gt; at 8-hour intervals — approximately 3 calls per day. This is the nuclear option. AlarmClock alarms get the highest delivery priority on Android because they are designed to wake users up.&lt;/p&gt;

&lt;p&gt;Why 8 hours and not shorter? Honor's PowerGenie specifically tracks AlarmClock frequency. At 15-minute intervals, it flags you as "frequently wakes your system" and kills you. At 8-hour intervals (~3/day), you fly under the radar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;scheduleSafetyNet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Target receiver is an assumption; point this at whichever receiver handles the safety-net alarm&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;intent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ServiceWatchdogReceiver&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;pendingIntent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PendingIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getBroadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;REQUEST_CODE_ALARMCLOCK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;PendingIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FLAG_UPDATE_CURRENT&lt;/span&gt; &lt;span class="n"&gt;or&lt;/span&gt; &lt;span class="nc"&gt;PendingIntent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FLAG_IMMUTABLE&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;triggerAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;currentTimeMillis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="nc"&gt;SAFETY_NET_INTERVAL_MS&lt;/span&gt; &lt;span class="c1"&gt;// 8 hours&lt;/span&gt;
    &lt;span class="n"&gt;alarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAlarmClock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;AlarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AlarmClockInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;triggerAt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pendingIntent&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 8: Exact alarm permission recovery
&lt;/h3&gt;

&lt;p&gt;When the user revokes &lt;code&gt;SCHEDULE_EXACT_ALARM&lt;/code&gt;, &lt;strong&gt;all&lt;/strong&gt; pending AlarmManager chains die silently. No callback, no exception. Your watchdog, your snapshot receiver, your safety net — all gone.&lt;/p&gt;

&lt;p&gt;Listen for &lt;code&gt;ACTION_SCHEDULE_EXACT_ALARM_PERMISSION_STATE_CHANGED&lt;/code&gt; and re-establish everything on re-grant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExactAlarmPermissionReceiver&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;BroadcastReceiver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onReceive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
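        &lt;span class="c1"&gt;// canScheduleExactAlarms() is an AlarmManager method (API 31+)&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;alarmManager&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getSystemService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AlarmManager&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;canScheduleExactAlarms&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;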
            &lt;span class="nc"&gt;ServiceWatchdogReceiver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scheduleWithBackup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nc"&gt;MotionSnapshotReceiver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 9: Batched accelerometer sensing
&lt;/h3&gt;

&lt;p&gt;This is the layer that surprised me most. Keep the accelerometer registered with &lt;code&gt;maxReportLatencyUs&lt;/code&gt; during idle and deep sleep. The sensor HAL continuously samples into a hardware FIFO buffer and delivers readings via a sensor interrupt — this is completely invisible to OEM battery managers because it does not use AlarmManager, WorkManager, or any schedulable mechanism.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="n"&gt;sensorManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;batchedMotionListener&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;accelerometer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;SensorManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SENSOR_DELAY_NORMAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;maxReportLatencyUs&lt;/span&gt;  &lt;span class="c1"&gt;// 10 min in deep sleep&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HAL batches readings and delivers them all at once when the buffer fills or the latency expires. You get continuous motion awareness with zero wakes visible to the OEM.&lt;/p&gt;

&lt;p&gt;One gotcha: a single SLIGHT_MOVEMENT reading (1.0-3.0 m/s^2) should not exit batched mode. Table vibrations and building micro-movements produce transient spikes. I require 3 consecutive SLIGHT_MOVEMENT readings (~15 seconds) before exiting. Anything above 3.0 m/s^2 (MODERATE_MOVEMENT) exits immediately.&lt;/p&gt;
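&lt;p&gt;That exit rule can be sketched as a small counter. The class name is hypothetical, and the input is assumed to be a motion magnitude in m/s^2 with gravity already removed; only the thresholds and the count of 3 come from the description above:&lt;/p&gt;

```kotlin
// Decides when to leave batched mode. Thresholds follow the text:
// 1.0..3.0 m/s^2 counts as SLIGHT_MOVEMENT, above 3.0 as MODERATE_MOVEMENT.
class BatchedModeExitGate(private val requiredSlightReadings: Int = 3) {
    private var consecutiveSlight = 0

    // Returns true when this reading should end batched mode.
    fun shouldExit(magnitude: Float): Boolean = when {
        magnitude > 3.0f -> {
            // Moderate movement or stronger: exit immediately.
            consecutiveSlight = 0
            true
        }
        magnitude >= 1.0f -> {
            // Slight movement: exit only after enough readings in a row.
            consecutiveSlight += 1
            if (consecutiveSlight >= requiredSlightReadings) {
                consecutiveSlight = 0
                true
            } else {
                false
            }
        }
        else -> {
            // Still: any streak of slight readings is broken.
            consecutiveSlight = 0
            false
        }
    }
}
```

&lt;p&gt;A single spike followed by stillness resets the counter, so a passing truck or a vibrating table never ends batched mode on its own.&lt;/p&gt;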

&lt;h3&gt;
  
  
  Layer 10: Network restoration and app foreground triggers
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CONNECTIVITY_ACTION&lt;/code&gt; receiver triggers a service health check when the network comes back. &lt;code&gt;ProcessLifecycleOwner&lt;/code&gt; fires when the user opens the app. These are opportunistic — they catch edge cases where the service died during airplane mode or extended offline periods.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 11: User-facing gap detection
&lt;/h3&gt;

&lt;p&gt;When all 10 layers fail (and on some devices, they do), the app detects the gap and shows the user device-specific instructions: "Your [Manufacturer] phone is stopping background apps. Open Settings &amp;gt; Battery &amp;gt; [OEM-specific path] and disable optimization for How Are You?!"&lt;/p&gt;

&lt;p&gt;This is the least satisfying layer because it requires user action. But on a few particularly aggressive OEM configurations, it is the only thing that works.&lt;/p&gt;
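&lt;p&gt;How the gap is detected is not shown here. One simple approach, offered as an assumption, is to persist a heartbeat timestamp on every successful check and compare it with the current time on the next wake-up. All names below are hypothetical:&lt;/p&gt;

```kotlin
// A period during which no heartbeat was recorded.
data class MonitoringGap(val startMs: Long, val endMs: Long) {
    val durationMs: Long get() = endMs - startMs
}

// Returns the gap if the silence since the last heartbeat exceeds maxSilenceMs, otherwise null.
fun detectGap(lastHeartbeatMs: Long, nowMs: Long, maxSilenceMs: Long): MonitoringGap? =
    if (nowMs - lastHeartbeatMs > maxSilenceMs) MonitoringGap(lastHeartbeatMs, nowMs) else null

// Builds the opening sentence of the user-facing hint; the OEM-specific settings path is omitted.
fun gapHint(manufacturer: String, gap: MonitoringGap): String {
    val hours = gap.durationMs / 3_600_000L
    return "Your $manufacturer phone is stopping background apps (monitoring paused for about $hours h)."
}
```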




&lt;h2&gt;
  
  
  The wakelock tag problem on Honor
&lt;/h2&gt;

&lt;p&gt;HwPFWService on Honor and Huawei devices maintains a whitelist of allowed wakelock tags. If your app holds a wakelock for more than 60 minutes with a tag that is not on the whitelist, HwPFWService kills your app.&lt;/p&gt;

&lt;p&gt;The solution is embarrassingly simple: use a whitelisted tag on Honor/Huawei, your real tag everywhere else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;WAKELOCK_TAG&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;manufacturer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MANUFACTURER&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;lowercase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;orEmpty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;manufacturer&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"huawei"&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt; &lt;span class="n"&gt;manufacturer&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"honor"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"LocationManagerService"&lt;/span&gt;  &lt;span class="c1"&gt;// Whitelisted by HwPFWService&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="s"&gt;"HowAreYou:PulseBurst"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;LocationManagerService&lt;/code&gt; is whitelisted because it is a system service tag. I am not proud of this, but it works.&lt;/p&gt;




&lt;h2&gt;
  
  
  getCurrentLocation() hangs forever
&lt;/h2&gt;

&lt;p&gt;Once I had the service staying alive, I discovered a second problem: GPS does not work when you need it.&lt;/p&gt;

&lt;p&gt;At approximately 12% battery on my Honor test device, the OEM battery saver silently killed GPS hardware access. No exception, no error callback, no log entry. The foreground service was alive, the accelerometer worked. But &lt;code&gt;getCurrentLocation(PRIORITY_HIGH_ACCURACY)&lt;/code&gt; simply never completed. The Task from Play Services hung indefinitely — neither &lt;code&gt;onSuccessListener&lt;/code&gt; nor &lt;code&gt;onFailureListener&lt;/code&gt; ever fired.&lt;/p&gt;

&lt;p&gt;The code fell back to &lt;code&gt;getLastLocation()&lt;/code&gt;, which returned a 5-hour-old cached position from a completely different city.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 1: Always timeout
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;getCurrentLocation()&lt;/code&gt; call must be wrapped in a coroutine timeout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;cts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CancellationTokenSource&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;withTimeoutOrNull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30_000L&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;suspendCancellableCoroutine&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;cont&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="n"&gt;fusedClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getCurrentLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addOnSuccessListener&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;cont&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addOnFailureListener&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;cont&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="c1"&gt;// Cancel the underlying Play Services request when the timeout fires&lt;/span&gt;
            &lt;span class="n"&gt;cont&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invokeOnCancellation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;cts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fix 2: Priority fallback chain
&lt;/h3&gt;

&lt;p&gt;GPS hardware being dead does not mean all location sources are dead. Cell towers and Wi-Fi still work. I built a sequential fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PRIORITY_HIGH_ACCURACY (GPS, ~10m)
    | timeout or null
PRIORITY_BALANCED_POWER_ACCURACY (Wi-Fi + cell, ~40-300m)
    | timeout or null
PRIORITY_LOW_POWER (cell only, ~300m-3km)
    | timeout or null
getLastLocation() (cached, any age)
    | null
TotalFailure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step gets its own 30-second timeout. In practice, when GPS is killed, BALANCED_POWER_ACCURACY returns in 2-3 seconds because Wi-Fi scanning still works.&lt;/p&gt;
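&lt;p&gt;The chain itself reduces to a loop over sources with a per-step timeout. This is a sketch under stated assumptions, not the app's code: &lt;code&gt;Fix&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;Location&lt;/code&gt;, &lt;code&gt;LocationSource&lt;/code&gt; and &lt;code&gt;firstAvailableFix&lt;/code&gt; are invented names, and it depends on the kotlinx.coroutines library:&lt;/p&gt;

```kotlin
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical stand-in for android.location.Location.
data class Fix(val source: String, val accuracyMeters: Float)

// One step of the chain, for example a wrapper around getCurrentLocation() at a given priority.
fun interface LocationSource {
    suspend fun fetch(): Fix?
}

// Tries each source in order. A source that hangs is abandoned after timeoutMs,
// and a source that returns null is skipped. Returns null only if every step fails.
suspend fun firstAvailableFix(timeoutMs: Long, vararg sources: LocationSource): Fix? {
    for (source in sources) {
        val fix = withTimeoutOrNull(timeoutMs) { source.fetch() }
        if (fix != null) return fix
    }
    return null
}
```

&lt;p&gt;Called with the high-accuracy, balanced, low-power, and last-location sources in that order, a hung GPS request costs one timeout and the next source still gets its turn.&lt;/p&gt;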

&lt;h3&gt;
  
  
  Fix 3: GPS wake probe
&lt;/h3&gt;

&lt;p&gt;Sometimes the GPS hardware is not permanently dead — it has been suspended by the battery manager. A brief &lt;code&gt;requestLocationUpdates&lt;/code&gt; call can wake it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hoursSinceLastFreshGps&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;probeRequest&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocationRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Priority&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PRIORITY_HIGH_ACCURACY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000L&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setDurationMillis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5_000L&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setMaxUpdates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;fusedClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;requestLocationUpdates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;probeRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;looper&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;// requestLocationUpdates() returns immediately, so keep the probe registered for the full window&lt;/span&gt;
    &lt;span class="nf"&gt;delay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6_000L&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fusedClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeLocationUpdates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five seconds, maximum once every 4 hours. On Honor, this recovers the GPS roughly 40% of the time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix 4: Explicit outcome types
&lt;/h3&gt;

&lt;p&gt;The original code returned &lt;code&gt;Location?&lt;/code&gt;. The caller had no way to distinguish a fresh 10-meter GPS fix from a 5-hour-old cached position. I changed the return type to make the quality of data explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;sealed&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;GpsLocationOutcome&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;FreshGps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;GpsLocationOutcome&lt;/span&gt;
    &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;WakeProbeSuccess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;GpsLocationOutcome&lt;/span&gt;
    &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;CellFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;accuracy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;GpsLocationOutcome&lt;/span&gt;
    &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;StaleLastLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;ageMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;GpsLocationOutcome&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="kd"&gt;object&lt;/span&gt; &lt;span class="nc"&gt;TotalFailure&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;GpsLocationOutcome&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the consumer can make informed decisions. A 3km cell tower reading is low precision, but it answers "is this person in the expected city or 200km away?" For a safety app, that distinction matters.&lt;/p&gt;
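
&lt;p&gt;As a rough illustration of what an "informed decision" could look like, here is a minimal sketch of a consumer that maps each outcome to a trust level. The &lt;code&gt;Trust&lt;/code&gt; enum and &lt;code&gt;trustOf()&lt;/code&gt; are hypothetical names invented for this example, not the app's actual code. The sealed interface is repeated so the snippet compiles on its own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;sealed interface GpsLocationOutcome {
    data class FreshGps(val accuracy: Float) : GpsLocationOutcome
    data class WakeProbeSuccess(val accuracy: Float) : GpsLocationOutcome
    data class CellFallback(val accuracy: Float) : GpsLocationOutcome
    data class StaleLastLocation(val ageMs: Long) : GpsLocationOutcome
    data object TotalFailure : GpsLocationOutcome
}

// Hypothetical trust levels; a real consumer may classify differently.
enum class Trust { PRECISE, COARSE, STALE, NONE }

// The compiler checks that every outcome is handled, so a new
// outcome type cannot be silently ignored by the caller.
fun trustOf(outcome: GpsLocationOutcome): Trust = when (outcome) {
    is GpsLocationOutcome.FreshGps -&amp;gt; Trust.PRECISE
    is GpsLocationOutcome.WakeProbeSuccess -&amp;gt; Trust.PRECISE
    is GpsLocationOutcome.CellFallback -&amp;gt; Trust.COARSE      // city-level answer only
    is GpsLocationOutcome.StaleLastLocation -&amp;gt; Trust.STALE
    GpsLocationOutcome.TotalFailure -&amp;gt; Trust.NONE
}

fun main() {
    println(trustOf(GpsLocationOutcome.CellFallback(accuracy = 3000f)))  // COARSE
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the &lt;code&gt;when&lt;/code&gt; is exhaustive, adding a sixth outcome later becomes a compile error at every call site instead of a silent &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;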




&lt;h2&gt;
  
  
  The sensor HAL lies about timestamps
&lt;/h2&gt;

&lt;p&gt;At 3 AM, your app wakes up to check the accelerometer. You call &lt;code&gt;registerListener()&lt;/code&gt;, and the sensor HAL returns data. You check &lt;code&gt;event.timestamp&lt;/code&gt; against &lt;code&gt;SystemClock.elapsedRealtimeNanos()&lt;/code&gt;. The delta is small. The data looks fresh.&lt;/p&gt;

&lt;p&gt;It is not. It is 22-minute-old data sitting in the hardware FIFO buffer since the last time anyone read the sensor.&lt;/p&gt;

&lt;p&gt;This is the normal behavior of hardware sensor FIFOs. When the CPU sleeps, the sensor continues sampling into its buffer. When you register a listener after wakeup, the HAL dumps the entire buffer contents at you. The timestamps are real (the readings were taken at those times), but the data is stale — it describes what happened 22 minutes ago, not what is happening now.&lt;/p&gt;

&lt;p&gt;On most devices, you can catch this by comparing &lt;code&gt;event.timestamp&lt;/code&gt; (CLOCK_BOOTTIME nanoseconds) against &lt;code&gt;SystemClock.elapsedRealtimeNanos()&lt;/code&gt;. If the delta is large, the reading is stale.&lt;/p&gt;
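
&lt;p&gt;A minimal sketch of that delta check, written as a pure function so it runs off-device. The 500ms budget and the name &lt;code&gt;isStale()&lt;/code&gt; are assumptions for illustration. On Android the two arguments would come from &lt;code&gt;event.timestamp&lt;/code&gt; and &lt;code&gt;SystemClock.elapsedRealtimeNanos()&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Assumed freshness budget (500 ms); tune per device.
const val MAX_AGE_NS = 500_000_000L

// Both arguments are CLOCK_BOOTTIME nanoseconds.
fun isStale(eventTimestampNs: Long, nowNs: Long, maxAgeNs: Long = MAX_AGE_NS): Boolean =
    nowNs - eventTimestampNs &amp;gt; maxAgeNs

fun main() {
    val nowNs = 30L * 60 * 1_000_000_000L          // 30 minutes since boot
    val oldNs = nowNs - 22L * 60 * 1_000_000_000L  // sampled 22 minutes ago
    println(isStale(oldNs, nowNs))  // true
    println(isStale(nowNs, nowNs))  // false
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;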

&lt;p&gt;Honor broke this assumption. On Honor devices, the HAL rebases &lt;code&gt;event.timestamp&lt;/code&gt; on FIFO flush, so the delta check shows the data as fresh even when it is not.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix: flush, wait for callback, then collect
&lt;/h3&gt;

&lt;p&gt;Do not trust the first readings after &lt;code&gt;registerListener()&lt;/code&gt;. Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Call &lt;code&gt;sensorManager.flush(this)&lt;/code&gt; to drain the stale FIFO data&lt;/li&gt;
&lt;li&gt;Wait for the &lt;code&gt;onFlushCompleted()&lt;/code&gt; callback from &lt;code&gt;SensorEventListener2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Only start collecting readings after the flush completes&lt;/li&gt;
&lt;li&gt;Set a 1000ms fallback timer in case the HAL never fires the callback
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MotionSnapshotReceiver&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;BroadcastReceiver&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;SensorEventListener2&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="py"&gt;isFlushPhase&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onSensorChanged&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;SensorEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isFlushPhase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;  &lt;span class="c1"&gt;// Discard stale FIFO data&lt;/span&gt;
        &lt;span class="nf"&gt;collectReading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onFlushCompleted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Sensor&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;endFlushPhase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;byHal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;endFlushPhase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;byHal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;isFlushPhase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;  &lt;span class="c1"&gt;// Guard against double-trigger&lt;/span&gt;
        &lt;span class="n"&gt;isFlushPhase&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
        &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeCallbacks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flushFallbackRunnable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;// Now start collecting real readings&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fallback timer at 1000ms is important. I originally used 200ms, which was insufficient for Honor devices — their deep FIFO drains at approximately 16Hz, and a full buffer can take over 200ms to flush.&lt;/p&gt;

&lt;p&gt;As a secondary safety net, I use dual-clock comparison: both &lt;code&gt;CLOCK_BOOTTIME&lt;/code&gt; and &lt;code&gt;CLOCK_MONOTONIC&lt;/code&gt; deltas must agree that the reading is fresh. If either delta exceeds 500ms of staleness, the reading is discarded.&lt;/p&gt;




&lt;h2&gt;
  
  
  A race condition in GPS processing
&lt;/h2&gt;

&lt;p&gt;I had multiple independent trigger paths (stillness detector, smart GPS scheduler, area stability detector) that could request GPS concurrently. Two of them fired within 33 milliseconds of each other. Both read the same &lt;code&gt;getLastLocation()&lt;/code&gt;, both passed the stationarity filter, and both inserted a GPS reading.&lt;/p&gt;

&lt;p&gt;My code uses a minimum-readings-per-cluster filter to discard drive-through locations — a place needs at least 2 GPS readings to count as a real visit. The duplicate entry from the race condition defeated this filter. A single drive-by at 60km/h became a "cluster of 2."&lt;/p&gt;
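
&lt;p&gt;To make the failure concrete, here is a simplified sketch of a minimum-readings filter and how one duplicated row defeats it. &lt;code&gt;Reading&lt;/code&gt;, &lt;code&gt;clusterId&lt;/code&gt; and &lt;code&gt;realVisits()&lt;/code&gt; are hypothetical names; the real clustering logic is more involved.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Hypothetical, simplified model of a stored GPS reading.
data class Reading(val clusterId: Int, val timestampMs: Long)

const val MIN_READINGS_PER_CLUSTER = 2

// Keeps only clusters with enough readings to count as a real visit.
fun realVisits(readings: List&amp;lt;Reading&amp;gt;): Set&amp;lt;Int&amp;gt; =
    readings.groupingBy { it.clusterId }.eachCount()
        .filterValues { it &amp;gt;= MIN_READINGS_PER_CLUSTER }
        .keys

fun main() {
    val driveBy = listOf(Reading(clusterId = 7, timestampMs = 1_000))
    // The racing second trigger inserts the same position 33 ms later.
    val raced = driveBy + Reading(clusterId = 7, timestampMs = 1_033)

    println(realVisits(driveBy))  // []
    println(realVisits(raced))    // [7]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;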

&lt;p&gt;The fix is a Mutex around the entire location processing path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;processLocationMutex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mutex&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;processLocation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;processLocationMutex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withLock&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;lastLocation&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getLastLocation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;// The second concurrent caller now sees the just-inserted&lt;/span&gt;
        &lt;span class="c1"&gt;// location and correctly skips as duplicate&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Battery result
&lt;/h2&gt;

&lt;p&gt;With all 11 layers and the three-tier power state in place, the battery impact is under 1% per day. The key numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before optimization:&lt;/strong&gt; ~4,300 AlarmManager wakes per day. Every active-mode pulse (15s/30s) used AlarmManager. Every watchdog check (every 5 minutes) used AlarmManager. Honor flagged the app within hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After optimization:&lt;/strong&gt; ~240 wakes per day. Active-mode pulses use &lt;code&gt;Handler.postDelayed()&lt;/code&gt; (zero AlarmManager wakes). Watchdog intervals extended from 5 to 15 minutes. AlarmClock safety net reduced from every 15 minutes to every 8 hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a 94% reduction in system wakes while maintaining the same monitoring reliability.&lt;/p&gt;
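
&lt;p&gt;The percentage follows directly from the two wake counts listed above; a quick check of the arithmetic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;import kotlin.math.roundToInt

fun main() {
    val wakesBefore = 4_300.0  // approximate daily wakes before optimization
    val wakesAfter = 240.0     // approximate daily wakes after optimization
    val reductionPct = (wakesBefore - wakesAfter) / wakesBefore * 100
    println(reductionPct.roundToInt())  // 94
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;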

&lt;p&gt;The insight: aggressive scheduling costs more in battery than it buys in reliability. A three-tier power state that backs off when the device is still (active with 15-second pulses, idle with 5-minute pulses, deep sleep with 30-minute pulses and the batched accelerometer as a safety net) achieves both low battery impact and high reliability.&lt;/p&gt;
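
&lt;p&gt;A minimal sketch of such a three-tier back-off. The pulse intervals come from the text above, but the stillness thresholds (10 minutes and 1 hour) and the names &lt;code&gt;PowerTier&lt;/code&gt; and &lt;code&gt;tierFor()&lt;/code&gt; are assumptions made up for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;enum class PowerTier(val pulseIntervalMs: Long) {
    ACTIVE(15_000L),            // 15-second pulses
    IDLE(5 * 60_000L),          // 5-minute pulses
    DEEP_SLEEP(30 * 60_000L)    // 30-minute pulses
}

// Hypothetical thresholds: the longer the device has been still,
// the further the monitoring backs off.
fun tierFor(stillForMs: Long): PowerTier = when {
    stillForMs &amp;gt;= 60 * 60_000L -&amp;gt; PowerTier.DEEP_SLEEP
    stillForMs &amp;gt;= 10 * 60_000L -&amp;gt; PowerTier.IDLE
    else -&amp;gt; PowerTier.ACTIVE
}

fun main() {
    println(tierFor(0L))                   // ACTIVE
    println(tierFor(20 * 60_000L))         // IDLE
    println(tierFor(2 * 60 * 60_000L))     // DEEP_SLEEP
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;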




&lt;h2&gt;
  
  
  What I would do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Build the OEM compatibility layer first.&lt;/strong&gt; I treated background reliability as something I would fix later. It took 40+ versions across several months to get right. It should have been the architectural foundation from Day 1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test on real OEM devices from the start.&lt;/strong&gt; The Android emulator and Pixel devices tell you nothing about OEM battery management. I did not discover the Honor wakelock whitelist problem, the GPS hardware suspension, or the sensor FIFO timestamp rebasing until I tested on actual devices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never trust a single mechanism.&lt;/strong&gt; Every Android background API has an OEM that breaks it. AlarmManager gets suppressed. WorkManager gets deferred. Foreground services get killed. The only reliable approach is layered redundancy where each mechanism independently tries to recover.&lt;/p&gt;




&lt;p&gt;The app is called &lt;a href="https://howareu.app" rel="noopener noreferrer"&gt;How Are You?!&lt;/a&gt; and is available on Google Play. It is still in early testing — if you have an elderly parent on Android and want to try it, I would appreciate feedback, especially from OEM devices I have not tested yet. Email: &lt;a href="mailto:developer@howareu.app"&gt;developer@howareu.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am happy to answer questions about any of these techniques. The OEM compatibility rabbit hole goes much deeper than what I have covered here.&lt;/p&gt;

</description>
      <category>android</category>
      <category>kotlin</category>
      <category>sideprojects</category>
      <category>architecture</category>
    </item>
    <item>
      <title>I spent several months building an AI safety app for my elderly parent — here is what I learned</title>
      <dc:creator>Stoyan Minchev</dc:creator>
      <pubDate>Sat, 21 Mar 2026 21:10:16 +0000</pubDate>
      <link>https://dev.to/stoyan_minchev/i-spent-several-months-building-an-ai-safety-app-for-my-elderly-parent-here-is-what-i-learned-2h8a</link>
      <guid>https://dev.to/stoyan_minchev/i-spent-several-months-building-an-ai-safety-app-for-my-elderly-parent-here-is-what-i-learned-2h8a</guid>
      <description>&lt;p&gt;My parent lives alone. After a fall that nobody noticed for hours, I decided to build something that would.&lt;/p&gt;

&lt;p&gt;Four months, 121 versions, and approximately 79,000 lines of Kotlin later, the app is live on Google Play. Here is the story — the technical challenges, the things that broke, and what I would do differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the app does&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Install it on your parent's Android phone. It watches. That is it.&lt;/p&gt;

&lt;p&gt;For 7 days, it learns their routine — when they wake up, how active they are, where they go. After that, it monitors 24/7 and emails your family if something seems wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unusual stillness (potential fall or medical event)&lt;/li&gt;
&lt;li&gt;Did not wake up on time&lt;/li&gt;
&lt;li&gt;At an unfamiliar location at an unusual hour&lt;/li&gt;
&lt;li&gt;Phone silent for too long&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No buttons to press. No wearable to charge. No daily check-in calls. Install and forget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The technical stack&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kotlin&lt;/strong&gt; + Jetpack Compose + Material Design 3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Room + SQLCipher&lt;/strong&gt; for encrypted local storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini API&lt;/strong&gt; for behavioral analysis (cloud, anonymized summaries)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resend API&lt;/strong&gt; for transactional email alerts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WorkManager + Foreground Service&lt;/strong&gt; for 24/7 reliability&lt;/li&gt;
&lt;li&gt;Clean architecture: &lt;code&gt;:domain&lt;/code&gt; (pure Kotlin) -&amp;gt; &lt;code&gt;:data&lt;/code&gt; -&amp;gt; &lt;code&gt;:ui&lt;/code&gt; -&amp;gt; &lt;code&gt;:app&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The hard part: staying alive on Android
&lt;/h2&gt;

&lt;p&gt;This is where 80% of my development time went.&lt;/p&gt;

&lt;p&gt;Android's job is to kill your app. OEMs make it worse. Here is what I learned:&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 1: OEM battery killers
&lt;/h3&gt;

&lt;p&gt;Samsung, Xiaomi, Honor, OPPO — they all have proprietary battery managers that kill background apps. The standard &lt;code&gt;startForeground()&lt;/code&gt; is not enough.&lt;/p&gt;

&lt;p&gt;My solution is an 11-layer service recovery stack:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Foreground service with IMPORTANCE_MIN channel&lt;/li&gt;
&lt;li&gt;WorkManager periodic watchdog&lt;/li&gt;
&lt;li&gt;AlarmManager backup chains&lt;/li&gt;
&lt;li&gt;BOOT_COMPLETED receiver&lt;/li&gt;
&lt;li&gt;SyncAdapter for process priority boost&lt;/li&gt;
&lt;li&gt;Batched accelerometer sensing (survives CPU sleep)&lt;/li&gt;
&lt;li&gt;Exact alarm permission recovery&lt;/li&gt;
&lt;li&gt;OEM-specific wakelock tag spoofing (Honor whitelists "LocationManagerService")&lt;/li&gt;
&lt;li&gt;START_STICKY restart&lt;/li&gt;
&lt;li&gt;Safety net AlarmClock at 8-hour intervals&lt;/li&gt;
&lt;li&gt;User-facing gap detection with OEM-specific guidance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each layer was added because the previous ones were not enough on some device.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 2: Sensor data lies to you
&lt;/h3&gt;

&lt;p&gt;At 3 AM, your app wakes up to check the accelerometer. The sensor HAL returns data. You think it is fresh. It is not — it is 22-minute-old data sitting in the hardware FIFO buffer since the last time anyone read the sensor.&lt;/p&gt;

&lt;p&gt;On Honor devices, the HAL even rebases &lt;code&gt;event.timestamp&lt;/code&gt; on flush, so a delta check against &lt;code&gt;elapsedRealtimeNanos()&lt;/code&gt; reports the data as fresh. The solution: call &lt;code&gt;sensorManager.flush()&lt;/code&gt; explicitly, discard warm-up readings, wait for the &lt;code&gt;onFlushCompleted()&lt;/code&gt; callback instead of relying on fixed timers, and use dual-clock comparison as a safety net.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 3: GPS does not work when you need it
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;getCurrentLocation(PRIORITY_HIGH_ACCURACY)&lt;/code&gt; returns nothing. The OEM killed the GPS hardware to save power.&lt;/p&gt;

&lt;p&gt;Solution: a priority fallback chain that tries HIGH_ACCURACY -&amp;gt; wake probe -&amp;gt; BALANCED_POWER -&amp;gt; LOW_POWER -&amp;gt; getLastLocation() in order. It returns a &lt;code&gt;GpsLocationOutcome&lt;/code&gt; sealed type so the caller knows exactly what happened.&lt;/p&gt;
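
&lt;p&gt;The shape of such a chain can be sketched without any Android APIs. Everything here is hypothetical: &lt;code&gt;Step&lt;/code&gt;, &lt;code&gt;firstAvailable()&lt;/code&gt; and the stubbed results stand in for real location requests, and a plain &lt;code&gt;Pair&lt;/code&gt; stands in for the outcome type.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;// Hypothetical step: a label plus an attempt that yields a fix or null.
data class Step(val label: String, val attempt: () -&amp;gt; String?)

// Walks the chain in priority order and returns the first fix,
// together with the label of the step that produced it.
fun firstAvailable(chain: List&amp;lt;Step&amp;gt;): Pair&amp;lt;String, String&amp;gt;? {
    for (step in chain) {
        val fix = step.attempt()
        if (fix != null) return step.label to fix
    }
    return null  // total failure: no step produced a location
}

fun main() {
    val chain = listOf(
        Step("HIGH_ACCURACY") { null },   // stub: GPS hardware suspended
        Step("WAKE_PROBE") { null },      // stub: probe did not recover it
        Step("BALANCED_POWER") { "coarse fix" },
        Step("LOW_POWER") { "coarser fix" },
        Step("LAST_LOCATION") { "cached fix" }
    )
    println(firstAvailable(chain))  // (BALANCED_POWER, coarse fix)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;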

&lt;h2&gt;
  
  
  The AI: from on-device to cloud
&lt;/h2&gt;

&lt;p&gt;I started with Gemini Nano (fully on-device). It worked on Pixels. It did not work on anything else. The addressable market was tiny.&lt;/p&gt;

&lt;p&gt;So I moved to Gemini Flash (cloud API). The privacy trade-off: detailed behavioral data stays on-device in an encrypted database, but anonymized summaries (including location context) are sent to Google's AI for weekly analysis. No names, no personal identifiers.&lt;/p&gt;

&lt;p&gt;The key architectural decision: &lt;strong&gt;API key sharding&lt;/strong&gt;. Each Google Cloud project gets 10,000 requests per day free. I created 6 projects with independent API keys. The app rotates through them on rate-limit errors (429/403). That is 60,000 requests per day — enough for thousands of users at zero cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Travel intelligence
&lt;/h2&gt;

&lt;p&gt;The biggest UX win. Without it, every vacation generated 5 to 7 URGENT emails (one per night when the hard-floor detector fired at an unfamiliar location). By day 5, families ignored all emails.&lt;/p&gt;

&lt;p&gt;Now: Day 1 sends one "your parent appears to be traveling" notification. Days 2 through 6: silence (unless something actually changes). Return home: "they have returned to a familiar area."&lt;/p&gt;

&lt;p&gt;The state machine: HOME -&amp;gt; DAY_1 -&amp;gt; TRAVELING(n) -&amp;gt; TRIP_ENDED. Single-writer rule through &lt;code&gt;TravelStateManager&lt;/code&gt; to prevent state corruption from concurrent assessments.&lt;/p&gt;
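
&lt;p&gt;A minimal sketch of that state machine as a pure transition function. The state names follow the text above, but the transition rules (one evaluation per day, driven by a single "away from familiar places" flag) are a simplifying assumption, and &lt;code&gt;next()&lt;/code&gt; is a hypothetical helper rather than the actual &lt;code&gt;TravelStateManager&lt;/code&gt; API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;sealed interface TravelState {
    data object Home : TravelState
    data object Day1 : TravelState
    data class Traveling(val day: Int) : TravelState
    data object TripEnded : TravelState
}

// Assumed rule set: evaluated once per day with a single boolean input.
fun next(state: TravelState, away: Boolean): TravelState = when (state) {
    TravelState.Home -&amp;gt; if (away) TravelState.Day1 else TravelState.Home
    TravelState.Day1 -&amp;gt; if (away) TravelState.Traveling(2) else TravelState.TripEnded
    is TravelState.Traveling -&amp;gt;
        if (away) TravelState.Traveling(state.day + 1) else TravelState.TripEnded
    TravelState.TripEnded -&amp;gt; if (away) TravelState.Day1 else TravelState.Home
}

fun main() {
    var state: TravelState = TravelState.Home
    // Three days away, then two days back home.
    for (away in listOf(true, true, true, false, false)) {
        state = next(state, away)
        println(state)
    }
    // Prints: Day1, Traveling(day=2), Traveling(day=3), TripEnded, Home
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the transition a pure function makes the single-writer rule easy to enforce: only one component calls it and stores the result.&lt;/p&gt;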

&lt;h2&gt;
  
  
  What I would do differently
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with cloud AI from Day 1.&lt;/strong&gt; I lost 2 months on Gemini Nano before accepting the device compatibility reality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build the OEM compatibility layer first.&lt;/strong&gt; The 11-layer recovery took 40+ versions to get right. It should have been the foundation, not an afterthought.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email before OAuth.&lt;/strong&gt; I started with Gmail OAuth (user signs into their Google account). It was a UX nightmare. Resend API (transactional email, zero auth) took 1 day to implement and just works.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Looking for early adopters
&lt;/h2&gt;

&lt;p&gt;The app is free to download with a 21-day free trial (then $49 for the first year, $5 per year after that). I am looking for families to test it — install it on your parent's Android phone (Android 9 or newer), run it for a couple of weeks, and tell me what works and what does not.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://howareu.app" rel="noopener noreferrer"&gt;howareu.app&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sideprojects</category>
      <category>android</category>
      <category>kotlin</category>
    </item>
  </channel>
</rss>
