Stoyan Minchev

Posted on Mar 23 • Edited on Apr 16

What Android OEMs do to background apps, and the 11 layers I built to survive it

#android #kotlin #sideprojects #architecture

I spent over a year building a safety monitoring app that runs 24/7 on elderly parents' phones. If it gets killed, nobody gets alerted when something goes wrong. That constraint forced me into the deepest, most frustrating corners of Android background execution.

This article covers what I learned about how Samsung, Xiaomi, Honor, OPPO, and Vivo actively kill background apps, why the standard Android approach is nowhere near sufficient, and the 11-layer recovery architecture I ended up building. I will also cover two related problems that surprised me: GPS hardware that silently stops working, and accelerometer data that lies about its age.

126,000 lines of Kotlin, 125+ versions, solo developer. The app is called How Are You?! — it learns an elderly person's daily routine over 7 days, then monitors around the clock and emails the family if something seems wrong. But this article is about the engineering, not the product.

The problem: Android wants your app dead

Stock Android already makes continuous background work difficult. Doze mode, App Standby, background execution limits: Google has been tightening the screws since Android 6. The standard answer is a foreground service plus a battery optimization exemption, requested with REQUEST_IGNORE_BATTERY_OPTIMIZATIONS.

That is necessary. It is nowhere near sufficient.

OEMs add their own proprietary battery management on top of stock Android, and they are far more aggressive. Here is what I encountered on the devices I tested:

Samsung maintains a "Sleeping Apps" list. If your app has no foreground activity for 3 days, Samsung kills it. OTA updates silently reset your battery optimization exemption. The user opted you out of optimization? Samsung un-opted you after the update.

Xiaomi (MIUI/HyperOS) kills background services aggressively and resets autostart permissions after OTA updates. Your app was whitelisted? Not anymore.

Honor and Huawei have PowerGenie, which monitors how often your app wakes the system. Call setAlarmClock() more than about 3 times per day and you get flagged as "frequently wakes your system." They also have HwPFWService, which kills apps holding wakelocks longer than 60 minutes with non-whitelisted tags.

OPPO (ColorOS) has "Sleep standby optimization" that freezes apps during the hours the phone detects the user is sleeping. A safety monitoring app for elderly people needs to run especially during sleep hours — that is when falls and medical events go unnoticed.

Vivo (Funtouch OS) has "AI sleep mode" that does the same thing.

Each manufacturer found a different way to kill you. No single workaround survives all of them.

The answer: 11 layers of recovery

The core insight is that no single mechanism is reliable across all OEMs and all device states. The answer is redundancy — each layer catches the failures of the layers above it.

Layer 1: Foreground service with START_STICKY

The foundation. startForeground() with a persistent notification. The notification channel must use IMPORTANCE_MIN — not IMPORTANCE_DEFAULT or higher. Why? OEMs auto-grant POST_NOTIFICATIONS on higher importance channels, bypassing the user's notification settings and making your persistent notification visible. IMPORTANCE_MIN keeps it silent while startForeground() still gives your process elevated priority.

START_STICKY tells the system to restart the service after a kill. But "restart" can take minutes or never happen on aggressive OEMs.

Layer 2: onDestroy recovery scheduling

When the system kills your service, onDestroy() fires (most of the time). Use this 50ms window to schedule everything that will bring you back:

override fun onDestroy() {
    super.onDestroy()
    ServiceWatchdogReceiver.scheduleWithBackup(this)
    MotionSnapshotReceiver.schedule(this)
}

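This re-arms both the AlarmManager chain and the motion snapshot chain. If onDestroy() does not fire (for example, an OEM kill without a callback), the other layers cover it. A genuine force-stop is harsher. Android also cancels the app's pending alarms and jobs and withholds most broadcasts until the user launches the app again.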

Layer 3: AlarmManager watchdog chain

A self-chaining setExactAndAllowWhileIdle() alarm at 15-minute intervals during active use. When it fires, it checks whether the service is alive and restarts it if not.

The interval adapts to power state: 15 minutes when active, 30 minutes when idle, 60 minutes during deep sleep. This matters for OEM battery scoring — more frequent alarms get flagged.

Important: never use Handler.postDelayed() as a replacement for AlarmManager. Handlers do not fire during CPU deep sleep. I learned this the hard way.
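The adaptive chain can be sketched in plain Kotlin. This is a minimal illustration, not the app's code. `PowerState`, `WatchdogTick`, and `onWatchdogFired` are hypothetical names. On a device, the returned `nextTriggerAtMs` would be passed to `AlarmManager.setExactAndAllowWhileIdle(AlarmManager.ELAPSED_REALTIME_WAKEUP, ...)` from inside the receiver that just fired.

```kotlin
import java.util.concurrent.TimeUnit

// Hypothetical power tiers matching the intervals described above.
enum class PowerState { ACTIVE, IDLE, DEEP_SLEEP }

// Watchdog interval per tier: 15 / 30 / 60 minutes.
fun watchdogIntervalMs(state: PowerState): Long = when (state) {
    PowerState.ACTIVE -> TimeUnit.MINUTES.toMillis(15)
    PowerState.IDLE -> TimeUnit.MINUTES.toMillis(30)
    PowerState.DEEP_SLEEP -> TimeUnit.MINUTES.toMillis(60)
}

// What one firing of the self-chaining alarm decides.
data class WatchdogTick(val restartService: Boolean, val nextTriggerAtMs: Long)

// Called when the alarm fires: restart the service if it is dead, and
// compute the next trigger from the *current* power state, so the chain
// backs off automatically as the device goes idle.
fun onWatchdogFired(nowElapsedMs: Long, serviceAlive: Boolean, state: PowerState): WatchdogTick =
    WatchdogTick(
        restartService = !serviceAlive,
        nextTriggerAtMs = nowElapsedMs + watchdogIntervalMs(state)
    )
```

Each firing schedules only the next alarm. A single missed firing therefore ends the chain, and the layers below have to back it up.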

Layer 4: WorkManager periodic watchdog

A PeriodicWorkRequest at 15-minute intervals that does the same thing — checks the service and restarts if needed. WorkManager survives service kills and uses JobScheduler under the hood, which OEMs are more reluctant to interfere with.

But there is a subtle trap: ExistingPeriodicWorkPolicy.KEEP silently discards new requests if a worker is already enqueued, even if the existing one has a stale timer from hours ago. And REPLACE resets the countdown every time you call schedule(). The solution: query getWorkInfosForUniqueWork() first and only schedule when the worker is not already enqueued.

val workInfos = workManager
    .getWorkInfosForUniqueWork(WATCHDOG_WORK_NAME).await()
val isEnqueued = workInfos.any {
    it.state == WorkInfo.State.ENQUEUED || it.state == WorkInfo.State.RUNNING
}
if (!isEnqueued) {
    workManager.enqueueUniquePeriodicWork(
        WATCHDOG_WORK_NAME,
        ExistingPeriodicWorkPolicy.KEEP,
        watchdogRequest
    )
}

Layer 5: Boot recovery

BOOT_COMPLETED, LOCKED_BOOT_COMPLETED, QUICKBOOT_POWERON, and MY_PACKAGE_REPLACED receivers that re-establish the service and all alarm chains after reboot or app update.

Some OEMs reset permissions after OTA updates. OnePlus, Samsung, Xiaomi, Redmi, and POCO all do this. You need to detect the OTA and re-prompt the user for battery optimization exemption.
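The article does not say how the OTA is detected. One common approach, shown here as an assumption rather than the app's method, is to persist `Build.FINGERPRINT` (it changes with every system build) and compare it on boot. The app then re-prompts only if `PowerManager.isIgnoringBatteryOptimizations()` now returns false. The decision logic, with hypothetical names:

```kotlin
// Hypothetical boot-time decision. On a device, `storedFingerprint` would come
// from app storage, `currentFingerprint` from Build.FINGERPRINT, and
// `stillExempt` from PowerManager.isIgnoringBatteryOptimizations(packageName).
enum class BootAction { NOTHING, RECORD_FINGERPRINT, REPROMPT_AFTER_OTA }

fun bootAction(storedFingerprint: String?, currentFingerprint: String, stillExempt: Boolean): BootAction =
    when {
        // Same OS build as last time: no OTA happened.
        storedFingerprint == currentFingerprint -> BootAction.NOTHING
        // OS build changed and the exemption was silently reset:
        // show the prompt (and record the new fingerprint afterwards).
        storedFingerprint != null && !stillExempt -> BootAction.REPROMPT_AFTER_OTA
        // First boot seen by the app, or an OTA that kept the exemption.
        else -> BootAction.RECORD_FINGERPRINT
    }
```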

Layer 6: SyncAdapter for process priority

ContentResolver.addPeriodicSync() gives your process elevated priority through the sync framework. OEMs are reluctant to kill sync adapter processes because the sync framework is a system concept — killing it could break contacts, calendar, and email sync.

This is a ~1-hour periodic callback that checks service health. It will not bring you back fast, but it is extremely hard for OEMs to suppress.

Layer 7: AlarmClock safety net

setAlarmClock() at 8-hour intervals — approximately 3 calls per day. This is the nuclear option. AlarmClock alarms get the highest delivery priority on Android because they are designed to wake users up.

Why 8 hours and not shorter? Honor's PowerGenie specifically tracks AlarmClock frequency. At 15-minute intervals, it flags you as "frequently wakes your system" and kills you. At 8-hour intervals (~3/day), you fly under the radar.

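fun scheduleSafetyNet(context: Context) {
    val alarmManager = context.getSystemService(AlarmManager::class.java)
    val pendingIntent = PendingIntent.getBroadcast(
        context, REQUEST_CODE_ALARMCLOCK,
        Intent(context, ServiceWatchdogReceiver::class.java), // Assumed target: a receiver that restarts the service
        PendingIntent.FLAG_UPDATE_CURRENT or PendingIntent.FLAG_IMMUTABLE
    )
    val triggerAt = System.currentTimeMillis() + SAFETY_NET_INTERVAL_MS // 8 hours; AlarmClockInfo uses wall-clock time
    alarmManager.setAlarmClock(
        AlarmManager.AlarmClockInfo(triggerAt, null),
        pendingIntent
    )
}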

Layer 8: Exact alarm permission recovery

When the user revokes SCHEDULE_EXACT_ALARM, all pending AlarmManager chains die silently. No callback, no exception. Your watchdog, your snapshot receiver, your safety net — all gone.

Listen for ACTION_SCHEDULE_EXACT_ALARM_PERMISSION_STATE_CHANGED and re-establish everything on re-grant:

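class ExactAlarmPermissionReceiver : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        if (intent.action != AlarmManager.ACTION_SCHEDULE_EXACT_ALARM_PERMISSION_STATE_CHANGED) return
        val alarmManager = context.getSystemService(AlarmManager::class.java)
        // canScheduleExactAlarms() is an AlarmManager method (API 31+)
        if (alarmManager.canScheduleExactAlarms()) {
            ServiceWatchdogReceiver.scheduleWithBackup(context)
            MotionSnapshotReceiver.schedule(context)
        }
    }
}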

Layer 9: Batched accelerometer sensing

This is the layer that surprised me most. Keep the accelerometer registered with maxReportLatencyUs during idle and deep sleep. The sensor HAL continuously samples into a hardware FIFO buffer and delivers readings via a sensor interrupt — this is completely invisible to OEM battery managers because it does not use AlarmManager, WorkManager, or any schedulable mechanism.

sensorManager.registerListener(
    batchedMotionListener,
    accelerometer,
    SensorManager.SENSOR_DELAY_NORMAL,
    maxReportLatencyUs  // 10 min in deep sleep
)

The HAL batches readings and delivers them all at once when the buffer fills or the latency expires. You get continuous motion awareness with zero wakes visible to the OEM.

One gotcha: a single SLIGHT_MOVEMENT reading (1.0-3.0 m/s^2) should not exit batched mode. Table vibrations and building micro-movements produce transient spikes. I require 3 consecutive SLIGHT_MOVEMENT readings (~15 seconds) before exiting. Anything above 3.0 m/s^2 (MODERATE_MOVEMENT) exits immediately.
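The exit rule can be written as a small state machine. This sketch rests on an assumption the article does not spell out. Here, "movement" means how far the acceleration magnitude deviates from standard gravity (9.80665 m/s^2, the value of `SensorManager.STANDARD_GRAVITY`). `MotionLevel`, `classify`, and `BatchExitDebouncer` are hypothetical names.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// Standard gravity in m/s^2 (same value as SensorManager.STANDARD_GRAVITY).
const val STANDARD_GRAVITY = 9.80665f

enum class MotionLevel { STILL, SLIGHT_MOVEMENT, MODERATE_MOVEMENT }

// Assumption: movement = |magnitude of (x, y, z) - gravity|, in m/s^2.
fun classify(x: Float, y: Float, z: Float): MotionLevel {
    val deviation = abs(sqrt(x * x + y * y + z * z) - STANDARD_GRAVITY)
    return when {
        deviation > 3.0f -> MotionLevel.MODERATE_MOVEMENT
        deviation >= 1.0f -> MotionLevel.SLIGHT_MOVEMENT
        else -> MotionLevel.STILL
    }
}

// Decides when to leave batched mode: immediately on MODERATE_MOVEMENT,
// or only after `requiredSlight` consecutive SLIGHT_MOVEMENT readings, so
// transient table vibrations do not trigger an exit.
class BatchExitDebouncer(private val requiredSlight: Int = 3) {
    private var consecutiveSlight = 0

    // Returns true when the caller should exit batched mode.
    fun onReading(level: MotionLevel): Boolean {
        when (level) {
            MotionLevel.MODERATE_MOVEMENT -> { consecutiveSlight = 0; return true }
            MotionLevel.SLIGHT_MOVEMENT -> { consecutiveSlight++ }
            MotionLevel.STILL -> { consecutiveSlight = 0 }
        }
        if (consecutiveSlight >= requiredSlight) {
            consecutiveSlight = 0
            return true
        }
        return false
    }
}
```

A single STILL reading resets the counter. This is what makes the rule mean "3 *consecutive* readings" rather than "3 readings in total".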

Layer 10: Network restoration and app foreground triggers

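A CONNECTIVITY_ACTION receiver triggers a service health check when the network comes back. It has to be registered at runtime, because apps targeting Android 7.0 and higher no longer receive this broadcast through manifest-declared receivers. ConnectivityManager.NetworkCallback is the non-deprecated alternative. ProcessLifecycleOwner fires when the user opens the app. Both triggers are opportunistic: they catch edge cases where the service died during airplane mode or extended offline periods.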

Layer 11: User-facing gap detection

When all 10 layers fail (and on some devices, they do), the app detects the gap and shows the user device-specific instructions: "Your [Manufacturer] phone is stopping background apps. Open Settings > Battery > [OEM-specific path] and disable optimization for How Are You?!"

This is the least satisfying layer because it requires user action. But on a few particularly aggressive OEM configurations, it is the only thing that works.
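Gap detection itself can be as simple as scanning persisted heartbeat timestamps. The following is a minimal sketch. It assumes the service writes a timestamp each time it runs. The names, and the choice to warn only after repeated gaps, are illustrative and not taken from the app.

```kotlin
// Hypothetical gap model: the service persists a heartbeat timestamp (ms)
// every time it runs; a gap is any stretch between heartbeats that is
// longer than the slowest recovery layer should ever allow.
data class Gap(val startMs: Long, val endMs: Long) {
    val durationMs: Long get() = endMs - startMs
}

fun findGaps(heartbeatsMs: List<Long>, maxAllowedGapMs: Long): List<Gap> =
    heartbeatsMs.sorted()
        .zipWithNext()
        .filter { (a, b) -> b - a > maxAllowedGapMs }
        .map { (a, b) -> Gap(a, b) }

// Warn only when gaps recur, so one reboot or one dead battery does not
// immediately show OEM instructions.
fun shouldWarnUser(gaps: List<Gap>, minGaps: Int = 2): Boolean = gaps.size >= minGaps
```

When the check runs on app open, appending the current time to the heartbeat list also catches a service that is dead right now.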

The wakelock tag problem on Honor

HwPFWService on Honor and Huawei devices maintains a whitelist of allowed wakelock tags. If your app holds a wakelock for more than 60 minutes with a tag that is not on the whitelist, HwPFWService kills your app.

The solution is embarrassingly simple: use a whitelisted tag on Honor/Huawei, your real tag everywhere else.

private val WAKELOCK_TAG: String = run {
    val manufacturer = Build.MANUFACTURER?.lowercase().orEmpty()
    if (manufacturer == "huawei" || manufacturer == "honor") {
        "LocationManagerService"  // Whitelisted by HwPFWService
    } else {
        "HowAreYou:PulseBurst"
    }
}

LocationManagerService is whitelisted because it is a system service tag. I am not proud of this, but it works.

getCurrentLocation() hangs forever

Once I had the service staying alive, I discovered a second problem: GPS does not work when you need it.

At approximately 12% battery on my Honor test device, the OEM battery saver silently killed GPS hardware access. No exception, no error callback, no log entry. The foreground service was alive, the accelerometer worked. But getCurrentLocation(PRIORITY_HIGH_ACCURACY) simply never completed. The Task from Play Services hung indefinitely — neither onSuccessListener nor onFailureListener ever fired.

The code fell back to getLastLocation(), which returned a 5-hour-old cached position from a completely different city.

Fix 1: Always timeout

Every getCurrentLocation() call must be wrapped in a coroutine timeout:

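suspend fun getLocation(priority: Int): Location? {
    return withTimeoutOrNull(30_000L) {
        suspendCancellableCoroutine<Location?> { cont ->
            val cts = CancellationTokenSource()
            cont.invokeOnCancellation { cts.cancel() } // Timeout also cancels the Play Services request
            fusedClient.getCurrentLocation(priority, cts.token)
                .addOnSuccessListener { cont.resume(it) } // `it` can be null
                .addOnFailureListener { cont.resume(null) }
        }
    }
}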

Fix 2: Priority fallback chain

GPS hardware being dead does not mean all location sources are dead. Cell towers and Wi-Fi still work. I built a sequential fallback:

PRIORITY_HIGH_ACCURACY (GPS, ~10m)
    | timeout or null
PRIORITY_BALANCED_POWER_ACCURACY (Wi-Fi + cell, ~40-300m)
    | timeout or null
PRIORITY_LOW_POWER (cell only, ~300m-3km)
    | timeout or null
getLastLocation() (cached, any age)
    | null
TotalFailure

Each step gets its own 30-second timeout. In practice, when GPS is killed, BALANCED_POWER_ACCURACY returns in 2-3 seconds because Wi-Fi scanning still works.
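The fallback itself follows a "first non-null result wins" rule. Below is a sketch with hypothetical types. Each `Step.fetch` stands in for a call like the timeout-wrapped `getLocation(priority)` from Fix 1, which would make `fetch` a suspend function in real code. Here it is a plain function so the ordering logic is easy to test.

```kotlin
// Hypothetical types. `fetch` stands in for one timeout-wrapped location
// request; returning null means "timed out or no result".
data class Fix(val lat: Double, val lon: Double, val accuracyM: Float)

enum class Source { HIGH_ACCURACY, BALANCED_POWER, LOW_POWER, LAST_KNOWN }

class Step(val source: Source, val fetch: () -> Fix?)

// Tries each step in order; the first non-null fix wins and later steps
// are never called. A null result from the whole chain is TotalFailure.
fun firstAvailableFix(steps: List<Step>): Pair<Source, Fix>? {
    for (step in steps) {
        val fix = step.fetch() ?: continue
        return step.source to fix
    }
    return null
}
```

Returning the source alongside the fix lets the caller tell which rung of the ladder answered.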

Fix 3: GPS wake probe

Sometimes the GPS hardware is not permanently dead — it has been suspended by the battery manager. A brief requestLocationUpdates call can wake it:

if (hoursSinceLastFreshGps > 4) {
    val probeRequest = LocationRequest.Builder(
        Priority.PRIORITY_HIGH_ACCURACY, 1000L
    )
        .setDurationMillis(5_000L)
        .setMaxUpdates(5)
        .build()

    // requestLocationUpdates() returns a Task immediately and does not wait
    // for fixes, so keep the probe open for its full window before removing it.
    fusedClient.requestLocationUpdates(probeRequest, callback, looper)
    try {
        delay(6_000L)
    } finally {
        fusedClient.removeLocationUpdates(callback)
    }
}

Five seconds, maximum once every 4 hours. On Honor, this recovers the GPS roughly 40% of the time.
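The "at most once every 4 hours" rule is easiest to keep honest as a small gate object. Below is a sketch with hypothetical names, using the 4-hour values from the article. Keeping the last probe time in memory is a simplification. An implementation that must survive process death would need to persist it.

```kotlin
import java.util.concurrent.TimeUnit

// Hypothetical gate for the GPS wake probe. Both 4-hour rules come from the
// text: probe only if the last fresh GPS fix is older than 4 hours, and never
// more than once per 4 hours.
class WakeProbeGate(private val windowMs: Long = TimeUnit.HOURS.toMillis(4)) {
    // Kept in memory for simplicity; surviving process death would need persistence.
    private var lastProbeAtMs: Long? = null

    // Returns true (and records the probe) if a probe may run now.
    fun tryAcquire(nowMs: Long, lastFreshGpsAtMs: Long?): Boolean {
        val gpsStale = lastFreshGpsAtMs == null || nowMs - lastFreshGpsAtMs > windowMs
        val previous = lastProbeAtMs
        val cooledDown = previous == null || nowMs - previous >= windowMs
        if (!gpsStale || !cooledDown) return false
        lastProbeAtMs = nowMs
        return true
    }
}
```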

Fix 4: Explicit outcome types

The original code returned Location?. The caller had no way to distinguish a fresh 10-meter GPS fix from a 5-hour-old cached position. I changed the return type to make the quality of data explicit:

sealed interface GpsLocationOutcome {
    data class FreshGps(val accuracy: Float) : GpsLocationOutcome
    data class WakeProbeSuccess(val accuracy: Float) : GpsLocationOutcome
    data class CellFallback(val accuracy: Float) : GpsLocationOutcome
    data class StaleLastLocation(val ageMs: Long) : GpsLocationOutcome
    data object TotalFailure : GpsLocationOutcome
}

Now the consumer can make informed decisions. A 3km cell tower reading is low precision, but it answers "is this person in the expected city or 200km away?" For a safety app, that distinction matters.
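For the city-level question, a fix is conclusive only if even its worst-case error cannot place the person back inside the expected area. Below is a sketch using the haversine great-circle formula. On Android, `Location.distanceBetween()` or `Location.distanceTo()` would handle the distance part. `isDefinitelyAway` and its parameters are hypothetical.

```kotlin
import kotlin.math.asin
import kotlin.math.cos
import kotlin.math.pow
import kotlin.math.sin
import kotlin.math.sqrt

// Mean Earth radius in meters.
const val EARTH_RADIUS_M = 6_371_000.0

// Great-circle distance between two lat/lon points (degrees), in meters.
fun haversineM(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double {
    val dLat = Math.toRadians(lat2 - lat1)
    val dLon = Math.toRadians(lon2 - lon1)
    val a = sin(dLat / 2).pow(2) +
        cos(Math.toRadians(lat1)) * cos(Math.toRadians(lat2)) * sin(dLon / 2).pow(2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
}

// Hypothetical consumer-side check: "definitely away" only if the fix is
// outside the home area even after subtracting its full accuracy radius.
fun isDefinitelyAway(
    fixLat: Double, fixLon: Double, accuracyM: Float,
    homeLat: Double, homeLon: Double, homeRadiusM: Double
): Boolean = haversineM(fixLat, fixLon, homeLat, homeLon) - accuracyM > homeRadiusM
```

With this rule, a 3 km cell fix cannot answer "which street?". It can still reliably answer "which city?".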

The sensor HAL lies about timestamps

At 3 AM, your app wakes up to check the accelerometer. You call registerListener(), and the sensor HAL returns data. You check event.timestamp against SystemClock.elapsedRealtimeNanos(). The delta is small. The data looks fresh.

It is not. It is 22-minute-old data sitting in the hardware FIFO buffer since the last time anyone read the sensor.

This is the normal behavior of hardware sensor FIFOs. When the CPU sleeps, the sensor keeps sampling into its buffer. When you register a listener after wakeup, the HAL dumps the entire buffer at you. On a well-behaved HAL, the timestamps are accurate, because the readings really were taken at those times. The data is still stale: it describes what happened 22 minutes ago, not what is happening now.

On most devices, you can catch this by comparing event.timestamp (CLOCK_BOOTTIME nanoseconds) against SystemClock.elapsedRealtimeNanos(). If the delta is large, the reading is stale.

Honor broke this assumption. On Honor devices, the HAL rebases event.timestamp on FIFO flush, so the delta check shows the data as fresh even when it is not.

The fix: flush, wait for callback, then collect

Do not trust the first readings after registerListener(). Instead:

1. Call sensorManager.flush(this) to drain the stale FIFO data.
2. Wait for the onFlushCompleted() callback from SensorEventListener2.
3. Only start collecting readings after the flush completes.
4. Set a 1000ms fallback timer in case the HAL never fires the callback.

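class MotionSnapshotReceiver : BroadcastReceiver(), SensorEventListener2 {
    private val handler = Handler(Looper.getMainLooper())
    private val flushFallbackRunnable = Runnable { endFlushPhase(byHal = false) }
    private var isFlushPhase = true

    override fun onReceive(context: Context, intent: Intent) {
        // goAsync() / wakelock handling omitted for brevity
        val sensorManager = context.getSystemService(SensorManager::class.java)
        val accelerometer = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER) ?: return
        sensorManager.registerListener(this, accelerometer, SensorManager.SENSOR_DELAY_NORMAL)
        sensorManager.flush(this)  // Ask the HAL to drain the stale FIFO contents
        handler.postDelayed(flushFallbackRunnable, 1_000L)  // In case onFlushCompleted() never arrives
    }

    override fun onSensorChanged(event: SensorEvent) {
        if (isFlushPhase) return  // Discard stale FIFO data
        collectReading(event)
    }

    override fun onFlushCompleted(sensor: Sensor?) {
        endFlushPhase(byHal = true)
    }

    override fun onAccuracyChanged(sensor: Sensor?, accuracy: Int) = Unit

    private fun endFlushPhase(byHal: Boolean) {
        if (!isFlushPhase) return  // Guard against double-trigger
        isFlushPhase = false
        handler.removeCallbacks(flushFallbackRunnable)
        // Now start collecting real readings
    }

    private fun collectReading(event: SensorEvent) {
        // Store or aggregate the fresh reading (omitted)
    }
}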

The fallback timer at 1000ms is important. I originally used 200ms, which was insufficient for Honor devices — their deep FIFO drains at approximately 16Hz, and a full buffer can take over 200ms to flush.

As a secondary safety net, I use dual-clock comparison: both CLOCK_BOOTTIME and CLOCK_MONOTONIC deltas must agree that the reading is fresh. If either delta exceeds 500ms of staleness, the reading is discarded.
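Here is the dual-clock rule, as described, in plain Kotlin. On Android, the inputs would be `event.timestamp`, `SystemClock.elapsedRealtimeNanos()` (CLOCK_BOOTTIME), and a CLOCK_MONOTONIC reading such as `System.nanoTime()`. The function name and constant are hypothetical. CLOCK_MONOTONIC stops during suspend, so its delta can be smaller, or even negative, when the timestamp is in the boot-time base. The sketch simply applies the 500 ms limit to both deltas, as the text states.

```kotlin
// Staleness limit from the article: 500 ms.
const val MAX_STALENESS_NS = 500_000_000L

// Hypothetical helper. All values are nanoseconds. On Android:
//   eventTimestampNs = SensorEvent.timestamp
//   bootNowNs        = SystemClock.elapsedRealtimeNanos()  (CLOCK_BOOTTIME)
//   monoNowNs        = a CLOCK_MONOTONIC reading, e.g. System.nanoTime()
// The reading counts as fresh only if neither delta exceeds the limit.
fun isFresh(eventTimestampNs: Long, bootNowNs: Long, monoNowNs: Long): Boolean {
    val bootDelta = bootNowNs - eventTimestampNs
    val monoDelta = monoNowNs - eventTimestampNs
    return bootDelta <= MAX_STALENESS_NS && monoDelta <= MAX_STALENESS_NS
}
```

This check cannot catch a HAL that rebases timestamps on flush, as Honor does. That is why it only backs up the flush-and-wait step.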

A race condition in GPS processing

I had multiple independent trigger paths (stillness detector, smart GPS scheduler, area stability detector) that could request GPS concurrently. Two of them fired within 33 milliseconds of each other. Both read the same getLastLocation(), both passed the stationarity filter, and both inserted a GPS reading.

My code uses a minimum-readings-per-cluster filter to discard drive-through locations — a place needs at least 2 GPS readings to count as a real visit. The duplicate entry from the race condition defeated this filter. A single drive-by at 60km/h became a "cluster of 2."

The fix is a Mutex around the entire location processing path:

private val processLocationMutex = Mutex()

suspend fun processLocation(location: Location) {
    processLocationMutex.withLock {
        val lastLocation = getLastLocation()
        // The second concurrent caller now sees the just-inserted
        // location and correctly skips as duplicate
    }
}
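The same idea applies outside coroutines. The duplicate check ("is this the same as the last row?") and the insert must sit under one lock. Otherwise two callers can read the same last row and both insert. The sketch below uses hypothetical types and a `ReentrantLock`, because it runs on plain threads. In coroutine code, the article's `Mutex` is the right tool, since a `ReentrantLock` should not be held across a suspension point. The 60-second duplicate window is an illustrative assumption.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

data class Reading(val lat: Double, val lon: Double, val timeMs: Long)

class ReadingStore(private val duplicateWindowMs: Long = 60_000L) {
    private val lock = ReentrantLock()
    private val rows = mutableListOf<Reading>()

    // Read-last-row and insert happen under one lock, so a second concurrent
    // caller always sees the row the first caller just inserted.
    fun insertIfNew(reading: Reading): Boolean = lock.withLock {
        val last = rows.lastOrNull()
        val isDuplicate = last != null &&
            last.lat == reading.lat && last.lon == reading.lon &&
            reading.timeMs - last.timeMs < duplicateWindowMs
        if (!isDuplicate) rows.add(reading)
        !isDuplicate
    }

    fun count(): Int = lock.withLock { rows.size }
}
```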

Battery result

After all 11 layers and three tiers of power state, the battery impact is under 1% per day. The key numbers:

Before optimization: ~4,300 AlarmManager wakes per day. Every active-mode pulse (15s/30s) used AlarmManager. Every watchdog check (every 5 minutes) used AlarmManager. Honor flagged the app within hours.
After optimization: ~240 wakes per day. Active-mode pulses use Handler.postDelayed(), which adds zero AlarmManager wakes. This only works while the CPU is awake, since Handlers do not fire during deep sleep. Watchdog intervals were extended from 5 to 15 minutes. The AlarmClock safety net was reduced from every 15 minutes to every 8 hours.

That is a 94% reduction in system wakes while maintaining the same monitoring reliability.
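The per-chain arithmetic behind these numbers is easy to check, as the small sketch below shows (the function names are hypothetical). The ~4,300 total mixes several chains and power states, so a single interval cannot reproduce it. The individual chains and the overall reduction can be checked directly.

```kotlin
// Milliseconds in one day.
const val DAY_MS = 24L * 60 * 60 * 1000

// Wakes per day produced by one alarm chain that fires every `intervalMs`.
fun wakesPerDay(intervalMs: Long): Long = DAY_MS / intervalMs

// Percentage reduction from `before` to `after`.
fun reductionPercent(before: Long, after: Long): Double =
    (before - after) * 100.0 / before
```

A 5-minute watchdog alone costs 288 wakes per day and a 15-minute one costs 96. An 8-hour AlarmClock costs 3. Going from 4,300 to 240 wakes is a reduction of about 94.4%.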

The insight: aggressive scheduling does not buy reliability. It costs battery and, on OEMs that score wakeups, gets the app flagged and killed. A three-tier power state that backs off when the device is still achieves both low battery impact and high reliability. Active mode pulses every 15 seconds, idle every 5 minutes, and deep sleep every 30 minutes with the batched accelerometer as a safety net.

What I would do differently

Build the OEM compatibility layer first. I treated background reliability as something I would fix later. It took 40+ versions across several months to get right. It should have been the architectural foundation from Day 1.

Test on real OEM devices from the start. The Android emulator and Pixel devices tell you nothing about OEM battery management. I did not discover the Honor wakelock whitelist problem, the GPS hardware suspension, or the sensor FIFO timestamp rebasing until I tested on actual devices.

Never trust a single mechanism. Every Android background API has an OEM that breaks it. AlarmManager gets suppressed. WorkManager gets deferred. Foreground services get killed. The only reliable approach is layered redundancy where each mechanism independently tries to recover.

The app is called How Are You?! and is available on Google Play. It is still in closed testing. If you have an elderly parent on Android and want to try it, I would appreciate feedback, especially from OEM devices I have not tested yet. Email: developer@howareu.app

I am happy to answer questions about any of these techniques. The OEM compatibility rabbit hole goes much deeper than what I have covered here.

Top comments (3)

Still • May 3

Great article, this deserves more likes. Thanks a lot for sharing. I am in the middle of the same problem, where my app randomly stops even though the foreground service is running. I will figure it out, but thanks again!

Stoyan Minchev • May 5 • Edited

I am happy that I managed to help. I will highly value any suggestions and recommendations as well.
That's why we are here, to support each other :)

You can check my other articles as well. Most of them cover other cases of surviving OEMs' aggressiveness and peculiarities.

Stoyan Minchev • Mar 23

Author here. A few things I did not fit into the article:

API key sharding: the AI behavioral analysis uses Google Gemini. Each Google Cloud project gives 10,000 requests/day free. I created 6 projects with independent keys and rotate on rate-limit errors. That is 60,000 requests/day at zero infrastructure cost.
The app supports English, German, and Bulgarian. All user-facing strings are in resource files — no hardcoded text in Kotlin.
ActivityManager.getRunningServices() is deprecated since API 26 and unreliable on newer Android. If you need to check whether your own service is running, use a @Volatile static isRunning flag set in onCreate/onDestroy.
SIGNIFICANT_MOTION sensor does not exist on all devices. Honor lacks it. Always have a fallback.

Happy to go deeper on any specific OEM or API if anyone is interested.