DEV Community

Stoyan Minchev


A single Kotlin lambda silently broke my app for 21 hours - and I only found the bug because someone crossed a border

I build a safety-critical Android app that monitors elderly people living alone. It watches their phone 24/7 — motion, GPS, screen activity — and emails their family when something looks wrong. No buttons to press, no wearable to charge. Install it on grandma's phone and forget about it.

I took a trip from Bulgaria to Romania in early April to test the app in real conditions and have a small vacation with my family. I drove across the Danube at the Vidin-Calafat bridge. Everything was working fine. Then at 14:55, the app went completely silent.

Not crashed. Not killed by the OS. Silent.

For the next 21 hours and 42 minutes, the motion sensor recorded 682 events. The GPS hardware was acquiring satellite fixes with 11-meter accuracy. The app was running, awake, doing its job. But not a single location reached the database.

The next morning, the AI looked at the last known position (a border crossing) and the 12-hour data gap, and did what it was designed to do: it sent an URGENT alert. Except I was fine. I was in Craiova, 200 km away, sleeping in a hotel. The alert was anchored to a stale coordinate from the previous afternoon.

I spent two days tracing this. The root cause was one line of Kotlin.

The interface that lies to you

Android's Geocoder class converts GPS coordinates into street addresses. On API 33+, there's an async callback version:

geocoder.getFromLocation(latitude, longitude, 1) { addresses ->
    // do something with the result
}

That trailing lambda is Kotlin's SAM (Single Abstract Method) conversion. It looks clean. It compiles. It works perfectly — until it doesn't.

The interface behind this lambda is Geocoder.GeocodeListener:

public interface GeocodeListener {
    void onGeocode(@NonNull List<Address> addresses);

    default void onError(@Nullable String errorMessage) { }
}

See that second method? onError has a default empty implementation. When you use a SAM lambda, Kotlin only implements the single abstract method — onGeocode. The default onError stays empty.

So what happens when geocoding fails? Network timeout. No roaming data after crossing a border. Play Services killed by the OEM battery manager. Any of a dozen things that go wrong on real Android devices in real countries.

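The framework calls onError(). The empty default runs. Nothing happens. If that lambda is the only thing that resumes a suspended coroutine, which is the usual way to wrap a callback API in a suspend function, the continuation is never resumed and the coroutine hangs forever.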
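A minimal sketch of the trap that runs without Android. The `GeocodeListener` and `failingLookup` below are hypothetical stand-ins with the same shape as the framework types (one abstract method, one error callback with an empty default), not the real classes. The point is only that the lambda never learns about the failure.

```kotlin
// Stand-in for Geocoder.GeocodeListener (hypothetical, same shape):
// one abstract method plus an error callback with an empty default body.
fun interface GeocodeListener {
    fun onGeocode(addresses: List<String>)
    fun onError(errorMessage: String?) {}
}

// Hypothetical lookup that always fails, e.g. no roaming data after a border.
fun failingLookup(listener: GeocodeListener) {
    listener.onError("network unavailable")
}

// Returns true if the caller's lambda was ever invoked.
fun lambdaSawTheFailure(): Boolean {
    var invoked = false
    // SAM lambda: implements onGeocode only; onError keeps the no-op default.
    failingLookup { invoked = true }
    return invoked
}

fun main() {
    println("lambda invoked: ${lambdaSawTheFailure()}") // prints "lambda invoked: false"
}
```

If the body of that lambda were the only place a continuation got resumed, nothing would ever resume it.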

Why it killed everything, not just geocoding

If the geocoder had hung in isolation, it would have been a minor bug — one address lookup fails, you move on. But my code looked like this:

processLocationMutex.withLock {
    val address = reverseGeocode(latitude, longitude)  // hangs here
    insertLocationData(location, address)
}

The processLocationMutex exists for a good reason. Four independent systems can trigger a GPS write at the same time — the stillness detector, the periodic scheduler, the force probe, and the area stability detector. Without the mutex, they race on the stationarity filter and insert duplicate rows that defeat the drive-through filtering logic.

But when reverseGeocode() hung, the mutex was held forever. Every subsequent GPS fix from every trigger path called processLocation(), tried to acquire the mutex, and blocked. Behind a coroutine that would never wake up.

No exception. No crash. No log entry. Just a growing queue of frozen coroutines, each holding a perfectly good satellite fix that would never reach the database.
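The pile-up is easy to reproduce in isolation. This is a sketch assuming kotlinx.coroutines is on the classpath. `countBlockedWriters` is a made-up helper, and the `CompletableDeferred` that never completes stands in for the geocoder callback that never fired. The 200 ms timeout on the writers exists only so the demo terminates. In the real incident there was no timeout, so they waited indefinitely.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical demo: one holder hangs inside the lock, N writers queue behind it.
suspend fun countBlockedWriters(writers: Int): Int = coroutineScope {
    val mutex = Mutex()
    val neverCompleted = CompletableDeferred<Unit>() // the callback that never comes

    // UNDISPATCHED: the holder runs immediately and takes the lock before anyone else.
    val holder = launch(start = CoroutineStart.UNDISPATCHED) {
        mutex.withLock { neverCompleted.await() }
    }

    // Each writer has a perfectly good "fix" but cannot get past withLock.
    val results = (1..writers).map {
        async { withTimeoutOrNull(200L) { mutex.withLock { true } } }
    }.awaitAll()

    holder.cancel() // clean up so the demo can finish
    results.count { it == null } // null means the writer never acquired the lock
}

fun main(): Unit = runBlocking {
    println("blocked writers: ${countBlockedWriters(4)}") // prints "blocked writers: 4"
}
```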

The motion sensor kept firing. The GPS kept acquiring. The diagnostic logs show two successful HIGH_ACCURACY fixes at 21:37 and 21:38 — 11-meter accuracy, acquired in 2.5 seconds — both of which entered processLocation() and silently queued behind the hung mutex holder from 7 hours earlier.

The only recovery was killing the process

At 12:19 the next day, more than 21 hours after the hang started, I force-stopped the app from Android settings. The process died. The singleton mutex died with it. On restart, everything worked again.

But by then, the damage was done. The AI had already sent a false URGENT alert based on 12-hour-old coordinates. And a weekly re-calibration job had run during the trip, learning the border crossing drive-through as a "frequent location," which caused a cascade of further false alerts over the following days.

One hung lambda. One stale coordinate. Days of downstream consequences.

The fix has three layers

I don't trust single fixes for problems that can kill 21 hours of data.

Layer 1: Explicit object, both methods implemented.

val listener = object : Geocoder.GeocodeListener {
    override fun onGeocode(addresses: MutableList<Address>) {
        if (!hasResumed && continuation.isActive) {
            hasResumed = true
            continuation.resume(formatAddress(addresses.firstOrNull()))
        }
    }

    override fun onError(errorMessage: String?) {
        if (!hasResumed && continuation.isActive) {
            hasResumed = true
            continuation.resume(null)
        }
    }
}

No SAM conversion. Both callbacks resume the continuation. The hasResumed flag guards against the race where both fire, or either fires after timeout.

Layer 2: Hard timeout ceiling.

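withTimeoutOrNull(10_000L) {
    suspendCancellableCoroutine<String?> { continuation ->
        // `listener` is the Layer 1 object, built inside this block
        // so that it can capture `continuation`
        geocoder.getFromLocation(latitude, longitude, 1, listener)
    }
}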

Even if some future Android version adds a third callback method with another empty default, the wait is cancelled after 10 seconds and the call returns null instead of hanging.
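Layers 1 and 2 can be combined into one suspend function. This is a sketch rather than the app's actual code: it assumes kotlinx.coroutines, and `GeocodeListener` and `AsyncGeocoder` are hypothetical stand-ins for the Android types so the example runs off-device. It also swaps the plain `hasResumed` flag for an `AtomicBoolean`, which matters if the two callbacks can arrive on different threads.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.coroutines.resume
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import kotlinx.coroutines.withTimeoutOrNull

// Stand-in for Geocoder.GeocodeListener (hypothetical, same shape).
interface GeocodeListener {
    fun onGeocode(addresses: List<String>)
    fun onError(errorMessage: String?) {}
}

// Stand-in for the async Geocoder.getFromLocation overload (hypothetical).
fun interface AsyncGeocoder {
    fun getFromLocation(latitude: Double, longitude: Double, maxResults: Int, listener: GeocodeListener)
}

suspend fun reverseGeocode(
    geocoder: AsyncGeocoder,
    latitude: Double,
    longitude: Double,
    timeoutMs: Long = 10_000L,
): String? = withTimeoutOrNull(timeoutMs) {
    suspendCancellableCoroutine<String?> { continuation ->
        val hasResumed = AtomicBoolean(false)
        // Explicit object: both callbacks resume, and only the first one wins.
        val listener = object : GeocodeListener {
            override fun onGeocode(addresses: List<String>) {
                if (hasResumed.compareAndSet(false, true)) continuation.resume(addresses.firstOrNull())
            }
            override fun onError(errorMessage: String?) {
                if (hasResumed.compareAndSet(false, true)) continuation.resume(null)
            }
        }
        geocoder.getFromLocation(latitude, longitude, 1, listener)
    }
}

fun main(): Unit = runBlocking {
    val ok = AsyncGeocoder { _, _, _, l -> l.onGeocode(listOf("1 Example Street")) }
    val failing = AsyncGeocoder { _, _, _, l -> l.onError("no network") }
    val silent = AsyncGeocoder { _, _, _, _ -> } // never calls back at all

    println(reverseGeocode(ok, 43.99, 22.88))                       // 1 Example Street
    println(reverseGeocode(failing, 43.99, 22.88))                  // null
    println(reverseGeocode(silent, 43.99, 22.88, timeoutMs = 200L)) // null, after the timeout
}
```

The `silent` case is the one the timeout exists for: no callback fires, yet the caller still gets an answer.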

Layer 3: Geocoding moved outside the mutex.

// Geocoding is slow and can hang — never inside the mutex
val address = reverseGeocodingService.reverseGeocode(lat, lng)

// Only the database insert is protected (50ms critical section, not 10s+)
val acquired = withTimeoutOrNull(60_000L) {
    processLocationMutex.withLock {
        insertLocationData(location, address)
    }
}

The mutex timeout is a tripwire. If something else wedges the lock in the future, we log a diagnostic error and drop the fix rather than queuing forever.
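A sketch of what the tripwire branch could look like, again assuming kotlinx.coroutines. `insertOrDrop` is a hypothetical helper, the `rows` list stands in for the database table, and `System.err` stands in for the diagnostic log.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical helper: writes one row under the lock, or drops the fix
// if the write cannot complete within lockTimeoutMs.
suspend fun insertOrDrop(
    mutex: Mutex,
    rows: MutableList<String>, // stands in for the database table
    fix: String,
    address: String?,
    lockTimeoutMs: Long = 60_000L,
): Boolean {
    val inserted = withTimeoutOrNull(lockTimeoutMs) {
        mutex.withLock { rows.add("$fix | ${address ?: "no address"}") }
    }
    if (inserted == null) {
        // Tripwire: leave a trace and give up instead of queuing forever.
        System.err.println("Dropped fix $fix: not written within $lockTimeoutMs ms")
    }
    return inserted != null
}

fun main(): Unit = runBlocking {
    val mutex = Mutex()
    val rows = mutableListOf<String>()

    mutex.lock() // simulate some other code path wedging the lock
    println(insertOrDrop(mutex, rows, "fix-1", null, lockTimeoutMs = 100L)) // false
    mutex.unlock()

    println(insertOrDrop(mutex, rows, "fix-2", "1 Example Street")) // true
    println(rows) // [fix-2 | 1 Example Street]
}
```

Dropping a fix loses one data point, but it leaves a log line behind, which is exactly what the original hang never did.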

What I actually learned

SAM conversion is not a convenience. It's a contract you didn't read. When you write a trailing lambda, you're implementing one method and accepting the defaults for everything else. If those defaults are no-ops, you've written code that silently drops errors. The compiler won't warn you. The IDE won't flag it. It works perfectly until it doesn't.

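The scary part is that GeocodeListener isn't unusual. Optional error callbacks with no-op defaults show up all over Android. WebViewClient.onReceivedError(), for example, does nothing unless you override it (WebViewClient is a class rather than an interface, so no SAM conversion is involved, but the failure has the same shape). MediaPlayer reports errors through a separate OnErrorListener, so code that registers only the other listeners can look complete. Every SAM-converted lambda on an interface with default methods is a potential silent failure.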

Mutexes amplify hangs into outages. A 10-second geocoding timeout would have been invisible — one null address, one row without a street name, nobody notices. But a mutex turned a local hang into a system-wide 21-hour data loss. If you're using a mutex to serialize writes, the critical section should contain only writes. Anything that touches the network, the filesystem, or a third-party service belongs outside the lock.

Silent failures are worse than crashes. If the geocoder had thrown an exception, I would have found it in the first hour. Instead, it hung — producing no error, no log, no crash report. The only evidence was the absence of data in a database table. In a safety-critical app that monitors whether elderly people are still moving, silence is the most dangerous failure mode there is.


The app is called "How Are You?! Senior Safety". It will be released soon, once I am confident that no bad surprises are left. Have you ever been bitten by a default interface method you didn't know existed?
