DEV Community

Kavearhasi Viswanathan

Battle-Tested Coroutines: Advanced Tactics & Common Traps

You understand the why and the how of coroutines. You know about suspend functions and structured concurrency. Now comes the hard part: using them in the real world, where things don't always go according to plan. This is where the subtle details can lead to the most frustrating bugs.

And one of the most insidious bugs is the one that doesn't crash your app. It just silently fails.


The Ghost in the Machine: A Tale of Silent Failure

Imagine this scenario, one that has played out in development teams everywhere. You're building a screen that needs to load two pieces of data concurrently: the user's profile and their account settings. To be efficient, you fire off two parallel network requests. The feature works perfectly.

A month later, a bug report comes in. Sometimes, the settings just don't load. There are no crashes in the logs, no ANR reports. The progress bar just spins forever. The data never arrives. It's a ghost.

After hours of debugging, you find the culprit. The settings API started failing, but the exception was never logged. The coroutine that was supposed to fetch it died silently, and the rest of the app was completely unaware. This is the classic pitfall of misusing the async coroutine builder.


Two Flavors of Failure: launch vs. async

To understand how this silent failure happens, you need to know that launch and async handle exceptions in fundamentally different ways. It's not just about what they return; it's about how they report failure.

launch: The Loud Failure

The launch builder is for "fire-and-forget" work. You use it when you don't need a result back. Because of this, it treats exceptions as critical and uncaught. An exception thrown inside a launch block will immediately propagate up to its parent Job, which typically cancels the parent and all its other children. It fails loudly.

viewModelScope.launch { // Parent Coroutine
    launch { // Child 1
        delay(500)
        throw Exception("Something went wrong in Child 1!")
    }

    launch { // Child 2
        delay(1000)
        // This line will never be reached. The exception in Child 1
        // cancels the parent coroutine, which in turn cancels this sibling.
        println("Child 2 finished.")
    }
}

This is a safe default. A failure in one part of the system brings down the related components, preventing the app from continuing in a broken state.

async: The Conditional Failure

The async builder is different. It's designed to return a result via a Deferred object. But here's the critical detail that trips up developers: async keeps an exception to itself only when it is a root coroutine or a direct child of a supervisor (a SupervisorJob or supervisorScope).

In a regular scope, async behaves just like launch: the exception propagates immediately to the parent and cancels it, even though it is also stored in the Deferred. But when you combine async with a SupervisorJob or supervisorScope, the exception stays trapped inside the Deferred object. It doesn't go anywhere else... until you call .await().

When you call .await(), the Deferred will do one of two things: return the successful result, or re-throw the stored exception.

And this is the trap. If you use async inside a supervisor context but never call .await(), any exception that occurs is stored in the Deferred and never observed by anyone.

// The source of our "ghost" bug
viewModelScope.launch {
    supervisorScope { // The supervisor is the key here
        // We start the async job but never "await" its result
        val settingsDeferred = async {
            api.fetchSettings() // This call throws an exception!
        }

        // Because we're in a supervisorScope and .await() is never called,
        // the exception is stored in settingsDeferred but is never thrown.
        // The coroutine dies silently, and the app just hangs.
    }
}

But without that supervisor? The behavior is completely different:

// This will NOT fail silently
viewModelScope.launch {
    // No supervisor here - just a regular scope
    async {
        api.fetchSettings() // Exception propagates immediately!
    }
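    // The failure cancels this parent launch coroutine and is reported as an
    // uncaught exception. viewModelScope itself survives, because it is backed
    // by a SupervisorJob, but the error is anything but silent.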
}

Key Takeaway: The "silent failure" trap only happens when you combine async with supervisor contexts and then fail to call .await(). Inside a supervisor context, always pair async with .await() in a try-catch block so failures are actually handled. In a regular scope, a try-catch around .await() does not stop the failure from cancelling the parent, so be mindful of which kind of context you're in: it fundamentally changes how exceptions behave.
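
As a minimal sketch of that takeaway, the ghost bug can be fixed by awaiting each Deferred inside its own try-catch. Everything named here is an assumption made for illustration: fetchProfile and fetchSettings stand in for the real API calls, and ScreenState and awaitOrNull are hypothetical helpers, not part of kotlinx.coroutines.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.async
import kotlinx.coroutines.delay
import kotlinx.coroutines.supervisorScope

// Hypothetical stand-ins for the real network calls.
suspend fun fetchProfile(): String {
    delay(50)
    return "profile"
}

suspend fun fetchSettings(): String {
    delay(50)
    throw IllegalStateException("Settings API is down")
}

// Hypothetical result type: a piece may be missing, but the reason is recorded.
data class ScreenState(
    val profile: String?,
    val settings: String?,
    val errors: List<String>,
)

// Hypothetical helper: awaits the result and reports a failure instead of hiding it.
suspend fun <T> Deferred<T>.awaitOrNull(onError: (Exception) -> Unit): T? =
    try {
        await()
    } catch (e: CancellationException) {
        throw e // cancellation must keep propagating
    } catch (e: Exception) {
        onError(e)
        null
    }

suspend fun loadScreen(): ScreenState = supervisorScope {
    // Both requests start immediately and run concurrently.
    val profileDeferred = async { fetchProfile() }
    val settingsDeferred = async { fetchSettings() }

    val errors = mutableListOf<String>()
    // Every Deferred is awaited, so a stored exception always surfaces here.
    val profile = profileDeferred.awaitOrNull { errors.add("profile: ${it.message}") }
    val settings = settingsDeferred.awaitOrNull { errors.add("settings: ${it.message}") }

    ScreenState(profile, settings, errors)
}
```

With this shape the settings failure ends up in the errors list, so the UI can stop the spinner and show a message instead of waiting forever.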


When One Failure Shouldn't Sink the Ship

The default "all-for-one" failure policy of a standard Job is safe, but it's not always what you want. What if you're managing a set of independent tasks, where the failure of one shouldn't affect the others?

This is where a SupervisorJob comes in.

Think of a standard Job as an old string of Christmas lights: when one bulb fails, the whole string goes out. A SupervisorJob is like a modern set of lights: one bulb can fail, but all the others stay lit.

A SupervisorJob modifies the exception propagation rule. When a direct child of a supervisor fails, the failure is not propagated to the parent. The supervisor and its other children keep running.

The best and safest way to use this is with the supervisorScope builder. It creates a self-contained scope where children don't cancel their siblings.

// Using supervisorScope to isolate failures
viewModelScope.launch {
    supervisorScope {
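        launch {
            // This child will fail. The failure is not propagated to the parent,
            // but it is still reported to the CoroutineExceptionHandler in the
            // context (or, without one, to the thread's uncaught exception handler).
            throw Exception("Task 1 failed!")
        }

        launch {
            // ...but this child is not cancelled and continues independently.
            delay(500)
            println("Task 2 completed successfully.") // Prints, as long as the failure above is handled.
        }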
    }
}

supervisorScope is perfect for UI components that launch independent jobs, like fetching data for different parts of a screen.
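
A failed launch child of a supervisor is still an uncaught exception: it is handed to the CoroutineExceptionHandler in the child's context, or to the thread's uncaught exception handler if there is none. The sketch below runs on a plain JVM, with runBlocking standing in for viewModelScope, and installs a handler so the failure is recorded while the sibling finishes. The runIndependentTasks function and the events list are illustrative names only.

```kotlin
import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.supervisorScope
import java.util.concurrent.CopyOnWriteArrayList

fun runIndependentTasks(): List<String> {
    val events = CopyOnWriteArrayList<String>()

    // Records the failure instead of letting it reach the thread's default handler.
    val handler = CoroutineExceptionHandler { _, e ->
        events.add("handled: ${e.message}")
    }

    runBlocking {
        supervisorScope {
            // Direct children of a supervisor handle their own failures,
            // so the handler is passed to the child that may fail.
            launch(handler) {
                throw IllegalStateException("Task 1 failed!")
            }

            // The sibling is not cancelled by the failure above.
            launch {
                delay(100)
                events.add("Task 2 completed")
            }
        }
    }
    return events.toList()
}
```

Calling runIndependentTasks() should return both entries: the handled failure from the first task and the completion message from the second.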


Choosing Your Weapon: withContext vs. async

A very common point of confusion is when to use withContext(Dispatchers.IO) versus async(Dispatchers.IO). They both run work on a background thread, but their intent is completely different.

  • Use withContext when you need to perform a single task on a different thread and get its result back. It's for sequential operations that just need a context switch. Think of it as saying, "Go do this one thing in the other room and bring me back the result."

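    // Imports assume Gson (2.8.6 or later) for JsonParser.parseString.
    import com.google.gson.JsonObject
    import com.google.gson.JsonParser
    import java.net.URL
    import kotlinx.coroutines.Dispatchers
    import kotlinx.coroutines.withContext

    suspend fun fetchAndParseJson(url: String): JsonObject {
        // The perfect use case for withContext: do one job on the IO dispatcher.
        return withContext(Dispatchers.IO) {
            val rawJson = URL(url).readText() // blocking network read, hence Dispatchers.IO
            JsonParser.parseString(rawJson).asJsonObject
        }
    }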
    
  • Use async when you have multiple tasks that you want to run in parallel. Its purpose is concurrency. Think of it as saying, "You start the fetching, you start the processing, and I'll wait for both of you to be done."

    suspend fun loadDashboard() = coroutineScope {
        // The perfect use case for async: parallel decomposition.
        val userDeferred = async { api.fetchUser() }
        val newsDeferred = async { api.fetchNews() }
    
        // Now we wait for both concurrent jobs to finish.
        val user = userDeferred.await()
        val news = newsDeferred.await()
        showDashboard(user, news)
    }
    

Using async for a single task is overkill; withContext is simpler and more semantically correct.
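
To make the difference concrete, here is a small sketch that runs on a plain JVM. slowCall is a hypothetical stand-in for a network request, and both loader functions are illustrative names rather than anything from a real API.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.withContext

// Hypothetical stand-in for a network request that takes about 200 ms.
suspend fun slowCall(name: String): String {
    delay(200)
    return name
}

// withContext: one context switch, then the calls run one after another.
suspend fun loadSequentially(): List<String> = withContext(Dispatchers.IO) {
    listOf(slowCall("user"), slowCall("news"))
}

// async: both calls are started before either result is awaited.
suspend fun loadInParallel(): List<String> = coroutineScope {
    val user = async(Dispatchers.IO) { slowCall("user") }
    val news = async(Dispatchers.IO) { slowCall("news") }
    listOf(user.await(), news.await())
}
```

Both functions return the same list. The sequential version takes roughly the sum of the two delays, while the parallel version takes roughly as long as the slower call.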


Confessions of a Coroutine Developer: Common Traps to Avoid

Finally, here are a few classic anti-patterns that every developer should know how to spot and avoid. We've all been tempted by them at some point.

  1. The GlobalScope Trap: Using GlobalScope.launch is almost always a mistake in application code. Coroutines launched in this scope are not tied to any component's lifecycle. Nothing cancels them automatically, so they can keep running for the rest of the application's lifetime. If they hold a reference to a View or ViewModel, you've just created a memory leak that lasts as long as the coroutine does. The rule is simple: always launch coroutines in a lifecycle-aware scope, like viewModelScope or lifecycleScope.

  2. The runBlocking UI Freeze: The name says it all. runBlocking is a bridge between blocking and non-blocking worlds, and it works by blocking the current thread until its coroutine completes. If you call this on the Android main thread, your UI will freeze, and if the block runs long enough you'll get an "Application Not Responding" (ANR) error. Its only place is in unit tests or at the very top level of a main function.

  3. Inefficient Context Switching: withContext is cheap, but not free. Calling it repeatedly inside a tight loop is inefficient.

    // BAD: Switching context on every single iteration
    for (item in hugeList) {
        withContext(Dispatchers.Default) {
            processItem(item)
        }
    }
    
    // GOOD: Switch context once for the entire block of work
    withContext(Dispatchers.Default) {
        for (item in hugeList) {
            processItem(item)
        }
    }
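
Outside Android, where viewModelScope and lifecycleScope are not available, the idea behind trap 1 can be sketched with a component that owns its scope and cancels it when it is done. Poller and its close() method are hypothetical names chosen for illustration.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.cancel
import kotlinx.coroutines.delay
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch

// Hypothetical component that owns its coroutines instead of using GlobalScope.
class Poller : AutoCloseable {
    // Every coroutine started below is a child of this scope's SupervisorJob.
    private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

    // Starts a polling loop that lives only as long as this component.
    fun start(): Job = scope.launch {
        while (isActive) {
            delay(1_000) // cancellable suspension point; real work would go here
        }
    }

    // Cancels every coroutine started through this component.
    override fun close() {
        scope.cancel()
    }
}
```

Whoever creates a Poller is responsible for calling close(), which plays the role that onCleared() plays for viewModelScope.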
    

Writing battle-tested code is about knowing not just the happy path, but the failure modes and the subtle traps.


What's the trickiest coroutine bug you've ever had to track down? Share your war story in the comments below!
