Kento IKEDA for AWS Community Builders

Posted on May 30 • Edited on Jun 20 • Originally published at zenn.dev

"Reinstalling Won't Fix It": A Cross-App Shared-Auth Deadlock After Switching Phones

#android #adb #mobile #tutorial

After migrating to a new Android phone, a few specific apps stopped launching. Amazon Shopping and Kindle would freeze on a blank white or black screen for a while, then close on their own. Reinstalling, clearing storage, updating the OS — none of it helped. Going through the usual support steps changed nothing.

What finally fixed it was clearing the storage of every Amazon app at once. When I traced the cause through ADB logs, it turned out that authentication data shared across multiple apps had become inconsistent, and the auth-retrieval step at startup was deadlocking.

The incident itself happened with a specific Pixel-and-Amazon combination, but structurally it's a pattern that can hit any app where "authentication data shared across apps" meets "many subsystems initialized in parallel at startup for speed." It's worth knowing about whether you design SDKs, build apps, or handle ops and support, so I'm leaving it here as a case study.

Note: This happened on my own device, with my own account. I'm reading the diagnostic output that the OS itself wrote out — not decompiling any app.

What happened after switching phones

Right after migrating data to a Pixel 10a, only certain apps refused to launch.

Affected: Amazon Shopping, Kindle
Symptom: open the app, it freezes on a blank white or black screen for about ten seconds, then closes on its own
Everything else: other apps work fine

The officially suggested remedies are generic — "reinstall the app," "clear the cache," "update the OS" — and none of them worked. When the root cause is in the OS or the data migration rather than the app itself, the standard support path has a hard time isolating it.

At first I couldn't even tell whether it was an OS problem, an app problem, or an Amazon account problem.

When nothing works, change the framing

I tried all the standard fixes.

Reinstalling the affected apps
Clearing storage of just the affected apps
Clearing the cache of the affected apps
Updating every app in the Play Store
Updating the OS
Restarting the device

The affected apps still wouldn't launch, no matter what.

The key realization: as long as you think of it as "a problem with one app," you won't fix it. Reinstalling just the affected app, or clearing just that app's storage, changes nothing. If that's the case, the cause probably isn't contained within a single app.

Only once you reframe it as "a problem across a group of apps" does the solution come into view.

Hypothesis: the shared authentication data is corrupted

Let me start with a hypothesis built only from public information. I'll back it up with logs in the second half.

Amazon's Login with Amazon SDK has a mechanism that lets other apps reuse the Amazon Shopping app's logged-in state. This is documented officially.

https://developer.amazon.com/docs/login-with-amazon/customer-experience-android.html

According to the docs, if the user is already signed in to the Amazon Shopping app, an app that integrates Login with Amazon won't ask them to re-enter account details — the SDK recognizes and reuses the auth state of the Amazon Shopping app or the Fire OS device. That's single sign-on (SSO). The SDK's internal package name is com.amazon.identity.auth.map.device, which also appears in Amazon's official migration guide.

https://developer.amazon.com/docs/login-with-amazon/upgrade-android-sdk.html

What we can say from this is that Amazon's authentication layer (referred to internally in the SDK as MAP) is designed so that other apps can reference the Amazon Shopping app's logged-in state. What the docs directly describe is SSO for apps using Login with Amazon, but it's reasonable to think that Amazon's own apps such as Kindle and Prime Video share the same auth layer too.

The hypothesis, then: during data migration, only part of the authentication data was carried over in a corrupted state, and the startup process that tries to fetch that shared data is getting stuck. If that's right, it explains why nothing short of wiping the apps that hold the shared data will fix it.

That said, specifics like "the first-installed app holds the auth data as the representative" or "only one particular app is the source" can't be asserted from public information. I'll check how solid the hypothesis is by reading the ANR trace in the second half.

The fix: clear storage for the whole group of related apps at once

Here's the fix up front. The concrete steps:

Open Settings > Apps > See all XX apps
List every Amazon app installed
For each one, run Storage & cache > Clear storage
Once they're all cleared, restart the device
Then open Amazon Shopping or Kindle — if a login screen appears, you're good

Examples of apps to target:

Amazon Shopping
Kindle
Amazon Prime Video
Amazon Music
Amazon Photos
Amazon Alexa

The important point is that what you need to wipe is not "the app that's failing" but "every app that shares the authentication data." In my case, what finally did it was clearing Prime Video's storage. It was an app I barely ever opened, and until I cleared it, clearing the other Amazon apps did nothing. It may well have been the source of the shared data.

Migration tools restore apps automatically from the old device's app list. As a result, you can end up with Amazon apps you haven't used in ages — ones you've forgotten you ever installed. In the user's mind it's an "app I don't use," but in the authentication-sharing network it's a full-fledged node, and the corrupted data sitting there drags down the apps you are launching. The instinct to "only clear the app that's failing" or "only clear the apps I actually use" backfires here.

Verifying the hypothesis with the ANR trace

Now the main part. Let's verify whether this really is a shared-auth deadlock, using the ANR (Application Not Responding) stack trace.

The first thing to establish: what's killing the app is an "ANR," not a "crash." A crash throws an exception and the process drops immediately; an ANR is the system force-closing the app after the main thread has failed to respond for a set time (anywhere from a few seconds to about ten, depending on what it failed to respond to). Freezing on a blank screen and then closing is the classic ANR symptom: not an exception, but a timeout while waiting for a response.

Since this was happening on my own device with my own account, I connected the Pixel to a Mac over ADB and pulled the diagnostic log (the stack trace) that the OS wrote out when the ANR occurred. Again, I'm not decompiling the app — just reading the diagnostic output the OS left behind.

adb shell dumpsys dropbox --print data_app_anr | \
  grep -A 200 "Process: com.amazon.mShop.android.shopping"

The "DropBox" in dumpsys dropbox refers to DropBoxManager, the Android system-log mechanism that stores diagnostic entries (crashes, ANRs, and so on) over time. It has nothing to do with the cloud storage service of the same name. --print data_app_anr pulls only the entries tagged as app ANRs, filtered here by Amazon Shopping's process name.

The trace recorded several threads running in parallel at startup. The key part: each was blocked on a lock held by another thread, forming a chain. Let's read them in order.

main thread (tid=1): the UI itself, stuck

The main thread was stuck while running a startup task called AndroidComponentDetectTask.

"main" prio=5 tid=1 Blocked
  at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(...)
  - waiting to lock <0x00eb4d79> held by thread 37
  at com.amazon.platform.service.ServiceRegistryImpl.getService(...)
  at com.amazon.mShop.appStart.AndroidComponentDetectTask.apply(...)
  ...
  at android.app.ActivityThread.handleBindApplication(...)

It's trying to acquire a lock, <0x00eb4d79>, and waiting for thread 37 to release it. Judging by the frames, this is a lock the Service Registry (the common registry where subsystems are registered and looked up) takes inside ConcurrentHashMap.computeIfAbsent when fetching or creating a service. On Android the main thread is the UI thread, so when it stops here, nothing gets drawn and the screen stays blank.

thread 36: collateral damage waiting on the same lock

The error-reporting init task (thread 36) was waiting on the exact same lock as main.

"StagedExecutor2-pool-19-thread-1" prio=5 tid=36 Blocked
  at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(...)
  - waiting to lock <0x00eb4d79> held by thread 37
  at com.amazon.platform.service.ServiceRegistryImpl.getService(...)
  at com.amazon.mShop.sam.log.SAMLogManager.initialize(...)
  at com.amazon.mShop.errorReporting.ErrorReporter.startSession(...)

Also waiting on <0x00eb4d79>. This Service Registry lock is a congestion point that multiple threads fight over at startup.

thread 37: the culprit, holding a lock while waiting on auth data

The problem thread is thread 37. It was holding <0x00eb4d79> (the Service Registry lock) while trying to acquire another lock, <0x004a4835>, and getting stuck.

"StagedExecutor3-pool-20-thread-1" prio=5 tid=37 Blocked
  - waiting to lock <0x004a4835> held by thread 62
  at com.amazon.identity.auth.device.api.MAPAccountManager.getAccount(...)
  at com.amazon.mShop.minerva.MinervaWrapperMAPClient.fetchAndSetAccountAttributeForTeen(...)
  at com.amazon.mShop.minerva.MinervaWrapperMAPClient.<init>(...)
  at com.amazon.mShop.minerva.MinervaWrapperServiceImpl.initializeMinervaClientIfNeeded(...)
  at com.amazon.platform.service.ServiceRegistryImpl.instantiateService(...)
  at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(...)
  - locked <0x00eb4d79>

Reading bottom to top, the sequence is:

The Service Registry takes its internal lock <0x00eb4d79> to create a service (at this moment, main and thread 36 are made to wait)
Still holding that lock, it proceeds into initializing the metrics SDK (Minerva) client
Inside that, it calls MAPAccountManager.getAccount to fetch the currently logged-in account
The auth SDK (MAP) tries to take another lock, <0x004a4835>, internally
But that lock is held by thread 62 and never comes back

Thread 37 sits in the decisive position that triggers the deadlock: holding the Service Registry lock while frozen waiting for auth data. Because it won't release the lock it holds, main and thread 36 — which are waiting on it — stall in turn.
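
The shape of thread 37's problem can be reproduced in miniature. What follows is a minimal Python sketch, not Amazon's code: `registry_lock`, `auth_response`, `create_service`, and `lookup_service` are hypothetical stand-ins for the Service Registry lock, the auth data that never arrives, thread 37, and main or thread 36. The timeouts exist only so the demo can finish.

```python
import threading


def lookup_service(registry_lock: threading.Lock, timeout: float) -> bool:
    """Like main or thread 36: needs only the registry lock, not auth."""
    acquired = registry_lock.acquire(timeout=timeout)
    if acquired:
        registry_lock.release()
    return acquired


def demo() -> tuple:
    registry_lock = threading.Lock()   # stand-in for the Service Registry lock
    auth_response = threading.Event()  # stand-in for the auth data arriving
    holding = threading.Event()        # signals that the worker has the lock

    def create_service() -> None:
        """Like thread 37: take the registry lock, then block on auth while holding it."""
        with registry_lock:
            holding.set()
            auth_response.wait()       # no timeout: waits as long as auth stays silent

    worker = threading.Thread(target=create_service, daemon=True)
    worker.start()
    holding.wait()

    while_stalled = lookup_service(registry_lock, timeout=0.2)  # holder is stuck on auth
    auth_response.set()                                         # auth finally answers
    worker.join()
    after_release = lookup_service(registry_lock, timeout=0.2)
    return while_stalled, after_release


print(demo())  # (False, True)
```

The lookup never touches auth, yet it fails for as long as the lock holder is stuck on auth. In the real trace the waiters are blocked on a monitor with no timeout, so they simply wait until the ANR fires.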

thread 27: another one waiting on the auth lock

On top of that, thread 27 (Weblab, fetching A/B-test flags) was also waiting on the same auth lock <0x004a4835> as thread 37.

"StagedExecutor1-pool-15-thread-1" prio=5 tid=27 Blocked
  - waiting to lock <0x004a4835> held by thread 62
  at com.amazon.identity.auth.device.api.MultipleAccountManager.getAccountForMapping(...)
  at com.amazon.mShop.sso.SSOUtil.getCurrentAccountFromDisk(...)
  at com.amazon.mShop.core.features.weblab.WeblabServiceImpl.getTreatmentAndCacheForAppStartWithTrigger(...)

Weblab also needs auth information at startup, and via getAccountForMapping it's waiting on the same auth lock to be released. Note that thread 37 reaches the lock through MAPAccountManager and thread 27 through MultipleAccountManager — two different APIs converging on one internal lock.

The big picture: auth-data retrieval is where every task converges

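Laid out, the dependencies look like this:

main (tid=1, component detection): waits on Service Registry lock <0x00eb4d79>, held by thread 37
thread 36 (error reporting): waits on Service Registry lock <0x00eb4d79>, held by thread 37
thread 37 (Minerva metrics): holds <0x00eb4d79>, waits on auth lock <0x004a4835>, held by thread 62
thread 27 (Weblab): waits on auth lock <0x004a4835>, held by thread 62
thread 62: holds <0x004a4835>, waits on a response from another process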

What stands out is that the tasks meant to run in parallel at startup (metrics, error reporting, A/B testing, component detection) all ultimately converge on a single point: "fetching the MAP auth data." Minerva and Weblab are supposed to be independent features, yet somewhere in initialization each one reaches for the same auth SDK to find out "who is logged in right now."

That auth-data retrieval never returns, apparently because the shared data is corrupted. Every task that needs auth stalls; and because the task holding the Service Registry lock has stalled, even tasks unrelated to auth (main, error reporting) get dragged down. That's the full chain that leaves the screen blank until the ANR fires.

Following thread 62 — the one stuck while holding the auth lock — it was sending a query to another process via a ContentProvider and waiting for the response. A ContentProvider is Android's mechanism for sharing data between apps, and Amazon's apps appear to use it to pass authentication data around. It seems thread 62 was stuck holding the auth lock because one of the sharing sources never returned a response. Which app, and why it didn't respond, can't be pinned down from this trace alone. But the structure — "go fetch the shared auth data from the source, and it never comes back" — is consistent with the fact that wiping every Amazon app's storage fixed it.

Strictly speaking, this isn't a circular wait where two threads grab each other's locks (the textbook deadlock). It's a hang: a thread holding a lock freezes waiting on an external process, and the threads waiting on it stall in a chain. But since the outcome — "stuck holding a lock, with everyone waiting on it blocked forever" — is no different from a deadlock, I'm calling it a deadlock in this article.
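
The distinction between a circular wait and a chain that ends at an external wait can be checked mechanically. Here is a small Python sketch that reads "waiting to lock ... held by thread N" lines and follows the holders from the main thread. `SAMPLE` is abbreviated from the excerpts quoted above; the regular expressions assume the trace format shown in this article and the function names are my own.

```python
import re

THREAD_RE = re.compile(r'^"(?P<name>[^"]+)".*\btid=(?P<tid>\d+)')
WAIT_RE = re.compile(r'waiting to lock <(?P<lock>0x[0-9a-f]+)> held by thread (?P<holder>\d+)')


def parse_waits(trace: str) -> dict:
    """Map each blocked thread's tid to (lock address, tid of the holder)."""
    waits = {}
    current_tid = None
    for line in trace.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            current_tid = int(header.group("tid"))
            continue
        wait = WAIT_RE.search(line)
        if wait and current_tid is not None:
            waits[current_tid] = (wait.group("lock"), int(wait.group("holder")))
    return waits


def follow_chain(waits: dict, start_tid: int) -> tuple:
    """Walk holder links from start_tid; return (chain, is_cycle)."""
    chain = [start_tid]
    while chain[-1] in waits:
        holder = waits[chain[-1]][1]
        if holder in chain:
            return chain + [holder], True   # circular wait: textbook deadlock
        chain.append(holder)
    return chain, False                     # chain ends at a thread not waiting on a lock


SAMPLE = '''
"main" prio=5 tid=1 Blocked
  - waiting to lock <0x00eb4d79> held by thread 37
"StagedExecutor2-pool-19-thread-1" prio=5 tid=36 Blocked
  - waiting to lock <0x00eb4d79> held by thread 37
"StagedExecutor3-pool-20-thread-1" prio=5 tid=37 Blocked
  - waiting to lock <0x004a4835> held by thread 62
  - locked <0x00eb4d79>
"StagedExecutor1-pool-15-thread-1" prio=5 tid=27 Blocked
  - waiting to lock <0x004a4835> held by thread 62
'''

chain, is_cycle = follow_chain(parse_waits(SAMPLE), start_tid=1)
print(chain, is_cycle)  # [1, 37, 62] False
```

The chain ends at thread 62 without looping back, which matches the reading above: a hang behind an external wait rather than a circular wait.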

The design pitfalls this case reveals

The textbook lessons — "acquire locks in a consistent order," "don't block the main thread" — apply here too, of course. But what the trace really surfaced is the pitfall that emerges when well-intentioned design decisions pile up. Parallelization for speed, auth lookups for functionality, cross-app data sharing for convenience. Each is reasonable on its own, but stacked together they become the following three pitfalls.

Pitfall 1: parallel-init speedups backfire on shared resources

Initializing subsystems in parallel to speed up startup looks like a correct optimization. Indeed, the trace recorded several init tasks running concurrently on separate threads — metrics, error reporting, A/B testing, component detection.

The thing is, many of them internally call the same shared operations: "register with the Service Registry" and "fetch the current logged-in account." Even run in parallel, they end up serialized on the shared resource's lock. On its own that just makes startup slower — but when the thread holding a lock stalls on something else, everyone waiting gets swept up all at once, as happened here.

Parallelization aimed at speed becomes effectively serial under shared-resource contention, and in the worst case it deadlocks. The most dangerous assumption is "it's parallelized, so it must be faster." When you add startup tasks, you have to look at how each one touches shared resources (registry, auth, settings store) as a set. Otherwise you not only fail to get the scaling benefit, you raise the odds of a deadlock.
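
One way to look at startup tasks "as a set" is to keep an explicit inventory of which shared resources each one touches. The Python sketch below is illustrative only: the task names follow the trace, the resource sets are what the excerpts above suggest, and `contention_points` and `bridging_tasks` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical inventory: which shared resources each startup task touches,
# as far as the trace excerpts show.
STARTUP_TASKS = {
    "component_detection": {"registry"},
    "error_reporting": {"registry"},
    "metrics": {"registry", "auth"},
    "ab_testing": {"auth"},
}


def contention_points(tasks: dict) -> dict:
    """Return each shared resource touched by more than one task, with its tasks."""
    by_resource = defaultdict(set)
    for task, resources in tasks.items():
        for resource in resources:
            by_resource[resource].add(task)
    return {r: sorted(t) for r, t in by_resource.items() if len(t) > 1}


def bridging_tasks(tasks: dict) -> list:
    """Tasks touching two or more contended resources: a stall on one can leak into the other."""
    contended = set(contention_points(tasks))
    return sorted(t for t, res in tasks.items() if len(res & contended) >= 2)


print(contention_points(STARTUP_TASKS))
print(bridging_tasks(STARTUP_TASKS))  # ['metrics']
```

In this inventory the metrics task is the bridge: it is the only one touching both the registry and auth, which is the position thread 37 occupied in the trace.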

Pitfall 2: auth has become an implicit dependency of every feature

The most surprising thing in the trace was that both metrics and A/B testing — features that look unrelated to auth — were reaching for "who is logged in right now" during initialization. Metrics wants to attach user attributes; A/B testing wants to bucket by account. The reasons are each fair enough, but the result is that the auth SDK has become an implicit dependency point for the entire app.

When auth-data retrieval jams at a single point, it's not auth itself that stops — it's every feature that referenced auth, stalling in a chain. You need to recognize that auth isn't "a concern around the login screen" but "a critical path of the entire startup sequence." If you count how many subsystems call auth-data retrieval at startup in your own app, the number may be higher than you'd imagine.
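
If auth sits on the critical path, one possible mitigation is to let non-essential features wait for it only up to a deadline and then degrade. The following Python sketch assumes exactly that; `AccountLookup` and its parameters are hypothetical, and nothing in the trace tells us whether the real auth SDK offers a timeout of this kind.

```python
import threading


class AccountLookup:
    """Fetch the account once on a background thread; callers wait with a deadline."""

    def __init__(self, fetch):
        self._done = threading.Event()
        self._account = None
        # Daemon thread: a fetch that never returns cannot keep the process alive.
        threading.Thread(target=self._run, args=(fetch,), daemon=True).start()

    def _run(self, fetch):
        try:
            self._account = fetch()
        except Exception:
            self._account = None       # treat a failed fetch as "no account"
        finally:
            self._done.set()

    def get(self, timeout: float, default=None):
        """Return the account, or `default` if it is not available within `timeout` seconds."""
        if self._done.wait(timeout) and self._account is not None:
            return self._account
        return default


never = threading.Event()
stuck = AccountLookup(lambda: never.wait())   # simulates a fetch that hangs
ok = AccountLookup(lambda: "user-123")        # simulates a healthy fetch

print(stuck.get(timeout=0.1, default="anonymous"))  # anonymous
print(ok.get(timeout=1.0))                          # user-123
```

Metrics or A/B testing could start with the default and pick up the real account later, instead of freezing startup behind a fetch that may never return.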

Pitfall 3: ownership of shared data goes adrift during migration

A design where multiple apps share authentication data is convenient for users — sign in once and you don't need to log in again on the other apps. The problem is that "who owns this shared data, and who fixes it when it breaks" is left implicit.

Suppose there's an implicit rule like "the first-installed app is the representative." If migration doesn't reproduce the install order or state, the ownership relationship goes adrift. The owner sits there holding corrupted data while other apps go to reference it. That nothing was fixed until I cleared Prime Video may well trace back to this ownership ambiguity. Shared data needs a fallback for when the owner disappears or its data breaks: another app taking over, or safely regenerating the data.
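
As a rough illustration of such a fallback, the Python sketch below checks each candidate owner's copy of the shared record and skips damaged ones. Everything here is hypothetical (the `seal` format, the checksum, the ordered list of owners); it says nothing about how Amazon's apps actually store or share auth data.

```python
import hashlib
import json


def seal(record: dict) -> dict:
    """Wrap a record with a checksum so readers can detect a partial or damaged copy."""
    payload = json.dumps(record, sort_keys=True)
    return {"payload": payload, "sha256": hashlib.sha256(payload.encode()).hexdigest()}


def read_shared_auth(sources: list) -> tuple:
    """Return the first intact record from an ordered list of candidate owners.

    Each source is a (name, sealed_record_or_None) pair. Returns (name, record),
    or (None, None) when every copy is missing or damaged, meaning: sign in again.
    """
    for name, sealed in sources:
        if not sealed:
            continue  # this owner has no copy at all
        payload = sealed.get("payload", "")
        if hashlib.sha256(payload.encode()).hexdigest() != sealed.get("sha256"):
            continue  # damaged copy: skip it instead of trusting or waiting on it
        return name, json.loads(payload)
    return None, None


good = seal({"account": "user-123"})
broken = dict(good, payload=good["payload"][:-3])  # simulate a truncated copy

print(read_shared_auth([("owner", broken), ("backup", good)]))
print(read_shared_auth([("owner", broken), ("other", None)]))  # (None, None)
```

The point of the sketch is the last return value: when no intact copy exists, the reader gets a definite "signed out" answer and can show a login screen rather than waiting on the broken owner.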

Lessons for support and for users

Even if you're not in a position to change the design, knowing this structure changes how fast you can respond.

For support: keep in mind that "please reinstall" only works when the problem is contained within a single app. For a post-migration report of "only certain apps won't launch," suspect that migration left the shared data corrupted. Being able to offer the next move, "clear storage for the related apps as a group," changes the opening response. Even just asking "did this start right after switching phones?" up front can sometimes narrow the investigation considerably.

For users: think of the cleanup target as "every app from the same provider," not "the app that's giving you trouble." Keeping in mind that even an app you don't use is a node in the sharing network, and that corruption there causes collateral damage, raises your odds of getting out of it on your own.

Other cases where the same pattern can occur

It's not just Amazon — there are plenty of designs where multiple apps share authentication information.

Sharing auth tokens across apps via Android's standard AccountManager
Sharing login information across same-signature apps through a ContentProvider
Groups of apps with a common account platform (cross-app login across several apps from the same company, for example)

When these combine with a design that "initializes many subsystems in parallel at startup," the same conditions line up as in this case: when the shared data breaks, the apps that depend on it fail to launch one after another, and reinstalling a single app won't fix it. If your own app group meets these two conditions, it's worth checking how you guarantee consistency across a device migration, and how you degrade or regenerate when the sharing source breaks.

Closing

If you run into apps not launching after switching phones, remember one rule: if clearing a single app doesn't fix it, clear the whole group of related apps. That's the shortest fix from the user's side. When the sharing source is broken, you have to wipe the source along with the rest.

From the design side, the three pitfalls this trace surfaced are worth remembering: parallel init can backfire on shared resources; auth tends to become an implicit critical path for every feature; and ownership of shared data goes adrift during migration. Each is a "well-intentioned design" on its own, yet combined they produce an app that won't start.

"Please reinstall" only holds up as a universal fix for designs contained within a single app. For apps that hold shared data and carry a complex startup sequence, the same symptom recurs even after reinstalling. Just knowing this one structure makes a real difference in how fast you respond the next time you hit the same incident.
