I shipped a release to 10% of users. Crash rate started climbing. I found out 20 minutes later when I happened to open the Play Console.

I shipped a release to 10% of users. Crash rate started climbing. I found out 20 minutes later when I happened to open the Play Console.

That's too late. At 10%, you're already affecting real users. By the time you halt the rollout, investigate, and push a fix, you've lost reviews and trust you won't get back.

The frustrating part: the data was there the whole time. Google's API had my crash rate. I just wasn't checking it before the rollout progressed.

The gap in most Android CI pipelines

A typical Android release pipeline does three things:

  1. Build the AAB
  2. Run unit tests
  3. Upload to the Play Store

That's it. The pipeline considers itself done after the upload.

But "shipped" and "healthy" are not the same thing. Nothing in that pipeline asks: is the app actually in good shape right now? Is it safe to keep rolling out?

The answer to both questions is sitting in the Play Developer Reporting API. It just needs to be checked.

gpc vitals crashes --threshold

gpc vitals crashes --threshold 2.0

That's the gate. If the crash rate exceeds 2.0%, GPC exits with code 6. Your CI sees a non-zero exit code. The pipeline stops.

Same for ANR:

gpc vitals anr --threshold 0.47

0.47% is Google's own "bad behavior" threshold for ANR. Exceed it and Play Console flags your app. Gate on it in CI and you find out before Google does.
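If you run both gates from one script, it helps to run every check and still fail with a meaningful exit code. Here is a minimal sketch of that idea. The `run_gates` helper is my own hypothetical wrapper, not part of GPC; it only assumes that each gate command exits non-zero on failure.

```shell
#!/usr/bin/env bash
# Sketch: run several gate commands and return the first failing exit code.
# Each argument is one full command line (for example a gpc vitals call).
run_gates() {
  local rc=0 cmd status
  for cmd in "$@"; do
    status=0
    # Run every gate even if an earlier one failed, so the log shows all breaches.
    bash -c "$cmd" || status=$?
    if [ "$status" -ne 0 ]; then
      echo "gate failed (exit $status): $cmd" >&2
      # Keep the first non-zero code as the overall result.
      if [ "$rc" -eq 0 ]; then
        rc=$status
      fi
    fi
  done
  return "$rc"
}
```

Usage would look like `run_gates 'gpc vitals crashes --threshold 2.0' 'gpc vitals anr --threshold 0.47'`. Passing commands as strings keeps the helper easy to dry-run with stand-ins such as `true` or `exit 6`.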

What exit code 6 means

GPC uses semantic exit codes. Exit code 6 specifically means "quality threshold breached" -- not a crash, not an auth failure, not a network error. Your CI can react differently to each:

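# Capture the exit code without tripping `set -e` (GitHub Actions runs steps with bash -e)
EXIT_CODE=0
gpc vitals crashes --threshold 2.0 --json || EXIT_CODE=$?

case "$EXIT_CODE" in
  0) echo "Vitals healthy" ;;
  3) echo "Auth error -- check your service account" ;;
  4) echo "API error -- check permissions" ;;
  6) echo "Threshold breached -- halting rollout" ;;
  *) echo "Unexpected exit code: $EXIT_CODE" ;;
esac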

Google's actual thresholds

For reference, these are the bad behavior thresholds Google Play publishes for its core vitals:

┌────────────┬──────────────────────────┬─────────────────────────────────────────┐
│ Metric     │ Google warning threshold │ What happens                            │
├────────────┼──────────────────────────┼─────────────────────────────────────────┤
│ Crash rate │ 1.09%                    │ Play Console warning, visibility impact │
├────────────┼──────────────────────────┼─────────────────────────────────────────┤
│ ANR rate   │ 0.47%                    │ Play Console warning, visibility impact │
└────────────┴──────────────────────────┴─────────────────────────────────────────┘

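A practical approach: set your CI gate at roughly 1.5x the Google threshold, which works out to about 1.6% for crashes. The flag takes a percentage, not a multiplier, so --threshold 1.5 means a 1.5% crash rate. A gate above 1.09% cannot fire before the measured rate passes Google's line, so if you want the earliest possible warning, gate at or below 1.09.

# Conservative: about 1.4x Google's threshold
gpc vitals crashes --threshold 1.5

# Standard: roughly 2x Google's threshold
gpc vitals crashes --threshold 2.0

# Emergency: last line of defense
gpc vitals crashes --threshold 3.0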
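To make the multiplier arithmetic explicit, here is a small sketch that scales a threshold using integer math only, which avoids floating-point rounding surprises in shell. The `scale_threshold` helper and its argument convention are my own illustration, not a GPC feature.

```shell
#!/usr/bin/env bash
# scale_threshold HUNDREDTHS TENTHS
#   HUNDREDTHS: a percentage in hundredths of a percent (1.09% -> 109)
#   TENTHS:     a multiplier in tenths (1.5x -> 15)
# Prints the scaled percentage with three decimals, using integer math only.
scale_threshold() {
  local thousandths=$(( $1 * $2 ))
  printf '%d.%03d\n' $(( thousandths / 1000 )) $(( thousandths % 1000 ))
}

scale_threshold 109 15   # crash rate: 1.5 x 1.09% -> prints 1.635
scale_threshold 47 15    # ANR rate:   1.5 x 0.47% -> prints 0.705
```

The results can be compared directly against the 1.09% and 0.47% figures in the table when you pick a value for --threshold.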

In CI: one step before the rollout increase

The pattern that works best is running a vitals gate before each rollout percentage increase, not just before the initial upload.

# GitHub Actions

  - name: Check crash rate before rollout increase
    run: gpc vitals crashes --threshold 2.0 --json

  - name: Increase rollout to 25%
    run: gpc releases rollout increase --track production --to 25

If the vitals check exits non-zero (6 on a breach), the increase step never runs. No manual intervention needed.

For a full gate covering both crash and ANR:

  - name: Vitals gate
    run: |
      gpc vitals crashes --threshold 2.0 --json
      gpc vitals anr --threshold 0.47 --json

  - name: Increase rollout to 50%
    run: gpc releases rollout increase --track production --to 50

Handling a breach

When a threshold is breached, you probably want to do more than just fail the job. You want to halt the active rollout too:

  - name: Vitals gate
    id: vitals
    run: gpc vitals crashes --threshold 2.0 --json
    continue-on-error: true

  - name: Halt rollout on breach
    if: steps.vitals.outcome == 'failure'
    run: |
      echo "::warning::Crash threshold breached -- halting rollout"
      gpc releases rollout halt --track production --json

Now the pipeline detects the breach, stops the rollout automatically, and surfaces a warning in your CI logs.
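One caveat: a step outcome of 'failure' covers every non-zero exit, including auth and API errors, and you may not want those to halt a healthy rollout. A sketch of a stricter variant, done in a single shell step, is below. The `gate_then_halt` wrapper is hypothetical; it relies only on the exit code 6 convention described earlier and on the `::warning::` workflow command from GitHub Actions.

```shell
#!/usr/bin/env bash
# gate_then_halt GATE_CMD HALT_CMD
# Runs the gate; halts the rollout only when the gate exits 6 (threshold breached).
# Any other failure (auth, API) is passed through without touching the rollout.
gate_then_halt() {
  local gate_cmd=$1 halt_cmd=$2 status=0
  bash -c "$gate_cmd" || status=$?
  if [ "$status" -eq 6 ]; then
    echo "::warning::Vitals threshold breached, halting rollout"
    bash -c "$halt_cmd"
  fi
  # Return the gate's status so the CI job still fails after a halt.
  return "$status"
}
```

In a workflow this could be called as `gate_then_halt 'gpc vitals crashes --threshold 2.0 --json' 'gpc releases rollout halt --track production --json'`. Because the function returns the gate's own exit code, later steps such as a rollout increase are still skipped.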

The full staged rollout pattern

This is how I structure releases now. Vitals gate at every step:

Upload AAB to internal track
    |
    v
Wait 24-48h for data
    |
    v
Vitals gate -----> BREACH? --> Halt, alert, fix
    |
    | PASS
    v
Increase to 10%
    |
    v
Wait 24-48h
    |
    v
Vitals gate -----> BREACH? --> Halt, alert, fix
    |
    | PASS
    v
Increase to 50%
    |
    v
Wait 24-48h
    |
    v
Vitals gate -----> BREACH? --> Halt, alert, fix
    |
    | PASS
    v
Complete rollout (100%)

Bad releases get caught early, at low rollout percentages, before they affect most of your users.
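The gate-then-increase loop in that diagram can be sketched as a small shell function. This is an illustration under assumptions, not how GPC itself drives rollouts: `staged_rollout` is a hypothetical helper, and the waiting periods are left to your scheduler.

```shell
#!/usr/bin/env bash
# staged_rollout GATE_CMD INCREASE_CMD PERCENT...
# For each percentage: run the gate, and only on success run the increase.
# INCREASE_CMD receives the target percentage as its last argument.
staged_rollout() {
  local gate_cmd=$1 increase_cmd=$2 pct status
  shift 2
  for pct in "$@"; do
    status=0
    bash -c "$gate_cmd" || status=$?
    if [ "$status" -ne 0 ]; then
      echo "gate failed (exit $status) before ${pct}%, stopping" >&2
      return "$status"
    fi
    bash -c "$increase_cmd $pct" || return $?
    # A real pipeline would wait here (the diagram suggests 24-48h) before the
    # next gate, typically via a scheduled or manually triggered job, not sleep.
  done
}
```

For example: `staged_rollout 'gpc vitals crashes --threshold 2.0' 'gpc releases rollout increase --track production --to' 10 50`. Completing the rollout is left out on purpose, since the command for that final step is not shown in this post.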

The overview command for a quick health check

If you want a full picture before you start a rollout:

gpc vitals overview

{
  "crashRate": { "value": 1.2, "threshold": "bad" },
  "anrRate": { "value": 0.3, "threshold": "good" },
  "slowStartRate": { "value": 5.1, "threshold": "acceptable" },
  "slowRenderingRate": { "value": 2.8, "threshold": "good" },
  "excessiveWakeupRate": { "value": 0.1, "threshold": "good" },
  "stuckWakelockRate": { "value": 0.05, "threshold": "good" }
}

Six metrics in one call. Add it as a pre-release sanity check before any upload.
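If you want that sanity check to fail loudly, one option is to scan the overview output for any metric rated "bad". The sketch below assumes the output has the shape shown above, with one metric per line; `bad_metrics` is a hypothetical helper, and a proper JSON parser would be more robust than grep and sed.

```shell
#!/usr/bin/env bash
# bad_metrics: read overview JSON on stdin, print names of metrics rated "bad".
# Assumes one metric per line, as in the sample output above.
bad_metrics() {
  { grep -E '"threshold":[[:space:]]*"bad"' || true; } \
    | sed -E 's/^[[:space:]]*"([^"]+)".*/\1/'
}
```

Piping the sample through it, as in `gpc vitals overview | bad_metrics`, would print crashRate. An empty result means no metric is in the "bad" band.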

Quick context on GPC

If you're new to this series: GPC is a CLI that covers the entire Google Play Developer API. 215 endpoints. 7 TypeScript packages. 1,874 tests.

Previous articles:

  • Part 1: I built a CLI that covers the entire Google Play Developer API
  • Part 2: I replaced 4 Play Console tabs with one terminal command
  • Part 3: I stopped submitting to Google Play without running this first

Try it

# npm
npm install -g @gpc-cli/cli

# Homebrew
brew install yasserstudio/tap/gpc

# Standalone binary (no Node.js required)
curl -fsSL https://raw.githubusercontent.com/yasserstudio/gpc/main/scripts/install.sh | sh

Then:

gpc auth login
gpc vitals crashes --threshold 2.0

Docs | GitHub | Vitals gates docs

Free to use. Code is on GitHub.

Does your release pipeline gate on app health before increasing rollout? Curious what thresholds people use in practice.
