---
title: "Android Baseline Profiles: the CI Pipeline That Cut Cold Start by 35%"
published: true
description: "Custom Macrobenchmark journeys, Cloud Profile delivery, and a CI tracing pipeline that reduced cold launch time by 35% across device tiers."
tags: android, kotlin, mobile, performance
canonical_url: https://blog.mvpfactory.co/android-baseline-profiles-ci-pipeline-cold-start
---
## What we're building
By the end of this tutorial, you'll have a CI-integrated pipeline that generates custom Baseline Profiles against your real startup graph, validates them with `profman`, and gates releases on cold-start regression thresholds. This is the setup that cut our cold launch time by 35% across device tiers.
## Prerequisites
- Android project with Macrobenchmark module configured
- Gradle managed devices or a physical test device
- R8 enabled for release builds
- Familiarity with ART compilation basics
## Step 1: Understand why default profiles underperform
The default `BaselineProfileGenerator` test covers `Activity.onCreate()` through first frame rendered. That's it. It misses your Dagger/Hilt injection graph, initial network prefetch, and every lazy-initialized singleton your app touches in the first 2 seconds.
In my experience, the default profile typically covers 40–60% of the methods executed during a real cold start. The remaining methods get interpreted or JIT-compiled at runtime — exactly the penalty Baseline Profiles exist to eliminate.
| Compilation Mode | Methods Covered | Cold Start Impact |
|---|---|---|
| No profile (interpret + JIT) | 0% pre-compiled | Baseline |
| Default Baseline Profile | ~50% of startup methods | 15–20% improvement |
| Custom journey profile | ~85% of startup methods | 30–40% improvement |
| Cloud Profile (aggregated) | ~75% across user segments | 25–35% improvement |
Here is the gotcha that will save you hours: DEX layout optimization depends on class loading order. When the profiler sees your full initialization graph, ART can colocate hot classes within the same memory pages, meaning fewer page faults. That's where the real win hides.
## Step 2: Write custom Macrobenchmark startup journeys
Let me show you a pattern I use in every project. Model what your actual users do in the first 5 seconds:
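```kotlin
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test

// TARGET_PACKAGE is a constant holding your app's application ID.
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupWithAuthAndFeed() {
        benchmarkRule.measureRepeated(
            packageName = TARGET_PACKAGE,
            metrics = listOf(StartupTimingMetric()),
            iterations = 10,
            startupMode = StartupMode.COLD,
        ) {
            pressHome()
            startActivityAndWait()
            // Wait for Dagger graph + initial API response
            device.wait(Until.hasObject(By.res("feed_list")), 5_000)
            // Scroll to trigger RecyclerView prefetch
            device.findObject(By.res("feed_list")).scroll(Direction.DOWN, 2f)
        }
    }
}
```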
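This journey exercises dependency injection, network deserialization, and RecyclerView layout, all hot paths the default generator misses entirely. Note that `MacrobenchmarkRule` only measures these paths; running the same steps under `BaselineProfileRule.collect` is what records them into the profile.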
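To record this journey into a profile rather than only time it, the same steps can run under `BaselineProfileRule`. The following is a minimal sketch, not necessarily the generator used for the numbers in this post. The class name and `TARGET_PACKAGE` value are hypothetical, and `includeInStartupProfile` assumes a recent `androidx.benchmark` release (1.2 or later).

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

// Hypothetical application ID; replace with your app's package name.
private const val TARGET_PACKAGE = "com.example.app"

@RunWith(AndroidJUnit4::class)
class StartupJourneyProfileGenerator {
    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun generateStartupWithAuthAndFeed() {
        baselineProfileRule.collect(
            packageName = TARGET_PACKAGE,
            // Also emit the rules as a startup profile, which is what
            // drives DEX layout optimization at build time.
            includeInStartupProfile = true,
        ) {
            pressHome()
            startActivityAndWait()
            // Same journey as the benchmark: wait for the feed, then scroll it.
            val feed = device.wait(Until.findObject(By.res("feed_list")), 5_000)
            feed?.scroll(Direction.DOWN, 2f)
        }
    }
}
```

Keeping the generator and the benchmark on the same journey means the timing you gate on reflects the code paths you actually profiled.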
## Step 3: Layer Cloud Profile delivery
Google Play aggregates anonymized runtime profiles from users and delivers them as Cloud Profiles to new installs. They take 1–2 weeks to propagate after a release and reflect the average user journey, not your optimized one.
Here is the minimal setup to get this working. Ship a custom Baseline Profile in your APK/AAB for immediate benefit, and let Cloud Profiles fill coverage gaps over time:
```kotlin
baselineProfile {
    automaticGenerationDuringBuild = true
    saveInSrc = true
    mergeIntoMain = true
}
```
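For context, the `baselineProfile` block lives in the app module and only works once the plugin and the generator module are wired in. A rough fragment of what that wiring might look like follows. The `:baselineprofile` module name is an assumption, the version shown is just one known release, and the usual `android { ... }` block is omitted.

```kotlin
// app/build.gradle.kts (fragment; android { ... } block omitted)
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    id("androidx.baselineprofile")
}

dependencies {
    // Lets the app install the bundled profile itself, for example on
    // sideloaded or non-Play installs.
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")

    // Hypothetical module name: the test module that holds the generators.
    baselineProfile(project(":baselineprofile"))
}
```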
## Step 4: Build the CI tracing pipeline
Run this pipeline on every release branch across three device tiers: low-end (2GB RAM), mid-range, and flagship.
| Pipeline Stage | Tool | Output |
|---|---|---|
| Profile generation | Macrobenchmark + Gradle managed devices | `baseline-prof.txt` |
| Profile validation | `profman --dump-classes-and-methods` | Method coverage report |
| Startup measurement | Macrobenchmark `StartupTimingMetric` | P50/P90 cold start (ms) |
| Regression gate | Custom Gradle task | Fail build if P50 regresses >5% |
The validation step matters more than people think. Running `profman --dump-classes-and-methods` against your compiled profile lets you verify that method references actually resolve in the current DEX files. Without this, you're flying blind.
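To make the idea concrete, here is a small sketch of the kind of check a CI step could run on top of that dump. It parses the human-readable rule format (`HSPL...;->method()V` for methods, `L...;` for classes) and reports rules whose class no longer exists in the release build. The `dexClasses` input is an assumption: it stands for a set of class descriptors you have already extracted from the release DEX files by whatever means you prefer. The check is class-level only, so treat it as a first filter rather than a full method-resolution pass.

```kotlin
/** One parsed line of a human-readable baseline-prof.txt file. */
data class ProfileRule(val flags: String, val classDescriptor: String, val method: String?)

/** Parses lines such as "HSPLcom/example/Foo;->bar()V" or "Lcom/example/Foo;". */
fun parseRule(line: String): ProfileRule? {
    val trimmed = line.trim()
    if (trimmed.isEmpty() || trimmed.startsWith("#")) return null
    // Flags (H = hot, S = startup, P = post-startup) precede the class descriptor.
    val flags = trimmed.takeWhile { it in "HSP" }
    val rest = trimmed.drop(flags.length)
    if (!rest.startsWith("L")) return null
    val parts = rest.split("->", limit = 2)
    return ProfileRule(flags, parts[0], parts.getOrNull(1))
}

data class CoverageReport(val total: Int, val unresolved: List<ProfileRule>) {
    val resolvedPercent: Double
        get() = if (total == 0) 0.0 else 100.0 * (total - unresolved.size) / total
}

/** Flags every rule whose class descriptor is missing from the release DEX. */
fun checkProfile(profileLines: List<String>, dexClasses: Set<String>): CoverageReport {
    val rules = profileLines.mapNotNull(::parseRule)
    val unresolved = rules.filter { it.classDescriptor !in dexClasses }
    return CoverageReport(rules.size, unresolved)
}

fun main() {
    // Illustrative input only; real input comes from your profile and DEX files.
    val profile = listOf(
        "HSPLcom/example/feed/FeedViewModel;->load()V",
        "Lcom/example/feed/FeedViewModel;",
        "HSPLcom/example/auth/TokenStore;->read()Ljava/lang/String;",
    )
    // Here R8 renamed TokenStore, so its rule no longer resolves.
    val dexClasses = setOf("Lcom/example/feed/FeedViewModel;", "La/b;")
    val report = checkProfile(profile, dexClasses)
    println("resolved ${report.total - report.unresolved.size} of ${report.total} rules")
    report.unresolved.forEach { println("stale: ${it.classDescriptor}") }
}
```

A CI job could fail when `resolvedPercent` drops below a threshold you choose, which turns the silent failure mode into a visible one.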
## Gotchas
**R8 silently breaks your profiles.** The docs don't mention this, but R8 can rename, inline, or remove methods that your Baseline Profile references. When that happens, ART silently ignores the stale entries. Zero benefit, zero warnings.
The fix: generate profiles *after* R8 processing. In your CI pipeline, the order must be:
1. Build release APK (R8 runs)
2. Install optimized APK on test device/emulator
3. Run Macrobenchmark profile generator against installed APK
4. Extract and embed the resulting profile
Reversing steps 1 and 3 is the single most common mistake I see. It produces profiles that look valid but match nothing at runtime. Maddening to debug.
**Device-tier blindness.** A profile that shaves 200ms on a Pixel might do almost nothing on a low-RAM device where memory pressure dominates. Always measure across tiers.
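One way to avoid this is to apply the Step 4 regression gate per tier instead of on a blended number. The sketch below assumes you have already pulled the per-iteration cold-start timings out of the Macrobenchmark results for each tier; the type names, the nearest-rank percentile, and the sample values are illustrative choices, not part of any library.

```kotlin
/** Nearest-rank percentile over cold-start samples in milliseconds. */
fun percentile(samplesMs: List<Double>, p: Int): Double {
    require(samplesMs.isNotEmpty()) { "no samples" }
    require(p in 1..100) { "p must be in 1..100" }
    val sorted = samplesMs.sorted()
    // Integer ceiling of p/100 * n, which avoids floating-point rounding.
    val rank = (p * sorted.size + 99) / 100
    return sorted[rank - 1]
}

/** P50 of the last accepted build versus the current build for one device tier. */
data class TierResult(val tier: String, val baselineP50: Double, val currentP50: Double) {
    val regressionPercent: Double
        get() = (currentP50 - baselineP50) / baselineP50 * 100.0
}

/** Returns the tiers whose P50 regressed by more than the allowed percentage. */
fun failingTiers(results: List<TierResult>, maxRegressionPercent: Double = 5.0): List<TierResult> =
    results.filter { it.regressionPercent > maxRegressionPercent }

fun main() {
    // Made-up sample values for illustration only, not measurements.
    val baselineLowEnd = listOf(1200.0, 1180.0, 1250.0, 1220.0, 1190.0)
    val currentLowEnd = listOf(1300.0, 1290.0, 1340.0, 1310.0, 1280.0)
    val results = listOf(
        TierResult("low-end", percentile(baselineLowEnd, 50), percentile(currentLowEnd, 50)),
    )
    val failing = failingTiers(results)
    failing.forEach { println("${it.tier}: P50 regressed by more than 5%") }
    // A CI wrapper would exit non-zero here when `failing` is not empty.
}
```

Gating each tier separately keeps a flagship improvement from masking a low-end regression.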
## Wrapping up
Write custom Macrobenchmark startup journeys that cover your real initialization graph: DI, network, first meaningful content. Default generators leave 40%+ of hot methods uncompiled. Always generate profiles after R8, validate with `profman --dump-classes-and-methods` in CI, and enforce P50/P90 thresholds so regressions don't slip through. That's the pipeline that got us to 35%.