Originally published on Medium
Stop Scrolling Perfetto Timelines — Query Your Traces with SQL and Let AI Find the Bugs
Perfetto traces are secretly a SQL database. You can query them, pipe the results to Claude, and get a ranked list of exactly what's slowing your app down — in minutes, not hours.
The Problem
You captured a Perfetto trace. Your app's cold start takes forever. Now you're doing this:
🔍 Zoom in… scroll right… "what was that slice?"… zoom out… scroll left… "wait, where was I?"
The timeline UI is great for exploration. But you can't aggregate ("What's my P90 frame time?"), you can't compare across traces ("Is v8.1 slower than v8.0?"), and you can't automate anything.
Here's what most Android devs don't realize: Perfetto stores every trace in a SQLite-based query engine. You can run SQL against it. And once you have SQL output, you can feed it to AI for instant analysis.
This post walks through the entire workflow — demo app, real trace, real AI analysis.
Image: https://github.com/user-attachments/assets/c1b93001-3b34-4020-ae85-f20ee04d492d
The Demo App: 5 Intentional Bottlenecks
I built a deliberately broken app called SlowStart with 5 common startup bottlenecks:
class MainActivity : AppCompatActivity() {
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
// ❌ 1: Heavy JSON on main thread
Trace.beginSection("SlowStart::HeavyJsonParsing")
JSONObject(buildLargeJson(500))
Trace.endSection()
// ❌ 2: Sync SharedPreferences
Trace.beginSection("SlowStart::SharedPrefBulkRead")
prefs.edit().apply {
repeat(200) { putString("key_$it", "val_$it") }
commit() // blocking! apply() is async
}
Trace.endSection()
// ❌ 3: Binder IPC
Trace.beginSection("SlowStart::PackageManagerQuery")
packageManager.getInstalledPackages(0)
Trace.endSection()
setContentView(R.layout.activity_main)
// ❌ 4: Unnecessary layout inflation
Trace.beginSection("SlowStart::ExtraViewInflation")
layoutInflater.inflate(R.layout.heavy_layout, null)
Trace.endSection()
// ❌ 5: Expensive computation
Trace.beginSection("SlowStart::ExpensiveInit")
(0..50_000).shuffled().toMutableList().sort()
Trace.endSection()
}
}
Each bottleneck is wrapped in Trace.beginSection() so it shows up as a named slice in Perfetto. Custom trace markers are your best friend — use them in your own app.
Step 1: Capture the Trace
adb shell am force-stop com.example.slowstart
echo 'buffers: { size_kb: 65536 } duration_ms: 15000
data_sources: { config { name: "linux.ftrace" ftrace_config {
ftrace_events: "sched/sched_switch"
ftrace_events: "power/cpu_frequency"
atrace_categories: "am" atrace_categories: "wm"
atrace_categories: "view" atrace_categories: "dalvik"
atrace_apps: "com.example.slowstart" }}}
data_sources: { config { name: "linux.process_stats" }}' \
| adb shell perfetto -c - --txt -o /data/misc/perfetto-traces/trace -d
sleep 3
adb shell am start -n com.example.slowstart/.MainActivity -W
sleep 15
adb pull /data/misc/perfetto-traces/trace slowstart.perfetto-trace
Pro tip: 15 seconds with targeted atrace_categories beats 60 seconds with everything enabled. Less noise = better analysis.
Image: https://github.com/user-attachments/assets/25e6875f-9f24-4ced-8430-36c776a87a24
Our trace: 520ms cold start, 72 processes, 4.3MB. LaunchState: COLD confirms a clean cold start.
Step 2: Perfetto SQL Crash Course
Setup
curl -LO https://get.perfetto.dev/trace_processor
chmod +x ./trace_processor
./trace_processor slowstart.perfetto-trace
Or use the Query (SQL) tab at ui.perfetto.dev.
The Data Model
Four tables matter. Memorize this join chain: slice → thread_track → thread → process
slice thread_track thread process
┌────────────┐ ┌─────────────┐ ┌──────────────┐ ┌──────────┐
│ name │ │ id ◄──┼──────│ │ │ │
│ ts │ │ utid ───┼──────► utid │ │ upid ◄──┤
│ dur │ └─────────────┘ │ name │ │ pid │
│ track_id ──┼──► │ is_main_thd │ │ name │
└────────────┘ │ upid ───┼───►└──────────┘
└──────────────┘
Key rule: Always use utid/upid, never raw tid/pid — the OS reuses those.
The 7 Queries
1 — What's in this trace?
SELECT upid, pid, name FROM process
WHERE name IS NOT NULL ORDER BY name;
2 — Startup lifecycle
SELECT name, dur / 1e6 AS dur_ms FROM slice
WHERE name IN ('bindApplication', 'activityStart',
'activityResume', 'Choreographer#doFrame')
ORDER BY ts;
3 — Find YOUR bottlenecks (the money query)
SELECT name, dur / 1e6 AS dur_ms FROM slice
WHERE name LIKE 'SlowStart::%' ORDER BY dur DESC;
Image: https://github.com/user-attachments/assets/a940c986-c012-463b-92e9-650d9f655af0
234ms out of 520ms. That's 45% of startup — self-inflicted.
Surprise: ExpensiveInit dominated at 181ms, not JSON parsing. This is why you measure instead of guess.
4 — Slowest main thread slices
SELECT s.name, s.dur / 1e6 AS dur_ms
FROM slice s
JOIN thread_track tt ON s.track_id = tt.id
JOIN thread t USING(utid)
WHERE t.is_main_thread = 1 AND s.dur > 1000000
ORDER BY s.dur DESC LIMIT 15;
Anything > 16ms blocks a frame. This query shows everything blocking your main thread.
5 — Binder transactions
SELECT name, dur / 1e6 AS dur_ms FROM slice
WHERE name LIKE 'binder%' ORDER BY dur DESC LIMIT 10;
6 — Standard Library (the cheat code)
INCLUDE PERFETTO MODULE android.startup.startups;
SELECT * FROM android_startups;
INCLUDE PERFETTO MODULE android.monitor_contention;
SELECT blocking_method, blocked_method, dur / 1e6 AS ms
FROM android_monitor_contention ORDER BY dur DESC;
The stdlib handles Android version-specific differences. Always prefer it over raw SQL.
7 — Frame jank
SELECT name, dur / 1e6 AS dur_ms,
CASE WHEN dur > 32000000 THEN '🔴 SEVERE'
WHEN dur > 16000000 THEN '🟡 JANK'
ELSE '🟢 OK' END AS verdict
FROM slice WHERE name LIKE 'Choreographer#doFrame%'
ORDER BY dur DESC LIMIT 20;
Let's zoom into the timeline to see what's happening during startup:
Image: https://github.com/user-attachments/assets/14a1fddd-d953-41a3-908f-3cff6eb63d83
You can see the SlowStart:: slices clearly on the main thread. The ExpensiveInit slice at the end is visually huge — matching our SQL finding of 181ms.
Step 3: Feed It to AI
Option A: Extract + Upload (2 minutes)
A Python script runs all 13 queries and saves a single JSON:
pip install perfetto pandas
python3 extract_for_ai.py slowstart.perfetto-trace -o report.json --csv
Image: https://github.com/user-attachments/assets/c8487299-016d-4d6f-9a78-6a7584ad3c38
Upload report.json to Claude → "Analyze my Android startup. Rank bottlenecks and give Kotlin fixes."
Option B: Agentic Loop (fully automated)
Let Claude decide what queries to run:
import anthropic, re
from perfetto.trace_processor import TraceProcessor
tp = TraceProcessor(trace="slowstart.perfetto-trace")
client = anthropic.Anthropic()
SYSTEM = """You're an Android perf expert with SQL access to a Perfetto trace.
Write queries inside <sql>...</sql> tags. You'll get results back.
When done, give a final ranked analysis with fixes."""
messages = [{"role": "user", "content":
"Analyze cold startup. Find bottlenecks, give fixes."}]
for i in range(10):
resp = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096, system=SYSTEM, messages=messages)
text = resp.content[0].text
queries = re.findall(r'<sql>(.*?)</sql>', text, re.DOTALL)
if not queries:
print("FINAL REPORT:", text)
break
results = []
for sql in queries:
df = tp.query(sql.strip()).as_pandas_dataframe()
results.append(f"Query: {sql}\nResult:\n{df.head(50).to_string()}")
messages.append({"role": "assistant", "content": text})
messages.append({"role": "user", "content": "\n\n".join(results)})
Claude autonomously explores the trace, runs 6–10 SQL queries, and produces a complete analysis with code fixes.
What AI Found (Real Data)
I uploaded trace_report.json to Claude. Here's what came back:
Image: https://github.com/user-attachments/assets/42b34848-537f-42a9-b83d-1801f307a2ad
Image: https://github.com/user-attachments/assets/4458ec80-4b21-4a3b-a933-aa4f18f7f792
What AI Caught That I'd Have Missed
Beyond the 5 obvious bottlenecks, Claude cross-referenced the data and found:
-
A 36ms background GC between JSON parsing and the sort —
buildLargeJson()was churning allocations -
30 GC events total during startup —
shuffled()+sort()on 50K items = massive object churn - EmojiCompat taking 178ms on a background thread — not blocking, but burning CPU during the critical window
-
2 severe jank frames at 46ms each right after
activityResume -
20 monitor contention events in
system_server, longest 7.4ms inActivityStarter.execute()
That's the power of structured data + AI. It connects dots across tables that you'd never spot scrolling a timeline.
AI-Generated Fixes
Image: https://github.com/user-attachments/assets/bdd026f2-7cae-4636-affd-41b2d1b2e48f
Image: https://github.com/user-attachments/assets/9039a9b7-6e82-4dbc-b0ac-66c5ad0255aa
Total recoverable: 234ms — 45% of the 520ms cold start.
How Does AI Actually Decode a Trace Report?
When I asked Claude "How did you decode this?", the answer was revealing:
-
Anchor on
android_startups— get the cold-start budget (520ms). Everything else is framed against this. - Check timestamps — only slices inside the startup window are on the critical path.
- Rank by impact — main-thread slices = blocking. Background GC overlapping main thread = stall multiplier.
-
Cross-reference tables — the 36ms GC landed between
HeavyJsonParsingandExpensiveInit.class_verificationshowed JIT-compilingbuildLargeJson(int). Same method. Not a coincidence. -
Flag what's missing —
android_bindererrored in our extraction. The AI flagged this gap unprompted.
It's not magic. It's a performance engineer who reads SQL output instantly and connects dots across tables — work that takes a human 30 minutes of scrolling.
Bonus: Claude Code CLI — Skip Everything
If you use Claude Code, skip the extraction entirely:
claude "Analyze the cold startup bottlenecks in this trace and give me fixes" \
--file slowstart.perfetto-trace
One command. Claude Code installs perfetto, queries the binary trace autonomously, and gives you ranked bottlenecks + fixes.
Why can't you upload .perfetto-trace directly to Claude chat? It's binary protobuf — LLMs can't read it. Claude Code is the exception: it has a sandbox that installs trace_processor and queries the binary directly.
Cheatsheet
SETUP: curl -LO https://get.perfetto.dev/trace_processor
./trace_processor my.perfetto-trace
JOIN: slice → thread_track → thread → process
TIMESTAMPS: dur / 1e6 = milliseconds | 16ms budget = 16,000,000 ns
STATES: R = runnable | S = sleeping | D = I/O blocked
STDLIB: INCLUDE PERFETTO MODULE android.startup.startups;
INCLUDE PERFETTO MODULE android.binder;
INCLUDE PERFETTO MODULE android.monitor_contention;
WORKFLOW: Capture → SQL → JSON → AI → Fix
Try It Yourself
Code: github.com/umangsh28/perfetto-ai-blog
git clone https://github.com/umangsh28/perfetto-ai-blog
cd perfetto-ai-blog
# Build & install demo app, then:
echo '...' | adb shell perfetto -c - --txt \
-o /data/misc/perfetto-traces/trace -d
sleep 3 && adb shell am start -n com.example.slowstart/.MainActivity -W
sleep 15 && adb pull /data/misc/perfetto-traces/trace slowstart.perfetto-trace
pip install perfetto pandas
python3 scripts/extract_for_ai.py slowstart.perfetto-trace
# Upload report.json to Claude — done.
Follow me for more Android performance + AI content. All code is open source — drop a comment if you want help debugging your traces.
Top comments (0)