Umang Sharma

Posted on Apr 15

Stop Scrolling Perfetto Timelines: Query Your Traces with SQL and Let AI Find the Bugs

#android #performance #ai #claude

Originally published on Medium

Stop Scrolling Perfetto Timelines — Query Your Traces with SQL and Let AI Find the Bugs

Perfetto traces are secretly a SQL database. You can query them, pipe the results to Claude, and get a ranked list of exactly what's slowing your app down — in minutes, not hours.

The Problem

You captured a Perfetto trace. Your app's cold start takes forever. Now you're doing this:

🔍 Zoom in… scroll right… "what was that slice?"… zoom out… scroll left… "wait, where was I?"

The timeline UI is great for exploration. But you can't aggregate ("What's my P90 frame time?"), you can't compare across traces ("Is v8.1 slower than v8.0?"), and you can't automate anything.

Here's what most Android devs don't realize: Perfetto stores every trace in a SQLite-based query engine. You can run SQL against it. And once you have SQL output, you can feed it to AI for instant analysis.

This post walks through the entire workflow — demo app, real trace, real AI analysis.

Image: https://github.com/user-attachments/assets/c1b93001-3b34-4020-ae85-f20ee04d492d

The Demo App: 5 Intentional Bottlenecks

I built a deliberately broken app called SlowStart with 5 common startup bottlenecks:

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // ❌ 1: Heavy JSON on main thread
        Trace.beginSection("SlowStart::HeavyJsonParsing")
        JSONObject(buildLargeJson(500))
        Trace.endSection()

        // ❌ 2: Sync SharedPreferences
        Trace.beginSection("SlowStart::SharedPrefBulkRead")
        prefs.edit().apply {
            repeat(200) { putString("key_$it", "val_$it") }
            commit()  // blocking! apply() is async
        }
        Trace.endSection()

        // ❌ 3: Binder IPC
        Trace.beginSection("SlowStart::PackageManagerQuery")
        packageManager.getInstalledPackages(0)
        Trace.endSection()

        setContentView(R.layout.activity_main)

        // ❌ 4: Unnecessary layout inflation
        Trace.beginSection("SlowStart::ExtraViewInflation")
        layoutInflater.inflate(R.layout.heavy_layout, null)
        Trace.endSection()

        // ❌ 5: Expensive computation
        Trace.beginSection("SlowStart::ExpensiveInit")
        (0..50_000).shuffled().toMutableList().sort()
        Trace.endSection()
    }
}

Each bottleneck is wrapped in Trace.beginSection() so it shows up as a named slice in Perfetto. Custom trace markers are your best friend — use them in your own app.

Step 1: Capture the Trace

adb shell am force-stop com.example.slowstart

echo 'buffers: { size_kb: 65536 } duration_ms: 15000
  data_sources: { config { name: "linux.ftrace" ftrace_config {
  ftrace_events: "sched/sched_switch"
  ftrace_events: "power/cpu_frequency"
  atrace_categories: "am" atrace_categories: "wm"
  atrace_categories: "view" atrace_categories: "dalvik"
  atrace_apps: "com.example.slowstart" }}}
  data_sources: { config { name: "linux.process_stats" }}' \
  | adb shell perfetto -c - --txt -o /data/misc/perfetto-traces/trace -d

sleep 3
adb shell am start -n com.example.slowstart/.MainActivity -W
sleep 15
adb pull /data/misc/perfetto-traces/trace slowstart.perfetto-trace

Pro tip: 15 seconds with targeted atrace_categories beats 60 seconds with everything enabled. Less noise = better analysis.

Image: https://github.com/user-attachments/assets/25e6875f-9f24-4ced-8430-36c776a87a24

Our trace: 520ms cold start, 72 processes, 4.3MB. LaunchState: COLD confirms a clean cold start.

Step 2: Perfetto SQL Crash Course

Setup

curl -LO https://get.perfetto.dev/trace_processor
chmod +x ./trace_processor
./trace_processor slowstart.perfetto-trace

Or use the Query (SQL) tab at ui.perfetto.dev.

The Data Model

Four tables matter. Memorize this join chain: slice → thread_track → thread → process

  slice               thread_track          thread              process
 ┌────────────┐      ┌─────────────┐      ┌──────────────┐    ┌──────────┐
 │ name       │      │ id       ◄──┼──────│              │    │          │
 │ ts         │      │ utid     ───┼──────► utid         │    │ upid  ◄──┤
 │ dur        │      └─────────────┘      │ name         │    │ pid      │
 │ track_id ──┼──►                        │ is_main_thd  │    │ name     │
 └────────────┘                           │ upid      ───┼───►└──────────┘
                                          └──────────────┘

Key rule: Always use utid/upid, never raw tid/pid — the OS reuses those.

The 7 Queries

1 — What's in this trace?

SELECT upid, pid, name FROM process
WHERE name IS NOT NULL ORDER BY name;

2 — Startup lifecycle

SELECT name, dur / 1e6 AS dur_ms FROM slice
WHERE name IN ('bindApplication', 'activityStart',
  'activityResume', 'Choreographer#doFrame')
ORDER BY ts;

3 — Find YOUR bottlenecks (the money query)

SELECT name, dur / 1e6 AS dur_ms FROM slice
WHERE name LIKE 'SlowStart::%' ORDER BY dur DESC;

Image: https://github.com/user-attachments/assets/a940c986-c012-463b-92e9-650d9f655af0

234ms out of 520ms. That's 45% of startup — self-inflicted.

Surprise: ExpensiveInit dominated at 181ms, not JSON parsing. This is why you measure instead of guess.

4 — Slowest main thread slices

SELECT s.name, s.dur / 1e6 AS dur_ms
FROM slice s
JOIN thread_track tt ON s.track_id = tt.id
JOIN thread t USING(utid)
WHERE t.is_main_thread = 1 AND s.dur > 1000000
ORDER BY s.dur DESC LIMIT 15;

Anything > 16ms blocks a frame. This query shows everything blocking your main thread.

5 — Binder transactions

SELECT name, dur / 1e6 AS dur_ms FROM slice
WHERE name LIKE 'binder%' ORDER BY dur DESC LIMIT 10;

6 — Standard Library (the cheat code)

INCLUDE PERFETTO MODULE android.startup.startups;
SELECT * FROM android_startups;

INCLUDE PERFETTO MODULE android.monitor_contention;
SELECT blocking_method, blocked_method, dur / 1e6 AS ms
FROM android_monitor_contention ORDER BY dur DESC;

The stdlib handles Android version-specific differences. Always prefer it over raw SQL.

7 — Frame jank

SELECT name, dur / 1e6 AS dur_ms,
  CASE WHEN dur > 32000000 THEN '🔴 SEVERE'
       WHEN dur > 16000000 THEN '🟡 JANK'
       ELSE '🟢 OK' END AS verdict
FROM slice WHERE name LIKE 'Choreographer#doFrame%'
ORDER BY dur DESC LIMIT 20;

Let's zoom into the timeline to see what's happening during startup:

Image: https://github.com/user-attachments/assets/14a1fddd-d953-41a3-908f-3cff6eb63d83

You can see the SlowStart:: slices clearly on the main thread. The ExpensiveInit slice at the end is visually huge — matching our SQL finding of 181ms.

Step 3: Feed It to AI

Option A: Extract + Upload (2 minutes)

A Python script runs all 13 queries and saves a single JSON:

pip install perfetto pandas
python3 extract_for_ai.py slowstart.perfetto-trace -o report.json --csv

Image: https://github.com/user-attachments/assets/c8487299-016d-4d6f-9a78-6a7584ad3c38

Upload report.json to Claude → "Analyze my Android startup. Rank bottlenecks and give Kotlin fixes."

Option B: Agentic Loop (fully automated)

Let Claude decide what queries to run:

import anthropic, re
from perfetto.trace_processor import TraceProcessor

tp = TraceProcessor(trace="slowstart.perfetto-trace")
client = anthropic.Anthropic()

SYSTEM = """You're an Android perf expert with SQL access to a Perfetto trace.
Write queries inside <sql>...</sql> tags. You'll get results back.
When done, give a final ranked analysis with fixes."""

messages = [{"role": "user", "content":
    "Analyze cold startup. Find bottlenecks, give fixes."}]

for i in range(10):
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096, system=SYSTEM, messages=messages)
    text = resp.content[0].text
    queries = re.findall(r'<sql>(.*?)</sql>', text, re.DOTALL)

    if not queries:
        print("FINAL REPORT:", text)
        break

    results = []
    for sql in queries:
        df = tp.query(sql.strip()).as_pandas_dataframe()
        results.append(f"Query: {sql}\nResult:\n{df.head(50).to_string()}")

    messages.append({"role": "assistant", "content": text})
    messages.append({"role": "user", "content": "\n\n".join(results)})

Claude autonomously explores the trace, runs 6–10 SQL queries, and produces a complete analysis with code fixes.

What AI Found (Real Data)

I uploaded trace_report.json to Claude. Here's what came back:

Image: https://github.com/user-attachments/assets/42b34848-537f-42a9-b83d-1801f307a2ad

Image: https://github.com/user-attachments/assets/4458ec80-4b21-4a3b-a933-aa4f18f7f792

What AI Caught That I'd Have Missed

Beyond the 5 obvious bottlenecks, Claude cross-referenced the data and found:

A 36ms background GC between JSON parsing and the sort — buildLargeJson() was churning allocations
30 GC events total during startup — shuffled() + sort() on 50K items = massive object churn
EmojiCompat taking 178ms on a background thread — not blocking, but burning CPU during the critical window
2 severe jank frames at 46ms each right after activityResume
20 monitor contention events in system_server, longest 7.4ms in ActivityStarter.execute()

That's the power of structured data + AI. It connects dots across tables that you'd never spot scrolling a timeline.

AI-Generated Fixes

Image: https://github.com/user-attachments/assets/bdd026f2-7cae-4636-affd-41b2d1b2e48f

Image: https://github.com/user-attachments/assets/9039a9b7-6e82-4dbc-b0ac-66c5ad0255aa

Total recoverable: 234ms — 45% of the 520ms cold start.

How Does AI Actually Decode a Trace Report?

When I asked Claude "How did you decode this?", the answer was revealing:

Anchor on android_startups — get the cold-start budget (520ms). Everything else is framed against this.
Check timestamps — only slices inside the startup window are on the critical path.
Rank by impact — main-thread slices = blocking. Background GC overlapping main thread = stall multiplier.
Cross-reference tables — the 36ms GC landed between HeavyJsonParsing and ExpensiveInit. class_verification showed JIT-compiling buildLargeJson(int). Same method. Not a coincidence.
Flag what's missing — android_binder errored in our extraction. The AI flagged this gap unprompted.

It's not magic. It's a performance engineer who reads SQL output instantly and connects dots across tables — work that takes a human 30 minutes of scrolling.

Bonus: Claude Code CLI — Skip Everything

If you use Claude Code, skip the extraction entirely:

claude "Analyze the cold startup bottlenecks in this trace and give me fixes" \
  --file slowstart.perfetto-trace

One command. Claude Code installs perfetto, queries the binary trace autonomously, and gives you ranked bottlenecks + fixes.

Why can't you upload .perfetto-trace directly to Claude chat? It's binary protobuf — LLMs can't read it. Claude Code is the exception: it has a sandbox that installs trace_processor and queries the binary directly.

Cheatsheet

SETUP:      curl -LO https://get.perfetto.dev/trace_processor
            ./trace_processor my.perfetto-trace

JOIN:       slice → thread_track → thread → process

TIMESTAMPS: dur / 1e6 = milliseconds | 16ms budget = 16,000,000 ns

STATES:     R = runnable | S = sleeping | D = I/O blocked

STDLIB:     INCLUDE PERFETTO MODULE android.startup.startups;
            INCLUDE PERFETTO MODULE android.binder;
            INCLUDE PERFETTO MODULE android.monitor_contention;

WORKFLOW:   Capture → SQL → JSON → AI → Fix

Try It Yourself

Code: github.com/umangsh28/perfetto-ai-blog

git clone https://github.com/umangsh28/perfetto-ai-blog
cd perfetto-ai-blog

# Build & install demo app, then:
echo '...' | adb shell perfetto -c - --txt \
  -o /data/misc/perfetto-traces/trace -d
sleep 3 && adb shell am start -n com.example.slowstart/.MainActivity -W
sleep 15 && adb pull /data/misc/perfetto-traces/trace slowstart.perfetto-trace

pip install perfetto pandas
python3 scripts/extract_for_ai.py slowstart.perfetto-trace
# Upload report.json to Claude — done.

Follow me for more Android performance + AI content. All code is open source — drop a comment if you want help debugging your traces.

DEV Community