Last week I was profiling a Swift app with Instruments. Standard procedure: xctrace record, xctrace export, copy the XML to Claude Code's context, ask it to find the hotspots.
Claude responds: "The XML is too large, I can't process it reliably."
33,553 lines of XML. For a program with two functions.
The real problem
xctrace export is a fantastic tool. It gives you everything: every sample, every backtrace, every frame with its binary, memory address, and UUID. It's exhaustive, precise, and complete.
And that's exactly the problem.
When I profile an app to find bottlenecks, I don't need the 3,044 individual samples. I don't need to know that sample number 1,847 caught the CPU at address 0x1027ec9a8 in libswiftCore.dylib at 00:02.847.882. I need to know that heavyWork() consumes 70% of the time and lightWork() takes 30%.
What I need is ten lines, not thirty-three thousand.
Why XML is the right format (but the noise isn't)
Before anyone says "the problem is using XML in 2026": not really.
XML is the ideal format for what xctrace does. Think about it:
- Hierarchical: a backtrace is a tree of frames. A sample contains a backtrace, a thread, a process. XML models this naturally.
- Self-describing: every element has a name, typed attributes, and the structure is validatable. You don't have to guess what field 7 means in a CSV line.
- Elegant deduplication: xctrace uses an `id/ref` system where it defines a frame once (`id="59" name="heavyWork()"`) then references it with `ref="59"`. It's basically a serialized flyweight pattern.
- Processable with standard tools: XPath, `xmllint`, `xml.etree.ElementTree`... you don't need a proprietary parser.
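Concretely, the deduplication pattern looks something like this (a simplified fragment built from the attributes quoted above; real rows carry more attributes and nest frames inside backtraces):

```xml
<!-- first occurrence: full definition -->
<frame id="59" name="heavyWork()"/>

<!-- every later occurrence: a reference back to it -->
<frame ref="59"/>
```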
xctrace's XML isn't bloat. It's structured information that Instruments needs to reconstruct interactive call trees, compare runs, filter by thread and process. It's designed for a GUI tool that can expand and collapse nodes.
The problem appears when you try to fit that information into an LLM's context window. It's like trying to read the complete Don Quixote to find the windmill quote. The information is there, but the signal-to-noise ratio is devastating.
The solution: ztrace
So I built ztrace. A Python script that takes a .trace bundle and produces a compact summary.
The idea is simple:
1. Run `xctrace export --toc` to get metadata (process, duration, template)
2. Run `xctrace export --xpath` to extract the `time-profile` table (both invocations sketched below)
3. Parse the XML, resolving the `id/ref` system
4. Filter system frames (everything living in `/usr/lib/` or `/System/`)
5. Aggregate by function and generate the summary
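For steps 1 and 2, the export invocations look roughly like this (assuming a single-run trace; `MyApp.trace` is a placeholder, and the XPath may vary with your template):

```shell
# 1) table of contents: process, duration, template
xctrace export --input MyApp.trace --toc

# 2) the time-profile table itself (first run)
xctrace export --input MyApp.trace \
  --xpath '/trace-toc/run[@number="1"]/data/table[@schema="time-profile"]'
```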
Step 3 is more important than it seems. xctrace doesn't repeat the complete definition of a frame every time it appears in a backtrace. It defines it once with id="59" then uses ref="59". If you don't resolve the refs, you lose most of the information.
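Here's a minimal sketch of that resolution with `xml.etree.ElementTree` (the XML shape is simplified; real rows nest frames inside backtraces, but the two-pass logic is the same):

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<rows>
  <row><frame id="59" name="heavyWork()"/></row>
  <row><frame ref="59"/></row>
  <row><frame ref="59"/></row>
</rows>
"""

def resolve_refs(root):
    """First pass: index every element that carries an id.
    Second pass: copy the definition's attributes onto each ref."""
    defs = {el.get("id"): el for el in root.iter() if el.get("id")}
    for el in root.iter():
        ref = el.get("ref")
        if ref is not None:
            el.attrib.update(
                (k, v) for k, v in defs[ref].attrib.items() if k != "id"
            )

root = ET.fromstring(SAMPLE)
resolve_refs(root)
print([f.get("name") for f in root.iter("frame")])
# ['heavyWork()', 'heavyWork()', 'heavyWork()']
```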
The result
With the test fixture (a trivial program with heavyWork() at ~70% and lightWork() at ~30%):
$ ztrace summary sample.trace
Process: hotspot Duration: 3.8s Template: Time Profiler
Samples: 3044 Total CPU: 3044ms
SELF TIME
69.4% 2113ms hotspot heavyWork()
29.7% 905ms hotspot lightWork()
TOTAL TIME (callers with significant overhead)
99.9% 3041ms main
CALL STACKS
69.4% 2113ms main > heavyWork()
29.7% 904ms main > lightWork()
From 33,553 lines to 13. All the information an LLM needs to tell you "optimize heavyWork(), it takes 70% of the CPU" fits in a tweet.
What it filters (and why)
Not everything xctrace reports is actionable. When I profile an app, I can't optimize libdispatch.dylib. I can't rewrite dyld4::PrebuiltLoader::loadDependents. Those frames are noise if what I'm looking for are hotspots in my code.
ztrace filters through several layers:
System binaries: everything living in /usr/lib/ or /System/ gets discarded. These are OS and Swift runtime frames.
Runtime internals: functions like __swift_instantiateConcreteTypeFromMangledNameV2 or DYLD-STUB$$sin are technically in your binary (statically linked), but they're not code you wrote. Out.
Unresolved symbols: production apps (Spotify, for example) are stripped. Frames appear as raw addresses like 0x104885404. ztrace filters them and warns you: "85% of user samples have no symbols". So you know the profile has data but you need the dSYMs to extract value.
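As a sketch, the three layers boil down to a predicate like this (the path prefixes and the two runtime symbols come from the examples above; treat the exact patterns as illustrative, not ztrace's actual rules):

```python
import re

SYSTEM_PREFIXES = ("/usr/lib/", "/System/")  # OS and Swift runtime binaries
RUNTIME_INTERNALS = (
    re.compile(r"^__swift_"),       # e.g. __swift_instantiateConcreteTypeFromMangledNameV2
    re.compile(r"^DYLD-STUB\$\$"),  # e.g. DYLD-STUB$$sin
)
HEX_ADDRESS = re.compile(r"^0x[0-9a-fA-F]+$")  # stripped binary, no symbol

def is_actionable(name: str, binary_path: str) -> bool:
    """Keep only frames the developer can actually optimize."""
    if binary_path.startswith(SYSTEM_PREFIXES):        # layer 1: system binaries
        return False
    if any(p.match(name) for p in RUNTIME_INTERNALS):  # layer 2: runtime internals
        return False
    if HEX_ADDRESS.match(name):                        # layer 3: unresolved symbols
        return False                                   # (counted for the dSYM warning)
    return True
```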
Testing it with real apps
The fixture is nice but artificial. Does it work with a real app? I tested it with Ghostty (the terminal emulator):
Process: ghostty Duration: 3.8s Template: Time Profiler
Samples: 295 Total CPU: 295ms
SELF TIME
53.2% 157ms ghostty main
3.7% 11ms ghostty renderer.metal.RenderPass.begin
3.1% 9ms ghostty renderer.generic.Renderer(renderer.Metal).rebuildCells
2.7% 8ms ghostty renderer.generic.Renderer(renderer.Metal).drawFrame
2.4% 7ms ghostty renderer.generic.Renderer(renderer.Metal).updateFrame
2.0% 6ms ghostty heap.PageAllocator.alloc
1.7% 5ms ghostty terminal.page.Page.clonePartialRowFrom
1.7% 5ms ghostty font.shaper.coretext.Shaper.shape
This is actionable. You immediately see: the Metal renderer (render pass, rebuild cells, draw frame) and font shaping are where the time goes. If you were optimizing Ghostty, you'd know exactly where to start.
And each function comes with its module (ghostty), so in an app with multiple frameworks you'd know whether the bottleneck is in your code or a dependency.
The stack (and why not Swift)
The original CLAUDE.md said Swift. "Consistent with the use case," I thought. After seeing that 95% of the work is parsing XML and formatting text, I switched to Python.
xml.etree.ElementTree parses XML in three lines. In Swift, XMLParser is pure SAX: callbacks, mutable state, delegates. That's a lot of ceremony for something that should be "give me the tree and let me navigate it."
Also: you can distribute a Python script with `uv tool install`, while a Swift binary only runs on macOS/arm64. Granted, since xctrace only exists on macOS, "cross-platform" isn't really a point against Swift here. But distribution with uv is still infinitely cleaner than compiling and copying binaries around.
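Those three lines, give or take a placeholder file name:

```python
import xml.etree.ElementTree as ET

tree = ET.parse("export.xml")  # the file produced by xctrace export
root = tree.getroot()          # from here it's plain tree navigation
```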
What's next
This is v0.1. What's missing:
- `ztrace record`: record and summarize in one command (convenience, not urgent)
- Configurable filters: exclude specific modules, adjust call stack depth
- Trace comparison: before/after optimization, in diff format
- Allocations support: not just CPU, also memory
The repo is on GitHub if you want to try it.
Integrating it into daily workflow
The beauty of ztrace isn't running it manually. It's having Claude Code use it automatically when profiling.
Add this to your CLAUDE.md (global or project-specific):
### Profiling (xctrace)
- Use `ztrace summary <file.trace>` to read traces. NEVER read raw XML from xctrace export.
- Flow: `xctrace record` → `ztrace summary`
- Flags: `--threshold 0.5` (more functions), `--depth 10` (deeper stacks)
And from there, every time Claude Code needs to profile something, the flow is:
# 1. Record
xctrace record --template 'Time Profiler' --time-limit 5s --launch -- .build/debug/MyApp
# 2. Summarize (10 lines that fit in context)
ztrace summary MyApp.trace
# 3. Claude reads the summary and suggests optimizations
Without ztrace, step 2 would mean dumping 30,000 lines of XML that either blow up the context window or drown the signal in noise. With ztrace, Claude has exactly what it needs to tell you "70% of CPU is in heavyWork(), line 42 of Renderer.swift".
The meta point
ztrace exists because LLMs are bad at processing raw data at scale. They're good at reasoning about processed, compact data. Giving Claude 33,000 lines of XML is like giving a doctor an MRI in raw DICOM format and asking for a diagnosis. The doctor needs the rendered image, not the bytes.
Next time an LLM tells you "the output is too large," the solution isn't a model with more context. It's a better summary. A pipeline that transforms raw data into actionable information before it reaches the model.
Which in the end is what we engineers do: convert noise into signal. With or without AI involved.