Last week I was profiling a Swift app with Instruments. Standard procedure: xctrace record, xctrace export, copy the XML to Claude Code's context, ask it to find the hotspots.
Claude responds: "The XML is too large, I can't process it reliably."
33,553 lines of XML. For a program with two functions.
The real problem
xctrace export is a fantastic tool. It gives you everything: every sample, every backtrace, every frame with its binary, memory address, and UUID. It's exhaustive, precise, and complete.
And that's exactly the problem.
When I profile an app to find bottlenecks, I don't need the 3,044 individual samples. I don't need to know that sample number 1,847 caught the CPU at address 0x1027ec9a8 in libswiftCore.dylib at 00:02.847.882. I need to know that heavyWork() consumes 70% of the time and lightWork() takes 30%.
What I need is ten lines, not thirty-three thousand.
Why XML is the right format (but the noise isn't)
Before anyone says "the problem is using XML in 2026": not really.
XML is the ideal format for what xctrace does. Think about it:
- Hierarchical: a backtrace is a tree of frames. A sample contains a backtrace, a thread, a process. XML models this naturally.
- Self-describing: every element has a name, typed attributes, and the structure is validatable. You don't have to guess what field 7 means in a CSV line.
- Elegant deduplication: xctrace uses an `id/ref` system where it defines a frame once (`id="59" name="heavyWork()"`) then references it with `ref="59"`. It's basically a serialized flyweight pattern.
- Processable with standard tools: XPath, `xmllint`, `xml.etree.ElementTree`... you don't need a proprietary parser.
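Concretely, the deduplication pattern looks something like this (a simplified fragment built from the attributes quoted above; real rows carry more attributes and nest frames inside backtraces):

```xml
<!-- first occurrence: full definition -->
<frame id="59" name="heavyWork()"/>

<!-- every later occurrence: a reference back to it -->
<frame ref="59"/>
```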
xctrace's XML isn't bloat. It's structured information that Instruments needs to reconstruct interactive call trees, compare runs, filter by thread and process. It's designed for a GUI tool that can expand and collapse nodes.
The problem appears when you try to fit that information into an LLM's context window. It's like trying to read the complete Don Quixote to find the windmill quote. The information is there, but the signal-to-noise ratio is devastating.
The solution: ztrace
So I built ztrace. A Python script that takes a .trace bundle and produces a compact summary.
The idea is simple:
1. Run `xctrace export --toc` to get metadata (process, duration, template)
2. Run `xctrace export --xpath` to extract the `time-profile` table (both invocations sketched below)
3. Parse the XML, resolving the `id/ref` system
4. Filter system frames (everything living in `/usr/lib/` or `/System/`)
5. Aggregate by function and generate the summary
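For steps 1 and 2, the export invocations look roughly like this (assuming a single-run trace; `MyApp.trace` is a placeholder, and the XPath may vary with your template):

```shell
# 1) table of contents: process, duration, template
xctrace export --input MyApp.trace --toc

# 2) the time-profile table itself (first run)
xctrace export --input MyApp.trace \
  --xpath '/trace-toc/run[@number="1"]/data/table[@schema="time-profile"]'
```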
Step 3 is more important than it seems. xctrace doesn't repeat the complete definition of a frame every time it appears in a backtrace. It defines it once with id="59" then uses ref="59". If you don't resolve the refs, you lose most of the information.
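Here's a minimal sketch of that resolution with `xml.etree.ElementTree` (the XML shape is simplified; real rows nest frames inside backtraces, but the two-pass logic is the same):

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<rows>
  <row><frame id="59" name="heavyWork()"/></row>
  <row><frame ref="59"/></row>
  <row><frame ref="59"/></row>
</rows>
"""

def resolve_refs(root):
    """First pass: index every element that carries an id.
    Second pass: copy the definition's attributes onto each ref."""
    defs = {el.get("id"): el for el in root.iter() if el.get("id")}
    for el in root.iter():
        ref = el.get("ref")
        if ref is not None:
            el.attrib.update(
                (k, v) for k, v in defs[ref].attrib.items() if k != "id"
            )

root = ET.fromstring(SAMPLE)
resolve_refs(root)
print([f.get("name") for f in root.iter("frame")])
# ['heavyWork()', 'heavyWork()', 'heavyWork()']
```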
The result
With the test fixture (a trivial program with heavyWork() at ~70% and lightWork() at ~30%):
$ ztrace summary sample.trace
Process: hotspot Duration: 3.8s Template: Time Profiler
Samples: 3044 Total CPU: 3044ms
SELF TIME
69.4% 2113ms hotspot heavyWork()
29.7% 905ms hotspot lightWork()
TOTAL TIME (callers with significant overhead)
99.9% 3041ms main
CALL STACKS
69.4% 2113ms main > heavyWork()
29.7% 904ms main > lightWork()
From 33,553 lines to 13. All the information an LLM needs to tell you "optimize heavyWork(), it takes 70% of the CPU" fits in a tweet.
What it filters (and why)
Not everything xctrace reports is actionable. When I profile an app, I can't optimize libdispatch.dylib. I can't rewrite dyld4::PrebuiltLoader::loadDependents. Those frames are noise if what I'm looking for are hotspots in my code.
ztrace filters through several layers:
System binaries: everything living in /usr/lib/ or /System/ gets discarded. These are OS and Swift runtime frames.
Runtime internals: functions like __swift_instantiateConcreteTypeFromMangledNameV2 or DYLD-STUB$$sin are technically in your binary (statically linked), but they're not code you wrote. Out.
Unresolved symbols: production apps (Spotify, for example) are stripped. Frames appear as raw addresses like 0x104885404. ztrace filters them and warns you: "85% of user samples have no symbols". So you know the profile has data but you need the dSYMs to extract value.
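As a sketch, the three layers boil down to a predicate like this (the path prefixes and the two runtime symbols come from the examples above; treat the exact patterns as illustrative, not ztrace's actual rules):

```python
import re

SYSTEM_PREFIXES = ("/usr/lib/", "/System/")  # OS and Swift runtime binaries
RUNTIME_INTERNALS = (
    re.compile(r"^__swift_"),       # e.g. __swift_instantiateConcreteTypeFromMangledNameV2
    re.compile(r"^DYLD-STUB\$\$"),  # e.g. DYLD-STUB$$sin
)
HEX_ADDRESS = re.compile(r"^0x[0-9a-fA-F]+$")  # stripped binary, no symbol

def is_actionable(name: str, binary_path: str) -> bool:
    """Keep only frames the developer can actually optimize."""
    if binary_path.startswith(SYSTEM_PREFIXES):        # layer 1: system binaries
        return False
    if any(p.match(name) for p in RUNTIME_INTERNALS):  # layer 2: runtime internals
        return False
    if HEX_ADDRESS.match(name):                        # layer 3: unresolved symbols
        return False                                   # (counted for the dSYM warning)
    return True
```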
Testing it with real apps
The fixture is nice but artificial. Does it work with a real app? I tested it with Ghostty (the terminal emulator):
Process: ghostty Duration: 3.8s Template: Time Profiler
Samples: 295 Total CPU: 295ms
SELF TIME
53.2% 157ms ghostty main
3.7% 11ms ghostty renderer.metal.RenderPass.begin
3.1% 9ms ghostty renderer.generic.Renderer(renderer.Metal).rebuildCells
2.7% 8ms ghostty renderer.generic.Renderer(renderer.Metal).drawFrame
2.4% 7ms ghostty renderer.generic.Renderer(renderer.Metal).updateFrame
2.0% 6ms ghostty heap.PageAllocator.alloc
1.7% 5ms ghostty terminal.page.Page.clonePartialRowFrom
1.7% 5ms ghostty font.shaper.coretext.Shaper.shape
This is actionable. You immediately see: the Metal renderer (render pass, rebuild cells, draw frame) and font shaping are where the time goes. If you were optimizing Ghostty, you'd know exactly where to start.
And each function comes with its module (ghostty), so in an app with multiple frameworks you'd know whether the bottleneck is in your code or a dependency.
The stack (and why not Swift)
The original CLAUDE.md said Swift. "Consistent with the use case," I thought. After seeing that 95% of the work is parsing XML and formatting text, I switched to Python.
xml.etree.ElementTree parses XML in three lines. In Swift, XMLParser is pure SAX: callbacks, mutable state, delegates. That's a lot of ceremony for something that should be "give me the tree and let me navigate it."
Also: you can distribute a Python script with `uv tool install`, while a Swift binary only runs on macOS/arm64. Granted, since xctrace only exists on macOS, "cross-platform" isn't really a point against Swift here. But distribution with uv is still infinitely cleaner than compiling and copying binaries around.
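Those three lines, give or take a placeholder file name:

```python
import xml.etree.ElementTree as ET

tree = ET.parse("export.xml")  # the file produced by xctrace export
root = tree.getroot()          # from here it's plain tree navigation
```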
What's next
This is v0.1. What's missing:
- `ztrace record`: record and summarize in one command (convenience, not urgent)
- Configurable filters: exclude specific modules, adjust call stack depth
- Trace comparison: before/after optimization, in diff format
- Allocations support: not just CPU, also memory
The repo is on GitHub if you want to try it.
Integrating it into daily workflow
The beauty of ztrace isn't running it manually. It's having Claude Code use it automatically when profiling.
Add this to your CLAUDE.md (global or project-specific):
### Profiling (xctrace)
- Use `ztrace summary <file.trace>` to read traces. NEVER read raw XML from xctrace export.
- Flow: `xctrace record` → `ztrace summary`
- Flags: `--threshold 0.5` (more functions), `--depth 10` (deeper stacks)
And from there, every time Claude Code needs to profile something, the flow is:
# 1. Record
xctrace record --template 'Time Profiler' --time-limit 5s --launch -- .build/debug/MyApp
# 2. Summarize (10 lines that fit in context)
ztrace summary MyApp.trace
# 3. Claude reads the summary and suggests optimizations
Without ztrace, step 2 would mean dumping 30,000 lines of XML that either blow up the context window or drown the signal in noise. With ztrace, Claude has exactly what it needs to tell you "70% of CPU is in heavyWork(), line 42 of Renderer.swift".
The meta point
ztrace exists because LLMs are bad at processing raw data at scale. They're good at reasoning about processed, compact data. Giving Claude 33,000 lines of XML is like giving a doctor an MRI in raw DICOM format and asking for a diagnosis. The doctor needs the rendered image, not the bytes.
Next time an LLM tells you "the output is too large," the solution isn't a model with more context. It's a better summary. A pipeline that transforms raw data into actionable information before it reaches the model.
Which in the end is what we engineers do: convert noise into signal. With or without AI involved.