Performance issues are often misunderstood. Many teams assume profiling is only useful for shaving off a few milliseconds to reduce infrastructure costs at scale. While that can be true, this narrow view misses the real power of profiling. In reality, profiling can uncover deep, systemic inefficiencies that dramatically affect user experience.
This is exactly what happened with Sentry’s AI Autofix.
By dogfooding Sentry’s own profiling tools, the team identified unexpected performance bottlenecks that were slowing Autofix by tens of seconds per interaction—not milliseconds. With targeted fixes and no architectural overhaul, Autofix became significantly faster, more responsive, and production-ready.
This article explains what went wrong, how profiling exposed the root causes, and why profiling is essential for modern AI-driven products.
Why Speed Matters for Developers and Users
Speed is not a luxury feature—it’s a core part of user experience. Developers expect tools to respond quickly, especially when those tools promise to accelerate debugging.
Sentry has always focused on improving developer workflows through observability. When introducing generative AI, the goal was not automation for its own sake, but collaboration. That vision led to Autofix, an AI agent that analyzes issues, understands the codebase, and works with developers to suggest accurate fixes.
Just like well-designed onboarding flows or carefully crafted email templates, Autofix is meant to reduce friction and save time. But during internal testing and early adopter usage, something felt off.
Cold starts were slow. Individual Autofix runs were taking 10–20 seconds longer than expected. For a tool designed to speed up debugging, this was a serious problem.
If Autofix launched in this state, it would feel incomplete and frustrating, no matter how intelligent the AI was.
The Challenge of Debugging AI Performance
When performance issues appear in AI-powered systems, the cause is rarely obvious.
Was the slowdown coming from:
- A third-party LLM API?
- Task queue delays in Celery?
- Network latency or infrastructure bottlenecks?
- Poor concurrency handling?
Each of these possibilities could require weeks of investigation or major architectural changes.
Instead of guessing, the team turned to what they already had: Sentry Profiling.
What Is Profiling and Why It Matters
Profiling is the process of collecting detailed runtime data about how your application actually executes in production. Unlike logs or metrics, profiling shows where time is being spent at the function and thread level.
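For reference, turning profiling on in a Python service is a small configuration change. The sketch below assumes the Python SDK; the DSN is a placeholder and the 100% sample rates are for illustration only, since production setups typically sample far less:

```python
import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN; use your project's
    traces_sample_rate=1.0,    # sample transactions so profiles have something to attach to
    profiles_sample_rate=1.0,  # profile 100% of sampled transactions; lower this in production
)
```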
Sentry supports multiple types of profiling:
CPU Profiling
CPU profiling shows how much time each function spends executing on the CPU. It helps identify:
- Inefficient loops
- Blocking operations
- Redundant function calls
- Expensive initialization logic
Browser Profiling
Browser profiling focuses on frontend performance, capturing:
- Page load timing
- Rendering delays
- Long tasks on the main thread
Mobile Profiling
Mobile profiling provides similar insights for native and hybrid apps, tracking screen-level and function-level performance.
All of this data is visualized using flame graphs.
How to Read a Flame Graph (Quick Guide)
Flame graphs can look intimidating, but they’re incredibly powerful once you understand them:
- X-axis (Time): The width of a block represents how long a function ran.
- Y-axis (Call Stack): Shows which functions called which.
- Colors: Help distinguish system code from application code, packages, or modules.
In Sentry, profiles can be viewed:
- Individually
- Aggregated across many runs
- As differential profiles to spot regressions
Profiles can also be correlated with distributed traces, giving a full picture across CPU, network, and application layers.
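As a minimal sketch of how that correlation works in practice (the operation names and the load_repository_files helper are hypothetical), a profile is captured alongside a sampled transaction, so naming the work makes the matching flame graph easy to find next to the trace:

```python
import sentry_sdk

def load_repository_files():
    """Hypothetical stand-in for application work."""
    ...

# The profile covers the duration of the sampled transaction below, so the
# flame graph for this unit of work shows up under a recognizable name.
with sentry_sdk.start_transaction(op="task", name="autofix.root_cause_analysis"):
    with sentry_sdk.start_span(op="repo.load", description="load repository files"):
        load_repository_files()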
Problem 1: Cold Starts Were Slowing Autofix Down
The Issue
Autofix had noticeable startup lag. Every time it began a root cause analysis or coding step, users experienced several seconds of delay before meaningful output appeared.
In distributed traces, one Celery task consistently stood out—sometimes taking 40+ seconds, far above the average.
This wasn’t just a perception problem. It was measurable, repeatable, and user-facing.
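For context, Celery tasks show up in Sentry traces as their own transactions once the SDK's Celery integration is enabled, which is what lets an unusually slow task stand out. A minimal sketch, again with a placeholder DSN and illustrative sample rates:

```python
import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration

# With the Celery integration enabled, each task execution becomes its own
# transaction, so a task that regularly takes 40+ seconds is easy to spot.
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[CeleryIntegration()],
    traces_sample_rate=1.0,
    profiles_sample_rate=1.0,
)
```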
The Root Cause (Found via Profiling)
Opening a real production profile immediately revealed the problem.
A method called from_repo_definition, responsible for initializing a GitHub repository client, was being called over and over again:
- While processing stack traces
- When loading code for analysis
- Every time Autofix accessed a file
Each call added over one second of latency because it made HTTP requests to GitHub behind the scenes.
Without profiling, this behavior would have been almost invisible.
The Fix
The solution was simple and surgical:
- Standardize repo access through a single method
- Add caching using Python’s @functools.cache
```python
@functools.cache
def from_repo_definition(cls, repo_def, type):
    ...
```
This ensured the repository client was initialized once per run, not repeatedly.
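As a rough illustration of the effect (the RepoClient class below is a toy stand-in, and @functools.cache assumes the arguments are hashable), repeated calls with the same arguments return the already-built client:

```python
import functools

class RepoClient:
    """Toy stand-in for the real client, which talks to GitHub during initialization."""

    @classmethod
    @functools.cache
    def from_repo_definition(cls, repo_def, type):
        print(f"expensive initialization for {repo_def} ({type})")  # runs once per argument set
        return cls()

client_a = RepoClient.from_repo_definition("example-org/example-repo", "github")
client_b = RepoClient.from_repo_definition("example-org/example-repo", "github")  # cache hit, no network
assert client_a is client_b
```

One caveat: entries cached with @functools.cache live for the lifetime of the process, so long-running workers may want to clear the cache (cache_clear) or scope it per run.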
The Result
After deployment:
- Repo initialization dropped from multiple calls to one
- Cold starts became significantly faster
- Autofix’s first suggestion appeared in ~2 seconds instead of ~10
Across a full run, this saved:
- ~1 second per file access
- ~1.5 seconds before each Autofix step
- ~5 seconds before the coding step even began
Problem 2: Unnecessary Thread Blocking
The Issue
Encouraged by the first win, the team examined profiles from Autofix’s coding step.
Another issue jumped out immediately:
A 6-second thread wait at the end of each run.
Even after the AI had finished generating the final answer, users were still waiting.
The Root Cause
Autofix generates intermediate “insights” in parallel to keep users informed during execution. These insights run on a separate thread.
However:
- A new insight generation task was being started even when the run was ending
- Only one worker thread was available
- All threads had to finish before returning the final result
This caused unnecessary blocking right at the worst possible moment.
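A minimal sketch of that failure mode, with hypothetical stand-ins for the insight and answer steps: with only one worker, a task submitted just before the run ends has to finish before the final result can be returned, because shutting down the executor waits for pending work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def generate_insight(output):
    """Hypothetical stand-in for the background insight summarizer (a slow LLM call)."""
    time.sleep(6)

def build_final_answer(output):
    """Hypothetical stand-in for assembling the final answer."""
    return output

def finish_run(step_output):
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(generate_insight, step_output)  # queued even though the run is ending
        answer = build_final_answer(step_output)
    # Leaving the with-block calls shutdown(wait=True), so we sit here for ~6 seconds.
    return answer
```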
The Fix
Two small changes fixed the issue:
- Avoid starting insight generation when the run is about to end
- Increase the thread pool's workers from 1 to 2

```python
if completion.message.tool_calls:  # only generate insights if the run is still ongoing
    ...

ThreadPoolExecutor(max_workers=2)
```
The Result
- The 6-second delay was eliminated
- Thread joins dropped to under 400ms
- Final answers reached users immediately
As a bonus, insight quality improved by removing redundant summaries.
The Overall Impact
By fixing just two hidden inefficiencies, the Autofix team saved an estimated:
~20 seconds per Autofix run
- No infrastructure changes.
- No AI model swaps.
- No complex redesign.
Just profiling + targeted fixes.
Backend performance graphs showed dramatic drops in slow transactions—some runs improved by up to 80 seconds.
Why Profiling Is Not Optional Anymore
Without profiling:
- These issues could have shipped to production
- First impressions would suffer
- Debugging would rely on guesswork, logs, and assumptions
With profiling:
- The problems were found in minutes
- The fixes took less than an hour
- The impact was immediate and measurable
Profiling isn’t about shaving milliseconds anymore. It’s about understanding real execution behavior in real user environments—especially in complex, AI-driven systems.
Final Thoughts
Sentry Profiling turned runtime behavior into actionable insight. It exposed inefficiencies that would have been nearly impossible to detect manually and enabled meaningful improvements with minimal effort.
If you care about:
- Developer experience
- User trust
- AI product performance
- Shipping fast and shipping right
Then profiling isn’t optional—it’s essential.
Fixing 10–20 seconds of lag in minutes isn’t theoretical.
It’s exactly what happened here.