Performance issues are often misunderstood. Many teams assume profiling is only useful for shaving off a few milliseconds to reduce infrastructure costs at scale. While that can be true, this narrow view misses the real power of profiling. In reality, profiling can uncover deep, systemic inefficiencies that dramatically affect user experience.
This is exactly what happened with Sentry’s AI Autofix.
By dogfooding Sentry’s own profiling tools, the team identified unexpected performance bottlenecks that were slowing Autofix by tens of seconds per interaction—not milliseconds. With targeted fixes and no architectural overhaul, Autofix became significantly faster, more responsive, and production-ready.
This article explains what went wrong, how profiling exposed the root causes, and why profiling is essential for modern AI-driven products.
Why Speed Matters for Developers and Users
Speed is not a luxury feature—it’s a core part of user experience. Developers expect tools to respond quickly, especially when those tools promise to accelerate debugging.
Sentry has always focused on improving developer workflows through observability. When introducing generative AI, the goal was not automation for its own sake, but collaboration. That vision led to Autofix, an AI agent that analyzes issues, understands the codebase, and works with developers to suggest accurate fixes.
Just like well-designed onboarding flows or carefully crafted email templates, Autofix is meant to reduce friction and save time. But during internal testing and early adopter usage, something felt off.
Cold starts were slow. Individual Autofix runs were taking 10–20 seconds longer than expected. For a tool designed to speed up debugging, this was a serious problem.
If Autofix launched in this state, it would feel incomplete and frustrating, no matter how intelligent the AI was.
The Challenge of Debugging AI Performance
When performance issues appear in AI-powered systems, the cause is rarely obvious.
Was the slowdown coming from:
- A third-party LLM API?
- Task queue delays in Celery?
- Network latency or infrastructure bottlenecks?
- Poor concurrency handling?
Each of these possibilities could require weeks of investigation or major architectural changes.
Instead of guessing, the team turned to what they already had: Sentry Profiling.
What Is Profiling and Why It Matters
Profiling is the process of collecting detailed runtime data about how your application actually executes in production. Unlike logs or metrics, profiling shows where time is being spent at the function and thread level.
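For reference, turning profiling on in a Python service is a small configuration change. The sketch below assumes the Python SDK; the DSN is a placeholder and the 100% sample rates are for illustration only, since production setups typically sample far less:

```python
import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN; use your project's
    traces_sample_rate=1.0,    # sample transactions so profiles have something to attach to
    profiles_sample_rate=1.0,  # profile 100% of sampled transactions; lower this in production
)
```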
Sentry supports multiple types of profiling:
CPU Profiling
CPU profiling shows how much time each function spends executing on the CPU. It helps identify:
- Inefficient loops
- Blocking operations
- Redundant function calls
- Expensive initialization logic
Browser Profiling
Browser profiling focuses on frontend performance, capturing:
- Page load timing
- Rendering delays
- Long tasks on the main thread
Mobile Profiling
Mobile profiling provides similar insights for native and hybrid apps, tracking screen-level and function-level performance.
All of this data is visualized using flame graphs.
How to Read a Flame Graph (Quick Guide)
Flame graphs can look intimidating, but they’re incredibly powerful once you understand them:
- X-axis (Time): The width of a block represents how long a function ran.
- Y-axis (Call Stack): Shows which functions called which.
- Colors: Help distinguish system code from application code, packages, or modules.
In Sentry, profiles can be viewed:
- Individually
- Aggregated across many runs
- As differential profiles to spot regressions
Profiles can also be correlated with distributed traces, giving a full picture across CPU, network, and application layers.
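As a minimal sketch of how that correlation works in practice (the operation names and the load_repository_files helper are hypothetical), a profile is captured alongside a sampled transaction, so naming the work makes the matching flame graph easy to find next to the trace:

```python
import sentry_sdk

def load_repository_files():
    """Hypothetical stand-in for application work."""
    ...

# The profile covers the duration of the sampled transaction below, so the
# flame graph for this unit of work shows up under a recognizable name.
with sentry_sdk.start_transaction(op="task", name="autofix.root_cause_analysis"):
    with sentry_sdk.start_span(op="repo.load", description="load repository files"):
        load_repository_files()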
Problem 1: Cold Starts Were Slowing Autofix Down
The Issue
Autofix had noticeable startup lag. Every time it began a root cause analysis or coding step, users experienced several seconds of delay before meaningful output appeared.
In distributed traces, one Celery task consistently stood out—sometimes taking 40+ seconds, far above the average.
This wasn’t just a perception problem. It was measurable, repeatable, and user-facing.
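For context, Celery tasks show up in Sentry traces as their own transactions once the SDK's Celery integration is enabled, which is what lets an unusually slow task stand out. A minimal sketch, again with a placeholder DSN and illustrative sample rates:

```python
import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration

# With the Celery integration enabled, each task execution becomes its own
# transaction, so a task that regularly takes 40+ seconds is easy to spot.
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[CeleryIntegration()],
    traces_sample_rate=1.0,
    profiles_sample_rate=1.0,
)
```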
The Root Cause (Found via Profiling)
Opening a real production profile immediately revealed the problem.
A method called from_repo_definition, responsible for initializing a GitHub repository client, was being called over and over again:
- While processing stack traces
- When loading code for analysis
- Every time Autofix accessed a file
Each call added over one second of latency because it made HTTP requests to GitHub behind the scenes.
Without profiling, this behavior would have been almost invisible.
The Fix
The solution was simple and surgical:
- Standardize repo access through a single method
- Add caching using Python’s @functools.cache
```python
@functools.cache
def from_repo_definition(cls, repo_def, type):
    ...
```
This ensured the repository client was initialized once per run, not repeatedly.
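As a rough illustration of the effect (the RepoClient class below is a toy stand-in, and @functools.cache assumes the arguments are hashable), repeated calls with the same arguments return the already-built client:

```python
import functools

class RepoClient:
    """Toy stand-in for the real client, which talks to GitHub during initialization."""

    @classmethod
    @functools.cache
    def from_repo_definition(cls, repo_def, type):
        print(f"expensive initialization for {repo_def} ({type})")  # runs once per argument set
        return cls()

client_a = RepoClient.from_repo_definition("example-org/example-repo", "github")
client_b = RepoClient.from_repo_definition("example-org/example-repo", "github")  # cache hit, no network
assert client_a is client_b
```

One caveat: entries cached with @functools.cache live for the lifetime of the process, so long-running workers may want to clear the cache (cache_clear) or scope it per run.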
The Result
After deployment:
- Repo initialization dropped from multiple calls to one
- Cold starts became significantly faster
- Autofix’s first suggestion appeared in ~2 seconds instead of ~10
Across a full run, this saved:
- ~1 second per file access
- ~1.5 seconds before each Autofix step
- ~5 seconds before the coding step even began
Problem 2: Unnecessary Thread Blocking
The Issue
Encouraged by the first win, the team examined profiles from Autofix’s coding step.
Another issue jumped out immediately:
A 6-second thread wait at the end of each run.
Even after the AI had finished generating the final answer, users were still waiting.
The Root Cause
Autofix generates intermediate “insights” in parallel to keep users informed during execution. These insights run on a separate thread.
However:
- A new insight generation task was being started even when the run was ending
- Only one worker thread was available
- All threads had to finish before returning the final result
This caused unnecessary blocking right at the worst possible moment.
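A minimal sketch of that failure mode, with hypothetical stand-ins for the insight and answer steps: with only one worker, a task submitted just before the run ends has to finish before the final result can be returned, because shutting down the executor waits for pending work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def generate_insight(output):
    """Hypothetical stand-in for the background insight summarizer (a slow LLM call)."""
    time.sleep(6)

def build_final_answer(output):
    """Hypothetical stand-in for assembling the final answer."""
    return output

def finish_run(step_output):
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(generate_insight, step_output)  # queued even though the run is ending
        answer = build_final_answer(step_output)
    # Leaving the with-block calls shutdown(wait=True), so we sit here for ~6 seconds.
    return answer
```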
The Fix
Two small changes fixed the issue:
- Avoid starting insight generation when the run is about to end
- Increase the thread pool's workers from 1 to 2

```python
if completion.message.tool_calls:  # only generate insights if the run is still ongoing
    ...

ThreadPoolExecutor(max_workers=2)
```
The Result
- The 6-second delay was eliminated
- Thread joins dropped to under 400ms
- Final answers reached users immediately
As a bonus, insight quality improved by removing redundant summaries.
The Overall Impact
By fixing just two hidden inefficiencies, the Autofix team saved an estimated:
~20 seconds per Autofix run
- No infrastructure changes.
- No AI model swaps.
- No complex redesign.
Just profiling + targeted fixes.
Backend performance graphs showed dramatic drops in slow transactions—some runs improved by up to 80 seconds.
Why Profiling Is Not Optional Anymore
Without profiling:
- These issues could have shipped to production
- First impressions would suffer
- Debugging would rely on guesswork, logs, and assumptions
With profiling:
- The problems were found in minutes
- The fixes took less than an hour
- The impact was immediate and measurable
Profiling isn’t about shaving milliseconds anymore. It’s about understanding real execution behavior in real user environments—especially in complex, AI-driven systems.
Final Thoughts
Sentry Profiling turned runtime behavior into actionable insight. It exposed inefficiencies that would have been nearly impossible to detect manually and enabled meaningful improvements with minimal effort.
If you care about:
- Developer experience
- User trust
- AI product performance
- Shipping fast and shipping right
Then profiling isn’t optional—it’s essential.
Fixing 10–20 seconds of lag in minutes isn’t theoretical.
It’s exactly what happened here.