I built a dataset of 50,000 debugging sessions — and what I found surprised me

#machinelearning #python #datascience #productivity

Most bug datasets only tell you: "Bug was fixed in 23 minutes."

They don't tell you what happened during those 23 minutes.

So I built one that does.

What is DebugTraj-50K?

It is a dataset of 50,000 developer debugging sessions with 665,364 step-by-step behavioral events recorded across 8 programming languages and 71 error types.

Every session captures what a developer actually did while fixing a bug:

How many times they searched Google
How many compile attempts they made
Which files they opened
Whether they used an AI tool
Whether they asked a colleague
What strategy finally worked
And whether they succeeded or gave up
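Presumably each of these per-session values is an aggregate of the step-level events. Here is a minimal sketch of that roll-up. It assumes a hypothetical event log of `(session_id, event_type)` pairs with made-up event names (`'search'`, `'compile'`, `'open_file'`, `'ai_tool'`, `'ask_colleague'`). The dataset's real column names and event vocabulary may differ.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (session_id, event_type) pairs in chronological order.
# Event names are illustrative, not the dataset's actual vocabulary.
events = [
    ("s1", "search"), ("s1", "compile"), ("s1", "search"),
    ("s1", "ai_tool"), ("s1", "compile"),
    ("s2", "open_file"), ("s2", "compile"), ("s2", "ask_colleague"),
]

def summarize_sessions(events):
    """Roll step-level events up into one feature dict per session."""
    counts = defaultdict(Counter)
    for session_id, event_type in events:
        counts[session_id][event_type] += 1
    # Counter returns 0 for event types a session never produced
    return {
        sid: {
            "search_count": c["search"],
            "compile_attempts": c["compile"],
            "file_opens": c["open_file"],
            "used_ai_tool": c["ai_tool"] > 0,
            "asked_colleague": c["ask_colleague"] > 0,
        }
        for sid, c in counts.items()
    }

summary = summarize_sessions(events)
print(summary["s1"])
```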

This is not a dataset about bugs. It is a dataset about human behavior under pressure.

Who is this useful for?

AI researchers and companies
AI coding assistants such as GitHub Copilot and Cursor could benefit from behavioral data that captures how developers work through a problem, not just what the code looks like. This dataset is aimed at that gap.

ML practitioners
You can build models to predict:

Will this debugging session succeed?
How long will it take?
What will the developer do next?
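For the last question, a common first baseline is a first-order Markov model. It counts which action tends to follow which, then predicts the most frequent successor. This sketch assumes each session can be turned into an ordered list of event names. The names and sequences below are made up for illustration. A stronger model would likely also condition on language, error type, and time elapsed.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count how often each event is immediately followed by each other event."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, current):
    """Return the most frequent follow-up to `current`, or None if never seen."""
    followers = transitions.get(current)  # .get does not create an empty entry
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy event sequences; event names are hypothetical.
sequences = [
    ["open_file", "compile", "search", "compile"],
    ["search", "compile", "search", "ai_tool"],
    ["compile", "search", "compile", "fixed"],
]
model = fit_transitions(sequences)
print(predict_next(model, "compile"))  # search
```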

CS educators
See where students struggle, how long they take, and what senior developers do differently to finish faster.

DevTool companies
Understand real developer pain points when building IDEs, debuggers, and productivity tools.

Individual developers
Compare your own debugging habits with 50,000 others.
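One simple way to compare yourself with other sessions is a percentile rank: the share of sessions that were resolved faster than yours. Below is a minimal standard-library sketch. It uses made-up numbers in place of the dataset's resolution times. With pandas, you could pass something like `sessions['resolution_time_minutes'].tolist()` instead.

```python
from bisect import bisect_left

def percentile_rank(values, mine):
    """Percentage of values strictly smaller than `mine`."""
    if not values:
        raise ValueError("need at least one peer value")
    ordered = sorted(values)
    return 100 * bisect_left(ordered, mine) / len(ordered)

# Toy resolution times in minutes, not real dataset values.
peer_minutes = [5, 12, 18, 23, 30, 41, 55, 75, 90, 120]
print(f"{percentile_rank(peer_minutes, 45):.0f}%")  # 60%
```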

The Daily Coding Companion

On top of the dataset, I built a practical notebook called the Daily Coding Companion.

It answers questions every developer asks themselves while debugging:

How long will MY bug take to fix?
Am I taking too long compared to my peers?
What should I try next when I am stuck?
When should I stop and ask for help?
Which language is actually hardest to debug?
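The "when should I ask for help?" question can be turned into a rule of thumb using the threshold from the key findings (8+ searches and 12+ compile attempts). This is a hypothetical helper, not the notebook's actual logic. A hard cutoff also ignores language, severity, and experience level.

```python
# Thresholds taken from the key finding on searches and compile attempts;
# treating them as a hard cutoff is a simplifying assumption.
SEARCH_THRESHOLD = 8
COMPILE_THRESHOLD = 12

def should_ask_for_help(searches: int, compile_attempts: int) -> bool:
    """Suggest asking a colleague once both counts reach their thresholds."""
    return searches >= SEARCH_THRESHOLD and compile_attempts >= COMPILE_THRESHOLD

print(should_ask_for_help(searches=9, compile_attempts=14))  # True
print(should_ask_for_help(searches=3, compile_attempts=20))  # False
```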

Fill in your language, your error type, and how long you have been stuck, and it answers using data from 50,000 sessions.

No ML knowledge needed.

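```python
# Example: estimate fix time for your current bug
import pandas as pd

# Path is a placeholder: point it at your copy of the session table
sessions = pd.read_csv('sessions.csv')

my_language   = 'Python'
my_experience = 'Mid-level (2-5 years)'
my_severity   = 3

# Keep only sessions matching your language, experience level, and severity
similar = sessions[
    (sessions['programming_language'] == my_language) &
    (sessions['experience_level']     == my_experience) &
    (sessions['error_severity']       == my_severity)
]

if similar.empty:
    print("No matching sessions; try loosening one of the filters.")
else:
    print(f"Expected fix time : {similar['resolution_time_minutes'].mean():.0f} minutes")
    print(f"Chance of fixing  : {(similar['outcome'] == 'fixed').mean() * 100:.1f}%")
```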

Key findings from the data

Senior developers fix bugs 3.7x faster than junior developers on average
Rust and C++ take 1.5x longer to debug than Python or JavaScript
Sessions where developers took a break had a higher fix rate than sessions without one
AI tool usage is highest among mid-level developers, not juniors
Once a session reaches 8+ searches and 12+ compile attempts, asking a colleague is associated with a significantly higher fix rate
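Findings like these usually come from two simple comparisons: a ratio of group means (senior vs. junior resolution time) and a fix rate split by a yes/no flag (took a break or not). Here is a minimal sketch on toy records. The `"Junior"`/`"Senior"` labels, the `took_break` field, and the `"gave_up"` outcome value are hypothetical, so check the dataset's actual columns. On observational data, these comparisons show associations, not causal effects.

```python
from statistics import mean

# Toy records standing in for the session table; values are made up.
sessions = [
    {"experience_level": "Junior", "resolution_time_minutes": 60, "took_break": True,  "outcome": "fixed"},
    {"experience_level": "Junior", "resolution_time_minutes": 90, "took_break": False, "outcome": "gave_up"},
    {"experience_level": "Senior", "resolution_time_minutes": 20, "took_break": False, "outcome": "fixed"},
    {"experience_level": "Senior", "resolution_time_minutes": 30, "took_break": True,  "outcome": "fixed"},
]

def mean_minutes(rows, level):
    """Average resolution time for one experience level."""
    return mean(r["resolution_time_minutes"] for r in rows if r["experience_level"] == level)

def fix_rate(rows, field, value):
    """Share of sessions where row[field] == value that ended in 'fixed'."""
    subset = [r for r in rows if r[field] == value]
    return sum(r["outcome"] == "fixed" for r in subset) / len(subset)

speedup = mean_minutes(sessions, "Junior") / mean_minutes(sessions, "Senior")
print(f"Senior speed-up: {speedup:.1f}x")
print(f"Fix rate with break: {fix_rate(sessions, 'took_break', True):.0%}")
print(f"Fix rate without break: {fix_rate(sessions, 'took_break', False):.0%}")
```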