Muneer Alam

Posted on Jul 4

Python Tracebacks Tell You Where. Not Why. So I Built Something That Does Both.

#python #opensource #devtools #productivity

Friday, 11:47 PM. Production goes down.

The alert says KeyError: 'user_id'. The traceback points to line 42 of a file that has not changed in three months. You SSH in, tail the logs, and find nothing. No variable values. No request context. No path forward but to add a print statement, redeploy, and wait.

Three hours later it happens again. You still do not know why.

If you have written Python long enough, this sequence is familiar. The language gives you file names, line numbers, and function names. It leaves out the one thing you actually need: what the values were at the moment of failure.

Python's traceback module plays it safe. Capturing runtime state inside a crash handler risks a double fault. If the exception hook itself raises, CPython prints an "Error in sys.excepthook" message with the hook's own traceback ahead of the original exception, and anything the hook was trying to record is gone. So by default the standard library prints the stack and leaves frame locals untouched.

We, as an ecosystem, have accepted this trade-off for decades.

I recently built a tool to fix this. It is called Safedump. You install it in two lines:

import safedump
safedump.install()

When your application crashes, it captures every local variable, the full exception chain, and thread state. It saves this as structured JSON on your machine. Later, you inspect it with safedump view. No cloud involved. No telemetry. No accounts.

It is the missing layer between Python tracebacks and cloud crash reporting.

What a Traceback Actually Tells You

Consider a real scenario. You are parsing user-submitted JSON:

import json

def process_payment(data: dict) -> dict:
    user = data["user"]
    amount = data["amount"]
    currency = data.get("currency", "USD")
    gateway = GATEWAYS[user["tier"]]  # this can fail
    return gateway.charge(user["id"], amount, currency)

The traceback when it crashes:

Traceback (most recent call last):
  File "payments.py", line 22, in <module>
    process_payment(payload)
  File "payments.py", line 8, in process_payment
    gateway = GATEWAYS[user["tier"]]
KeyError: 'premium_plus'

This tells you a KeyError happened on line 8. The missing key was premium_plus.

What it does not tell you:

Where did user["tier"] get the value premium_plus? Was it a misspelling, or a tier that really exists?
What was the full data payload? Was user even present?
Which request triggered this? A specific customer or a random API scan?
What did GATEWAYS look like? Was premium_plus supposed to be there?

Each question requires a different technique. Add a log statement. Reproduce the input. Check CloudWatch. SSH into the box. By the time you have done all four, the incident timer has hit forty minutes.

What Existing Tools Do Well

The Python ecosystem has several tools that improve on the bare traceback. They deserve credit.

rich.traceback makes tracebacks dramatically more readable with syntax highlighting and better formatting. If you are developing locally, it is a genuine quality-of-life improvement. But it is still terminal-only. It prints to stderr and disappears when the session ends.

stackprinter goes further by showing source code context and local variable values. You can call stackprinter.format() and get a string you can log. The output is plain text, readable anywhere but without structure. You cannot ask it what the value of user was in frame 2.

Sentry is the most mature option. It captures exceptions, aggregates them, and handles the full monitoring lifecycle. It also captures frame locals. The trade-off is that, unless you self-host it, it is cloud-dependent: your data leaves your network and you need an account.

Rollbar solves the same problem with a similar architecture. Useful, but the same constraint: your crash data lives on someone else's server.

Each tool solves a real problem. But there is a gap between "make the terminal output prettier" and "ship your data to the cloud." Safedump sits in that gap.

Why Not Just Use Logging?

A common question: why not add logger.exception() to your error handler?

Logging captures a string. Safedump captures structured data. A log line requires parsing. A Safedump report is typed JSON ready for programmatic analysis. Logging also does not redact secrets on its own. If your log statement includes password=request.form["password"], that secret is in your aggregation system forever. Logging is essential for tracing application flow. But it is not designed for post-mortem crash analysis with full variable state.

The Smallest Useful Crash Reporter

When I started building this, one constraint drove everything: the crash handler must never crash. If capturing variable values makes the handler itself raise, the crash report is lost and the original traceback ends up buried under the handler's own error.

I built something that separates capturing state from displaying it.

pip install safedump

import safedump
safedump.install()

When an unhandled exception occurs, you see two things. First, the original traceback, unchanged, so your existing workflows are not disrupted. Second:

Crash report saved: ~/.safedump/2026-06-25-19-48-11-ZeroDivisionError-a1b2c3.safedump.json

Later, you inspect it:

safedump view

The output shows the exception, every frame with local variables and their types, source code context, thread state, and environment metadata. Human-readable in the terminal and machine-parseable as JSON.
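To make "every frame with local variables and their types" concrete, here is a minimal stdlib-only sketch of walking a traceback and recording each frame's locals. The function name snapshot_frames and the dictionary layout are assumptions for illustration, not Safedump's actual internals.

```python
import reprlib
import traceback


def snapshot_frames(exc):
    """Return one dict per traceback frame, innermost frame first."""
    frames = []
    # walk_tb yields (frame, line number) pairs from the outermost call inward.
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        captured = {}
        for name, value in list(frame.f_locals.items()):
            try:
                text = reprlib.repr(value)  # length-limited repr
            except Exception:
                text = "<unrepresentable>"
            captured[name] = {"type": type(value).__name__, "value": text}
        frames.append({
            "file": frame.f_code.co_filename,
            "line": lineno,
            "function": frame.f_code.co_name,
            "locals": captured,
        })
    frames.reverse()  # put the frame that raised at index 0
    return frames


GATEWAYS = {"free": "gateway-a", "premium": "gateway-b"}


def process_payment(data):
    user = data["user"]
    return GATEWAYS[user["tier"]]


try:
    process_payment({"user": {"id": "u-1", "tier": "premium_plus"}})
except KeyError as exc:
    frames = snapshot_frames(exc)

print(frames[0]["function"], frames[0]["locals"]["user"])
```

The first entry describes process_payment, including the user dict that carried the unexpected tier. A real capture step would also need redaction and a size budget, which this sketch leaves out.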

You can also capture exceptions manually:

try:
    result = dangerous_operation()
except Exception:
    path = safedump.capture_exception()
    print(f"Crash captured: {path}")
    raise

The Same Crash, Two Views

Here is the only runtime value a standard traceback gives you from the earlier KeyError example:

KeyError: 'premium_plus'

Here is the same crash through a Safedump report. The report lives on disk as versioned JSON:

{
  "safedump_version": "1.0.0",
  "timestamp": "2026-06-25T19:48:11.000000",
  "exception": {
    "type": "KeyError",
    "message": "'premium_plus'",
    "module": "builtins"
  },
  "frames": [
    {
      "index": 0,
      "file": "payments.py",
      "line": 8,
      "function": "process_payment",
      "locals": {
        "user": {"type": "dict", "value": "{'id': '...', 'tier': 'premium_plus', 'name': '...'}"},
        "amount": {"type": "int", "value": "2999"},
        "currency": {"type": "str", "value": "USD"}
      }
    }
  ],
  "redactions": [
    {
      "location": "process_payment.user.id",
      "reason": "variable_name_match",
      "rule": "DENYLIST_SUBSTRING: id"
    }
  ]
}

[Screenshot: a Safedump JSON report. Structured, versioned JSON with a redaction audit trail.]

With the standard traceback you have the exception type and message. With the report you have the exact value of user["tier"], the full payload context, thread state, and a redaction audit trail. You can answer every earlier question without reproducing the bug.
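Because the report is plain JSON, those questions can be answered with a few lines of code. The sketch below inlines a trimmed copy of the report shown earlier so that it runs on its own; with a real report you would load the file with json.load instead. The helper find_local is hypothetical, not part of Safedump's API.

```python
import json

# Trimmed copy of the example report, inlined so the sketch is self-contained.
REPORT_JSON = """
{
  "safedump_version": "1.0.0",
  "exception": {"type": "KeyError", "message": "'premium_plus'"},
  "frames": [
    {
      "index": 0,
      "file": "payments.py",
      "line": 8,
      "function": "process_payment",
      "locals": {
        "user": {"type": "dict", "value": "{'id': '...', 'tier': 'premium_plus', 'name': '...'}"},
        "amount": {"type": "int", "value": "2999"},
        "currency": {"type": "str", "value": "USD"}
      }
    }
  ],
  "redactions": [
    {"location": "process_payment.user.id", "reason": "variable_name_match"}
  ]
}
"""


def find_local(report, function, name):
    """Return the recorded entry for a local variable, or None if absent."""
    for frame in report["frames"]:
        if frame["function"] == function and name in frame["locals"]:
            return frame["locals"][name]
    return None


report = json.loads(REPORT_JSON)
amount = find_local(report, "process_payment", "amount")
user = find_local(report, "process_payment", "user")
redacted = [entry["location"] for entry in report["redactions"]]

print(report["exception"]["type"], amount["value"], user["value"], redacted)
```

The same lookup works from jq or a CI script, since nothing here depends on Safedump being installed.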

Why the Name?

Safedump combines two ideas: the report is safe to share (secrets redacted), and it is a dump of runtime state at the crash moment. The name is honest: this is a raw capture of what your program looked like when it failed, cleaned up enough to share without embarrassment.

Things That Almost Broke the Crash Handler

Building a crash handler is harder than it looks. You are already in an error state. Memory might be corrupted. Any allocation could trigger another failure. Here are some of the problems I hit and how each shaped the design.

The Handler Must Never Crash

If sys.excepthook raises, CPython prints "Error in sys.excepthook:" and the hook's own traceback to stderr before the original exception, and no report is written. Every operation inside the handler is wrapped in try/except. Serialization fails? Caught. Disk full? Caught. Memory allocation fails? Falls back to a pre-allocated buffer. The original traceback always prints.
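A minimal sketch of that pattern, assuming a capture callable supplied by the caller: print the normal traceback first, then treat the capture step as untrusted. The names make_hook and failing_capture are hypothetical and this is not Safedump's actual handler.

```python
import sys


def make_hook(capture):
    """Build an excepthook that calls `capture` but never lets it raise."""
    def hook(exc_type, exc, tb):
        # Print the normal traceback first, so it survives whatever happens next.
        try:
            sys.__excepthook__(exc_type, exc, tb)
        except Exception:
            pass
        # Treat the capture step as untrusted: any failure is swallowed.
        try:
            capture(exc_type, exc, tb)
        except Exception:
            try:
                sys.stderr.write("crash capture failed; traceback above is intact\n")
            except Exception:
                pass
    return hook


def failing_capture(exc_type, exc, tb):
    raise OSError(28, "No space left on device")  # simulate a full disk


hook = make_hook(failing_capture)

try:
    1 / 0
except ZeroDivisionError:
    outcome = hook(*sys.exc_info())  # returns None instead of raising

# To install for real: sys.excepthook = make_hook(your_capture_function)
```

Calling sys.__excepthook__ before doing anything else is the design choice that matters here: the user sees the familiar traceback even if every later step fails.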

The 100x Problem

Rich's traceback rendering is roughly 100x slower than stdlib formatting for large tracebacks. Inside a crash handler that is unacceptable. The solution was two phases. Capture at crash time using stdlib only. Render after the crash with Rich formatting and no time constraint. The capture phase never imports Rich. It never calls __repr__ on untrusted objects. It uses reprlib.repr() with depth and length limits.
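For reference, this is roughly what bounded representation with reprlib looks like. The limits chosen here are illustrative assumptions, not Safedump's settings. It is worth knowing that reprlib still invokes an object's own __repr__ for types it has no built-in handler for and then truncates the result, so the sketch also guards against exceptions.

```python
import reprlib

limited = reprlib.Repr()
limited.maxlevel = 2     # how deep to recurse into nested containers
limited.maxstring = 40   # maximum characters for a string repr
limited.maxlist = 5      # maximum list elements shown
limited.maxdict = 5      # maximum dict entries shown
limited.maxother = 60    # maximum characters for any other object


def safe_repr(value):
    """Return a bounded repr, falling back to a placeholder on any error."""
    try:
        return limited.repr(value)
    except Exception:
        return f"<unrepresentable {type(value).__name__}>"


class Hostile:
    def __repr__(self):
        raise RuntimeError("repr exploded")


big_list = safe_repr(list(range(1000)))
long_text = safe_repr("x" * 1000)
hostile = safe_repr(Hostile())

print(big_list)
print(long_text)
print(hostile)
```

The list is cut off after five elements, the string is shortened to the configured length, and the object whose __repr__ raises becomes a placeholder instead of an exception.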

Why JSON, Not Pickle

JSON cannot execute code during deserialization. Pickle was rejected for this reason. So were cloudpickle and dill. JSON also means reports are universally readable. Pipe them into jq or process them in CI. Every report includes a safedump_version field for forward compatibility.
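The difference is easy to demonstrate with a harmless payload. In this sketch the Payload class is a stand-in for a malicious object: its __reduce__ method tells pickle to call eval during loading. The JSON parser, given similar-looking text, only ever returns data.

```python
import json
import pickle


class Payload:
    def __reduce__(self):
        # pickle stores this as "call eval('6 * 7') when loading".
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
unpickled = pickle.loads(blob)  # eval runs here, during deserialization

decoded = json.loads('{"call": "eval(6 * 7)"}')  # stays an inert string

print(unpickled)
print(decoded)
```

Loading the pickle yields the result of the evaluated expression rather than a Payload object, which is exactly why opening a pickled crash report from an untrusted source is unsafe.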

Sharing Crash Reports Without Sharing Secrets

The first time you paste a crash report containing a production API key into a GitHub issue, you learn this lesson. Safedump applies two layers of redaction automatically. First, a variable name denylist: any local whose name contains patterns like password, token, or key has its value replaced. The matching is tiered to avoid false positives. keyboard does not match key, but api_key does. Second, regex-based credential detection catches AWS keys, GitHub tokens, and JWTs even when the variable name is not suspicious.
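One way the two layers could be sketched is shown below. Everything here is an assumption for illustration: the token list, the word-splitting approach to avoiding the keyboard false positive, and the three regular expressions (common public formats for AWS access key IDs, GitHub tokens, and JWTs). Safedump's real rules may differ.

```python
import re

SENSITIVE_TOKENS = {"password", "passwd", "secret", "token", "key", "auth"}

CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def name_tokens(name):
    """Split snake_case and camelCase names into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return [token for token in spaced.lower().split("_") if token]


def is_sensitive_name(name):
    # Whole-word matching: "api_key" contains the word "key", "keyboard" does not.
    return any(token in SENSITIVE_TOKENS for token in name_tokens(name))


def redact(name, text):
    """Return (possibly redacted text, reason or None)."""
    if is_sensitive_name(name):
        return "[REDACTED]", f"name:{name}"
    for label, pattern in CREDENTIAL_PATTERNS.items():
        if pattern.search(text):
            return pattern.sub("[REDACTED]", text), f"pattern:{label}"
    return text, None


print(redact("api_key", "abc123"))
print(redact("keyboard", "qwerty"))
print(redact("note", "id=AKIAIOSFODNN7EXAMPLE"))
```

Returning the reason alongside the redacted text is what makes an audit trail possible: the report can say why a value was hidden without revealing the value itself.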

Atomic Writes or Corrupted Reports

Reports are written to a temp file, then atomically renamed via os.replace(). This prevents partial writes from overwriting valid reports. If the primary directory is unwritable, Safedump falls back to /tmp. The original traceback always displays.
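A minimal sketch of the write-then-rename pattern with a fallback directory follows. The function name write_report and the use of the system temp directory as the fallback are assumptions for illustration. The temp file is created in the same directory as the final file, because os.replace is only atomic within one filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_report(report: dict, directory: Path, filename: str) -> Optional[Path]:
    """Write `report` atomically; try `directory`, then the system temp dir."""
    for target_dir in (directory, Path(tempfile.gettempdir())):
        try:
            target_dir.mkdir(parents=True, exist_ok=True)
            fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as handle:
                    json.dump(report, handle, default=str)
                    handle.flush()
                    os.fsync(handle.fileno())  # push bytes to disk before renaming
                final_path = target_dir / filename
                os.replace(tmp_name, final_path)  # atomic swap into place
                return final_path
            finally:
                # After a successful replace the temp name no longer exists.
                if os.path.exists(tmp_name):
                    os.unlink(tmp_name)
        except OSError:
            continue  # unwritable or full: try the next directory
    return None
```

A reader of the final path sees either the previous complete file or the new complete file, never a half-written one.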

How It Compares

Tool	Best for	Limitation
Stdlib traceback	Zero-dependency debugging everywhere	No variable values, no structure, no redaction
rich.traceback	Local development, beautiful terminal output	Terminal only, output disappears when session ends
stackprinter	Getting variable values into logs, lightweight	Plain text only, cannot query structure later
Sentry	Production aggregation, dashboards, team workflows	Requires a hosted account or a self-hosted server; with the hosted service, data leaves your network
Safedump	Local crash reports with full state. Offline inspection. Structured JSON. Secret redaction	No built-in aggregation. Designed for single-crash analysis, not fleet monitoring

The takeaway: if you already use Sentry, keep using it. If you want something between "add a print statement" and "set up a monitoring service," Safedump is worth trying. It also works alongside Sentry: Safedump captures more per-crash detail locally while Sentry handles aggregation across services.

Who this is for: Python developers debugging production crashes who want more context than a traceback provides, but do not want to set up a monitoring service for every project. Open-source maintainers who want crash reports from users without asking them to create accounts or send log files.

Who this is not for: Teams that need fleet-wide error aggregation, dashboards, and alerting. That is Sentry's job. If you already have a monitoring setup that works, Safedump is complementary, not a replacement.

Lessons That Apply Beyond Crash Handlers

Research before architecture, and architecture before code. The most expensive bug is the one you discover during implementation because you did not think through the design. A structured review caught five issues that would have required significant refactoring after code was written. Catching them before a single line existed cost nothing.

Write down why you rejected alternatives. Every "we chose X instead of Y" decision becomes a valuable artifact when someone later asks why you did not use Y. Safedump has four Architecture Decision Records documenting the constitution, API design, architecture approval, and public API freeze.

Test the crash handler by crashing it. The hardest tests to write were the ones that deliberately triggered failures inside the handler: simulated MemoryError, disk full, permission denied, and corrupt inputs. These tests found real bugs that would not have appeared in normal use but would have been catastrophic in production.
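Fault injection of this kind can be sketched with unittest.mock. The capture function below is a hypothetical stand-in for a capture step, not Safedump's code; the test replaces the built-in open with a version that raises each failure in turn and checks that the step reports failure instead of raising.

```python
import json
import unittest
from unittest import mock


def capture(report: dict, path: str) -> bool:
    """Hypothetical capture step: returns False instead of raising."""
    try:
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(report, handle)
        return True
    except Exception:
        return False


class CaptureFaultTests(unittest.TestCase):
    def test_survives_injected_failures(self):
        faults = [
            MemoryError(),
            PermissionError("denied"),
            OSError(28, "No space left on device"),
        ]
        for fault in faults:
            with self.subTest(fault=type(fault).__name__):
                # Every attempt to open a file now raises the injected fault.
                with mock.patch("builtins.open", side_effect=fault):
                    self.assertFalse(capture({"k": "v"}, "report.json"))
```

Because open is patched, no file is ever created, so the test can run anywhere without touching the disk.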

Separation of concerns is not optional in error paths. The hot path inside the exception handler and the cold path for rendering and CLI should share no code paths, no imports, and no mutable state. The hot path must be predictable. The cold path can be anything.

What This Changes

Going back to the opening scenario. Friday night. KeyError. No context.

With Safedump installed, that same crash produces a file containing every local variable, the exception chain, thread state, and environment metadata. You run safedump view and see the payload that reached line 42: it has no user_id. A missing field in the request.

No redeploy. No logger.debug. No SSH session. No reproducing the bug.

The traceback told me where. The report told me why.

Safedump is open source, MIT licensed, and works on Python 3.9 through 3.13. Install it with pip install safedump and capture crash context in under a minute.

I would love to hear from you: what information do you wish Python tracebacks included? What does your debugging workflow look like today? If you hit a production crash and the traceback was not enough, what did you reach for?

If you try Safedump in a real project, I would like to know how it went. The good first issue label on GitHub is where to start if you want to contribute. Bug reports and feature requests go through Issues. Discussion happens in the Discussions tab.

GitHub: github.com/Muneer320/safedump

Top comments (4)

Amrit Kang • Jul 4

Definitely bookmarking Safedump!

Mark Ruth • Jul 5

Really, this looks interesting, am gonna look at the repo

Unknown • Jul 5

Another day another banger, great work mate

Alex Shev • Jul 5

Tracebacks answer location, but teams usually need cause and next safe step. The useful version of this tool would preserve the raw stack trace while adding hypotheses that can be tested.