DEV Community

Tanay Baviskar
I Built a Tool That Watches Your Python App Run and Tells You What Static Analysis Can't

For the past few months I kept running into the same problem.

Mypy says the code is fine. Pylance says the code is fine. The tests pass. Then production breaks because some legacy caller is passing a float to a function annotated as int, and it's been doing it for six months without anyone noticing.

Static analysis can only see what's written. It can't see what's called.

So I built Ghost, a Python runtime observer that hooks into sys.setprofile and watches your app run, then tells you what actually happened.

pip install ghost-observer
ghost run app.py
ghost report

That's it. No code changes. No decorators. No configuration.


What it actually shows you

Here's a real example. I have a function:

def calculate_discount(price: int, quantity: int) -> int:
    return int(price * quantity * 0.9)

Mypy is happy. But after running under Ghost:

ghost report --sort exceptions

FUNCTION                    CALLS   EXC%   MEAN LAT   DOM. ARG SIG
calculate_discount:12           8     0%      <1µs     (int, int)
ghost anomalies

[MEDIUM] type_mismatch   calculate_discount (line 12)
         param 'price' annotated as int but observed 'float' in 3/8 calls (38%)
         param 'quantity' annotated as int but observed 'float' in 2/8 calls (25%)

38% of real calls are passing floats. Mypy never saw it. Ghost caught it on the first run.


The four things Ghost detects

1. Type mismatches
Compares PEP-484 annotations against observed runtime types. Every call. Not just the ones in your test suite.
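
The check itself is simple to sketch. Here's a minimal, hypothetical version of the idea — `type_mismatches` and the observed-call format are illustrative, not Ghost's API:

```python
import inspect

def type_mismatches(fn, observed_calls):
    """Check observed runtime type names against a function's annotations.

    `observed_calls` stands in for profiler output: one dict per call,
    mapping parameter name -> type(value).__qualname__ (the value itself
    is never recorded).
    """
    hints = {n: p.annotation
             for n, p in inspect.signature(fn).parameters.items()
             if p.annotation is not inspect.Parameter.empty}
    flagged = []
    for call in observed_calls:
        for name, seen in call.items():
            expected = hints.get(name)
            if expected is not None and seen != expected.__name__:
                flagged.append((name, expected.__name__, seen))
    return flagged

def calculate_discount(price: int, quantity: int) -> int:
    return int(price * quantity * 0.9)

# three observed calls, recorded as type names only
observed = [{"price": "int", "quantity": "int"},
            {"price": "float", "quantity": "int"},
            {"price": "float", "quantity": "float"}]
print(type_mismatches(calculate_discount, observed))
# [('price', 'int', 'float'), ('price', 'int', 'float'), ('quantity', 'int', 'float')]
```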

2. Never-called functions
Ground-truth dead code detection. Not "this function isn't reachable according to the call graph" but literally "this function was never called while the app ran." Real traffic, real answer.
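
A sketch of that comparison, assuming the profiler has recorded the set of function names it saw called — `never_called` is a hypothetical helper, not part of Ghost:

```python
import types

def never_called(module, called_names):
    """Functions defined in `module` with no observed call event."""
    defined = {name for name, obj in vars(module).items()
               if isinstance(obj, types.FunctionType)
               and obj.__module__ == module.__name__}
    return sorted(defined - called_names)

# a throwaway module standing in for real application code
mod = types.ModuleType("shop")
exec("def checkout(): pass\ndef legacy_coupon(): pass", mod.__dict__)

# suppose the profiler only ever observed `checkout` run
print(never_called(mod, {"checkout"}))  # ['legacy_coupon']
```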

3. High exception rates

[HIGH] high_exc_rate   validate_order (line 34)
       exception rate 60.0% (18/30 calls) exceeds threshold 5%

Silent failures in 60% of calls. No logs, no alerts, just Ghost.

4. Latency outliers
Flags functions whose mean latency is >2.5σ above their module average. Finds the one slow function hiding among ten fast ones.
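
The math behind that flag is a straightforward z-score over per-function mean latencies. A hypothetical sketch:

```python
from statistics import mean, stdev

def latency_outliers(mean_latency_ms, sigma=2.5):
    """Flag functions whose mean latency is more than `sigma` standard
    deviations above the module-wide average (illustrative math only)."""
    values = list(mean_latency_ms.values())
    mu, sd = mean(values), stdev(values)
    return [fn for fn, lat in mean_latency_ms.items()
            if sd and (lat - mu) / sd > sigma]

# nine fast handlers and one slow function hiding among them
module = {f"handler_{i}": 1.0 for i in range(9)}
module["sync_inventory"] = 200.0
print(latency_outliers(module))  # ['sync_inventory']
```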


How it works

Ghost installs two hooks before your app starts:

  • sys.setprofile — captures every call and return (~50ns overhead per event)
  • sys.settrace — exception detection only (distinguishes return None from raise)
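
A stripped-down illustration of the first hook, to show the shape of the mechanism (this is a sketch, not Ghost's actual implementation):

```python
import sys
import time
from collections import defaultdict

call_counts = defaultdict(int)   # per-function call tally
latencies = defaultdict(list)    # per-function elapsed nanoseconds
_starts = {}

def _profiler(frame, event, arg):
    # sys.setprofile delivers 'call' and 'return' events for Python frames
    name = frame.f_code.co_name
    if event == "call":
        call_counts[name] += 1
        _starts[id(frame)] = time.perf_counter_ns()
    elif event == "return":
        start = _starts.pop(id(frame), None)
        if start is not None:
            latencies[name].append(time.perf_counter_ns() - start)

def observe(fn, *args, **kwargs):
    sys.setprofile(_profiler)
    try:
        return fn(*args, **kwargs)
    finally:
        sys.setprofile(None)

def calculate_discount(price, quantity):
    return int(price * quantity * 0.9)

observe(calculate_discount, 10, 3)
print(call_counts["calculate_discount"])  # 1
```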

Events flow into an in-memory buffer → background thread flushes to SQLite every 1–5s (adaptive) → aggregator builds per-function profiles.
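
That pipeline can be sketched like this, with a fixed polling loop standing in for the adaptive flush interval (table schema and names here are made up for illustration):

```python
import queue
import sqlite3
import tempfile
import threading
import time

buffer = queue.Queue()       # hooks push events here
stop = threading.Event()

def flusher(db_path):
    # background thread: drain the buffer in batches into SQLite
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS events (func TEXT, latency_ns INTEGER)")
    while not (stop.is_set() and buffer.empty()):
        batch = []
        while not buffer.empty():
            batch.append(buffer.get_nowait())
        if batch:
            db.executemany("INSERT INTO events VALUES (?, ?)", batch)
            db.commit()
        else:
            time.sleep(0.01)
    db.close()

tmp = tempfile.NamedTemporaryFile(suffix=".db", delete=False)
tmp.close()
path = tmp.name

worker = threading.Thread(target=flusher, args=(path,))
worker.start()
buffer.put(("calculate_discount", 800))
buffer.put(("validate_order", 1200))
stop.set()
worker.join()
print(sqlite3.connect(path).execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```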

Privacy: Ghost captures type(value).__qualname__, never the value itself. No secrets, PII, or passwords ever enter the buffer.


All the commands

# Observe any script
ghost run app.py
ghost run manage.py runserver  # Django
ghost run -m uvicorn main:app  # FastAPI

# Profile table
ghost report
ghost report --sort latency
ghost report --sort exceptions

# Deep dive on one function
ghost explain process_order
ghost explain validate_user --backend gemini  # AI analysis with GEMINI_API_KEY

# Anomaly detection
ghost anomalies
ghost anomalies --exc-threshold 0.02   # flag anything above 2%

# Compare two runs
ghost sessions
ghost diff <session-1> <session-2>

# Live-updating terminal dashboard
ghost watch
ghost watch --interval 1 --sort latency

# Export for other tools
ghost export --format json -o profile.json
ghost export --format csv -o profile.csv

# Housekeeping
ghost clean --older-than 7

Comparing two sessions

This is where Ghost gets genuinely useful for refactoring. Capture a session before and after your change:

ghost run app.py
# make your changes
ghost run app.py
ghost diff <session-before> <session-after>

Output:

Ghost diff  session-abc123  →  session-def456

  ── changed (3) ──
  CHANGED   process_order (line 42)
             mean latency: 12.50ms → 4.20ms  (0.34×) ↓ faster
             call count: 100 → 100  (no change)

  CHANGED   validate_order (line 18)
             exception rate: 60.0% → 8.0%  ↓

  CHANGED   get_product_details (line 87)
             mean latency: 45.00ms → 2.10ms  (0.05×) ↓ faster
             new arg signature observed: (int, str)

Traffic-weighted proof that your refactor actually helped.
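
The diff itself boils down to comparing per-function aggregates between two sessions. A toy sketch of the latency comparison (`diff_profiles` is hypothetical, not Ghost's internals):

```python
def diff_profiles(before, after, noise=0.05):
    """Compare per-function mean latency (ms) between two sessions,
    ignoring changes smaller than `noise` (5% by default)."""
    changes = {}
    for fn in before.keys() & after.keys():
        ratio = after[fn] / before[fn]
        if abs(ratio - 1) > noise:
            changes[fn] = (before[fn], after[fn], round(ratio, 2))
    return changes

before = {"process_order": 12.5, "validate_order": 3.0}
after = {"process_order": 4.2, "validate_order": 3.0}
print(diff_profiles(before, after))  # {'process_order': (12.5, 4.2, 0.34)}
```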


The competitive gap

TOOL             WHAT IT SEES
mypy / Pylance   Source code types (annotations only)
cProfile         Call counts and timing (no types, no exceptions)
Sentry           Errors that reach your error handler
Ghost            Every call, actual types, real exception rates, measured latency

The four things Ghost detects — runtime type mismatches, truly dead code, silent exception rates, latency outliers — are invisible to static tools by definition. They only exist at runtime.


Install

pip install ghost-observer

# Optional: AI-powered explanations
pip install ghost-observer[gemini]
export GEMINI_API_KEY=your-key
ghost explain your_slow_function

GitHub: github.com/Tanaybaviskar/ghost-observer
PyPI: pypi.org/project/ghost-observer


Would love to hear what anomalies Ghost finds in your codebase. Drop them in the comments.

