DEV Community

Tanay Baviskar
I Built a Tool That Watches Your Python App Run and Tells You What Static Analysis Can't

For the past few months I kept running into the same problem.

Mypy says the code is fine. Pylance says the code is fine. The tests pass. Then production breaks because some legacy caller is passing a float to a function annotated as int, and it's been doing it for six months without anyone noticing.

Static analysis can only see what's written. It can't see what's called.

So I built Ghost, a Python runtime observer that hooks into sys.setprofile and watches your app run, then tells you what actually happened.

pip install ghost-observer
ghost run app.py
ghost report

That's it. No code changes. No decorators. No configuration.


What it actually shows you

Here's a real example. I have a function:

def calculate_discount(price: int, quantity: int) -> int:
    return int(price * quantity * 0.9)

Mypy is happy. But after running under Ghost:

ghost report --sort exceptions

FUNCTION                    CALLS   EXC%   MEAN LAT   DOM. ARG SIG
calculate_discount:12           8     0%      <1µs     (int, int)
ghost anomalies

[MEDIUM] type_mismatch   calculate_discount (line 12)
         param 'price' annotated as int but observed 'float' in 3/8 calls (38%)
         param 'quantity' annotated as int but observed 'float' in 2/8 calls (25%)

38% of real calls are passing floats. Mypy never saw it. Ghost caught it on the first run.


The four things Ghost detects

1. Type mismatches
Compares PEP-484 annotations against observed runtime types. Every call. Not just the ones in your test suite.
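
The check itself is simple to sketch. Here's a minimal, hypothetical version of the idea — `type_mismatches` and the observed-call format are illustrative, not Ghost's API:

```python
import inspect

def type_mismatches(fn, observed_calls):
    """Check observed runtime type names against a function's annotations.

    `observed_calls` stands in for profiler output: one dict per call,
    mapping parameter name -> type(value).__qualname__ (the value itself
    is never recorded).
    """
    hints = {n: p.annotation
             for n, p in inspect.signature(fn).parameters.items()
             if p.annotation is not inspect.Parameter.empty}
    flagged = []
    for call in observed_calls:
        for name, seen in call.items():
            expected = hints.get(name)
            if expected is not None and seen != expected.__name__:
                flagged.append((name, expected.__name__, seen))
    return flagged

def calculate_discount(price: int, quantity: int) -> int:
    return int(price * quantity * 0.9)

# three observed calls, recorded as type names only
observed = [{"price": "int", "quantity": "int"},
            {"price": "float", "quantity": "int"},
            {"price": "float", "quantity": "float"}]
print(type_mismatches(calculate_discount, observed))
# [('price', 'int', 'float'), ('price', 'int', 'float'), ('quantity', 'int', 'float')]
```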

2. Never-called functions
Ground-truth dead code detection. Not "this function isn't reachable according to the call graph" but literally "this function was never called while the app ran." Real traffic, real answer.
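
A sketch of that comparison, assuming the profiler has recorded the set of function names it saw called — `never_called` is a hypothetical helper, not part of Ghost:

```python
import types

def never_called(module, called_names):
    """Functions defined in `module` with no observed call event."""
    defined = {name for name, obj in vars(module).items()
               if isinstance(obj, types.FunctionType)
               and obj.__module__ == module.__name__}
    return sorted(defined - called_names)

# a throwaway module standing in for real application code
mod = types.ModuleType("shop")
exec("def checkout(): pass\ndef legacy_coupon(): pass", mod.__dict__)

# suppose the profiler only ever observed `checkout` run
print(never_called(mod, {"checkout"}))  # ['legacy_coupon']
```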

3. High exception rates

[HIGH] high_exc_rate   validate_order (line 34)
       exception rate 60.0% (18/30 calls) exceeds threshold 5%

Silent failures in 60% of calls. No logs, no alerts, just Ghost.

4. Latency outliers
Flags functions whose mean latency is >2.5σ above their module average. Finds the one slow function hiding among ten fast ones.
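
The math behind that flag is a straightforward z-score over per-function mean latencies. A hypothetical sketch:

```python
from statistics import mean, stdev

def latency_outliers(mean_latency_ms, sigma=2.5):
    """Flag functions whose mean latency is more than `sigma` standard
    deviations above the module-wide average (illustrative math only)."""
    values = list(mean_latency_ms.values())
    mu, sd = mean(values), stdev(values)
    return [fn for fn, lat in mean_latency_ms.items()
            if sd and (lat - mu) / sd > sigma]

# nine fast handlers and one slow function hiding among them
module = {f"handler_{i}": 1.0 for i in range(9)}
module["sync_inventory"] = 200.0
print(latency_outliers(module))  # ['sync_inventory']
```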


How it works

Ghost installs two hooks before your app starts:

  • sys.setprofile — captures every call and return (~50ns overhead per event)
  • sys.settrace — exception detection only (distinguishes return None from raise)
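
A stripped-down illustration of the first hook, to show the shape of the mechanism (this is a sketch, not Ghost's actual implementation):

```python
import sys
import time
from collections import defaultdict

call_counts = defaultdict(int)   # per-function call tally
latencies = defaultdict(list)    # per-function elapsed nanoseconds
_starts = {}

def _profiler(frame, event, arg):
    # sys.setprofile delivers 'call' and 'return' events for Python frames
    name = frame.f_code.co_name
    if event == "call":
        call_counts[name] += 1
        _starts[id(frame)] = time.perf_counter_ns()
    elif event == "return":
        start = _starts.pop(id(frame), None)
        if start is not None:
            latencies[name].append(time.perf_counter_ns() - start)

def observe(fn, *args, **kwargs):
    sys.setprofile(_profiler)
    try:
        return fn(*args, **kwargs)
    finally:
        sys.setprofile(None)

def calculate_discount(price, quantity):
    return int(price * quantity * 0.9)

observe(calculate_discount, 10, 3)
print(call_counts["calculate_discount"])  # 1
```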

Events flow into an in-memory buffer → background thread flushes to SQLite every 1–5s (adaptive) → aggregator builds per-function profiles.
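
That pipeline can be sketched like this, with a fixed polling loop standing in for the adaptive flush interval (table schema and names here are made up for illustration):

```python
import queue
import sqlite3
import tempfile
import threading
import time

buffer = queue.Queue()       # hooks push events here
stop = threading.Event()

def flusher(db_path):
    # background thread: drain the buffer in batches into SQLite
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS events (func TEXT, latency_ns INTEGER)")
    while not (stop.is_set() and buffer.empty()):
        batch = []
        while not buffer.empty():
            batch.append(buffer.get_nowait())
        if batch:
            db.executemany("INSERT INTO events VALUES (?, ?)", batch)
            db.commit()
        else:
            time.sleep(0.01)
    db.close()

tmp = tempfile.NamedTemporaryFile(suffix=".db", delete=False)
tmp.close()
path = tmp.name

worker = threading.Thread(target=flusher, args=(path,))
worker.start()
buffer.put(("calculate_discount", 800))
buffer.put(("validate_order", 1200))
stop.set()
worker.join()
print(sqlite3.connect(path).execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```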

Privacy: Ghost captures type(value).__qualname__, never the value itself. No secrets, PII, or passwords ever enter the buffer.


All the commands

# Observe any script
ghost run app.py
ghost run manage.py runserver  # Django
ghost run -m uvicorn main:app  # FastAPI

# Profile table
ghost report
ghost report --sort latency
ghost report --sort exceptions

# Deep dive on one function
ghost explain process_order
ghost explain validate_user --backend gemini  # AI analysis with GEMINI_API_KEY

# Anomaly detection
ghost anomalies
ghost anomalies --exc-threshold 0.02   # flag anything above 2%

# Compare two runs
ghost sessions
ghost diff <session-1> <session-2>

# Live-updating terminal dashboard
ghost watch
ghost watch --interval 1 --sort latency

# Export for other tools
ghost export --format json -o profile.json
ghost export --format csv -o profile.csv

# Housekeeping
ghost clean --older-than 7

Comparing two sessions

This is where Ghost gets genuinely useful for refactoring. Capture a session before and after your change:

ghost run app.py
# make your changes
ghost run app.py
ghost diff <session-before> <session-after>

Output:

Ghost diff  session-abc123  →  session-def456

  ── changed (3) ──
  CHANGED   process_order (line 42)
             mean latency: 12.50ms → 4.20ms  (0.34×) ↓ faster
             call count: 100 → 100  (no change)

  CHANGED   validate_order (line 18)
             exception rate: 60.0% → 8.0%  ↓

  CHANGED   get_product_details (line 87)
             mean latency: 45.00ms → 2.10ms  (0.05×) ↓ faster
             new arg signature observed: (int, str)

Traffic-weighted proof that your refactor actually helped.
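
The diff itself boils down to comparing per-function aggregates between two sessions. A toy sketch of the latency comparison (`diff_profiles` is hypothetical, not Ghost's internals):

```python
def diff_profiles(before, after, noise=0.05):
    """Compare per-function mean latency (ms) between two sessions,
    ignoring changes smaller than `noise` (5% by default)."""
    changes = {}
    for fn in before.keys() & after.keys():
        ratio = after[fn] / before[fn]
        if abs(ratio - 1) > noise:
            changes[fn] = (before[fn], after[fn], round(ratio, 2))
    return changes

before = {"process_order": 12.5, "validate_order": 3.0}
after = {"process_order": 4.2, "validate_order": 3.0}
print(diff_profiles(before, after))  # {'process_order': (12.5, 4.2, 0.34)}
```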


The competitive gap

TOOL             WHAT IT SEES
mypy / Pylance   Source code types (annotations only)
cProfile         Call counts and timing (no types, no exceptions)
Sentry           Errors that reach your error handler
Ghost            Every call, actual types, real exception rates, measured latency

The four things Ghost detects — runtime type mismatches, truly dead code, silent exception rates, latency outliers — are invisible to static tools by definition. They only exist at runtime.


Install

pip install ghost-observer

# Optional: AI-powered explanations
pip install ghost-observer[gemini]
export GEMINI_API_KEY=your-key
ghost explain your_slow_function

GitHub: github.com/Tanaybaviskar/ghost-observer
PyPI: pypi.org/project/ghost-observer


Would love to hear what anomalies Ghost finds in your codebase. Drop them in the comments.

