For the past few months I kept running into the same problem.
Mypy says the code is fine. Pylance says the code is fine. The tests pass. Then production breaks because some legacy caller is passing a float to a function annotated as int, and it's been doing it for six months without anyone noticing.
Static analysis can only see what's written. It can't see what's called.
So I built Ghost, a Python runtime observer that hooks into `sys.setprofile` and watches your app run, then tells you what actually happened.
```bash
pip install ghost-observer
ghost run app.py
ghost report
```
That's it. No code changes. No decorators. No configuration.
## What it actually shows you
Here's a real example. I have a function:
```python
def calculate_discount(price: int, quantity: int) -> int:
    return int(price * quantity * 0.9)
```
Mypy is happy. But after running under Ghost:
```bash
ghost report --sort exceptions
```

```
FUNCTION               CALLS  EXC%  MEAN LAT  DOM. ARG SIG
calculate_discount:12      8    0%      <1µs  (int, int)
```

```bash
ghost anomalies
```

```
[MEDIUM] type_mismatch calculate_discount (line 12)
    param 'price' annotated as int but observed 'float' in 3/8 calls (38%)
    param 'quantity' annotated as int but observed 'float' in 2/8 calls (25%)
```
38% of real calls are passing floats. Mypy never saw it. Ghost caught it on the first run.
## The four things Ghost detects
### 1. Type mismatches
Compares PEP-484 annotations against observed runtime types. Every call. Not just the ones in your test suite.
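Ghost's internals aren't shown here, but the core idea can be sketched in a few lines: on each `call` event from `sys.setprofile`, compare the function's annotations against the runtime type of each argument. Everything below (the profiler function, the recorded tuple format) is an illustrative sketch, not Ghost's actual implementation:

```python
import sys
from collections import defaultdict

# function name -> list of (param, annotated type, observed type)
mismatches = defaultdict(list)

def profiler(frame, event, arg):
    if event != "call":
        return
    # look up the function object to read its annotations
    func = frame.f_globals.get(frame.f_code.co_name)
    annotations = getattr(func, "__annotations__", None)
    if not annotations:
        return
    for param, expected in annotations.items():
        if param == "return" or param not in frame.f_locals:
            continue
        value = frame.f_locals[param]
        # only handle plain classes here; string/typing annotations are skipped
        if isinstance(expected, type) and not isinstance(value, expected):
            mismatches[frame.f_code.co_name].append(
                (param, expected.__name__, type(value).__qualname__)
            )

def calculate_discount(price: int, quantity: int) -> int:
    return int(price * quantity * 0.9)

sys.setprofile(profiler)
calculate_discount(10, 2)    # matches the annotations: nothing recorded
calculate_discount(9.99, 3)  # float for 'price': mismatch recorded
sys.setprofile(None)

print(dict(mismatches))  # {'calculate_discount': [('price', 'int', 'float')]}
```

The observation is free of false negatives for the traffic you ran: if a float ever reached `price`, it gets recorded.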
### 2. Never-called functions
Ground-truth dead code detection. Not "this function isn't reachable according to the call graph", literally "this function was never called while the app ran." Real traffic, real answer.
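The mechanics are easy to sketch: every `call` event names a code object, so anything you registered but never saw run is dead under that traffic. The function names below are made up for illustration:

```python
import sys

called = set()

def profiler(frame, event, arg):
    if event == "call":  # fires for every Python-level call
        called.add(frame.f_code.co_name)

def handle_order():
    return "handled"

def legacy_export():  # defined but never invoked below
    return "exported"

sys.setprofile(profiler)
handle_order()
sys.setprofile(None)

defined = {"handle_order", "legacy_export"}
dead = defined - called
print(dead)  # {'legacy_export'}
```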
### 3. High exception rates

```
[HIGH] high_exc_rate validate_order (line 34)
    exception rate 60.0% (18/30 calls) exceeds threshold 5%
```
That's a function failing silently on 60% of its calls, with no logs and no alerts. Only Ghost saw it.
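The raise-vs-return distinction comes from `sys.settrace`, which fires an `exception` event inside the frame that raises. A minimal exception-rate counter, assuming a stand-in `validate_order` (this is a sketch, not Ghost's code):

```python
import sys
from collections import Counter

calls = Counter()
raises = Counter()

def tracer(frame, event, arg):
    name = frame.f_code.co_name
    if event == "call":
        calls[name] += 1
        return tracer  # keep tracing inside this frame for exception events
    if event == "exception":
        raises[name] += 1
    return tracer

def validate_order(total):
    if total <= 0:
        raise ValueError("invalid total")
    return total

sys.settrace(tracer)
for total in (10, -1, 5, 0, 3):
    try:
        validate_order(total)
    except ValueError:
        pass
sys.settrace(None)

rate = raises["validate_order"] / calls["validate_order"]
print(f"exception rate: {rate:.0%}")  # 2 of 5 calls raised -> 40%
```

Note the exceptions never escape the caller's `try/except`, which is exactly the "silent failure" case a logging-based tool would miss.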
### 4. Latency outliers
Flags functions whose mean latency is >2.5σ above their module average. Finds the one slow function hiding among ten fast ones.
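Ghost doesn't publish the exact statistic, but a 2.5σ rule over per-module means can be sketched like this. All function names and latencies below are invented for the example:

```python
from statistics import mean, stdev

# hypothetical mean latencies (ms) for the functions in one module
module = [
    ("parse_row", 0.8), ("clean_row", 0.9), ("score_row", 1.0),
    ("rank_row", 1.1), ("save_row", 1.2), ("tag_row", 0.9),
    ("load_row", 1.0), ("hash_row", 1.1), ("split_row", 1.0),
    ("join_row", 1.0),
    ("sync_remote", 50.0),  # the one slow function hiding among fast ones
]

values = [ms for _, ms in module]
mu, sigma = mean(values), stdev(values)

# flag anything more than 2.5 standard deviations above the module average
outliers = [name for name, ms in module if (ms - mu) / sigma > 2.5]
print(outliers)  # ['sync_remote']
```

One caveat the z-score approach implies: with only a handful of functions per module, the sample standard deviation is inflated by the outlier itself, so the rule needs enough functions in the module to trigger.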
## How it works
Ghost installs two hooks before your app starts:
- `sys.setprofile`: captures every call and return (~50ns overhead per event)
- `sys.settrace`: exception detection only (distinguishes `return None` from `raise`)
Events flow into an in-memory buffer → background thread flushes to SQLite every 1–5s (adaptive) → aggregator builds per-function profiles.
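That pipeline is easy to miniaturize. The sketch below uses a fixed 50ms flush interval instead of Ghost's adaptive 1–5s, and a throwaway SQLite file; the schema and names are illustrative assumptions, not Ghost's actual storage format:

```python
import os
import queue
import sqlite3
import sys
import tempfile
import threading
import time

events = queue.Queue()  # in-memory buffer

def profiler(frame, event, arg):
    if event in ("call", "return"):
        # record metadata only, never argument values
        events.put((time.perf_counter(), event, frame.f_code.co_name))

def flusher(db_path, stop):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, kind TEXT, func TEXT)")
    # keep draining until stop is set AND the buffer is empty
    while not stop.is_set() or not events.empty():
        batch = []
        while not events.empty():
            batch.append(events.get())
        if batch:
            conn.executemany("INSERT INTO events VALUES (?, ?, ?)", batch)
            conn.commit()
        time.sleep(0.05)  # fixed interval; Ghost adapts between 1 and 5 seconds
    conn.close()

def work(n):
    return sum(range(n))

db_path = os.path.join(tempfile.mkdtemp(), "ghost_demo.db")
stop = threading.Event()
flush_thread = threading.Thread(target=flusher, args=(db_path, stop))
flush_thread.start()

# setprofile is per-thread, so the flusher thread is not profiled
sys.setprofile(profiler)
for _ in range(3):
    work(100)
sys.setprofile(None)

stop.set()
flush_thread.join()

conn = sqlite3.connect(db_path)
rows = conn.execute(
    "SELECT func, COUNT(*) FROM events WHERE kind='call' GROUP BY func"
).fetchall()
conn.close()
print(rows)  # [('work', 3)]
```

The buffer-then-flush split is the design point: the hot path inside the profiler is one queue put, and all SQLite I/O happens off the profiled thread.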
Privacy: Ghost captures `type(value).__qualname__`, never the value itself. No secrets, PII, or passwords ever enter the buffer.
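A quick illustration of what that capture looks like; the nested `Money.Cents` class is made up to show why `__qualname__` (rather than `__name__`) gives a readable label:

```python
class Money:
    class Cents(int):
        pass

secret = "hunter2"
amount = Money.Cents(499)

# only the type's qualified name is recorded, never the value
print(type(secret).__qualname__)  # str
print(type(amount).__qualname__)  # Money.Cents
```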
## All the commands
```bash
# Observe any script
ghost run app.py
ghost run manage.py runserver    # Django
ghost run -m uvicorn main:app    # FastAPI

# Profile table
ghost report
ghost report --sort latency
ghost report --sort exceptions

# Deep dive on one function
ghost explain process_order
ghost explain validate_user --backend gemini   # AI analysis with GEMINI_API_KEY

# Anomaly detection
ghost anomalies
ghost anomalies --exc-threshold 0.02   # flag anything above 2%

# Compare two runs
ghost sessions
ghost diff <session-1> <session-2>

# Live-updating terminal dashboard
ghost watch
ghost watch --interval 1 --sort latency

# Export for other tools
ghost export --format json -o profile.json
ghost export --format csv -o profile.csv

# Housekeeping
ghost clean --older-than 7
```
## Comparing two sessions
This is where Ghost gets genuinely useful for refactoring. Capture a session before and after your change, then diff them:

```bash
ghost run app.py
# make your changes
ghost run app.py
ghost diff <session-before> <session-after>
```
Output:

```
Ghost diff session-abc123 → session-def456

── changed (3) ──

CHANGED  process_order (line 42)
    mean latency: 12.50ms → 4.20ms (0.34×) ↓ faster
    call count:   100 → 100 (no change)

CHANGED  validate_order (line 18)
    exception rate: 60.0% → 8.0% ↓

CHANGED  get_product_details (line 87)
    mean latency: 45.00ms → 2.10ms (0.05×) ↓ faster
    new arg signature observed: (int, str)
```
Traffic-weighted proof that your refactor actually helped.
## The competitive gap
| Tool | What it sees |
|---|---|
| mypy / Pylance | Source code types (annotations only) |
| cProfile | Call counts and timing (no types, no exceptions) |
| Sentry | Errors that reach your error handler |
| Ghost | Every call, actual types, real exception rates, measured latency |
The four things Ghost detects — runtime type mismatches, truly dead code, silent exception rates, latency outliers — are invisible to static tools by definition. They only exist at runtime.
## Install
```bash
pip install ghost-observer

# Optional: AI-powered explanations
pip install "ghost-observer[gemini]"
export GEMINI_API_KEY=your-key
ghost explain your_slow_function
```
GitHub: github.com/Tanaybaviskar/ghost-observer
PyPI: pypi.org/project/ghost-observer
Would love to hear what anomalies Ghost finds in your codebase. Drop them in the comments.
