I ran the same deliberately broken codebase through all three major Python type checkers and the results weren't even close.
mypy caught 34 errors. pyright found 58. Pyre? Only 29.
This isn't a synthetic benchmark. I built a realistic 800-line Python project seeded with common type errors, the kind that slip into production: unguarded access on Optional values, dict key assumptions, protocol violations, generic variance issues. Then I measured what each checker actually caught.
The detection gap matters because type checkers aren't just linters. They're your first line of defense against AttributeError: 'NoneType' object has no attribute '...' at 3am. Pick the wrong one and you're shipping bugs that should've been impossible.
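To make two of those categories concrete, here is a minimal sketch of an Optional access bug and a dict key assumption, each next to a fixed version. The names (`User`, `ServerConfig`, `email_domain_unsafe`, and so on) are hypothetical and not taken from the test project, and whether a given checker reports each flagged line depends on the checker, its version, and its strictness settings.

```python
from dataclasses import dataclass
from typing import Optional, TypedDict


@dataclass
class User:
    name: str
    email: Optional[str] = None  # may legitimately be missing


def email_domain_unsafe(user: User) -> str:
    # Bug: user.email can be None, so .split raises AttributeError at runtime.
    # A checker is expected to flag attribute access on an Optional[str] here.
    return user.email.split("@")[1]


def email_domain_safe(user: User) -> Optional[str]:
    # An explicit None check narrows Optional[str] to str before the call.
    if user.email is None:
        return None
    return user.email.split("@")[1]


class ServerConfig(TypedDict, total=False):
    # total=False marks every key as optional (hypothetical config shape).
    host: str
    port: int


def port_unsafe(config: ServerConfig) -> int:
    # Dict key assumption: raises KeyError when "port" is absent.
    # Depending on the checker and its settings, this may or may not be reported.
    return config["port"]


def port_safe(config: ServerConfig, default: int = 8080) -> int:
    # .get with a default handles the missing-key case explicitly.
    return config.get("port", default)
```

The unsafe variants only fail on specific inputs (a user without an email, a config without a port), which is why errors like these tend to survive testing and only show up in production.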
The Test Corpus: 15 Categories of Real-World Type Errors
I didn't want contrived examples. The test project simulates a data pipeline service:
- REST API endpoints (FastAPI-style)
- Database models with SQLAlchemy-ish patterns
- Async workers processing JSON payloads
- Utility functions doing dict/list transformations
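The project itself isn't shown in this excerpt, so the following is only a rough, standard-library sketch of the kind of code such a service contains: an async worker that parses JSON payloads, a dict transformation helper, and a Protocol-typed sink. All names here (`Sink`, `ListSink`, `BrokenSink`, `normalize`, `worker`) are assumptions made for illustration.

```python
import asyncio
import json
from typing import Any, Optional, Protocol


class Sink(Protocol):
    # Structural interface: anything with a compatible write() is a Sink.
    def write(self, record: dict[str, Any]) -> None: ...


class ListSink:
    # Conforms to Sink by collecting records in memory.
    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    def write(self, record: dict[str, Any]) -> None:
        self.records.append(record)


class BrokenSink:
    # Deliberate protocol violation: write() takes a str, not a dict, so a
    # checker is expected to reject BrokenSink where a Sink is required.
    def write(self, record: str) -> None:
        print(record.upper())


def normalize(payload: dict[str, Any]) -> dict[str, Any]:
    # Utility-style transformation: lower-case the keys, drop None values.
    return {key.lower(): value for key, value in payload.items() if value is not None}


async def worker(queue: "asyncio.Queue[Optional[str]]", sink: Sink) -> None:
    # Pull raw JSON strings off the queue until a None sentinel arrives.
    while True:
        raw = await queue.get()
        if raw is None:
            break
        sink.write(normalize(json.loads(raw)))


async def main() -> list[dict[str, Any]]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    sink = ListSink()
    for raw in ('{"ID": 1, "Note": null}', '{"ID": 2, "Tag": "x"}'):
        queue.put_nowait(raw)
    queue.put_nowait(None)  # sentinel: tell the worker to stop
    await worker(queue, sink)
    return sink.records
```

Running `asyncio.run(main())` processes both payloads through `ListSink`. Swapping in `BrokenSink()` would run until the first `write` call and then fail, which is the sort of mismatch a checker can catch before the code ever executes.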
Continue reading the full article on TildAlice
