Ruff vs Black vs Flake8: 1000-File Accuracy Benchmark

#ruff #flake8 #black #python

Ruff Caught 847 Issues. Black Caught 12. Here's What Broke.

I ran three Python code-quality tools (two linters, Ruff and Flake8, plus the Black formatter) against 1000 files pulled from 15 open-source repos (Django, Requests, Pandas, Flask, and 11 others). Ruff flagged 847 issues, Black caught 12, and Flake8 found 634 but took 18x longer than Ruff.

This isn't a "Ruff is faster" post; everyone knows that already. This is about what actually matters: which tool catches the bugs that break production, and which ones just yell about line length.

The repo set totaled 187,432 lines of Python. I ran each tool with its default configuration: no exceptions, no ignore rules, just the out-of-the-box experience a new developer would get.
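The post doesn't show its harness, so here is a minimal sketch of how a defaults-only run could be scripted and timed. The command lines, the helper names (`run_tool`, `count_issues`, `count_lines`), and the output parsing are my assumptions, not the author's setup. Ruff's `--output-format=concise` only changes how results are printed (one line per issue), not which rules run. Also note that `black --check` reports files that would be reformatted rather than individual findings, so Black's number here is a file count.

```python
import re
import shutil
import subprocess
import time
from pathlib import Path

# Default invocations: no config files passed, no rules selected or ignored.
# Assumption: --output-format=concise only makes ruff print one line per issue;
# --no-cache keeps a warm cache from skewing the timing.
COMMANDS = {
    "ruff": ["ruff", "check", "--no-cache", "--output-format=concise"],
    "flake8": ["flake8"],
    "black": ["black", "--check"],
}

# Matches "path:line:col: CODE message", the shape used by flake8 and ruff's concise output.
ISSUE_LINE = re.compile(r"^.+:\d+:\d+: [A-Z]+\d+ ")


def count_issues(tool: str, stdout: str, stderr: str) -> int:
    """Count findings in a tool's captured output."""
    if tool == "black":
        # black --check writes one "would reformat <path>" line per file to stderr,
        # so this is a count of files, not of individual problems.
        return sum(1 for line in stderr.splitlines() if line.startswith("would reformat "))
    return sum(1 for line in stdout.splitlines() if ISSUE_LINE.match(line))


def run_tool(tool: str, files: list[Path]) -> tuple[int, float]:
    """Run one tool over the given files; return (issue count, wall-clock seconds)."""
    cmd = COMMANDS[tool]
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{tool} is not installed or not on PATH")
    start = time.perf_counter()
    # No check=True: these tools exit non-zero whenever they report something.
    proc = subprocess.run(cmd + [str(f) for f in files], capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return count_issues(tool, proc.stdout, proc.stderr), elapsed


def count_lines(files: list[Path]) -> int:
    """Total physical lines across the sampled files."""
    return sum(
        len(f.read_text(encoding="utf-8", errors="replace").splitlines()) for f in files
    )
```

Calling `run_tool("ruff", files)` and `run_tool("flake8", files)` on the same file list returns a count and elapsed seconds for each; dividing the two elapsed times gives a speed ratio comparable to the 18x figure above.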


The Benchmark Setup: 1000 Files, 3 Tools, Zero Mercy

I cloned 15 repos at specific commits (January 2025 snapshots) and sampled 60-80 .py files from each. The selection was random but excluded test files, migration scripts, and auto-generated code. This gave me real application logic: view functions, ORM models, utility modules, API clients.
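The post doesn't say how the files were drawn or how test, migration, and generated files were recognized. Below is a minimal sketch of one way to do it, assuming simple path and header heuristics. The `EXCLUDED_DIRS` set, the `GENERATED_MARKERS` strings, and both helper names are hypothetical.

```python
import random
from pathlib import Path

# Hypothetical exclusion heuristics; the post does not say how test,
# migration, or auto-generated files were identified.
EXCLUDED_DIRS = {"tests", "test", "migrations"}
GENERATED_MARKERS = ("auto-generated", "autogenerated", "generated by")


def is_candidate(path: Path, root: Path) -> bool:
    """Keep application modules; drop tests, migrations, and generated files."""
    parent_dirs = path.relative_to(root).parts[:-1]
    if any(part in EXCLUDED_DIRS for part in parent_dirs):
        return False
    name = path.name
    if name.startswith("test_") or name.endswith("_test.py") or name == "conftest.py":
        return False
    # Generated code usually announces itself near the top of the file.
    with path.open(encoding="utf-8", errors="replace") as fh:
        head = fh.read(500).lower()
    return not any(marker in head for marker in GENERATED_MARKERS)


def sample_repo(root: Path, rng: random.Random, low: int = 60, high: int = 80) -> list[Path]:
    """Randomly pick between `low` and `high` candidate .py files from one repo."""
    candidates = sorted(p for p in root.rglob("*.py") if is_candidate(p, root))
    k = min(rng.randint(low, high), len(candidates))
    return rng.sample(candidates, k)
```

Sorting the candidates and seeding `random.Random` per repo makes the sample reproducible, which matters if you want to rerun the benchmark against the same pinned commits.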

The tooling:

Continue reading the full article on TildAlice