The landscape of Python software quality tooling is currently defined by two contrasting forces: high-velocity convergence and deep specialization. The recent, rapid adoption of Ruff has solved the long-standing community problem of coordinating dozens of separate linters and formatters, establishing a unified, high-performance axis for standard code quality.
A second category of tools, however, continues to operate in necessary but isolated silos: tools dedicated to architectural enforcement and deep structural metrics, such as:
- import-linter (Layered architecture enforcement)
- tach (Dependency visualization and enforcement)
- complexipy, radon, lizard (Metrics for cyclomatic and cognitive complexity)
- module_coupling_metrics, lcom, and cohesion (Metrics for coupling and class cohesion)
These projects address fundamental challenges of code maintainability, evolvability, and architectural debt that extend beyond the scope of fast, stylistic linting. The success of Ruff now presents the opportunity to foster a cross-tool discussion focused not just on syntax, but on structure.
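To make "deep structural metrics" concrete, here is a minimal sketch that flags overly complex functions using radon's cyclomatic-complexity helpers (cc_visit and cc_rank). The file path and threshold are illustrative placeholders, not recommendations:

```python
from pathlib import Path

from radon.complexity import cc_rank, cc_visit  # radon's cyclomatic complexity API

# Illustrative values: point this at your own module and pick your own threshold.
SOURCE_FILE = Path("myproject/orders/service.py")
THRESHOLD = 10

code = SOURCE_FILE.read_text()

# cc_visit parses the source and returns one block per function, method, or class.
for block in cc_visit(code):
    if block.complexity > THRESHOLD:
        print(
            f"{SOURCE_FILE}:{block.lineno} {block.name} "
            f"has complexity {block.complexity} (rank {cc_rank(block.complexity)})"
        )
```

The same pattern applies to the other tools: each one emits a per-module or per-function signal that can be collected and compared, rather than only consumed interactively.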
Specialized quality tools are vital for long-term maintainability and risk assessment. Tools like import-linter and tach mitigate technical risk by enforcing architectural rules, preventing systemic decay, and reducing the cost of change. Complexity and cohesion metrics from tools such as complexipy, lcom, and cohesion quantitatively flag overly complex or highly coupled components, acting as early warning systems for technical debt.

By analysing the combined outputs, risk assessment shifts towards predictive modelling: integrating data from individual tools (e.g., import-linter violations, complexipy scores) creates a multi-dimensional risk score. Overlaying these results, such as identifying modules that are both low in cohesion and involved in tach-flagged dependency cycles, generates a "heat map" of technical debt. This unified approach, empirically validated against historical project data such as bug frequency and commit rates, can yield a predictive risk assessment: it identifies modules that are not just theoretically complex but empirically confirmed sources of instability, turning abstract quality metrics into concrete, prioritized refactoring tasks for the riskiest parts of the codebase.
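As a rough illustration of such an overlay, here is a minimal sketch of a combined risk score and "heat map". All field names, weights, and thresholds are hypothetical placeholders; a real integration would parse each tool's actual output format rather than hand-filled dataclasses:

```python
from dataclasses import dataclass


@dataclass
class ModuleMetrics:
    """Hypothetical per-module signals, as they might be collected from the tools above."""
    name: str
    cognitive_complexity: float   # e.g. from complexipy
    lcom: float                   # lack of cohesion, e.g. from lcom (higher = worse)
    import_violations: int        # e.g. from import-linter
    in_dependency_cycle: bool     # e.g. flagged by tach
    bug_fix_commits: int          # historical instability signal from the VCS


def risk_score(m: ModuleMetrics) -> float:
    """Combine normalised signals into a single, illustrative risk score."""
    structural = (
        0.4 * min(m.cognitive_complexity / 50, 1.0)
        + 0.3 * min(m.lcom / 10, 1.0)
        + 0.2 * min(m.import_violations / 5, 1.0)
        + (0.1 if m.in_dependency_cycle else 0.0)
    )
    # Scale the structural score by empirical churn, so "theoretically complex"
    # modules only rise to the top when history confirms they are unstable.
    return structural * (1 + min(m.bug_fix_commits / 20, 1.0))


def heat_map(modules: list[ModuleMetrics]) -> list[tuple[str, float]]:
    """Rank modules from riskiest to safest: a prioritized refactoring list."""
    return sorted(
        ((m.name, round(risk_score(m), 2)) for m in modules),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    modules = [
        ModuleMetrics("orders.service", 72, 8.5, 3, True, 14),
        ModuleMetrics("billing.invoices", 18, 2.0, 0, False, 2),
    ]
    for name, score in heat_map(modules):
        print(f"{score:5.2f}  {name}")
```

Each signal is capped at 1.0 before weighting so that no single metric dominates, and historical churn acts as a multiplier rather than another additive term, mirroring the idea that a module has to be both structurally suspect and empirically unstable to land at the top of the heat map.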
Reasons to Connect
The goal is to bring the maintainers and core users of these diverse tools into a shared discussion.
Increasing Tool Visibility and Sustainability: Specialized tools often rely on small, dedicated contributor pools and suffer from knowledge isolation, confining technical debate to their specific GitHub repository. A broader discussion provides these projects with critical outreach, exposure to a wider user base, and a stronger pipeline of new contributors, ensuring their long-term sustainability.
Let's start the conversation on how to 'measure' maintainable, architecturally sound Python code.
And keep Goodhart's law in mind: "When a measure becomes a target, it ceases to be a good measure" ;-)