pyscn: Keeping AI-Generated Python Code Clean with Structural Analysis
As developers rely more on AI tools to generate large amounts of code, maintaining code quality becomes increasingly challenging. pyscn is designed to address this by detecting structural issues—unreachable code, duplication, complexity, and architectural coupling—that traditional linters often overlook.
Design Goals
- Structural analysis over style: focuses on architecture and logic, not formatting.
- High throughput: suitable for large codebases and CI pipelines.
- Low noise, deterministic results: grounded in CFGs, ASTs, and tree edit distance algorithms.
- AI integration: exposes its analyses to AI code assistants through a built-in Model Context Protocol (MCP) server.
Core Architecture
Go + Tree-sitter
pyscn is implemented in Go for performance and concurrency, and uses Tree-sitter to parse Python efficiently.
Key Characteristics
- Supports Python 3.8+ syntax.
- Tree-sitter's concrete syntax tree (CST) parsing is resilient to partial or syntactically invalid input.
- Parallelized file scanning for speed.
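pyscn does its file-level fan-out in Go. As a rough illustration of the same pattern, here is a minimal Python sketch (the names `scan_file` and `scan_paths` are hypothetical) that parses independent files concurrently with `concurrent.futures` and records which ones failed to parse. Note that CPython's `ast` module, unlike Tree-sitter, rejects an invalid file outright instead of producing a partial tree.

```python
from __future__ import annotations

import ast
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def scan_file(path: Path) -> tuple[str, int | None]:
    """Parse one file; return (path, top-level function count), or None if it fails."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    except (SyntaxError, ValueError):
        # Undecodable or syntactically invalid: CPython's parser yields no tree at all.
        return str(path), None
    functions = [n for n in tree.body if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    return str(path), len(functions)


def scan_paths(paths: list[Path], workers: int = 8) -> dict[str, int | None]:
    # Files are independent, so each one can be handed to a separate worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(scan_file, paths))
```

Threads keep the sketch short, but CPython's GIL limits them for CPU-bound parsing; a `ProcessPoolExecutor` would be the closer analogue to Go spreading goroutines across cores.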
Distribution Model
The Go binary is embedded inside the Python wheel, providing:
- Native pip / pipx installation experience.
- No Go toolchain required for end users.
- Full performance of compiled code.
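One common way to ship a compiled binary in a wheel is a thin Python entry point that locates the bundled executable and forwards the command line to it. The sketch below assumes a hypothetical `bin/` directory inside the installed package and a `main()` function registered as a console-script entry point; it illustrates the general pattern and is not taken from pyscn's source.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def find_binary(package_dir: Path, name: str = "pyscn") -> Path:
    # Hypothetical layout: <package>/bin/<name>, with .exe on Windows.
    exe = f"{name}.exe" if sys.platform == "win32" else name
    candidate = package_dir / "bin" / exe
    if not candidate.is_file():
        raise FileNotFoundError(f"bundled binary not found: {candidate}")
    return candidate


def main() -> int:
    # Resolve the binary relative to this installed module, not the current directory.
    binary = find_binary(Path(__file__).resolve().parent)
    # Forward all CLI arguments unchanged; the compiled binary does the real work.
    return subprocess.run([str(binary), *sys.argv[1:]]).returncode
```

Because the wheel already contains a platform-specific executable, installing it needs no compiler or Go toolchain; the Python layer only does path lookup and argument forwarding.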
Analysis Techniques
Dead Code Detection (Control Flow Graphs)
pyscn builds a Control Flow Graph (CFG) for every function:
- Explicit Entry/Exit nodes.
- Branches for if, while, for, try/except, etc.
- Reachability analysis (BFS/DFS) that marks a block as dead when it cannot be reached from the entry node.
Because dead code is derived from actual control flow rather than textual patterns, this reduces false positives and catches logic-level unreachable paths that text-based linters miss.
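To make the reachability idea concrete, here is a deliberately small Python sketch built on the standard `ast` module rather than Tree-sitter. It only models sequential statements, `if`/`else`, `return`, and `raise` (loops and `try` are treated as opaque single statements), so it is a toy CFG rather than pyscn's builder; `CFG`, `build`, and `dead_lines` are hypothetical names.

```python
from __future__ import annotations

import ast
from collections import deque


class CFG:
    """Basic blocks (lists of statements) plus directed successor edges."""

    def __init__(self) -> None:
        self.blocks: dict[int, list[ast.stmt]] = {}
        self.succ: dict[int, set[int]] = {}

    def new_block(self) -> int:
        bid = len(self.blocks)
        self.blocks[bid] = []
        self.succ[bid] = set()
        return bid

    def edge(self, src: int, dst: int) -> None:
        self.succ[src].add(dst)


def build(cfg: CFG, stmts: list[ast.stmt], cur: int | None, exit_id: int) -> int | None:
    """Add stmts starting in block `cur`; return the fall-through block, or None."""
    for stmt in stmts:
        if cur is None:
            # The previous statement never falls through, so this statement
            # starts a new block that has no incoming edges.
            cur = cfg.new_block()
        cfg.blocks[cur].append(stmt)
        if isinstance(stmt, (ast.Return, ast.Raise)):
            cfg.edge(cur, exit_id)  # control leaves the function here
            cur = None
        elif isinstance(stmt, ast.If):
            then_b, else_b, join = cfg.new_block(), cfg.new_block(), cfg.new_block()
            cfg.edge(cur, then_b)
            cfg.edge(cur, else_b)
            for branch, body in ((then_b, stmt.body), (else_b, stmt.orelse)):
                end = build(cfg, body, branch, exit_id)
                if end is not None:
                    cfg.edge(end, join)
            cur = join  # stays unreachable if both branches return or raise
    return cur


def dead_lines(func_source: str) -> list[int]:
    """Line numbers of statements that no path from the entry block can reach."""
    func = ast.parse(func_source).body[0]
    cfg = CFG()
    entry, exit_id = cfg.new_block(), cfg.new_block()
    end = build(cfg, func.body, entry, exit_id)
    if end is not None:
        cfg.edge(end, exit_id)  # implicit `return None` at the end of the body
    # Breadth-first search from the entry block marks everything reachable.
    seen, queue = {entry}, deque([entry])
    while queue:
        for nxt in cfg.succ[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(s.lineno for bid, stmts in cfg.blocks.items() if bid not in seen for s in stmts)


example = """
def sign(x):
    if x > 0:
        return 1
    else:
        return -1
    print("never runs")
"""
print(dead_lines(example))  # [7]
```

The `print` call is flagged even though no single line "looks" wrong: it is dead only because both branches of the `if` exit, which is exactly the kind of logic-level path a CFG exposes.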
Clone Detection (LSH → APTED)
pyscn uses a two-stage clone detection pipeline:
- LSH (Locality-Sensitive Hashing): quickly identifies likely clone candidates using MinHash over normalized AST features.
- APTED (tree edit distance): precisely measures structural similarity, even when identifiers differ.
Because the expensive tree edit distance is computed only for the candidate pairs that LSH surfaces, not for every pair of code fragments, the pipeline scales to large repositories while maintaining accuracy.
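The candidate-generation stage can be sketched in a few dozen lines of Python. This is an assumption-laden illustration: the feature set (AST node types and parent>child type pairs, with identifiers and literal values dropped), the 32 hash functions, and the 8×4 banding are choices made for the example, not pyscn's parameters, and the APTED verification stage is not shown.

```python
from __future__ import annotations

import ast
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32      # signature length (illustrative)
ROWS_PER_BAND = 4    # 32 / 4 = 8 bands (illustrative)


def features(source: str) -> set[str]:
    """Node types plus parent>child type edges; names and literal values are ignored."""
    feats = set()
    for parent in ast.walk(ast.parse(source)):
        feats.add(type(parent).__name__)
        for child in ast.iter_child_nodes(parent):
            feats.add(f"{type(parent).__name__}>{type(child).__name__}")
    return feats


def seeded_hash(seed: int, feature: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is randomized per process.
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(feats: set[str]) -> list[int]:
    # Slot i keeps the smallest value produced by hash function i over all features.
    return [min(seeded_hash(seed, f) for f in feats) for seed in range(NUM_HASHES)]


def lsh_candidates(snippets: dict[str, str]) -> set[tuple[str, str]]:
    """Pairs of snippet names whose signatures agree on every row of at least one band."""
    buckets: dict[tuple, set[str]] = defaultdict(set)
    for name, source in snippets.items():
        sig = minhash(features(source))
        for band in range(NUM_HASHES // ROWS_PER_BAND):
            rows = tuple(sig[band * ROWS_PER_BAND:(band + 1) * ROWS_PER_BAND])
            buckets[(band, rows)].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


snippets = {
    "total": "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "accumulate": "def accumulate(items):\n    t = 0\n    for i in items:\n        t += i\n    return t\n",
}
print(lsh_candidates(snippets))  # {('accumulate', 'total')}
```

Each MinHash slot agrees between two snippets with probability roughly equal to the Jaccard similarity of their feature sets, and banding turns that into a bucket lookup. Only pairs that collide in some band would then be handed to the precise tree edit distance stage.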
Complexity, Duplication, and Coupling
- Cyclomatic complexity: aggregates per-function complexity and applies continuous penalties.
- Duplication: flags clone groups and calculates the percentage of duplicated code.
- Coupling (CBO, Coupling Between Objects): measures cross-module/class dependencies to highlight fragile architecture.
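Both metrics can be approximated with Python's `ast` module. The sketch below counts McCabe complexity as 1 plus one per `if`/`elif`, loop, conditional expression, `except` clause, comprehension filter, and extra `and`/`or` operand, which is one common convention; pyscn's exact counting rules may differ. The `cbo` function is a simplified, outgoing-only variant that counts references to other classes defined in the same module. All function names here are hypothetical.

```python
from __future__ import annotations

import ast

# Node types that each add one independent path (one common McCabe convention).
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    complexity = 1
    # Note: ast.walk also descends into nested functions, which are counted here too.
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            complexity += 1                      # `elif` is a nested If, so it counts
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1   # `a and b and c` adds two
        elif isinstance(node, ast.comprehension):
            complexity += len(node.ifs)          # each `if` filter in a comprehension
    return complexity


def complexities(source: str) -> dict[str, int]:
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def cbo(source: str) -> dict[str, int]:
    """Simplified CBO: distinct other same-module classes each class references by name."""
    tree = ast.parse(source)
    classes = {n.name for n in tree.body if isinstance(n, ast.ClassDef)}
    result = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # Includes base classes and any Name used anywhere in the class body.
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            result[node.name] = len((used & classes) - {node.name})
    return result
```

A full CBO implementation would also count incoming references and classes imported from other modules; this version only shows the counting idea.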
Scoring and Reports
Each project receives a Health Score (0–100) and a grade.
Scoring starts at 100, and penalties are subtracted for:
- Complexity
- Duplication
- Dead code (severity-based)
- Coupling (high CBO)
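The exact formula is not given here, so the following Python sketch only shows the overall shape: start at 100, subtract capped penalties that grow continuously with each metric, and map the result to a letter grade. Every weight, cap, severity name, and grade cutoff below is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical severity weights for dead-code findings.
DEAD_CODE_WEIGHTS = {"critical": 3.0, "warning": 1.0, "info": 0.25}


@dataclass
class Metrics:
    avg_complexity: float                                 # mean cyclomatic complexity
    duplication_pct: float                                # % of code inside clone groups
    dead_code: dict[str, int] = field(default_factory=dict)  # severity -> finding count
    high_cbo_classes: int = 0                             # classes above a CBO threshold


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def health_score(m: Metrics) -> int:
    penalties = [
        clamp((m.avg_complexity - 5) * 2, 0, 20),  # grows smoothly above CC 5, capped at 20
        clamp(m.duplication_pct * 0.5, 0, 20),      # 40% duplication reaches the cap
        clamp(sum(DEAD_CODE_WEIGHTS.get(sev, 0) * n for sev, n in m.dead_code.items()), 0, 20),
        clamp(m.high_cbo_classes * 2, 0, 20),
    ]
    return round(clamp(100 - sum(penalties), 0, 100))


def grade(score: int) -> str:
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "F"
```

Capping each penalty keeps one bad metric from driving the whole score to zero on its own, while the linear ramps avoid sudden jumps at arbitrary thresholds.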
Reports include:
- HTML dashboards
- JSON output
- Clone groups
- Dead code locations
AI Integration with MCP
pyscn includes a built-in Model Context Protocol (MCP) server (pyscn-mcp).
AI assistants can:
- Call analysis functions (detect_clones, find_dead_code, etc.)
- Request structured JSON results
- Perform refactors based on pyscn output.
This enables workflows where the AI not only sees the problems but can automatically repair them.
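Under the hood, MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method with a tool `name` and an `arguments` object. The sketch below builds such a request for one of pyscn's tools and extracts the text content from a result. The `path` argument is an assumption, not pyscn-mcp's documented schema, and a real client must complete the MCP `initialize` handshake before calling any tool.

```python
from __future__ import annotations

import json
from itertools import count

_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport frames each message as a single line of JSON.
    return json.dumps(message) + "\n"


def tool_result_text(response_line: str) -> str:
    """Join the text items of a tools/call result; raise on protocol or tool errors."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):
        raise RuntimeError("\n".join(texts) or "tool reported an error")
    return "\n".join(texts)


# Hypothetical arguments; check the tool's declared input schema via `tools/list`.
print(tool_call_request("find_dead_code", {"path": "src/"}), end="")
```

Clients such as Cursor or Claude handle this framing automatically once the server is configured; the sketch only shows what travels over the wire.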
MCP Configuration Example (Cursor / Claude)
{
  "mcpServers": {
    "pyscn-mcp": {
      "command": "uvx",
      "args": ["pyscn-mcp"]
    }
  }
}
Installation
Recommended:
pipx install pyscn
Or with uv:
uv tool install pyscn
Running an Analysis
pyscn analyze .
Outputs:
- HTML report
- Complexity hotspots
- Dependency cycles
- Clone groups
- Complexity metrics
Summary
pyscn combines:
- The speed of Go
- The parsing accuracy of Tree-sitter
- Proven algorithms like CFGs, LSH, and APTED
- MCP-based AI interoperability
The result is a modern, high-performance analyzer built for AI-driven development environments.
Star pyscn on GitHub and try it on your next project—what structural issues will it uncover? Share your thoughts in the comments!