The problem no one talks about
Your linter verifies code style. Your SAST scans for known vulnerability patterns. Your tests confirm behavior.
But who checks that your release is internally coherent?
- Does your `pyproject.toml` declare all the packages you actually import?
- Does your `README.md` document CLI commands that actually exist?
- Does your CI matrix include the Python version you and your users develop on?
- Are your `except Exception: pass` blocks intentional, or are they silently hiding critical bugs?
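That last question is subtler than it looks. Here is a minimal sketch (with a hypothetical `load_config` helper, not code from any of the projects discussed) of how a broad `except` collapses many distinct failures into one silent default:

```python
import json

def load_config(path):
    """Anti-pattern: return {} if *anything* goes wrong."""
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        # A missing file, a permission error, and a corrupt JSON file
        # all look identical here -- and none of them get logged.
        return {}

def load_config_explicit(path):
    """Catch only the case we expect; let real bugs surface."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}  # a missing config is a legitimate default case
```

With the explicit version, a corrupt config file raises `json.JSONDecodeError` loudly instead of silently producing an empty config.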
We built HefestoAI to answer these exact questions. To prove it works, we ran it against some of Python's most popular and well-maintained libraries.
Here is what we discovered.
What we found
Even elite projects suffer from drift between what they claim and what they do.
FastAPI (1,179 files analyzed)
- CI Config Drift: FastAPI's CI matrix tests against Python 3.13 and 3.14 but misses 3.12, the version the vast majority of developers run today.
- Silent Exception Swallows (5 instances): Patterns like `except Exception: return []` in middleware code. These silently hide underlying errors instead of properly logging or re-raising them.
Rich (224 files analyzed)
- Bare `except:` clauses (6 instances): Catching everything without specifying a type. This carelessly masks `KeyboardInterrupt`, `SystemExit`, and other system exceptions that should propagate.
- Broad Exception Swallows (6 instances): Cases of `except Exception: pass` that silently swallow real problems.
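The distinction between these two findings matters because Python parks `KeyboardInterrupt` and `SystemExit` under `BaseException`, outside the `Exception` branch of the hierarchy. A small sketch of the difference:

```python
def bare_except_demo():
    """A bare `except:` catches BaseException, so SystemExit never escapes."""
    try:
        raise SystemExit(1)  # e.g. user code calls sys.exit()
    except:  # noqa: E722 -- the anti-pattern under discussion
        return "swallowed"

def typed_except_demo():
    """`except Exception:` lets SystemExit and KeyboardInterrupt propagate."""
    try:
        raise SystemExit(1)
    except Exception:
        return "swallowed"  # never reached: SystemExit is not an Exception
```

Calling `bare_except_demo()` returns `"swallowed"`, while `typed_except_demo()` lets `SystemExit` escape to the caller, which is exactly what you want when the user hits Ctrl+C.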
httpx (66 files analyzed)
- CI Matrix Drift: The matrix tests only against Python 3.9, neglecting the modern 3.12 standard most of the community now runs.
These aren't bugs. They're coherence issues.
None of these findings will crash an app today. But they are the exact kind of "drift" that causes:
- "Works on my machine" syndromes because your CI tests a completely different Python version.
- Multi-day debug sessions due to a silent failure hidden behind an overbroad exception.
- Broken deployments because a new dependency was imported in code but forgotten in `pyproject.toml`.
We call this category Release Truth: mechanically verifying that what your project claims is actually true.
High Precision > High Noise
Before making noise and opening PRs, we needed to ensure we weren't just building another tool that spams developers with false positives. So, we created a benchmark:
| Code Set | Files | Findings | Result |
|---|---|---|---|
| Vulnerable (AI-generated patterns) | 10 | 13 true positives | 100% recall |
| Safe (Proper code) | 10 | 0 false positives | 100% precision |
The vulnerable set includes common insecure patterns generated by Copilot and Claude: f-string SQL injection, os.system() command injection, hardcoded API keys, eval() usage, pickle deserialization, assert in production, and attribute typos.
The safe set uses the correct, idiomatic alternatives: parameterized queries, subprocess.run() with list args, environment variables, ast.literal_eval(), and proper exception handling.
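Two of those categories can be illustrated side by side. This is a hedged sketch using an in-memory SQLite table, not the actual benchmark fixtures:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_vulnerable(name):
    # f-string SQL: the value is spliced into the query text,
    # so a crafted name can rewrite the WHERE clause entirely.
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name):
    # Parameterized query: the driver treats the value as data,
    # never as SQL, so the same payload matches nothing.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
```

With that payload, the vulnerable query returns every row in the table, while the parameterized one returns an empty list.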
Every vulnerable pattern was caught. Not a single safe pattern was falsely flagged.
How we reached 0% False Positives
Reaching zero noise wasn't easy. Over the last month, we:
- Rewrote SQL injection detection: We now require a DB execution sink to be in-scope. This eliminated a 43% false-positive rate stemming from innocent DB-API placeholders.
- Shifted to AST-based checks: We now check `assert`, `pickle`, bare `except`, and `eval` using the Abstract Syntax Tree, replacing regex setups that produced noise.
- Recognized `@property` decorators: We now treat them as valid attributes, which eliminated 55 false positives across httpx and Rich with a single fix.
- Respected `contextlib.suppress(ImportError)`: Optional imports correctly wrapped this way are no longer inaccurately flagged as undeclared dependencies.
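To see why AST checks beat regex for this kind of rule, here is a minimal bare-`except` detector built on the standard `ast` module. This is a sketch of the general idea, not HefestoAI's actual implementation:

```python
import ast

def find_bare_excepts(source: str):
    """Return (line, col) for every bare `except:` handler in `source`.

    An AST walk cannot be fooled by the text 'except:' appearing inside
    a comment or a string literal, which is exactly where regex-based
    checks generate noise.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        # A bare `except:` is an ExceptHandler whose `type` is None.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            hits.append((node.lineno, node.col_offset))
    return hits

sample = '''
try:
    risky()
except:             # bare -- flagged
    pass

try:
    risky()
except ValueError:  # typed -- not flagged
    pass

text = "except: inside a string fools regex, not the AST"
'''
```

Running `find_bare_excepts(sample)` flags only the genuine bare handler on line 4 and ignores both the typed handler and the string literal.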
Every single fix was validated with before/after fixture evidence. We dogfood HefestoAI on itself constantly (470+ tests, 0 regressions).
Try it yourself
You can run it right now from your terminal:

```shell
pip install hefesto-ai
hefesto analyze . --fail-on HIGH
```
Or add it directly to your CI as a pre-commit hook:

```yaml
repos:
  - repo: https://github.com/artvepa80/Agents-Hefesto
    rev: v4.11.1  # Or use the latest!
    hooks:
      - id: hefesto-analyze
```
What makes it different:
- Blazing Fast: ~0.01s per file.
- Polyglot: Supports 21 formats natively (Python, TypeScript, Java, Go, Rust, YAML, Terraform, Dockerfile, SQL, etc.).
- Private & Local: Fully deterministic and offline-first. No API keys required.
- Smart Context (Optional): Can be enhanced with AI (Gemini, Claude, OpenAI).
- Open Source: MIT licensed.
Star us on GitHub: artvepa80/Agents-Hefesto
We're actively looking for community feedback on false positive rates. If you run it on your codebase and hit an FP, please open an issue. We take our <5% FP target very seriously!