DEV Community

Santosh Jha


I built an open code health benchmark for any GitHub repo

We have Lighthouse for websites. SSL Labs for TLS. PageSpeed for performance.

We have nothing equivalent for code.

I built a small open-source tool to fill that gap: stackhealth.dev. This post covers the thinking behind it, and I am looking for pushback on the weights.

The gap

Each of those is free, public, and machine-readable. The closest existing pieces for code each cover only one slice: OpenSSF Scorecard is excellent, but it is repo-scoped and focused primarily on security hygiene; SecurityScorecard grades organisations from the outside, not their code; SOC 2 has become a procurement checkbox; and the EU Cyber Resilience Act's CE marking, due in 2027, is pass/fail compliance, not a quality measure.
None of these is the Lighthouse-shaped artifact: free, with an open formula, machine-readable, and consumer-grade.

What I built

StackHealth: paste any public GitHub URL and get a 0–100 score plus a letter grade from A+ to F. The composite score weights four dimensions: 30% security, 25% quality, 25% hygiene, 20% community.

  • Security — OpenSSF Scorecard, Semgrep, Trivy
  • Quality — cyclomatic complexity (lizard), lint density (ruff / eslint / golangci), code duplication (jscpd), test signal, file size
  • Hygiene — README, LICENSE, CONTRIBUTING, SECURITY.md, CI config
  • Community — recent commits, contributors, popularity (capped at 4% of overall), median time-to-first-response
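The composite itself is a plain weighted sum. Here is a minimal Python sketch, assuming each dimension has already been normalised to a 0 to 100 score. The weights come from the post; the letter-grade cutoffs in `GRADE_CUTOFFS` are hypothetical placeholders, not the ones published on the methodology page:

```python
from typing import Mapping

# Dimension weights as stated in the post; they sum to 1.0,
# so the composite stays on the same 0-100 scale as its inputs.
WEIGHTS = {"security": 0.30, "quality": 0.25, "hygiene": 0.25, "community": 0.20}

# Hypothetical letter-grade cutoffs (minimum score, grade), highest first.
GRADE_CUTOFFS = [
    (95, "A+"), (90, "A"), (85, "A-"),
    (80, "B+"), (75, "B"), (70, "B-"),
    (60, "C"), (50, "D"),
]


def composite_score(dimensions: Mapping[str, float]) -> float:
    """Weighted sum of per-dimension scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name, value in dimensions.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score out of range: {value}")
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)


def letter_grade(score: float) -> str:
    """Map a 0-100 composite to a letter using the hypothetical cutoffs."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


if __name__ == "__main__":
    dims = {"security": 80, "quality": 90, "hygiene": 100, "community": 70}
    score = composite_score(dims)
    print(round(score, 2), letter_grade(score))
```

With these made-up inputs the sum is 24 + 22.5 + 25 + 14 = 85.5. Security carries the largest weight, so a 10-point drop there costs 3 points overall, versus 2 points for the same drop in community.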

Every weight is documented at stackhealth.dev/methodology. The formula spec lives at github.com/santosh3743/stackhealth.
Each scan stores the formula version, tool versions, exact commit SHA, and raw JSON outputs, so any published score can be recomputed. If the published formula does not reproduce it, that is a bug.
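Storing those inputs is what makes a score checkable. A sketch of such a check in Python, assuming a hypothetical `ScanRecord` layout and a hypothetical registry that maps each formula version to its scoring function. Turning raw tool JSON into per-dimension scores is tool-specific and left out, so the record below carries those dimension scores directly:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


def _formula_v1(dimensions: Dict[str, float]) -> float:
    """Formula "1.0": the 30/25/25/20 weighted sum described in the post."""
    weights = {"security": 0.30, "quality": 0.25, "hygiene": 0.25, "community": 0.20}
    return sum(w * dimensions[name] for name, w in weights.items())


# Hypothetical registry: each published formula version keeps its own function,
# so an old scan can be replayed with the formula it was scored under.
FORMULAS: Dict[str, Callable[[Dict[str, float]], float]] = {"1.0": _formula_v1}


@dataclass
class ScanRecord:
    """Hypothetical layout for one stored scan."""
    repo: str
    commit_sha: str
    formula_version: str
    tool_versions: Dict[str, str]
    raw_outputs: Dict[str, Any]    # raw JSON from each tool, keyed by tool name
    dimensions: Dict[str, float]   # per-dimension scores derived from raw_outputs
    score: float                   # the score that was published

    def fingerprint(self) -> str:
        """SHA-256 over the inputs the score depends on, as canonical JSON."""
        payload = {
            "commit_sha": self.commit_sha,
            "formula_version": self.formula_version,
            "tool_versions": self.tool_versions,
            "raw_outputs": self.raw_outputs,
        }
        blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def reproduces(record: ScanRecord, tolerance: float = 0.01) -> bool:
    """Re-run the recorded formula version and compare it with the stored score."""
    formula = FORMULAS.get(record.formula_version)
    if formula is None:
        raise KeyError(f"unknown formula version: {record.formula_version!r}")
    return abs(formula(record.dimensions) - record.score) <= tolerance
```

The tolerance only absorbs floating-point rounding; anything beyond it is the "that is a bug" case. Sorting keys before hashing means two scans of the same commit with the same tools and outputs always get the same fingerprint.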

I tested it on fastapi/fastapi. Result: A-, 86/100. The tool did not inflate the grade just because the project is widely loved, and that was the credibility test I cared about most.

What it does NOT do

  • Score closed-source software
  • Rescore continuously on every commit
  • Replace your SCA / SAST stack — it aggregates them
  • Tell you whether a specific deployment is safe

Popularity is capped at 4% of the overall score because stars are not a measure of health.
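One way to enforce a cap like that is to compute popularity as a fraction that saturates at 1 and scale it to at most 4 points. The post only states the 4% ceiling; the log scaling and the 10,000-star saturation point in this Python sketch are hypothetical choices for illustration:

```python
import math

POPULARITY_MAX_POINTS = 4.0  # 4% of a 0-100 overall score, as stated in the post
STAR_SATURATION = 10_000     # hypothetical: star count at which the cap is reached


def popularity_points(stars: int) -> float:
    """Points popularity contributes to the overall score; never more than 4."""
    if stars < 0:
        raise ValueError("stars must be non-negative")
    # Log scale (hypothetical choice): each 10x in stars adds roughly the same amount.
    fraction = math.log10(1 + stars) / math.log10(1 + STAR_SATURATION)
    return POPULARITY_MAX_POINTS * min(1.0, fraction)


if __name__ == "__main__":
    for stars in (0, 100, 10_000, 1_000_000):
        print(stars, round(popularity_points(stars), 2))
```

However many stars a repo has, popularity moves the overall score by at most 4 points, so a little-known but well-maintained project gives up at most 4 points for obscurity.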

Looking for feedback
Scan a repo you know. Tell me where the weights are wrong. Tell me the failure mode I have not caught.
Issues and PRs welcome at the repo.
