
Douglas Walseth

Originally published at walseth.ai

AI Governance Leaderboard: We Scanned 21 Top Repos Before RSA 2026


RSA Conference 2026 starts March 23. Every AI security vendor will be on stage talking about governance, compliance, and responsible AI. We wanted to see what governance actually looks like in the repos people are shipping.

So we scanned 21 of the most popular AI/ML repositories using the same governance scanner anyone can run for free. No manual review. No subjective scoring. Just structural analysis of what each repo enforces automatically.

The results are not great.

The Numbers

  • 21 repos scanned across AI agent frameworks, ML libraries, web frameworks, and AI SDKs
  • Average score: 53/100 (grade C)
  • Only 2 repos (10%) score 70+ and are on track for EU AI Act readiness
  • 6 repos (29%) have any AI governance configuration (CLAUDE.md or .cursorrules)
  • 1 repo scored an F

View the full interactive leaderboard

Top 5

| Rank | Repository | Score | Grade | EU AI Act |
|---:|---|---:|:---:|---|
| 1 | vllm-project/vllm | 78 | B | On track |
| 2 | BerriAI/litellm | 72 | B | On track |
| 3 | Significant-Gravitas/AutoGPT | 68 | B | Gaps identified |
| 4 | fastapi/fastapi | 62 | B | Gaps identified |
| 5 | langchain-ai/langchain | 61 | B | Gaps identified |

vLLM leads the pack at 78/100 with pre-commit hooks, 7 CI/CD workflows, a security policy, and Dependabot. Its one critical finding: 2 .env files committed to source control.
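Committed .env files are the kind of finding a .gitignore rule prevents outright. A minimal sketch (the patterns are illustrative, not taken from vLLM's actual repo):

```gitignore
# Keep environment files out of source control
.env
.env.*
# Allow a documented template through
!.env.example
```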

Bottom 3

| Rank | Repository | Score | Grade | EU AI Act |
|---:|---|---:|:---:|---|
| 19 | ollama/ollama | 36 | D | Not ready |
| 20 | microsoft/autogen | 30 | D | Not ready |
| 21 | yoheinakajima/babyagi | 17 | F | Not ready |

BabyAGI's 17/100 is the lowest score in the set. No CI/CD pipeline, no enforcement hooks, no security policy, no governance config. It scores points only for having a test directory and basic project hygiene.

The Pattern: CI/CD Without Enforcement

The most striking finding across all 21 repos: nearly every project has CI/CD, but almost none enforce rules structurally.

Most repos scored 15/15 on CI/CD. They have GitHub Actions. They run tests in the pipeline. That part of modern software development is well-adopted.

But enforcement -- pre-commit hooks, commit-lint, CODEOWNERS, branch protection -- averages only 11/30 across all repos. This is the gap. Rules exist in documentation but are not structurally enforced before code enters the pipeline.

This is exactly what we call the "detection gap" in the enforcement ladder framework. You can detect violations in CI, but by then the code is already committed. Structural enforcement catches problems before they enter the system.
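What structural enforcement looks like in practice: a minimal `.pre-commit-config.yaml` sketch using hooks from the public pre-commit-hooks project (the pinned `rev` is illustrative; pin whatever release you verify):

```yaml
# .pre-commit-config.yaml -- runs locally, before each commit
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: detect-private-key       # block committed secrets
      - id: check-added-large-files  # keep binaries out of history
      - id: end-of-file-fixer        # basic hygiene
```

Unlike a CI job, these checks reject the commit itself, so the violation never enters the pipeline.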

AI Governance Is Nearly Absent

Only 6 of 21 repos (29%) have any AI governance configuration -- a CLAUDE.md file or .cursorrules. This means that in 71% of the most popular AI/ML repos, AI coding tools operate with zero structural guidance.

When a developer uses Cursor, Claude Code, or GitHub Copilot on these repos, the AI has no project-specific rules to follow. No constraints on what it can modify. No enforced patterns. The governance score for these repos on this dimension: 0/15.
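For repos missing this dimension, even a short rules file closes the gap. A minimal CLAUDE.md sketch (the rules here are illustrative examples, not from any repo in the scan):

```markdown
# CLAUDE.md -- project rules for AI coding tools

- Never modify files under migrations/ or .github/workflows/
- Every new function requires a matching test under tests/
- Run `pre-commit run --all-files` before proposing a commit
```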

The repos that do have governance configs: vLLM, LiteLLM, AutoGPT, LangChain, Transformers, and LocalAI.

What the Scores Mean

Our scanner evaluates 6 dimensions (100 points total):

  • Enforcement (30 pts): Pre-commit hooks, commit-lint, CODEOWNERS, branch protection
  • CI/CD (15 pts): GitHub Actions, Travis CI, CircleCI workflows
  • Security (20 pts): Security policy, .gitignore, no committed .env files, Dependabot/Renovate
  • Testing (10 pts): Test configuration files, test directories
  • Governance (15 pts): CLAUDE.md, .cursorrules, governance directories
  • Hygiene (10 pts): README, CONTRIBUTING, LICENSE, CHANGELOG, lockfiles

Grades: A (80+), B (60-79), C (40-59), D (20-39), F (below 20).
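The grade bands reduce to a simple threshold lookup. A sketch of the mapping as described above (the function name is ours, not the scanner's):

```python
def grade(score: int) -> str:
    """Map a 0-100 governance score to a letter grade
    using the bands from the article."""
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    if score >= 20:
        return "D"
    return "F"

# Examples from the leaderboard:
print(grade(78))  # vLLM -> B
print(grade(17))  # BabyAGI -> F
```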

Category Breakdown

AI Agent Frameworks (8 repos, avg 47/100)

The agent frameworks -- the repos building autonomous AI systems -- scored the lowest as a category. AutoGPT leads at 68, but BabyAGI (17), Autogen (30), and SuperAGI (41) drag the average down. These are the repos building systems that make autonomous decisions, and they have the least governance infrastructure.

ML Libraries (3 repos, avg 62/100)

vLLM (78) lifts this category. scikit-learn and Transformers both score 54 -- solid CI/CD and testing, but weak on enforcement and governance.

Web Frameworks (3 repos, avg 58/100)

FastAPI (62), Pydantic (59), Django (54). These established projects have mature CI/CD but mostly lack AI governance configs and full enforcement tooling.

AI SDKs (4 repos, avg 56/100)

The Anthropic SDK (55), OpenAI SDK (53), LlamaIndex (58), and DSPy (56) cluster tightly in the C range. The Anthropic SDK notably has no pre-commit hooks despite being from the company that makes Claude.

Local AI / Inference (3 repos, avg 53/100)

LiteLLM (72) stands out. Ollama (36) is the weakest -- no enforcement hooks, no test infrastructure detected, and no governance config.

Methodology

All scans were run on March 16, 2026 using the Walseth AI Governance Scanner -- the same tool available for free at walseth.ai/scan. Scores are point-in-time snapshots based on the default branch at scan time.

The scanner analyzes the file tree of each repository via the GitHub API. It checks for the presence of specific files and directories that indicate structural governance. It does not read file contents beyond filenames and paths.
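The core of that approach is a presence check over file paths. A minimal sketch of the idea; the signal files and point weights below are illustrative assumptions, not the scanner's actual rubric:

```python
# Hypothetical governance signals and weights (illustrative only)
GOVERNANCE_SIGNALS = {
    "CLAUDE.md": 8,
    ".cursorrules": 7,
    ".pre-commit-config.yaml": 10,
    "SECURITY.md": 8,
    "CODEOWNERS": 5,
}

def governance_points(tree: list[str]) -> int:
    """Score a repo from its file paths alone; file contents
    are never read, matching the stated methodology."""
    names = {path.rsplit("/", 1)[-1] for path in tree}
    return sum(pts for f, pts in GOVERNANCE_SIGNALS.items() if f in names)

tree = ["README.md", "CLAUDE.md", ".github/CODEOWNERS", "src/main.py"]
print(governance_points(tree))  # 8 + 5 = 13
```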

Repos that fail to scan (private, rate-limited, or not found) are excluded. All 21 repos in this leaderboard scanned successfully.

What Would It Take to Score an A?

No repo in this scan scored an A (80+). To get there, a project would need:

  • Pre-commit hooks AND commit-lint AND CODEOWNERS (25/30 enforcement)
  • 3+ CI/CD workflows (15/15)
  • Security policy + Dependabot + no committed .env files (17-20/20)
  • Test config + test directories (10/10)
  • CLAUDE.md or .cursorrules + governance directory (15/15)
  • README + CONTRIBUTING + LICENSE + lockfile (8-10/10)

The tooling exists. The patterns are well-understood. Most projects just have not prioritized structural enforcement alongside their CI/CD pipelines.

Scan Your Own Repo

Every score in this leaderboard was generated by the same free scanner you can run right now:

Scan your repo free at walseth.ai/scan

Want a deeper analysis? Our $497 Full Governance Report covers 30+ dimensions with specific remediation steps and a compliance roadmap.

View the full interactive leaderboard with sortable columns


Last scanned: March 16, 2026. Scores are point-in-time snapshots. Run the scanner to get the latest score for any repo.

