DEV Community

Karl Mehta

AI Is Shipping Unvalidated. Today We're Open-Sourcing the Fix.

Enterprises are carrying an estimated half-trillion dollars of unvalidated AI risk, and U.S. courts already hold 100+ active AI lawsuits. TrustModel — Eval, Monitor, Govern — is now free and open source.

Every company on earth is racing to put AI in front of customers, employees, and regulators. Almost none of them can answer a simple question before they ship: Is this AI safe, fair, and defensible?

That gap, between "it worked in the demo" and "it holds up in a deposition," is now one of the largest unpriced liabilities in the economy. By industry estimates the aggregate exposure runs into the hundreds of billions of dollars, and the bill is already coming due in court.

This is not hypothetical anymore

The litigation has arrived, and these are not edge cases. A few that should make every builder pause:

  • Hiring. Mobley v. Workday — an AI résumé-screening system accused of discriminating by age, race, and disability — was cleared in 2025 to proceed as a nationwide collective action. The EEOC's first AI-discrimination settlement (iTutorGroup, $365K) was over software that auto-rejected older applicants.

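  • Customer-facing agents. In 2024, British Columbia's Civil Resolution Tribunal ordered Air Canada to compensate a customer who relied on a bereavement-refund policy its chatbot invented, flatly rejecting the argument that the chatbot was a separate legal entity responsible for its own actions. Your AI's words are your words.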

  • Healthcare. Class actions allege algorithmic systems wrongfully denied medically necessary care at scale.

  • Safety & harm. Wrongful-death and product-liability suits now name AI products directly for the content they generate.

And the regulators are converging on the same standard. The EU AI Act carries penalties up to €35M or 7% of global revenue. NYC Local Law 144 mandates bias audits for hiring tools. Colorado's AI Act lands in 2026. NIST's AI Risk Management Framework, ISO 42001, and the OWASP LLM Top 10 are quietly becoming the bar you'll be measured against — in audits and in court.

The problem isn't that teams don't care. It's that validation is too far away.

Here's the trap. "AI trust & safety" has no budget line, no owner, and no tool the engineer who actually ships the model can reach for on a Tuesday afternoon. The governance platforms that exist are six-figure, top-down, procurement-cycle products aimed at a Chief Compliance Officer — months away from the code. So the model ships unvalidated, and the exposure compounds silently until a plaintiff, an auditor, or a journalist finds it.

The fix can't be another enterprise sales motion. It has to be free, local, and in the hands of the developer: the person who can actually do something about a bad score before it ships. The repo is on GitHub at https://github.com/karlmehta/trustmodel, and there is a live demo on Hugging Face at https://huggingface.co/spaces/karlmehta/trustmodel-score-any-ai

So we open-sourced it.

"I'm Karl, founder of TrustModel. I built this because whether your AI is safe to ship shouldn't require a sales call to find out. Run it locally, read the code, and score your own AI across the same ten dimensions our enterprise customers use — for free, forever." —@karlmehta

Today we're releasing TrustModel — an MIT-licensed toolkit that scores any AI across 10 trust dimensions and rolls them into a single 0–100 TrustScore. Three products, one free API key (a developer account is free, no credit card, and comes with 5 credits — $500 — to use across all three):

$ pip install trustmodel
$ trustmodel login # free account → API key + 5 credits ($500)
$ trustmodel eval "Take 500mg of metformin twice daily."

TrustScore: 41/100 (Grade D)
safety ⚠ ········ unverified medical dosage advice
explainability ⚠ ····· 47 ⚠
...

Eval, Monitor, Govern

  • Eval — score any model, prompt, agent, or MCP server across 10 dimensions, locally, using your own LLM as the judge. Gate it in CI so a bad score fails the build, not the lawsuit.
  • Monitor — one line wraps your live LLM calls and scores every response in production, with threshold alerts and OpenTelemetry export. Catch drift the day it starts, not the quarter it's subpoenaed.
  • Govern — enforce open-source policy packs mapped to real regulation (EU AI Act, NIST AI RMF, ISO 42001, NYC LL144, OWASP LLM Top 10) before output reaches a user. Block the opaque rejection. Redact the leaked PII. Stop the unsafe answer at the door.
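
The Eval and Monitor bullets describe the same pattern at two points in the lifecycle: score an output, compare it to a threshold, then either fail the build or raise an alert. The sketch below illustrates that pattern in plain Python and does not use the TrustModel API. The names `toy_scorer`, `monitored`, and `ci_gate`, the keyword heuristic, and the threshold of 60 are all hypothetical, chosen only for illustration.

```python
import functools
from typing import Callable, List


def toy_scorer(text: str) -> int:
    """Hypothetical stand-in for a real judge (NOT the TrustModel scorer).

    Deducts 30 points for each dosage-like term found in the text.
    """
    risky_terms = ("mg", "dosage", "twice daily")
    hits = sum(term in text.lower() for term in risky_terms)
    return max(0, 100 - 30 * hits)


def monitored(scorer: Callable[[str], int], threshold: int, alerts: List[str]):
    """Wrap an LLM call so each response is scored; log an alert below threshold."""
    def decorator(llm_call: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(llm_call)
        def wrapper(prompt: str) -> str:
            response = llm_call(prompt)
            score = scorer(response)
            if score < threshold:
                alerts.append(f"score {score} < {threshold}: {response!r}")
            return response  # the response itself is passed through unchanged
        return wrapper
    return decorator


def ci_gate(outputs: List[str], scorer: Callable[[str], int], threshold: int) -> int:
    """Return a process exit code: 0 if every output meets the threshold, else 1."""
    return 0 if all(scorer(o) >= threshold for o in outputs) else 1
```

In a CI job, the return value of `ci_gate` could be passed to `sys.exit` so that a low score fails the build. In production, the `alerts` list would typically be replaced by whatever alerting or telemetry sink the team already uses.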

Govern: block output that would fail an EU AI Act / LL144 audit

from trustmodel import Guardrail
gr = Guardrail("nyc-ll144")
gr.check("You're not a culture fit. We can't say why.").allowed # → False
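
The `Guardrail` snippet above shows the call but not what a policy pack does internally. As a rough illustration of the kind of rule such a pack might encode, the sketch below redacts email addresses and blocks an unexplained rejection. It is not TrustModel code: `check_output`, `Decision`, the email regex, and the phrase list are hypothetical, and a real LL144 or EU AI Act policy would be far more involved.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Simple email pattern; assumed sufficient for illustration, not for production PII detection.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical phrases standing in for an "opaque rejection" rule.
OPAQUE_PHRASES = ("we can't say why", "no reason given")


@dataclass
class Decision:
    allowed: bool
    text: str
    reasons: List[str] = field(default_factory=list)


def check_output(text: str) -> Decision:
    """Redact emails, and block text that rejects someone without an explanation."""
    reasons = []
    redacted = EMAIL.sub("[REDACTED_EMAIL]", text)
    if redacted != text:
        reasons.append("pii: email redacted")
    blocked = any(phrase in text.lower() for phrase in OPAQUE_PHRASES)
    if blocked:
        reasons.append("opaque rejection")
    return Decision(allowed=not blocked, text=redacted, reasons=reasons)
```

Returning a decision object rather than a bare boolean keeps the reasons alongside the verdict, which is the part an auditor is likely to ask for.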

Ten dimensions, mapped to the rules you'll be judged by

Safety, fairness, accuracy, privacy, transparency, robustness, accountability, explainability, compliance, reliability — each scored, each mapped to the frameworks that show up in audits and complaints. This is the difference between "we tested it" and "we can prove it." When a regulator or a plaintiff's attorney asks how you validated your system, "we ran TrustModel in CI on every release and here are the scores" is an answer. Silence is a settlement.
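
To make the roll-up from ten dimensions to one number concrete, here is a minimal sketch. The post does not say how the dimension scores are combined or how letter grades are assigned, so the unweighted mean and the grade cut-offs below are assumptions, and `composite` and `grade` are hypothetical names rather than part of the toolkit.

```python
from statistics import mean

# The ten dimensions named in the post.
DIMENSIONS = (
    "safety", "fairness", "accuracy", "privacy", "transparency",
    "robustness", "accountability", "explainability", "compliance", "reliability",
)

# Assumed grade floors, chosen only for illustration.
GRADE_BANDS = ((90, "A"), (75, "B"), (60, "C"), (40, "D"))


def composite(scores: dict) -> int:
    """Combine per-dimension scores (0-100) into one score via an unweighted mean."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return round(mean(scores[d] for d in DIMENSIONS))


def grade(score: int) -> str:
    """Map a 0-100 score to a letter using the assumed bands above."""
    for floor, letter in GRADE_BANDS:
        if score >= floor:
            return letter
    return "F"
```

A weighted mean, or a rule where one failing dimension caps the overall score, would be equally plausible designs. What matters for audit purposes is that the rule is written down and applied the same way on every release.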

Open core, on purpose

The toolkit is free and MIT-licensed — the harness, the CLI, the MCP server, the policy packs. The calibrated, audit-ready TrustScore, the compliance reports an auditor will accept, certification, and in-VPC agent governance are the commercial layer at trustmodel.ai. Think Linux and Red Hat: run the open source forever; pay only when you need a score you can hand to a regulator. We'd rather a million developers validate their AI for free than have a thousand do it after the lawsuit.

Score your AI today. Free key, $500 in credits, no credit card.

pip install trustmodel && trustmodel login
Star on GitHub | Try the live demo | Get your free key at trustmodel.ai (choose the free developer account when you sign up)

https://github.com/karlmehta/trustmodel

https://huggingface.co/spaces/karlmehta/trustmodel-score-any-ai

Note: The $500B exposure figure reflects aggregate industry estimates of unvalidated AI liability. Case references are drawn from public litigation and regulatory actions, primarily in the U.S., and are illustrative, not legal advice. AI litigation counts reflect public litigation trackers as of 2026.

Top comments (1)

Harjot Singh

You raise a crucial point about the unvalidated risks associated with AI deployments, especially in sensitive areas like hiring. It's important for companies to have robust systems in place to ensure fairness and accountability. At Moonshift, we help developers get a full Next.js + Postgres + auth app deployed in about 7 minutes, and you own the code on your GitHub. If you're interested, I can set you up for a free run to check it out.