
the AI transparency index: the numbers are uncomfortable, but you should know them

stanford's foundation model transparency index dropped its december 2025 edition and if you build anything on top of these models, you should probably read it.

the mean score dropped 17 points, from 58 to 41. meta is down 29, mistral down 37, openai down 14. this is not a documentation problem; these companies have entire policy teams. it's a choice.

a few things that stood out to me:

  • ibm scored 95. first place across all three years. nobody talks about this.
  • open-weight ≠ transparent. deepseek and alibaba release weights and still scored 32 and 26. publishing weights is not the same as being auditable.
  • training data is still a black box everywhere. what they trained on, whether they had licenses, how they handled pii — consistently the worst-scoring subdomain, three years running.
  • anthropic didn't submit a report. the fmti team built one manually. anthropic ranked 2nd. good score, bad signal.

as engineers we're the ones building on top of these systems. when something goes wrong in production, "we didn't disclose how we trained it" is not an answer you can give anyone.

the index doesn't fix that. but it names who's trying to be honest versus who's retreating as market share grows. that's useful signal when choosing what to build on.

why should you know about the fmti?

most people pick their ai provider based on benchmarks, pricing, or vibes. the foundation model transparency index measures something different: how honest a company is about what they actually built.

that matters more than most engineers realize.

when you integrate a model into a product, you inherit its risks: biased outputs, leaked training patterns, copyright exposure, opaque safety evaluations. you can't audit what was never disclosed. and when something breaks, you're the one explaining it to stakeholders, not the lab.

the fmti gives you a structured way to ask: does this provider tell me enough to reason about what i'm building on?

it's not perfect. scores can be gamed, and disclosure isn't the same as safety. but it's one of the few independent, recurring attempts to hold this industry accountable before regulators do it badly.

if you're doing vendor evaluation, building on llms in a regulated domain, or just tired of treating "trust us" as an architecture decision, this index is worth bookmarking.
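one way to make that concrete in a vendor evaluation: treat transparency as a hard floor, then rank whoever clears it. here's a toy sketch. the fmti scores are the ones cited in this post (ibm 95, deepseek 32, alibaba 26); the benchmark scores, the 70/30 weighting, and the floor of 40 are made-up placeholders, not recommendations.

```python
# toy vendor-evaluation sketch: transparency as a gate, not just a tiebreaker.
# FMTI scores are from the december 2025 edition as cited in this post;
# BENCH values and all weights/thresholds are hypothetical placeholders.

FMTI = {"ibm": 95, "deepseek": 32, "alibaba": 26}   # from the index
BENCH = {"ibm": 70, "deepseek": 85, "alibaba": 80}  # placeholder numbers

def combined_score(vendor, w_bench=0.7, w_fmti=0.3, min_fmti=40):
    """weighted capability+transparency score; vendors below the
    transparency floor are rejected outright (returns None)."""
    if FMTI[vendor] < min_fmti:
        return None  # fails the transparency gate
    return w_bench * BENCH[vendor] + w_fmti * FMTI[vendor]

ranked = sorted(
    (v for v in FMTI if combined_score(v) is not None),
    key=combined_score,
    reverse=True,
)
print(ranked)  # with a floor of 40, only ibm survives here
```

the point of the gate, rather than folding transparency into one blended number, is that a high benchmark score can't buy back an unauditable model. tune the floor to your own risk tolerance.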
