BestCodes for AgentOne

Originally published at blog.agent-one.dev

Introducing the AI Model Directory

Today we're open-sourcing the AI Model Directory, the most comprehensive, automatically updated list of AI models and their metadata available today. It's the data layer that powers model selection in AgentOne, and now it's free for anyone to use, fork, or contribute to.

If you'd rather just look at models, we also built a browser for the directory at models.agent-one.dev where you can search, sort, and compare every model in the directory.

Why Does This Exist?

When building AgentOne, I needed a comprehensive list of AI models and their metadata - costs, context windows, supported features, modalities - so AgentOne could give users easy access to every model an AI provider had to offer.

I was frustrated with the existing options:

  • Models.dev is not comprehensive (it's opinionated), and it often takes anywhere from a few days to weeks for frontier models to be added across all providers
  • LiteLLM is more comprehensive for some providers, but the data is fragmented and harder to work with
  • Portkey Models doesn't list as many models as alternatives do
  • Other catalogs are often developed with a certain product or service in mind, so they wind up being non-agnostic, not comprehensive, or not always up-to-date

The AI Model Directory aims to be easy to use (like Models.dev), truly comprehensive across every provider it includes, and automatically updated with security in mind.

How Does It Work?

A GitHub Actions workflow runs every 24 hours and re-fetches model metadata from every supported provider. Each provider has its own small adapter that knows how to talk to that provider's API or read its docs, and normalizes the response into a single shared schema covering things like:

  • Pricing: input, output, reasoning, cache read/write, audio in/out
  • Limits: context, input, and output token limits
  • Modalities: text, image, audio, video, file (in and out)
  • Features: attachments, reasoning, tool calls, structured output, temperature
  • Metadata: knowledge cutoff, release date, last updated, open weights
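To make the shared schema concrete, here is a rough sketch of what a normalized model record could look like. The field names below are illustrative assumptions based on the categories above, not the directory's actual schema:

```typescript
// Hypothetical sketch of a normalized model record; the real schema in
// the directory may use different field names and shapes.
interface ModelRecord {
  id: string;
  name: string;
  cost?: {
    input?: number; output?: number; reasoning?: number;
    cacheRead?: number; cacheWrite?: number;
  };
  limits?: { context?: number; input?: number; output?: number };
  modalities?: { input: string[]; output: string[] };
  features?: {
    attachments?: boolean; reasoning?: boolean; toolCalls?: boolean;
    structuredOutput?: boolean; temperature?: boolean;
  };
  knowledgeCutoff?: string;
  releaseDate?: string;
  lastUpdated?: string;
  openWeights?: boolean;
}

// Example record showing how an adapter might fill the schema.
const example: ModelRecord = {
  id: "example-model",
  name: "Example Model",
  cost: { input: 0.5, output: 1.5 },
  limits: { context: 128_000, output: 8_192 },
  modalities: { input: ["text", "image"], output: ["text"] },
  features: { reasoning: true, toolCalls: true },
  openWeights: false,
};
```

Because every adapter targets this one shape, downstream consumers never need to know which provider a record came from.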

Every model gets its own folder under data/providers/<provider>/<model-id>/index.toml, so the directory is just a tree of TOML files. This makes it easy to read, easy to diff, and easy to consume from any language. If a provider's data is wrong or missing something, you can drop a metadata.toml (with data overrides) next to the generated file and the next refresh will merge your overrides on top of the fetched data instead of clobbering them.
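The override behavior can be pictured as a recursive merge where hand-written fields win and everything else comes from the fetcher. A minimal sketch, assuming a plain-object representation of the parsed TOML (this is illustrative, not the repo's actual implementation):

```typescript
// Hedged sketch of merging a hand-written metadata.toml override on top
// of freshly fetched data. Nested objects merge recursively; override
// values win on conflicts.
type Data = Record<string, unknown>;

function mergeOverrides(fetched: Data, overrides: Data): Data {
  const out: Data = { ...fetched };
  for (const [key, value] of Object.entries(overrides)) {
    const existing = out[key];
    if (
      value && typeof value === "object" && !Array.isArray(value) &&
      existing && typeof existing === "object" && !Array.isArray(existing)
    ) {
      // Both sides are objects: merge their fields recursively.
      out[key] = mergeOverrides(existing as Data, value as Data);
    } else {
      out[key] = value; // override wins
    }
  }
  return out;
}

// A fetched record with a wrong input price, corrected by an override:
const fetched = { cost: { input: 1.0, output: 2.0 }, name: "Model A" };
const override = { cost: { input: 0.75 } };
const merged = mergeOverrides(fetched, override);
// merged → { cost: { input: 0.75, output: 2.0 }, name: "Model A" }
```

The key property is that untouched fields (like `output` above) keep flowing in from the daily refresh while your correction survives every run.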

To provide an experience similar to models.dev/api.json, a data/all.json file is automatically generated as well, so you can pull the entire directory in one fetch. A minified data/all.min.json is also provided to reduce bandwidth:

https://raw.githubusercontent.com/The-Best-Codes/ai-model-directory/refs/heads/main/data/all.min.json

What's In the Directory?

At launch, the directory tracks models from 35+ providers, including OpenAI, Anthropic, Google, xAI, Mistral, DeepSeek, Cohere, Perplexity, OpenRouter, Vercel, GitHub Copilot, GitHub Models, Hugging Face, Groq, Cerebras, Fireworks, Together, DeepInfra, Baseten, Novita, Alibaba, Inception, Venice, Chutes, Friendli, and many more, and the list keeps growing. If your favorite provider isn't there, open an issue or send a PR; adding a new provider is usually a single small adapter file.

Browse It at models.agent-one.dev

Reading TOML files is great for machines, but not always great for humans. So we built a frontend for the directory at models.agent-one.dev.

It's a fast, sortable, searchable table with a column for everything in the schema. You can search across providers, model IDs, features, and modalities at once, sort by any column, and click straight through to a provider's website. It's the easiest way to answer questions like "which models support reasoning and tool calls under $1 per million input tokens?"

The table loads directly from data/all.min.json in the directory repo, so it's always in sync with the latest run.
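A query like the reasoning-and-tool-calls one above takes only a few lines once you have the JSON locally. The field names below are assumptions about the bundle's shape, so check the real data before relying on them:

```typescript
// Hedged sketch: filter models that support reasoning and tool calls
// under $1 per million input tokens. Field names are assumptions about
// the all.json shape, not a documented contract.
interface Model {
  id: string;
  cost?: { input?: number }; // USD per million input tokens (assumed)
  features?: { reasoning?: boolean; toolCalls?: boolean };
}

function cheapReasoningModels(models: Model[], maxInputCost = 1): Model[] {
  return models.filter(
    (m) =>
      m.features?.reasoning === true &&
      m.features?.toolCalls === true &&
      typeof m.cost?.input === "number" &&
      m.cost.input < maxInputCost,
  );
}

// Tiny sample to show the shape of the result:
const sample: Model[] = [
  { id: "a", cost: { input: 0.5 }, features: { reasoning: true, toolCalls: true } },
  { id: "b", cost: { input: 3.0 }, features: { reasoning: true, toolCalls: true } },
  { id: "c", cost: { input: 0.2 }, features: { reasoning: false, toolCalls: true } },
];
const hits = cheapReasoningModels(sample); // → only "a" passes
```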

Using It in Your Own Project

Consuming the directory is easy. Hit the raw GitHub URL for the bundled file:

curl https://raw.githubusercontent.com/The-Best-Codes/ai-model-directory/main/data/all.json

Or:

curl https://raw.githubusercontent.com/The-Best-Codes/ai-model-directory/main/data/all.min.json

You get back a JSON object keyed by provider, with each provider's models nested inside. This is the easiest path if you just need to populate a model picker or a pricing table. Because everything is plain files, you can fork the repo, add your own provider adapters, drop in metadata.toml for models you've measured yourself, and run the same GitHub Actions workflow on your fork. Your fork stays in sync with upstream while keeping your overrides intact.
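A sketch of walking that provider-keyed structure, e.g. to populate a model picker. The exact nesting shown here is an assumption; inspect the JSON you actually receive:

```typescript
// Hedged sketch: flatten the provider-keyed bundle into (provider, model)
// rows. The `models` nesting and `name` field are assumptions about the
// all.json shape, not a documented contract.
type Bundle = Record<string, { models: Record<string, { name?: string }> }>;

function listModels(
  bundle: Bundle,
): { provider: string; modelId: string; name: string }[] {
  const rows: { provider: string; modelId: string; name: string }[] = [];
  for (const [provider, entry] of Object.entries(bundle)) {
    for (const [modelId, model] of Object.entries(entry.models)) {
      // Fall back to the model ID when no display name is provided.
      rows.push({ provider, modelId, name: model.name ?? modelId });
    }
  }
  return rows;
}

// In practice you would fetch the bundle from the raw GitHub URL first,
// e.g.: const bundle = await (await fetch(RAW_ALL_JSON_URL)).json();
const bundle: Bundle = {
  openai: { models: { "gpt-example": { name: "GPT Example" } } },
  anthropic: { models: { "claude-example": {} } },
};
const rows = listModels(bundle);
```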

Security

Because the directory is updated automatically based on data fetched from third-party providers, the data here is only as trustworthy as the providers it comes from. If you're using this to make billing or routing decisions, treat it as a strong default and not as gospel. We have several measures in place to mitigate the obvious vulnerabilities:

  • Provider endpoints are hardcoded in source, so providers cannot redirect the updater to arbitrary user-controlled URLs
  • All fetched data is validated against a strict Zod schema before it's written to disk, which helps prevent malformed or unexpected fields from slipping through
  • Model IDs are normalized into safe directory names before writing, and entries whose normalized name would be empty are rejected
  • If multiple model IDs normalize to the same directory name, we resolve that deterministically instead of writing multiple conflicting directories
  • Terminal output is sanitized before logging, which reduces the risk of ANSI escape sequences or control characters spoofing the updater output
  • Every network fetch has a 60-second timeout so a slow or hostile provider can't hang the update job forever
  • IDs and names are length-limited and reject raw control characters, which helps defend against weird escapes, invisible junk in logs, and other malformed provider output
  • Generated model directories that no longer exist upstream are removed automatically on refresh
  • Overrides stay local: metadata.toml only applies to that model directory and is merged on top of fetched data
  • The updater does not execute provider-supplied code, shell commands, or HTML; it only fetches remote content, parses it, validates it, and writes normalized TOML files
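To make the ID-sanitization steps concrete, here is a hedged sketch of the kind of normalization described above; the actual rules and length limit in the repo may differ:

```typescript
// Hedged sketch of normalizing a model ID into a safe directory name:
// strip raw control characters, replace path-unfriendly characters,
// and enforce a length limit. The real updater's rules may differ.
const MAX_NAME_LENGTH = 128; // assumed limit

function normalizeModelId(id: string): string | null {
  const cleaned = id
    .replace(/[\u0000-\u001f\u007f]/g, "") // drop raw control characters
    .replace(/[^a-zA-Z0-9._-]/g, "-")      // replace unsafe path characters
    .replace(/-+/g, "-")                   // collapse runs of dashes
    .replace(/^[-.]+|[-.]+$/g, "")         // trim leading/trailing dashes/dots
    .slice(0, MAX_NAME_LENGTH);
  return cleaned.length > 0 ? cleaned : null; // reject empty results
}

normalizeModelId("openai/gpt-4o"); // → "openai-gpt-4o"
normalizeModelId("///");           // → null (normalizes to empty, rejected)
```

Rejecting empty results is what prevents a degenerate provider ID from writing to the provider's root directory or colliding with `.`-style paths.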

That said, this is still provider-supplied metadata. A provider can lie about pricing, capabilities, limits, or release dates, and some providers expose better metadata than others. The goal here is to make the pipeline safe and robust, not to pretend third-party metadata is perfectly trustworthy.

What's Next

This is a beta release, so expect a few rough edges. Some of the things we're working on:

  • More providers (especially regional and self-hosted offerings)
  • A proper docs site
  • Programmatic SDKs for JS/TS, Python, and Go

If you want to help shape any of this, join us on Discord, open an issue, or send a PR.

Try It Out

Explore the repo at github.com/The-Best-Codes/ai-model-directory, browse the models at models.agent-one.dev, and happy building!
