How I Automatically Discovered All 7 of ANTHROPIC's AI Models
TL;DR
- ANTHROPIC has 7 active AI models in 1 family
- 0 strongly classified, 0 partially classified, 7 need metadata review
- Built automated discovery agent: LangGraph + Gemini enrichment + BigQuery
- Discovered on: May 15, 2026
Why This Matters
Even a lineup of 7 models is hard to choose from when the metadata describing each one is sparse or ambiguous. Picking the right model by name alone is overwhelming.
The Discovery Method
I built a LangGraph agent that:
- Tier 1 (Official API): Queries ANTHROPIC's /v1/models endpoint directly (100% verified)
- Enrichment: Uses Gemini API to analyze each of 7 models
- Storage: Persists everything in BigQuery for team access
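- Confidence Scoring: Assigns each model a semantic confidence score from 0 to 1.0
All 7 models came directly from the official API (source confidence: 1.0). Source confidence only says where an entry came from. It is separate from the semantic confidence that the enrichment step assigns and that the buckets below are based on.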
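The Tier 1 step can be sketched with the standard library alone. This is a minimal illustration, not the agent's actual code: it assumes the endpoint is `https://api.anthropic.com/v1/models` with the `x-api-key` and `anthropic-version` headers, and the function names and record fields (`parse_models`, `fetch_models`, `source_confidence`, `semantic_confidence`) are hypothetical. It reads a single page and ignores pagination.

```python
import json
import urllib.request

# Assumed endpoint for the official model listing.
API_URL = "https://api.anthropic.com/v1/models"


def parse_models(payload: dict) -> list[dict]:
    """Turn a /v1/models response body into flat records for enrichment."""
    records = []
    for item in payload.get("data", []):
        records.append({
            "model_id": item["id"],
            "display_name": item.get("display_name"),
            "created_at": item.get("created_at"),
            # Source confidence: the entry came straight from the official API.
            "source_confidence": 1.0,
            # Semantic confidence is filled in later by the enrichment step.
            "semantic_confidence": None,
        })
    return records


def fetch_models(api_key: str) -> list[dict]:
    """Fetch one page of models from the official endpoint (network call)."""
    request = urllib.request.Request(
        API_URL,
        headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_models(json.load(response))
```

Calling `fetch_models(os.environ["ANTHROPIC_API_KEY"])` would return one record per model. Keeping the parsing in its own function means it can be checked against a saved response without touching the network.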
Model Confidence Distribution
| Confidence Level | Count | What It Means |
|---|---|---|
| Strongly Classified (≥0.8) | 0 | Strong metadata classification signal |
| Partially Classified (0.5 to <0.8) | 0 | Partial metadata signal |
| Needs Review (<0.5) | 7 | Sparse/ambiguous metadata signal |
| Total | 7 | All ANTHROPIC models |
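The bucketing in the table can be expressed as a small function. This is a sketch under stated assumptions: a score of exactly 0.8 falls in the strong bucket, a score of exactly 0.5 falls in the partial bucket, and the names `bucket` and `distribution` are hypothetical. The scores in the usage note are made up for illustration.

```python
from collections import Counter

STRONG_THRESHOLD = 0.8   # lower bound of "Strongly Classified"
PARTIAL_THRESHOLD = 0.5  # lower bound of "Partially Classified"


def bucket(confidence: float) -> str:
    """Map a semantic confidence score in [0, 1] to its table bucket."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= STRONG_THRESHOLD:
        return "Strongly Classified"
    if confidence >= PARTIAL_THRESHOLD:
        return "Partially Classified"
    return "Needs Review"


def distribution(scores: dict[str, float]) -> Counter:
    """Count how many models land in each bucket."""
    return Counter(bucket(score) for score in scores.values())
```

For example, `distribution({"model-a": 0.3, "model-b": 0.45})` counts both hypothetical models under "Needs Review", which is the shape of the snapshot above, where all 7 models land in that bucket.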
Model Families (1)
claude
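One simple way to derive a family from a model ID is to take the prefix before the first hyphen. The post does not say how families were assigned, so this is only an assumed heuristic. The helper names and the example IDs are hypothetical.

```python
from collections import defaultdict


def family_of(model_id: str) -> str:
    """Assumed heuristic: the family is the text before the first hyphen."""
    return model_id.split("-", 1)[0].lower()


def group_by_family(model_ids: list[str]) -> dict[str, list[str]]:
    """Group model IDs by their derived family name."""
    families = defaultdict(list)
    for model_id in model_ids:
        families[family_of(model_id)].append(model_id)
    return dict(families)
```

With IDs such as `claude-example-a` and `claude-example-b`, every entry groups under `claude`, which matches the single family reported here.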
Strongly Classified Metadata (0)
No entries currently reach this bucket. Entries that do would have a strong metadata classification signal:
- Semantic confidence ≥0.8
- Clear purpose/family/capability mapping
- Better downstream filtering fidelity
Metadata Needing Review (7)
These 7 entries have weak metadata classification signal:
- Semantic confidence <0.5
- Ambiguous naming or sparse metadata
- Requires manual review for precise categorization
Key Insights
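- Scale: 7 active models is a small catalog, but it still has enough variants to need structured metadata
- Fragmentation: All 7 models sit in 1 family (claude), so the differences are between variants, not between families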
- Metadata Quality: Confidence buckets reflect enrichment signal, not model quality
- Automation: A full discovery run takes minutes, while a hand-maintained list needs repeated manual checks
- Versioning: Multiple variants within the single claude family
Next Steps
- For Teams: Import this data into your model selection matrix
- For Monitoring: Re-run discovery quarterly to catch new releases
- For Decision-Making: Use classification signal as a metadata quality indicator
- For Production: Evaluate models independently of metadata confidence buckets
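For the monitoring step, re-running discovery is most useful when the new snapshot is compared with the previous one. The sketch below assumes each snapshot is just a set of model IDs. The function name `diff_snapshots` and the IDs in the usage note are hypothetical.

```python
def diff_snapshots(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare two discovery runs and report what changed between them."""
    return {
        "added": sorted(current - previous),      # new releases
        "removed": sorted(previous - current),    # no longer listed
        "unchanged": sorted(previous & current),  # present in both runs
    }
```

For example, `diff_snapshots({"claude-example-a"}, {"claude-example-a", "claude-example-b"})` reports `claude-example-b` under `added`, which is the signal to enrich and classify that model before the next review.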
The Tool
Source: ANTHROPIC's 7 models from official API
Method: LangGraph agent with Gemini semantic enrichment
Storage: BigQuery
Runtime: Minutes (fully automated)
This is the future: automated, data-driven model ecosystem management.
Data snapshot: May 15, 2026 | Total models: 7 | Families: 1
Tags: AI, LLM, ANTHROPIC, ModelOps, Automation, BigQuery, Gemini
Top comments (1)
Automating model discovery is solving the unglamorous half of the routing problem nobody wants to own: the catalog goes stale constantly (new model drops, an old one deprecates, pricing shifts) and a hand-maintained list is wrong within weeks, so a discovery agent that keeps the inventory current is genuinely useful infrastructure. The part I'd push on is that discovery is step one and classification is where the value is: knowing 7 models exist doesn't help you choose; knowing which one is cheapest-that-clears-the-bar for a given task does. Your TL;DR flags exactly this (7 need metadata review, 0 classified), and that's the hard, ongoing work: capability and cost tags that are accurate enough to route against, which usually means your own task evals, not just the spec sheet, because published benchmarks don't predict your workload. Discovery keeps the menu current; classification turns the menu into a decision. Once you have both, routing is just policy on top. That auto-maintained-catalog-feeding-per-task-routing is exactly how I think about model selection in Moonshift. Are you enriching classification from benchmarks, or planning to score models against your own tasks so the routing is grounded in real performance?