Yesterday, the Center for AI Safety (CAIS) and Scale AI Labs dropped an updated Remote Labor Index (RLI), and it's the most significant AI automation benchmark result we've seen in months.
For the first time, a frontier model has crossed a 16% full-automation rate on real, paid remote-work projects. Here's what you need to know.
What Is the Remote Labor Index?
The RLI isn't another multiple-choice test or coding competition leaderboard. It measures whether AI agents can complete real freelance projects — end to end — at a professional standard. We're talking about actual Upwork-style tasks: data entry, graphic design, copywriting, Excel modelling, customer support tickets, and software development.
The latest round evaluated 240 projects across 23 different work domains, with expert human reviewers scoring whether each deliverable was good enough to pay for.
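The post doesn't spell out how the "full automation rate" is computed. Here is a minimal sketch that assumes it is simply the share of projects whose deliverable reviewers accepted, overall and per domain. The `ProjectResult` record and both helper functions are hypothetical, and the real RLI calculation likely involves more than this: a plain count over 240 projects can't land on exactly 16.1%, so details such as how reviewer verdicts are combined are not captured here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProjectResult:
    """One evaluated project (hypothetical record shape, not the RLI's schema)."""
    domain: str      # e.g. "graphic design", "data entry"
    accepted: bool   # did reviewers judge the deliverable good enough to pay for?


def automation_rate(results: list[ProjectResult]) -> float:
    """Percentage of projects whose deliverable was accepted (0.0 if no data)."""
    if not results:
        return 0.0
    # Booleans sum as integers, so this counts accepted projects.
    return 100.0 * sum(r.accepted for r in results) / len(results)


def rate_by_domain(results: list[ProjectResult]) -> dict[str, float]:
    """Automation rate computed separately for each work domain."""
    groups: dict[str, list[ProjectResult]] = defaultdict(list)
    for r in results:
        groups[r.domain].append(r)
    return {domain: automation_rate(rs) for domain, rs in groups.items()}


if __name__ == "__main__":
    # Toy data for illustration only, not real RLI results.
    toy = [
        ProjectResult("data entry", True),
        ProjectResult("data entry", False),
        ProjectResult("graphic design", False),
        ProjectResult("copywriting", True),
    ]
    print(f"{automation_rate(toy):.1f}%")  # 50.0%
    print(rate_by_domain(toy))
```

A per-domain breakdown like this is one way to see where agents succeed (say, data entry) versus where they still fall short, which a single headline number hides.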
The Numbers That Matter
| Model | Full Automation Rate |
|---|---|
| Claude Fable 5 | 16.1% 🏆 |
| Claude Opus 4.8 | 8.3% |
| GPT-5.5 | 6.3% |
Every model tested scored above every previously evaluated model, so the whole frontier moved up in this round. But Fable 5's result stands out: at 16.1% versus 8.3% for Claude Opus 4.8, it's roughly double the next best public model.
What This Actually Means
Sixteen percent might not sound huge, but context matters. The previous RLI leader was below 10%. Roughly doubling the top score in a single model generation is a genuine leap. If that pace held, a simple extrapolation would put frontier models at 30–50% full automation within 12–18 months.
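To make the arithmetic behind "roughly double" and the 12–18 month projection concrete, here is a small sketch. The helpers `doubling_ratio` and `extrapolate` are hypothetical, and the 12-month doubling period is an assumed parameter chosen for illustration; neither the post nor the RLI specifies one, and nothing guarantees the trend continues.

```python
def doubling_ratio(new: float, old: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


def extrapolate(rate_pct: float, months: float, doubling_months: float) -> float:
    """Naive exponential projection of an automation rate, capped at 100%.

    Assumes the rate doubles every `doubling_months` months (a hypothetical
    parameter, not something the benchmark reports).
    """
    return min(100.0, rate_pct * 2 ** (months / doubling_months))


if __name__ == "__main__":
    # Fable 5 vs. Opus 4.8 from the table above.
    print(f"{doubling_ratio(16.1, 8.3):.2f}x")  # 1.94x
    for months in (12, 18):
        print(months, f"{extrapolate(16.1, months, doubling_months=12):.1f}%")
```

With a 12-month doubling period this prints 32.2% at 12 months and 45.5% at 18 months, inside the 30–50% range quoted above. A shorter assumed period pushes the numbers up quickly, which is why such projections are very sensitive to that one guess.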
For developers and companies, the takeaway is clear: AI can now replace a non-trivial slice of remote knowledge work, roughly one project in six for the top model on this benchmark. Not augment: replace. Entirely.
Tasks like these now fall into the "AI can fully do this" bucket for a significant fraction of real-world projects:
- Writing production-ready SQL queries from natural-language descriptions
- Creating slide decks for investor meetings
- Drafting legal contracts from bullet points
- Building simple web apps from a single prompt
The Bigger Picture
This isn't just a Claude milestone. The RLI is a model-agnostic benchmark, and it's telling us that the entire frontier is shifting. Opus 4.8 and GPT-5.5 also outscored every model from earlier evaluations. Competition is driving all labs upward.
The question is no longer if AI will automate knowledge work — it's how fast.
Want to dig into the raw data? Check out the CAIS blog post and the RLI leaderboard.