DoremonAI

AI Just Hit 16.1%: The Remote Labor Index Shows Claude Fable 5 Can Now Do Real Remote Work

Yesterday, the Center for AI Safety (CAIS) and Scale AI Labs released an updated Remote Labor Index (RLI), and it's the most significant AI automation benchmark result we've seen in months.

For the first time, a frontier model has crossed 16% full automation on real, paid remote-work projects. Here's what you need to know.


What Is the Remote Labor Index?

The RLI isn't another multiple-choice test or coding competition leaderboard. It measures whether AI agents can complete real freelance projects — end to end — at a professional standard. We're talking about actual Upwork-style tasks: data entry, graphic design, copywriting, Excel modelling, customer support tickets, and software development.

The latest round evaluated 240 projects across 23 different work domains, with expert human reviewers scoring whether each deliverable was good enough to pay for.

The Numbers That Matter

| Model | Full Automation Rate |
| --- | --- |
| Claude Fable 5 | 16.1% 🏆 |
| Claude Opus 4.8 | 8.3% |
| GPT-5.5 | 6.3% |

Every model tested in this round scored above every previously evaluated model, so the whole frontier has moved up. But Fable 5's result stands out: at 16.1% versus 8.3% for Claude Opus 4.8, it's roughly double the next-best public model.
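
To make "roughly double" concrete, here is a minimal Python sketch that recomputes the gap from the reported rates. The rates and the 240-project total come from this post; the conversion to project counts assumes each rate is a plain fraction of all 240 projects, which is my assumption and not something the RLI write-up is quoted as saying here. The function names are illustrative only.

```python
# Full-automation rates as reported in this post (percent).
rates = {
    "Claude Fable 5": 16.1,
    "Claude Opus 4.8": 8.3,
    "GPT-5.5": 6.3,
}
TOTAL_PROJECTS = 240  # project count quoted in this post


def ratio_to_runner_up(rates):
    """Ratio of the best rate to the second-best rate."""
    ordered = sorted(rates.values(), reverse=True)
    return ordered[0] / ordered[1]


def implied_projects(rate_percent, total=TOTAL_PROJECTS):
    """Projects implied by a rate.

    Assumption: the rate is a simple fraction of all projects in the round.
    """
    return rate_percent / 100 * total


for model, rate in rates.items():
    count = implied_projects(rate)
    print(f"{model}: {rate}% is about {count:.1f} of {TOTAL_PROJECTS} projects")

print(f"Leader vs. runner-up: {ratio_to_runner_up(rates):.2f}x")
```

Under that assumption the leader's margin is a bit under 2x, and 16.1% works out to somewhere near 39 fully automated projects out of 240.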

What This Actually Means

Sixteen percent might not sound huge, but context matters. The previous RLI leader was below 10%, so roughly doubling in a single generation is a genuine leap. If that pace holds, a naive extrapolation suggests frontier models could hit 30–50% full automation within 12–18 months. That is a projection from one jump, not a measured trend.
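
It helps to see what growth rate that 30–50% projection actually requires. The sketch below assumes smooth exponential growth starting from 16.1%, which is purely my modelling assumption (the RLI does not publish a growth curve that I know of), and the helper names are hypothetical. It computes the doubling time needed to reach a target rate in a given number of months.

```python
import math

CURRENT_RATE = 16.1  # percent, Claude Fable 5's score as quoted in this post


def implied_doubling_months(start_rate, target_rate, months):
    """Doubling time (months) needed to grow from start_rate to target_rate.

    Assumes smooth exponential growth; this is an illustration, not RLI data.
    """
    if target_rate <= start_rate:
        raise ValueError("target_rate must exceed start_rate")
    return months / math.log2(target_rate / start_rate)


def projected_rate(start_rate, months, doubling_months):
    """Rate after `months` under the same assumption, capped at 100%."""
    return min(100.0, start_rate * 2 ** (months / doubling_months))


# The four corners of the "30-50% within 12-18 months" range.
for target, months in [(30.0, 12), (30.0, 18), (50.0, 12), (50.0, 18)]:
    t = implied_doubling_months(CURRENT_RATE, target, months)
    print(f"{target:.0f}% in {months} months needs a doubling time of {t:.1f} months")
```

Under this assumption, the corners of the range correspond to doubling times from roughly 7 to 20 months, so the projection only holds if something like the latest jump repeats about once a year or faster.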

For developers and companies, the takeaway is clear: on this benchmark, AI can now complete a non-trivial slice of remote knowledge work end to end, roughly one project in six. Not augment. Replace.

Tasks like:

  • Writing production-ready SQL queries from natural language descriptions
  • Creating slide decks for investor meetings
  • Drafting legal contracts from bullet points
  • Building simple web apps from a single prompt

…are now within the "AI can fully do this" bucket for a significant fraction of real-world projects.

The Bigger Picture

This isn't just a Claude milestone. The RLI is a model-agnostic benchmark, and it's telling us that the entire frontier is shifting. Opus 4.8 and GPT-5.5 also crossed thresholds that no model had hit before. Competition is driving all labs upward.

The question is no longer if AI will automate knowledge work — it's how fast.


Want to dig into the raw data? Check out the CAIS blog post and the RLI leaderboard.
