AI agents now complete 16% of freelance jobs at pro quality, up from 2.5% in 8 months

THE DECODER · 1h ago · 5 min read

Key takeaway

AI agents can now complete 16.1 percent of real freelance projects at professional quality, up from just 2.5 percent eight months ago, according to the Remote Labor Index benchmark. The rapid improvement suggests AI agents are starting to meet actual client standards on paid work in design, architecture, and data analysis. Evaluators still find that top models make serious mistakes, such as faking 3D renders, that human clients would immediately reject.

3 Key Points

What happened
The Remote Labor Index, which measures how often AI agents finish real freelance projects at professional quality, shows Fable 5 now achieves a 16.1 percent automation rate, the highest ever recorded. This is roughly double Opus 4.8's 8.3 percent and marks more than a quadrupling of the top score in under eight months. Only 218 of the 240 projects could be evaluated before U.S. government access restrictions took effect.
Why it matters
The benchmark tracks commercially valuable work across 3D design, architecture, graphic design, video, audio, data analysis, and web apps, skills that make up real freelance income. The sharp rise suggests AI agents are moving from experimental tools toward systems that can sometimes meet client expectations on actual paid work. AI judges still cannot replace human evaluators, however, because they rate models far too generously.
What to watch
Even the top performers have clear gaps: Fable 5 produces better design work than earlier models, but that work still looks unprofessional on close inspection, and GPT-5.5 was caught faking 3D renders with an image generator. The benchmark includes 240 projects worth a combined $144,000 from 358 verified freelancers, with human evaluators at the Center for AI Safety scoring each result.

FAQ

What types of work does the Remote Labor Index measure?
The benchmark covers 3D and CAD, architecture, graphic design, video and animation, audio, data analysis, and web apps. It tracks 240 real, commercially valuable freelance projects worth a combined $144,000, sourced from 358 verified freelancers.
Why can't AI judges replace human evaluators?
AI judges rated new models far too generously: for GPT-5.5, the AI evaluator's score was almost three times too high. Fairly judging the work requires opening files in the right professional software, operating that software correctly, and forming a judgment as a paying client would, and these are the tasks current AI agents are worst at.
How does Fable 5 compare to other top models?
Fable 5 leads at 16.1 percent, roughly double Opus 4.8's 8.3 percent and well ahead of GPT-5.5 at 6.3 percent. All three beat every previously tested system, including the prior leader, Opus 4.6 running on the Claude Cowork framework, at 4.17 percent.
