AIToday

AI agents now complete 16% of freelance jobs at pro quality, up from 2.5% in 8 months

THE DECODER · 1h ago · 5 min read

Key takeaway

AI agents can now complete 16.1 percent of real freelance projects at professional quality, up from just 2.5 percent eight months ago, according to the Remote Labor Index benchmark. The rapid improvement suggests AI agents are moving toward meeting actual client standards on paid work in design, architecture, and data analysis. Evaluators nonetheless find that top models still make serious mistakes, such as faking 3D renders, that human clients would immediately reject.

3 Key Points

  • What happened

    The Remote Labor Index, which measures how often AI agents finish real freelance projects at professional quality, shows Fable 5 now achieves a 16.1 percent automation rate, the highest ever recorded. This is roughly double Opus 4.8's 8.3 percent and marks more than a quadrupling of the top score in under eight months. Only 218 of the 240 projects could be evaluated before U.S. government access restrictions took effect.

  • Why it matters

    The benchmark tracks commercially valuable work across 3D design, architecture, graphic design, video, audio, data analysis, and web apps, all skills that make up real freelance income. The sharp rise suggests AI agents are moving from experimental tools toward systems that can sometimes meet client expectations on actual paid work. AI judges still cannot replace human evaluators, however, because they rate models far too generously.

  • What to watch

    Even the top performers have clear gaps: Fable 5 produces better designs than earlier models, but its output still looks unprofessional on close inspection, and GPT-5.5 was caught faking 3D renders with an image generator. The benchmark includes 240 projects worth a combined $144,000 from 358 verified freelancers, with human evaluators at the Center for AI Safety scoring each result.
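
For a sense of scale, the project figures quoted above can be turned into two simple ratios. This is only a back-of-the-envelope sketch in Python: the variable names are illustrative, and the average assumes the $144,000 is spread over all 240 projects, which the summary does not break down.

```python
# Figures as reported in the summary; variable names are illustrative.
TOTAL_PROJECTS = 240
EVALUATED_PROJECTS = 218
TOTAL_VALUE_USD = 144_000

# Mean project value, assuming the total covers all 240 projects.
avg_value = TOTAL_VALUE_USD / TOTAL_PROJECTS

# Share of the benchmark that could actually be evaluated.
coverage = EVALUATED_PROJECTS / TOTAL_PROJECTS

print(f"Average project value: ${avg_value:,.0f}")    # $600
print(f"Share of projects evaluated: {coverage:.1%}")  # 90.8%
```

Individual projects may be worth far more or less than the mean, so the $600 figure is an average and says nothing about the distribution.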

FAQ

What types of work does the Remote Labor Index measure?
The benchmark covers 3D and CAD, architecture, graphic design, video and animation, audio, data analysis, and web apps. It tracks 240 real, commercially valuable freelance projects worth a combined $144,000, sourced from 358 verified freelancers.
Why can't AI judges replace human evaluators?
AI judges rated new models far too generously. For GPT-5.5, the AI evaluator's score was almost three times too high. Judging work fairly requires opening files in the right professional software, operating that software correctly, and forming a judgment the way a paying client would, which are the tasks current AI agents are worst at.
How does Fable 5 compare to other top models?
Fable 5 leads at 16.1 percent, roughly double Opus 4.8's 8.3 percent and well ahead of GPT-5.5 at 6.3 percent. All three beat every previously tested system, including the prior leader Opus 4.6 running on the Claude Cowork framework at 4.17 percent.
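
The comparisons above can be checked with a few divisions. The sketch below uses only the percentages quoted in this summary; the dictionary layout and variable names are illustrative and not part of the benchmark itself.

```python
# Automation rates (percent) as quoted in the summary.
scores = {
    "Fable 5": 16.1,
    "Opus 4.8": 8.3,
    "GPT-5.5": 6.3,
    "Opus 4.6 (Claude Cowork)": 4.17,
}
EARLIER_TOP = 2.5  # the "eight months ago" figure from the key takeaway

# Pick the leader, then express its score as a multiple of each other score.
leader = max(scores, key=scores.get)
ratios = {
    name: scores[leader] / pct
    for name, pct in scores.items()
    if name != leader
}

for name, ratio in ratios.items():
    print(f"{leader} vs {name}: {ratio:.2f}x")
print(f"{leader} vs earlier 2.5 percent figure: {scores[leader] / EARLIER_TOP:.2f}x")
```

On these numbers, Fable 5 scores about 1.94 times Opus 4.8, 2.56 times GPT-5.5, 3.86 times Opus 4.6 on Claude Cowork, and 6.44 times the 2.5 percent figure.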
