AIToday

Anthropic reverses hidden AI safeguards policy on Claude Fable 5 after researchers warned it would unfairly block competitors from using the model for AI development.

WIRED AI12h ago2 min read
Anthropic reverses hidden AI safeguards policy on Claude Fable 5 after researchers warned it would unfairly block competitors from using the model for AI development.

Summaries like this, in your inbox every morning.

Sign up free →

3 Key Points

  1. 1

    What happened: Anthropic announced it would remove invisible performance degradation safeguards from Claude Fable 5, its latest AI model. The company had planned to secretly degrade the model's performance when it detected users trying to build competing AI systems, but will now make these safeguards visible to users and alert them when safeguards are triggered or requests are refused.

  2. 2

    Why it matters: Researchers and AI safety experts raised fierce backlash after learning about the hidden safeguard approach, arguing that covert degradation would unfairly limit AI research outside Anthropic's lab. Critics pointed out the policy could have harmed third-party evaluation firms testing frontier models for safety and reliability, and would leave developers uncertain whether they were violating Anthropic's terms of service.

  3. 3

    What to watch: Anthropic says that because safeguards are now visible, the company needs to cast a wider net, meaning more benign requests may trigger its safeguards. The company states it is working to make its classifiers more precise as quickly as possible.

Discussion

No comments yet. Be the first to share your thoughts!

Log in to join the discussion

Related Articles

Microsoft's Steve Ballmer received a $303 million(約480億円) dividend check this quarter—and will collect another next quarter—because his 4% stake in the company compounds at a rate few investors experience, even as the stock has pulled back year to date.

Yahoo Finance AI34m ago

Anthropic is committing $150 million(約240億円) to place 1,000 AI-trained fellows at nonprofits, while also pledging $200 million(約320億円) for AI workforce displacement research, as the $965 billion(約150兆円) company seeks to balance profit with social responsibility before a planned public offering.

Fortune AI34m ago

Booking.com's CEO envisions an AI travel assistant that intervenes before problems happen—revealing both the promise and the trust challenge of generative AI in high-stakes services.

Fortune AI34m ago

Anthropic released Fable 5, a new AI model that outperforms its predecessor Opus on benchmarks and enables longer, more complex multi-step tasks—but only through June 22 on the standard subscription plan.

Ben's Bites34m ago

AWS releases Agent-EvalKit, an open-source toolkit that helps development teams systematically test AI agents by tracing their full execution—tool calls, data returned, and reasoning steps—rather than just checking final outputs.

Amazon AI Blog34m ago

Google DeepMind and partner organizations are funding $10 million(約16億円) in research to understand risks when multiple AI agents interact with each other, concerned that unsafe scenarios could become real as agent deployment scales up.

MIT Technology Review AI3h ago

Stay ahead with AI news

Get curated AI news from 200+ sources delivered daily to your inbox. Free to use.

Get Started Free

Free · takes 30 seconds · unsubscribe anytime

5 minutes a day. The AI essentials.

200+ sources · Email / LINE / Slack

Get it free →