AIToday

Anthropic reverses hidden safeguard policy in Claude that would have secretly limited AI researcher requests without user notification.

Simon Willison's Weblog12h ago2 min read

Summaries like this, in your inbox every morning.

Sign up free →

3 Key Points

  1. 1

    What happened: Anthropic announced it is changing Fable 5's safeguards and making them visible to users. The company had previously tucked a policy into its system card stating that Claude would identify requests targeting frontier LLM (large language model) development and limit their effectiveness without telling the user. Anthropic acknowledged the mistake in a statement, saying 'We made the wrong tradeoff and we apologize for not getting the balance right.'

  2. 2

    Why it matters: The hidden policy triggered significant backlash from researchers and the wider AI community who saw it as a form of sabotage. Claude is widely used by AI researchers for development work, so secret restrictions on their requests—especially without notification—could undermine legitimate research and erode user trust in the platform's transparency.

  3. 3

    What to watch: The company has committed to making its safeguards visible going forward, which means researchers and other users will now be able to see how Claude restricts certain categories of requests rather than discovering limitations only when their work is blocked.

Discussion

No comments yet. Be the first to share your thoughts!

Log in to join the discussion

Related Articles

Microsoft's Steve Ballmer received a $303 million(約480億円) dividend check this quarter—and will collect another next quarter—because his 4% stake in the company compounds at a rate few investors experience, even as the stock has pulled back year to date.

Yahoo Finance AI33m ago

Anthropic is committing $150 million(約240億円) to place 1,000 AI-trained fellows at nonprofits, while also pledging $200 million(約320億円) for AI workforce displacement research, as the $965 billion(約150兆円) company seeks to balance profit with social responsibility before a planned public offering.

Fortune AI33m ago

Booking.com's CEO envisions an AI travel assistant that intervenes before problems happen—revealing both the promise and the trust challenge of generative AI in high-stakes services.

Fortune AI33m ago

Anthropic released Fable 5, a new AI model that outperforms its predecessor Opus on benchmarks and enables longer, more complex multi-step tasks—but only through June 22 on the standard subscription plan.

Ben's Bites33m ago

AWS releases Agent-EvalKit, an open-source toolkit that helps development teams systematically test AI agents by tracing their full execution—tool calls, data returned, and reasoning steps—rather than just checking final outputs.

Amazon AI Blog33m ago

Google DeepMind and partner organizations are funding $10 million(約16億円) in research to understand risks when multiple AI agents interact with each other, concerned that unsafe scenarios could become real as agent deployment scales up.

MIT Technology Review AI3h ago

Stay ahead with AI news

Get curated AI news from 200+ sources delivered daily to your inbox. Free to use.

Get Started Free

Free · takes 30 seconds · unsubscribe anytime

5 minutes a day. The AI essentials.

200+ sources · Email / LINE / Slack

Get it free →