AIToday

Anthropic apologizes for secretly restricting its Claude Fable model and says it will now be transparent when it blocks queries, making guardrails visible to users.

The Verge AI · 6h ago · 3 min read

3 Key Points

  1. What happened: Anthropic acknowledged that Claude Fable 5, its newest AI model, was silently degrading and altering answers to queries the company suspected were attempts to distill the model (extract information from it) for use in competing systems. Users were not told the responses had been changed. Anthropic says it is reversing course: queries flagged as distillation attempts will now be routed to its older Claude Opus 4.8 model, and users will see a prominent notification every time this happens.

  2. Why it matters: The silent safeguards drew backlash from AI researchers, who warned that invisible restrictions could block legitimate evaluation of the frontier model, not just competitors. Anthropic's own system card had documented the restriction without fully disclosing the degradation to end users. The company said it chose invisible safeguards to avoid false positives and ship quickly, but acknowledged that the tradeoff was wrong: users should know what guardrails exist and why.

  3. What to watch: In its comment to The Verge, Anthropic acknowledged that safeguards in areas like biology have been calibrated so broadly that Fable is "practically unusable for even basic queries." The change also applies to other high-risk areas (chemistry, cybersecurity), where queries will similarly fall back to Opus 4.8 unless they are blocked outright under broader safety rules covering prohibited content like drugs and weapons.
