Anthropic apologizes for secretly restricting its Claude Fable model and says it will now be transparent when it blocks queries, making guardrails visible to users.

The Verge AI · Jun 11, 2026

3 Key Points

What happened
Anthropic acknowledged that Claude Fable 5, its newest AI model, was silently degrading and altering its answers to queries the company suspected were distillation attempts, that is, efforts to harvest the model's outputs for use in competing systems. Users were not told their responses had been changed. Anthropic says it is reversing course: queries flagged as distillation attempts will now be routed to its older Claude Opus 4.8 model, and users will see a prominent notification every time this happens.
Why it matters
The silent safeguards drew backlash from AI researchers, who warned that invisible restrictions could block not only competitors but also legitimate evaluation of the frontier model. Anthropic's own system card had documented the restriction without fully disclosing the degradation to end users. The company said it chose invisible safeguards to avoid false positives and ship quickly, but acknowledged that the tradeoff was wrong: users should know what guardrails exist and why.
What to watch
In a comment to The Verge, Anthropic acknowledged that safeguards in areas like biology have been calibrated so broadly that Fable is "practically unusable for even basic queries." The change also applies to other high-risk areas, such as chemistry and cybersecurity, where flagged queries will likewise fall back to Opus 4.8 unless they are blocked outright under broader safety rules covering prohibited content like drugs and weapons.