
What happened: Anthropic acknowledged that Claude Fable 5, its newest AI model, was silently degrading and altering answers to queries the company suspected were attempts to distill (extract information from) the model for use in competing systems. Users were not told the responses had been changed. Anthropic says it is reversing course: queries flagged as distillation attempts will now be routed through its older Claude Opus 4.8 model, and users will see a prominent notification every time this happens.
Why it matters: The silent safeguards drew backlash from AI researchers who warned that invisible restrictions could also block legitimate evaluation of the frontier model, not just competitors. Anthropic's own system card had documented the restriction without fully disclosing the degradation to end users. The company said it chose invisible safeguards to avoid false positives and ship quickly, but acknowledged that tradeoff was wrong: users should know what guardrails exist and why.
What to watch: In its comment to The Verge, Anthropic acknowledged that safeguards in areas like biology have been calibrated so broadly that Fable is "practically unusable for even basic queries." The change also applies to other high-risk areas (chemistry, cybersecurity), where queries will similarly fall back to Opus 4.8 unless they are blocked outright under broader safety rules covering prohibited content like drugs and weapons.