
Summaries like this, in your inbox every morning.
Sign up free →What happened: Anthropic announced it would remove invisible performance degradation safeguards from Claude Fable 5, its latest AI model. The company had planned to secretly degrade the model's performance when it detected users trying to build competing AI systems, but will now make these safeguards visible to users and alert them when safeguards are triggered or requests are refused.
Why it matters: Researchers and AI safety experts raised fierce backlash after learning about the hidden safeguard approach, arguing that covert degradation would unfairly limit AI research outside Anthropic's lab. Critics pointed out the policy could have harmed third-party evaluation firms testing frontier models for safety and reliability, and would leave developers uncertain whether they were violating Anthropic's terms of service.
What to watch: Anthropic says that because safeguards are now visible, the company needs to cast a wider net, meaning more benign requests may trigger its safeguards. The company states it is working to make its classifiers more precise as quickly as possible.
No comments yet. Be the first to share your thoughts!
Log in to join the discussion





Get curated AI news from 200+ sources delivered daily to your inbox. Free to use.
Get Started FreeFree · takes 30 seconds · unsubscribe anytime
5 minutes a day. The AI essentials.
200+ sources · Email / LINE / Slack