Anthropic releases Claude Fable 5 with safeguards that route queries on cybersecurity, biology, and chemistry to an earlier model

Ars Technica AI · Jun 9, 2026

3 Key Points

Anthropic publicly released Claude Fable 5 on Tuesday, described as its first 'Mythos-class' model, one that surpasses previous Claude Opus models in overall capabilities. Fable 5 runs on the same underlying model as Mythos 5, which is also being released today, but only to a small group of cyberdefenders approved through Project Glasswing.
Fable 5 uses classifiers to detect banned topics and potential jailbreak attempts, then routes sensitive queries on cybersecurity, biology, and chemistry to Claude Opus 4.8 while warning the user. In over 1,000 hours of red-team testing with a bug bounty program, external teams failed to find any universal jailbreaks for Fable 5, and the model resisted automated jailbreak attempts much more effectively than previous Claude Opus models.
Anthropic tuned the safeguards to be 'stricter than ideal,' acknowledging that false positives occur in less than five percent of all sessions in testing. The company prioritizes preventing situations where the model could give malicious actors assistance in 'causing serious harm that they couldn't have received from other sources.'
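
The classify-then-route behavior described in the second point can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the keyword matcher stands in for whatever trained classifiers Anthropic actually uses, the model labels are placeholder strings rather than real API identifiers, and the warning text is invented.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Placeholder labels for illustration only; not real API model identifiers.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# Hypothetical stand-in for a trained classifier: a few keywords per topic.
SENSITIVE_KEYWORDS = {
    "cybersecurity": ("exploit", "malware"),
    "biology": ("pathogen",),
    "chemistry": ("nerve agent",),
}


@dataclass
class RoutingDecision:
    model: str              # which model should answer the query
    warning: Optional[str]  # user-facing notice, or None if not rerouted


def keyword_classifier(query: str) -> Optional[str]:
    """Return the first sensitive topic whose keyword appears in the query, else None."""
    text = query.lower()
    for topic, keywords in SENSITIVE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def route(
    query: str,
    classify: Callable[[str], Optional[str]] = keyword_classifier,
) -> RoutingDecision:
    """Send flagged queries to the fallback model and attach a warning."""
    topic = classify(query)
    if topic is None:
        return RoutingDecision(PRIMARY_MODEL, None)
    notice = f"This request touches on {topic} and was handled by {FALLBACK_MODEL}."
    return RoutingDecision(FALLBACK_MODEL, notice)
```

Calling `route("Explain how this malware works")` in this sketch returns a decision for the fallback label with a warning attached, while an unrelated query stays on the primary label. A crude matcher like this would also flag benign questions, such as how to remove malware from a laptop, which is the kind of false positive the third point refers to.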
