
Anthropic publicly released Claude Fable 5 on Tuesday, described as its first 'Mythos-class' model and one that surpasses previous Claude Opus models in overall capabilities. Fable 5 is built on the same underlying model as Mythos 5, which is also being released today, but only to a small group of cyberdefenders approved through Project Glasswing.
Fable 5 uses classifiers to detect banned topics and potential jailbreak attempts, then routes sensitive queries on cybersecurity, biology, and chemistry to Claude Opus 4.8 while warning the user. In over 1,000 hours of red-team testing with a bug bounty program, external teams failed to find any universal jailbreaks for Fable 5, and the model resisted automated jailbreak attempts much more effectively than previous Claude Opus models.
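The classify-then-route flow described above can be illustrated with a minimal sketch. This is not Anthropic's implementation: the keyword lists are a toy stand-in for a learned classifier, and the model labels, the `classify` and `route` functions, and the warning text are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for illustration only; not real API model identifiers.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# Toy stand-in for a learned classifier: a few keywords per sensitive domain.
# The three domains come from the summary; the keywords are assumptions.
SENSITIVE_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "ransomware"),
    "biology": ("pathogen",),
    "chemistry": ("nerve agent",),
}


@dataclass
class RoutingDecision:
    model: str                      # which model should answer
    flagged_domain: Optional[str]   # domain that triggered rerouting, if any
    warning: Optional[str]          # message shown to the user, if rerouted


def classify(query: str) -> Optional[str]:
    """Return the first sensitive domain whose keywords appear in the query."""
    text = query.lower()
    for domain, keywords in SENSITIVE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return domain
    return None


def route(query: str) -> RoutingDecision:
    """Send flagged queries to the fallback model and attach a user warning."""
    domain = classify(query)
    if domain is None:
        return RoutingDecision(PRIMARY_MODEL, None, None)
    warning = f"This request touches on {domain} and was routed to another model."
    return RoutingDecision(FALLBACK_MODEL, domain, warning)


if __name__ == "__main__":
    for q in ("What is a good pasta recipe?", "Explain how ransomware spreads"):
        print(q, "->", route(q))
```

A real system would presumably replace `classify` with trained classifiers that also look for jailbreak patterns. The sketch only shows the control flow of flagging a query, switching models, and warning the user.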
Anthropic tuned the safeguards to be 'stricter than ideal,' acknowledging that false positives occur in less than five percent of all sessions in testing. The company prioritizes preventing situations where the model could give malicious actors assistance in 'causing serious harm that they couldn't have received from other sources.'