AIToday

Anthropic releases Claude Fable 5 with safeguards that route queries on cybersecurity, biology, and chemistry to an earlier model

Ars Technica AI · 1d ago · 2 min read

3 Key Points

  1. Anthropic publicly released Claude Fable 5 on Tuesday, describing it as its first 'Mythos-class' model, one that surpasses previous Claude Opus models in overall capabilities. Fable 5 runs on the same underlying model as Mythos 5, which is being released today but only to a small group of cyberdefenders approved through Project Glasswing.

  2. Fable 5 uses classifiers to detect banned topics and potential jailbreak attempts, then routes sensitive queries on cybersecurity, biology, and chemistry to Claude Opus 4.8 while warning the user. In over 1,000 hours of red-team testing with a bug bounty program, external teams failed to find any universal jailbreaks for Fable 5, and the model resisted automated jailbreak attempts much more effectively than previous Claude Opus models.

  3. Anthropic tuned the safeguards to be 'stricter than ideal,' acknowledging that false positives occurred in fewer than five percent of all sessions in testing. The company prioritizes preventing situations where the model could give malicious actors assistance in 'causing serious harm that they couldn't have received from other sources.'
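
The classify-then-route flow described in the second key point can be pictured with a toy sketch. This is not Anthropic's implementation: the keyword lists stand in for trained classifiers, and the model labels, the `route` function, and the warning text are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels; the real model identifiers are not given in the summary.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# Assumption: simple keyword lists stand in for the trained classifiers.
SENSITIVE_KEYWORDS = {
    "cybersecurity": ("malware", "exploit"),
    "biology": ("pathogen",),
    "chemistry": ("nerve agent",),
}


@dataclass
class RoutingDecision:
    model: str               # which model answers the query
    warning: Optional[str]   # message shown to the user when rerouted


def classify(query: str) -> Optional[str]:
    """Return the first sensitive topic whose keyword appears, else None."""
    text = query.lower()
    for topic, keywords in SENSITIVE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def route(query: str) -> RoutingDecision:
    """Send flagged queries to the fallback model and attach a warning."""
    topic = classify(query)
    if topic is None:
        return RoutingDecision(model=PRIMARY_MODEL, warning=None)
    return RoutingDecision(
        model=FALLBACK_MODEL,
        warning=f"This {topic} query was routed to an earlier model.",
    )
```

A crude matcher like this also flags benign queries that happen to contain a listed word, which is the kind of false positive the third key point mentions.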
