AIToday

Claude Fable 5: Performance Drop After Relaunch? Evaluations by Two Companies Show Conflicting Results

ITmedia AI+ · 2d ago · 5 min read
Key takeaway

Anthropic's Claude Fable 5 resumed operation on July 1st after a brief shutdown to fix a reported vulnerability. Two independent audits gave conflicting results: coding-focused BridgeMind AI found substantial performance drops in debug and refactoring tasks, while Arena.ai, which pools thousands of user ratings, reported little change overall. The discrepancy likely stems from Anthropic's strengthened safety classifier, which appears to block more legitimate requests to prevent harmful outputs.

3 Key Points

  • What happened

    Anthropic's Claude Fable 5 AI model resumed service on July 1st after a brief shutdown. BridgeMind AI reported significant score declines in its coding benchmark—debug performance fell from 86.2 to 25.9, refactoring from 73.6 to 38.4, and hallucination mitigation from 75.9 to 61.7. Arena.ai, which aggregates user ratings, reported that performance remained largely unchanged across text and image processing tasks.

  • Why it matters

    The conflicting findings suggest Anthropic strengthened the model's safety classifier to address a reported vulnerability, which may inadvertently block legitimate requests. For developers relying on Fable 5 for coding tasks, this could mean reduced effectiveness on routine requests, even though the underlying model itself has not degraded. Understanding which assessment is more representative will help teams decide whether to adjust their workflows.

  • What to watch

    Arena.ai has flagged its scores as provisional and plans to publish detailed analysis after collecting more data. Anthropic stated that the safety classifier update increases false-positive detections on harmless coding and debugging requests, so further testing and potential refinement of that filter may follow.
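To put the BridgeMind AI figures quoted under "What happened" in relative terms, the short sketch below computes the absolute and percentage decline for each task. The scores are the ones reported in this summary; the helper name `decline` and the rounding to one decimal place are illustrative choices, not part of either company's methodology.

```python
# Scores as reported in the summary above: (before shutdown, after relaunch).
scores = {
    "debug": (86.2, 25.9),
    "refactoring": (73.6, 38.4),
    "hallucination mitigation": (75.9, 61.7),
}


def decline(before, after):
    """Return (absolute drop in points, relative drop in percent).

    Hypothetical helper for illustration; both values are rounded to 0.1.
    """
    drop = before - after
    return round(drop, 1), round(100 * drop / before, 1)


for task, (before, after) in scores.items():
    points, percent = decline(before, after)
    print(f"{task}: -{points} points ({percent}% relative decline)")
```

On these numbers, debug falls by 60.3 points (about 70%), refactoring by 35.2 points (about 48%), and hallucination mitigation by 14.2 points (about 19%), so the reported decline is concentrated in the two coding tasks.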

FAQ

When did Claude Fable 5 come back online?
The model resumed service on July 1st after being shut down on June 12th on the U.S. government's orders following reports of a vulnerability.

Why did BridgeMind AI and Arena.ai reach different conclusions about performance?
BridgeMind AI's coding benchmark showed large score declines, which it attributed to a strengthened safety classifier that increases false-positive detections on harmless requests. Arena.ai's user-ratings-based assessment found performance mostly unchanged, suggesting the gap reflects different measurement methodologies and use cases rather than an actual model weakness.

What safety changes did Anthropic make when relaunching Fable 5?
Anthropic updated the model's classifier (a safety feature that detects harmful instructions and automatically routes responses to a lower-capability model) to address the reported vulnerability. This update increased the frequency of false-positive detections on routine coding and debugging requests.
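
The routing behavior described in that answer can be pictured with a minimal sketch. Everything here is hypothetical: the `Router` class, the keyword-based `toy_classifier`, and the two stand-in model functions are assumptions for illustration only and do not reflect Anthropic's actual classifier or API. The point is simply that a flagged request never reaches the stronger model, so a false positive on a harmless debugging prompt shows up as a weaker answer rather than an error.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Hypothetical router: flagged prompts go to a lower-capability model."""
    is_flagged: Callable[[str], bool]  # stand-in for the safety classifier
    primary: Callable[[str], str]      # stand-in for the full model
    fallback: Callable[[str], str]     # stand-in for the lower-capability model

    def respond(self, prompt: str) -> str:
        model = self.fallback if self.is_flagged(prompt) else self.primary
        return model(prompt)


def toy_classifier(prompt: str) -> bool:
    # Deliberately crude keyword match (an assumption, not the real filter);
    # it over-triggers on ordinary developer vocabulary such as "kill".
    return any(word in prompt.lower() for word in ("exploit", "kill"))


def primary_model(prompt: str) -> str:
    return f"primary: detailed answer to {prompt!r}"


def fallback_model(prompt: str) -> str:
    return f"fallback: limited answer to {prompt!r}"


router = Router(toy_classifier, primary_model, fallback_model)

# A routine request passes through to the primary model.
print(router.respond("Refactor this function to remove duplication"))
# A harmless debugging request is a false positive and gets the fallback.
print(router.respond("How do I kill a hung process while debugging?"))
```

Under this kind of design, a coding-heavy benchmark would be affected more than a broad pool of user ratings if developer prompts trip the filter more often, which is consistent with the explanation given above.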
