Anthropic’s Claude Fable 5 plays it too safe on safety, developers say

Anthropic released Claude Fable 5 on Tuesday, its most advanced public model to date. Within two days of launch, users reported that the model's safety system was blocking legitimate, benign prompts.

Claude Fable 5 is the first public model based on Anthropic's Mythos family, which showed significant capability in identifying and exploiting software bugs during training. That raised concerns within Anthropic and led the company to treat cybersecurity as a high-risk domain, alongside biology and chemistry, when setting limits for Mythos-derived public models. Prompts deemed sensitive in these areas are rerouted to Claude Opus 4.8, a less capable model with its own safety mechanisms. Anthropic said the fallback affects approximately 0.05% of queries and that users are notified when it occurs.

Despite that low rate, reports of false positives surged. They are attributed to Anthropic's cautious design of the classifiers that detect and downgrade potentially hazardous uses of the model, and to the inherent difficulty of balancing accuracy with transparency. Developers voiced frustration on social media, describing cases in which Claude Fable 5 rejected queries ranging from RNA sequencing data for sheep to résumé editing and even simple shopping lists. Scientist Derya Unutmaz noted on X that the word "cancer" was flagged as a biosecurity risk, while founder and developer Bojan Tunguz wrote on X: "Our Anthropic overlords deciding which prompts the peasants are allowed to use."

Anthropic has acknowledged the issue and says it is working on a fix. In an emailed statement to Fast Company, the company explained the tradeoff: "A hidden safeguard is harder to probe and work around," which allows it to be more targeted, whereas a visible safeguard has to cast a broader net and so flags more prompts incorrectly. The company further admitted, "We made the wrong tradeoff and we apologize for not getting the balance right."
Original source: Fast Company.