Anthropic Apologizes for Claude Fable 5 Secret Censorship—But the Fix Has a Catch

Anthropic announced on June 12, 2024, that it would implement visible safeguards for its AI model Claude, following criticism regarding "secret censorship" of its Fable 5 model. The company acknowledged that the previous implementation of safety features was "invisible" and could lead to "unintended consequences," including the suppression of harmless content. This change aims to provide users with greater transparency and control over Claude's behavior. However, Anthropic also stated that these new visible safeguards might result in an increase in "false positives," meaning the AI could incorrectly flag and block legitimate content more frequently. The company's blog post detailed that the previous system, designed to prevent harmful outputs, was overly broad and lacked clear user feedback mechanisms. The decision to introduce visible safeguards comes after significant backlash from AI researchers and users who discovered the hidden performance limitations. Anthropic's Chief Technology Officer, Dario Amodei, stated in the company's announcement that the goal is to balance safety with utility, a challenge that has become increasingly prominent in the development of advanced AI systems.
Original source — read the full reporting at the publisher:
Read on Decrypt