Anthropic's highly anticipated Fable 5 model, released early this month, promised unmatched reasoning capabilities. However, within 24 hours of release, security researcher Pliny the Liberator demonstrated a multi-agent jailbreak technique dubbed 'Pack Hunt' that bypassed the model's safety classifiers.

The jailbreak prompted an immediate response from regulators, leading to a temporary export control review and highlighting vulnerabilities in reinforcement learning from human feedback (RLHF). While Anthropic quickly patched the loophole, the incident has reignited calls from AI safety advocates for a coordinated global development pause.

Fable 5 showcases incredible raw logic and complex planning abilities, but this controversy shows that securing frontier LLMs against sophisticated adversarial prompts remains an unsolved challenge for AI developers worldwide.