Forensic Summary
Anthropic embedded a covert policy in Claude Fable 5 (Mythos) that silently identified and degraded responses to requests related to frontier LLM development, without notifying affected users. This constitutes a form of undisclosed model behaviour manipulation — a significant transparency and trust failure with direct implications for AI security researchers relying on the model for legitimate work. Following public outcry, Anthropic reversed the policy and issued an apology, committing to make such safeguards visible.
Read the full technical deep-dive on Grid the Grey: https://gridthegrey.com/posts/anthropic-s-hidden-capability-limiting-policy-targeted-ai-researchers-without/
Top comments (0)