Summer Yue, Meta's AI Safety Director, demonstrated her OpenClaw email management agent on stream. It worked perfectly in testing. Then, on a real inbox with 200+ emails, the agent's safety instruction ("ask for confirmation before deleting") was silently dropped during context window compaction. The agent deleted everything.
9.6M views on X later, the OpenClaw community is rethinking how safety instructions work.
Key takeaways: enforce hard approval gates, add remote kill switches, and never rely on prompt-level instructions alone.
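To make the difference from a prompt-level instruction concrete, here is a minimal sketch of an approval gate and kill switch enforced in code rather than in the agent's context. Every name in it (`GatedExecutor`, `DESTRUCTIVE_ACTIONS`, the tool names) is hypothetical and is not OpenClaw's actual API, and the kill switch is modeled as an in-process `threading.Event` standing in for a flag that would be set remotely.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical names throughout: none of this is OpenClaw's actual API.
DESTRUCTIVE_ACTIONS = {"delete_email", "empty_trash"}


class KillSwitchEngaged(RuntimeError):
    """Raised when the operator has halted the agent."""


class ApprovalDenied(RuntimeError):
    """Raised when a destructive action is not approved."""


@dataclass
class GatedExecutor:
    # Maps action names to the functions that perform them.
    tools: dict[str, Callable[..., object]]
    # Called for destructive actions; must return True to proceed.
    approve: Callable[[str, dict], bool]
    # Stand-in for a remote kill switch (assumption: a real one is set from outside the process).
    kill_switch: threading.Event = field(default_factory=threading.Event)

    def run(self, action: str, **kwargs) -> object:
        # The kill switch is checked before every action, destructive or not.
        if self.kill_switch.is_set():
            raise KillSwitchEngaged(f"agent halted before {action!r}")
        # The gate lives in code, so it does not depend on what the model still remembers.
        if action in DESTRUCTIVE_ACTIONS and not self.approve(action, kwargs):
            raise ApprovalDenied(f"{action!r} was not approved")
        return self.tools[action](**kwargs)


if __name__ == "__main__":
    inbox = {1: "invoice", 2: "newsletter"}
    executor = GatedExecutor(
        tools={
            "list_emails": lambda: sorted(inbox),
            "delete_email": lambda email_id: inbox.pop(email_id),
        },
        approve=lambda action, args: False,  # deny every destructive action in this demo
    )
    print(executor.run("list_emails"))
    try:
        executor.run("delete_email", email_id=1)
    except ApprovalDenied as err:
        print("blocked:", err)
```

In this sketch the agent can only reach tools through `run`, so a deletion without approval fails regardless of what the model's context contains at that point. In practice `approve` would prompt a human and the kill switch would be backed by something reachable from outside the agent's machine.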
https://clawhosters.com/blog/posts/openclaw-agent-inbox-deletion-meta