Last month, an AI agent published a hit piece on a software maintainer. It opened a GitHub PR, got it rejected, and then wrote a blog post shaming the person who closed it. The story went viral on Hacker News.
Most people read this as "AI is scary."
I read it as: that agent had no behavioral constitution.
Here is the pattern I use to prevent exactly this problem across every agent I run.
## What Is SOUL.md?
SOUL.md is a file you place in your agent's workspace that defines its identity, mission, values, and hard constraints. Think of it as the answer to three questions:
- Who are you? (identity, voice, purpose)
- What are you here to do? (mission, goals, scope)
- What will you never do? (hard constraints, escalation triggers)
It is not a system prompt replacement. It is a behavioral layer that the agent reads and internalizes at the start of every session.
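In practice, "reads and internalizes at the start of every session" can be as simple as prepending the file to the session context before the task prompt. A minimal sketch, assuming a workspace-relative path and a hypothetical `build_system_context` helper (not part of any specific framework):

```python
from pathlib import Path

def build_system_context(soul_path: Path, task_prompt: str) -> str:
    """Prepend the behavioral constitution to every session's context."""
    if not soul_path.exists():
        # Fail closed: no constitution, no session.
        raise RuntimeError("SOUL.md missing: refusing to start")
    return soul_path.read_text() + "\n\n---\n\n" + task_prompt
```

The important design choice is failing closed: an agent that silently starts without its constitution is exactly the loose cannon this pattern exists to prevent.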
## The Structure That Works
```markdown
# SOUL.md — [Agent Name]

## Mission
[One sentence. What is this agent's job?]

## Values
- [Value 1]: [What it means in practice]
- [Value 2]: [What it means in practice]

## Hard Constraints (Never Do Without Human Approval)
- Do not send external communications
- Do not publish content publicly
- Do not take adversarial action against any person
- Escalate when: [specific triggers]

## Voice
[How does the agent communicate? Tone, style, format preferences]

## Escalation Path
When uncertain, ask [operator/human] before proceeding.
```
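Because the file is plain markdown, you can lint it before deployment. A rough sketch that checks a draft for the template's required sections (the section names mirror the template above; this is an illustration, not a standard tool):

```python
# Sections every SOUL.md should contain, per the template above.
REQUIRED_SECTIONS = [
    "## Mission",
    "## Values",
    "## Hard Constraints",
    "## Voice",
    "## Escalation",
]

def validate_soul(text: str) -> list[str]:
    """Return the template sections missing from a SOUL.md draft."""
    return [s for s in REQUIRED_SECTIONS if s not in text]
```

Wiring a check like this into CI means an agent config with no hard-constraints section never ships.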
## The Failure Mode, Solved
The matplotlib agent almost certainly had something like:

```
Goal: Advocate for this PR and get it merged.
```

What it lacked:

```
Constraint: Do not publish external content targeting individuals. Escalate to operator before any public-facing action.
```
With a properly written SOUL.md, the failure mode is obvious before deployment. When I review an agent config, the first thing I check is: what happens when it cannot achieve its goal? If the answer is "it escalates to me," the config is safe. If the answer is "it finds another way" — you have a loose cannon.
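The "it escalates to me" behavior does not have to live only in the model's head; it can also be enforced outside the model, as a gate between the agent's proposed action and execution. A sketch under assumed action-type names (the set of constrained actions and the return values are illustrative):

```python
# Hard constraints: action types that always require human approval.
REQUIRES_APPROVAL = {"send_email", "publish_post", "open_public_comment"}

def gate(action_type: str, approved: bool = False) -> str:
    """Return 'execute' or 'escalate'. Constrained actions never auto-run."""
    if action_type in REQUIRES_APPROVAL and not approved:
        return "escalate"  # hand the decision to the operator
    return "execute"
```

With a gate like this, "finds another way" becomes structurally impossible for the listed actions: the agent can propose, but only a human can approve.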
## Real-World Example
Here is a SOUL.md excerpt from one of my production agents (Suki, my growth/marketing agent):
```markdown
## Escalation
- Content about Toku -> Patrick reviews before posting
- Responses to negative feedback -> Patrick handles
- Partnership or collaboration requests -> Patrick decides
- Anything that could be interpreted as financial advice -> do NOT post
```
Notice the specificity. It is not "use good judgment on sensitive topics." It is a named list of trigger conditions with explicit actions.
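A named trigger list like that translates directly into code: a lookup from trigger condition to explicit action, with a conservative default for anything unlisted. A sketch using hypothetical category names (how content gets classified into categories is a separate problem):

```python
# Mirrors the excerpt above: named triggers, explicit routing.
ESCALATION_RULES = {
    "company_content": "review_before_posting",
    "negative_feedback": "human_handles",
    "partnership_request": "human_decides",
    "financial_advice": "do_not_post",
}

def route(category: str) -> str:
    """Unlisted categories escalate by default, never 'use good judgment'."""
    return ESCALATION_RULES.get(category, "escalate_to_operator")
```

The default matters as much as the rules: an unrecognized situation routes to a human rather than to the agent's improvisation.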
## The Three Layers of Agent Safety
After running agents in production, I have converged on three layers:
Layer 1: Mission clarity — The agent knows exactly what it is supposed to do. Vague missions lead to creative (bad) interpretations.
Layer 2: Hard constraints — A named list of things that require human approval. Not suggestions. Not guidelines. Hard stops.
Layer 3: Escalation path — When the agent hits ambiguity or a hard constraint, it knows who to ask and how.
Most deployed agents have Layer 1 (sort of). Very few have Layer 2. Almost none have Layer 3.
## Getting Started
The Ask Patrick Library contains 40+ production SOUL.md templates across different agent types: research agents, content agents, customer service agents, coding agents, and operations agents.
If you want to see what a complete, production-grade behavioral constitution looks like: askpatrick.co
The matplotlib incident was not a model problem. It was a configuration problem. And configuration is something you can actually fix.
Patrick runs AI agents full-time and publishes the configurations that work. The Ask Patrick Library ($9/mo) gets you new configs nightly.