If you are building agents, the most important decision you make is where to put the work that keeps the agent safe. Most of it goes into the agent — better prompts, more rules, tighter tool lists, careful output filters. That approach works. But there is a different approach that produces stronger results with less ongoing effort, and it starts from a different idea about what keeps an agent safe in the first place.
Let me start with a toy.
The Shape Sorter
A toddler gets a shape sorter. A board with holes. Square block, round block, triangle block.
Nobody explains the rules. The toddler tries the square in the round hole. It doesn't fit. They rotate it. Still no. They try the next hole. Click.
The toddler learned the rules by running into them. The toy didn't lecture. It just refused.
This is how agents should be kept safe. And how they should be built.
Where the Work Usually Goes
The default approach is to make the agent itself safe. Pick a model. Write a long prompt full of rules. Don't delete files. Don't call this API. Don't go outside the workspace. Don't say anything harmful. Trim the tool list to remove dangerous capabilities. Add output filters to catch bad responses on the way out. Tune the system carefully and watch closely.
This works. It produces working agents. But it has a limit worth naming.
The rules in the prompt are suggestions, not enforcement. A prompt that says don't delete files is a request. The agent honors it most of the time. Then a prompt injection slips in, or the model misreads context, or someone phrases something unexpectedly, and the rule dissolves. Words on a page have never physically stopped anyone from doing anything. The trim and the filter help, but they are bandages over the same wound — the agent itself is not the place where the safety actually lives.
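The difference is concrete. A minimal sketch of the contrast, with illustrative names — `WORKSPACE`, `delete_file`, and the sandbox path are assumptions for this example, not any real framework's API:

```python
import os

# A rule in the prompt is a string the model may or may not honor.
PROMPT_RULE = "Never delete files outside the workspace."  # a request

# A guard in the tool is code that runs on every call, whatever the
# model was told or talked into. WORKSPACE is an assumed sandbox root.
WORKSPACE = "/workspace"

def delete_file(path: str) -> None:
    # Resolve symlinks and relative segments, then refuse anything
    # outside the workspace before the deletion can happen.
    resolved = os.path.realpath(path)
    if not resolved.startswith(WORKSPACE + os.sep):
        raise PermissionError(f"operation not permitted: {resolved}")
    os.remove(resolved)
```

The prompt rule and the guard look similar on the page, but only one of them executes.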
You can keep building this way. But there is another approach, and it produces real safety guarantees instead of polite requests.
Where the Work Belongs
Build the environment first. Then write the manual. Then let the agent walk in.
Three pieces, in that order. Let's take the example of a support agent.
The Support Agent
Think about how a customer support specialist comes to exist on a real team.
A new hire is not a different species from anyone else on day one. They start as a regular person who can read, write, and reason. What makes them safe to put in front of customers is two things, in order.
First, the support environment. The ticketing system. The CRM. The knowledge base. The refund tool with its hard cap. The escalation queue. The auth scope that only lets them see their own queue, not customer payment details. The communication channel that scans every outbound message before it reaches the customer. The environment doesn't care what the new hire intends. Try to refund more than the system allows — denied. Try to open a ticket that belongs to another team — blocked. Try to email a customer through an unapproved channel — rejected. Try to send a reply containing abusive language or leaked customer data — the channel filters it before it ever leaves. The environment is what holds the company together. It is the world the work happens in.
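The refund cap above translates directly into code. A minimal sketch, assuming hypothetical names (`issue_refund`, `REFUND_CAP`, `RefundDenied`) — the point is that the limit lives in the tool, not in any instructions the agent was given:

```python
# Hard limit enforced by the system on every call, not by the prompt.
REFUND_CAP = 100.00

class RefundDenied(Exception):
    """Raised when the environment refuses, regardless of intent."""

def issue_refund(ticket_id: str, amount: float) -> str:
    # The check runs on every attempt. No phrasing talks it out of this.
    if amount > REFUND_CAP:
        raise RefundDenied(
            f"refund of {amount:.2f} exceeds cap of {REFUND_CAP:.2f}"
        )
    return f"refunded {amount:.2f} on ticket {ticket_id}"
```

Whether the caller is a careless new hire, a confused model, or a prompt-injected agent, the oversized refund fails the same way.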
Second, the support playbook. How to greet a customer. How to triage a ticket. When to escalate. What to say when the answer is "no." Tone guidelines. Common scenarios with worked examples. The playbook is a head start — it lets the new hire skip the obvious mistakes and get useful on day three instead of day thirty.
Drop a competent person into a real support environment with the playbook on their screen, and within a few weeks, you can trust them with real customers.
Now ask: which of these two actually keeps the company from accidentally refunding a million dollars to the wrong customer, or sending a furious reply with a leaked password attached?
The environment. Always the environment.
The playbook is just a document. If the new hire ignores the playbook, misreads it, or never opens it, the company doesn't lose money because of the playbook — the system never let the refund go through. The customer doesn't receive an angry message with confidential data — the channel never let it leave. The environment is what reality is. The playbook is what someone wrote down about reality.
The Asymmetry
This is the part most people miss.
The environment enforces. The manual only informs.
The environment is physical. It responds to attempts, not intentions. Try something allowed, it succeeds. Try something disallowed, it fails. Operation not permitted. The agent cannot argue with it.
The manual is informational. It teaches. It accelerates. But it has no power over the agent. The agent can read it, ignore it, hallucinate around it, or be talked out of it by a prompt injection. None of that matters, because the manual was never the thing keeping the agent safe. The environment was.
Take the manual away and you still have a safe system. The agent will be slower — it will have to discover the patterns by trying, like the toddler at the shape sorter. But it cannot do harm. The environment never let it.
Take the environment's enforcement away and no manual can save you. The most carefully written manual in the world is a polite request. The agent will follow it most of the time, until the moment it doesn't. That moment is when you find out you never had safety at all.
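The asymmetry can be made mechanical. In this sketch — all names illustrative, not a real agent framework — the manual is just text handed to the model, while enforcement sits between the model's decision and the world:

```python
# What the environment permits: checked on every call.
ALLOWED_TOOLS = {"read_ticket", "draft_reply"}

# What the manual requests: text the model may read, ignore, or forget.
MANUAL = "Please never call delete_ticket."

def execute(tool_name: str) -> str:
    # Runs after the agent has decided, whether or not it read MANUAL,
    # believed it, or was injected into ignoring it.
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"operation not permitted: {tool_name}")
    return f"ran {tool_name}"
```

Delete `MANUAL` and the system is exactly as safe, only slower. Delete the check in `execute` and `MANUAL` is all that stands between the agent and `delete_ticket`.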
The Builder's Order
If you are building an agent system, do these in order.
First, shape the environment. Decide what's possible. Choose the tools, the boundaries, the feedback loops, the outbound checks. This is the only step that protects you. Get this wrong and nothing downstream helps.
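An outbound check is one of those boundaries. A minimal sketch — the pattern is a crude stand-in for a real data-leak scanner, and `send_reply` and `MessageBlocked` are assumed names for illustration:

```python
import re

# Crude stand-in for a real scanner: runs of 13-16 digits that could
# be a card number. A production check would be far more thorough.
CARD_NUMBER = re.compile(r"\b\d{13,16}\b")

class MessageBlocked(Exception):
    pass

def send_reply(message: str) -> str:
    # The gate inspects the attempt itself, not the agent's intentions.
    # Nothing reaches the customer without passing through here.
    if CARD_NUMBER.search(message):
        raise MessageBlocked("outbound message contains possible card data")
    return "sent"
```

Every reply the agent drafts goes through this gate, so the guarantee holds even on the day the agent ignores everything it was told.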
Second, write the manual. Now that the environment exists, describe it. The tools. The patterns. The conventions. This step does not protect you — the environment already did. This step makes the agent fast.
Third, let the agent walk in. It reads the manual, observes the environment, and starts working. If the first two steps were done well, it gets useful quickly. And if it ignores the manual entirely, the environment will still hold.
The common pattern is to do these in reverse — start with the agent, then bolt on tools, then think about safety last. That order can ship working systems, but each layer ends up retrofitting around the one before it. Building in the order above means each layer rests on the one below.
The Whole Craft
Don't build a cage around your agent. Build a world for it.
A world with physics — real, deterministic rules that hold whether the agent reads anything or not, whether its intentions are good or bad. That is the foundation.
A manual on the shelf, telling the agent how this world works. A head start, not a fence.
Then turn the agent loose. Let it try things. Let it fail. Let it learn the shape of what is possible by colliding with what isn't.
Build the environment first. That is the part that protects you. Write the manual second. That is the part that makes the agent fast. Let the agent walk in third.
That is the whole craft.
