Last week I shipped something that would have sounded reckless two years ago: an MCP server that lets an AI agent write to a merchant's live Shopify store. Create discount codes. Build customer segments. Draft WhatsApp campaigns against real order data.
Read-only agent integrations are everywhere now, and they're fine — but a read-only agent is just a chatbot wearing a dashboard. The useful version is the one that does the thing. The dangerous version is also the one that does the thing. A hallucinated SELECT is a wrong answer; a hallucinated discount code is free product going out the door.
So before turning writes on, I sat down and listed every way an agent could hurt a store. Then I built one guardrail per failure mode. Here's the list — it's short, and I think it generalizes to any agent surface that touches business data.
1. Read-only by default. Writes are a per-token opt-in.
The lazy design is one API key that can do everything the app can do. Instead, every agent token starts read-only — list customers, inspect segments, read campaign stats. Write capability is a separate flag the merchant flips per token:
{
"token": "fav_mcp_…",
"scopes": ["read"], // default
"writeEnabled": false // explicit opt-in, per token
}
This sounds obvious. It is obvious. It's also the single most-skipped step I see in agent integrations, because it's friction during development. Build the friction in anyway — "the agent could read everything but couldn't have sent that" is a sentence you really want available to you later.
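A minimal sketch of what that gate could look like on the server, assuming each tool declares whether it mutates store data. The `ToolDef` shape, the `mutates` flag, the `authorizeCall` and `newToken` helpers, and the `list_customers` tool name are illustrative assumptions, not the actual implementation; only `scopes` and `writeEnabled` come from the token shape above.

```typescript
// Token shape from the post; everything else here is hypothetical.
type AgentToken = {
  token: string;
  scopes: string[];
  writeEnabled: boolean;
};

// Assumed tool metadata: each tool declares whether it mutates store data.
type ToolDef = { name: string; mutates: boolean };

type GateResult = { allowed: true } | { allowed: false; reason: string };

// New tokens start read-only; the merchant has to flip writeEnabled explicitly.
function newToken(id: string): AgentToken {
  return { token: id, scopes: ["read"], writeEnabled: false };
}

// Runs before every tool dispatch. Reads need the "read" scope;
// writes additionally need the per-token opt-in flag.
function authorizeCall(token: AgentToken, tool: ToolDef): GateResult {
  if (token.scopes.indexOf("read") === -1) {
    return { allowed: false, reason: "token has no read scope" };
  }
  if (tool.mutates && !token.writeEnabled) {
    return { allowed: false, reason: "writes are not enabled for this token: " + tool.name };
  }
  return { allowed: true };
}
```

The point of keeping the check in one function is that a write tool added later is denied by default unless someone both marks it as mutating and a merchant opts the token in.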
2. Caps live in the tool schema, not the prompt.
Early on I had a system prompt that said something like "never create discounts above 30%." You can guess how durable that is. Prompts are suggestions; schemas are physics.
So the caps moved into the tool's input validation itself — the shape of it:
import { z } from "zod";

// create_discount: validated server-side, not prompt-side
// (the constant name is illustrative)
const createDiscountInput = z.object({
  percentage: z.number().min(1).max(100), // hard ceiling, enforced in the service
  expiresInDays: z.number().optional(), // expiry + usage limits available
  usageLimit: z.number().min(1).optional(), // …and once-per-customer
});
If the model asks for a 250% discount or a negative one, the call fails at the schema. No judgment call, no "the model usually behaves." The agent literally cannot express the dangerous request in the tool's grammar.
The general rule: any numeric an agent controls needs a server-side ceiling. Discount percentage, message volume, refund amount, batch size. If the cap only exists in natural language, it doesn't exist.
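To make the rule concrete without pulling in a schema library, here is a dependency-free sketch of the same idea in plain TypeScript. The 1 to 100 percentage range matches the schema above; `MAX_EXPIRY_DAYS`, `MAX_USAGE_LIMIT`, and the function names are assumptions chosen for illustration, not the real limits.

```typescript
// Percentage bounds mirror the post's schema. The other two ceilings are
// hypothetical values picked only for this example.
const MAX_PERCENTAGE = 100;
const MAX_EXPIRY_DAYS = 365;
const MAX_USAGE_LIMIT = 10000;

type CreateDiscountInput = {
  percentage: number;
  expiresInDays?: number;
  usageLimit?: number;
};

type Validation =
  | { ok: true; value: CreateDiscountInput }
  | { ok: false; errors: string[] };

// True only for finite numbers inside [min, max]; rejects NaN, Infinity, and strings.
function inRange(v: unknown, min: number, max: number): v is number {
  return typeof v === "number" && isFinite(v) && v >= min && v <= max;
}

// Validates raw tool arguments before any service code runs.
function validateCreateDiscount(raw: unknown): Validation {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const input = raw as Record<string, unknown>;
  const errors: string[] = [];

  if (!inRange(input["percentage"], 1, MAX_PERCENTAGE)) {
    errors.push("percentage must be a number between 1 and " + MAX_PERCENTAGE);
  }
  if (input["expiresInDays"] !== undefined && !inRange(input["expiresInDays"], 1, MAX_EXPIRY_DAYS)) {
    errors.push("expiresInDays must be a number between 1 and " + MAX_EXPIRY_DAYS);
  }
  if (input["usageLimit"] !== undefined && !inRange(input["usageLimit"], 1, MAX_USAGE_LIMIT)) {
    errors.push("usageLimit must be a number between 1 and " + MAX_USAGE_LIMIT);
  }

  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    value: {
      percentage: input["percentage"] as number,
      expiresInDays: input["expiresInDays"] as number | undefined,
      usageLimit: input["usageLimit"] as number | undefined,
    },
  };
}
```

Every numeric field gets both a floor and a ceiling here, including the optional ones, so a missing bound cannot slip through just because the field is rarely used.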
3. Outbound messages: the agent drafts, a human sends.
Writes are not all the same risk. I ended up sorting every tool by one question — what happens if this is wrong?
- Reversible writes (create a segment, tag a customer): let the agent do them. Worst case you delete a segment.
- Costly-but-fixable writes (a capped discount code): allow with the schema ceilings above.
- Irreversible, customer-facing writes (broadcast a WhatsApp campaign to 2,000 people): the agent gets a draft_campaign tool. There is no send_campaign tool.
That last one isn't a missing feature — it's the design. You cannot unsend a broadcast. A wrong segment plus an eager agent equals 2,000 customers getting a message meant for 12, and that's a brand problem no rollback fixes. So the chain deliberately ends at a draft sitting in the merchant's dashboard with a send button only a human can press.
(If you read my last post about agents booking appointments while humans tap "pay" — same principle, different blast radius. The pattern keeps being: agent does the 90%, human owns the irreversible 10%.)
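One way to keep the three tiers from drifting is to make the tier part of each tool's registration, so an irreversible tool cannot be exposed to the agent by accident. This is a sketch under assumptions: the `Risk` labels, `AgentToolRegistry`, and the `CampaignDraft` fields are hypothetical names, and only `create_segment`, `create_discount`, and `draft_campaign` come from the post.

```typescript
// The three tiers, keyed on "what happens if this is wrong?"
type Risk = "reversible" | "costly_but_fixable" | "irreversible";

type Tool = { name: string; risk: Risk };

// Hypothetical registry for agent-facing tools. It refuses irreversible
// tools at registration time instead of relying on review to catch them.
class AgentToolRegistry {
  private tools: Record<string, Tool> = {};

  register(tool: Tool): void {
    if (tool.risk === "irreversible") {
      throw new Error("refusing to expose irreversible tool to agents: " + tool.name);
    }
    this.tools[tool.name] = tool;
  }

  has(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.tools, name);
  }
}

// The agent-facing end of the campaign chain: it can only produce a draft.
// Sending lives outside the registry, behind a human-only action.
type CampaignDraft = { id: string; segmentId: string; body: string; status: "draft" };

let nextDraftId = 1;
function draftCampaign(segmentId: string, body: string): CampaignDraft {
  return { id: "draft_" + nextDraftId++, segmentId, body, status: "draft" };
}

const registry = new AgentToolRegistry();
registry.register({ name: "create_segment", risk: "reversible" });
registry.register({ name: "create_discount", risk: "costly_but_fixable" });
registry.register({ name: "draft_campaign", risk: "reversible" });
```

Drafting is classed as reversible because the worst case is a draft the merchant deletes. The broadcast itself never gets a tool name the agent can call.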
4. Typed tools or it didn't happen.
Every tool declares a JSON Schema for its input and returns structured data, with no "pass me a query string" escape hatches. This isn't just developer hygiene; it's a guardrail, because schema validation is where most agent mistakes get caught before they become writes.
An agent calling create_segment with a condition like TOTAL_SPENDINGS GTE "five hundred" fails validation instantly. The same mistake through a free-text interface becomes a silently empty segment — which becomes a campaign to nobody, or worse, to everybody.
Free-text tool inputs are how you end up debugging an agent's vibes. Schemas turn vibes into 400 errors.
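As a rough illustration of the segment example, a typed condition parser might look like the sketch below. `TOTAL_SPENDINGS` and `GTE` come from the example above; the other field and operator names, the `ParseResult` shape, and the function names are assumptions made for the sketch.

```typescript
// Allowed values. Only TOTAL_SPENDINGS and GTE appear in the post;
// the rest are hypothetical placeholders.
const FIELDS = ["TOTAL_SPENDINGS", "ORDER_COUNT"] as const;
const OPERATORS = ["GTE", "LTE", "EQ"] as const;

type Condition = {
  field: (typeof FIELDS)[number];
  op: (typeof OPERATORS)[number];
  value: number;
};

type ParseResult =
  | { ok: true; condition: Condition }
  | { ok: false; status: 400; message: string };

function reject(message: string): ParseResult {
  return { ok: false, status: 400, message };
}

// Returns the matching allowed value, or undefined if v is not in the list.
function oneOf<T extends string>(allowed: readonly T[], v: unknown): T | undefined {
  for (let i = 0; i < allowed.length; i++) {
    if (allowed[i] === v) return allowed[i];
  }
  return undefined;
}

// Turns raw agent arguments into a typed condition or a 400-style error.
function parseCondition(raw: unknown): ParseResult {
  if (typeof raw !== "object" || raw === null) {
    return reject("condition must be an object");
  }
  const c = raw as Record<string, unknown>;

  const field = oneOf(FIELDS, c["field"]);
  if (field === undefined) return reject("unknown field");

  const op = oneOf(OPERATORS, c["op"]);
  if (op === undefined) return reject("unknown operator");

  const value = c["value"];
  if (typeof value !== "number" || !isFinite(value)) {
    return reject("value must be a finite number");
  }
  return { ok: true, condition: { field, op, value } };
}
```

With this shape, `{ field: "TOTAL_SPENDINGS", op: "GTE", value: "five hundred" }` is rejected before any segment is created, rather than producing a segment that silently matches nothing.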
5. An audit trail you actually check.
Every agent call logs what was called, with what arguments, by which token, when. Each token shows its lastUsedAt. Boring, standard, and the part everyone skips.
Here's why it matters for agents specifically: with a human operator, weird behavior gets noticed by the human doing it. An agent doesn't notice itself misbehaving. A retry loop that creates 14 identical segments at 3 a.m. is invisible unless something is recording it — and the merchant's first question after any surprise will be "what exactly did the agent do?" You want that answer to be a log line, not a shrug.
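A small in-memory sketch of such a trail, including a check for the retry-loop case. The `AuditLog` class, its method names, and the idea of counting identical calls inside a time window are assumptions for illustration; a real deployment would persist entries somewhere durable.

```typescript
// One record per agent call: what, with which arguments, by which token, when.
type AuditEntry = { tool: string; args: string; tokenId: string; at: number };

// Hypothetical in-memory audit log.
class AuditLog {
  private entries: AuditEntry[] = [];
  private lastUsed: Record<string, number> = {};

  // Record a call and update the token's lastUsedAt. Arguments are stored
  // as JSON text, so comparisons are sensitive to key order.
  record(tokenId: string, tool: string, args: unknown, at: number = Date.now()): void {
    this.entries.push({ tool, args: JSON.stringify(args), tokenId, at });
    this.lastUsed[tokenId] = at;
  }

  // Timestamp of the token's most recent call, or undefined if never used.
  lastUsedAt(tokenId: string): number | undefined {
    return this.lastUsed[tokenId];
  }

  // Number of identical (tool, args) calls by one token within the window
  // ending at `now`. A high count is the signature of a retry loop.
  countRepeats(tokenId: string, tool: string, args: unknown, windowMs: number, now: number): number {
    const key = JSON.stringify(args);
    let count = 0;
    for (let i = 0; i < this.entries.length; i++) {
      const e = this.entries[i];
      if (e.tokenId === tokenId && e.tool === tool && e.args === key && now - e.at <= windowMs) {
        count++;
      }
    }
    return count;
  }
}
```

Recording is only half of it: something still has to look at `countRepeats` (or an equivalent query) and alert, otherwise the 14 identical segments are logged but still go unnoticed.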
The checklist
If you're putting write tools in front of an agent — any agent, any backend — this is the whole list:
- Read-only default, writes opt-in per token
- Hard ceilings in the schema for every number the agent controls
- Sort writes by reversibility — irreversible + customer-facing ends at a draft, never a send
- Typed inputs everywhere, no free-text escape hatches
- Log every call, surface lastUsedAt, make the audit trail answer "what did it do?"
None of this is clever. That's sort of the point — the difference between a safe agent surface and a scary one isn't research-grade alignment work, it's the same boring engineering we already do for human-facing APIs, applied with the assumption that your newest API consumer is confidently wrong some percentage of the time.
(The store surface in question is the MCP server behind FavCRM's Shopify app — ~20 tools, reads plus the capped writes above. The patterns aren't specific to it.)
What I'm still unsure about: caps and drafts handle the known failure modes. The one that worries me is the slow one — an agent making 50 individually-reasonable small writes that add up to a mess no single guardrail catches. Rate limits help; I don't think they're the full answer.
Where do you draw the write line for agents on your stack — and has anyone found a good pattern for the death-by-a-thousand-valid-calls problem?