The dangerous AgentMail webhook bug starts with a response that looks completely fine.
POST /v0/webhooks -> 200 OK
The response body contains the webhook object. It includes your URL, event types, and usually enough fields to make the integration feel done.
But in a multi-tenant app, the field that matters is inbox_ids. If that array is silently dropped, your webhook is no longer scoped to the tenant inbox you meant to subscribe. It is subscribed broadly, and every tenant's message events can start flooding the same endpoint.
That is not a noisy failure. It is worse. The subscribe call worked. Your endpoint receives events. The logs look active. The leak is in the scope.
If your handler routes by webhook ID before it checks the inbox ID, an event from another customer's inbox gets processed as if it belonged to the tenant that owns the webhook. That is the kind of bug that passes observability and fails privacy.
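One way to guard against this in the handler is to check the event's inbox against the owning tenant's inboxes before any routing happens. The sketch below is an illustration only: the `webhookOwners` and `tenantInboxes` maps, the `resolveTenant` helper, and the top-level `inbox_id` field on the event are all assumptions, not AgentMail's documented payload shape.

```javascript
// Hypothetical lookup: which tenant registered which webhook.
const webhookOwners = new Map([["wh_abc", "tenant_123"]]);

// Hypothetical lookup: which inboxes belong to which tenant.
const tenantInboxes = new Map([
  ["tenant_123", new Set(["inbox_tenant_123"])],
  ["tenant_456", new Set(["inbox_tenant_456"])],
]);

// Returns the tenant ID only when the event's inbox belongs to the tenant
// that owns the webhook. Returns null otherwise so the caller can drop
// the event instead of routing it.
function resolveTenant(webhookId, event) {
  const tenantId = webhookOwners.get(webhookId);
  if (!tenantId) return null;
  const inboxes = tenantInboxes.get(tenantId);
  // Assumed field name: the real payload may nest the inbox ID elsewhere.
  if (!inboxes || !inboxes.has(event.inbox_id)) return null;
  return tenantId;
}
```

With this guard in place, an over-broad subscription still delivers foreign events, but the handler rejects them rather than attributing them to the wrong tenant.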
Why unit tests miss it
Most unit tests stub the webhook create call by echoing back exactly what the app sent.
request.inbox_ids -> response.inbox_ids
That proves your client serialized the payload. It does not prove AgentMail persisted the scope.
The test passes. CI passes. The code ships. Then production receives events for inboxes that do not belong to the tenant that configured the webhook.
The annoying part is that the local test is not obviously wrong. It has the right endpoint, the right payload shape, and the right assertion against the create response. It just trusts the wrong copy of the object.
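To see why the echo stub cannot fail, it helps to put it next to a fake that models the failure. The sketch below uses two hypothetical in-memory clients, `echoClient` and `makeDroppingClient`; they are illustrations, not the AgentMail SDK, and the `id` field and `wh_1` value are made up.

```javascript
// Stub that echoes the request back, as many unit tests do.
const echoClient = {
  async post(_path, body) {
    return { id: "wh_1", ...body };
  },
};

// Fake that models the failure: the create response echoes the request,
// but the stored object has lost inbox_ids.
function makeDroppingClient() {
  const store = new Map();
  return {
    async post(_path, body) {
      const { inbox_ids, ...persisted } = body; // scope is discarded here
      store.set("wh_1", { id: "wh_1", ...persisted });
      return { id: "wh_1", ...body };
    },
    async get(path) {
      return store.get(path.split("/").pop());
    },
  };
}

// The assertion most unit tests make: it only inspects the create response.
async function createResponseLooksScoped(client, inboxIds) {
  const created = await client.post("/v0/webhooks", { inbox_ids: inboxIds });
  return JSON.stringify(created.inbox_ids) === JSON.stringify(inboxIds);
}
```

Both clients satisfy `createResponseLooksScoped`. Only a read of the stored object tells them apart.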
The pattern is reconcile-after-write
The fix is small enough to name:
POST -> GET -> diff
Treat the POST response as untrusted. It may be your request echoed back. Treat the later GET as the provider state. That is what AgentMail actually stored.
The diff is the bug.
const sentInboxIds = ["inbox_tenant_123"];

// Write: create the webhook scoped to a single tenant inbox.
const created = await agentmail.post("/v0/webhooks", {
  url: "https://app.example.com/webhooks/agentmail",
  event_types: ["message.delivered", "message.bounced"],
  inbox_ids: sentInboxIds,
});

// Read: fetch what the provider actually stored.
const stored = await agentmail.get(`/v0/webhooks/${created.id}`);

// Diff: compare order-insensitively; a missing array counts as empty.
const got = [...(stored.inbox_ids ?? [])].sort();
const want = [...sentInboxIds].sort();
if (JSON.stringify(got) !== JSON.stringify(want)) {
  throw new Error("AgentMail webhook scope drifted");
}
That is the whole check. It is not a bigger mock. It is a read after the write.
Why the agent caught it
A coding agent caught this where the unit test did not, and not because it was smarter about webhooks. It ran a better-shaped test.
AgentMail has a curated webhook_lifecycle_create_read_delete workflow in FetchSandbox. When the agent ran it through the FetchSandbox MCP server, the workflow forced the boring step people skip:
create inbox
subscribe webhook scoped to that inbox
read webhook back
compare inbox_ids
delete webhook
The workflow shape matches the bug shape. A stubbed unit test checks what your app meant to send. The workflow checks what the provider kept.
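The same lifecycle can be sketched as a single function with guaranteed teardown. This illustrates the shape only and is not the FetchSandbox workflow itself: the `client` interface (`post`, `get`, `delete`), the delete path, the `id` field, and the `makeFakeClient` helper are assumptions, and creating the inbox is left to the caller.

```javascript
// Runs subscribe -> read back -> compare -> delete against any client that
// exposes post/get/delete. Returns whether the stored scope matches.
async function runWebhookLifecycle(client, inboxId) {
  const created = await client.post("/v0/webhooks", {
    url: "https://app.example.com/webhooks/agentmail",
    event_types: ["message.delivered"],
    inbox_ids: [inboxId],
  });
  try {
    const stored = await client.get(`/v0/webhooks/${created.id}`);
    const got = stored.inbox_ids ?? [];
    const scoped = got.length === 1 && got[0] === inboxId;
    return { scoped, got };
  } finally {
    // Always tear down, even when the read throws, so test runs
    // do not leave stray subscriptions behind.
    await client.delete(`/v0/webhooks/${created.id}`);
  }
}

// Hypothetical in-memory client. dropScope simulates the provider
// discarding inbox_ids on write while still echoing them in the response.
function makeFakeClient({ dropScope }) {
  const store = new Map();
  return {
    store,
    async post(_path, body) {
      const persisted = { id: "wh_1", ...body };
      if (dropScope) delete persisted.inbox_ids;
      store.set(persisted.id, persisted);
      return { id: "wh_1", ...body };
    },
    async get(path) {
      return store.get(path.split("/").pop());
    },
    async delete(path) {
      store.delete(path.split("/").pop());
    },
  };
}
```

Running it against `makeFakeClient({ dropScope: true })` reports `scoped: false` even though the create response carried the right `inbox_ids`, which is the case a stubbed unit test cannot see.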
This is not only AgentMail
The same bug appears anywhere a control-plane API returns an object that looks like the request you sent:
- IAM policies
- ACL rules
- draft resources
- webhook subscriptions
- tenant-scoped notification settings
If a create call returns 200 and the object it created defines a security or tenancy boundary, do not stop at the create response.
Read it back. Diff the fields that protect the boundary. Fail the test before the provider starts sending another tenant's events to your endpoint.
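For resources other than webhooks, the check generalizes to any list of boundary fields. A minimal sketch, assuming a hypothetical helper named `diffBoundaryFields`: it compares arrays order-insensitively and everything else by JSON equality, which is enough for flat fields such as ID lists but is sensitive to key order in nested objects.

```javascript
// Normalizes a value for comparison: arrays become sorted copies so order
// does not matter, and a missing value becomes null.
function normalize(value) {
  if (value === undefined) return null;
  return Array.isArray(value) ? [...value].sort() : value;
}

// Returns one entry per boundary field whose stored value differs from
// what was sent. An empty array means the provider kept the scope.
function diffBoundaryFields(sent, stored, fields) {
  const drift = [];
  for (const field of fields) {
    const want = normalize(sent[field]);
    const got = normalize(stored[field]);
    if (JSON.stringify(want) !== JSON.stringify(got)) {
      drift.push({ field, want, got });
    }
  }
  return drift;
}
```

A test would call something like `diffBoundaryFields(requestBody, storedObject, ["inbox_ids"])` after the read and fail if the result is non-empty.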
Run the brownfield demo
The brownfield-agentmail-demo repo shows this end to end if you want to run it against the workflow yourself.
The important part is not AgentMail-specific. The habit is: after a write that configures scope, reconcile what the provider persisted.
A webhook that passes create tests can still be the webhook that leaks across tenants.
FetchSandbox's AgentMail workflow docs include the create, read, and teardown path.