Bucabay

Posted on Jul 1 • Originally published at mailkite.dev

Build software that heals itself in the agentic era

#architecture #security #ai #agents

Disclosure: I build MailKite, and the open-source mail-parse library I use as the example is ours. But the pattern is the point — it isn't MailKite-specific, and you can apply it to anything that eats messy input.

Self-healing software is a system architected so that, when it hits input the real world throws at it, it doesn't crash and it doesn't stay broken: it records a structured, PII-free failure signature, and that signature feeds a repair loop — increasingly, an AI agent — that turns the breakage into a permanent fix behind automated gates. In the agentic era the bottleneck is no longer writing the fix; a capable agent can do that. The bottleneck is architecting your software so an agent's fix is safe, automatic, and cumulative. This post is that pattern. I'll use our open-source MIME parser (mail-parse) as the running example — messy input is where software goes to die — but the shape applies to almost any system that eats hostile real-world data.

Two honesty notes before I start, because a post that blurs shipped and planned isn't worth reading. First: this is part one of a two-part series — part one is the architecture and what runs today; part two comes after the fully autonomous loop ships and we've watched it heal real input in the wild. Second: I'll label each piece shipped or in progress as I go, and there's a status table at the end.

The loop the agentic era changes

The classic repair loop is slow and human-shaped: a bug slips into production → someone eventually files an issue → a human reproduces it, writes a patch, ships a release → weeks later every install benefits. It works, but it's measured in weeks and gated on a human being in the loop for every single fix.

Agents change what's possible here, not by being trusted to write perfect code, but by being fast and tireless at the boring middle. The interesting question stops being "can an agent write the fix?" (increasingly, yes) and becomes: when an agent can propose a fix in seconds, how do you build software so that letting it do so isn't reckless? Answer that, and your system stops accumulating breakage — every new way the world is wrong becomes a one-time event.

Five design moves make it work. I'll state each generally, then ground it in the parser.

1. Never crash — turn every failure into a structured signal

The foundation of a self-healing system is that failure is a first-class, structured output, not an exception that unwinds the stack. If your software dies on bad input, there's nothing to heal; if it silently mangles it, there's nothing to detect. The discipline is: always produce the best result you can, and alongside it a machine-readable record of everything you had to paper over.

In the parser (shipped): mail-parse never throws. An unclosed MIME boundary pops the orphaned context and emits BOUNDARY_NOT_CLOSED; a charset that won't decode falls back and emits UNKNOWN_CHARSET. You always get a message and a typed list of what was wrong with it. Those diagnostics aren't logging — they're the raw material every downstream loop runs on.

import { parse } from "@mailkite/mail-parse";

// parse() never throws — even on a broken message it returns a best-effort
// result *plus* a typed list of everything it had to paper over.
const msg = parse(rawMime);

msg.subject;      // decoded as far as it could
msg.attachments;  // whatever it could recover
msg.diagnostics;
// → [
//     { code: "BOUNDARY_NOT_CLOSED", scope: "structure" },
//     { code: "UNKNOWN_CHARSET",     scope: "part", contentType: "text/html" },
//   ]
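To make the "best result plus a record of what was papered over" discipline concrete, here is a minimal sketch of one tolerant step: decoding a part's bytes with a declared charset. The names `decodeTolerant`, `makeDecoder`, and the `Diagnostic` shape are hypothetical and are not taken from mail-parse; the UTF-8 fallback is also an assumption, since the post does not say which charset the parser falls back to.

```typescript
// Hypothetical shapes for illustration; not mail-parse's internal types.
type Diagnostic = { code: string; scope: "envelope" | "structure" | "part" };
type Decoded = { text: string; diagnostics: Diagnostic[] };

// Returns a decoder for the label, or null if the runtime does not know it.
function makeDecoder(label: string): TextDecoder | null {
  try {
    return new TextDecoder(label); // throws RangeError on an unknown label
  } catch {
    return null;
  }
}

// Decode bytes without ever throwing: an unknown charset becomes a
// diagnostic plus a best-effort fallback (assumed here to be UTF-8).
function decodeTolerant(bytes: Uint8Array, charset: string): Decoded {
  const diagnostics: Diagnostic[] = [];
  let decoder = makeDecoder(charset);
  if (decoder === null) {
    diagnostics.push({ code: "UNKNOWN_CHARSET", scope: "part" });
    decoder = new TextDecoder("utf-8");
  }
  // Default (non-fatal) mode: malformed byte sequences decode to U+FFFD.
  return { text: decoder.decode(bytes), diagnostics };
}
```

The caller always gets text back, and the diagnostics array is empty exactly when nothing had to be papered over.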

2. Make fixes additive, not surgery — a plugin seam

If every fix means editing the core, fixes are risky, they collide, and no agent (or human) should be trusted to make them at speed. The move is a registry: a seam where new behavior arrives as a self-contained, narrowly scoped unit that can't take down the whole system and makes it obvious what it touches.

In the parser (shipped): fixups are middleware in a PostCSS-style registry — each declares a phase, a match predicate, and a handler, and a middleware that throws becomes a contained MIDDLEWARE_ERROR diagnostic while the chain keeps going. A new format quirk is a new middleware with a narrow predicate, not a patch threaded through the core. That containment is exactly what later lets a generated fix be admitted without betting the system on it.

// A new format quirk is a self-contained middleware with a narrow predicate —
// not a patch threaded through the core.
const tnef = {
  phase: "decode",
  match: (part) => part.contentType === "application/ms-tnef",
  handler: (part) => extractWinmailDat(part),
};

registry.use(tnef);
// If handler throws, the parser records a contained MIDDLEWARE_ERROR
// diagnostic and the rest of the chain keeps running.
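The containment behavior described above can be sketched in a few lines. This is an illustrative registry, assuming a simplified `Part` type and a `name` field on each middleware; `Registry`, `use`, and `run` here are hypothetical and not mail-parse's actual API.

```typescript
// Hypothetical sketch of a middleware registry; not mail-parse's real API.
type Part = { contentType: string; body: string };
type Diagnostic = { code: string; middleware: string };
type Middleware = {
  name: string;
  phase: "decode" | "structure";
  match: (part: Part) => boolean;
  handler: (part: Part) => Part;
};

class Registry {
  private chain: Middleware[] = [];

  use(mw: Middleware): void {
    this.chain.push(mw);
  }

  // Run every matching middleware for one phase. A middleware that throws
  // is recorded as a diagnostic and skipped; the part it was given is kept
  // and the rest of the chain still runs.
  run(phase: Middleware["phase"], part: Part): { part: Part; diagnostics: Diagnostic[] } {
    const diagnostics: Diagnostic[] = [];
    let current = part;
    for (const mw of this.chain) {
      if (mw.phase !== phase) continue;
      try {
        if (mw.match(current)) current = mw.handler(current);
      } catch {
        diagnostics.push({ code: "MIDDLEWARE_ERROR", middleware: mw.name });
      }
    }
    return { part: current, diagnostics };
  }
}
```

Because the `try` wraps both `match` and `handler`, a broken predicate is contained the same way as a broken handler.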

3. Name failures identically everywhere — without leaking data

To fix a class of breakage you first have to name it, the same way across every install, without ever collecting private data. That's a failure signature: a deterministic hash over structure only. It does two things at once — it lets a thousand installs hitting the same bug collapse into one prioritized signal, and it gives the repair loop a precise, shareable target.

In the parser (shipped): the signature is an FNV-1a hash over PII-free features (diagnostic codes, content-type, transfer-encoding, a byte-shape fingerprint, mailer family, structure path) and never body bytes, addresses, or subjects. Two installs on opposite sides of the world hitting the same Outlook-TNEF quirk compute the same hash. A multi-granularity rollup lets you cluster loosely or tightly. (It's pinned identical across our TypeScript, Python, and Go ports by a golden-corpus test, so the herd can't drift.)

interface FailureSignature {
  hash: string;                 // = fnv1a(canonicalize(features))
  features: {
    scope: "envelope" | "structure" | "part";
    diagnosticCodes: string[];  // e.g. ["UNKNOWN_CHARSET"]
    contentType?: string;       // the offending leaf's declared type
    transferEncoding?: string;
    byteSignature?: string;     // hex magic of the first N bytes — never content
    mailerFamily?: string;      // X-Mailer normalized → "Outlook/16"
    structurePath?: string;     // "multipart/mixed>…>application/ms-tnef"
  };
}

Nothing in there is content — no subject, no addresses, no body bytes — so the same broken email produces the same hash in every language:

from mailparse import compute_signature

sig = compute_signature({
    "scope": "part",
    "diagnosticCodes": ["UNKNOWN_CHARSET"],
    "contentType": "text/plain",
    "transferEncoding": "base64",
})
sig["hash"]  # "13586f32bb2840c6" — byte-identical in Node, Python, and Go
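For readers who want to see why such a hash is reproducible, here is a minimal sketch of the idea: canonicalize the structural features into one fixed string, then hash it with FNV-1a. Everything here is an assumption for illustration. The canonical form, the reduced `Features` type, and the `coarse` rollup are hypothetical, and the sketch uses the 32-bit FNV-1a variant for brevity, so its output will not match mail-parse's 16-hex-digit hashes.

```typescript
// Hypothetical, reduced feature set; the real signature has more fields.
type Features = {
  scope: string;
  diagnosticCodes: string[];
  contentType?: string;
  transferEncoding?: string;
};

// 32-bit FNV-1a over the UTF-8 bytes of a string, as 8 hex digits.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  new TextEncoder().encode(input).forEach((byte) => {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  });
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Fixed field order, sorted codes, lowercased values: equal features
// always serialize to the same string regardless of input ordering.
function canonicalize(f: Features): string {
  return [
    "scope=" + f.scope,
    "codes=" + f.diagnosticCodes.slice().sort().join(","),
    "contentType=" + (f.contentType ?? "").toLowerCase(),
    "transferEncoding=" + (f.transferEncoding ?? "").toLowerCase(),
  ].join("\n");
}

// Two granularities: `hash` uses every feature, `coarse` only scope + codes.
function computeSignature(f: Features): { hash: string; coarse: string } {
  const coarseOnly: Features = { scope: f.scope, diagnosticCodes: f.diagnosticCodes };
  return { hash: fnv1a32(canonicalize(f)), coarse: fnv1a32(canonicalize(coarseOnly)) };
}
```

The determinism comes entirely from the canonical form: any port that serializes the same features to the same bytes will produce the same hash, which is the property a golden-corpus test can pin.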

4. Two loops: fix the core for everyone, patch the edge safely

Self-healing has two speeds, and you want both.

The cold loop — fix the library for everyone. (Shipped.) When the parser degrades it emits a FailureReport. Where it goes is the deployer's choice — reporting is opt-in, with no default phone-home — but point the built-in reporter at the core repo and it files exactly one deduplicated GitHub issue per signature (a hidden parse-signature: marker makes it idempotent; N installs → 1 issue), containing the structural signature and, in writing, no message content. A responder — a human, or an AI coding routine triggered by the issue — reproduces from the scrubbed signature, fixes the core, and opens a PR that CI won't merge unless a golden corpus and a benign-input regression set both stay green. The fix ships to every install, in every language.
The hot loop — patch one edge now. (In progress: designed, next.) A library release takes time, and some quirks are concentrated in a single tenant's weird upstream system. For those, the design is an agent, handed the sealed failing fixture, that writes a narrowly-scoped middleware plus a golden test pinning its behavior — a stopgap that heals that edge immediately while the cold loop fixes the root cause for everyone.
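The "N installs → 1 issue" property of the cold loop comes down to an idempotent report step. Below is a minimal sketch of that step against an in-memory stand-in for an issue tracker. The `InMemoryTracker` class, the `reportOnce` function, and the exact HTML-comment form of the marker are assumptions for illustration; the post only says that a hidden `parse-signature:` marker is used.

```typescript
// Hypothetical stand-in for an issue tracker; no real GitHub calls are made.
type Issue = { id: number; title: string; body: string };

class InMemoryTracker {
  issues: Issue[] = [];

  findByText(text: string): Issue | undefined {
    return this.issues.find((issue) => issue.body.indexOf(text) !== -1);
  }

  create(title: string, body: string): Issue {
    const issue = { id: this.issues.length + 1, title, body };
    this.issues.push(issue);
    return issue;
  }
}

// Assumed marker format: an HTML comment, invisible when the issue renders.
function markerFor(hash: string): string {
  return "<!-- parse-signature: " + hash + " -->";
}

// File an issue for a signature only if no issue already carries its marker.
// The body holds structural data only: diagnostic codes and the hash.
function reportOnce(tracker: InMemoryTracker, hash: string, codes: string[]): Issue {
  const existing = tracker.findByText(markerFor(hash));
  if (existing) return existing;
  const body = "Diagnostics: " + codes.join(", ") + "\n" + markerFor(hash);
  return tracker.create("Parse failure " + hash, body);
}
```

Search-then-create is not atomic, so a real reporter talking to a shared tracker would still need to tolerate the occasional duplicate from two installs racing each other.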

5. Trust the gates, not the generator — the security crux

Here's the part the agentic era forces you to get right, because the hot loop means running code a model wrote against real production data. You do not make that safe by trusting the model. You make it safe by building an architecture where a fully compromised or simply wrong generated fix still can't do harm. Almost the entire hot-loop design (in progress) is that safety envelope:

Sandboxed execution. Generated fixes run as Wasm (Extism) with deny-by-default capabilities and a hard CPU/fuel budget — no network, no filesystem, no ambient authority. A bad fix can transform its input or burn its fuel and die; it can't reach anything else. (Generation and CI run in a separate sandbox, isolated from production.)
Adversarial gates the model doesn't author. A fix is admitted only if it clears system-owned tests: it must fire zero times against a benign corpus of well-formed input (no collateral damage), it must satisfy the golden test generated from the failing case (it actually fixes the thing), and it must clear a specificity floor (its predicate is narrow, not a catch-all). The agent proposes; adversarial tests dispose.
Canary, then commit. An admitted fix rolls out at 5% → 25% → 100%, watched against a structural agreement metric — a bad fix is caught on a sliver of traffic, not all of it.
A kill switch per fix. Every generated unit is individually disableable by config, no redeploy — instant, reversible rollback.

// What the hot-loop agent generates (designed, next): a narrowly-scoped
// middleware that fires ONLY on the failing signature — plus a golden test.
export default {
  phase: "decode",
  match: (part) =>
    part.contentType === "text/html" &&
    part.charset === "x-user-defined",   // the one quirk, nothing else
  handler: (part) => decodeAs(part, "windows-1252"),
};
// Admitted only if it fires zero times on the benign corpus, passes the
// golden test from the failing case, and clears the specificity floor —
// none of which the agent wrote.
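The three admission gates can be sketched as one system-owned function that a candidate fix has to pass. This is a hypothetical illustration of the design, which the post labels as in progress: the `admit` function, the simplified `Part` and `Fix` types, and in particular the definition of the specificity floor as a maximum fire rate over a traffic sample are assumptions, not the planned implementation.

```typescript
// Hypothetical shapes; the handler returns the charset it would decode with.
type Part = { contentType: string; charset: string };
type Fix = { match: (part: Part) => boolean; handler: (part: Part) => string };
type Golden = { input: Part; expected: string };
type Verdict = { admitted: boolean; failed: string[] };

// Fraction of a corpus on which the fix's predicate fires.
function fireRate(fix: Fix, corpus: Part[]): number {
  if (corpus.length === 0) return 0;
  return corpus.filter((part) => fix.match(part)).length / corpus.length;
}

// System-owned gates. The fix is admitted only if every gate passes.
function admit(fix: Fix, benign: Part[], golden: Golden, traffic: Part[], maxFireRate: number): Verdict {
  const failed: string[] = [];
  try {
    // Gate 1: zero fires on well-formed input (no collateral damage).
    if (benign.some((part) => fix.match(part))) failed.push("benign-zero-fire");
    // Gate 2: it must fire on the failing case and produce the pinned result.
    if (!fix.match(golden.input) || fix.handler(golden.input) !== golden.expected) failed.push("golden");
    // Gate 3 (assumed definition): the predicate fires on a small share of traffic.
    if (fireRate(fix, traffic) > maxFireRate) failed.push("specificity-floor");
  } catch {
    failed.push("threw"); // a fix that throws during gating is rejected
  }
  return { admitted: failed.length === 0, failed };
}
```

The important property is that none of the inputs to `admit` other than `fix` come from the agent, so a catch-all predicate fails on the benign corpus no matter how the fix was written.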

That's what makes autonomy defensible: a vetted fix can auto-promote with no human in the loop — not because we trust the model, but because what stands between a generated fix and production isn't anyone's judgment, it's a sandbox it can't escape, a battery of adversarial tests it didn't write, a canary that bounds blast radius, and a switch that undoes it. This is the same thesis behind how we built our agent inbox: in the agentic era you stop trying to make the model un-foolable and instead bound what a fooled model is allowed to do.
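The canary and the kill switch can both be expressed as a single decision made before a generated fix is allowed to run. The sketch below is one plausible way to do that, assuming deterministic bucketing on a routing key; `shouldApply`, `bucketOf`, the `Rollout` shape, and the choice of routing key are hypothetical, since the post does not describe how traffic is split.

```typescript
// Hypothetical rollout config: a percentage plus a per-fix kill list.
type Rollout = { percent: number; killed: Record<string, boolean> };

// Deterministic bucket in [0, 100) from a string key, using 32-bit FNV-1a
// over UTF-16 code units. Any stable hash would do.
function bucketOf(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

// Decide whether a generated fix may run for this unit of traffic.
// The kill switch wins over everything; otherwise the fix runs only for
// keys whose bucket falls under the current rollout percentage.
function shouldApply(fixId: string, routingKey: string, rollout: Rollout): boolean {
  if (rollout.killed[fixId]) return false;
  return bucketOf(fixId + ":" + routingKey) < rollout.percent;
}
```

Because the bucket for a given key never changes, raising `percent` from 5 to 25 to 100 only ever adds traffic to the canary, and flipping `killed` is a config change with no redeploy.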

Where else this pattern fits

MIME is a vivid example because email is gloriously broken, but the pattern fits anywhere software meets messy, adversarial, or drifting real-world input. The same five moves — tolerant core, plugin seam, anonymous failure signature, cold/hot loops, gated sandbox — map cleanly onto:

Ingesting messy formats. CSV and bank-statement imports, PDF/OCR extraction, HTML scraping, log parsing, address and phone normalization. Every one is a hostile-input boundary that today either throws or silently corrupts. Signature the failure, let an agent add a scoped normalizer, gate it on a golden corpus.
Third-party API and webhook adapters. Upstream payloads drift or go malformed and your integration breaks in prod. An adapter that emits a schema-drift signature instead of a 500 lets an agent write a narrow shim for that provider's quirk — sandboxed, canaried — while a core fix follows.
Data pipelines / ETL schema drift. An upstream column gets renamed or a type changes; the pipeline emits a signature rather than poisoning the warehouse, and an agent proposes the mapping behind tests that must stay green on the historical data.
Abuse, spam, and fraud rules. A new evasion pattern is exactly a new failure signature. An agent generates a candidate rule that must fire zero times against a known-good corpus before it's canaried — the benign-corpus gate is the whole safety story, and it's identical to the parser's.
Client and device compatibility shims. Quirky browsers, email clients, IoT firmware, legacy POS terminals — each non-conforming client is a per-quirk plugin, added on demand, contained, and kill-switchable, instead of a growing tangle of if (userAgent...) in the core.

In each case the expensive, human-shaped part — noticing, reproducing, scoping, testing — is what the pattern automates, and the sandbox-plus-gates is what makes automating it safe.
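As one example of carrying the signature idea outside email, here is a minimal sketch of a schema-drift signature for a JSON webhook payload. It records only the shape of the payload (key names and value types), never the values. The functions `shapeOf` and `driftSignature` and the `SCHEMA_DRIFT` code are hypothetical, and the sketch assumes key names themselves are not sensitive.

```typescript
// Structural shape of a JSON value: key names and type names only.
function shapeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Distinct element shapes, sorted, so element order does not matter.
    const kinds = value
      .map((item) => shapeOf(item))
      .sort()
      .filter((kind, i, all) => i === 0 || all[i - 1] !== kind);
    return "[" + kinds.join("|") + "]";
  }
  if (typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => key + ":" + shapeOf(obj[key]));
    return "{" + fields.join(",") + "}";
  }
  return typeof value; // "string", "number", "boolean", ...
}

// Compare a payload against the expected shape. Returns null when they
// agree, otherwise a value-free record that a repair loop could key on.
function driftSignature(expectedShape: string, payload: unknown): { code: string; expected: string; actual: string } | null {
  const actual = shapeOf(payload);
  if (actual === expectedShape) return null;
  return { code: "SCHEMA_DRIFT", expected: expectedShape, actual };
}
```

An adapter could return its best-effort result alongside this record instead of failing the request, which gives a cold or hot loop the same kind of deduplicable target the parser's signature provides.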

What's live today vs. what's next

Capability	Status
Tolerant core (never throws, typed diagnostics)	✅ Live
Additive plugin seam (registry, contained fixes)	✅ Live
PII-free failure signatures (deterministic, deduping)	✅ Live
Cross-language parity (golden corpus + signature pinning)	✅ Live
Shadow harness (observe-only, structure-only compare)	✅ Live
Cold loop (opt-in, anonymous, deduplicated GitHub issues)	✅ Live
Hot loop (AI-generated fixes)	🔧 Designed, next
Wasm sandbox + capability/fuel limits	🔧 Designed, next
Adversarial gates, canary rollout, per-fix kill switch	🔧 Designed, next

FAQ

What does "self-healing software" actually mean here today?
Today: the system never dies on bad input, it records a precise PII-free signature of what broke, and identical signatures across all installs collapse into one deduplicated GitHub issue that drives a fix shipped to everyone. The fully autonomous part — an agent generating and shipping a sandboxed fix with no human in the loop — is designed and coming next.

Isn't letting an AI agent patch production reckless?
It would be if you trusted the agent's output. The design doesn't: generated fixes run as capability-denied Wasm with a fuel budget, are admitted only by adversarial tests the agent didn't write (benign-corpus zero-fire, a golden test from the failing case, a specificity floor), are canaried, and are individually kill-switchable. You trust the gates and the isolation, not the model.

Does any of this send my data anywhere?
No. Reporting is opt-in with no default phone-home, and the failure signature is structural only (codes, types, byte-shape, mailer family), never body bytes, addresses, or subjects.

Can I apply the pattern without a MIME parser?
Yes — that's the point. Any boundary where you eat messy real-world input (imports, scrapers, API adapters, ETL, abuse rules, compatibility shims) can adopt the same five moves: tolerant core, plugin seam, anonymous failure signature, cold/hot loops, and a gated sandbox for generated fixes.

Software will always meet a new way the world is wrong; the agentic era is a chance to make each new way a one-time event instead of a permanent scar. mail-parse is our open-source instance of the pattern, in TypeScript, Python, and Go — see the libraries, and if you'd rather get the parsed message without running any of it, point a domain at MailKite.

Part two comes after the autonomous loop ships. Everything labeled in progress above — the AI hot loop, the Wasm sandbox, the adversarial gates and canary rollout — gets its own post once it's live and we've watched it heal real input. And that feedback is the whole point: it arrives only through the anonymous, opt-in failure signal described above — structural, PII-free, and never sent unless you wire up a reporter — so part two will be written from what actually broke in the wild, not from a single byte of anyone's data.

This post was first published on the MailKite blog. Related: You can't prompt your way out of prompt injection applies the same "trust the architecture, not the model" philosophy to AI agents with email.

Top comments (5)

Kartik N V J K • Jul 3

Framing the failure signature as PII-free and structured up front is the detail most people skip, and it is the thing that makes the repair loop usable later. My one worry with a messy-input parser as the example is signature cardinality: hostile email generates near-infinite variants, so how do you cluster failure signatures so the agent gets a handful of real classes to fix instead of ten thousand near-duplicates?

Bucabay • Jul 4

Hi @kartik-nvjk - that is the crux of the issue: how do you aggregate the crashes into a bugfix that covers the whole aggregate optimally? The FailureSignature has to be tested in production first and tuned so that high-entropy values that map to the same fix are deduped. However, oversimplifying and removing too much entropy makes the fix too broad: either the fix never lands (it cannot be fixed) or the resource constraints (time, CPU, memory allocated to the fix) are exhausted.

So the failure signature is something derived from production testing. At the time of writing, it had only been tested on the publicly available mail test corpora. The signature has evolved quite a bit since then through real-world tests, and we're also experimenting with semantic mapping of the signature.

Any thoughts on how to improve this?

Gabe • Jul 2

Sandboxed execution. Generated fixes run as Wasm (Extism) with deny-by-default capabilities and a hard CPU/fuel budget - interesting concept, especially the fuel budget. Can you show some code sample for that?

Bucabay • Jul 2

Sure! Here's roughly what running a generated fix looks like — the trick is that it's a deny-by-default Wasm sandbox, so the isolation is the shape of the runtime, not a policy bolted on top:

import createPlugin from "@extism/extism";

const fix = await createPlugin(
  { wasm: [{ data: compiledFix }], timeoutMs: 50, memory: { maxPages: 16 } },
  {
    useWasi: false,     // no filesystem, clock, env, args — no syscall surface
    runInWorker: true,  // separate thread, so a runaway can actually be killed
    functions: {},      // deny-by-default: zero host functions exported to the guest
    allowedHosts: [],   // no outbound network
    allowedPaths: {},   // no paths
  },
);

try {
  // the guest only ever sees the one failing part's bytes
  const repaired = (await fix.call("transform", failingPart.bytes)).bytes();
  stage(repaired); // still has to clear the adversarial gates before it ships
} catch {
  // blew its 50ms / 16-page budget, trapped, or returned garbage → discard
} finally {
  await fix.close(); // torn down every call, no state accretes
}

A few things doing the work:

functions: {} + useWasi: false → the host hands the guest no functions of its own and no WASI, so beyond Extism's own input/output plumbing the module has no syscall surface and behaves as a pure bytes→bytes function. It can't open a socket, read a file, or check the clock. That's capability-based isolation, not a sandbox policy layered on afterward.
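timeoutMs is enforced from outside the guest. As far as I know, the Extism JS SDK runs the module on the JavaScript runtime's own WebAssembly engine rather than on wasmtime, so a call that overruns is stopped by terminating its worker thread (that's why runInWorker: true matters), and maxPages caps memory. Note that this is a wall-clock budget, not an instruction-counting fuel meter. A pathological fix hits the wall and is torn down instead of hanging the parser.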
It only ever sees failingPart.bytes — never the message, other tenants, or host memory — and close() runs per call so nothing accretes.
This is just the runtime path; the model actually writing and compiling the fix happens earlier, in a separate isolated sandbox.

Lalasava • Jul 2

Security is important when running untrusted code, even if your agent writes it. I can imagine a well-crafted email that instructs the LLM to download a payload from the internet as a dependency to parse a certain format. Only it's running code it was told to run.
How secure is the sandbox?