Fabio Marcello Salvadori

Prompt injection vs prompt absorption: why the distinction matters when you're shipping AI agents

TL;DR

Indirect prompt injection on AI agents is being framed as a model security problem. I think that framing sends teams looking for the wrong fix. The failure mode is not that someone is pushing a payload past your defenses. It is that your agent is voluntarily reaching out to the open web and ingesting whatever it finds, including instructions disguised as content. I've been calling this prompt absorption, and the distinction changes what the architecture has to look like.


The Google warning that everyone is reading wrong

Google researchers recently warned that malicious web pages are already poisoning AI agents through indirect prompt injection. The attack is uncomplicated: a webpage contains hidden instructions, an agent reads the page during a normal task, the model cannot reliably distinguish content to summarize from instructions to follow, and the agent acts using whatever permissions you gave it.

Most write-ups have framed this as a model security problem and a prompt engineering problem: add better system prompts, add input filters, add jailbreak detection. I don't think any of that is wrong, but I think it misses where the real fix has to live.

Injection is the wrong mental model

The word injection implies an attacker pushing something in. It carries assumptions from web security: SQL injection, XSS, CSRF. Those attacks have a clear vector, and the defensive instinct maps cleanly to perimeter controls: sanitize the input, escape the output, validate at the boundary.

Agent reality does not look like that. When an enterprise agent fetches a webpage as part of a research task, nobody pushed that page into the agent. The agent reached out and pulled it in. The poisoned content traveled the same path as every other piece of content the agent has ever read. There is no boundary to sanitize at, because the agent is the one crossing the boundary.

And that is why I call this prompt absorption. The agent is not being injected. The agent is absorbing. It is doing exactly what we built it to do, which is read external content and reason over it.

Why the rename matters in practice

If you treat this as injection, your defensive instinct is detection. Build a classifier that catches malicious instructions in fetched content. Train it on adversarial examples. Update it as new attacks emerge. This is a losing arms race against the entire open web, and it has the same shape as the email spam problem, which we have been losing for thirty years.

If you treat this as absorption, your defensive instinct is compartmentalization. Stop letting the same agent that reads the web also call your internal tools. Separate the read path from the action path. Make adversarial content non-executable instead of trying to make it non-existent.

In code terms, the difference looks like this:

```javascript
// What most agent demos look like
async function handleTask(userPrompt) {
  const context = await browseAndRead(userPrompt);        // open web, unfiltered
  const decision = await llm.reason(userPrompt, context);
  return await tools.execute(decision);                   // any tool, full permissions
}
```
```javascript
// What absorption-aware architecture looks like
async function handleTask(userPrompt) {
  // Reader zone: no internal tools, no credentials
  const rawContent = await readerAgent.fetch(userPrompt);
  const evidence = await readerAgent.extractFacts(rawContent);  // facts, not commands
  const signedEvidence = sign(evidence, {
    source: rawContent.url,
    timestamp: Date.now(),
    method: rawContent.method,
  });
  // Reasoner zone: sees signed evidence, never the open web
  const decision = await reasoningAgent.reason(userPrompt, signedEvidence);
  // Executor zone: checks the trail before anything irreversible
  if (decision.isHighImpact) {
    await verifyIntent(decision, signedEvidence);  // throws if the trail is broken
  }
  return await tools.execute(decision);
}
```

The reader has no permissions. The reasoner has no internet. The executor checks the trail before doing anything irreversible. Three trust zones, three separate processes, one signed handoff between each.
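
What the signed handoff means in practice: each zone only accepts input its upstream neighbor has signed. Here is a minimal sketch of `sign` and `verify` using Node's built-in crypto module; the shared-secret key management and the payload shape are my assumptions, not a prescription.

```javascript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: each pair of zones shares a secret provisioned out of band.
const HANDOFF_KEY = process.env.HANDOFF_KEY;

function sign(evidence, provenance) {
  const payload = JSON.stringify({ evidence, provenance });
  const mac = createHmac("sha256", HANDOFF_KEY).update(payload).digest("hex");
  return { payload, mac };
}

function verify({ payload, mac }) {
  const expected = createHmac("sha256", HANDOFF_KEY).update(payload).digest();
  const given = Buffer.from(mac, "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    throw new Error("handoff failed verification, refusing to proceed");
  }
  return JSON.parse(payload);
}
```

The crypto is not the point. The point is that the reasoner rejects anything the reader did not sign, and the executor rejects anything the reasoner did not sign, so absorbed instructions cannot fabricate their own provenance.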

The part nobody talks about

The Google warning focused on hackers, but there is a quieter trend underneath it. Regular site owners are starting to embed agent-targeted instructions on purpose. Adversarial SEO, anti-scraper countermeasures, or just spite at being crawled by AI without consent. Some of it is malicious, some is defensive, some is mischief. From the agent's perspective the difference does not matter. The public web is developing antibodies against AI agents, and any enterprise stack downstream of the open web is downstream of that immune response.

This breaks the assumption that most pages are fine and only a few are malicious. Increasingly, most pages are fine for humans and a non-trivial fraction are hostile to agents specifically. Detection-based defenses degrade fast under that distribution.

What an absorption-aware system looks like at minimum

If you are shipping agents and want a checklist, this is the minimum architectural posture I would push for.

The agent that fetches external content runs in an isolated environment with no access to internal tools, no access to credentials, no ability to make outbound calls beyond the fetch itself.
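
Even at the function level you can get most of the way there. A sketch of what the reader's single fetch capability might look like; the names and the error messages are illustrative, not a real API:

```javascript
// Sketch: the reader's only capability is fetching the one target URL.
// No cookies, no auth headers, no credentials in scope to leak.
async function isolatedFetch(url) {
  const parsed = new URL(url);
  if (!["http:", "https:"].includes(parsed.protocol)) {
    throw new Error(`refusing non-web protocol: ${parsed.protocol}`);
  }
  const res = await fetch(parsed, {
    redirect: "error",  // a redirect is a second outbound call; deny it
    headers: { "user-agent": "reader-agent/1.0" },
  });
  return { url: res.url, method: "GET", status: res.status, body: await res.text() };
}
```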

External content gets parsed into structured evidence before it touches the reasoning step: URL, fetch method, timestamp, content hash, confidence score. The reasoner sees facts attached to provenance, not raw page text.
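
Concretely, the reader's output can be a plain record where every field is provenance. A sketch of the shape I have in mind; the field names are illustrative:

```javascript
import { createHash } from "node:crypto";

// Illustrative: one extracted fact plus the provenance that anchors it.
function toEvidence(fact, fetchResult, confidence) {
  return {
    fact,                             // e.g. "Q3 revenue was $12M", never raw page text
    source: fetchResult.url,
    method: fetchResult.method,
    fetchedAt: new Date().toISOString(),
    contentHash: createHash("sha256").update(fetchResult.body).digest("hex"),
    confidence,                       // the extractor's own estimate, 0 to 1
  };
}
```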

High-impact tool calls require explicit verification against the evidence trail. If the reasoner decides to email the customer database to an external address, the executor should be able to ask: which piece of evidence justified this action, and was that evidence ever authorized to trigger this tool? If the trail is broken, refuse to execute.
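
In code, that check is a walk from the proposed tool call back to the evidence that motivated it, plus a static policy saying which sources may ever trigger which tools. A hedged sketch building on the `sign`/`verify` helpers above; `decision.justifiedBy` and the policy table are my assumptions:

```javascript
// Assumption: the reasoner cites evidence by content hash in decision.justifiedBy.
const TOOL_POLICY = {
  "email.send":    { allowedSources: [] },  // web content may never trigger this
  "ticket.update": { allowedSources: ["https://support.example.com"] },
};

function verifyIntent(decision, signedEvidence) {
  const { evidence } = verify(signedEvidence);  // re-check the handoff signature
  const cited = evidence.filter(e => decision.justifiedBy.includes(e.contentHash));
  if (cited.length === 0) {
    throw new Error(`no evidence justifies ${decision.tool}, refusing to execute`);
  }
  const policy = TOOL_POLICY[decision.tool];
  for (const e of cited) {
    if (!policy?.allowedSources.some(s => e.source.startsWith(s))) {
      throw new Error(`${e.source} is not authorized to trigger ${decision.tool}`);
    }
  }
}
```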

Logging is not enough. Logs prove what happened, not what was authorized. Signed action trails prove intent. If you cannot reconstruct why the agent did this thing from a signed record, you cannot defend against absorption after the fact.

Open questions I have not solved

I am not pretending this is finished thinking. A few things I am still wrestling with.

How do you handle agents that need to act on the content they read, like a customer support agent that updates a ticket based on an email? The read-path/action-path split gets fuzzy fast in practice. My current answer is that the action allowed by external content has to be bounded and reversible by design, but I am not fully happy with it.
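
My working definition of bounded and reversible: the set of actions external content may trigger is a closed allowlist, and every entry carries its own undo. A sketch, with the ticket API entirely hypothetical:

```javascript
// Hypothetical: each content-triggered action declares how to reverse itself.
const CONTENT_TRIGGERED_ACTIONS = {
  "ticket.addComment": {
    run:  (args) => ticketApi.addComment(args.ticketId, args.text),
    undo: (args, result) => ticketApi.deleteComment(args.ticketId, result.commentId),
  },
  // "ticket.close" is deliberately absent: closing is for humans to trigger.
};

async function actOnContent(action, args) {
  const entry = CONTENT_TRIGGERED_ACTIONS[action];
  if (!entry) throw new Error(`${action} cannot be triggered by external content`);
  const result = await entry.run(args);
  return { result, undo: () => entry.undo(args, result) };
}
```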

How do you classify content as instructions versus facts when the model is the thing doing the reading? You can strip obvious markers, but a sufficiently clever payload looks like a fact until it doesn't. I think the answer is that the reasoner should not act on facts alone, only on facts plus authorized intent, but this is harder than it sounds.

If you are shipping agents in production and have opinions on either of these, I'd genuinely like to hear them in the comments.

Closing thought

The future of agent security will not be about making models impossible to manipulate. That arms race is unwinnable. It will be about making manipulation non-executable at the architecture level. Filters and classifiers will keep playing a role, but they cannot be the load-bearing wall.

Call it injection if you want, but absorption is the failure mode you actually have to design against.
