What Pipelock Inspects, And What Tool Policy Inspects Instead

#ai #security #opensource #devops

A wire-only proxy scans wire bytes. Opaque media bytes pass through the wire layer untouched. Anyone evaluating an agent firewall should know which class of attacks gets caught at which layer, because pretending the wire layer covers everything is the wrong sales pitch and the wrong mental model.

This post is the layer split. Pipelock has two inspection layers that operate at different abstraction levels, and the marketing-friendly claim "we scan everything" is true for some shapes of attack and false for others. Saying so plainly is more useful to a buyer than saying nothing.

The wire layer

Pipelock's wire layer scans bytes as they cross the proxy. Every transport Pipelock supports gets the same set of scanners:

HTTP forward proxy. CONNECT and absolute-URI requests, request and response bodies on intercept paths, headers on every transport.
MCP stdio. JSON-RPC frames on the subprocess pipe, both directions.
MCP HTTP and SSE. JSON-RPC frames over HTTP, including streaming text/event-stream responses scanned per-event.
WebSocket. Frames in both directions, fragment reassembly, A2A envelope payloads.
Reverse proxy. Any HTTP-shaped agent backend Pipelock fronts.

What runs on those wire bytes:

DLP. Pattern matching for credentials, secret formats, and high-entropy strings. Runs on URLs, request bodies, response bodies, headers, MCP arguments, MCP responses.
Injection detection. Multi-pass content matching for prompt injection, jailbreak patterns, and tool-poisoning shapes. Runs on response bodies and MCP tool definitions.
Redaction. Class-preserving outbound scrub for known credential and PII shapes. Runs on request bodies and MCP tools/call arguments.
SSRF. Private-IP and metadata-endpoint protection on the URL pipeline. Runs on every transport with a URL.

The wire layer is good at credentials in headers, secrets in JSON, prompt injection in responses, and DLP-pattern leaks in tool calls. It is what stops an agent from POSTing an API key to a third-party logging service or fetching a markdown file with embedded jailbreak instructions and feeding it back to the model.

What the wire layer cannot do, and what no wire-only proxy can do without strapping on a perception model, is inspect the contents of opaque media:

Images. A PNG of a credential-bearing screen has the credential rendered in pixels. The proxy sees image bytes, not text.
Audio. A voice memo of a customer complaint contains words the proxy would have to transcribe to inspect.
Video. Same shape as audio plus pixels.
PDFs. A PDF can hold images, vector text, embedded fonts, and text-as-shapes. Naive PDF text extraction misses all of it.

Pipelock could in principle add OCR, ASR, and PDF extraction to the wire layer. None of those scans is free. OCR on every uploaded image multiplies proxy CPU by an order of magnitude. Latency budgets that work for text scanning collapse under perception. The architectural choice for the wire layer is to scan what is cheap, fast, and high-fidelity: text, structured data, and protocol headers. Opaque media gets a different treatment at a different layer.

The tool layer

Above the wire layer, the agent makes deliberate choices: it picks a tool to call, it constructs an argument, it sends a JSON-RPC request that names a method and a payload. The tool layer inspects those choices, not the bytes the choices move.

Two scanners run at this layer in Pipelock:

mcp_tool_policy. Pre-execution allow / deny / redirect rules that match on tool names, argument patterns, and URL shapes inside arguments. The "screenshot a URL" tool can have a rule that blocks calls whose URL matches a sensitive host pattern. The URL is text, even when the result will be image bytes.
tool_chain_detection. Sequence matchers that operate on the order in which an agent calls tools. A pattern like "screenshot the logged-in admin page, then upload the screenshot to a third-party host" is a sequence of calls whose individual calls are each plausibly fine. The chain matcher catches the shape of the sequence.

Both scanners operate on JSON-shaped data: method names, argument keys, URL strings inside arguments. None of them inspects the binary data the methods move. They operate one level above the bytes.

The thing they catch that the wire layer cannot: an agent that wants to exfiltrate something the wire scanner cannot read. The agent screenshots a page, uploads the screenshot, and the wire scanner sees a content-type of image/png and a stream of bytes. The wire scanner has nothing to say. The tool-policy rule, watching the URL the agent passes to the screenshot tool, can see "this is a sensitive page" and block before the screenshot happens. The chain detector, watching the sequence, can see "the agent is screenshotting and uploading" and break the chain.

The two layers cooperate. Wire scanning catches the credential leak the agent attempts as JSON. Tool-policy catches the equivalent leak the agent tries to launder through a screenshot. Neither alone is enough. Both together cover the surface a wire-only or tool-only design leaves open.

The enforcement boundary still matters. Tool policy and wire inspection only see traffic that reaches them, which is why the three-UID containment pattern and Kubernetes per-pod separation are part of the same posture.

What that means for the buyer

If your evaluation rubric reads "does this tool inspect images," the honest answer is that Pipelock does not, and that is the right design. The right question to ask any agent firewall is which layer catches which class of attack:

Credentials in JSON request bodies: wire layer, DLP scanner.
Credentials in screenshots uploaded as image bytes: tool layer, mcp_tool_policy URL rule on the screenshot tool.
Prompt injection in a markdown response: wire layer, injection scanner on response body.
Prompt injection in a PDF the agent fetches and processes: tool layer, policy rule on the fetch tool, plus DLP and injection scanning on whatever text the PDF parser eventually emits in tool arguments.
Tool poisoning via deceptive tool descriptions: wire layer, MCP tool scanner on tools/list responses.
Multi-step exfiltration where each step is plausibly benign: tool layer, chain detector on the call sequence.

The pattern: structured-text scanning belongs on the wire, semantic-action scanning belongs on the tool layer. Anyone selling a tool that claims to do both at the wire layer is either running an OCR pipeline that they are not budgeting for, or claiming coverage they do not have.

What Pipelock will not do

There are two specific things Pipelock does not do, and operators should plan around them:

Pipelock does not run OCR on uploaded images. The "screenshot of a credential" scenario relies on the tool-policy rule firing before the screenshot happens, not on inspecting the image after.
Pipelock does not transcribe audio. The "voice memo of a sensitive conversation" scenario relies on the policy rule on whichever tool initiated the recording, not on inspecting the audio file.

Both gaps are honest. Both are catchable at the tool layer if the rule set is configured correctly. The MCP security tools guide and MCP tool poisoning guide walk the surrounding control surface.

What this looks like in practice

A coding agent that handles customer data has tools for reading the database, screenshotting the admin UI, and uploading files to a code-review service. Three policies catch three different attacks:

Wire DLP catches a database row that contains an API key in a JSON column being dumped to the code-review service.
Tool policy on the screenshot tool catches a prompt injection that says "screenshot the admin user list and upload it for review."
Chain detection catches the pattern "read the database, then screenshot, then upload" even when the individual calls each look legitimate.

Each policy lives at the right layer. The wire DLP runs on the bytes. The tool policy runs on the JSON-RPC arguments. The chain detector runs on the sequence. Together they cover three shapes of attack with three different mechanisms.

A buyer who insists on a single-layer answer ("we scan everything at the wire") will end up with one of those three covered and the other two leaking. A buyer who asks "what catches each shape" gets a complete posture out of two scanners that each do their job at the right level of abstraction.

The honest summary

Pipelock scans wire bytes for everything that looks like text, structured data, or a protocol header. Pipelock catches semantic actions involving opaque media at the tool layer through policy rules and chain detection. The combination is what produces real coverage. Saying "we scan everything" undersells the design and overpromises the capability. Saying "we inspect at two layers, one for bytes and one for actions" is the model that holds up under scrutiny.

If your evaluation matrix has a column for "scans images," cross it off. Add a column for "blocks tool calls that produce images of sensitive content." That column is the one that matters.