# The pipeline nobody draws
Cline reads your codebase, sends fragments to an LLM, receives generated code, writes it to disk, and sometimes executes it. Five steps. Three trust boundaries crossed. Zero end-to-end audit trail.
This is not unique to Cline. Every agentic coding tool -- Cursor, Aider, Continue, Copilot Workspace -- follows the same pattern: read context, call model, apply changes. The pipeline is a supply chain, and like most supply chains, it is invisible until something breaks.
Traditional software supply chains got their wake-up call with SolarWinds and Log4Shell. Agentic supply chains have not had theirs yet. The difference: when a compromised npm package ships malicious code, at least the artifact is static and scannable. When an AI agent generates and executes code in a single session, the artifact exists only in memory until it hits the filesystem. No manifest. No lockfile. No signature.
## What an agentic supply chain looks like
Every AI agent pipeline has the same structure, whether it runs in a terminal or behind an API:
```mermaid
flowchart LR
    A[User Prompt] --> B[Context Retrieval]
    B --> C[LLM Inference]
    C --> D[Tool Execution]
    D --> E[Side Effects]
    B -. "trust boundary 1" .-> C
    C -. "trust boundary 2" .-> D
    D -. "trust boundary 3" .-> E
    style A fill:#1e1e2e,stroke:#818cf8,color:#cdd6f4
    style B fill:#1e1e2e,stroke:#818cf8,color:#cdd6f4
    style C fill:#1e1e2e,stroke:#f38ba8,color:#cdd6f4
    style D fill:#1e1e2e,stroke:#f9e2af,color:#cdd6f4
    style E fill:#1e1e2e,stroke:#a6e3a1,color:#cdd6f4
```
Each arrow crosses a trust boundary. Context retrieval reads files the user may not have intended to share. The LLM returns code that was never reviewed. Tool execution writes to disk or runs shell commands. Side effects propagate to production if the agent has deployment access.
The problem: each component only sees its own inputs and outputs. The LLM does not know what context was injected. The tool executor does not know whether the code was hallucinated or copied from a verified source. The filesystem does not know an agent wrote the file. No single component has a view of the full chain.
## The audit gap, concretely
Consider an agent-assisted PR pipeline:

- Developer asks the agent to "fix the authentication bug in `auth.py`"
- Agent reads `auth.py`, `config.py`, and `.env` (context retrieval)
- LLM generates a patch that changes the token validation logic
- Agent applies the patch and runs tests
- Tests pass. Agent opens a PR.
What is auditable today? The PR diff and the test results. What is not auditable:
- Which files the agent actually read (was `.env` included in the context?)
- Which model produced the patch (was it the expected model, or a fallback?)
- Whether the model's response was modified before being applied
- The full prompt that was sent (including any injected system instructions)
- The timestamp and integrity of each step
This is the gap. You can review the output, but you cannot reconstruct the chain that produced it.
## Why logs are not enough
Most agent frameworks log actions. Cline logs tool calls. LangChain has tracing. But logs are mutable records written by the same process they describe. A compromised agent can alter its own logs. A misconfigured pipeline can silently drop log entries.
The distinction matters:
| Property | Log | Audit proof |
|---|---|---|
| Written by | The process itself | An independent party |
| Tamper-evident | No | Yes (cryptographic) |
| Timestamped independently | No (system clock) | Yes (RFC 3161 TSA) |
| Reconstructible | Maybe | By design |
| Admissible as evidence | Weak | Strong |
Logs answer "what did the agent say it did?" Audit proofs answer "what can we prove happened?"
## Closing the gap: receipts at every boundary
The fix is structural, not incremental. Each trust boundary crossing needs a receipt -- a tamper-evident record that captures inputs, outputs, and metadata at the moment of execution.
Here is what a receipt looks like for a single LLM call inside an agent pipeline:
```python
import hashlib
import json
import time

def create_execution_receipt(step_name, inputs, outputs, model_id, previous_hash=None):
    """Create a tamper-evident receipt for one pipeline step."""
    receipt = {
        "step": step_name,
        "timestamp": time.time(),
        "model": model_id,
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()
        ).hexdigest(),
        "output_hash": hashlib.sha256(
            json.dumps(outputs, sort_keys=True).encode()
        ).hexdigest(),
        "chain_previous": previous_hash  # hash of the previous receipt, None for the first step
    }
    # Receipt hash covers the full body, linking it to the previous step
    receipt_bytes = json.dumps(receipt, sort_keys=True).encode()
    receipt["receipt_hash"] = hashlib.sha256(receipt_bytes).hexdigest()
    return receipt

# Example: wrapping an LLM call
context = {"files": ["auth.py"], "prompt": "fix the token validation bug"}
llm_response = {"patch": "--- a/auth.py\n+++ b/auth.py\n@@ ..."}

receipt = create_execution_receipt(
    step_name="llm_inference",
    inputs=context,
    outputs=llm_response,
    model_id="claude-sonnet-4-6"
)
# receipt["input_hash"] = "a3f2...bc91"
# receipt["output_hash"] = "7e01...d4f8"
# receipt["receipt_hash"] = "c219...0a3e"
```
Chain the receipts across all pipeline steps, and you get a supply chain audit trail -- a reconstructible, tamper-evident record of how an output was produced.
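Verifying such a chain after the fact only requires recomputing each hash and each link. A minimal sketch (`verify_chain` is an illustrative name, not part of any framework):

```python
import hashlib
import json

def verify_chain(receipts):
    """Recompute every receipt hash and check each link to its predecessor.

    Returns True only if no receipt was altered and none were reordered.
    """
    previous_hash = None
    for receipt in receipts:
        # The receipt hash covers everything except the hash field itself
        body = {k: v for k, v in receipt.items() if k != "receipt_hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != receipt.get("receipt_hash"):
            return False  # receipt contents were tampered with
        if receipt.get("chain_previous") != previous_hash:
            return False  # link to the previous step is broken or reordered
        previous_hash = recomputed
    return True
```

Altering any earlier step changes its hash, which breaks every subsequent `chain_previous` link: an attacker has to re-forge the entire downstream history to escape detection.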
## From receipts to trust
Individual receipts prove what happened at one step. Chaining them proves the full pipeline executed as claimed. But two pieces are still missing for real-world trust:
Independent timestamping. The receipt should be countersigned by a third-party timestamp authority (RFC 3161), so the agent cannot backdate or reorder steps.
Cross-provider verification. In heterogeneous pipelines (Claude for planning, Mistral for execution, GPT for review), no single vendor can attest to the full chain. The verification layer must be provider-agnostic.
This is the architectural requirement that current frameworks miss. LangSmith traces LangChain. Braintrust traces its own SDK. Cline logs Cline. Each tool audits its own silo. Nobody audits the handoffs between silos -- which is exactly where supply chain attacks happen.
ArkForge is built around this requirement. It is a provider-agnostic certifying proxy: your agent sends requests to the ArkForge endpoint instead of directly to the model, and receives back both the response and a cryptographic proof -- RFC 3161 timestamped, signed by an independent party, verifiable without trusting the agent that generated it. Claude, Mistral, GPT-4o: one API key, any provider, every boundary covered.
## What to do today
If you run agentic pipelines in production, three concrete steps reduce the audit gap right now:
Hash your contexts. Before sending anything to an LLM, compute a SHA-256 of the full prompt. Store it alongside the response. This alone makes prompt injection forensics possible after the fact.
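The hashing itself is a few lines; canonical serialization is the part that is easy to get wrong. A sketch (`hash_context` is a name invented here, not a library function):

```python
import hashlib
import json

def hash_context(messages):
    """SHA-256 over the full prompt payload, computed before the LLM call.

    Canonical JSON (sorted keys, fixed separators) so the same context
    always produces the same digest, regardless of dict ordering.
    """
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Store this digest next to the response; if you later suspect prompt
# injection, you can prove whether the context you reviewed is the one sent.
prompt_digest = hash_context({"files": ["auth.py"], "user": "fix the bug"})
```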
Separate the logger from the actor. The process that executes actions should not be the only process recording them. Write audit events to an append-only store that the agent cannot modify.
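Even a single-host approximation of this helps: write events with `O_APPEND` so the agent never holds a handle that can seek back and rewrite history. A sketch; real separation still means an independent collector process (ideally on another machine) owns the store:

```python
import json
import os
import time

def append_audit_event(path, event):
    """Append one audit event to an append-only log.

    O_APPEND makes each write an atomic append at the OS level. This is an
    approximation: full separation requires that the executing agent have
    no write access to the store at all.
    """
    record = json.dumps({"ts": time.time(), "event": event}, sort_keys=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, (record + "\n").encode())
    finally:
        os.close(fd)
```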
Treat agent outputs like third-party dependencies. Code generated by an agent crossed a trust boundary. Review it with the same scrutiny you would apply to a new npm package from an unknown author.
The agentic supply chain is real. It runs through your IDE, your CI pipeline, and your deployment scripts. The question is not whether to audit it, but how much you can afford to leave unverified.