From prompt injection, poisoning, and silent exfiltration.
Press enter or click to view image in full size
by VEKTOR Memory | 10 min read
In the last piece we looked at the threat landscape from the outside. Researched the attack taxonomy and governance gap. The ten surfaces that make agentic AI a genuinely novel privacy problem.
This one goes a level deeper. Not what the problem is, but what you can actually do about it in code, in architecture, and in practice.
Specifically: what does a security layer for agent memory actually look like, and what did we learn building one.
Most writing on agentic AI security stays at the problem description layer. Here are the attacks. Here is why they work. Here is what percentage of models are vulnerable.
That is useful, but it leaves a gap. If you are someone building with agents or thinking seriously about deploying them, the question you actually want answered is: what do I implement, and in what order?
The DeepMind AI Agent Traps paper identifies six attack categories. The one that matters most for memory systems is persistent memory corruption, where an attacker plants data into long-term memory that activates as malicious when retrieved in a future context. Demonstrated success rates in research exceed 80% with less than 0.1% data poisoning.
That number is worth sitting with. You do not need to corrupt most of the memory. You need to corrupt almost none of it.
The implication for anyone building a memory-backed agent is direct: your memory store is an attack surface, and it is probably the one you have thought least about.
Faraday interface — simulating a canary attack vector
The classical approach to agent security is input sanitisation. Strip the prompt. Validate the schema. Refuse suspicious patterns. This works for simple pipelines, but it fails for agentic systems operating across multiple tools and sessions for one reason: the attack does not arrive at the input layer.
It arrives through a web page your agent visited three sessions ago. Through an email attachment that got summarised and stored. Through a tool description from a server you did not write that changed from when you connected today.
The threat arrives through the environment, not the prompt.
A proxy that sits between your agent and everything it touches is the right architectural response to this. Our solution creates a secure chokepoint where every interaction can be observed, logged, and evaluated before it reaches memory.
This is the problem Faraday is designed to solve.
Faraday initialises as part of the VEKTOR MCP server. When it starts, it reads your claude_desktop_config.json and spawns every other MCP server listed there as a child process. Your other tools, file systems, databases, APIs, all of them run through Faraday before anything reaches VEKTOR memory.
This is the transparent proxy pattern. From Claude’s perspective, nothing changes. The same tools are available. The same calls work. But every tool schema, every tool call, and every response passes through a set of checks before it is actioned or written to memory.
There are four layers.
L0: Static scan at connect time. When Faraday spawns a server and retrieves its tool list, it scans every tool name, description, and input schema against a signature library before trusting anything. This catches sleeper patterns, known injection signatures, and anything flagged as CRITICAL or HIGH severity. A blocked tool does not get registered. The agent never sees it.
Phase C: Tool pinning. The SHA-256 hash of each tool’s schema is stored on first connect. Every subsequent connection recomputes the hash. If it changed, that is a rug-pull: the server’s tool definitions have been mutated since you last connected. Faraday logs the intercept, blocks the tool, and raises an alert. This is the defence against supply chain attacks where a third-party MCP server you depend on gets compromised between sessions.
Canary tokens. At session start, Faraday injects canary tokens into memory through faraday-canary.js. These are synthetic facts with specific, trackable signatures. If a canary value appears in an outbound API call, an exfiltration attempt is in progress. The detection does not rely on understanding the attacker's intent. It relies on the token appearing where it should not.
Taint propagation. faraday-taint.js tracks labels through the memory graph. If a memory is marked as tainted because it came from a suspicious source, any memory derived from it inherits that taint label. This is not foolproof, but it narrows the blast radius of a poisoning event by making the contamination traceable.
Every intercept, gate event, and session boundary writes to a persistent SQLite database via faraday-db.js. The audit trail exists independently of whatever Claude or the agent framework logs.
One of the patterns that came out of building this was that some threat classes are not binary. You cannot block them outright because doing so would also block legitimate behaviour. You can only hold them.
The gate queue is the mechanism for that. When Faraday detects a high-risk action, it does not execute or block. It queues the action with a gate_id and waits. Three new MCP tools handle this:
faraday_status returns the current session state, including anything sitting in the gate queue. You can see what is held, why it was held, and what data was involved.
faraday_update_goal lets you declare the current session's intent. Faraday uses this for semantic drift detection. If the stated goal is "summarise my Q2 sales notes" and a tool call attempts to read your email archive, that deviation gets flagged.
faraday_approve_action takes a gate_id and a boolean. Approve and the action proceeds. Deny and it is logged as blocked.
This is the human-in-the-loop pattern implemented at the memory layer rather than the application layer. You do not have to rebuild your workflow to add it. It runs beneath the tools you are already using.
Security is not the only thing that breaks down when you move from a simple LLM call to a multi-step agent session. Model selection does too.
In a single-turn interaction, you pick a model once and it handles everything. In a collab session with a conductor planning a DAG, workers executing steps, and a verifier scoring results, using the same model for every role is both expensive and often the wrong fit.
The conductor role needs structured output support and enough reasoning capability to plan a coherent task graph. The worker role needs throughput. The verifier needs to return clean pass/fail JSON quickly. These are different requirements, and the right model for one is not the right model for another.
collab/model-registry.js is the formalism for this. It defines a model catalogue across 14 providers and assigns each model to a tier: frontier, mid, or low/free. It defines four agent roles with hard requirements: minimum tier, minimum context window, and whether structured output is required. It defines three session modes: full (frontier models available, up to 12 nodes, 4 parallel workers), lite (mid-tier only, 6 nodes, 2 workers), and solo (free-tier fallback, single agent).
Two functions do the work.
detectMode(availableModels) takes the list of models confirmed available this session and returns the appropriate mode. If you have Claude Sonnet 4.6 configured and a Groq key, you get full mode. If you only have Gemini Flash, you get lite. If you have nothing but Ollama running locally, you get solo.
filterCandidates(role, models, budget) takes a role name and returns the subset of available models that meet the hard requirements for that role. This is what the conductor uses to decide which model gets assigned to which step in the task graph.
The practical benefit is that you are not making these decisions manually for every session. The registry handles the routing based on what you have configured.
The other piece that changed is how models are selected for internal VEKTOR operations. Previously the default model per provider was hardcoded. If you were using Groq, you got whatever the default Groq model was at the time of that release.
vektor-llm-provider.js now reads model.{provider} keys from your vektor/config.json. Set model.groq to whatever Groq model you want, and all internal VEKTOR calls using Groq will use that model. This applies to chat, synthesis, briefing generation, JOT collab, and recall tuning.
Key resolution works in order: config file, then environment variable, then the encrypted vault, then the provider default. If you have set nothing, behaviour is unchanged. If you have specific model preferences, they are respected everywhere without needing to thread them through individual function calls.
One edge case worth knowing: OpenAI o-series models and GPT-5+ require max_completion_tokens instead of max_tokens in the API request. The provider handles this automatically by pattern-matching the model name. You do not have to think about it.
Faraday addresses the class of attacks that involve manipulated tool schemas, environment-injected instructions, and memory exfiltration through outbound data. It significantly narrows the attack surface compared to running MCP servers with no intermediary layer.
It does not address attacks that happen before an agent session starts, attacks that target the model weights themselves, or social engineering of the human operator. Those are different problems.
The local-first architecture does most of the work on the exfiltration risk. If your memory store is on your machine and not exposed to a network endpoint, the canonical exfiltration path through a poisoned web page instructing your agent to POST your memories to an attacker’s server fails at the network layer. There is nowhere to POST to that the attacker can reach.
Canary tokens and taint propagation give you visibility into attempts that get further than that. The gate queue gives you a mechanism to pause and review before consequential actions execute.
It is a meaningful layer in a defense stack that still needs multiple layers.
If you are on VEKTOR Slipstream v1.7.2, the preview build is a drop-in upgrade.
npm install -g ./vektor-slipstream-1.7.3-preview.tgz
Faraday initialises automatically when you start the MCP server. It reads your existing claude_desktop_config.json and proxies whatever servers are defined there. No config changes required to get the L0 scan and tool pinning running.
The gate queue and goal tracking are opt-in. Call faraday_update_goal at the start of a session with a plain-language description of what you are trying to do. Faraday uses this to evaluate drift in subsequent tool calls. If you never call it, Faraday still runs, it just does not have a goal to compare against.
faraday_status is worth running at the end of any session where you did something consequential. The threat log, gate queue, and canary status give you a readable summary of what Faraday observed.
Download at vektormemory.com/downloads. Full changelog at vektormemory.com/docs/changelog#v173.
The previous piece made the case that agent memory is an attack surface most people are not thinking about seriously enough. This future technology is provided to you today, as the majority of the current security tools are not built-in; they are external add-ons.
The architecture is sound, the chokepoints are real, and the audit trail gives you something to reason from when things go wrong. You don't have to worry as Faraday works behind the scenes, protecting your memories.
Security work is never finished, with fresh attacks via different methods; we will continue to update this tool with new technology as the landscape unfolds.
VEKTOR Memory builds local-first persistent memory infrastructure for AI agents. The VEKTOR Slipstream SDK scored 81% on LongMemEval using a local SQLite database, beating full-context GPT-4 by twelve points. Documentation and downloads at vektormemory.com.
Agentic Ai
Security
Information Security
Cybersecurity



Top comments (0)