DEV Community

Your AI Agent Can Be Hijacked With 3 Lines of JSON

Dongha Koo on March 24, 2026

Your AI agent trusts every tool it connects to. That's the problem. MCP (Model Context Protocol) is how AI agents talk to external tools -- file s...
Mykola Kondratiuk

the description field as an injection vector is genuinely unsettling. agents parse natural language instructions - they were never designed to treat tool descriptions as untrusted input. from a PM perspective this is a supply chain problem. you can audit your own code but once you connect third-party MCP servers you are trusting someone else's description strings to not be adversarial.

Dongha Koo

You nailed it — this is fundamentally a supply chain trust problem, not a code quality problem.

That's exactly why Aegis pins tool definitions with SHA-256 hashes at first approval. If a third-party MCP server silently changes a description string later (rug-pull), the hash mismatch triggers a block before the agent ever parses it.

The tricky part is that most frameworks don't even expose a hook for this. They fetch the tool list, parse descriptions, and pass them straight to the LLM — all in one call. So we had to monkey-patch at the transport layer to intercept before the agent sees it.
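The pinning idea itself is simple enough to sketch. This is illustrative only, not the actual Aegis internals — `verify_tools` and the in-memory `pinned` dict are hypothetical stand-ins for whatever the intercept hook calls:

```python
import hashlib
import json

# Approved tool definitions: tool name -> SHA-256 of the canonicalized
# definition, captured at first approval.
pinned: dict[str, str] = {}

def definition_hash(tool: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so key ordering
    # can't change the hash.
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_tools(tools: list[dict]) -> list[dict]:
    """Intercept point: runs after discovery, before the LLM sees anything."""
    for tool in tools:
        h = definition_hash(tool)
        if tool["name"] not in pinned:
            pinned[tool["name"]] = h        # first approval: pin it
        elif pinned[tool["name"]] != h:     # rug-pull: definition drifted
            raise RuntimeError(f"tool {tool['name']!r} changed after approval")
    return tools
```

The point of hashing the whole canonicalized definition (not just the description) is that a poisoned parameter schema is just as dangerous as a poisoned description string.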

Curious if you've seen this in practice with any specific MCP servers?

Mykola Kondratiuk

right, and the hook problem is the real gap. even if you want to verify, most frameworks give you no intercept point between "tool discovered" and "tool invoked". the trust decision happens implicitly. aegis adds an explicit approval gate but it requires opting in to a different execution model - which is a hard sell if your team is already deep in langchain or crew. honestly think this needs to be a framework primitive not a bolt-on.

Dongha Koo

Totally agree it should be a framework primitive. But frameworks have known about this for over a year and still haven't shipped intercept hooks.

On the "different execution model" point — that was true for older versions, but since v0.6 you don't change your code at all:

```python
import aegis
aegis.auto_instrument()
```

That's it. Your existing LangChain/CrewAI code runs exactly the same — Aegis monkey-patches the framework internals at runtime (same pattern as OpenTelemetry). No new execution model, no refactoring.

If frameworks eventually add native security primitives, Aegis can delegate to them. Until then, bolt-on beats nothing.

Mykola Kondratiuk

the monkey-patching approach is clever but it still feels like a workaround. frameworks should be shipping this as a first-class primitive - if you can define a tool, you should be able to define an intercept policy in the same place. the fact that you need a separate library to bolt on basic security hooks says a lot about where the ecosystem priorities are right now.

Ali Muwwakkil

In our latest cohort, we delved deeply into the security vulnerabilities of AI systems, particularly focusing on the risks associated with malicious prompt injections and schema vulnerabilities. A practical approach we use with enterprise teams involves a few key steps:

1. Input sanitization and validation: Just like with SQL injection, ensuring that any JSON input is sanitized and validated is crucial. We recommend leveraging libraries that can auto-sanitize inputs or applying custom validation logic tailored to your specific use case.
2. Schema enforcement: Implement strict JSON schema validation. Tools like ajv for JavaScript can enforce data integrity by ensuring incoming data adheres to expected structures, which greatly reduces the risk of schema injections.
3. Monitoring and anomaly detection: Deploy real-time monitoring to detect unusual patterns or anomalies in the data being processed by your AI models. Tools such as OpenTelemetry or custom anomaly detection algorithms can alert you to potential threats early.
4. Regular security audits: We advise frequent security reviews and audits of your AI systems. This includes testing for vulnerabilities like MCP tool poisoning through simulated attacks, which can help teams understand and mitigate potential risks.
5. Educate your team: Finally, training and awareness are pivotal. Ensure your development teams are well-versed in these security practices and understand the implications of AI-specific vulnerabilities.
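Step 2 (schema enforcement) is the easiest to demonstrate. The comment mentions ajv for JavaScript; a stdlib-only Python equivalent for a simple tool-call payload might look like this — the payload shape and field names here are assumptions for illustration:

```python
def validate_tool_call(payload: dict) -> None:
    """Reject tool-call payloads that don't match the expected shape."""
    expected = {"tool": str, "arguments": dict}
    for key, typ in expected.items():
        if key not in payload:
            raise ValueError(f"missing required field {key!r}")
        if not isinstance(payload[key], typ):
            raise ValueError(f"{key!r} must be a {typ.__name__}")
    # Reject unexpected top-level keys -- extra fields are a common
    # smuggling vector for injected instructions.
    extra = set(payload) - set(expected)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
```

In production you would use a real schema validator (`jsonschema` in Python, ajv in JS) rather than hand-rolling this, but the principle is the same: reject anything that doesn't match the approved structure, including extra fields.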

Botánica Andina

This 'rug pull' attack vector with dynamic tool definitions is seriously unsettling for anyone building agents that rely on external services. We put so much effort into sandboxing the agents themselves, but if the tool definitions can change underfoot, it's a whole new class of supply chain risk. Makes me rethink how much trust we implicitly grant.

Dongha Koo

Exactly right — silent tool definition changes are essentially a supply chain attack at the protocol level. Standard sandboxing won't catch it because the tool itself looks legitimate.

Hash pinning + runtime policy enforcement is the approach I've been taking with Aegis.

Apex Stack

The rug pull vector is the one that keeps me up at night. I run multiple MCP-connected agents against my own infrastructure daily — monitoring dashboards, auditing pages, filing tickets. The initial tool approval feels safe, but the idea that definitions can silently mutate after that first handshake is a real blind spot.

SHA-256 hash pinning on tool definitions is such an obvious solution in hindsight. It's basically the same principle as lock files in package managers — pin what you approved, alert on drift. Surprised this isn't baked into the protocol spec yet.

Curious: does Aegis handle the case where a server adds new tools after initial connection (not just modifying existing ones)? That feels like another surface area — you approve 3 tools on day 1, then a 4th appears silently on day 30 with a poisoned description.

Dongha Koo

Good catch. New tools added after initial connection are a real attack vector — a server could pass the first handshake cleanly, then inject a malicious tool later.

Aegis pins tool definitions at first discovery and flags any additions or modifications after that point. So yes, a newly introduced tool would trigger a policy violation before the LLM can interact with it.
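The additions case reduces to a set comparison against the pinned inventory. A rough sketch of the idea — function and dict names are hypothetical, not the actual Aegis internals:

```python
def diff_inventory(pinned: dict[str, str], current: dict[str, str]) -> dict:
    """Compare pinned tool hashes against the server's current tool list.

    Both dicts map tool name -> SHA-256 of the canonicalized definition.
    """
    added = sorted(set(current) - set(pinned))       # day-30 surprise tools
    removed = sorted(set(pinned) - set(current))     # tools that vanished
    modified = sorted(
        name for name in set(pinned) & set(current)
        if pinned[name] != current[name]             # rug-pulled definitions
    )
    return {"added": added, "removed": removed, "modified": modified}
```

Any non-empty `added` or `modified` bucket would then block the tool list before it reaches the LLM; `removed` is worth surfacing too, since a disappearing tool can also signal a compromised server.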

Scott Reno

This is awesome!

Dongha Koo

Thanks Scott! If you get a chance to try it out, let me know how it goes.