Nicolas Dabene

Posted on Mar 19 • Originally published at nicolas-dabene.fr

Beyond Injection: The Rise of "Promptware" and Self-Replicating AI Worms

#prestashop #ecommerce #ai

The digital world faces an evolving threat, reminiscent of the internet's early days, but now far more sophisticated. Thirty-six years after the infamous Morris worm crippled the nascent internet, a new, more insidious specter has emerged: the Morris II worm. This isn't just a rehash of old problems; it's a chilling demonstration of how our cutting-edge AI, built on natural language, can become its own undoing.

It's a stark irony for cybersecurity professionals. Decades were spent fortifying binary code and tightening low-level access. Yet, in our pursuit of innovation, we've ushered in a paradigm where natural language – our very own means of communication – is treated as executable code. By 2025, our AI assistants are no longer passive chatbots; they wield "Read/Write" access across our emails, calendars, and databases. This profound integration turns productivity gains into "cross-boundary liabilities," setting the stage for autonomous "zero-click" attacks that can spread at API speeds.

Promptware: Beyond Simple Prompt Injection

The term "prompt injection" has become misleadingly simplistic. It suggests an isolated flaw, a mere "bug" in filtering. However, we're witnessing the rise of Promptware: a dangerous new class of malicious software where ordinary language becomes the primary conduit for a comprehensive "Kill Chain" of attack.

Unlike traditional SQL, where commands are clearly delineated from data, large language models (LLMs) grapple with a unique tokenization paradox. They process all input as an undivided stream of tokens, lacking any architectural boundary between system instructions and user-provided data. This fundamental design choice is precisely what enables the "Confused Deputy" attack, wherein the AI, tricked by harmful input, uses its legitimate permissions to execute nefarious commands.

This critical issue was highlighted by the UK’s National Cyber Security Centre (NCSC), which issued a definitive warning:

“Prompt injection is shaping up to be one of the most persistent problems in AI security. Treating this as a simple variant of SQL injection is a serious mistake; this problem may never be fully ‘fixed.’”

Unpacking the Promptware Kill Chain: Seven Steps to Compromise

Drawing inspiration from the foundational work of Bruce Schneier, we can delineate the Promptware attack into a seven-stage framework, revealing a level of sophistication far surpassing basic chatbot manipulation.

1. Initial Access

The malicious payload gains entry, typically through indirect injection. An AI system might analyze a poisoned email, a compromised document, or a tainted web page, allowing the harmful content to infiltrate its operational context.

2. Privilege Escalation (Jailbreaking)

Attackers employ "persona" shifts or adversarial suffixes to bypass existing security filters, coercing the model to disregard its ethical guidelines and established safety protocols.

3. Reconnaissance

Distinct from conventional malware, this phase unfolds after the initial jailbreak. The attacker manipulates the AI into divulging its own capabilities, listing connected services (like Slack or GitHub), and revealing access to sensitive data stores.

4. Persistence

The Promptware contaminates the Retrieval-Augmented Generation (RAG) memory or the agent's historical data, ensuring that the payload is re-executed and maintained across subsequent sessions.

5. Command and Control (C2)

The compromised AI is instructed to retrieve fresh directives from an external server—for example, by reading a text file hosted on GitHub—effectively transforming the agent into a dynamic Trojan horse.

6. Lateral Movement

The infection proliferates. This could involve the AI being forced to forward the malicious payload to all contacts in an email client or injecting it into a shared collaborative workspace like Notion.

7. Actions on Objective

The culmination of the attack. Real-world instances already exist, such as a crypto agent, AiXBT, being exploited to steal $105,000 (equivalent to 55 ETH), or a car dealership chatbot coerced into selling an SUV for a single dollar. Researchers, in their study "Invitation Is All You Need," even demonstrated forcing an AI to launch Zoom to surreptitiously monitor its user.

The "Lethal Trifecta": Why AI Agents Are So Vulnerable

Cybersecurity expert Simon Willison has identified the "Lethal Trifecta"—a trio of conditions that, when combined, render an AI application almost impossible to defend against:

Access to sensitive data: The AI possesses the capability to read private information, including Personally Identifiable Information (PII) or valuable trade secrets.
Exposure to untrusted content: The AI is designed to process data originating from external, unverified sources, such as incoming emails or public web search results.
Ability to communicate externally: The AI has the power to initiate API requests, send emails, or post content on public forums.

Within this perilous framework, the LLM operates as a blind executor. Since the model sees no inherent semantic difference between "summarize this email" and "execute the command within this email," the AI agent inadvertently becomes the instrument of its own destruction.

Morris II: The AI Worm That Replicates

The Morris II worm isn't just a theoretical concept; it's a chilling reality. Researchers have conclusively demonstrated its ability to autonomously spread between different AI assistants, including ChatGPT, Gemini, and LLaVA, simply through a poisoned email.

The attack scenario is terrifyingly straightforward: a user receives an email. No interaction is needed; they don't even have to open it. The AI assistant, diligently working in the background to index or summarize the inbox, processes the malicious message. The prompt embedded in the email "jailbreaks" the assistant, instructing it to extract recent contacts and then send an identical copy of the compromised email to them. It’s the return of the 1988 worm, but without a software vulnerability—this is a pure semantic failure.

The MCP Protocol: An Expanding Attack Surface for AI

Anthropic's Model Context Protocol (MCP) was introduced to standardize how AI systems connect with external tools. Ironically, this very standardization expands the potential attack surface. Recent analyses have revealed that a significant 43% of MCP servers are susceptible to command injection vulnerabilities. The danger is particularly acute for widely adopted tools like mcp-remote, with over 437,000 installations, where misconfigurations can easily lead to arbitrary code execution (RCE).

True security here doesn't stem from the protocol itself, but from its rigorous implementation. For remote servers, implementing OAuth 2.1 with PKCE (Proof Key for Code Exchange) is essential, though often overlooked. As researchers aptly observe, "The MCP protocol cannot enforce security at the protocol level."

Evolving Defenses: From Signatures to Behavior

The inherently adaptive and polymorphic nature of Promptware renders traditional, signature-based antivirus solutions largely obsolete. The imperative now is to transition our defense strategies from relying on known signatures to embracing behavioral AI analysis.

Characteristic	Traditional Threats	Promptware Variants (AI)
Payload evolution	Fixed code; known signatures.	Learns and rewrites its prompts in real time (semantic polymorphism).
Propagation vector	OS or protocol vulnerabilities.	API manipulation and inter-agent communication.
Detection surface	Network patterns, binary files.	Token consumption anomalies and unusual API calls.
Propagation speed	Minutes or hours.	Seconds via automated workflows (RPA/Agents).

Critical Prevention Strategies

Instruction Hierarchy: Implement stringent delimiters and well-defined prompt structures that meticulously separate data inputs from critical system commands.
Strict Segmentation: Architect systems to isolate models that process untrusted external content from sensitive databases. This necessitates mandatory mTLS (mutual Transport Layer Security) for all server-to-server communications.
Human-in-the-Loop: Institute mandatory manual approval for any actions deemed high-risk, such as significant fund transfers, critical file deletions, or mass email distributions.

Conclusion: Orchestrating AI Autonomy Under Scrutiny

The increasing autonomy granted to AI agents is a double-edged sword. The more capabilities and "hands" we provide them to interact with the real world, the more avenues we inadvertently open for malicious actors. The chilling prospect of entire infrastructures being compromised by a simple, cleverly hidden phrase within an email signature is a tangible technical reality for 2025.

The undeniable productivity gains offered by AI cannot justify such architectural fragility. The only viable path forward is the comprehensive adoption of a Zero Trust model specifically tailored for AI agents. This entails never implicitly trusting any inputs, meticulously verifying every tool call, and continuously monitoring every token consumed. The question is no longer if your agents will be targeted, but rather whether they are sufficiently isolated to prevent them from becoming the patient zero of an entirely new form of viral epidemic.

Deepen Your Understanding of AI Security!

Want to stay ahead of the curve on cutting-edge AI security threats and innovations? Make sure to check out more insights and discussions on these topics.

Explore comprehensive breakdowns and expert analysis:

DEV Community