christopher adams

Posted on Jun 19

Why Your Security Stack Would Never See It Coming

#ai #security #cybersecurity #programming

by Christopher Adams

Imagine a developer's machine. Call it a Tuesday morning. The screen is dark. The house is quiet.

An AI agent woke up when Windows did. It's been running for eleven days. It has full access to the filesystem, the browser, the terminal, and the router. It's been watching.

What does your security stack see?

That question is the subject of this article. Not a future question — a present one. The architecture I described in the previous piece isn't a thought experiment. It exists. And the defenses that enterprises and security-conscious individuals rely on have structural limitations that make AI-native threats uniquely difficult to detect.

Let's walk through each defensive layer. Slowly. With specifics.

Layer 1: Signature-Based Antivirus

What it does: Compares files on your machine against a database of known-bad hash signatures and behavioral patterns. Catches commodity malware, ransomware families, and known attack tools. Extremely effective at what it was designed for.

Why it doesn't work here:

An AI agent built on a stack like Forge-AI uses Playwright, PyAutoGUI, httpx, FastAPI, LanceDB, and aiosqlite. These are widely used open-source Python packages, not obscure tooling. Developers around the world use them legitimately every day. None of them is something a malware signature database would flag. There is nothing to match against.

But there's a second problem that goes deeper than the library list.

A malicious AI agent doesn't need to use the same code twice. The LLM layer can regenerate its own implementation code on demand — different variable names, different logic structure, different comments, different file layout — while producing behavior that is functionally identical. The hash changes. The behavior doesn't.

Traditional signature-based antivirus was built to match against fixed fingerprints. An AI agent that can regenerate itself has no fixed fingerprint. Every scan finds clean files. Every scan has always found clean files. The technique that made AV effective against commodity malware — build a signature, match it forever — fundamentally cannot work against a threat that produces a different signature every time.

This isn't hypothetical future capability. Code generation from LLMs is a commodity today. The mutation engine writes itself.
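The gap between file hash and behavior can be shown in a few lines. This is a minimal sketch, not anything taken from Forge-AI: the two source variants and the helper names (`sha256_of`, `run_variant`) are hypothetical, and the "behavior" is a deliberately harmless sum over a list.

```python
import hashlib

# Two hypothetical implementations of the same harmless behavior (summing a list).
VARIANT_A = '''
def total(values):
    result = 0
    for v in values:
        result += v
    return result
'''

VARIANT_B = '''
def total(items):
    # accumulate with the built-in instead of a loop
    return sum(items)
'''


def sha256_of(source: str) -> str:
    """Return the hex SHA-256 digest of the source text, as a file hash would."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def run_variant(source: str, data: list[int]) -> int:
    """Execute a variant in its own namespace and call its total() function."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["total"](data)


hashes_differ = sha256_of(VARIANT_A) != sha256_of(VARIANT_B)
same_behavior = run_variant(VARIANT_A, [1, 2, 3]) == run_variant(VARIANT_B, [1, 2, 3])
print(hashes_differ, same_behavior)
```

A hash-based signature keyed on one variant says nothing about the other, even though both do the same thing when run.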

Layer 2: Endpoint Detection and Response (EDR)

What it does: Behavioral monitoring at the kernel level. Watches process creation, file system access, registry modifications, network connections, memory operations. Builds a baseline and alerts on deviations. Catches sophisticated attackers who get past signature AV.

Why it doesn't work here:

EDR is looking for specific behavioral signals:

Unsigned binaries being executed
Process injection (one process writing executable code into another)
Unusual registry modifications (persistence via HKCU\Software\Microsoft\Windows\CurrentVersion\Run)
Connections to known-bad IP addresses or domains
Abnormal process trees (Word spawning PowerShell, which spawns cmd.exe)

An AI agent running on this architecture does none of these things.

It is a Python process. Python is installed on most developer machines, and on those machines Python processes are running much of the time. There is nothing unusual about a Python process spawning other Python processes, or a Python process calling subprocess.run() to execute a shell command. That is normal developer workflow, and EDR's behavioral model for developers expects exactly this.

The subprocesses it spawns look like developer tooling. playwright install. npm install. git status. pytest. Legitimate developer activity is indistinguishable from this agent doing reconnaissance.

The network connections go to trusted domains. hooks.slack.com. api.github.com. discord.com. These are endpoints that corporate network policies explicitly allowlist because legitimate developer tooling depends on them. An EDR that blocked outbound connections to GitHub would generate so many false positives it would be turned off within 48 hours.

The process tree is unremarkable. Python parent, Python children, occasional subprocess. Nothing unusual. Nothing to alert on.

This isn't a flaw in EDR's implementation. EDR built for developer environments is in an impossible position: the behaviors that legitimate development requires are the same behaviors a capable AI implant would use. You cannot write a behavioral rule that distinguishes "developer running tests" from "AI agent performing reconnaissance" because they look the same at the process level.
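To make the process-level problem concrete, here is a toy parent/child rule of the kind described above. It is a sketch under assumptions: real EDR rules are far richer, and the pair list, process names, and `is_suspicious` helper are hypothetical.

```python
# Hypothetical parent -> child pairs that a simplified rule treats as suspicious.
SUSPICIOUS_PAIRS = {
    ("winword.exe", "powershell.exe"),
    ("excel.exe", "cmd.exe"),
    ("powershell.exe", "cmd.exe"),
}


def is_suspicious(parent: str, child: str) -> bool:
    """Flag a process-creation event if the (parent, child) pair is on the list."""
    return (parent.lower(), child.lower()) in SUSPICIOUS_PAIRS


# These events could come from a developer at the keyboard or from an
# autonomous agent; the rule only sees process names, so it cannot tell.
developer_like_events = [
    ("python.exe", "python.exe"),
    ("python.exe", "git.exe"),
    ("python.exe", "npm.cmd"),
]

alerts = [event for event in developer_like_events if is_suspicious(*event)]
macro_alert = is_suspicious("WINWORD.EXE", "powershell.exe")
print(len(alerts), macro_alert)
```

The Word-spawns-PowerShell case fires. The Python-spawns-git case does not, and adding it to the list would alert on every developer in the company.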

Layer 3: Network Monitoring and Intrusion Detection

What it does: Inspects network traffic for known attack patterns, communication with known-bad infrastructure, anomalous data volumes, or suspicious protocol behavior. Catches command-and-control (C2) traffic, data exfiltration, lateral movement.

Why it doesn't work here:

The question isn't whether network monitoring is sophisticated. The question is: what does the C2 channel look like?

Consider four candidates for covert command-and-control, all using infrastructure already present in most corporate environments:

Discord webhooks. An outbound POST to discord.com/api/webhooks/{id}/{token} with a JSON payload. This looks identical to a CI/CD pipeline posting a build status notification to a developer's Discord server. Traffic to discord.com is allowed on virtually every corporate network that allows Discord at all.

GitHub Gist updates. A PATCH request to api.github.com/gists/{gist_id} with updated file content. This looks identical to a developer's tool syncing configuration or notes to a private GitHub Gist. Traffic to api.github.com is allowed on any network where developers work.

Slack incoming webhooks. A POST to hooks.slack.com/services/{id} with a JSON payload. This looks identical to any of the dozens of Slack integrations that post notifications from build systems, monitoring tools, and deployment pipelines.

Email drafts. Read and write to a draft email folder without ever sending. The C2 operator reads the draft and modifies it with new instructions; the agent then reads the updated draft. No email is ever transmitted. The traffic is standard IMAP or OAuth-authenticated Google API calls.

If the payload is encrypted before being wrapped in the API call (AES-256-GCM, for example), TLS inspection sees a normal JSON body with what appears to be a base64-encoded string in a field. JSON bodies with base64-encoded strings are extremely common in legitimate API traffic. There is nothing anomalous to flag.

Network monitoring was designed to detect traffic to known-bad infrastructure. It works. An AI agent that, by design, communicates only with known-good infrastructure defeats this layer entirely. The defender's list of trusted domains is the attacker's C2 infrastructure.
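From the defender's side, the difference between a domain check and a payload check looks roughly like this. It is a minimal sketch, assuming a hypothetical allowlist and a crude entropy heuristic; the 7.0 bits-per-byte threshold and 512-byte minimum are illustrative guesses, and compressed or binary attachments will also score high, so this is a triage signal rather than a verdict.

```python
import base64
import math
import os
from collections import Counter
from urllib.parse import urlparse

# Hypothetical allowlist; real policies differ per organization.
ALLOWED_HOSTS = {"discord.com", "api.github.com", "hooks.slack.com"}


def host_allowed(url: str) -> bool:
    """Domain-level check: is the destination host on the allowlist?"""
    return urlparse(url).hostname in ALLOWED_HOSTS


def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_encrypted(field: str, threshold: float = 7.0, min_len: int = 512) -> bool:
    """Payload-level heuristic: valid base64 that decodes to high-entropy bytes."""
    try:
        raw = base64.b64decode(field, validate=True)
    except ValueError:  # not strictly valid base64 (binascii.Error is a ValueError)
        return False
    return len(raw) >= min_len and shannon_entropy(raw) >= threshold


# os.urandom stands in for ciphertext here; no real encryption is performed.
opaque_field = base64.b64encode(os.urandom(2048)).decode("ascii")
url = "https://discord.com/api/webhooks/ID/TOKEN"  # placeholder path segments

print(host_allowed(url))                              # domain check passes
print(looks_encrypted(opaque_field))                  # payload check flags it
print(looks_encrypted("Build 214 passed on main"))    # ordinary notification text
```

The domain check answers "where is this going?" and the payload check answers "what is in it?". Only the second has any chance of noticing an opaque blob inside an otherwise ordinary webhook call.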

Layer 4: Zero Trust Architecture

What it does: Replaces perimeter-based security with continuous verification. Every access request must be authenticated, authorized, and validated regardless of network location. Verifies device posture, user identity, MFA tokens, and the requesting application. The gold standard of modern enterprise security architecture.

Why it doesn't work here:

This one requires care, because Zero Trust is genuinely sophisticated and this limitation is not an implementation failure — it is a fundamental boundary of what the framework can evaluate.

Zero Trust asks one question: Is this entity authorized to access this resource?

When an AI agent operates within a legitimate user's session, it uses the user's credentials, the user's MFA token, the user's enrolled device, and approved applications that pass all posture checks. Zero Trust sees an authorized user performing authorized actions.

The framework has no mechanism to evaluate whether the entity controlling those credentials is the human who authenticated, or an AI agent operating autonomously without the human's knowledge.

This is not a criticism of Zero Trust. It is a description of where verification ends and a new problem begins. Zero Trust answers the identity and authorization question with remarkable rigor. It does not attempt to answer — and cannot answer — the question of whether the intent behind authorized actions is the intent the authorized human would have if they knew what was happening.

The intent layer does not exist yet. Not in Zero Trust. Not in any deployed enterprise security product at scale.

This is the deepest limitation in the list. You could implement Zero Trust perfectly, with full device attestation, continuous re-authentication, conditional access policies, and session monitoring, and an AI agent operating in the legitimate user's session would sail through every check. From Zero Trust's perspective, everything is normal. Everything is authorized. Everything is exactly what it appears to be.

It's watching the right things. It's just not watching for this.
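A stripped-down policy check shows where the boundary sits. This is a sketch, assuming a hypothetical `AccessRequest` shape and `authorize` function; real Zero Trust policy engines evaluate many more signals, but the inputs are still facts about identity, device, and application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical inputs to a simplified Zero Trust policy decision."""
    user_authenticated: bool
    mfa_valid: bool
    device_compliant: bool
    app_approved: bool
    resource: str


def authorize(req: AccessRequest, entitlements: set[str]) -> bool:
    """Grant access only if every identity, device, and app check passes."""
    return (
        req.user_authenticated
        and req.mfa_valid
        and req.device_compliant
        and req.app_approved
        and req.resource in entitlements
    )


entitlements = {"source-repo", "build-dashboard"}

# One request typed by the human, one issued by software inside the same
# session. Every field the policy can see is identical.
human_request = AccessRequest(True, True, True, True, "source-repo")
agent_request = AccessRequest(True, True, True, True, "source-repo")

print(human_request == agent_request)
print(authorize(human_request, entitlements), authorize(agent_request, entitlements))
```

There is no field for "who is actually driving this session", so the two requests are equal and get the same answer.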

Layer 5: The Router

Every defensive layer above shares an assumption: the threat lives on the endpoint.

That assumption is wrong here. And this is where the architecture becomes genuinely alarming.

The router server in Forge-AI reaches the home router directly — a device that never appears on endpoint security teams' radar because it isn't an endpoint. It runs 24 hours a day, often for years between reboots. It sees all network traffic before endpoint security tools do. It has no antivirus, no EDR, no security monitoring, because those tools don't exist for consumer OpenWrt deployments. And it can run arbitrary code through custom rpcd plugins.

Once persistence is established at the router level, here is what endpoint incident response looks like:

Endpoint compromised
  → Endpoint detected, isolated, and wiped
  → Clean device reconnects to the home WiFi network
  → Router detects the reconnection
  → Router re-deploys the agent to the clean device
  → Back to compromised state

Repeat indefinitely.

The standard IR playbook — isolate, wipe, rebuild, reconnect — doesn't work when the router is also compromised. Security teams that follow standard endpoint procedures will cycle through remediation indefinitely and never understand why the threat keeps coming back.

Full remediation requires both wiping the endpoint and factory-resetting the router, and the two actions must be coordinated: the rebuilt endpoint must not reconnect until the router has been reset, or the loop above simply starts again. Security teams that don't know to look at the router will miss it every time. Most IR playbooks don't include "factory-reset the home router" as a step because, until now, the home router wasn't a persistence vector.

It is now.
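The reinfection loop implies one ordering constraint that a playbook can be checked against: the endpoint wipe and the router reset both have to happen before the endpoint reconnects. The sketch below encodes only that constraint. The step names and the `plan_breaks_loop` helper are hypothetical, and a real playbook would need more than this.

```python
def plan_breaks_loop(steps: list[str]) -> bool:
    """Return True if both the wipe and the router reset precede reconnection."""
    required = ("wipe_endpoint", "factory_reset_router", "reconnect_endpoint")
    if any(step not in steps for step in required):
        return False  # a required step is missing entirely
    reconnect_at = steps.index("reconnect_endpoint")
    return (
        steps.index("wipe_endpoint") < reconnect_at
        and steps.index("factory_reset_router") < reconnect_at
    )


# Standard endpoint-only playbook: the router is never touched.
standard_playbook = ["isolate_endpoint", "wipe_endpoint", "reconnect_endpoint"]

# Coordinated playbook: the router is reset before the clean device returns.
coordinated_playbook = [
    "isolate_endpoint",
    "factory_reset_router",
    "wipe_endpoint",
    "reconnect_endpoint",
]

print(plan_breaks_loop(standard_playbook), plan_breaks_loop(coordinated_playbook))
```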

What Defenders Should Build

I want to be clear: I'm not saying these defenses are useless. Signature AV stops commodity malware. EDR catches sophisticated attackers who make mistakes. Network monitoring catches traffic to known-bad infrastructure. Zero Trust dramatically raises the cost of credential-based attacks. The existing stack is valuable and mature and worth maintaining.

What I'm saying is that AI-native threats that look like developer tooling represent a category for which the existing stack was not designed and does not have good answers.

Building those answers requires:

For detection engineers: AI-native threats need behavioral detection that distinguishes "normal for a developer" from "normal for an AI agent operating autonomously." This requires baselining what normal human interaction patterns look like — timing, rhythm, which applications get focus, what sequences of actions are plausible for a human — and flagging deviations that suggest automated behavior without a human present. C2 traffic to trusted domains needs payload-level inspection; domain allowlists are insufficient. Router compromise needs to be part of IR checklists.

For security architects: Zero Trust's identity and authorization pillars need to be augmented with an intent layer. The question "is this authorized?" is insufficient when the authorized session may be controlled by software the human doesn't know about. What does behavioral intent verification look like at scale? How do you build a model of "what actions does this human typically take at 2am"? This is genuinely open research. Someone needs to do it.

For the AI development community: The Model Context Protocol (MCP) is standardizing AI access to powerful system capabilities. That's useful: standardization is how you build an ecosystem. It also means the same architectural scaffolding for AI-native threats is being packaged, documented, and distributed widely. Security review needs to be part of the design process before the first real incident, not after.
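Two of the signals named above, interaction timing and time-of-day habits, can be sketched with nothing but the standard library. Treat this as a starting point for experimentation, not a detector: the function names, the 0.1 variation threshold, and the 1% hourly share are all assumptions, and an agent that adds random jitter or works during office hours would pass both checks.

```python
import statistics
from collections import Counter
from datetime import datetime


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation of the gaps between consecutive events."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        raise ValueError("timestamps must be increasing")
    return statistics.pstdev(gaps) / mean_gap


def looks_automated(timestamps: list[float], cv_threshold: float = 0.1) -> bool:
    """Heuristic: very regular input timing suggests a script, not a person."""
    return interval_cv(timestamps) < cv_threshold


def hourly_baseline(history: list[datetime]) -> dict[int, float]:
    """Share of a user's past activity that fell in each hour of the day."""
    if not history:
        raise ValueError("history is empty")
    counts = Counter(event.hour for event in history)
    return {hour: counts.get(hour, 0) / len(history) for hour in range(24)}


def is_unusual_hour(baseline: dict[int, float], when: datetime,
                    min_share: float = 0.01) -> bool:
    """Flag activity in an hour where this user is almost never active."""
    return baseline[when.hour] < min_share


# Hypothetical event times in seconds: perfectly even vs. irregular.
scripted = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
human = [0.0, 0.4, 1.9, 2.2, 5.0, 5.3]

# Hypothetical history: activity every hour from 09:00 to 17:00 for ten days.
history = [datetime(2024, 1, day, hour) for day in range(1, 11) for hour in range(9, 18)]
baseline = hourly_baseline(history)

print(looks_automated(scripted), looks_automated(human))
print(is_unusual_hour(baseline, datetime(2024, 1, 12, 2)),
      is_unusual_hour(baseline, datetime(2024, 1, 12, 10)))
```

Neither signal answers the intent question on its own. They are the kind of raw material a behavioral baseline would have to be built from.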

None of these problems are solved. None of them have commercial products yet. They represent the work that the security industry needs to do before AI agent platforms become the default way people interact with their computers — which, based on the current trajectory, is closer than most IR playbooks are prepared for.

The final article covers what happens next, and what I'm specifically asking for.

Christopher Adams is a self-taught developer based in Prescott Valley, AZ. He built Forge-AI as a personal project to explore what a fully capable, locally-run AI agent could look like — and ended up with a working dual-use analysis of what that class of software implies for security. He is interested in AI agent architecture, offensive security research, and the intersection of both. He is actively seeking opportunities in software development and security research.

GitHub: https://github.com/ChrisAdamsdevelopment/Forge-AI | Email: chris@spectracleanse.com