The attack doesn't require a compromise
There's a new attack pattern getting documented that every team running self-hosted AI infrastructure should know about.
Between March and May 2026, researchers at Zenity observed three separate campaigns where attackers used exposed LLM endpoints as compute for their own offensive AI operations. Not by exploiting a vulnerability.Not by compromising credentials. Just by knowing where the endpoint was and sending it requests.
The targets were Ollama and LiteLLM instances, two of the most common tools for self-hosting LLMs. The attack is embarrassingly simple: find an exposed endpoint, point your AI agent at it, and use someone else's infrastructure to power your operations.
Why this works
The root cause is default configurations that prioritize ease of setup over security.
Ollama ships with no built-in authentication on its default port (11434). It defaults to localhost but is commonly reconfigured to listen on all interfaces so other services can reach it. Once it's exposed, anyone who finds it can send inference requests.
LiteLLM's authentication is opt-in. If the operator doesn't set a master key, the proxy is open. There's also a common placeholder key (sk-1234) that attackers actively scan for.
No exploit needed. No credential stuffing. Just an HTTP request to a publicly reachable endpoint that was never meant to be public but ended up that way because the defaults didn't protect it.
What attackers are actually doing with hijacked endpoints
The three campaigns Zenity caught were doing very different things, which shows how versatile this attack vector is.
One operator used a LiteLLM client to send a 140,000-character prompt weaponizing an autonomous penetration testing framework called Strix against a French auction house. The prompt instructed the agent to never ask for permission, run continuously, and never identify itself. The presence of persistent retry commands suggested a live human operator directing the attack in real time.
A second operator pointed a desktop LLM client at an exposed Ollama instance and loaded it with over 150 offensive security tools from a framework called HexStrike AI. No target was identified yet, suggesting this was staging for a future attack.
A third operator pointed an OpenAI Codex agent at a LiteLLM proxy under the persona of a "security auditor" and directed it to do web reverse-engineering work. The persona was specifically built to suppress the model's safety refusals.
In all three cases, the attacker's entire agent configuration, the system prompt, the tool definitions, and the persona rode in the request body. The exposed endpoint was just the compute layer.
Why this matters for every team running AI infrastructure
This isn't a sophisticated attack. It's the AI equivalent of leaving an S3 bucket open. But the consequences are different because an exposed AI endpoint doesn't just leak data. It gives attackers free compute for offensive operations, and those operations get attributed to your infrastructure.
If someone uses your exposed Ollama instance to run a penetration testing agent against a third party, the traffic comes from your IP. Your infrastructure is now part of someone else's attack chain, and explaining to an incident response team that you didn't do it but your misconfigured AI endpoint was used to do it is not a conversation anyone wants to have.
The broader pattern is that AI infrastructure is being deployed by ML engineers and developers who think of it as an internal tool, not a production service that needs the same hardening as anything else exposed to the internet. Default ports, no auth, placeholder keys, listening on all interfaces. These are the same misconfigurations we spent 15 years fixing in databases and cloud storage, and we're repeating every one of them with AI infrastructure.
What to actually do
The fixes are not complicated but they require treating AI infrastructure as a real attack surface.
Don't expose model backends to the internet unless there's a specific reason. If there is, put them behind authentication that isn't a placeholder key. Inspect request bodies from external sources because the agent payload (system prompt, tools, persona) rides in the request, not in a separate config. Monitor traffic to your AI endpoints for patterns like full-agent payloads, requests involving models you don't host, or prompts that include offensive tooling definitions. And treat sk-1234 and similar default keys the same way you'd treat admin/admin: change them immediately or assume they're already compromised.
As Zenity's CTO put it: assume any AI system you put on the internet will be targeted by AI-literate attackers within hours.
Source: Dark Reading
Is your team treating self-hosted AI infrastructure with the same security rigor as your production services? Or is it still in the "internal tool, nobody will find it" category?
Top comments (0)