Dimitris Kyrkos

Posted on Jul 1

Attackers are hijacking exposed AI endpoints to run offensive operations. No exploit needed.

#ai #devops #security #discuss

The attack doesn't require a compromise

There's a new attack pattern getting documented that every team running self-hosted AI infrastructure should know about.

Between March and May 2026, researchers at Zenity observed three separate campaigns where attackers used exposed LLM endpoints as compute for their own offensive AI operations. Not by exploiting a vulnerability. Not by compromising credentials. Just by knowing where the endpoint was and sending it requests.

The targets were Ollama and LiteLLM instances, two of the most common tools for self-hosting LLMs. The attack is embarrassingly simple: find an exposed endpoint, point your AI agent at it, and use someone else's infrastructure to power your operations.

Why this works

The root cause is default configurations that prioritize ease of setup over security.

Ollama ships with no built-in authentication on its default port (11434). It defaults to localhost but is commonly reconfigured to listen on all interfaces so other services can reach it. Once it's exposed, anyone who finds it can send inference requests.
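A quick way to catch this in a deploy script is to check the bind address before the service starts. Here is a minimal sketch in Python, assuming the address is supplied through the `OLLAMA_HOST` environment variable in a `host` or `host:port` form. The function name `binds_beyond_loopback` is my own, and the check only tells you the service is reachable from other machines, not whether a firewall sits in front of it.

```python
import ipaddress


def binds_beyond_loopback(host_value: str) -> bool:
    """Return True if an OLLAMA_HOST-style value listens on more than loopback.

    Hypothetical helper for illustration; accepts forms such as
    "127.0.0.1:11434", "0.0.0.0", "[::]:11434" or "http://localhost:11434".
    """
    value = host_value.strip()
    # Drop an optional scheme such as "http://".
    if "://" in value:
        value = value.split("://", 1)[1]
    # Drop any trailing path.
    value = value.split("/", 1)[0]

    # Separate the host from an optional port.
    if value.startswith("["):
        # Bracketed IPv6, e.g. "[::]:11434".
        host = value[1:].split("]", 1)[0]
    elif value.count(":") == 1:
        # "host:port" with an IPv4 address or a hostname.
        host = value.split(":", 1)[0]
    else:
        # Bare host, or an unbracketed IPv6 address.
        host = value

    if host == "":
        # ":11434" has no host part; treat it as all interfaces.
        return True
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A hostname we cannot classify; be conservative and flag it.
        return True
```

Something like `binds_beyond_loopback(os.environ.get("OLLAMA_HOST", "127.0.0.1:11434"))` in a pre-start check would flag `0.0.0.0` and let `127.0.0.1` through.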

LiteLLM's authentication is opt-in. If the operator doesn't set a master key, the proxy is open. There's also a common placeholder key (sk-1234) that attackers actively scan for.
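The same idea applies to the proxy key. The sketch below assumes the key reaches the proxy through the `LITELLM_MASTER_KEY` environment variable (adjust if yours lives in the config file). The placeholder list, the 20-character minimum, and both function names are my own assumptions, not anything LiteLLM enforces. Only `sk-1234` comes from the reporting.

```python
import secrets

# "sk-1234" is the placeholder named in the article; the other entries are
# assumed examples of values nobody should ship with.
PLACEHOLDER_KEYS = {"sk-1234", "changeme", "password"}

# Arbitrary threshold chosen for this sketch.
MIN_KEY_LENGTH = 20


def master_key_problem(key):
    """Return a short description of what is wrong with the key, or None."""
    if key is None or not key.strip():
        return "no master key set, so the proxy accepts unauthenticated requests"
    key = key.strip()
    if key in PLACEHOLDER_KEYS:
        return "placeholder key that attackers scan for"
    if len(key) < MIN_KEY_LENGTH:
        return "key is too short to resist guessing"
    return None


def generate_master_key() -> str:
    """Generate a random key with an 'sk-' prefix (prefix assumed by convention)."""
    return "sk-" + secrets.token_urlsafe(32)
```

Calling `master_key_problem(os.environ.get("LITELLM_MASTER_KEY"))` at startup and refusing to boot on a non-`None` result is one way to make the secure path the default.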

No exploit needed. No credential stuffing. Just an HTTP request to a publicly reachable endpoint that was never meant to be public but ended up that way because the defaults didn't protect it.

What attackers are actually doing with hijacked endpoints

The three campaigns Zenity caught were doing very different things, which shows how versatile this attack vector is.

One operator used a LiteLLM client to send a 140,000-character prompt weaponizing an autonomous penetration testing framework called Strix against a French auction house. The prompt instructed the agent to never ask for permission, run continuously, and never identify itself. The presence of persistent retry commands suggested a live human operator directing the attack in real time.

A second operator pointed a desktop LLM client at an exposed Ollama instance and loaded it with over 150 offensive security tools from a framework called HexStrike AI. No target had been identified yet, suggesting this was staging for a future attack.

A third operator pointed an OpenAI Codex agent at a LiteLLM proxy under the persona of a "security auditor" and directed it to do web reverse-engineering work. The persona was specifically built to suppress the model's safety refusals.

In all three cases, the attacker's entire agent configuration (system prompt, tool definitions, and persona) rode in the request body. The exposed endpoint was just the compute layer.

Why this matters for every team running AI infrastructure

This isn't a sophisticated attack. It's the AI equivalent of leaving an S3 bucket open. But the consequences are different because an exposed AI endpoint doesn't just leak data. It gives attackers free compute for offensive operations, and those operations get attributed to your infrastructure.

If someone uses your exposed Ollama instance to run a penetration testing agent against a third party, the traffic comes from your IP. Your infrastructure is now part of someone else's attack chain, and explaining to an incident response team that you didn't do it but your misconfigured AI endpoint was used to do it is not a conversation anyone wants to have.

The broader pattern is that AI infrastructure is being deployed by ML engineers and developers who think of it as an internal tool, not a production service that needs the same hardening as anything else exposed to the internet. Default ports, no auth, placeholder keys, listening on all interfaces. These are the same misconfigurations we spent 15 years fixing in databases and cloud storage, and we're repeating every one of them with AI infrastructure.

What to actually do

The fixes are not complicated, but they require treating AI infrastructure as a real attack surface.

Don't expose model backends to the internet unless there's a specific reason. If there is, put them behind authentication that isn't a placeholder key. Inspect request bodies from external sources because the agent payload (system prompt, tools, persona) rides in the request, not in a separate config. Monitor traffic to your AI endpoints for patterns like full-agent payloads, requests involving models you don't host, or prompts that include offensive tooling definitions. And treat sk-1234 and similar default keys the same way you'd treat admin/admin: change them immediately or assume they're already compromised.
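To make the inspection and monitoring advice concrete, here is a rough heuristic in Python. It assumes requests arrive in the OpenAI-style chat format (`model`, `messages`, `tools`), which may not match your gateway. The function name `flag_request` and both thresholds are hypothetical and would need tuning against your own legitimate traffic.

```python
def flag_request(body, hosted_models, max_system_chars=20000, max_tools=20):
    """Return a list of reasons an inference request looks like a full-agent payload.

    Hypothetical heuristic for illustration. `body` is the parsed JSON request,
    `hosted_models` is the set of model names this endpoint actually serves.
    """
    reasons = []

    # Requests for models you don't host suggest someone probing or reusing
    # a client configured for a different backend.
    if body.get("model") not in hosted_models:
        reasons.append("unknown_model")

    # Sum the size of all system messages; agent frameworks tend to ship
    # very large system prompts in every request.
    system_chars = 0
    for message in body.get("messages") or []:
        content = message.get("content")
        if message.get("role") == "system" and isinstance(content, str):
            system_chars += len(content)
    if system_chars > max_system_chars:
        reasons.append("oversized_system_prompt")

    # A long list of tool definitions is another sign of an agent payload.
    if len(body.get("tools") or []) > max_tools:
        reasons.append("many_tool_definitions")

    return reasons
```

An empty list means nothing tripped. Anything else is worth logging with the source IP, since none of these signals is proof of abuse on its own. A 140,000-character prompt with 150 tool definitions would trip both size checks at these thresholds.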

As Zenity's CTO put it: assume any AI system you put on the internet will be targeted by AI-literate attackers within hours.

Source: Dark Reading

Is your team treating self-hosted AI infrastructure with the same security rigor as your production services? Or is it still in the "internal tool, nobody will find it" category?

Top comments (4)

VoltageGPU • Jul 4

Interesting take on how attackers are leveraging exposed AI endpoints without needing to exploit traditional vulnerabilities. From an infrastructure standpoint, ensuring proper access controls and endpoint visibility is critical—especially when training or inference models are exposed to public networks. I've seen similar issues in GPU clusters where misconfigured APIs opened the door to unintended usage, even without direct code compromise.

Dimitris Kyrkos • Jul 6

The GPU cluster point is a good addition. Same attack surface, different entry point. Misconfigured training APIs on GPU clusters are arguably worse because an attacker isn't just getting inference compute, they're potentially getting access to fine-tune or extract model weights. The pattern is consistent though: ML infrastructure gets deployed with developer convenience defaults and nobody applies the same hardening they'd put on a production database.

Fenix • Jul 5

Thanks to Dimitris and his thoughtful explanatory article, we were able to build this:

dev.to/magopredator/ai-guard-gatew...

Dimitris Kyrkos • Jul 6

That's great, love that the post sparked something concrete. Building actual tooling to address the problem is worth more than any number of articles about it. Will check it out.