Paulo Victor Leite Lima Gomes

Posted on Jul 3

local AI runtimes are part of your attack surface now

#ai #security #docker #localai

Local AI has a comforting story: the model runs on your machine, the prompt does not go to a cloud vendor, the code stays nearby, and the demo works on an airplane.

I like local AI.

I also think we are underpricing what it adds to the developer machine.

Docker's security announcements this year include a small cluster of fixes around Docker Model Runner: container-to-host code execution in two inference backends, SSRF in the OCI registry client, and runtime flag injection. Model Runner is the kind of feature developers are going to want. It can pull models from Docker Hub, OCI registries, and Hugging Face, serve them through OpenAI- and Ollama-compatible APIs, package model files as OCI artifacts, and connect to coding tools.

That is useful. It is also a lot of trust crossing one laptop.

local does not mean simple

The word "local" makes systems feel smaller than they are.

When someone says a local model is running on their machine, the mental picture is often one process, one model file, one user, and one prompt. That picture is too small.

A local AI runtime sits across several boundaries:

a model distribution channel
a model cache
an inference backend
a local API server
a container or sandbox boundary
GPU drivers and acceleration libraries
developer tools that call the API
source code and credentials on the workstation

That is not one thing. That is a platform.

The Docker Model Runner docs make the shape pretty clear. Models can come from Docker Hub, an OCI-compliant registry, or Hugging Face. They are stored locally and loaded at runtime. The runtime can expose OpenAI- and Ollama-compatible APIs. It supports multiple inference engines. On Linux, Docker says the engines run inside a container. On macOS and Windows, they run in sandboxed environments.

None of this is a criticism.

But once the integration exists, it deserves the same kind of boring security thinking we apply to other developer infrastructure.

the model registry is a supply chain

The easiest mistake is treating models as inert data.

Sometimes that is close enough to true. A model file is not the same thing as a random install script. But the runtime around the model is active software, and the path into that runtime matters.

If your local AI tool pulls from a public registry, parses metadata, follows redirects, talks to an OCI client, caches artifacts, and loads model files into a backend with native code, you have a supply chain. A slightly weird one, but still a supply chain.

That makes the SSRF fix in Docker Model Runner's OCI registry client interesting. SSRF is not exotic. It is one of those old vulnerability classes that keeps reappearing because software keeps learning how to fetch URLs on behalf of users.

In a cloud service, SSRF makes people think about metadata endpoints, internal admin panels, and private networks.

On a developer machine, the shape is different but not harmless. The local runtime may have access to internal registries, VPN-only services, localhost APIs, local credentials, model caches, and project files. It may be reachable from containers. It may be called by an editor plugin while the developer is working on a private repository.

The practical question is not "can a model file be malicious?"

The question is "what does the whole path from model discovery to inference get to touch?"

That is the thing to secure.
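
To make the SSRF point concrete, here is a minimal Python sketch of the kind of check a fetcher can run before it follows a URL on someone's behalf. It is not how Docker fixed Model Runner; it is a generic illustration, and the function names are mine. It treats anything that is not a globally routable address as internal.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal_address(ip_text: str) -> bool:
    """True for loopback, private, link-local, and other non-public addresses."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    addr = ipaddress.ip_address(ip_text.split("%")[0])
    return not addr.is_global


def url_targets_internal(url: str) -> bool:
    """True if the URL's host is missing, unresolvable, or resolves to an internal address."""
    host = urlsplit(url).hostname
    if not host:
        return True  # nothing to check, so refuse
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # could not resolve, so refuse
    # Refuse if ANY resolved address is internal, not just the first one.
    return any(is_internal_address(info[4][0]) for info in infos)
```

A check like this only helps if it runs again on every redirect hop, and it still leaves a gap between the lookup and the actual connection, so take it as a starting point rather than a complete defense.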

the local API is not a toy API

The other easy thing to miss is the API surface.

Local inference products often expose an API because that is how the ecosystem plugs together. An editor, agent, notebook, test harness, web UI, or tiny side project can send requests to the local model. Compatibility with the OpenAI API shape is a gift to developers because it removes a lot of glue code.

But a local API is still an API.

Docker's docs say the Model Runner API is not authenticated, and that any client that can reach it, including other containers on the same Docker network, can pull, load, and run models, and send inference requests.

That may be a perfectly reasonable default for local development. I am not pretending every laptop needs enterprise identity in front of localhost.

But teams should understand the tradeoff.

If other containers can reach the model API, then the model API becomes part of the container network. If a coding tool can reach it, the tool inherits whatever the runtime can do. If the runtime can pull models, load backends, and consume enough CPU, memory, or GPU to make the machine sad, that API is not just a chat endpoint.

It is a local capability endpoint.

And capability endpoints need boundaries.
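
One boundary that is cheap to check is where the API listens. The sketch below classifies a bind address with Python's standard ipaddress module. The function name and labels are hypothetical, and how you obtain the bind address for a given runtime is left out on purpose.

```python
import ipaddress


def exposure(bind_host: str) -> str:
    """Classify how widely a listener bound to bind_host can be reached."""
    addr = ipaddress.ip_address(bind_host)
    if addr.is_loopback:
        return "loopback-only"       # reachable from this host only
    if addr.is_unspecified:
        return "all-interfaces"      # 0.0.0.0 or ::, every interface
    return "specific-interface"      # one address, possibly a LAN or bridge IP
```

Anything other than "loopback-only" is worth a deliberate decision. Keep in mind that a loopback bind says nothing about sockets or proxies that forward to it, so this answers only part of the question.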

developer machines are already crowded

Modern developer machines are full of things that were never designed together.

There is Docker Desktop, a package manager, language runtimes, cloud CLIs, browser sessions, SSH keys, local databases, Git credentials, VPN clients, AI coding tools, MCP servers, editor extensions, test containers, and usually some half-forgotten process still listening on a port from Tuesday.

Local AI lands in the middle of that.

It is tempting to give it a special category because the words around it are different: model, inference, tokens, context, embeddings. But operationally it behaves like another privileged developer service. It pulls artifacts. It runs native code. It exposes APIs. It uses acceleration hardware. It integrates with editors and agents. It sees prompts that may include code, logs, stack traces, customer-shaped examples, and secrets people accidentally pasted because humans remain humans.

That is not a reason to panic.

It is a reason to put it in the inventory.

Which machines run local inference? Which runtime? Which version? Which models are allowed? Which registries are trusted? Is the API bound only where it should be? Can containers reach it? Which tools call it? Are prompts logged? Are responses logged? Can the runtime read project files, or is it only receiving text over an API? Who updates the backend when a security advisory drops?

These are boring questions.

Good.

Boring questions are how platforms become survivable.
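
The inventory questions map naturally onto a small record per machine. This is a sketch with made-up field names and a naive dotted-version comparison, not the schema of any real asset tool.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeRecord:
    """One row of a hypothetical local-inference inventory."""
    hostname: str
    runtime: str
    version: str                      # dotted numeric version, e.g. "1.4.1"
    api_bind: str                     # address the local API listens on
    containers_can_reach_api: bool
    trusted_registries: list[str] = field(default_factory=list)
    prompts_logged: bool = False


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "1.4.1" into (1, 4, 1). Assumes purely numeric parts."""
    return tuple(int(part) for part in text.split("."))


def hosts_needing_patch(records: list[RuntimeRecord], runtime: str, fixed_in: str) -> list[str]:
    """Hostnames running `runtime` at a version older than `fixed_in`."""
    threshold = parse_version(fixed_in)
    return [
        r.hostname
        for r in records
        if r.runtime == runtime and parse_version(r.version) < threshold
    ]
```

With something like this in place, "who is still on the vulnerable backend?" becomes one function call instead of a chat thread.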

what i would do first

If I were responsible for developer platform security, I would not start by banning local models.

That would be silly. Developers will use them anyway, and for some workflows they are genuinely useful.

I would start with defaults: keep local AI runtimes updated through the same managed path as Docker Desktop, browsers, and language tooling. Decide which model sources are allowed for work machines. Prefer signed or attributable artifacts where possible. Make the runtime bind only to the interfaces that actually need it. Treat unauthenticated local APIs as local-only unless there is a deliberate reason to expose them to containers or networks.
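
"Decide which model sources are allowed" can start as a very small check. The sketch below assumes model references follow the same convention as Docker image references, where the first path component is a registry host only if it contains a dot or a colon or is "localhost", and everything else defaults to Docker Hub. That assumption, the function names, and the example allowlist are mine.

```python
def registry_of(ref: str, default: str = "docker.io") -> str:
    """Best-effort registry host for a model reference."""
    first, sep, _ = ref.partition("/")
    # Image-reference convention: a leading component counts as a host
    # only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return default


def is_allowed(ref: str, allowed: set[str]) -> bool:
    """True if the reference's registry is on the allowlist."""
    return registry_of(ref) in allowed


# Hypothetical policy for work machines.
WORK_ALLOWLIST = {"docker.io", "registry.internal.example"}
```

This only checks where a model comes from, not who published it or whether it was signed, so it complements provenance checks rather than replacing them.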

Then I would separate experiments from work.

A developer trying a random model for a weekend prototype should not have the same policy as a developer working in a repository with production credentials. Personal curiosity and production-adjacent engineering need different lanes.

Finally, I would make the invisible visible. Show which tools are using the local model API. Log enough metadata to debug without storing every sensitive prompt forever. Track runtime versions. Watch for unexpected network exposure. Make it easy to answer whether a vulnerable inference backend is present on managed machines.
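
For the logging part, one option is to record who called the API and how large the prompt was, plus a keyed fingerprint for spotting repeats, without keeping the prompt text. This is a sketch: the field names are invented, and the key is assumed to be a per-machine secret managed somewhere else.

```python
import hashlib
import hmac
import json
import time


def log_entry(tool: str, model: str, prompt: str, key: bytes) -> str:
    """Build a JSON log line that describes a request without storing the prompt."""
    # Keyed digest: lets you correlate identical prompts, but cannot be
    # matched against guessed prompts by someone who lacks the key.
    digest = hmac.new(key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()
    record = {
        "ts": int(time.time()),
        "tool": tool,
        "model": model,
        "prompt_chars": len(prompt),
        "prompt_hmac": digest[:16],
    }
    return json.dumps(record, sort_keys=True)
```

The tool and model names are enough to answer "what is calling the local API?", which is usually the question that matters during an investigation.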

Most of that is not glamorous.

It is desktop management, supply chain hygiene, and API boundary work.

Which is exactly the point.

the punchline

Local AI is not automatically unsafe.

It is also not magically safe because it runs on your laptop.

The Docker Model Runner fixes are useful because they make the abstraction visible for a moment. Behind the friendly command is a real runtime with registries, backends, sandboxes, APIs, networks, caches, and developer-tool integrations. That runtime lives next to source code and credentials. It deserves to be treated as part of the engineering environment, not as a toy sitting outside the threat model.

The industry made this mistake with containers for years.

We called them lightweight, portable, developer-friendly units of software. All true. Then we slowly remembered that they also needed image policy, registry controls, runtime isolation, patching, scanning, provenance, network boundaries, and ownership.

Local AI runtimes are entering the same phase.

The right response is not fear. It is maturity.

Use local models when they help. Keep the privacy benefit. Enjoy the speed. Avoid paying a cloud API for every tiny autocomplete experiment.

But put the runtime on the map.

If it can pull artifacts, run inference backends, expose APIs, talk to containers, and plug into coding tools, it is part of your attack surface now.

Treat it that way before an incident teaches the lesson with worse timing.
