Alessandro Pignati

Posted on Apr 29

Why Your Docker Assistant Shouldn’t Know Pizza Recipes: A Deep Dive into Gordon AI Security

#ai #cybersecurity #machinelearning #aisecurity

Imagine you're deep in the zone, debugging a complex multi-stage Docker build. You turn to Gordon, Docker’s shiny new AI-powered assistant, for a quick optimization tip. But instead of suggesting a smaller base image, Gordon starts explaining the historical nuances of the 1966 Palomares nuclear incident.

Wait, what?

While it’s a cool party trick, this "identity crisis" is a massive red flag for anyone working in infrastructure. If a tool with the power to manage your images, volumes, and networks is also moonlighting as a Cold War historian, we have a problem.

The "Identity Crisis" of AI Agents

Docker recently launched Gordon (currently in beta) to be the ultimate companion for container orchestration. It’s designed to explain concepts, write Dockerfiles, and debug container failures directly within your workflow.

However, there’s a noticeable disconnect between the marketing and the beta reality. Gordon often acts like a general-purpose encyclopedia rather than a specialized technical tool.

In the security world, we call this a capability leak.

From Little Red Riding Hood to McDonald's

A capability leak happens when an AI system fails to suppress the unconstrained knowledge of its underlying Large Language Model (LLM).

During testing, Gordon, a tool supposedly dedicated to containerization, was perfectly happy to:

Recite the story of "Little Red Riding Hood" with narrative flair.
Provide detailed pizza recipes.
Write general-purpose Python functions that have nothing to do with Docker.

This isn't just a quirky bug. We’ve seen this before with the McDonald’s support chatbot, which users famously "jailbroke" to write code and engage in philosophical debates. When an agent "breaks character," it proves that the trust model is broken. It’s essentially a general-purpose engine wearing a thin, branded mask.

Why "Being Helpful" is a Security Risk

You might think, "So what if it knows a pizza recipe? It's still helpful!"

But every "innocent" capability is a potential tool for an attacker. By allowing Gordon to act as a general-purpose interpreter or storyteller, the attack surface expands significantly.

An attacker doesn't need to ask Gordon to "delete a container" directly. They can hide malicious intent within a complex request for a Python-based calculator or a historical narrative, slowly steering the agent toward unauthorized actions. In a truly agentic system where the AI can interact with your local environment, a tool that can do "anything" is a tool that can be manipulated to do everything.

Building Architectural Guardrails

To build secure AI agents, we have to stop treating them as "chatbots that can do things" and start treating them as software components with probabilistic interfaces.

A simple system prompt like "You are a Docker expert" is too easy to bypass. Instead, we need a multi-layered defense strategy.

1. Intent Classification (The Gatekeeper)

Before a user's prompt ever reaches the main LLM, it should be intercepted by a smaller, specialized "gatekeeper" model. Its only job is to ask: "Is this request related to Docker?" If the user asks for a pizza recipe, the gatekeeper rejects it before it can trigger any powerful capabilities.

2. Capability Hardening

Strip away everything that isn't essential. If an agent is meant to manage Dockerfiles, it shouldn't have access to the open web for non-technical data or the ability to execute arbitrary, non-container-related code.

3. Human-in-the-Loop (HITL)

For any action that could impact production infrastructure—like deleting volumes or modifying networks, a human must be the final decider. The agent proposes; the human disposes.

Unrestricted vs. Secure Agents: A Comparison

Feature	Unrestricted Agent (e.g., Gordon Beta)	Secure Agent (Best Practice)
Domain Grounding	Weak; relies on a simple system prompt.	Strong; enforced by intent classifiers.
Capability Scope	General-purpose; can discuss any topic.	Restricted; limited to specific tasks.
Tool Access	Broad; can write/execute arbitrary code.	Hardened; access limited to essential APIs.
Risk Profile	High; vulnerable to prompt injection.	Low; minimized attack surface.
Oversight	Often optional or session-based.	Mandatory for sensitive actions.

The Takeaway

We are currently in the "honeymoon phase" of AI agents, where novelty often overshadows security. But as AI becomes more deeply integrated into our dev environments, the cost of these capability leaks will rise.

A secure agent isn't one that can answer every question. It’s one that knows exactly what it’s supposed to do, and more importantly, what it’s not allowed to do.

What do you think? Have you experimented with Gordon or other AI assistants in your workflow? How are you handling the security implications? Let's chat in the comments!

Top comments (1)

ArkForge • May 4

The intent classification layer is the right first step, but it solves the wrong problem in isolation. The gatekeeper catches off-topic requests before they reach the LLM. It doesn't verify what the agent actually did when the request was legitimate.

Gordon running a Docker command has full tool access. Whether that command matched the user's intent, stayed within scope, and produced a tamper-evident audit trail is a different problem. Capability hardening constrains what the agent can ask for; action traceability captures what it actually executed. Both are needed for production-grade agent security.