Every security scanner will tell you what's vulnerable.
None of them will tell you what to actually do about it.
You get a list. CVE IDs, severity badges, affected packages. Then you're alone with the questions that actually matter: is this exploitable in my setup, can the fix be applied safely, and will patching break anything?
I've been building a self-hosted Docker management platform. Last week I wired up an AI layer on top of the vulnerability scanner — not to automate patching, but to automate the reasoning about whether to patch. Here's what happened.
## The problem with "critical"
Critical doesn't mean the same thing in every context.
CVE-2025-58050 is a critical vulnerability in pcre2. On paper: patch immediately. In practice: it's in a socket proxy image maintained by a third party. I don't control that Dockerfile. I can pull the latest image and hope the maintainer already shipped the fix — or I can wait. Neither option is obvious from the CVE report alone.
CVE-2026-27143 is a critical vulnerability in Go stdlib. Fix available: upgrade to 1.25.9. Sounds straightforward. The complication: the binary that ships this version of stdlib is cloudflared, Cloudflare's tunnel client. I didn't write it. I can't easily recompile it. The fix depends on Cloudflare publishing an updated release.
Both are "critical." Each demands a different response. A scanner can't tell you that. A rule can't tell you that. It requires judgment.
## What Level 1 does
The first layer of automation is rule-based. No AI, no API calls, no external dependencies.
When a scan completes and finds a critical vulnerability, Level 1 fires an alert. It knows: this image has N critical CVEs, the threshold is 1, therefore alert. Fast, deterministic, always on.
This covers the obvious cases. A container with a known critical CVE should be flagged. That part doesn't need AI.
What Level 1 can't do is answer: should I patch this right now, and at what risk?
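The threshold rule is small enough to sketch in a few lines. Python here for illustration; the function and field names (`check_scan`, `fire_alert`, the scan-result shape) are hypothetical, not the platform's actual API:

```python
# Minimal sketch of the Level 1 rule. Assumes a scan result shaped like
# {"image": ..., "cves": [{"id": ..., "severity": ...}, ...]}.
CRITICAL_THRESHOLD = 1  # alert as soon as a single critical CVE is present


def fire_alert(image: str, cves: list) -> None:
    # Placeholder for the notification hook; deterministic, no AI involved.
    ids = ", ".join(c["id"] for c in cves)
    print(f"ALERT: {image} has {len(cves)} critical CVE(s): {ids}")


def check_scan(scan_result: dict) -> bool:
    """Return True (and alert) if the image crosses the critical threshold."""
    criticals = [c for c in scan_result["cves"] if c["severity"] == "CRITICAL"]
    if len(criticals) >= CRITICAL_THRESHOLD:
        fire_alert(scan_result["image"], criticals)
        return True
    return False
```

That's the whole rule: count, compare, notify. It never needs to understand what the CVE means.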
## What Level 2 does
When Level 1 detects a critical vulnerability, Level 2 kicks in if an Anthropic API key is configured. It builds a structured prompt with everything the model needs to reason about the situation: the image name, the CVEs, the packages affected, the versions with fixes available, and whether those fixes represent a patch bump, a minor version change, or a major version jump.
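Classifying that version jump is a mechanical step that can happen before the prompt is built. A minimal sketch, assuming plain dotted version strings with optional distro suffixes like `-r0` (the helper name is mine, not the platform's):

```python
def classify_upgrade_risk(installed: str, fixed: str) -> str:
    """Classify the jump from installed to fixed: 'patch', 'minor', or 'major'.

    Assumes semver-ish versions; distro revision suffixes (e.g. '10.46-r0')
    are stripped before comparison.
    """
    def parts(version: str) -> list[int]:
        core = version.split("-")[0]
        nums = [int(p) for p in core.split(".")[:3] if p.isdigit()]
        return (nums + [0, 0, 0])[:3]  # pad short versions like "1.25"

    a, b = parts(installed), parts(fixed)
    if a[0] != b[0]:
        return "major"
    if a[1] != b[1]:
        return "minor"
    return "patch"
```

Feeding the model this classification, rather than raw version strings alone, means it spends its reasoning on exploitability and ownership instead of version arithmetic.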
Claude Haiku then returns a structured analysis:
- Exploitability — is this remotely exploitable or does it require local access?
- Urgency — how quickly does this need to be addressed?
- Version Risk — what's the upgrade risk? Patch bump vs minor vs major?
- Fix Impact — is the fix likely to break anything?
- Recommended Action — a concrete next step with reasoning
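Wired up with the official `anthropic` Python SDK, the call might look roughly like this. The prompt wording, field names, and bare-JSON parsing are my assumptions; only `anthropic.Anthropic` and `client.messages.create` are the SDK's real surface, and `claude-3-5-haiku-latest` is one published alias for Haiku:

```python
import json

ANALYSIS_FIELDS = ["exploitability", "urgency", "version_risk",
                   "fix_impact", "recommended_action"]


def build_prompt(image: str, cves: list[dict]) -> str:
    """Assemble the structured context the model reasons over."""
    lines = [f"Image: {image}", "Critical CVEs:"]
    for c in cves:
        lines.append(f"- {c['id']} in {c['package']} "
                     f"(installed {c['installed']}, fixed in {c['fixed']})")
    lines.append("Return a JSON object with exactly these keys: "
                 + ", ".join(ANALYSIS_FIELDS))
    return "\n".join(lines)


def analyze(image: str, cves: list[dict]) -> dict:
    import anthropic  # third-party SDK; reads ANTHROPIC_API_KEY from the env

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1024,
        messages=[{"role": "user", "content": build_prompt(image, cves)}],
    )
    return json.loads(msg.content[0].text)  # assumes the model returned bare JSON
```

In practice you'd want a retry or a stricter output contract around that final `json.loads`, since nothing forces the model to return valid JSON.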
This analysis goes into the notification. The email you receive isn't "critical CVE detected." It's a security brief.
## What it found on my stack
Three critical vulnerabilities across two images. Here's what the AI analysis said about each:
CVE-2026-27143 and GHSA-p77j-4mvh-x3m3 — both in a tunnel manager image, both in third-party binaries (Go stdlib and gRPC shipped inside cloudflared). Exploitability: remote. Urgency: critical. Version risk: patch bump. Fix impact: low — but the fix depends on the upstream vendor shipping an updated binary. Recommended action: defer until the vendor publishes an updated release, monitor for new cloudflared versions.
CVE-2025-58050 — in a socket proxy image maintained by a third party. Package: pcre2. Fix available: 10.46-r0. Exploitability: remote. Urgency: critical. Fix impact: low. Recommended action: pull the latest image version to pick up the fix if the maintainer has already shipped it. If not, defer and accept the risk with documentation.
In both cases the AI correctly identified that I don't control the vulnerable code. It didn't tell me to patch something I can't patch. It told me what I actually needed to know: these are third-party dependencies, here's the risk profile, here's what to do while you wait.
## The part I didn't expect
The most useful output wasn't the vulnerability analysis. It was the differentiation between "you can fix this" and "you're waiting on someone else."
That distinction is obvious to a human who investigates the CVE. It's not obvious to a scanner. It requires knowing what the affected binary is, who maintains it, and whether the fix is in your control.
The AI got this right without me telling it explicitly. It reasoned from the package name and context to the correct conclusion about ownership and fixability.
That's the gap that rules-based automation can't close. You can write a rule that says "alert on critical CVEs." You can't write a rule that says "if the vulnerable binary is a third-party dependency with no upstream fix available, recommend deferral with documentation."
## What I learned about AI in security workflows
The value isn't automating the patch. It's automating the triage.
A human security engineer looking at these three CVEs would spend 20-30 minutes researching each one: checking exploitability databases, looking at the upstream project's changelog, assessing version risk, writing up a recommendation. The AI does this in seconds, for every scan, every time.
The output isn't always perfect. The model can misread version risk or miss context about your specific setup. Every decision is visible in the feed with the full reasoning attached — you can always override, and you should review anything the model flags as critical before acting.
But the triage is right often enough to dramatically reduce the cognitive load of managing vulnerabilities across a fleet of containers. You stop reading CVE lists and start reading executive summaries with recommended actions.
That's a different kind of tool.
Building this as part of an open source self-hosted Docker management platform. If you're working on similar infrastructure automation problems, drop a comment.