I run a 28-container security operations center from a 40ft fifth wheel RV on cellular internet. Local AI. Custom governance protocol. Self-hosted everything.
Last Tuesday I looked at my stack and realized... none of it was hardened.
28 containers. Zero cap_drop. Zero no-new-privileges. Zero resource limits. Every container running with full Linux capabilities because I never went back and locked them down after getting things working.
So I did what any reasonable person would do. I called it Project Battleship and hardened the entire fleet in one day.
The Rule: Audit First, Change Second
The biggest mistake in infrastructure work is changing things before you understand what you have. So I split the day into two phases.
Session A was pure reconnaissance. Ten audit sweeps across all 26 persistent containers. Version inventory. Capability audit. Network attachment verification. Resource usage baselines. UFW rules snapshot. OpenSnitch rules inventory.
Zero changes. Just data.
That audit found the critical stuff before I touched a single compose file:
- OpenWebUI port 3002 was exposed to ALL interfaces, including WAN. Two broad UFW rules were overriding the LAN-only rules I thought were protecting it.
- 15 of 26 containers running unversioned `:latest` tags.
- Zero containers had `cap_drop` set. Only one had `no-new-privileges`.
Session B was the surgery. Armed with a complete baseline from Session A... I knew exactly what to touch and what to leave alone.
The Pattern: cap_drop ALL, Then Add Back What Breaks
Every container got the same treatment... drop everything. Then I watched what broke.
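Roughly, the per-service baseline looked like this. A minimal sketch, not my actual compose files; the service name, image tag, and limit values are illustrative:

```yaml
services:
  example-service:            # illustrative name, not one of my actual services
    image: example/app:1.2.3  # pinned version, no :latest
    cap_drop:
      - ALL                   # start from zero Linux capabilities
    security_opt:
      - no-new-privileges:true
    mem_limit: 512m           # memory ceiling (value illustrative)
    cpus: "0.50"              # CPU ceiling (value illustrative)
    pids_limit: 256           # PID ceiling; blunts fork bombs
```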
Containers that start as root and drop to a service user need CHOWN, SETUID, and SETGID added back. Containers that use chroot sandboxing need SYS_CHROOT and MKNOD. SQLite writers need DAC_OVERRIDE and FOWNER. Containers that read files they don't own need DAC_READ_SEARCH.
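When something broke, the minimum went back in. A sketch of the add-back for a container that starts as root and drops to a service user (names and values are stand-ins):

```yaml
services:
  root-then-drop:             # illustrative: entrypoint starts as root, drops to a service user
    image: example/daemon:2.0
    cap_drop:
      - ALL
    cap_add:
      - CHOWN                 # chown mounted volumes at startup
      - SETUID                # switch UID from root to the service user
      - SETGID                # switch GID as well
```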
Four containers can't be hardened at all... adguard, suricata, zeek, and ntopng all run on host network with elevated privileges for packet capture. That's by design. Informed exceptions with documented reasoning.
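The compose files carry the reasoning for those exceptions, so future-me doesn't "fix" them. A sketch of how one packet-capture exception might read (the image and exact capability set are illustrative):

```yaml
services:
  packet-capture:             # stand-in for adguard/suricata/zeek/ntopng
    image: example/ids:6.0
    network_mode: host        # EXCEPTION: needs the physical interface for capture
    cap_add:
      - NET_ADMIN             # EXCEPTION: interface configuration
      - NET_RAW               # EXCEPTION: raw sockets for packet capture
    # Documented exception: cannot cap_drop ALL. Host network plus elevated
    # privileges are required for packet capture. Reviewed and accepted.
```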
Six containers can't use no-new-privileges because they use gosu, chroot, or need privilege transitions during startup. Also documented.
The rest... 22 of 28... locked down with cap_drop ALL, minimum required capabilities added back, resource limits set, and no-new-privileges enabled.
The 70/30 Model Under Live Fire
Here's the part nobody talks about when they write hardening guides.
I didn't do this alone. I have five AI instances running across four projects with a governance protocol I built specifically because AI will confidently do the wrong thing if you let it.
The AI does 70% of the work. Research. Drafting compose changes. Identifying which capabilities each container needs. Proposing rollback plans.
I do the 30% that matters. Reviewing every change. Deciding what ships. Marking what fires. Accepting the risk.
Every compose edit was proposed by the AI. Every compose edit was reviewed and approved by me before execution. Every container was restarted one at a time with verification between each. Backups of all seven compose files were created before the first edit.
The AI didn't get a vote on what went to production. That's the protocol.
The Tunnel Fix Nobody Expected
Mid-day... while doing compose surgery on the security stack... I accidentally fixed a problem that had been misdiagnosed for two weeks.
My MCP server (the bridge between my AI instances and the persistent memory brain) had been dropping connections since the previous round of compose changes. Every station blamed Claude.ai's SSE session management. "Open a new window" was the accepted workaround.
Turns out the Cloudflare tunnel route for the MCP server was pointing to a hardcoded Docker container IP that changed when the container was recreated. Every other tunnel route used localhost. The MCP route was the only one with a hardcoded container IP.
One API call to Cloudflare. Changed the route to localhost:7778. Added a 127.0.0.1:7778:7778 port binding to the compose file.
Two weeks of misdiagnosis fixed in thirty seconds.
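In compose terms, the container side of the fix was one line. A sketch (the service name is illustrative; the binding is the real one):

```yaml
services:
  mcp-server:                  # illustrative name for the MCP bridge container
    ports:
      - "127.0.0.1:7778:7778"  # loopback-only; the cloudflared route now
                               # targets localhost:7778 instead of a container
                               # IP that changes whenever it's recreated
```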
Lesson: When a workaround is the accepted fix... the real one hasn't been found yet. Trace the full request path. Every hop. Every layer.
The Results
From zero hardening to full warship in one day:
- 22 of 28 containers hardened with `cap_drop ALL`
- 15 containers with `no-new-privileges` (up from 1)
- All 22 hardened containers have memory, CPU, and PID limits
- 4 containers unhardened by design (packet capture)
- 6 containers without `no-new-privileges` by design (documented exceptions)
- Zero downtime
- Zero data loss
- One operator in flip flops
The Takeaway
Hardening isn't hard. It's tedious. And tedious is exactly what AI is good at.
The AI proposed every change. I approved every change. The protocol kept us honest. The audit-first approach meant we never guessed.
If you're running Docker containers without cap_drop ALL... you probably know you should fix that. The pattern is simple. Drop everything. Watch what breaks. Add back the minimum. Document the exceptions.
Or keep running with full capabilities and hope nobody notices. Your call.
This post was generated with AI assistance under the 70/30 governance model. The bot did the drafting. I did the deciding. That's the whole point.
All platforms and contact: mpdc.dev/deets