DEV Community

Cover image for Rebuilding the Hull at Sea
Chris
Chris

Posted on • Originally published at mpdc.dev

Rebuilding the Hull at Sea

The box that ran everything started dying in April.

Not dramatically. Machines almost never die dramatically. It started with instability... the kind you explain away once, side-eye twice, and start losing sleep over the third time production goes down while you're in the middle of something else.

The host under my entire stack... public site, analytics, security tooling, the AI crew's memory layer... was getting flaky. And flaky hardware only trends one direction.

Here's the thing about a homelab that lives in a 40ft fifth wheel: there is no second team. No vendor escalation. No change advisory board. No maintenance window negotiated three weeks out. There's me, a crew of governed AI instances, and a reclaimed Dell T3600 about to get the biggest promotion of its second life.

So we didn't try to heal the sick box. We built a new hull alongside it and started moving the ship... plank by plank... while it was still sailing.

One ground rule, set day one: the old host stays untouched and keeps serving production until the new hull is proven. Not "mostly proven." Proven. Hold that thought, it matters at the end.

Moving containers is the easy part. Docker made that boring years ago, and boring is a compliment in infrastructure. What's never boring is the inventory of everything you assumed and never wrote down.

A migration doesn't test your stack. It tests your assumptions. Here's what mine were hiding.

01: The umbilical nobody documented

Security stack went first. Suricata, Zeek, Wazuh, CrowdSec, Falco, the whole alphabet, up clean on the new hull. Then the MCP server, the piece that gives the AI crew its hands, refused to come up right.

It was hard-wired over HTTP to the crew's memory backend. A live dependency, in production for months, documented exactly nowhere.

The crew that documents every f*cking thing had never documented its own umbilical cord.

Fix was trivial once we could see it: deploy the brain before the hands. Reorder, redeploy, done. But the lesson isn't the fix. The lesson is that your most embarrassing dependency is always between the things you built yourself. Third-party stuff gets documented because you don't trust it. Your own stuff doesn't... because you do.

02: The dump that wants to be the whole server

Next, the site and the database under it. And here's a landmine that's claimed more than a few weekends across this industry: a full MySQL dump doesn't just carry your data. It carries the server's own user and authentication tables.

Replay it wholesale onto a fresh host and you've just overwritten the new server's identity with the old one's. Now you're debugging ghost permissions at midnight wondering why a "clean" migration feels haunted.

The import was done surgically instead: one database, the blog's data and nothing else. Eighty-one tables came across healthy. Zero authentication weirdness, because the new server was never asked to wear a dead man's credentials.

Bonus round from the same move: the mail bridge came back up and immediately demanded a fresh interactive login, because its refresh token had gone stale sitting in a backup. Tokens age even when nothing is using them. File that one away.

03: Verify, don't infer

Analytics stack should have been the boring one. It wasn't, for two reasons that are really one reason.

First: before replaying the Postgres dump, the crew verified what format the dump actually was, instead of inferring it from the filename and vibes. Second: the new host had an orphaned data directory left over from an earlier attempt. The reflex move is to delete the clutter. The disciplined move is the one we made: shift it aside, intact, reversible. Then the replay went through with exactly the collisions you'd predict and fourteen tables standing.

Neither of those is clever. That's the point. Every assumption you verify is a landmine you stepped over instead of on. And reversible beats clean, every single time someone's data is involved.

04: The key you do not regenerate

Self-hosted remote desktop was the last stack across, and it carried the quietest landmine of the lot. The server's identity is a key pair, and every client ever enrolled trusts that key. Regenerate it during your nice clean migration and you haven't refreshed anything... you've broken trust with every machine in the fleet. And they don't fail loud. They just quietly stop believing you.

So before anything moved, the crew checksummed the key pair on both ends. Byte-for-byte match confirmed, keys carried over untouched, zero clients orphaned. And while we were in there, the container image got pinned from :latest to an exact digest, because :latest on a remote-access tool is a promise somebody else gets to break at 3am.

05: The red that lied

Mid-migration, a handoff note landed claiming newsletter growth was dead. Flatlined. The kind of status that ruins a week if you let it.

We didn't let it. The crew queried the members table directly: ten signups since late March, steady. The scary-looking bounces were recipient-side junk, not lost readers. The transactional mail path proved out clean, ten sends, ten deliveries.

Last article I made the case that green is the most dangerous output a system can produce. Turns out red lies too. Status is a claim in either color.

Where the ship actually is

The tally: five stacks rebuilt on the new hull, the site, the analytics, the security stack, remote access, the works. Call it twenty-seven containers plus the crew's memory, humming on hardware somebody else gave up on. The old box is still carrying production, untouched, exactly as the ground rule demanded.

And here's where most write-ups would type "migration complete" and cash the applause.

It isn't complete. Cutover hasn't happened. There's a short list of gates left, and they exist for one reason: the new hull doesn't take a single packet of production traffic until it's watched, audited, and proven as hard as the old one ever was. The dashboard on the new host is green right now. I could call it done.

But you know how I feel about green.

So this article doesn't get to claim the migration is finished, because it isn't, and I'd rather publish the truth mid-operation than a victory lap on credit. When cutover lands, you'll hear about it, in past tense, with receipts.

The hull is rebuilt. The ship hasn't noticed. That was the whole point... and the job isn't done until the old center of the world is just a box again.


Originally published at mpdc.dev. I run a self-hosted homelab out of a 40ft fifth wheel and write up what breaks. No mainstream platforms, no tracking, your data never leaves my stack.

Top comments (0)