Somewhere and somehow, there are a handful of sysadmins who have never completely broken a VPS. They might even manage to maintain, update, and optimize their VPSs on a regular basis. They keep them going indefinitely. These people are coveted by industry, make bank, and generally keep everything we love about the internet going.
I am not that sysadmin. There's a good reason the SSD Nodes engineers don't let me near any of the important buttons. Or any of the buttons for that matter.
A good sysadmin does not break every VPS they touch. So, a short "survival guide" for terrible* sysadmins like me. How can we learn from our mistakes? How can we implement bad sysadmin-friendly tools to halt our bad habits? How can we rid ourselves of this curse?
I've used some s****y passwords
You step away for just a moment---maybe you even ask a stranger to hold your spot for you---but when you return, someone has invaded your turf.
Not a great feeling.
Once, I accidentally deployed a new VPS, using an older variant of my standard Ansible playbook, with `password` as the password. I hadn't noticed because the password is hashed inside of the playbook. I logged in via password (not SSH key + passphrase), installed Docker, and moved on. The next time I logged in, something felt off. I ran `docker ps -a` and found a cryptocurrency miner running on my VPS.
The only natural response was to immediately terminate my connection, log into the administrative area, and reinstall the operating system. In no way am I qualified to mitigate the damage, cut out the intruder, and protect the system from being attacked again. Even though the VPS wasn't in use yet, I still burned time and once again showcased my sysadmin idiocy.
How can you prevent this?
- Use SSH keys and passphrases, instead of just passwords, while also disabling password-based SSH logins.
- Pair that passphrase with a manager like Bitwarden to keep you from having to remember it.
- Or, choose an SSH passphrase and user passwords that are both complex but easy enough to remember.
- Mostly, don't choose `password` or anything you'd find on one of those most commonly used passwords lists.
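For the "disable password-based SSH logins" part, here's a minimal sketch, assuming a stock OpenSSH `sshd_config`. The function name `disable_password_auth` is my own invention, not a standard tool, and it takes the file path as an argument so you can try it on a copy before pointing it at the real thing.

```shell
#!/bin/sh
# Hypothetical helper (not a standard tool): force "PasswordAuthentication no"
# in an sshd_config-style file. Pass the path so you can test on a copy first.
disable_password_auth() {
    config="$1"
    # Keep the original around in case the edit goes sideways.
    cp "$config" "$config.bak" || return 1
    {
        # sshd uses the first value it sees for a keyword, so put ours on top...
        printf 'PasswordAuthentication no\n'
        # ...and drop any older or commented-out PasswordAuthentication lines.
        grep -v -i -E '^[[:space:]]*#?[[:space:]]*PasswordAuthentication[[:space:]=]' "$config.bak"
    } > "$config"
    return 0
}
```

Before running this against `/etc/ssh/sshd_config`, create a key with `ssh-keygen -t ed25519`, copy it over with `ssh-copy-id`, and confirm that key-based login works from a second terminal. Afterwards, `sshd -t` checks the file for syntax errors before you reload the SSH service. Keep your current session open until a fresh login succeeds.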
I've locked myself out via iptables
You type in a seemingly harmless `iptables` rule and find yourself unable to do anything else. You kill the session, maybe close down the terminal itself, and try again. No dice. You've just locked yourself out, one of the classic beginner sysadmin mistakes.
Given that most firewall setups are done very early in a VPS' life, you shouldn't have lost out on too much time at this point. Still, the only real solution is to reinstall the OS and try again.
And if you're like me, you've locked yourself out, reinstalled, and promptly locked yourself out again. Time for some alternatives.
How to stop losing the keys
- Use a tool like `iptables-apply`, which forces you to confirm that the rules work. If you don't confirm (because you're locked out), they revert.
- Set a "failsafe" on a timer. The `at` command is great for this. Something simple, like `echo 'service iptables stop' | at now + 1min`, will stop the `iptables` service after a minute. If you locked yourself out, grab a cup of coffee, log back in, and try again.
- Check with your VPS provider if they offer an out-of-band console for lock-out situations. They can be a saving grace in a desperate case like this.
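The timed failsafe can also be sketched as a pair of small shell functions, for boxes where `at` isn't installed. This is only a sketch: `schedule_rollback` and `cancel_rollback` are names I made up, and the `/root/iptables.known-good` path in the usage comments is an assumption too. The idea is the same as above: schedule the undo first, make the risky change second, and cancel the undo only once you've proven you can still get in.

```shell
#!/bin/sh
# Hypothetical helpers (the names are mine): schedule the undo first,
# then make the risky change.

# schedule_rollback DELAY_SECONDS COMMAND [ARGS...]
# Waits DELAY_SECONDS in the background, then runs COMMAND.
# Prints the PID of the waiting job so it can be cancelled later.
schedule_rollback() {
    delay="$1"
    shift
    # nohup keeps the timer alive even if the SSH session drops,
    # which is exactly the situation the failsafe is for.
    nohup sh -c 'sleep "$1"; shift; exec "$@"' sh "$delay" "$@" \
        </dev/null >/dev/null 2>&1 &
    echo "$!"
}

# cancel_rollback PID
# Call this once you've confirmed you can still log in.
cancel_rollback() {
    kill "$1" 2>/dev/null
}

# Usage sketch (run as root; /root/iptables.known-good is a made-up path):
#   iptables-save > /root/iptables.known-good
#   pid=$(schedule_rollback 120 sh -c 'iptables-restore < /root/iptables.known-good')
#   ...add the new rule, then open a second SSH session to check you can get in...
#   cancel_rollback "$pid"
```

Restoring a saved ruleset, rather than stopping the firewall outright, means a failed experiment puts you back where you started instead of leaving the VPS wide open.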
I've made critically dumb mistakes multiple times in a row
As in the many `iptables` missteps.
As in trying to connect PHP to my Nginx web server.
As in blindly trying to perform a major upgrade without thinking about the potential consequences.
As in trying to make SSH more secure, only to accidentally make it secured from myself.
As in trying to install just one more service on top of a dozen others I've finally managed to cobble together.
These are the worst ways to kill a VPS. You've already put in some real time, had a series of successful installs/deployments, and then hit yourself with that stomach-dropping feeling yet again. Unfortunately, there are hundreds of ways to end up here, and only one way out: reinstall.
How I've fixed this
- Write documentation for your personally complex processes. Installation procedures, quirky configurations, gaps in your expertise: having a written walkthrough in your own words goes a long way.
- Use infrastructure-as-code, like Ansible, to standardize how you work. At least you'll know the steps you took to get into your current hole.
- Always insert timed failsafes, if possible.
- Back up configuration files before you get all touchy-feely.
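Backing up a config file before touching it takes only a few lines of shell. Here's a rough sketch; the function name `backup_config` and the timestamped `.bak` naming scheme are my own assumptions, not any kind of standard.

```shell
#!/bin/sh
# Hypothetical helper: copy FILE to FILE.YYYYMMDD-HHMMSS.bak before editing it.
# Prints the path of the backup on success.
backup_config() {
    src="$1"
    # Refuse to "back up" something that isn't there.
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dest="$src.$(date +%Y%m%d-%H%M%S).bak"
    # -p keeps the original mode and timestamps on the copy.
    cp -p "$src" "$dest" && echo "$dest"
}
```

Something like `backup_config /etc/nginx/nginx.conf` right before opening the file in an editor means the way back is a single `cp` instead of a reinstall.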
What works locally breaks remotely
Years ago, I built an awful Node.js web app for subscribing and listening to podcasts. It's quite embarrassing to reminisce over, but I honestly felt I was on the leading edge of the podcast revolution. Now, a few years and one PocketCasts acquisition later, I wish I had stuck with it.
But that's a whole different regret.
I had the perfect setup on my local machine: the precise working versions of ExpressJS and other dependencies, `npm` with the correct permissions, a MongoDB database without any unnecessary cruft.
The time finally came to spin up a VPS and deploy the web app. I installed all the dependencies, crossed my fingers, and typed in `node app.js &`. I was met with enough error verbiage to last me three `Page Up` hits.
Cobbling together the environment was so complicated that I even added the following note to the GitHub repository:
The challenge is figuring out how to get it running, because I don't particularly feel like writing up the installation step-by-step. Have fun hacking!
In the end, with a lot of headaches, I got the web app running and acquired myself roughly a half-dozen users. One of them was my sister, if that says anything about how successful the whole venture was.
Ways to sync up local and remote
- Use Docker or something similar, like LXC or even Kubernetes. These tools will help you launch consistent environments both near and far.
- Rely more on static-file deployments, like Hugo over WordPress. Reduce your reliance on dependencies and build tools like `gulp`.
- Use Ansible, as I suggested before.
- Use some `CI/CD` system. I'm still not advanced enough for these though, so take that with a grain of salt.
And now I put way too much faith in Docker
Docker has eased my development and deployment processes so much that I now rely on it for everything, such as self-hosting other open source web apps on my VPS.
I've forgotten how to deploy a LEMP stack on my own. I don't understand the process of running multiple services on a single VPS any other way. Configuring an Nginx reverse proxy on my own? No way.
By easing certain inadequacies of ours, we reveal others.
How to break the habit
I have no idea. Honestly.
In the end, I assume the worst
The gold star sysadmins might already do all this, but for people like me, there's still a lot to learn. In reality, there are still quite a few VPSs that will meet untimely but accidental ends.
As long as we bad sysadmins continue our VPS-breaking ways with a consistent desire to learn from our mistakes, we'll continue to make progress. Failure might be our only way to progress.
It's not magic. It's just a blinking cursor on a distant server. And, despite what you might think, your VPS doesn't mind if you have to start from scratch. Again.
Heyo! Many thanks to Nikita Sobolev's "I am a mediocre developer" post for the inspiration. This was originally posted at the Serverwise blog.
Top comments (1)
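Some other things for your list:
- [...] (`pageant.exe`, `ssh-agent`, `gpg-agent`, etc.) make it so that even with a ridiculously complex password, you only need to enter it once every few hours.
- [...] `iptables` or by way of third-party tools.
- Whenever you're considering adding new rules (you still use `iptables` directly, rather than via `firewalld`?), only add them to the running configuration, not the on-disk configuration. If you lock yourself out, instead of having to rebuild (even if your VPS provider doesn't offer remote there may be alternatives to rebuilding), all you need to do is reboot to make the offending rule go away. Once you've validated that your new rule does what you need it to do, then save the config to disk.
You haven't lived until you've locked yourself out of a co-located physical system and had to make an hour-long drive to fix a bad firewall rule. It tends to pretty firmly instill habits.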
?), only add them to the running configuration, not the on-disk configuration. If you lock yourself out, instead of having to rebuild (even if your VPS provider doesn't offer remote there may be alternatives to rebuilding), all you need to do is reboot to make the offending rule go away. Once you've validated that your new rule does what you need it to do, then save the config to disk.You haven't lived until you've locked yourself out of a co-located physical system and had to make an hour-long drive to fix a bad firewall rule. It tends to pretty firmly instill habits.