Why your VPS might be part of a botnet — and how to find out

#linux #security #devops #sysadmin

Last week I got a 3am email from my hosting provider. Subject: "Abuse report — your IP is participating in a DDoS." My first reaction was disbelief. My second was opening a laptop in bed and SSH'ing in like a chump.

This happens more often than people admit. A server you set up six months ago, forgot about, and never patched becomes someone else's attack tool. Recent law enforcement actions against bulletproof hosting operations have made one thing clear: a lot of the compromised infrastructure used in attacks is just neglected developer boxes. Old WordPress installs, exposed Redis, weak SSH passwords.

Let's walk through how to actually figure out if your box is one of them, and what to do about it.

The symptoms are quieter than you'd expect

The Hollywood version of a compromised server is CPU pegged at 100% and obvious malware. Real life is duller. Modern attack toolkits are tuned to stay under the radar so the operator can keep using the box.

What you'll actually see:

Slightly elevated baseline CPU (5-15% from nothing)
Unexplained outbound traffic to weird ports
New cron entries you didn't write
Outbound SSH connection attempts spiking from your own server (it's scanning for the next victim)
Mysterious processes with names like kdevtmpfsi, xmrig, or randomly-generated 6-char names

Step 1: Look at outbound connections first

Ingress filtering is easy. Egress is where most teams have nothing in place, and it's exactly where a compromised box gives itself away.

Start with ss — it's faster than netstat and ships with iproute2 on most modern distros:

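# Show established TCP connections (inbound and outbound) with the owning process.
# Run as root, otherwise -p only shows your own processes.
ss -tnpo state established

# Group by remote IP to spot fan-out patterns.
# With a state filter, ss drops the State column, so the peer address is field 4.
ss -Htn state established | awk '{sub(/:[0-9]+$/, "", $4); print $4}' | sort | uniq -c | sort -rn | head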

If you see hundreds of connections to a single IP, or a wide spray to random IPs on port 22, 23, 80, or 445, that's scanner behavior. Legitimate apps usually talk to a small set of known endpoints.
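
The "wide spray" pattern is easier to spot if you count distinct remote IPs per remote port. Here's a rough sketch, not a polished tool: the function name fanout_by_port and the default threshold of 20 are my own illustrative choices, and it assumes the input is `ss -Htn state established` output, where the peer address is the fourth column.

```shell
# fanout_by_port: read `ss -Htn state established` output on stdin and
# print "port distinct_ip_count" for remote ports seen on at least MIN hosts.
# MIN defaults to 20, an arbitrary illustrative threshold.
fanout_by_port() {
  min="${1:-20}"
  awk -v min="$min" '
    {
      peer = $4                      # peer address:port (no State column when filtered)
      port = peer
      sub(/^.*:/, "", port)          # keep what follows the last colon
      ip = peer
      sub(/:[0-9]+$/, "", ip)        # strip the trailing :port (IPv6-safe)
      if (!((port, ip) in seen)) {
        seen[port, ip] = 1
        count[port]++
      }
    }
    END {
      for (p in count)
        if (count[p] >= min) print p, count[p]
    }
  ' | sort -k2,2nr
}

# Usage (as root on the suspect box):
#   ss -Htn state established | fanout_by_port 20
```

Inbound clients connect from ephemeral ports, so they mostly land in buckets of one and fall under the threshold. A remote port 22 or 445 with dozens of distinct IPs behind it is the thing to worry about.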

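Next, pull the live process tree. I like pstree because it shows parentage: a lot of malware spawns from cron or from a web server worker, and the parent process is the giveaway. The flags below add PIDs (-p), command-line arguments (-a), numeric sorting (-n), and user transitions (-u):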

pstree -panu

Look for processes whose parent is cron, sh, or your web server but whose command line is something opaque like a long base64 string or a binary in /tmp or /dev/shm.
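
Eyeballing a process tree gets old fast, so here's a minimal sketch that asks /proc directly which processes are running from a temp dir or from a binary that was deleted after launch. The function names are hypothetical, and the path list is just the dropzones mentioned above.

```shell
# is_dropzone_path: succeed if an executable path looks suspicious, i.e. it
# lives in a world-writable temp dir or the file was deleted after launch.
is_dropzone_path() {
  case "$1" in
    /tmp/*|/var/tmp/*|/dev/shm/*|*" (deleted)") return 0 ;;
    *) return 1 ;;
  esac
}

# list_dropzone_procs: walk /proc and print "pid exe" for matching processes.
# Needs root to read /proc/<pid>/exe for other users' processes.
list_dropzone_procs() {
  for exe_link in /proc/[0-9]*/exe; do
    # Kernel threads and vanished processes have no readable exe link
    exe="$(readlink "$exe_link" 2>/dev/null)" || continue
    if is_dropzone_path "$exe"; then
      pid="${exe_link#/proc/}"
      printf '%s %s\n' "${pid%/exe}" "$exe"
    fi
  done
}

# Usage (as root): list_dropzone_procs
```

One caveat: a legitimate daemon that hasn't been restarted since a package upgrade also shows up as "(deleted)", so treat the output as a list of things to look at, not a verdict.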

Step 2: Check the usual hiding spots

Attackers are creatures of habit. Here's the rapid sweep I run:

# World-writable temp dirs are the #1 dropzone
ls -lat /tmp /var/tmp /dev/shm 2>/dev/null | head -30

# Recently modified files in system bin dirs
find /usr/bin /usr/sbin /usr/local/bin -mtime -30 -ls 2>/dev/null

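# Cron entries for every user (not just root), plus the system-wide cron dirs
for u in $(cut -d: -f1 /etc/passwd); do
  echo "--- $u ---"
  crontab -u "$u" -l 2>/dev/null
done
ls -la /etc/crontab /etc/cron.d /etc/cron.hourly /etc/cron.daily 2>/dev/null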

# Systemd timers and services added recently
systemctl list-timers --all
find /etc/systemd/system /lib/systemd/system -mtime -60 -ls

The /dev/shm check catches a lot of cryptominers. They drop the binary into shared memory because it's a tmpfs: the file lives in RAM, never touches disk, and leaves less forensic evidence behind.

If you find a suspicious binary, before you delete it, get a hash and check it against VirusTotal:

sha256sum /tmp/.suspicious_binary
# Then search for the hash on virustotal.com (don't upload the file itself)
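
If there's more than one candidate, hashing them in one pass saves time. This is a sketch under a couple of assumptions: the function name is made up, it only looks at files with an execute bit set, and `-perm /111` is GNU find syntax.

```shell
# hash_dropzone_files: print SHA-256 hashes of regular files that have any
# execute bit set under the given directories (defaults to the usual dropzones).
# -xdev keeps find on one filesystem; -perm /111 is GNU find syntax.
hash_dropzone_files() {
  if [ "$#" -eq 0 ]; then
    set -- /tmp /var/tmp /dev/shm
  fi
  find "$@" -xdev -type f -perm /111 -exec sha256sum {} + 2>/dev/null
}

# Usage: hash_dropzone_files
#        hash_dropzone_files /var/www   # any other directory you distrust
```

Save the output somewhere off the box before you start deleting things. The hashes are useful later when you answer the abuse report.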

Step 3: Trace the entry point with auditd

This is the part most tutorials skip, and it's the most important. Cleaning the malware without knowing how it got in means you'll be doing this exact same dance next week.

Install auditd if you don't already have it, and set up rules to watch the obvious vectors:

# Watch execve syscalls — every program execution
auditctl -a always,exit -F arch=b64 -S execve -k exec_trace

# Watch writes to common malware dropzones
auditctl -w /tmp -p wa -k tmp_writes
auditctl -w /dev/shm -p wa -k shm_writes

# Watch SSH key file changes (very common persistence trick)
auditctl -w /root/.ssh/authorized_keys -p wa -k ssh_keys
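
Rules loaded with auditctl are gone after a reboot. A minimal sketch for persisting the same four rules follows. The function name and the dropzones.rules file name are my own picks, and it assumes your distro uses the standard /etc/audit/rules.d directory with augenrules.

```shell
# write_audit_rules: write the watch rules to a rules file so they are
# reloaded at boot. The default path and file name are assumptions.
write_audit_rules() {
  rules_file="${1:-/etc/audit/rules.d/dropzones.rules}"
  cat > "$rules_file" <<'EOF'
-a always,exit -F arch=b64 -S execve -k exec_trace
-w /tmp -p wa -k tmp_writes
-w /dev/shm -p wa -k shm_writes
-w /root/.ssh/authorized_keys -p wa -k ssh_keys
EOF
}

# Usage (as root):
#   write_audit_rules && augenrules --load
```

Logging every execve is noisy on a busy box, so keep an eye on the audit log size after you turn this on.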

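These rules only record events from the moment you load them, so if auditd wasn't already running, let them collect for a while first. Then search the logs for evidence:

# Look for shell execution chains from web server users.
# -i resolves numeric IDs to names; uid 33 is www-data on Debian/Ubuntu.
# "recent" means the last 10 minutes; use --start today for a wider window.
ausearch -i -k exec_trace --start recent | grep -E ' uid=(33|www-data|nginx|apache) '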

In my 3am incident, this is what cracked it — a PHP file in an abandoned WordPress install was being POSTed to, spawning /bin/sh, which pulled down a payload via curl. Classic webshell chain. The site hadn't been touched in two years.
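
The web server's access log tells the same story from the other side. Here's a rough sketch that counts POST requests per PHP file. It assumes the common "combined" log format (method in field 6, path in field 7), and both the function name and the log path in the usage line are placeholders.

```shell
# post_php_hits: read combined-format access log lines on stdin and print
# "count path" for POST requests to .php paths, busiest first.
post_php_hits() {
  awk '
    $6 == "\"POST" {
      path = $7
      sub(/\?.*$/, "", path)         # drop the query string
      if (path ~ /\.php$/) hits[path]++
    }
    END { for (p in hits) print hits[p], p }
  ' | sort -rn
}

# Usage (log path is an assumption, adjust for your web server):
#   post_php_hits < /var/log/nginx/access.log
```

A PHP file you don't recognize, buried in an uploads or plugin directory and receiving steady POSTs, is a strong webshell candidate.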

Step 4: Fix it properly

Don't just kill -9 and move on. The malware will respawn from whatever persistence hook it installed. Order of operations:

Snapshot first. If your provider supports it, take a disk snapshot before you change anything. You may need it for forensics or to satisfy an abuse report.
Cut network egress before killing processes. A common mistake is killing the miner first, which triggers a watchdog reinstall. Block outbound first:

# Default-deny outbound traffic. Replies on already-established connections
# (your SSH session, but also anything the malware already has open) still pass.
nft add table inet filter
nft add chain inet filter output { type filter hook output priority 0 \; policy drop \; }
nft add rule inet filter output ct state established,related accept
nft add rule inet filter output oifname lo accept
# Add your specific allowlist rules here

Kill the processes and remove persistence. Cron, systemd units, ~/.bashrc lines, /etc/ld.so.preload, modified SSH authorized_keys. Check all of them.
Rotate every credential that touched the box. SSH keys, API tokens in env files, database creds, cloud provider credentials. Assume they're all in someone's wallet now.
Reinstall, don't clean. I know. It's annoying. But a rootkit you didn't find will outlive your cleanup. If the box held anything sensitive, the right move is destroy and rebuild from a known-good config.
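
For the persistence sweep, it helps to have the locations in one place. The sketch below lists the files from the step above that exist and are non-empty. The function name is hypothetical, the path list is not exhaustive, and the optional ROOT argument is my own addition so you can point it at a mounted copy of the snapshot instead of the live system.

```shell
# list_persistence_hooks: print common persistence locations that exist and
# are non-empty under ROOT (default "/"). Pointing ROOT at a mounted
# snapshot lets you inspect the disk without trusting the running system.
list_persistence_hooks() {
  root="${1:-/}"
  root="${root%/}"                   # avoid double slashes in output
  for f in \
    "$root/etc/ld.so.preload" \
    "$root/etc/crontab" \
    "$root"/etc/cron.d/* \
    "$root"/var/spool/cron/* \
    "$root"/var/spool/cron/crontabs/* \
    "$root"/etc/systemd/system/*.service \
    "$root"/etc/systemd/system/*.timer \
    "$root"/root/.ssh/authorized_keys \
    "$root"/home/*/.ssh/authorized_keys \
    "$root"/root/.bashrc \
    "$root"/home/*/.bashrc
  do
    # Unmatched globs stay literal and fail these checks; directories fail -f
    [ -f "$f" ] && [ -s "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Usage: list_persistence_hooks              # live system
#        list_persistence_hooks /mnt/snap    # mounted snapshot (example path)
```

Most of what it prints will be legitimate, so this is a reading list and not a detector. The exception is /etc/ld.so.preload: on a typical server that file doesn't exist at all, so a non-empty one deserves immediate attention.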

Prevention: stop being an easy target

Most compromises I've cleaned up came from a small handful of root causes:

Unattended upgrades disabled. Turn them on: unattended-upgrades on Debian/Ubuntu, dnf-automatic on RHEL-likes. Yes, an automatic update can occasionally break something. Running months-old kernels is a much bigger risk.
SSH password auth enabled. Set PasswordAuthentication no and use keys. Add fail2ban for the noise.
No egress filtering. Default-deny outbound is annoying to set up but it's the single thing that would have prevented every botnet enrollment I've seen. Even a basic rule blocking outbound to common scan ports (22, 23, 445, 3389) from anywhere but a specific allowlist would catch most of them.
Forgotten services. That staging box. That old CMS. That Redis you exposed for "five minutes" to test something. Run nmap against your own external IPs once a quarter. You will be surprised.
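
For the SSH item, don't trust the config file by eye. Include files and Match blocks can override what you think you set, and `sshd -T` prints the effective configuration with lowercase keywords. Here's a minimal sketch that filters that output; the function name is made up, and the PermitRootLogin check is an extra I'd suggest, not something from the list above.

```shell
# check_sshd_config: read `sshd -T` output on stdin and print a warning for
# each setting that still allows password logins. Prints nothing when clean.
check_sshd_config() {
  awk '
    $1 == "passwordauthentication" && $2 == "yes" { print "WARN: " $0 }
    $1 == "permitrootlogin" && $2 == "yes"        { print "WARN: " $0 }
  '
}

# Usage (as root): sshd -T | check_sshd_config
```

Drop it into the new-box checklist and run it again after every config change. Empty output is what you want.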

The boring stuff genuinely works. I've stopped chasing exotic hardening guides and just kept a checklist of these basics for every new box. The 3am abuse emails have gone to zero.