
ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

Postmortem: A Ransomware Attack on Our On-Prem Data Center Taught Us to Migrate to AWS


At 03:14 UTC on October 17, 2023, our on-prem data center’s SIEM raised a critical alert: 94% of our 12,000 production servers had been encrypted by a .conti ransomware variant, and the attackers were demanding $4.2M in Bitcoin to restore access. We lost 72 hours of production data, and our mean time to recovery (MTTR) hit 11 days. That disaster forced a full, benchmarked migration to AWS over the next 6 months.


Key Insights


* Post-migration p99 API latency dropped from 2.4s to 89ms across 14 global regions
* We standardized on AWS CDK v2.91.0 and Terraform v1.6.3 for all infrastructure-as-code
* Monthly infrastructure spend decreased 37% ($28k → $17.6k) despite 2x traffic growth
* By 2026, 82% of enterprises hit by on-prem ransomware will migrate fully to public cloud, per Gartner 2023 I&O survey

Failed Backup Verification Script: The Root Cause of Data Loss


Our first critical failure was a path traversal flaw in our on-prem backup verification script. The attackers compromised a legacy server running an unpatched Apache Struts instance, then moved laterally to our backup NAS by exploiting symlinks that this script never validated. Below is the flawed pre-migration code that allowed 94% of our backups to be encrypted:


```python
#!/usr/bin/env python3
"""
Pre-migration on-prem backup verification script
FLAW: Did not validate backup target paths, allowing ransomware to encrypt offsite NAS
"""

import os
import hashlib
import logging
from datetime import datetime, timedelta
from typing import List, Dict, Optional

# Configure logging for audit trails
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.FileHandler("/var/log/backup_verify.log")]
)
logger = logging.getLogger(__name__)

BACKUP_ROOT = "/mnt/offsite-nas/prod-backups"
RETENTION_DAYS = 30
MAX_BACKUPS_PER_JOB = 7


def calculate_md5(file_path: str, chunk_size: int = 4096) -> Optional[str]:
    """Calculate MD5 hash of a file with error handling for corrupted backups"""
    md5 = hashlib.md5()
    try:
        with open(file_path, "rb") as f:
            while chunk := f.read(chunk_size):
                md5.update(chunk)
        return md5.hexdigest()
    except FileNotFoundError:
        logger.error(f"Backup file not found: {file_path}")
        return None
    except PermissionError:
        logger.error(f"Permission denied accessing: {file_path}")
        return None
    except IOError as e:
        logger.error(f"IO error reading {file_path}: {str(e)}")
        return None


def get_backup_jobs() -> List[str]:
    """List all backup jobs in the offsite NAS root"""
    try:
        return [d for d in os.listdir(BACKUP_ROOT) if os.path.isdir(os.path.join(BACKUP_ROOT, d))]
    except FileNotFoundError:
        logger.critical(f"Backup root {
```
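For contrast, here is a minimal sketch of the kind of check the script above lacked: resolve every path (following symlinks) before touching it, and refuse anything that lands outside the backup root. This is an illustration under stated assumptions, not the code we ran in production. The helper names `is_within_root` and `list_safe_backup_jobs` are hypothetical and do not appear in the original script.

```python
import os
from typing import List


def is_within_root(candidate: str, root: str) -> bool:
    """Return True only if candidate, with symlinks resolved, lies inside root."""
    real_root = os.path.realpath(root)
    real_candidate = os.path.realpath(candidate)
    # commonpath compares whole path components, so a sibling such as
    # /mnt/nas-evil is not mistaken for a child of /mnt/nas.
    return os.path.commonpath([real_root, real_candidate]) == real_root


def list_safe_backup_jobs(root: str) -> List[str]:
    """List job directories under root, skipping entries that resolve outside it."""
    jobs = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # os.path.isdir follows symlinks, so the containment check is what
        # actually rejects a link pointing at a directory elsewhere.
        if os.path.isdir(path) and is_within_root(path, root):
            jobs.append(name)
    return jobs
```

A caller would pass the backup root explicitly, for example `list_safe_backup_jobs(BACKUP_ROOT)`, and log any skipped entry as a potential tampering signal. Resolving paths first and comparing with `os.path.commonpath` is one reasonable design; a plain string-prefix comparison would be weaker because it accepts sibling directories that merely share a prefix.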