DEV Community

DevCorner2

🛡️ Modern Database Backup Strategies: Techniques, Trade-offs & Real-World Usage in Top Tech Companies

In an era where data is currency, resilient, fast, and reliable backup strategies are essential. Whether it's a fintech processing a high volume of transactions or a SaaS company syncing customer configs, database backups are the last line of defense against data loss, corruption, and disasters.

This post walks through the backup strategies employed by top-tier companies (Google, Netflix, Amazon, Meta, etc.) and helps you decide which one to use when, with a focus on trade-offs, performance impact, and restoration goals.


🧰 1. Full Backups

Definition: A complete copy of the entire database at a specific point in time.
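As a minimal sketch of the idea, the snippet below takes a full backup of a SQLite database with Python's standard-library `sqlite3` module. SQLite only stands in for a production engine here, and the `full_backup` name and the timestamped file naming are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def full_backup(db_path: str, backup_dir: str) -> Path:
    """Copy the entire database into a new timestamped file and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_dir) / f"full-{stamp}.sqlite3"
    target.parent.mkdir(parents=True, exist_ok=True)

    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(str(target))
    try:
        # Connection.backup copies every page of the source database,
        # so the result can be restored on its own, with no other files.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target
```

Because the output file is self-contained, restoring is just a matter of putting that file back in place, which is what makes full backups the simplest to restore.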

✅ When to Use:

  • Small to medium-sized databases
  • Initial base for incremental/differential backups
  • Nightly backups when storage & time are not a concern

⚖️ Trade-offs:

| Pros | Cons |
| --- | --- |
| Simplest to restore | High storage cost |
| No dependency on other files | Slower backup time |
| Ideal for compliance | Not feasible for very large datasets |

🏢 Real-World:

  • Stripe: Performs full encrypted backups for compliance and auditing needs.
  • Meta: Stores full snapshots in cold storage for long-term retention.

🔁 2. Incremental Backups

Definition: Only the changes made since the last backup (full or incremental) are saved.
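To make "only the changes since the last backup" concrete, here is a rough file-level sketch. It assumes the data lives in ordinary files and that a manifest of content hashes from the previous run is available. The `incremental_backup` function and the manifest format are hypothetical, and real database tools track changes at the page or log level instead.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def incremental_backup(data_dir: Path, dest_dir: Path, previous: dict[str, str]) -> dict[str, str]:
    """Copy only files that are new or changed since `previous`.

    `previous` maps relative paths to digests as of the last backup
    (full or incremental). Returns the updated manifest for the next run.
    """
    manifest: dict[str, str] = {}
    for path in sorted(data_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(data_dir).as_posix()
        digest = file_digest(path)
        manifest[rel] = digest
        if previous.get(rel) != digest:
            # New or modified since the last backup: include it in this increment.
            target = dest_dir / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    return manifest
```

Passing an empty manifest copies everything, which behaves like the initial full backup. This sketch does not record deletions, so a real implementation would also need tombstones for removed files.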

✅ When to Use:

  • Large datasets with low write frequency
  • To reduce backup window and storage
  • Systems with a tight RPO (Recovery Point Objective), since cheap backups can run frequently; note that chained restores work against a tight RTO (Recovery Time Objective)

⚖️ Trade-offs:

| Pros | Cons |
| --- | --- |
| Saves bandwidth and storage | Restore requires chaining backups |
| Fast backups | Slower and more complex restores |
| Efficient for frequently updated data | A corrupt backup breaks every later link in the chain |

🏢 Real-World:

  • Netflix: Leverages incremental S3-based backups for Cassandra.
  • Google Cloud Spanner: Supports incremental export with minimal lock time.

🧱 3. Differential Backups

Definition: Backs up all changes made since the last full backup (not the last backup).
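The practical difference from incremental backups shows up at restore time. The sketch below picks which backups a restore needs under each scheme. The `Backup` record and `restore_chain` function are hypothetical, and the sketch assumes a schedule uses only one kind of backup after each full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    name: str
    kind: str  # "full", "incremental", or "differential"


def restore_chain(history: list[Backup]) -> list[Backup]:
    """Return the backups needed to restore to the newest backup in `history`.

    `history` is ordered oldest to newest and must contain a full backup.
    """
    last_full = max(i for i, b in enumerate(history) if b.kind == "full")
    full = history[last_full]
    later = history[last_full + 1:]
    if all(b.kind == "differential" for b in later):
        # Each differential holds every change since the full backup,
        # so only the newest one is needed.
        return [full] + later[-1:]
    if all(b.kind == "incremental" for b in later):
        # Each incremental holds changes since the previous backup,
        # so every link in the chain is needed, in order.
        return [full] + later
    raise ValueError("mixed schedules are out of scope for this sketch")
```

With a Sunday full backup and three daily backups, the differential restore touches two files while the incremental restore touches four, which is why differential restores tend to be faster and less fragile.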

✅ When to Use:

  • You want a balance between full and incremental
  • Systems where restore speed matters more than backup time

⚖️ Trade-offs:

| Pros | Cons |
| --- | --- |
| Faster restore than incremental | Grows in size over time |
| Safer than incremental | Less efficient than incremental |
| Simple to manage | Still requires full base backup |

🏢 Real-World:

  • Microsoft Azure SQL Database: Combines weekly full backups with differential backups every 12 or 24 hours, plus frequent transaction log backups.

🧪 4. Logical Backups (SQL Dumps)

Definition: Backs up database schema and data in a human-readable format (e.g., SQL dump).
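For a feel of what a logical backup looks like, this small sketch dumps a SQLite database to SQL text and rebuilds it from that text, using the standard-library `sqlite3` module. SQLite is only a stand-in for tools like `mysqldump` or `pg_dump`, and the helper names are assumptions.

```python
import sqlite3


def logical_dump(conn: sqlite3.Connection) -> str:
    """Serialize schema and data as SQL statements (a logical backup)."""
    return "\n".join(conn.iterdump())


def logical_restore(sql_text: str) -> sqlite3.Connection:
    """Rebuild a fresh in-memory database by replaying the dumped SQL."""
    restored = sqlite3.connect(":memory:")
    restored.executescript(sql_text)
    return restored
```

The dump is plain text made of `CREATE TABLE` and `INSERT` statements, so it can be diffed or committed to Git. The restore has to re-execute every statement, which is where the slowness on large datasets comes from.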

✅ When to Use:

  • Migrating between DBMS types (MySQL → PostgreSQL)
  • Smaller databases where portability matters
  • Versioning schema with Git

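⚖️ Trade-offs:

| Pros | Cons |
| --- | --- |
| DB-agnostic & portable | Slower and more resource intensive |
| Easy to diff and version | Not suited for big datasets |
| Useful for partial restores | Can be inconsistent under concurrent writes unless dumped in a single transaction (e.g., mysqldump --single-transaction) |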

🏢 Real-World:

  • GitLab: Uses logical backups to mirror production data for staging.
  • Startups & Open Source Projects: Often default to logical due to simplicity.

💾 5. Physical Backups (Binary Snapshots)

Definition: File-level copies of the database's data files, often using filesystem or volume-level snapshot tools (e.g., LVM, ZFS, EBS snapshots).
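The sketch below illustrates the "quiesce, then copy the files" pattern on a single SQLite file. It assumes SQLite's default rollback-journal mode (in WAL mode the `-wal` file would also matter), and `physical_copy` is a hypothetical helper. Volume-level tools such as LVM, ZFS, or EBS work on whole disks rather than one file.

```python
import shutil
import sqlite3
from pathlib import Path


def physical_copy(db_path: Path, dest: Path) -> Path:
    """Copy the raw database file while writers are held off."""
    # isolation_level=None lets us issue BEGIN/ROLLBACK ourselves.
    conn = sqlite3.connect(str(db_path), isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes SQLite's write lock, so no other connection
        # can modify the file until we release it (the "quiesce" step).
        conn.execute("BEGIN IMMEDIATE")
        try:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(db_path, dest)  # byte-for-byte copy of the data file
        finally:
            conn.execute("ROLLBACK")  # nothing was changed; just drop the lock
    finally:
        conn.close()
    return dest
```

Restoring is a plain file copy in the other direction, with no SQL to replay, which is why physical backups restore quickly. The cost is that the copy is tied to the engine's on-disk format.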

✅ When to Use:

  • High-throughput databases (PostgreSQL, Cassandra, MongoDB)
  • Large datasets requiring fast restore
  • Sharded or multi-node systems that need a consistent cut, provided snapshots are coordinated across nodes

⚖️ Trade-offs:

| Pros | Cons |
| --- | --- |
| Extremely fast restore times | Platform-dependent (e.g., EBS only) |
| Low overhead on the running database | Snapshot may require quiescing writes |
| Can be scheduled with volume managers | Complex cross-region replication |

🏢 Real-World:

  • Amazon RDS: Takes storage-volume snapshots of the DB instance; Aurora instead backs up its cluster volume continuously to Amazon S3.
  • Airbnb: Uses EBS snapshots for MySQL replicas across regions.

🌍 6. Streaming Replication + PITR (Point-In-Time Recovery)

Definition: Continuously streams write-ahead logs (WALs) or binlogs to replicas or storage for precise recovery.
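Conceptually, PITR is "restore a base backup, then replay the log up to a chosen moment". The toy model below shows that replay step with a key-value state. The `LogRecord` shape and `recover_to` function are hypothetical simplifications, not the format of PostgreSQL WAL or MySQL binlogs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    ts: int                # commit time, e.g. seconds since epoch
    key: str
    value: Optional[str]   # None means the key was deleted


def recover_to(base: dict[str, str], log: list[LogRecord], target_ts: int) -> dict[str, str]:
    """Rebuild state as of `target_ts` from a base backup plus archived log records.

    Assumes every record in `log` was written after the base backup was taken.
    """
    state = dict(base)  # never mutate the base backup itself
    for rec in sorted(log, key=lambda r: r.ts):
        if rec.ts > target_ts:
            break  # stop replay at the requested recovery point
        if rec.value is None:
            state.pop(rec.key, None)
        else:
            state[rec.key] = rec.value
    return state
```

Choosing a `target_ts` just before a bad deploy or an accidental delete is what lets PITR undo a mistake without losing the valid writes that came before it.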

✅ When to Use:

  • Mission-critical systems (financial, telecom)
  • Systems with low RPO requirements
  • Append-only workloads

⚖️ Trade-offs:

| Pros | Cons |
| --- | --- |
| Near real-time backups | Complex to manage/monitor |
| Enables PITR | Requires fine-tuned log storage |
| Critical for HA + failover systems | Storage-intensive over time |

🏢 Real-World:

  • Robinhood: Uses WAL streaming for PostgreSQL with PITR.
  • Facebook: Applies PITR + logical backups in combination for MyRocks.

🧬 7. Change Data Capture (CDC) + Backup Pipelines

Definition: Tracks row-level changes via logs and exports them into a backup store like S3, GCS, etc.
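As a rough illustration, the sketch below applies row-level change events to a keyed copy of a table. The event shape (`op`, `before`, `after`) is loosely modeled on Debezium's change-event envelope but heavily simplified. The `id` primary key and the `apply_change` helper are assumptions for the example.

```python
from typing import Any

ChangeEvent = dict[str, Any]


def apply_change(store: dict[int, dict[str, Any]], event: ChangeEvent) -> None:
    """Apply one row-level change event to a keyed copy of the table."""
    op = event["op"]
    if op in ("c", "r", "u"):      # create, snapshot read, update
        row = event["after"]
        store[row["id"]] = row
    elif op == "d":                # delete
        store.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
```

In a real pipeline the events would arrive from a stream such as Kafka and be persisted to object storage. Keeping the raw events, not just the resulting state, is what makes it possible to recover or inspect specific changes later.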

✅ When to Use:

  • Real-time sync across regions or cloud providers
  • Analytical pipelines (e.g., Snowflake, BigQuery)
  • Decoupling backup from primary DB load

⚖️ Trade-offs:

| Pros | Cons |
| --- | --- |
| Real-time change tracking | Requires infra (Debezium, Kafka, etc.) |
| Ideal for hybrid transactional/analytical | Complex to set up |
| Can recover specific changes | Expensive for small orgs |

🏢 Real-World:

  • Uber: Uses CDC + Kafka for MySQL to BigQuery pipelines.
  • DoorDash: Relies on Debezium for PostgreSQL change tracking.

🚦 Backup Selection Matrix

| Use Case | Recommended Strategy |
| --- | --- |
| Small, simple app | Full + periodic logical dump |
| Large, write-heavy system | Full + Incremental or Differential + PITR |
| High availability (HA) systems | Streaming Replication + WAL/Binlog Backups |
| Disaster recovery compliance | Physical Snapshots + Offsite Storage |
| Multi-cloud/hybrid sync | CDC pipelines + Object Storage (S3, GCS) |
| Data migration | Logical backups or CDC with schema export |

🧠 Best Practices for Backup Management

  1. Automate everything: Cron, Airflow, or Kubernetes CronJobs.
  2. Encrypt backups at rest and in transit.
  3. Test restores regularly: backups are only as good as your ability to restore.
  4. Version your backups: use timestamped folders.
  5. Store offsite and cross-region: S3 Glacier, GCS Coldline, etc.
  6. Monitor & alert on backup failures.
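A few of these practices are easy to sketch in code. The helpers below show one possible way to name backups with sortable timestamps, pick old ones for pruning, and verify a test restore by checksum. All three function names and the naming format are assumptions, not a standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def backup_name(prefix: str, when: datetime) -> str:
    """Build a sortable, timestamped backup name in UTC."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}-{stamp}"


def to_prune(names: list[str], keep: int) -> list[str]:
    """Return the names that fall outside the newest `keep` backups.

    Assumes all names share one prefix, so sorting is chronological.
    """
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_copy(original: Path, restored: Path) -> bool:
    """Check that a restored file is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Running a check like `verify_copy` as part of a scheduled restore drill turns "test restores regularly" into something a monitoring system can alert on.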

📦 Tools & Technologies Widely Used

| Tool/Service | Purpose |
| --- | --- |
| pg_basebackup, WAL-G | PostgreSQL physical + streaming backups |
| mysqldump, xtrabackup | MySQL logical and physical backups |
| Velero | Kubernetes-aware volume backups |
| Restic, BorgBackup | Cross-platform encrypted backups |
| AWS Backup, Google Cloud Backup and DR Service | Managed cloud-native backup |
| Debezium, Kafka | Real-time CDC pipelines |

🧭 Conclusion

Top tech companies don't rely on a single backup strategy. They combine multiple layers (full + incremental + PITR + CDC) to build resilient, auditable, and fast-recovery systems. Your choice should align with:

  • Data volatility
  • Downtime tolerance
  • Recovery time (RTO) & point (RPO) goals
  • Storage & compute budgets

A well-architected backup strategy is like insurance: you hope you never use it, but when you do, it must just work.

