Wakeup Flower

Four Disaster Recovery (DR) strategies in AWS explained

1. Multi-Site (Active-Active)

  • Description: A full production environment runs simultaneously in multiple regions, and every region serves live traffic.
  • Real-Life Example:

    • Global banking system: Customers in New York, London, and Tokyo need access to their accounts at all times.
    • If the New York data center fails, London or Tokyo handles transactions without downtime.
  • RTO/RPO: Near zero (instant failover)

  • Cost: Very high (you pay for multiple full environments)

  • Use Case: Mission-critical apps with zero tolerance for downtime, e.g., stock trading platforms, airline reservation systems.
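
To make the failover behavior above concrete, here is a minimal sketch in Python. The region names, latency figures, and the `pick_region` helper are hypothetical, chosen only to mirror the banking example. In a real deployment this decision would usually be made by a DNS or traffic-routing layer with health checks, not by application code.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool
    latency_ms: float  # illustrative latency as seen from one client


def pick_region(regions: list[Region]) -> Region:
    """Return the healthy region with the lowest latency."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)


regions = [
    Region("new-york", healthy=True, latency_ms=12.0),
    Region("london", healthy=True, latency_ms=80.0),
    Region("tokyo", healthy=True, latency_ms=170.0),
]
print(pick_region(regions).name)  # new-york

regions[0].healthy = False  # simulate the New York site failing
print(pick_region(regions).name)  # london
```

Because every region is already running at full capacity, nothing has to be started during failover: traffic simply moves to the next healthy site.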


2. Warm Standby

  • Description: A smaller-scale version of your environment is running in another region. It can scale up when needed.
  • Real-Life Example:

    • E-commerce website: Main site in US-East, warm standby in US-West with a minimal set of servers and database replicas.
    • During a disaster in US-East, US-West scales up to handle full traffic.
  • RTO/RPO: Medium — typically minutes to a few hours

  • Cost: Medium — you pay for standby resources, not full production

  • Use Case: Apps that are critical but can tolerate brief downtime, e.g., online stores, internal enterprise applications.
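
The "scale up when needed" step can be sketched as a small capacity calculation. This is an illustration under assumed numbers: the function names, the request rates, and the per-server capacity are all hypothetical, and in practice the result would feed an auto scaling setting instead of being printed.

```python
import math


def instances_needed(peak_requests_per_sec: float, capacity_per_instance: float) -> int:
    """Instances required to serve the given load, rounded up."""
    return math.ceil(peak_requests_per_sec / capacity_per_instance)


def scale_up_plan(standby_instances: int, peak_requests_per_sec: float,
                  capacity_per_instance: float) -> int:
    """How many extra instances the standby region must launch on failover."""
    target = instances_needed(peak_requests_per_sec, capacity_per_instance)
    return max(0, target - standby_instances)


# Illustrative numbers: 2 standby servers, 1,800 req/s at peak, 200 req/s per server.
print(scale_up_plan(standby_instances=2, peak_requests_per_sec=1800,
                    capacity_per_instance=200))  # 7
```

The time it takes to launch those extra instances is what separates warm standby from an active-active setup, where the capacity is already in place.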


3. Pilot Light

  • Description: Only a minimal set of critical resources is running; the rest of the environment is off but can be launched on demand.
  • Real-Life Example:

    • SaaS analytics platform: Only the database is running in a secondary region.
    • During a disaster, application servers, load balancers, and other services are launched quickly.
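  • RTO/RPO: RTO is medium to high, because application servers and other services must be launched before traffic can be served. RPO stays low to medium, because the data layer is already running in the secondary region.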

  • Cost: Low-Medium — only the critical part runs continuously

  • Use Case: Apps where cost savings are important but faster recovery than backup/restore is needed, e.g., SaaS reporting tools, business intelligence dashboards.
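
A rough way to reason about Pilot Light recovery time is to add up the steps that must run before the secondary region can serve traffic. The runbook below is a hypothetical sketch: the step names follow the example above, but the durations are made-up placeholders, not measured values.

```python
# Hypothetical recovery runbook: (step, estimated minutes). The database is
# already running in the secondary region, so it does not appear in the list.
RUNBOOK = [
    ("launch application servers", 10),
    ("create load balancer", 5),
    ("switch DNS to secondary region", 5),
]


def estimated_rto_minutes(runbook):
    """Sum the step durations, assuming the steps run one after another."""
    return sum(minutes for _, minutes in runbook)


for step, minutes in RUNBOOK:
    print(f"{step}: ~{minutes} min")
print(f"Estimated RTO: ~{estimated_rto_minutes(RUNBOOK)} min")
```

Steps that can run in parallel would shorten the total, so treat the sum as an upper estimate for a strictly sequential runbook.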


4. Backup & Restore

  • Description: Data is backed up; environment is built only when needed.
  • Real-Life Example:

    • Archival video content: Stored in S3 with snapshots.
    • If the primary site is lost, you restore the content to a new environment, which may take hours or days.
  • RTO/RPO: High — hours to days; data loss depends on backup frequency

  • Cost: Low — you only pay for storage, not running instances

  • Use Case: Non-critical workloads or infrequently accessed content, e.g., backups, dev/test environments, archival systems.
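
The statement that data loss depends on backup frequency can be shown with a short calculation. This is a minimal sketch with an assumed schedule; the `data_loss_window` helper and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure_time: datetime) -> timedelta:
    """Data written after the last backup is lost when the primary site fails."""
    if failure_time < last_backup:
        raise ValueError("failure_time must not be earlier than last_backup")
    return failure_time - last_backup


# Illustrative scenario: nightly backup at 02:00, failure at 23:30 the same day.
last_backup = datetime(2024, 1, 1, 2, 0)
failure = datetime(2024, 1, 1, 23, 30)
print(data_loss_window(last_backup, failure))  # 21:30:00
```

With a fixed schedule, the worst case approaches the full interval between backups, so backing up more often is the main lever for lowering RPO with this strategy.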

| DR Strategy | RTO | RPO | Cost | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Multi-Site | Very low | Near zero | High | High | Mission-critical apps, zero downtime |
| Backup & Restore | High | Depends on backup | Low | Low | Non-critical workloads, archival data |
| Warm Standby | Medium | Low-Medium | Medium | Medium | Critical apps with moderate downtime tolerance |
| Pilot Light | Medium-High | Low-Medium | Low-Medium | Medium | Cost-conscious apps needing faster recovery than backup |
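
The trade-off in the comparison table can also be expressed as a simple selection rule: pick the cheapest strategy whose RTO rating is still acceptable. The sketch below is an illustration only. The ordinal scores are an assumed encoding of the table's qualitative ratings (lower is better), and `cheapest_strategy` is a hypothetical helper.

```python
# Assumed ordinal encoding of the table's ratings (lower is better).
# RTO:  Very low = 0, Medium = 2, Medium-High = 3, High = 4
# Cost: Low = 0, Low-Medium = 1, Medium = 2, High = 3
STRATEGIES = {
    "Multi-Site": {"rto": 0, "cost": 3},
    "Warm Standby": {"rto": 2, "cost": 2},
    "Pilot Light": {"rto": 3, "cost": 1},
    "Backup & Restore": {"rto": 4, "cost": 0},
}


def cheapest_strategy(max_rto: int) -> str:
    """Return the lowest-cost strategy whose RTO score does not exceed max_rto."""
    eligible = {name: s for name, s in STRATEGIES.items() if s["rto"] <= max_rto}
    if not eligible:
        raise ValueError("no strategy meets the requested RTO")
    return min(eligible, key=lambda name: eligible[name]["cost"])


print(cheapest_strategy(max_rto=0))  # Multi-Site
print(cheapest_strategy(max_rto=2))  # Warm Standby
print(cheapest_strategy(max_rto=4))  # Backup & Restore
```

A real decision would also weigh RPO and operational complexity, which this sketch leaves out to keep the rule easy to follow.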
