
Piter Adyson


MongoDB backup best practices — Essential strategies for MongoDB backup and recovery

MongoDB's flexibility comes with unique backup challenges. Unlike traditional relational databases, MongoDB's document model, sharding architecture and replica sets require specific backup strategies. This guide covers essential best practices for protecting your MongoDB data, from basic single-server setups to complex sharded clusters. Whether you run MongoDB Atlas or self-managed instances, these strategies ensure you can recover from any failure scenario.

[Image: strategies for MongoDB backup]

Understanding MongoDB backup fundamentals

MongoDB stores data as BSON documents in collections. The database uses a storage engine (typically WiredTiger) that manages data files, journal files and indexes. Understanding this architecture helps you choose appropriate backup methods and avoid common mistakes.

How MongoDB stores data

WiredTiger creates several file types in the data directory. Data files contain actual documents, journal files provide crash recovery and metadata files track internal state. During normal operation, MongoDB writes changes to the journal before updating data files. This write-ahead logging ensures consistency even if the server crashes.

Backups must capture a consistent snapshot of these files. If you simply copy files while MongoDB runs, you risk inconsistent backups where some files are newer than others. Different backup methods solve this problem in different ways.
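If you take filesystem-level copies, `db.fsyncLock()` flushes pending writes and blocks new ones so the copied files are consistent. A minimal mongosh sketch (on recent WiredTiger versions with journal and data files on the same volume the lock may be unnecessary, but it is the safe default):

```javascript
// Run in mongosh against the node being copied (ideally a secondary).
// Flush pending writes to disk and block new writes.
db.fsyncLock()

// ... take the filesystem snapshot or file copy here (LVM, EBS, ZFS, ...) ...

// Release the lock as soon as the snapshot completes.
db.fsyncUnlock()
```

Keep the locked window as short as possible; while locked, the node cannot accept writes.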

MongoDB backup methods

MongoDB offers several backup approaches:

| Method | Best for | Consistency | Downtime | Storage size |
| --- | --- | --- | --- | --- |
| mongodump | Small databases (<100GB), cross-version portability | Application-consistent | No downtime | Smaller (BSON compressed) |
| Filesystem snapshot | Large databases, fast recovery | Point-in-time consistent | Seconds (if using fsync+lock) | Full size (compressed with snapshot) |
| Cloud provider backup | Managed MongoDB (Atlas, AWS DocumentDB) | Automated consistent snapshots | No downtime | Managed by provider |
| Replica set delayed member | Simple continuous backup | Eventually consistent | No downtime | Full database size |
| Continuous backup (oplog) | Point-in-time recovery requirements | Continuous with oplog replay | No downtime | Base + oplog archives |

No single method fits all scenarios. Small databases benefit from mongodump's simplicity. Large sharded clusters need filesystem snapshots or cloud provider backups. Production systems typically combine multiple methods for layered protection.

The 3-2-1 backup rule for MongoDB

The 3-2-1 rule provides a simple framework for backup reliability: keep 3 copies of your data, on 2 different storage types, with 1 copy offsite.

Implementing 3-2-1 with MongoDB

For a production MongoDB deployment, this means:

  • Primary data: Your running MongoDB database
  • Local backup: Daily snapshots on the same server or local network
  • Remote backup: Offsite storage in S3, Google Cloud Storage or another region

This protects against hardware failures (local backup), site disasters (remote backup) and human errors (multiple copies). If ransomware encrypts your primary database and local backups, the remote copy remains safe.
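The offsite leg is usually a single command in your backup pipeline. A sketch, assuming a gzipped mongodump archive and a hypothetical S3 bucket name:

```shell
# Push the day's local backup to offsite object storage.
# Bucket name and path are placeholders -- substitute your own.
aws s3 cp /backup/mongodb/backup-20240119.archive.gz \
  "s3://example-mongodb-backups/$(hostname)/" \
  --storage-class STANDARD_IA
```

Infrequent-access storage classes cut costs for backups that are rarely read back.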

Storage type diversity

Use different storage technologies for each backup copy. Local backups might live on SAN storage while remote backups use object storage. This protects against storage system bugs and security vulnerabilities.

Many MongoDB disasters involve multiple simultaneous failures. A configuration error that corrupts the primary database might also affect backups on the same storage system. Physical separation between copies prevents cascading failures.

Backup frequency and retention

How often you back up and how long you keep backups depend on your recovery point objective (RPO) and compliance requirements.

Determining backup frequency

Recovery point objective answers "how much data can we afford to lose?" An e-commerce site might tolerate 15 minutes of lost transactions, while a financial system needs zero data loss.

| RPO requirement | Backup frequency | Recommended method |
| --- | --- | --- |
| 24 hours | Daily full backups | mongodump or filesystem snapshot |
| 4-8 hours | Every 4-6 hours | mongodump with oplog |
| 1-2 hours | Hourly snapshots | Filesystem snapshot or continuous backup |
| <15 minutes | Continuous backup | Oplog archiving + replica sets |

Start conservative and adjust based on actual recovery needs. Many teams discover their initial RPO estimates were too aggressive and scale back to reduce complexity.
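The schedules above map directly onto cron. An example crontab fragment (the backup script path and its `full`/`oplog` arguments are placeholders for your own automation):

```shell
# /etc/cron.d/mongodb-backup -- example schedules for different RPO targets
# Daily full backup at 02:15 (24-hour RPO)
15 2 * * *  mongodb  /usr/local/bin/mongodb-backup.sh full
# Oplog dump every 4 hours (4-8 hour RPO)
0 */4 * * * mongodb  /usr/local/bin/mongodb-backup.sh oplog
```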

Retention strategies

Keep backups long enough to recover from delayed-discovery problems. Corruption introduced weeks ago might not be noticed until an old report fails or an audit runs.

Common retention policies:

  • Daily backups: Keep 7 days
  • Weekly backups: Keep 4 weeks
  • Monthly backups: Keep 12 months
  • Yearly backups: Keep 7 years (compliance dependent)

Balance retention against storage costs. Older backups become less useful but eat growing amounts of space.
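The daily tier of the retention policy above can be enforced with a small cleanup pass. A sketch, assuming each tier lives in its own directory and files follow the backup-* naming used elsewhere in this article:

```shell
# Delete daily backups older than the retention window, leaving weekly and
# monthly tiers (stored in separate directories) untouched.
prune_daily_backups() {
  dir="$1"
  retention_days="$2"
  find "$dir" -name 'backup-*.archive.gz' -type f -mtime +"$retention_days" -delete
}

# Example: prune_daily_backups /backup/mongodb/daily 7
```

Promote one daily backup per week into the weekly directory before pruning, and one weekly per month into the monthly directory, to build the full grandfather-father-son rotation.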

Testing backup restoration

Untested backups are Schrödinger's backups — you don't know if they work until you try. Many teams discover backup problems during actual disasters when stress is highest and time is shortest.

Monthly restoration tests

Schedule monthly tests that simulate real failures:

  1. Select a random backup from the past week
  2. Restore to a non-production environment
  3. Verify all collections exist with expected document counts
  4. Run application smoke tests against restored data
  5. Document restoration time and any issues

These tests catch problems early. You might discover permissions issues, missing dependencies or bugs in restoration scripts. Fix these problems during calm periods, not during outages.
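The collection-count check can be scripted so every test run applies the same criteria. A mongosh sketch, where the collection names and minimum counts are illustrative and would be recorded from production at backup time:

```javascript
// verify-restore.js -- run with: mongosh "mongodb://restore-host:27017" verify-restore.js
// Expected minimum document counts, recorded at backup time (illustrative values).
const expected = { users: 150000, orders: 420000 };

const testDb = db.getSiblingDB("shop");   // restored database under test
for (const [coll, want] of Object.entries(expected)) {
  const got = testDb.getCollection(coll).countDocuments();
  if (got < want) {
    print(`FAIL: ${coll} has ${got} documents, expected at least ${want}`);
    quit(1);
  }
}
print("Restore verification passed");
```

A nonzero exit code lets your test scheduler flag the restoration as failed automatically.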

Partial restoration practice

Practice restoring individual collections and databases, not just complete systems. Many real scenarios need selective restoration: reverting a single collection after bad data import, recovering one database from a sharded cluster or extracting specific documents.

Create runbooks for common restoration scenarios. Document exact commands, required permissions and validation steps. When disasters happen, tested procedures reduce mistakes and speed recovery.

Replica set backup strategies

Replica sets provide built-in redundancy but aren't substitutes for backups. A bad query that drops a collection affects all replicas immediately. Backups protect against operational errors, not just hardware failures.

Backup from secondary members

Always run backups against secondary replica set members, never the primary. Backups consume CPU, memory and disk I/O. Running them on the primary degrades application performance.

Connect to a secondary with the --host flag:

mongodump --host=secondary.example.com:27017 --out=/backup/mongodb

Or supply a replica set connection string with a read preference, so the driver selects a secondary for you:

mongodump --uri="mongodb://node1.example.com:27017,node2.example.com:27017/?replicaSet=rs0&readPreference=secondary" --out=/backup/mongodb

This ensures backup load stays off your primary node.

Hidden replica set members

For large production deployments, dedicate a hidden replica set member specifically for backups. Hidden members don't serve application traffic but participate in replication.

Add a hidden member to your replica set:

rs.add({
  host: "backup.example.com:27017",
  priority: 0,
  hidden: true,
  votes: 0
})

This member stays current with data changes but never becomes primary. Use it for backups, analytics queries and restoration testing without affecting production.

Delayed replica members

A delayed replica lags behind the primary by a configured amount (typically 1-4 hours). If someone drops a critical collection, you have a window to recover from the delayed member before replication catches up.

Configure a delayed member:

rs.add({
  host: "delayed.example.com:27017",
  priority: 0,
  hidden: true,
  votes: 0,
  secondaryDelaySecs: 3600  // named slaveDelay on MongoDB 4.4 and earlier
})

This creates a one-hour time window for recovering from operational mistakes. Combined with regular backups, delayed members provide defense in depth.

Sharded cluster backup

Sharded clusters distribute data across multiple servers, complicating backups. You can't simply back up each shard independently; you need a consistent snapshot across all shards plus the config servers.

Challenges with sharded backups

MongoDB sharding spreads collections across shard servers based on shard keys. A backup must capture consistent state across:

  • All shard servers
  • Config servers (cluster metadata)
  • Any mongos routers

If these components have different timestamps, restoration produces inconsistent data. Documents might appear in wrong shards or disappear entirely.

Backup approaches for sharded clusters

For small sharded clusters (under 500GB), mongodump run through mongos can capture a usable backup; note that --oplog is not supported when connecting to mongos, so stop the balancer first and treat the result as approximately, not strictly, point-in-time consistent. MongoDB Atlas and managed services handle this automatically.

For large clusters, use one of these strategies:

Filesystem snapshots with coordinated timing:

  • Stop the balancer to prevent chunk migrations
  • Take snapshots of all shards and config servers simultaneously
  • Snapshots must complete within a few seconds for consistency
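The coordination above starts and ends in mongosh. A sketch of the balancer handling around the snapshot window:

```javascript
// Run via mongos before taking coordinated snapshots
sh.stopBalancer()
sh.getBalancerState()   // should now report false

// ... snapshot all shards and config servers here ...

// Re-enable chunk migrations once every snapshot has completed
sh.startBalancer()
```

If a snapshot fails, restart the balancer anyway; a stopped balancer left overnight leads to badly unbalanced shards.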

Cloud provider backup services:

  • MongoDB Atlas automated backups
  • AWS backup for DocumentDB
  • These handle coordination automatically

Continuous backup with oplog archiving:

  • Back up each shard's oplog to external storage
  • Take periodic base snapshots
  • Restore by replaying oplog from snapshot time

For most organizations, managed backup services eliminate sharded cluster complexity. The engineering cost of building reliable sharded backup automation exceeds the price of Atlas or similar services.

Securing your backups

Backups contain complete copies of production data. Protect them with the same security controls as your live databases.

Encryption at rest

Encrypt backup files before storing them anywhere. MongoDB's native encryption requires Enterprise edition, but you can encrypt backups regardless of MongoDB edition.

With mongodump, pipe through encryption:

mongodump --archive | openssl enc -aes-256-cbc -salt -pbkdf2 -pass file:/etc/mongodb/backup.key > backup.encrypted

Or use your storage system's encryption (S3 server-side encryption, encrypted file systems, etc.). Just ensure you're not relying solely on database-level encryption — that doesn't protect backup files.
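To restore an encrypted backup, reverse the pipeline. The cipher, KDF and key file must match the ones used at encryption time:

```shell
# Decrypt with the same openssl options used during encryption,
# streaming the archive straight into mongorestore.
openssl enc -d -aes-256-cbc -salt -pbkdf2 -pass file:/etc/mongodb/backup.key \
  -in backup.encrypted | mongorestore --archive
```

Test this decryption path as part of your restoration drills; an unreadable key file turns every encrypted backup into noise.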

Access control for backups

Restrict who can read backup files. Use cloud storage IAM policies, filesystem permissions and encryption keys to limit access.

Create a dedicated backup user with the built-in backup role, which grants the minimal privileges mongodump needs. Run this against the admin database:

use admin
db.createUser({
  user: "backupuser",
  pwd: passwordPrompt(),
  roles: [{ role: "backup", db: "admin" }]
})

This user can read data for backups but cannot modify your application data.

Backup integrity verification

Corrupted backups are worse than no backups — you think you're protected but discover problems during restoration. Verify backup integrity after each backup completes.

Calculate checksums for backup files:

sha256sum /backup/mongodb-20260119.archive > /backup/mongodb-20260119.sha256

Store checksums separately from backup files. Before restoration, verify checksums match.

Automating MongoDB backups

Manual backups fail. Someone forgets, scripts break or servers go down. Automation removes human error and ensures backups happen consistently.

Building backup automation

Production backup automation needs:

  • Scheduling: Cron, systemd timers or orchestration systems
  • Error handling: Retries, alerting on failures
  • Logging: Detailed logs for troubleshooting
  • Monitoring: Track backup success, duration and size
  • Cleanup: Automatic old backup deletion

Here's a production-ready backup script template:

#!/bin/bash
set -euo pipefail

BACKUP_DIR="/backup/mongodb"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7
MONGO_URI="mongodb://backup:password@localhost:27017"

# Create backup
mongodump --uri="$MONGO_URI" \
  --archive="$BACKUP_DIR/backup-$DATE.archive.gz" \
  --gzip \
  --oplog

# Verify backup exists and has size
BACKUP_FILE="$BACKUP_DIR/backup-$DATE.archive.gz"
BACKUP_SIZE=$(stat -f%z "$BACKUP_FILE" 2>/dev/null || stat -c%s "$BACKUP_FILE")

if [ "$BACKUP_SIZE" -lt 1000000 ]; then
  echo "ERROR: Backup file too small: $BACKUP_SIZE bytes"
  exit 1
fi

# Calculate checksum
sha256sum "$BACKUP_FILE" > "$BACKUP_FILE.sha256"

# Cleanup old backups
find "$BACKUP_DIR" -name "backup-*.archive.gz" -mtime +$RETENTION_DAYS -delete

echo "Backup completed: $BACKUP_FILE ($BACKUP_SIZE bytes)"

Add monitoring with exit codes, log files and alerting. Failed backups should page on-call engineers, not sit silently until disasters happen.

Using Databasus for MongoDB backup automation

Manual scripts work but require ongoing maintenance. MongoDB backup tools like Databasus automate the entire process with a web interface and team-friendly features.

Installing Databasus

Install with Docker:

docker run -d \
  --name databasus \
  -p 4005:4005 \
  -v "$(pwd)/databasus-data:/databasus-data" \
  --restart unless-stopped \
  databasus/databasus:latest

Or using Docker Compose:

services:
  databasus:
    container_name: databasus
    image: databasus/databasus:latest
    ports:
      - "4005:4005"
    volumes:
      - ./databasus-data:/databasus-data
    restart: unless-stopped

Start the service:

docker compose up -d

Configuring MongoDB backups in Databasus

Access the web interface at http://localhost:4005 and create your account, then:

  1. Add your database: Click "New Database" and select MongoDB as the database type
  2. Enter connection details: Provide your MongoDB connection string (standalone or replica set URI)
  3. Select storage: Choose from local storage, AWS S3, Google Cloud Storage, Dropbox, SFTP or other supported destinations
  4. Configure schedule: Set hourly, daily, weekly, monthly or custom cron-based backup intervals
  5. Add notifications (optional): Configure Slack, Discord, Telegram or email alerts for backup success and failures
  6. Create backup: Databasus validates your settings and starts the backup schedule

Databasus handles compression, encryption and retention policies automatically. The web interface provides backup history, restoration tools and team access controls without writing custom scripts.

Point-in-time recovery with oplog

Point-in-time recovery (PITR) lets you restore to any moment, not just backup times. If a bad deployment corrupts data at 14:37, you can restore to 14:36 even if your last backup was at midnight.

How oplog enables PITR

MongoDB's oplog records every write operation in a capped collection. Replaying oplog entries recreates database state at any point.

PITR requires:

  • Base backup (full mongodump or snapshot)
  • Continuous oplog archiving from backup time forward
  • Ability to replay oplog to desired recovery point

Implementing basic oplog archiving

Back up the oplog periodically:

mongodump --db=local --collection=oplog.rs --archive=oplog-$(date +%Y%m%d_%H%M%S).archive

Store these oplog backups alongside your full backups. More frequent oplog backups provide finer-grained recovery points.
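Dumping the full oplog every time wastes space. You can dump only entries newer than the last run with a --query filter; a sketch, where the checkpoint timestamp would be recorded after each successful dump (the value shown is illustrative):

```shell
# Incremental oplog dump: capture only entries after the last checkpoint.
LAST_TS=1705671420   # Unix seconds of the previous dump's newest entry (illustrative)
mongodump --db=local --collection=oplog.rs \
  --query='{"ts": {"$gt": {"$timestamp": {"t": '"$LAST_TS"', "i": 0}}}}' \
  --archive="oplog-incr-$(date +%Y%m%d_%H%M%S).archive"
```

Overlap checkpoints slightly rather than cutting them exactly; duplicate oplog entries are harmless on replay, but gaps are fatal.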

Restoring to a specific time

Restore the base backup first:

mongorestore --archive=backup-base.archive --oplogReplay

Then replay oplog archives up to your desired timestamp:

mongorestore --archive=oplog-segment1.archive --oplogReplay --oplogLimit=1705671420:0

The timestamp format is Unix epoch seconds plus an increment value that orders operations within the same second. This complexity is why managed services and tools handle PITR automatically for most users.
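The seconds component is ordinary Unix time, so GNU date can compute it for any wall-clock recovery point (remember to work in the same time zone your application logs use):

```shell
# Convert a desired recovery point (UTC) to the epoch seconds used by --oplogLimit
date -u -d "2024-01-19 14:36:00" +%s   # prints 1705674960
```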

Monitoring backup health

Backups fail silently without monitoring. Script errors, full disks and network problems cause backup failures that go unnoticed until you need them.

Key metrics to track

Monitor these backup health indicators:

  • Last successful backup timestamp: Alert if older than expected interval
  • Backup size: Sudden drops indicate incomplete backups
  • Backup duration: Growing times signal performance problems
  • Restoration test results: Track monthly test success rates
  • Storage space: Prevent disk full errors

Set up alerts for any backup failure or unusual metric. Backup problems should trigger immediate investigation, not wait for weekly reviews.
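The "last successful backup" check is simple to script. A sketch, where the directory layout and the 26-hour threshold (daily backups plus slack) are assumptions:

```shell
# Succeed only if the newest backup in a directory is younger than the threshold.
backup_is_fresh() {
  dir="$1"
  max_age_hours="$2"
  # Newest modification time among backup archives, in epoch seconds
  newest=$(find "$dir" -name 'backup-*.archive.gz' -type f -printf '%T@\n' 2>/dev/null | sort -n | tail -1)
  [ -n "$newest" ] || return 1                     # no backups at all
  now=$(date +%s)
  age_hours=$(( (now - ${newest%.*}) / 3600 ))
  [ "$age_hours" -le "$max_age_hours" ]
}

# Example: backup_is_fresh /backup/mongodb 26 || echo "ALERT: backup stale or missing"
```

Wire the failure branch into whatever pager or chat alerting you already use.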

Backup validation automation

After each backup completes, run automated validation:

# Check the backup file exists and has a minimum size
test -f "$BACKUP_FILE" || exit 1
test "$(stat -c%s "$BACKUP_FILE")" -gt 1000000 || exit 1

# Test archive integrity without writing any data
mongorestore --archive="$BACKUP_FILE" --gzip --dryRun || exit 1

# Verify the checksum recorded at backup time
sha256sum -c "$BACKUP_FILE.sha256" || exit 1

These checks catch corruption immediately rather than during restoration attempts.

Common backup mistakes to avoid

Learning from other teams' mistakes saves painful lessons. These MongoDB backup anti-patterns cause recovery problems.

Backing up only the primary

Running backups against the primary node degrades application performance. Backup operations read entire datasets, consuming memory and I/O. On small databases this might seem fine, but as data grows backup impact increases.

Always back up from secondary replica set members. If you don't have a replica set, add one before your database grows large enough that backup impact matters.

Ignoring backup testing

"We've never had to restore, so our backups must work" is optimism, not a backup strategy. Untested backups often fail when needed.

Common test-time discoveries:

  • Backup credentials expired
  • Restore scripts reference wrong paths
  • Incremental backups missing base backup
  • Backup files corrupted by storage issues

Schedule regular restoration tests. Document what you learn and update procedures.

Storing backups only on the database server

Keeping backups on the same server as the database means a single hardware failure, ransomware attack or accidental deletion can destroy the data and its backups together.

Send backups to separate storage systems immediately after creation. Cloud object storage (S3, GCS, Azure Blob) provides durable offsite storage for a few dollars per month.

Backup strategies by MongoDB deployment type

Different MongoDB architectures need different backup approaches. A single server requires different strategies than a globally-distributed sharded cluster.

Single server backups

For standalone MongoDB instances without replica sets:

  • Use mongodump for regular backups
  • Schedule backups during low-traffic periods
  • Accept brief performance impact during backup
  • Consider adding a secondary for backup offloading

Single server MongoDB is fine for development and small applications. As your database grows, add replica sets for both redundancy and backup isolation.

Replica set backups

Standard production deployment with 3-5 replica set members:

  • Back up from secondary members only
  • Use mongodump with --oplog for consistency
  • Consider dedicated hidden member for backups
  • Add delayed member for operational mistake protection

This is the sweet spot of MongoDB backup complexity — manageable scripts and clear procedures without sharded cluster complications.

Sharded cluster backups

Large deployments with data distributed across shards:

  • Use managed backup services (Atlas, cloud providers)
  • If self-managing, coordinate backups across all shards
  • Stop balancer during backup windows
  • Test restoration procedures regularly

For most teams, managed MongoDB services justify their cost through reliable automated backups alone. Building and maintaining sharded cluster backup systems requires significant engineering resources.

Backup checklist for production MongoDB

Use this checklist to audit your MongoDB backup strategy:

Backup configuration:

  • [ ] Backups run from secondary replica set members
  • [ ] Backup user has minimal required privileges
  • [ ] Backup schedule meets RPO requirements
  • [ ] Oplog captured with backups (if using mongodump)

Storage and retention:

  • [ ] Backups stored on separate systems from database
  • [ ] Remote/offsite backup copies exist
  • [ ] Retention policy defined and automated
  • [ ] Storage capacity monitored and alerted

Security:

  • [ ] Backup files encrypted at rest
  • [ ] Access to backups restricted to authorized personnel
  • [ ] Encryption keys stored separately from backups
  • [ ] Backup transfer encrypted (TLS/HTTPS)

Testing and validation:

  • [ ] Monthly restoration tests scheduled
  • [ ] Restoration procedures documented
  • [ ] Multiple team members trained on restoration
  • [ ] Recovery time objectives measured

Monitoring:

  • [ ] Backup success/failure alerts configured
  • [ ] Backup metrics tracked (size, duration, timestamp)
  • [ ] On-call escalation defined for backup failures
  • [ ] Backup validation runs after each backup

Missing items indicate gaps in your backup strategy. Address them before they become disaster recovery problems.

Conclusion

MongoDB backup strategies balance simplicity, performance and recoverability. Start with mongodump on secondary replica set members, add offsite storage and implement retention policies. Test restoration monthly and automate everything possible.

The best backup strategy is one you'll actually maintain. A simple automated system beats a complex manual process every time. As your MongoDB deployment grows, layer additional protections: oplog archiving for PITR, delayed replicas for operational mistakes and dedicated backup members for isolation.

Remember that backups are insurance against disasters, not alternatives to redundancy. Replica sets protect against hardware failures. Backups protect against everything else: operational errors, security breaches and disasters that affect entire sites. When recovery becomes necessary, your preparation determines whether it's a minor incident or a catastrophic outage.
