MongoDB Operations Toolkit
Everything you need to run MongoDB in production with confidence. This toolkit covers schema design patterns that prevent document bloat, aggregation pipelines for real analytics, sharding strategies with step-by-step deployment, automated backup and restore scripts, and monitoring dashboards that surface problems before your users notice. Built for MongoDB 6.0+ and tested on both Atlas and self-hosted deployments.
Key Features
- 6 schema design patterns — embedded, referenced, bucket, computed, subset, and extended reference with sizing guidelines
- 15 aggregation pipeline templates for reporting, time-series analysis, graph lookups, and windowed computations
- Sharding playbook covering shard key selection, chunk balancing, zone-based sharding for geo-distributed deployments
- Automated backup scripts using mongodump, mongorestore, and Atlas continuous backup with point-in-time recovery
- Monitoring dashboard configs for Prometheus + Grafana with 12 panels covering oplog lag, connection pools, and WiredTiger cache
- Index analysis queries that identify unused indexes, missing indexes, and index intersection opportunities
- Connection pooling tuning for Mongoose, PyMongo, and the native driver with recommended settings per workload type
- Migration helpers for resharding live collections and rolling index builds without downtime
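As a taste of the pooling guidance, here is a minimal sketch of per-workload option sets for the official Node.js driver. The specific numbers are illustrative starting points, not the toolkit's exact recommendations; tune them against your own connection metrics.

```javascript
// Options objects you would pass to `new MongoClient(uri, options)`
// with the official Node.js driver. Values are illustrative defaults.
const webWorkloadOptions = {
  maxPoolSize: 100,         // cap concurrent sockets for bursty web traffic
  minPoolSize: 10,          // keep warm connections to avoid handshake latency
  maxIdleTimeMS: 60000,     // recycle sockets idle for more than a minute
  waitQueueTimeoutMS: 5000  // fail fast instead of queueing forever
};

const batchWorkloadOptions = {
  maxPoolSize: 10,          // few long-running operations, small pool
  minPoolSize: 1,
  socketTimeoutMS: 0        // batch jobs may hold a cursor for a long time
};
```

All four option names (`maxPoolSize`, `minPoolSize`, `maxIdleTimeMS`, `waitQueueTimeoutMS`, `socketTimeoutMS`) are standard driver connection options, so the same shapes work when set as connection-string parameters.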
Quick Start
```bash
unzip mongodb-operations-toolkit.zip
cd mongodb-operations-toolkit/

# Connect to your MongoDB instance
mongosh "mongodb://admin:YOUR_PASSWORD_HERE@localhost:27017/admin"
```

Then, inside the mongosh session:

```javascript
// Run the diagnostic health check
// Output: collection sizes, index stats, replication status
load("src/diagnostics/health_check.js")

// Run the index analysis
load("src/indexes/unused_index_finder.js")
```
Quick index analysis to find waste:
```javascript
// Find indexes that haven't been used since the last server restart.
// Note: $indexStats is per-node — check every replica set member
// before dropping anything.
db.getCollectionNames().forEach(function (coll) {
  var stats = db.getCollection(coll).aggregate([{ $indexStats: {} }]).toArray();
  stats.forEach(function (idx) {
    // accesses.ops is a Long, so compare numerically rather than with ===
    if (idx.accesses.ops == 0 && idx.name !== "_id_") {
      print("UNUSED: " + coll + "." + idx.name +
            " | size: " + db.getCollection(coll).stats().indexSizes[idx.name]);
    }
  });
});
```
Architecture / How It Works
```
mongodb-operations-toolkit/
├── src/
│   ├── schema_patterns/
│   │   ├── embedded_pattern.js     # One-to-few relationships
│   │   ├── bucket_pattern.js       # Time-series bucketing
│   │   ├── computed_pattern.js     # Pre-computed aggregations
│   │   └── subset_pattern.js       # Hot/cold data separation
│   ├── aggregation/
│   │   ├── reporting_pipelines.js  # Revenue, user activity, funnels
│   │   ├── time_series.js          # Windowed aggregations
│   │   └── graph_lookups.js        # Recursive $graphLookup examples
│   ├── sharding/
│   │   ├── shard_key_analysis.js   # Evaluate candidate shard keys
│   │   ├── setup_sharded_cluster.sh
│   │   └── zone_sharding.js        # Geo-based zone configuration
│   ├── backup/
│   │   ├── backup.sh               # Automated mongodump with rotation
│   │   ├── restore.sh              # Point-in-time restore procedure
│   │   └── verify_backup.sh        # Backup integrity validation
│   ├── indexes/
│   │   └── unused_index_finder.js  # Find and report unused indexes
│   └── diagnostics/
│       └── health_check.js         # Comprehensive server diagnostics
├── examples/
│   ├── ecommerce_schema.js
│   └── iot_time_series.js
└── config.example.yaml
```
Usage Examples
Bucket pattern for time-series IoT data:
```javascript
// Instead of one document per reading (millions of tiny docs),
// bucket into hourly documents (far fewer, larger docs)
db.sensor_readings.insertOne({
  sensor_id: "sensor-42",
  bucket_start: ISODate("2026-03-23T14:00:00Z"),
  bucket_end: ISODate("2026-03-23T15:00:00Z"),
  count: 60,
  readings: [
    { ts: ISODate("2026-03-23T14:00:00Z"), temp: 22.4, humidity: 45 },
    { ts: ISODate("2026-03-23T14:01:00Z"), temp: 22.5, humidity: 44 }
    // ... up to 60 readings per bucket
  ],
  summary: { avg_temp: 22.45, min_temp: 21.8, max_temp: 23.1 }
});

// Index on sensor_id + bucket_start for efficient range queries
db.sensor_readings.createIndex(
  { sensor_id: 1, bucket_start: 1 },
  { name: "idx_sensor_time_range" }
);
```
Aggregation pipeline for monthly revenue report:
```javascript
db.orders.aggregate([
  { $match: {
      status: "completed",
      created_at: { $gte: ISODate("2026-01-01"), $lt: ISODate("2026-04-01") }
  }},
  { $group: {
      _id: { $dateToString: { format: "%Y-%m", date: "$created_at" } },
      total_revenue: { $sum: "$total" },
      order_count: { $sum: 1 },
      avg_order_value: { $avg: "$total" }
  }},
  { $sort: { _id: 1 } },
  { $project: {
      _id: 0,                      // suppress _id so only `month` carries the key
      month: "$_id",
      total_revenue: { $round: ["$total_revenue", 2] },
      order_count: 1,
      avg_order_value: { $round: ["$avg_order_value", 2] }
  }}
]);
```
Automated backup script with retention:
```bash
#!/bin/bash
# backup.sh — run via cron: 0 2 * * * /opt/mongodb/backup.sh
set -euo pipefail

BACKUP_DIR="/backups/mongodb/$(date +%Y%m%d_%H%M%S)"
RETENTION_DAYS=30
MONGO_URI="mongodb://backup_user:YOUR_PASSWORD_HERE@localhost:27017"

mkdir -p "$BACKUP_DIR"
mongodump --uri="$MONGO_URI" --out="$BACKUP_DIR" --gzip --oplog

# Verify backup integrity — the exit status is more reliable than
# grepping mongorestore's log output
if mongorestore --uri="$MONGO_URI" --dir="$BACKUP_DIR" --dryRun --gzip; then
  echo "Backup verified"
else
  echo "BACKUP VERIFICATION FAILED" >&2
fi

# Rotate old backups
find /backups/mongodb -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
```
Configuration
```yaml
# config.example.yaml
mongodb:
  uri: "mongodb://admin:YOUR_PASSWORD_HERE@localhost:27017/admin"
  database: myapp
  auth_source: admin

backup:
  schedule: "0 2 * * *"          # daily at 2 AM
  retention_days: 30
  compression: gzip
  include_oplog: true            # required for point-in-time recovery
  verify_after_backup: true

sharding:
  enabled: false
  config_servers: 3              # always use 3 for production
  shards: 2
  default_chunk_size_mb: 128

monitoring:
  exporter_port: 9216
  scrape_interval: 15s
  alert_on_repl_lag_seconds: 10
  alert_on_connections_percent: 80
```
Best Practices
- Choose shard keys based on query patterns, not data distribution alone. A monotonically increasing shard key creates a "hot shard" that handles all new writes.
- Cap embedded arrays at 500 elements. Beyond that, use the bucket pattern or move to a referenced design to avoid document growth limits.
- Since MongoDB 4.2, index builds no longer block reads and writes the way old foreground builds did, and the `{background: true}` option is ignored. On busy production replica sets, use a rolling index build (one member at a time) to keep impact minimal.
- Always include `--oplog` in `mongodump` for replica sets. Without it, you cannot do point-in-time recovery.
- Monitor WiredTiger cache usage. If the cache dirty percentage stays above 20%, your write workload is exceeding disk flush capacity.
- Use read preference `secondaryPreferred` for reporting queries to reduce load on the primary node.
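The `secondaryPreferred` advice is usually applied per connection rather than per query, so reporting jobs need nothing beyond their own URI. A small sketch; hosts, credentials, and the replica set name are placeholders:

```javascript
// Read preference can be set in the connection string itself.
// Host names and credentials here are placeholders.
const reportingUri =
  "mongodb://reporting_user:YOUR_PASSWORD_HERE@replica-a:27017," +
  "replica-b:27017,replica-c:27017/myapp" +
  "?replicaSet=rs0&readPreference=secondaryPreferred";

// ...or per operation in mongosh:
// db.orders.find().readPref("secondaryPreferred")
```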
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Slow queries despite correct indexes | Index not fitting in RAM | Check `db.collection.stats().indexSizes` and ensure total index size < WiredTiger cache |
| Replication lag increasing | Large write batches or slow secondary | Check oplog window with `rs.printReplicationInfo()` and resize oplog if under 24h |
| mongodump fails mid-backup | Insufficient disk space or auth error | Verify free space with `df -h` and ensure backup user has the `backup` role |
| Document exceeds 16MB limit | Unbounded embedded array growth | Migrate to bucket pattern or referenced design; add app-level size guard |
This is 1 of 9 resources in the Database Admin Pro toolkit. Get the complete [MongoDB Operations Toolkit] with all files, templates, and documentation for $39.
Or grab the entire Database Admin Pro bundle (9 products) for $109 — save 30%.