MongoDB performance profiling is essential for identifying bottlenecks, slow queries, and resource utilization issues in production environments. This guide gives operations teams practical shell commands and profiler-based analysis techniques for comprehensive server monitoring.
Initial Setup and Basic Profiler Configuration
Enable Database Profiler
Start with basic profiler setup to capture slow operations:
// Check current profiling status
db.getProfilingStatus()
// Enable profiling for operations slower than 100ms
db.setProfilingLevel(1, { slowms: 100 })
// For comprehensive analysis (use cautiously in production)
db.setProfilingLevel(2, { sampleRate: 0.1 }) // Sample 10% of operations
Expected Output: Returns profiler configuration with current level, slowms threshold, and sample rate settings.
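For reference, the call returns a document shaped roughly like the following (values here are illustrative, and the exact shape can vary by server version):
// Representative db.getProfilingStatus() result
{ was: 1, slowms: 100, sampleRate: 1, ok: 1 }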
Essential Performance Monitoring Queries
1. Server Health Overview
// Get comprehensive server metrics
db.serverStatus()
Expected Output: Complete server statistics including connections, operation counters, memory usage, locks, and cache metrics. Key sections to monitor:
- connections: Current active connections vs. available
- opcounters: Operation counts (insert, query, update, delete)
- globalLock: Lock acquisition statistics
- wiredTiger.cache: Cache utilization and hit ratios
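To pull just those sections in mongosh, a minimal sketch (the cache field names assume a WiredTiger storage engine):
// Extract only the serverStatus sections called out above
const s = db.serverStatus()
printjson({
connections: s.connections,
opcounters: s.opcounters,
globalLock: s.globalLock,
cacheBytesUsed: s.wiredTiger.cache["bytes currently in the cache"],
cacheBytesMax: s.wiredTiger.cache["maximum bytes configured"]
})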
2. Real-Time Operation Statistics
// Collection-level performance data (top must run against the admin database)
db.adminCommand({ top: 1 })
Expected Output: Time spent and operation counts per collection, showing which collections are consuming the most resources.
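To rank collections by cumulative time straight from that output, a small mongosh sketch (the filter skips the non-namespace note field that top includes):
// Top 5 collections by total time spent (microseconds)
const totals = db.adminCommand({ top: 1 }).totals
Object.entries(totals)
.filter(([ns, v]) => v && v.total)
.sort(([, a], [, b]) => b.total.time - a.total.time)
.slice(0, 5)
.forEach(([ns, v]) => print(`${ns}: ${v.total.time} µs over ${v.total.count} ops`))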
Critical Query Analysis
3. Identify Slowest Operations (Last Hour)
// Top 10 slowest queries by execution time
db.system.profile.find({
"ts": { $gte: new Date(Date.now() - 3600000) }
}).sort({ "millis": -1 }).limit(10).pretty()
Expected Output: Documents showing operation type, namespace, execution time (millis), documents examined vs returned, and plan summary. Look for:
- millis > 1000: Operations taking over 1 second
- planSummary of "COLLSCAN": Collection scans requiring indexes (see the example index below)
- High docsExamined to nreturned ratios
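When a COLLSCAN shows up here, the usual remedy is an index matching the query's filter and sort. A sketch with a hypothetical collection and field names:
// Hypothetical compound index covering an equality filter plus a sort
db.orders.createIndex({ status: 1, createdAt: -1 })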
4. Resource-Intensive Operations
// High document examination queries
db.system.profile.find({
"docsExamined": { $gt: 10000 },
"nreturned": { $lt: 100 }
}).sort({ "millis": -1 }).limit(10).pretty()
Expected Output: Queries scanning many documents to return few results, indicating index inefficiency or missing compound indexes.
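To confirm a flagged query really lacks index support, re-run it with explain and compare documents examined to documents returned (hypothetical namespace and filter; substitute what the profiler reported):
// totalDocsExamined far above nReturned confirms the inefficiency
const plan = db.orders.find({ status: "pending", region: "EU" }).explain("executionStats")
const es = plan.executionStats
print(`examined ${es.totalDocsExamined}, returned ${es.nReturned}, ${es.executionTimeMillis} ms`)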
Aggregation Pipeline Performance Analysis
5. Expensive Aggregation Operations
// Slow aggregation pipelines with stage analysis
db.system.profile.find({
"op": "command",
"command.aggregate": { $exists: true },
"millis": { $gt: 1000 }
}).sort({ "millis": -1 }).limit(10).pretty()
Expected Output: Aggregation operations with pipeline details. Common issues include:
- Large $skip operations (pagination problems; a range-pagination sketch follows this list)
- Unoptimized $lookup stages
- Missing indexes for $match stages
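For the $skip case, range-based pagination keeps each page an index seek instead of a growing scan. A minimal sketch with hypothetical names:
// Page by remembering the last _id instead of skipping N documents
let lastSeenId = ObjectId("000000000000000000000000") // from the previous page
const page = db.events.aggregate([
{ $match: { _id: { $gt: lastSeenId } } },
{ $sort: { _id: 1 } },
{ $limit: 50 }
]).toArray()
if (page.length) lastSeenId = page[page.length - 1]._id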
Advanced Performance Analysis Framework
This section provides comprehensive MongoDB profiler queries for deep performance analysis, enabling operations teams to identify complex bottlenecks and optimization opportunities through sophisticated aggregation-based monitoring.
Complete Performance Health Check
// Master Performance Overview - Last 6 Hours
db.system.profile.aggregate([
{
$match: {
"ts": { $gte: new Date(Date.now() - 21600000) } // Last 6 hours
}
},
{
$facet: {
"criticalIssues": [
{
$match: {
$or: [
{ "millis": { $gt: 5000 } }, // > 5 seconds
{ "planSummary": "COLLSCAN" }, // Collection scans
{ "docsExamined": { $gt: 100000 } }, // High doc examination
{ "cpuNanos": { $gt: 5000000000 } } // > 5 CPU seconds
]
}
},
{ $sort: { "millis": -1 } },
{ $limit: 20 },
{
$project: {
ns: 1, op: 1, millis: 1, planSummary: 1,
docsExamined: 1, nreturned: 1,
efficiency: {
$cond: [
{ $eq: ["$nreturned", 0] },
0,
{ $divide: ["$docsExamined", "$nreturned"] }
]
},
cpuSeconds: { $divide: ["$cpuNanos", 1000000000] }
}
}
],
"performanceMetrics": [
{
$group: {
_id: null,
totalOperations: { $sum: 1 },
avgLatency: { $avg: "$millis" },
// The $percentile accumulator requires MongoDB 7.0+
p95Latency: { $percentile: { input: "$millis", p: [0.95], method: "approximate" } },
p99Latency: { $percentile: { input: "$millis", p: [0.99], method: "approximate" } },
maxLatency: { $max: "$millis" },
totalDocsExamined: { $sum: "$docsExamined" },
totalDocsReturned: { $sum: "$nreturned" },
collectionScans: {
$sum: {
$cond: [{ $eq: ["$planSummary", "COLLSCAN"] }, 1, 0]
}
},
slowQueries: {
$sum: {
$cond: [{ $gt: ["$millis", 1000] }, 1, 0]
}
}
}
},
{
$addFields: {
overallEfficiency: {
// Guard against division by zero when nothing was examined
$cond: [
{ $eq: ["$totalDocsExamined", 0] },
0,
{ $divide: ["$totalDocsReturned", "$totalDocsExamined"] }
]
},
slowQueryPercentage: {
$multiply: [
{ $divide: ["$slowQueries", "$totalOperations"] },
100
]
}
}
}
],
"resourceConsumption": [
{
$group: {
_id: null,
totalCpuTime: { $sum: "$cpuNanos" },
avgCpuTime: { $avg: "$cpuNanos" },
totalYields: { $sum: "$numYield" },
totalBytesRead: { $sum: "$storage.data.bytesRead" },
highCpuOps: {
$sum: {
$cond: [{ $gt: ["$cpuNanos", 1000000000] }, 1, 0] // > 1 CPU second
}
}
}
},
{
$addFields: {
totalCpuTimeSeconds: { $divide: ["$totalCpuTime", 1000000000] },
avgCpuTimeMs: { $divide: ["$avgCpuTime", 1000000] },
totalBytesReadMB: { $divide: ["$totalBytesRead", 1048576] }
}
}
]
}
}
]).pretty()
Expected Output: Three-part analysis providing:
- criticalIssues: Top 20 problematic operations with efficiency ratios and CPU consumption
- performanceMetrics: Overall system health including P95/P99 latencies and slow query percentage
- resourceConsumption: CPU time, memory pressure indicators, and I/O statistics
Operation Type Performance Breakdown
// Detailed Operation Analysis by Type
db.system.profile.aggregate([
{
$match: {
"ts": { $gte: new Date(Date.now() - 7200000) } // Last 2 hours
}
},
{
$group: {
_id: {
operation: "$op",
collection: { $arrayElemAt: [{ $split: ["$ns", "."] }, -1] }
},
count: { $sum: 1 },
totalTime: { $sum: "$millis" },
avgTime: { $avg: "$millis" },
maxTime: { $max: "$millis" },
minTime: { $min: "$millis" },
p95Time: { $percentile: { input: "$millis", p: [0.95], method: "approximate" } },
totalDocsExamined: { $sum: "$docsExamined" },
totalDocsReturned: { $sum: "$nreturned" },
collectionScans: {
$sum: {
$cond: [{ $eq: ["$planSummary", "COLLSCAN"] }, 1, 0]
}
},
indexScans: {
$sum: {
$cond: [{ $regexMatch: { input: "$planSummary", regex: "IXSCAN" } }, 1, 0]
}
}
}
},
{
$addFields: {
efficiency: {
// Guard on the divisor: docsExamined can be 0 (e.g. covered queries)
$cond: [
{ $eq: ["$totalDocsExamined", 0] },
0,
{ $divide: ["$totalDocsReturned", "$totalDocsExamined"] }
]
},
avgDocsPerQuery: { $divide: ["$totalDocsReturned", "$count"] },
collectionScanRate: { $divide: ["$collectionScans", "$count"] }
}
},
{ $sort: { "totalTime": -1 } },
{ $limit: 25 }
]).pretty()
Expected Output: Performance metrics grouped by operation type and collection, showing time distributions, scan types, and efficiency ratios. Identifies which collections and operations consume the most resources.
Critical Query Pattern Analysis
Query Shape Performance Analysis
// Query Pattern Analysis by Hash
db.system.profile.aggregate([
{
$match: {
"queryHash": { $exists: true },
"ts": { $gte: new Date(Date.now() - 10800000) } // Last 3 hours
}
},
{
$group: {
_id: {
queryHash: "$queryHash",
collection: "$ns",
operation: "$op"
},
executions: { $sum: 1 },
totalTime: { $sum: "$millis" },
avgTime: { $avg: "$millis" },
maxTime: { $max: "$millis" },
p95Time: { $percentile: { input: "$millis", p: [0.95], method: "approximate" } },
totalDocsExamined: { $sum: "$docsExamined" },
avgDocsExamined: { $avg: "$docsExamined" },
totalKeysExamined: { $sum: "$keysExamined" },
planVariations: { $addToSet: "$planSummary" },
sampleQuery: { $first: "$command" },
recentExecution: { $max: "$ts" }
}
},
{
$addFields: {
impactScore: {
$multiply: ["$executions", "$avgTime"] // Frequency × Average Time
},
avgEfficiency: {
$cond: [
{ $eq: ["$totalDocsExamined", 0] },
1,
{ $divide: ["$totalKeysExamined", "$totalDocsExamined"] }
]
},
isProblematic: {
$or: [
{ $gt: ["$avgTime", 1000] }, // Slow queries
{ $gt: ["$avgDocsExamined", 10000] }, // High doc examination
{ $in: ["COLLSCAN", "$planVariations"] }, // Collection scans
{ $gt: ["$executions", 1000] } // High frequency
]
}
}
},
{ $sort: { "impactScore": -1 } },
{ $limit: 20 }
]).pretty()
Expected Output: Query patterns ranked by impact score (frequency × execution time), identifying recurring slow queries with their execution statistics, plan variations, and sample queries for optimization.
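A problematic queryHash can then be cross-referenced against the plan cache of the collection involved (hypothetical collection and hash; $planCacheStats is available from MongoDB 4.2):
// Inspect cached plans for one query shape
db.orders.aggregate([
{ $planCacheStats: {} },
{ $match: { queryHash: "AB1C2D3E" } }
])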
Aggregation Pipeline Performance Deep Dive
// Comprehensive Aggregation Analysis
db.system.profile.aggregate([
{
$match: {
"op": "command",
"command.aggregate": { $exists: true },
"ts": { $gte: new Date(Date.now() - 7200000) }
}
},
{
$addFields: {
pipelineStages: {
$map: {
input: "$command.pipeline",
as: "stage",
in: { $arrayElemAt: [{ $objectToArray: "$$stage" }, 0] }
}
}
}
},
{
// Stage names begin with "$" and cannot be addressed with dot notation,
// so detect them via the extracted {k, v} pairs instead
$addFields: {
hasSkip: { $in: [{ $literal: "$skip" }, "$pipelineStages.k"] },
hasLookup: { $in: [{ $literal: "$lookup" }, "$pipelineStages.k"] },
hasSort: { $in: [{ $literal: "$sort" }, "$pipelineStages.k"] }
}
},
{
$group: {
_id: {
collection: "$ns",
stages: "$pipelineStages.k",
hasSkip: "$hasSkip",
hasLookup: "$hasLookup",
hasSort: "$hasSort"
},
count: { $sum: 1 },
avgTime: { $avg: "$millis" },
maxTime: { $max: "$millis" },
totalTime: { $sum: "$millis" },
avgDocsExamined: { $avg: "$docsExamined" },
avgKeysExamined: { $avg: "$keysExamined" },
avgYields: { $avg: "$numYield" },
samplePipeline: { $first: "$command.pipeline" }
}
},
{
$addFields: {
riskScore: {
$add: [
{ $cond: [{ $gt: ["$avgTime", 5000] }, 50, 0] },
{ $cond: [{ $eq: ["$_id.hasSkip", true] }, 30, 0] },
{ $cond: [{ $eq: ["$_id.hasLookup", true] }, 20, 0] },
{ $cond: [{ $gt: ["$avgDocsExamined", 50000] }, 40, 0] },
{ $cond: [{ $gt: ["$count", 100] }, 10, 0] }
]
}
}
},
{ $sort: { "riskScore": -1, "totalTime": -1 } },
{ $limit: 15 }
]).pretty()
Expected Output: Aggregation pipelines with risk scoring based on execution time, stage complexity, and resource usage. Identifies pipelines with large skips, expensive lookups, and high document examination rates.
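For pipelines flagged by the risk score, the highest-leverage fix is usually moving $match ahead of $lookup so the join runs over an already-filtered set. A hedged sketch with hypothetical collections and fields:
// Filter first, then join: $lookup now touches far fewer documents
db.orders.aggregate([
{ $match: { status: "open", createdAt: { $gte: ISODate("2024-01-01") } } },
{ $lookup: { from: "customers", localField: "customerId", foreignField: "_id", as: "customer" } },
{ $unwind: "$customer" }
])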
Resource Utilization and Bottleneck Analysis
System Resource Impact Assessment
// Complete Resource Utilization Analysis
db.system.profile.aggregate([
{
$match: {
"ts": { $gte: new Date(Date.now() - 14400000) } // Last 4 hours
}
},
{
$facet: {
"lockContention": [
{
$match: {
"locks.Global.acquireWaitCount": { $gt: 0 }
}
},
{
$group: {
_id: "$ns",
totalLockWaits: { $sum: "$locks.Global.acquireWaitCount" },
avgLockWaitTime: { $avg: "$locks.Global.timeAcquiringMicros" },
operationCount: { $sum: 1 },
avgExecutionTime: { $avg: "$millis" }
}
},
{ $sort: { "totalLockWaits": -1 } },
{ $limit: 10 }
],
"memoryPressure": [
{
$match: {
"numYield": { $gt: 10 }
}
},
{
$group: {
_id: {
collection: "$ns",
operation: "$op"
},
totalYields: { $sum: "$numYield" },
avgYields: { $avg: "$numYield" },
maxYields: { $max: "$numYield" },
operationCount: { $sum: 1 },
avgTime: { $avg: "$millis" }
}
},
{ $sort: { "totalYields": -1 } },
{ $limit: 10 }
],
"ioIntensive": [
{
$match: {
"storage.data.bytesRead": { $gt: 1000000 } // > 1MB
}
},
{
$group: {
_id: "$ns",
totalBytesRead: { $sum: "$storage.data.bytesRead" },
avgBytesRead: { $avg: "$storage.data.bytesRead" },
totalReadTime: { $sum: "$storage.data.timeReadingMicros" },
operationCount: { $sum: 1 }
}
},
{
$addFields: {
totalBytesReadMB: { $divide: ["$totalBytesRead", 1048576] },
avgBytesReadMB: { $divide: ["$avgBytesRead", 1048576] },
avgReadTimeMs: { $divide: ["$totalReadTime", { $multiply: ["$operationCount", 1000] }] }
}
},
{ $sort: { "totalBytesRead": -1 } },
{ $limit: 10 }
]
}
}
]).pretty()
Expected Output: Three-dimensional resource analysis showing collections with lock contention, memory pressure through high yield counts, and I/O intensive operations with read time breakdowns.
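High yield counts in the memoryPressure facet can be corroborated against cache occupancy from serverStatus (field names assume WiredTiger; sustained occupancy near 100% suggests the working set exceeds the cache):
// Cache occupancy ratio from serverStatus
const cache = db.serverStatus().wiredTiger.cache
const ratio = cache["bytes currently in the cache"] / cache["maximum bytes configured"]
print(`cache occupancy: ${(ratio * 100).toFixed(1)}%`)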
Index Usage and Optimization Opportunities
// Index Effectiveness Analysis
db.system.profile.aggregate([
{
$match: {
"ts": { $gte: new Date(Date.now() - 7200000) },
"keysExamined": { $exists: true }
}
},
{
$addFields: {
indexEfficiency: {
$cond: [
{ $eq: ["$nreturned", 0] },
0,
{ $divide: ["$nreturned", { $add: ["$keysExamined", 1] }] }
]
},
scanEfficiency: {
$cond: [
{ $eq: ["$nreturned", 0] },
0,
{ $divide: ["$nreturned", { $add: ["$docsExamined", 1] }] }
]
},
indexType: {
$cond: [
{ $eq: ["$planSummary", "COLLSCAN"] },
"COLLECTION_SCAN",
{
$cond: [
{ $regexMatch: { input: "$planSummary", regex: "IXSCAN" } },
"INDEX_SCAN",
"OTHER"
]
}
]
}
}
},
{
$group: {
_id: {
collection: "$ns",
indexType: "$indexType",
planSummary: "$planSummary"
},
operationCount: { $sum: 1 },
avgExecutionTime: { $avg: "$millis" },
totalExecutionTime: { $sum: "$millis" },
avgKeysExamined: { $avg: "$keysExamined" },
avgDocsExamined: { $avg: "$docsExamined" },
avgDocsReturned: { $avg: "$nreturned" },
avgIndexEfficiency: { $avg: "$indexEfficiency" },
avgScanEfficiency: { $avg: "$scanEfficiency" },
worstCase: { $max: "$millis" }
}
},
{
$addFields: {
optimizationPriority: {
$multiply: [
"$operationCount",
"$avgExecutionTime",
{
$cond: [
{ $eq: ["$_id.indexType", "COLLECTION_SCAN"] },
10, // High priority for collection scans
{
$cond: [
{ $lt: ["$avgIndexEfficiency", 0.1] }, // Low efficiency
5,
1
]
}
]
}
]
}
}
},
{ $sort: { "optimizationPriority": -1 } },
{ $limit: 20 }
]).pretty()
Expected Output: Index utilization analysis with optimization priority scores, identifying collections needing new indexes or compound index refinements based on efficiency metrics.
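The profiler view above pairs well with $indexStats, which catches the opposite problem: indexes that are never used and only add write overhead (hypothetical collection name):
// Indexes with zero recorded accesses since the stats were last reset
db.orders.aggregate([
{ $indexStats: {} },
{ $match: { "accesses.ops": 0 } },
{ $project: { name: 1, "accesses.ops": 1, "accesses.since": 1 } }
])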
Time-Series Trend Analysis
Performance Degradation Detection
// Hourly Performance Trends (Last 24 Hours)
db.system.profile.aggregate([
{
$match: {
"ts": { $gte: new Date(Date.now() - 86400000) } // Last 24 hours
}
},
{
$addFields: {
hour: {
$dateToString: {
format: "%Y-%m-%d %H:00",
date: "$ts"
}
}
}
},
{
$group: {
_id: {
hour: "$hour",
collection: "$ns"
},
operationCount: { $sum: 1 },
avgLatency: { $avg: "$millis" },
maxLatency: { $max: "$millis" },
p95Latency: { $percentile: { input: "$millis", p: [0.95], method: "approximate" } },
slowOperations: {
$sum: {
$cond: [{ $gt: ["$millis", 1000] }, 1, 0]
}
},
collectionScans: {
$sum: {
$cond: [{ $eq: ["$planSummary", "COLLSCAN"] }, 1, 0]
}
}
}
},
{
$addFields: {
slowOperationRate: {
$divide: ["$slowOperations", "$operationCount"]
},
collectionScanRate: {
$divide: ["$collectionScans", "$operationCount"]
}
}
},
{
$sort: { "_id.hour": 1, "operationCount": -1 }
}
]).pretty()
Expected Output: Hourly performance trends showing latency progression, operation volumes, and degradation patterns across collections, enabling identification of performance regressions over time.
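For plotting outside the shell, the hourly buckets can be dumped as CSV (a minimal sketch that aggregates across all collections rather than per namespace):
// Compact hourly latency table, one row per hour
print("hour,ops,avgMs")
db.system.profile.aggregate([
{ $match: { ts: { $gte: new Date(Date.now() - 86400000) } } },
{ $group: {
_id: { $dateToString: { format: "%Y-%m-%d %H:00", date: "$ts" } },
ops: { $sum: 1 },
avgMs: { $avg: "$millis" }
} },
{ $sort: { _id: 1 } }
]).forEach(d => print(`${d._id},${d.ops},${d.avgMs.toFixed(1)}`))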
Actionable Optimization Report
Priority Optimization Recommendations
// Generate Actionable Optimization Report
db.system.profile.aggregate([
{
$match: {
"ts": { $gte: new Date(Date.now() - 10800000) } // Last 3 hours
}
},
{
$facet: {
"immediateActions": [
{
$match: {
$or: [
{ "millis": { $gt: 10000 } }, // > 10 seconds
{ "planSummary": "COLLSCAN", "millis": { $gt: 100 } },
{ "docsExamined": { $gt: 500000 } } // > 500K docs
]
}
},
{
$group: {
_id: {
collection: "$ns",
queryShape: "$queryHash",
issue: {
$cond: [
{ $eq: ["$planSummary", "COLLSCAN"] },
"MISSING_INDEX",
{
$cond: [
{ $gt: ["$millis", 10000] },
"EXTREMELY_SLOW",
"HIGH_DOCUMENT_SCAN"
]
}
]
}
},
occurrences: { $sum: 1 },
avgTime: { $avg: "$millis" },
totalImpact: { $sum: "$millis" },
sampleQuery: { $first: "$command" }
}
},
{ $sort: { "totalImpact": -1 } },
{ $limit: 10 }
],
"mediumPriorityActions": [
{
$match: {
$and: [
{ "millis": { $gt: 1000, $lte: 10000 } },
{ $expr: { $gt: [{ $divide: ["$docsExamined", { $add: ["$nreturned", 1] }] }, 10] } }
]
}
},
{
$group: {
_id: "$ns",
avgInefficiency: {
$avg: { $divide: ["$docsExamined", { $add: ["$nreturned", 1] }] }
},
operationCount: { $sum: 1 },
totalTime: { $sum: "$millis" }
}
},
{ $sort: { "totalTime": -1 } },
{ $limit: 10 }
],
"aggregationOptimizations": [
{
$match: {
"op": "command",
"command.aggregate": { $exists: true },
"millis": { $gt: 2000 }
}
},
{
$project: {
ns: 1,
millis: 1,
pipeline: "$command.pipeline",
recommendation: "REVIEW_AGGREGATION_PERFORMANCE"
}
},
{ $sort: { "millis": -1 } },
{ $limit: 10 }
]
}
}
]).pretty()
Expected Output: Three-tiered optimization report with immediate actions (critical issues), medium priority items (inefficient queries), and specific aggregation recommendations with actionable remediation strategies.
This profiling approach provides operations teams with actionable insights to identify bottlenecks, optimize queries, and maintain optimal MongoDB performance in production environments.