Darian Vance

Posted on • Originally published at wp.me

Solved: Monitor MongoDB Slow Queries and Pipe to Discord Channel

🚀 Executive Summary

TL;DR: This guide outlines how to proactively monitor MongoDB slow queries by configuring the database profiler and developing a Python script. It automates the detection of performance bottlenecks and delivers real-time alerts directly to a Discord channel, enabling rapid diagnosis and resolution.

🎯 Key Takeaways

  • MongoDB’s database profiler (level 1 with slowOpThresholdMs) logs slow operations to the system.profile collection.
  • A Python script utilizing pymongo and requests can query the system.profile collection for new slow queries and send formatted alerts to a Discord webhook.
  • The script persists the last_processed_timestamp in a small JSON state file so each run processes only new entries and never sends duplicate alerts.
  • Schedule the script with cron (e.g., */5 * * * *) for continuous, automated monitoring.
  • Careful consideration of slowOpThresholdMs is crucial to balance capturing meaningful slow queries against potential profiling overhead and Discord webhook rate limits.


Introduction

In the dynamic world of modern applications, database performance is paramount. Unoptimized or “slow” queries in MongoDB can quickly degrade user experience, exhaust server resources, and lead to cascading failures. Manually sifting through database logs or system profiles to identify these culprits can be a time-consuming and reactive process.

At TechResolve, we advocate for proactive monitoring and rapid response. This tutorial provides a comprehensive, step-by-step guide to automate the detection of slow MongoDB queries and deliver real-time alerts directly to your team’s Discord channel. By integrating these insights into your communication workflow, you empower your SysAdmins, Developers, and DevOps Engineers to identify and resolve performance bottlenecks swiftly, ensuring your applications remain responsive and reliable.

Prerequisites

Before diving into the setup, ensure you have the following:

  • A running MongoDB instance (version 4.0 or newer recommended, though the profiler approach works with older versions).
  • Python 3 installed on the system that will run the monitoring script.
  • Python pymongo and requests libraries installed. You can install them using pip:
  pip install pymongo requests
  • Access to a Discord server and a configured webhook URL for the channel where you want to receive alerts. Instructions for creating a webhook can be found in Discord’s documentation.
  • Sufficient permissions to enable profiling on your MongoDB instance and to schedule tasks on the host system.

Step-by-Step Guide

Step 1: Configure MongoDB Profiling

MongoDB’s database profiler logs information about database operations, including slow queries. You can set the profiling level to capture operations exceeding a specified threshold.

Connect to your MongoDB instance using the mongo shell or your preferred client. Once connected, select the database you wish to monitor and set the profiling level. For this example, let’s assume your database is named mydb.

The profiling levels are:

  • 0: Off – No operations are profiled.
  • 1: On – Only slow operations are profiled.
  • 2: All – All operations are profiled.

The slow-operation threshold defines the duration in milliseconds an operation must exceed to be considered “slow”. It is named slowOpThresholdMs in the mongod configuration file, while the shell helper calls it slowms.

use mydb
db.setProfilingLevel(1, { slowms: 100 })

In this command:

  • use mydb: Switches to your target database.
  • db.setProfilingLevel(1, { slowms: 100 }): Sets the profiling level to 1 (slow operations only) and the slow-operation threshold to 100 milliseconds. Any operation taking longer than 100 ms is logged to the system.profile collection within mydb.

For production environments, carefully consider the slowOpThresholdMs value and potentially start with a higher threshold to avoid excessive logging, then adjust as needed. Profiling level 2 can introduce significant overhead and is generally not recommended for continuous production use.
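Before wiring up alerting, it is worth confirming in the shell that profiling is actually on and that entries are landing in system.profile (a quick mongosh check; output shapes vary slightly by server version):

```
// Report the current profiling level and slow-op threshold for this database
db.getProfilingStatus()

// Peek at the five most recent profiler entries
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()
```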

Step 2: Develop a Python Script to Detect Slow Queries

Next, we’ll create a Python script that connects to MongoDB, queries the system.profile collection for new slow operations, formats the relevant details, and prepares them for sending to Discord. To avoid sending duplicate alerts, the script will keep track of the timestamp of the last processed slow query.

Create a file named monitor_mongodb.py with the following content:

import pymongo
import requests
import json
import os
from datetime import datetime, timezone, timedelta

# --- Configuration ---
MONGO_URI = "mongodb://localhost:27017/"
MONGO_DB_NAME = "mydb"
DISCORD_WEBHOOK_URL = "YOUR_DISCORD_WEBHOOK_URL_HERE"
STATE_FILE_PATH = os.path.expanduser("~/mongodb_slow_query_monitor_state.json") # State file location
THRESHOLD_SECONDS = 300 # Only process operations within the last 5 minutes on each run if state is too old

def get_last_processed_timestamp():
    """Retrieves the timestamp of the last processed slow query from the state file."""
    if os.path.exists(STATE_FILE_PATH):
        try:
            with open(STATE_FILE_PATH, 'r') as f:
                state = json.load(f)
                # Ensure the timestamp is timezone-aware and in UTC
                dt_obj = datetime.fromisoformat(state['last_timestamp'])
                if dt_obj.tzinfo is None: # Naive datetime, assume UTC
                    return dt_obj.replace(tzinfo=timezone.utc)
                return dt_obj.astimezone(timezone.utc)
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            print("Warning: Could not read or parse state file. Starting from scratch or threshold.")
            return datetime.now(timezone.utc) - timedelta(seconds=THRESHOLD_SECONDS) # Start from a recent window
    return datetime.now(timezone.utc) - timedelta(seconds=THRESHOLD_SECONDS) # Start from a recent window

def set_last_processed_timestamp(timestamp):
    """Saves the timestamp of the latest processed slow query to the state file."""
    # Ensure timestamp is UTC and has timezone info before saving
    if timestamp.tzinfo is None:
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    else:
        timestamp = timestamp.astimezone(timezone.utc)

    with open(STATE_FILE_PATH, 'w') as f:
        json.dump({'last_timestamp': timestamp.isoformat()}, f)

def send_to_discord(message_content, embed_data=None):
    """Sends a message to the Discord channel via webhook."""
    payload = {'content': message_content}
    if embed_data:
        payload['embeds'] = [embed_data]

    response = None
    try:
        # requests sets the Content-Type header itself when json= is used
        response = requests.post(DISCORD_WEBHOOK_URL, json=payload, timeout=10)
        response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)
        print("Message sent to Discord successfully.")
    except requests.exceptions.RequestException as e:
        print(f"Error sending message to Discord: {e}")
        if response is not None:
            print(f"Discord API response status: {response.status_code}")


def main():
    client = None
    try:
        client = pymongo.MongoClient(MONGO_URI)
        db = client[MONGO_DB_NAME]

        last_timestamp = get_last_processed_timestamp()

        # Adjust last_timestamp to be within a reasonable window for initial runs or if state is lost
        # This prevents processing extremely old logs if the script was down for a long time
        now_utc = datetime.now(timezone.utc)
        if (now_utc - last_timestamp).total_seconds() > THRESHOLD_SECONDS * 2: # If state is too old (e.g., more than 10 mins)
            last_timestamp = now_utc - timedelta(seconds=THRESHOLD_SECONDS)

        # Query the system.profile collection for new slow operations
        # The ts field is the timestamp of the operation
        # We look for operations newer than our last_timestamp
        # Sort by ts to process chronologically and update last_timestamp correctly
        print(f"Checking for new slow queries since: {last_timestamp.isoformat()}")
        cursor = db.system.profile.find(
            {'ts': {'$gt': last_timestamp}}
        ).sort('ts', pymongo.ASCENDING)

        new_slow_queries = []
        latest_ts_in_run = last_timestamp

        for profile_entry in cursor:
            # Ensure profile_entry['ts'] is timezone-aware for comparison
            entry_ts = profile_entry['ts']
            if entry_ts.tzinfo is None:
                entry_ts = entry_ts.replace(tzinfo=timezone.utc)

            new_slow_queries.append(profile_entry)
            if entry_ts > latest_ts_in_run:
                latest_ts_in_run = entry_ts

        if not new_slow_queries:
            print("No new slow queries detected.")
            # If no new queries, just update the timestamp to now to prevent checking ancient history on next run
            set_last_processed_timestamp(now_utc) 
            return

        print(f"Detected {len(new_slow_queries)} new slow queries.")
        for query_data in new_slow_queries:
            op_type = query_data.get('op', 'N/A')
            namespace = query_data.get('ns', 'N/A')
            duration_ms = query_data.get('millis', 'N/A')

            # Prefer the 'filter' sub-document when present; otherwise fall
            # back to the whole 'query' or 'command' document
            query_info = query_data.get('query', query_data.get('command', {}))
            if isinstance(query_info, dict):
                query_filter = query_info.get('filter', query_info)
            else:
                query_filter = query_info

            client_ip = query_data.get('client', 'N/A')

            # Truncate so the embed field stays under Discord's 1024-character limit
            formatted_query_filter = json.dumps(query_filter, indent=2, default=str)
            if len(formatted_query_filter) > 900:
                formatted_query_filter = formatted_query_filter[:900] + "\n... (truncated)"

            # Ensure query_data['ts'] is UTC for Discord timestamp formatting
            discord_timestamp = query_data['ts']
            if discord_timestamp.tzinfo is None:
                discord_timestamp = discord_timestamp.replace(tzinfo=timezone.utc)
            else:
                discord_timestamp = discord_timestamp.astimezone(timezone.utc)

            embed = {
                "title": f"🚨 Slow Query Detected in {namespace}",
                "description": f"Operation Type: `{op_type}`",
                "color": 15844367, # A nice red/orange color for alerts
                "fields": [
                    {"name": "Duration", "value": f"{duration_ms} ms", "inline": True},
                    {"name": "Client IP", "value": f"`{client_ip}`", "inline": True},
                    {"name": "Timestamp", "value": f"<t:{int(discord_timestamp.timestamp())}:F>", "inline": False},
                    {"name": "Query Filter/Command", "value": f"```json\n{formatted_query_filter}\n```", "inline": False}
                ],
                "footer": {"text": "MongoDB Slow Query Monitor by TechResolve"}
            }
            send_to_discord(f"**Attention!** A slow MongoDB query was detected.", embed_data=embed)

        set_last_processed_timestamp(latest_ts_in_run)
        print(f"Updated last processed timestamp to: {latest_ts_in_run.isoformat()}")

    except pymongo.errors.ConnectionFailure as e:
        print(f"MongoDB Connection Error: {e}")
        send_to_discord(f"**ERROR:** MongoDB connection failed for slow query monitor. Details: `{e}`")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        send_to_discord(f"**ERROR:** An unexpected error occurred in slow query monitor. Details: `{e}`")
    finally:
        if client:
            client.close()

if __name__ == "__main__":
    main()

Code Logic Explained:

  • Configuration: Define your MongoDB connection URI, database name, Discord webhook URL, and the path for the state file (mongodb_slow_query_monitor_state.json in your user’s home directory).
  • State Management: The get_last_processed_timestamp and set_last_processed_timestamp functions handle reading and writing the timestamp of the last alert sent. This prevents duplicate alerts and ensures only new slow queries are processed on each run. It defaults to a recent window if no state file is found or if the state is too old.
  • Discord Integration: The send_to_discord function constructs a JSON payload with the alert message and an embedded object for better formatting in Discord, then sends it via a POST request to your webhook URL. It includes error handling for network issues.
  • MongoDB Query: The script connects to MongoDB and queries db.system.profile. It filters for entries where the ts (timestamp) field is greater than the last_timestamp from the state file. Results are sorted chronologically.
  • Alert Formatting: For each new slow query, it extracts key details such as the operation type (op), namespace (ns), duration (millis), and the query filter/command. These are formatted into a rich Discord embed, with the filter rendered as a fenced JSON code block and truncated to keep the field within Discord’s embed-field size limit.
  • Error Handling: Basic error handling for MongoDB connection failures and other unexpected exceptions is included, with alerts sent to Discord for critical issues.
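One hardening worth considering for the state handling above: if the script dies mid-write, json.dump can leave a truncated state file that fails to parse on the next run. Writing to a temporary file and renaming it into place avoids this; a minimal sketch (the helper name is my own):

```python
import json
import os
import tempfile

def save_state_atomically(path, payload):
    """Write JSON to a temp file in the target directory, then atomically
    rename it into place, so readers never see a half-written file."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    os.replace(tmp_path, path)  # atomic on POSIX and Windows
```

Swapping this in for the plain open(..., 'w') in set_last_processed_timestamp is a one-line change.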

Remember to replace "YOUR_DISCORD_WEBHOOK_URL_HERE" and adjust MONGO_URI and MONGO_DB_NAME to match your environment.
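The naive-vs-aware datetime normalization appears three times in the script and can be factored into one helper. A self-contained sketch (the function name is my own; note that pymongo returns naive UTC datetimes unless the client is created with tz_aware=True):

```python
from datetime import datetime, timezone

def to_discord_timestamp(dt):
    """Normalize a possibly-naive datetime to UTC and render it as a
    Discord timestamp token; <t:unix:F> displays in each reader's locale."""
    if dt.tzinfo is None:  # naive values from pymongo are already UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return f"<t:{int(dt.astimezone(timezone.utc).timestamp())}:F>"

print(to_discord_timestamp(datetime(2024, 1, 1, 12, 0)))  # <t:1704110400:F>
```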

Step 3: Schedule the Script for Continuous Monitoring

To ensure continuous monitoring, you need to schedule your Python script to run at regular intervals. cron is a standard utility on Unix-like systems for this purpose.

First, optionally make your Python script executable (not strictly required here, since the cron entry invokes it with python3):

chmod +x monitor_mongodb.py

Next, open your cron editor to add a new job:

crontab -e

Add the following line to schedule the script to run every 5 minutes. Make sure to provide the full path to your monitor_mongodb.py script.

*/5 * * * * python3 /home/user/monitor_mongodb.py

Cron Job Logic Explained:

  • */5 * * * *: This cron expression means “at every 5th minute”. So, the script will execute at 0, 5, 10, …, 55 minutes past the hour, every hour, every day.
  • python3: This invokes the Python 3 interpreter.
  • /home/user/monitor_mongodb.py: This is the absolute path to your monitoring script. Adjust /home/user/ to the actual directory where you’ve saved your script.

Save and exit your cron editor. The script will now run automatically, fetching new slow queries and dispatching alerts to Discord.
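By default, cron mails or discards the script’s stdout/stderr, so the print statements in the script vanish. Redirecting output to a log file makes troubleshooting much easier; an illustrative variant of the entry above (the log path is an assumption, pick one your user can write to):

```
*/5 * * * * python3 /home/user/monitor_mongodb.py >> /home/user/mongodb_monitor.log 2>&1
```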

Common Pitfalls

  1. ### MongoDB Profiling Overhead and Configuration

Setting db.setProfilingLevel(2) (profiling all operations) can introduce significant performance overhead, especially on high-traffic databases. Even db.setProfilingLevel(1) with a very low slowOpThresholdMs (e.g., 0ms) can generate a large volume of logs, potentially impacting disk I/O and the size of the system.profile collection. Ensure your slowOpThresholdMs is set appropriately for your production environment to capture meaningful slow queries without overwhelming your system or your Discord channel with alerts. Consider starting with a higher threshold (e.g., 500ms or 1000ms) and gradually reducing it as you optimize queries.
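Related to log volume: the profiler writes to system.profile, which is a capped collection of 1 MB by default, so older entries are silently overwritten. If you need more history between monitoring runs, you can recreate it larger; profiling must be off while you do so (a mongosh sketch, the 8 MB size is illustrative, and slowms is the shell helper’s name for the threshold):

```
db.setProfilingLevel(0)                 // profiling must be disabled first
db.system.profile.drop()
db.createCollection("system.profile", { capped: true, size: 8 * 1024 * 1024 })
db.setProfilingLevel(1, { slowms: 100 })  // re-enable with your threshold
```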

  2. ### Discord Rate Limits and Alert Volume

Discord webhooks have rate limits. If your MongoDB instance is experiencing a high volume of slow queries, your script might attempt to send too many messages too quickly, leading to HTTP 429 “Too Many Requests” errors from Discord. This can result in missed alerts. To mitigate this, you could:

  • Increase your slowOpThresholdMs in MongoDB to reduce the number of slow queries logged.
  • Implement batching or short pauses in the Python script, so a burst of slow queries results in a few webhook calls rather than one request per query.
  • Adjust the cron schedule to run less frequently if the volume is consistently high.
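A single webhook message can carry up to 10 embeds, which makes batching the simplest mitigation: collect the embeds for one run and post them in groups. A minimal grouping helper (the function name is my own; the send call then passes a list of embeds instead of one):

```python
def chunk_embeds(embeds, batch_size=10):
    """Split a list of Discord embeds into batches of at most 10,
    the per-message embed limit for webhook payloads."""
    return [embeds[i:i + batch_size] for i in range(0, len(embeds), batch_size)]

# 23 pending alerts become 3 webhook calls instead of 23
batches = chunk_embeds([{"title": f"Slow query #{n}"} for n in range(23)])
print([len(b) for b in batches])  # [10, 10, 3]
```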

Conclusion

Automating the monitoring of MongoDB slow queries and piping them to a dedicated Discord channel significantly enhances your team’s ability to maintain application performance. This proactive approach ensures that performance bottlenecks are identified and brought to attention swiftly, fostering a culture of rapid diagnosis and resolution. By following this guide, you’ve established a robust system that transforms potential silent failures into actionable insights, keeping your MongoDB healthy and your applications performant. Remember to continuously fine-tune your profiling thresholds and explore further integrations to adapt this solution to your evolving operational needs.



👉 Read the original article on TechResolve.blog


Support my work

If this article helped you, you can buy me a coffee:

👉 https://buymeacoffee.com/darianvance
