Darian Vance

Posted on • Originally published at wp.me

Solved: Monitor Redis Memory Usage and Alert via PagerDuty API

🚀 Executive Summary

TL;DR: Unchecked Redis memory usage can lead to critical performance degradation and service outages. This guide provides a Python script that reads Redis memory usage, compares it against a predefined threshold, and triggers PagerDuty alerts via the Events API v2 when the threshold is exceeded, so your team can respond before an outage.

🎯 Key Takeaways

  • Proactive monitoring of Redis memory usage is crucial for maintaining system reliability and preventing performance degradation.
  • A Python script utilizes the redis-py library to fetch Redis INFO memory metrics and the requests library to send incident triggers to the PagerDuty Events API v2.
  • Configuration details such as Redis connection parameters, memory usage threshold, and PagerDuty integration key are securely managed via environment variables.
  • The monitoring script calculates memory usage percentage based on used_memory and maxmemory, triggering a ‘warning’ severity PagerDuty event if the threshold is breached.
  • Automation via cron ensures the Python script runs at regular intervals, providing continuous monitoring and timely alerts for Redis memory issues.
  • For effective percentage-based monitoring, the Redis instance must have the maxmemory directive explicitly configured.

Monitor Redis Memory Usage and Alert via PagerDuty API

Introduction

In the dynamic world of microservices and real-time applications, Redis stands out as a high-performance in-memory data store. Its speed and versatility are crucial for caching, session management, message brokering, and more. However, unchecked memory consumption in Redis can lead to critical performance degradation, data eviction, and even service outages, directly impacting user experience and business operations. Proactive monitoring and timely alerting are not just best practices; they are essential for maintaining the health and reliability of your Redis instances.

This comprehensive guide from TechResolve will walk you through setting up a robust system to monitor your Redis memory usage and automatically trigger alerts via the PagerDuty API when predefined thresholds are breached. By integrating Redis monitoring with PagerDuty, you ensure that your operations team is immediately notified of potential issues, enabling rapid response and minimizing downtime.

Prerequisites

Before you begin, ensure you have the following:

  • A running Redis server instance that you wish to monitor.
  • Python 3 installed on your monitoring host, along with its package manager, pip.
  • An active PagerDuty account with administrative access to create services and integration keys.
  • Basic familiarity with the Linux/Unix command line.

Step-by-Step Guide

Step 1: Configure PagerDuty Service and Integration

To receive alerts, PagerDuty needs to know where they’re coming from. We’ll set up a dedicated service and an Events API v2 integration.

  1. Log in to your PagerDuty account.
  2. Navigate to Services > Service Directory.
  3. Click + New Service or select an existing service where you want to add the integration.
  4. If creating a new service:
    • Provide a Name (e.g., “Redis Monitoring”).
    • Assign an Escalation Policy and Alert Grouping as appropriate for your team.
    • Click Create Service.
  5. Once your service is created or selected, go to the Integrations tab within that service.
  6. Click + Add an integration.
  7. Search for and select Events API v2 as the Integration Type.
  8. Provide an Integration Name (e.g., “Redis Memory Alerts”).
  9. Click Add Integration.
  10. After creation, you will see your new integration listed. Copy the Integration Key. This key is crucial for our script to send events to PagerDuty. Keep it secure.
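
Before wiring the key into the monitor, it can help to confirm that the integration accepts events. The following is a minimal sketch, not part of the monitoring script: the function names build_test_event and send_test_event are illustrative, and the key is assumed to be exported as PAGERDUTY_INTEGRATION_KEY (the same variable the script in Step 3 reads). It uses only the standard library, so it runs before the packages in Step 2 are installed. Note that a successful send opens a real alert on the service, which you would then resolve in the PagerDuty UI.

```python
import json
import os
import urllib.request

PAGERDUTY_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_test_event(routing_key, source="redis-monitor-test"):
    """Return a minimal Events API v2 trigger payload for a test alert."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": "Test event from the Redis memory monitor setup",
            "source": source,
            "severity": "info",
        },
    }


def send_test_event(routing_key, timeout=10):
    """POST the test event and return the HTTP status code (202 means accepted)."""
    body = json.dumps(build_test_event(routing_key)).encode("utf-8")
    request = urllib.request.Request(
        PAGERDUTY_API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    key = os.getenv("PAGERDUTY_INTEGRATION_KEY")
    if key:
        print("PagerDuty responded with HTTP", send_test_event(key))
    else:
        print("Set PAGERDUTY_INTEGRATION_KEY to send a test event.")
```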

Step 2: Install Required Python Libraries

Our monitoring script will rely on the redis-py library to interact with Redis and the requests library to communicate with the PagerDuty API. Install them using pip:

pip install redis requests

This command fetches and installs the necessary Python packages, making them available for your script.
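
To confirm the installation landed in the interpreter you intend to use, a quick check like the sketch below may help. The helper name missing_packages is illustrative; it only looks the modules up and does not import them. Running it with the same python3 that cron will later invoke is a reasonable way to catch a mismatch between interpreters early.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["redis", "requests"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("redis and requests are importable.")
```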

Step 3: Develop the Redis Monitoring Script

Now, let’s create the Python script that will connect to Redis, fetch memory metrics, evaluate against a threshold, and trigger a PagerDuty incident if necessary.

Create a file named monitor_redis_memory.py:

# Python Script to Monitor Redis Memory and Alert via PagerDuty

import redis
import requests
import os
import json

# --- Configuration ---
REDIS_HOST = os.getenv('REDIS_HOST', 'localhost')
REDIS_PORT = int(os.getenv('REDIS_PORT', 6379))
REDIS_PASSWORD = os.getenv('REDIS_PASSWORD', None) # Set to None if no password
MEMORY_USAGE_THRESHOLD_PERCENT = float(os.getenv('MEMORY_USAGE_THRESHOLD_PERCENT', 80.0)) # %
PAGERDUTY_INTEGRATION_KEY = os.getenv('PAGERDUTY_INTEGRATION_KEY', 'YOUR_PAGERDUTY_INTEGRATION_KEY_HERE')
PAGERDUTY_API_URL = "https://events.pagerduty.com/v2/enqueue"

# --- Redis Connection ---
def get_redis_client():
    try:
        r = redis.StrictRedis(host=REDIS_HOST, port=REDIS_PORT, password=REDIS_PASSWORD, decode_responses=True)
        r.ping()
        return r
    except redis.exceptions.ConnectionError as e:
        print(f"Error connecting to Redis: {e}")
        return None

# --- PagerDuty Incident Trigger ---
def trigger_pagerduty_event(severity, summary, source, custom_details=None):
    if not PAGERDUTY_INTEGRATION_KEY or PAGERDUTY_INTEGRATION_KEY == 'YOUR_PAGERDUTY_INTEGRATION_KEY_HERE':
        print("PagerDuty integration key not set. Cannot send alert.")
        return

    payload = {
        "routing_key": PAGERDUTY_INTEGRATION_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
            "component": "Redis",
            "group": "Memory Monitoring",
            "class": "Performance"
        }
    }
    if custom_details:
        payload["payload"]["custom_details"] = custom_details

    headers = {
        "Content-Type": "application/json"
    }

    try:
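        # The timeout makes the script fail fast instead of hanging if PagerDuty is unreachable.
        response = requests.post(PAGERDUTY_API_URL, headers=headers, data=json.dumps(payload), timeout=10)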
        response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)
        print(f"PagerDuty event sent successfully. Status: {response.status_code}, Response: {response.json()}")
    except requests.exceptions.RequestException as e:
        print(f"Error sending PagerDuty event: {e}")

# --- Main Logic ---
def main():
    r = get_redis_client()
    if not r:
        trigger_pagerduty_event(
            severity="critical",
            summary=f"Redis Connection Failed on {REDIS_HOST}:{REDIS_PORT}",
            source="Redis Monitor Script"
        )
        raise SystemExit(1)  # exit() comes from the site module and is meant for interactive use

    try:
        info = r.info('memory')
        used_memory = info.get('used_memory', 0)
        used_memory_rss = info.get('used_memory_rss', 0)
        maxmemory = info.get('maxmemory', 0) # 0 if no maxmemory is set

        print(f"Redis Used Memory: {used_memory / (1024*1024):.2f} MB")
        print(f"Redis Used Memory RSS: {used_memory_rss / (1024*1024):.2f} MB")

        if maxmemory > 0:
            memory_usage_percentage = (used_memory / maxmemory) * 100
            print(f"Redis Max Memory: {maxmemory / (1024*1024):.2f} MB")
            print(f"Redis Memory Usage Percentage: {memory_usage_percentage:.2f}% (Threshold: {MEMORY_USAGE_THRESHOLD_PERCENT:.2f}%)")

            if memory_usage_percentage > MEMORY_USAGE_THRESHOLD_PERCENT:
                summary = f"HIGH Redis Memory Usage on {REDIS_HOST}: {memory_usage_percentage:.2f}% exceeds {MEMORY_USAGE_THRESHOLD_PERCENT:.2f}%"
                details = {
                    "used_memory_mb": used_memory / (1024*1024),
                    "used_memory_rss_mb": used_memory_rss / (1024*1024),
                    "maxmemory_mb": maxmemory / (1024*1024),
                    "current_percentage": f"{memory_usage_percentage:.2f}%",
                    "threshold_percentage": f"{MEMORY_USAGE_THRESHOLD_PERCENT:.2f}%"
                }
                trigger_pagerduty_event(
                    severity="warning",
                    summary=summary,
                    source=f"Redis Monitor Script ({REDIS_HOST})",
                    custom_details=details
                )
            else:
                print("Redis memory usage is within acceptable limits.")
        else:
            print("Redis 'maxmemory' is not configured. Cannot calculate percentage usage. Consider setting 'maxmemory' for effective monitoring.")

    except redis.exceptions.RedisError as e:
        print(f"Error communicating with Redis: {e}")
        trigger_pagerduty_event(
            severity="critical",
            summary=f"Redis Communication Error on {REDIS_HOST}:{REDIS_PORT}",
            source="Redis Monitor Script",
            custom_details={"error": str(e)}
        )
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        trigger_pagerduty_event(
            severity="error",
            summary=f"Redis Monitoring Script Error on {REDIS_HOST}",
            source="Redis Monitor Script",
            custom_details={"error": str(e)}
        )

if __name__ == "__main__":
    main()

Code Logic Explanation:

  • Configuration: The script reads the Redis connection details (host, port, password), the alert threshold, and the PagerDuty integration key from environment variables, which keeps secrets out of the source code. Set PAGERDUTY_INTEGRATION_KEY in the environment; if it is unset or still equals the placeholder 'YOUR_PAGERDUTY_INTEGRATION_KEY_HERE', the script prints a message and sends no alert.
  • Redis Connection: The get_redis_client function connects to Redis and verifies the connection with PING. On a connection failure it returns None, and main then sends a “critical” PagerDuty event and exits with a non-zero status.
  • PagerDuty Event Trigger: The trigger_pagerduty_event function constructs the JSON payload required by the PagerDuty Events API v2. It sends an HTTP POST request to PagerDuty and logs the response. It handles different severity levels (critical, warning, error) and includes custom details for better context.
  • Main Logic:
    • It retrieves Redis memory statistics using the INFO memory command. Key metrics extracted are used_memory (total memory allocated by Redis) and maxmemory (the configured maximum memory limit for Redis).
    • If maxmemory is set (i.e., > 0), it calculates the percentage of memory used.
    • If the calculated memory usage exceeds MEMORY_USAGE_THRESHOLD_PERCENT, a PagerDuty incident is triggered with a “warning” severity. The alert includes a clear summary and custom details like current and maximum memory in MB, and the percentage usage.
    • Error handling is in place for Redis communication issues and other unexpected errors, ensuring that even script failures can raise alerts.
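
The threshold check described above can be isolated into two small pure functions, which makes the arithmetic easy to verify without a Redis server. This is a sketch that mirrors the script's logic; the function names are illustrative and do not appear in the script itself.

```python
def memory_usage_percent(used_memory, maxmemory):
    """Return used_memory as a percentage of maxmemory, or None if no limit is set."""
    if maxmemory <= 0:
        return None
    return (used_memory / maxmemory) * 100


def breaches_threshold(used_memory, maxmemory, threshold_percent):
    """True only when a limit is configured and usage is strictly above the threshold."""
    percent = memory_usage_percent(used_memory, maxmemory)
    return percent is not None and percent > threshold_percent


MB = 1024 * 1024

# 900 MB used of a 1024 MB limit: 900 / 1024 * 100 = 87.890625
print(f"{memory_usage_percent(900 * MB, 1024 * MB):.2f}%")  # prints 87.89%
print(breaches_threshold(900 * MB, 1024 * MB, 80.0))         # prints True
print(breaches_threshold(900 * MB, 0, 80.0))                 # prints False (no maxmemory)
```

As in the script, the comparison is strict, so usage exactly equal to the threshold does not alert.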

Step 4: Automate the Monitoring Script

To ensure continuous monitoring, you’ll need to schedule this script to run at regular intervals. cron is an excellent tool for this on Linux/Unix systems.

  1. Make the script executable (optional, since the cron examples below invoke it through python3 explicitly; running it directly would also require a shebang line such as #!/usr/bin/env python3 at the top of the file):
   chmod +x monitor_redis_memory.py
  2. Set environment variables for the script. You can export them in your shell session before calling the script, pass them inline in the cron job, or, ideally, manage them in a configuration file or a small wrapper script.

For example, to run the script every 5 minutes with the required environment variables, open your crontab for editing (for instance with crontab -e) and add an entry like one of the following:

   # Example cron entry to run every 5 minutes
   # Ensure 'monitor_redis_memory.py' is in a known location, e.g., /home/user/scripts/
   # The script relies on environment variables. You can set them inline or use a wrapper.

   # Example with inline environment variables:
   0,5,10,15,20,25,30,35,40,45,50,55 * * * * REDIS_HOST="your.redis.host" REDIS_PORT="6379" REDIS_PASSWORD="your_redis_password" MEMORY_USAGE_THRESHOLD_PERCENT="85.0" PAGERDUTY_INTEGRATION_KEY="YOUR_PAGERDUTY_INTEGRATION_KEY" python3 /home/user/scripts/monitor_redis_memory.py

   # A cleaner approach using a wrapper script (e.g., /home/user/scripts/run_monitor.bash):
   # run_monitor.bash content:
   # #!/bin/bash
   # export REDIS_HOST="your.redis.host"
   # export REDIS_PORT="6379"
   # export REDIS_PASSWORD="your_redis_password"
   # export MEMORY_USAGE_THRESHOLD_PERCENT="85.0"
   # export PAGERDUTY_INTEGRATION_KEY="YOUR_PAGERDUTY_INTEGRATION_KEY"
   # python3 /home/user/scripts/monitor_redis_memory.py

   # Then in cron:
   # 0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/user/scripts/run_monitor.bash

Replace your.redis.host, your_redis_password, and YOUR_PAGERDUTY_INTEGRATION_KEY with your actual values. Adjust the minute list (0,5,10,... runs every 5 minutes; most cron implementations also accept the shorthand */5) to suit your monitoring requirements.
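
Because cron re-runs the script every few minutes, each run during a sustained breach sends a fresh trigger event. The Events API v2 accepts an optional top-level dedup_key: events that share a key are folded into the same alert, and a later resolve event with that key closes it. The sketch below is an optional extension, not part of the script above; the helper build_event and the key format are hypothetical.

```python
def build_event(routing_key, action, dedup_key, summary=None, source=None, severity="warning"):
    """Build an Events API v2 body; 'trigger' needs a payload, 'resolve' only needs the key."""
    if action not in ("trigger", "acknowledge", "resolve"):
        raise ValueError(f"unsupported event_action: {action}")
    event = {
        "routing_key": routing_key,
        "event_action": action,
        "dedup_key": dedup_key,
    }
    if action == "trigger":
        if not summary or not source:
            raise ValueError("trigger events need a summary and a source")
        event["payload"] = {"summary": summary, "source": source, "severity": severity}
    return event


# Hypothetical key format: one stable key per monitored Redis endpoint.
key = "redis-memory-your.redis.host-6379"

trigger = build_event(
    "YOUR_PAGERDUTY_INTEGRATION_KEY", "trigger", key,
    summary="HIGH Redis Memory Usage on your.redis.host",
    source="Redis Monitor Script (your.redis.host)",
)
resolve = build_event("YOUR_PAGERDUTY_INTEGRATION_KEY", "resolve", key)

print(trigger["dedup_key"] == resolve["dedup_key"])  # prints True
```

One way to use this would be to send the trigger body when usage is above the threshold and the resolve body in the branch that reports usage within acceptable limits, so the alert closes on its own once memory recovers.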

Common Pitfalls

  1. Incorrect PagerDuty Integration Key: Ensure the Integration Key you’re using is for an “Events API (v2)” integration and is correctly copied. Using a key from a different integration type (e.g., an older API version) will result in events not being processed.
  2. Redis Connection Errors: Double-check your REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD. Network firewalls or security groups might also block the connection from your monitoring host to the Redis server. Test the connection manually first using redis-cli from the monitoring host.
  3. Cron Job Environment: Cron jobs run with a minimal environment, so PATH may not include python3 and variables exported in your login shell are not inherited. Use absolute paths for the script and, preferably, for the Python executable (for example, the path printed by which python3), and make sure the redis and requests packages are installed for that interpreter. Set the script's environment variables inline or in the wrapper script so they are defined for the cron user.
  4. Redis maxmemory not configured: If your Redis instance doesn’t have maxmemory explicitly set, the script cannot calculate a percentage usage and will not trigger alerts based on that threshold. While the script informs you about this, consider setting maxmemory for comprehensive monitoring.
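
If setting maxmemory is not an option, one possible fallback is to measure usage against the host's memory instead. The sketch below assumes the INFO memory output includes the total_system_memory field and that it has been parsed to an integer, as redis-py normally does; the helper name effective_memory_limit is illustrative and not part of the script. Treat the result as a rough signal, since host memory is shared with other processes and may not reflect a container's memory limit.

```python
def effective_memory_limit(info):
    """Pick a denominator for the percentage: maxmemory if set, else host memory.

    Returns (limit_bytes, label), or (None, None) when neither value is usable.
    """
    maxmemory = int(info.get("maxmemory", 0) or 0)
    if maxmemory > 0:
        return maxmemory, "maxmemory"
    # Assumption: INFO memory reports total_system_memory on this Redis version.
    total = int(info.get("total_system_memory", 0) or 0)
    if total > 0:
        return total, "total_system_memory"
    return None, None


GB = 1024 ** 3
sample = {"used_memory": 3 * GB, "maxmemory": 0, "total_system_memory": 8 * GB}
limit, label = effective_memory_limit(sample)
print(label, f"{sample['used_memory'] / limit * 100:.1f}%")  # prints total_system_memory 37.5%
```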

Conclusion

Implementing proactive monitoring for critical components like Redis is fundamental to maintaining system reliability and performance. By following this guide, you’ve established a robust mechanism to track Redis memory usage and automatically alert your operations team via PagerDuty when issues arise. This integration empowers your team with timely insights, reducing mean time to resolution (MTTR) and ensuring your applications continue to run smoothly. Remember to regularly review your alert thresholds and PagerDuty escalation policies to adapt to the evolving needs of your infrastructure.



👉 Read the original article on TechResolve.blog


