Executive Summary
TL;DR: Unchecked Redis memory usage can lead to critical performance degradation and service outages. This guide provides a Python script that monitors Redis memory usage, compares it against a predefined threshold, and automatically triggers PagerDuty alerts via the Events API v2, so your team can respond before an outage occurs.
Key Takeaways
- Proactive monitoring of Redis memory usage is crucial for maintaining system reliability and preventing performance degradation.
- A Python script uses the redis-py library to fetch Redis INFO memory metrics and the requests library to send incident triggers to the PagerDuty Events API v2.
- Configuration details such as Redis connection parameters, memory usage threshold, and PagerDuty integration key are securely managed via environment variables.
- The monitoring script calculates memory usage percentage based on used_memory and maxmemory, triggering a "warning" severity PagerDuty event if the threshold is breached.
- Automation via cron ensures the Python script runs at regular intervals, providing continuous monitoring and timely alerts for Redis memory issues.
- For effective percentage-based monitoring, the Redis instance must have the maxmemory directive explicitly configured.
Monitor Redis Memory Usage and Alert via PagerDuty API
Introduction
In the dynamic world of microservices and real-time applications, Redis stands out as a high-performance in-memory data store. Its speed and versatility are crucial for caching, session management, message brokering, and more. However, unchecked memory consumption in Redis can lead to critical performance degradation, data eviction, and even service outages, directly impacting user experience and business operations. Proactive monitoring and timely alerting are not just best practices; they are essential for maintaining the health and reliability of your Redis instances.
This comprehensive guide from TechResolve will walk you through setting up a robust system to monitor your Redis memory usage and automatically trigger alerts via the PagerDuty API when predefined thresholds are breached. By integrating Redis monitoring with PagerDuty, you ensure that your operations team is immediately notified of potential issues, enabling rapid response and minimizing downtime.
Prerequisites
Before you begin, ensure you have the following:
- A running Redis server instance that you wish to monitor.
- Python 3 installed on your monitoring host, along with its package manager, pip.
- An active PagerDuty account with administrative access to create services and integration keys.
- Basic familiarity with the Linux/Unix command line.
Step-by-Step Guide
Step 1: Configure PagerDuty Service and Integration
To receive alerts, PagerDuty needs to know where they're coming from. We'll set up a dedicated service and an Events API v2 integration.
- Log in to your PagerDuty account.
- Navigate to Services > Service Directory.
- Click + New Service or select an existing service where you want to add the integration.
- If creating a new service:
  - Provide a Name (e.g., "Redis Monitoring").
  - Assign an Escalation Policy and Alert Grouping as appropriate for your team.
  - Click Create Service.
- Once your service is created or selected, go to the Integrations tab within that service.
- Click + Add an integration.
- Search for and select Events API v2 as the Integration Type.
- Provide an Integration Name (e.g., "Redis Memory Alerts").
- Click Add Integration.
- After creation, you will see your new integration listed. Copy the Integration Key. This key is crucial for our script to send events to PagerDuty. Keep it secure.
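Before wiring the key into the script, a quick sanity check can catch copy-and-paste mistakes. The sketch below assumes that Events API v2 integration keys are 32-character alphanumeric strings; the helper name looks_like_routing_key is hypothetical and not part of any PagerDuty library. It only checks the shape of the key, not whether PagerDuty will accept it.

```python
import os
import re

# Assumption: Events API v2 integration (routing) keys are 32 alphanumeric characters.
_ROUTING_KEY_PATTERN = re.compile(r"[0-9A-Za-z]{32}")


def looks_like_routing_key(key):
    """Return True if key has the assumed shape of an Events API v2 integration key."""
    if not isinstance(key, str):
        return False
    return _ROUTING_KEY_PATTERN.fullmatch(key.strip()) is not None


if __name__ == "__main__":
    candidate = os.getenv("PAGERDUTY_INTEGRATION_KEY", "")
    print("Key format looks valid:", looks_like_routing_key(candidate))
```

A failed check usually means the key was truncated, padded with stray characters, or left as the placeholder value.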
Step 2: Install Required Python Libraries
Our monitoring script will rely on the redis-py library to interact with Redis and the requests library to communicate with the PagerDuty API. Install them using pip:
pip install redis requests
This command fetches and installs the necessary Python packages, making them available for your script.
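To confirm that both packages are visible to the interpreter that will run the monitor (which matters later, because cron may use a different Python than your shell), a small check along these lines can help. The function name missing_modules is a hypothetical helper introduced for illustration.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["redis", "requests"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same python3 executable you plan to reference in the cron entry.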
Step 3: Develop the Redis Monitoring Script
Now, let's create the Python script that will connect to Redis, fetch memory metrics, evaluate against a threshold, and trigger a PagerDuty incident if necessary.
Create a file named monitor_redis_memory.py:
# Python Script to Monitor Redis Memory and Alert via PagerDuty
import redis
import requests
import os
import json
# --- Configuration ---
REDIS_HOST = os.getenv('REDIS_HOST', 'localhost')
REDIS_PORT = int(os.getenv('REDIS_PORT', 6379))
REDIS_PASSWORD = os.getenv('REDIS_PASSWORD', None) # Set to None if no password
MEMORY_USAGE_THRESHOLD_PERCENT = float(os.getenv('MEMORY_USAGE_THRESHOLD_PERCENT', 80.0)) # %
PAGERDUTY_INTEGRATION_KEY = os.getenv('PAGERDUTY_INTEGRATION_KEY', 'YOUR_PAGERDUTY_INTEGRATION_KEY_HERE')
PAGERDUTY_API_URL = "https://events.pagerduty.com/v2/enqueue"
# --- Redis Connection ---
def get_redis_client():
    """Return a connected Redis client, or None if the server is unreachable."""
    try:
        r = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, password=REDIS_PASSWORD, decode_responses=True)
        r.ping()  # Fail fast if the server cannot be reached
        return r
    except redis.exceptions.ConnectionError as e:
        print(f"Error connecting to Redis: {e}")
        return None

# --- PagerDuty Incident Trigger ---
def trigger_pagerduty_event(severity, summary, source, custom_details=None):
    """Send a trigger event to the PagerDuty Events API v2."""
    if not PAGERDUTY_INTEGRATION_KEY or PAGERDUTY_INTEGRATION_KEY == 'YOUR_PAGERDUTY_INTEGRATION_KEY_HERE':
        print("PagerDuty integration key not set. Cannot send alert.")
        return

    payload = {
        "routing_key": PAGERDUTY_INTEGRATION_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
            "component": "Redis",
            "group": "Memory Monitoring",
            "class": "Performance"
        }
    }
    if custom_details:
        payload["payload"]["custom_details"] = custom_details

    headers = {
        "Content-Type": "application/json"
    }
    try:
        # A timeout keeps a stalled connection from hanging the cron job indefinitely
        response = requests.post(PAGERDUTY_API_URL, headers=headers, data=json.dumps(payload), timeout=10)
        response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx)
        print(f"PagerDuty event sent successfully. Status: {response.status_code}, Response: {response.json()}")
    except requests.exceptions.RequestException as e:
        print(f"Error sending PagerDuty event: {e}")

# --- Main Logic ---
def main():
    r = get_redis_client()
    if r is None:
        trigger_pagerduty_event(
            severity="critical",
            summary=f"Redis Connection Failed on {REDIS_HOST}:{REDIS_PORT}",
            source="Redis Monitor Script"
        )
        raise SystemExit(1)  # Non-zero exit status so cron or a wrapper can detect the failure

    try:
        info = r.info('memory')
        used_memory = info.get('used_memory', 0)
        used_memory_rss = info.get('used_memory_rss', 0)
        maxmemory = info.get('maxmemory', 0)  # 0 if no maxmemory is set

        print(f"Redis Used Memory: {used_memory / (1024*1024):.2f} MB")
        print(f"Redis Used Memory RSS: {used_memory_rss / (1024*1024):.2f} MB")

        if maxmemory > 0:
            memory_usage_percentage = (used_memory / maxmemory) * 100
            print(f"Redis Max Memory: {maxmemory / (1024*1024):.2f} MB")
            print(f"Redis Memory Usage Percentage: {memory_usage_percentage:.2f}% (Threshold: {MEMORY_USAGE_THRESHOLD_PERCENT:.2f}%)")

            if memory_usage_percentage > MEMORY_USAGE_THRESHOLD_PERCENT:
                summary = f"HIGH Redis Memory Usage on {REDIS_HOST}: {memory_usage_percentage:.2f}% exceeds {MEMORY_USAGE_THRESHOLD_PERCENT:.2f}%"
                details = {
                    "used_memory_mb": used_memory / (1024*1024),
                    "used_memory_rss_mb": used_memory_rss / (1024*1024),
                    "maxmemory_mb": maxmemory / (1024*1024),
                    "current_percentage": f"{memory_usage_percentage:.2f}%",
                    "threshold_percentage": f"{MEMORY_USAGE_THRESHOLD_PERCENT:.2f}%"
                }
                trigger_pagerduty_event(
                    severity="warning",
                    summary=summary,
                    source=f"Redis Monitor Script ({REDIS_HOST})",
                    custom_details=details
                )
            else:
                print("Redis memory usage is within acceptable limits.")
        else:
            print("Redis 'maxmemory' is not configured. Cannot calculate percentage usage. Consider setting 'maxmemory' for effective monitoring.")

    except redis.exceptions.RedisError as e:
        print(f"Error communicating with Redis: {e}")
        trigger_pagerduty_event(
            severity="critical",
            summary=f"Redis Communication Error on {REDIS_HOST}:{REDIS_PORT}",
            source="Redis Monitor Script",
            custom_details={"error": str(e)}
        )
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        trigger_pagerduty_event(
            severity="error",
            summary=f"Redis Monitoring Script Error on {REDIS_HOST}",
            source="Redis Monitor Script",
            custom_details={"error": str(e)}
        )


if __name__ == "__main__":
    main()
Code Logic Explanation:
- Configuration: The script reads the Redis connection details (host, port, password), the memory threshold, and the PagerDuty integration key from environment variables, which keeps sensitive values out of the source code. Either set the PAGERDUTY_INTEGRATION_KEY environment variable or replace the 'YOUR_PAGERDUTY_INTEGRATION_KEY_HERE' placeholder.
- Redis Connection: The get_redis_client function establishes a connection to Redis and verifies it with a PING. If the connection fails, it returns None, and main then sends a "critical" PagerDuty event and exits with a non-zero status.
- PagerDuty Event Trigger: The trigger_pagerduty_event function constructs the JSON payload required by the PagerDuty Events API v2, sends it as an HTTP POST request, and logs the response. It accepts the severity as a parameter (the script uses critical, warning, and error) and attaches optional custom details for better context.
- Main Logic:
  - It retrieves Redis memory statistics using the INFO memory command. Key metrics extracted are used_memory (the number of bytes allocated by Redis) and maxmemory (the configured memory limit, which is 0 when no limit is set).
  - If maxmemory is set (i.e., > 0), it calculates the percentage of memory used.
  - If the calculated memory usage exceeds MEMORY_USAGE_THRESHOLD_PERCENT, a PagerDuty incident is triggered with a "warning" severity. The alert includes a clear summary and custom details such as current and maximum memory in MB and the percentage usage.
  - Error handling is in place for Redis communication issues and other unexpected errors, so that most script failures also raise alerts.
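The percentage check described above can be isolated into pure functions, which makes the threshold logic easy to test without a live Redis server. This is a minimal sketch; the function names memory_usage_percent and breaches_threshold are illustrative and do not appear in the script, and the sample values are made up.

```python
def memory_usage_percent(used_memory, maxmemory):
    """Return used_memory as a percentage of maxmemory, or None if no limit is set."""
    if maxmemory <= 0:
        return None  # maxmemory of 0 means Redis has no configured limit
    return (used_memory / maxmemory) * 100


def breaches_threshold(used_memory, maxmemory, threshold_percent):
    """Return True only when a limit is set and usage is strictly above the threshold."""
    percent = memory_usage_percent(used_memory, maxmemory)
    return percent is not None and percent > threshold_percent


if __name__ == "__main__":
    # Hypothetical sample: 900 MB used against a 1024 MB limit.
    used = 900 * 1024 * 1024
    limit = 1024 * 1024 * 1024
    print(f"{memory_usage_percent(used, limit):.2f}%")  # prints 87.89%
    print(breaches_threshold(used, limit, 80.0))        # prints True
```

As in the script, the comparison is strict, so usage exactly at the threshold does not raise an alert.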
Step 4: Automate the Monitoring Script
To ensure continuous monitoring, you'll need to schedule this script to run at regular intervals. cron is an excellent tool for this on Linux/Unix systems.
- Make the script executable:
chmod +x monitor_redis_memory.py
- Set environment variables for the script. You can either export them in your shell session before calling the script or pass them directly in the cron job, or ideally, manage them in a configuration file or a small wrapper script.
For example, to set the environment variables and run the script every 5 minutes, open your cron editor:
crontab -e
Then add an entry such as the following:
# Example cron entry to run every 5 minutes
# Ensure 'monitor_redis_memory.py' is in a known location, e.g., /home/user/scripts/
# The script relies on environment variables. You can set them inline or use a wrapper.
# Example with inline environment variables:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * REDIS_HOST="your.redis.host" REDIS_PORT="6379" REDIS_PASSWORD="your_redis_password" MEMORY_USAGE_THRESHOLD_PERCENT="85.0" PAGERDUTY_INTEGRATION_KEY="YOUR_PAGERDUTY_INTEGRATION_KEY" python3 /home/user/scripts/monitor_redis_memory.py
# A cleaner approach using a wrapper script (e.g., /home/user/scripts/run_monitor.bash):
# run_monitor.bash content:
# #!/bin/bash
# export REDIS_HOST="your.redis.host"
# export REDIS_PORT="6379"
# export REDIS_PASSWORD="your_redis_password"
# export MEMORY_USAGE_THRESHOLD_PERCENT="85.0"
# export PAGERDUTY_INTEGRATION_KEY="YOUR_PAGERDUTY_INTEGRATION_KEY"
# python3 /home/user/scripts/monitor_redis_memory.py
# Then in cron:
# 0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/user/scripts/run_monitor.bash
Replace your.redis.host, your_redis_password, and YOUR_PAGERDUTY_INTEGRATION_KEY with your actual values. Adjust the frequency (0,5,10,... for every 5 minutes) as per your monitoring requirements.
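Because cron reruns the script every few minutes, a sustained breach sends a new trigger event on every run. One way to keep these together is the Events API v2 dedup_key field: trigger events that share a dedup_key are treated as updates to the same open alert, and a later resolve event with that key closes it. The sketch below only builds the request bodies; build_event and memory_dedup_key are hypothetical helpers that are not part of the script above, and the key format is an arbitrary choice.

```python
def memory_dedup_key(redis_host, redis_port):
    """Build a key that stays the same across runs for one Redis instance."""
    # Hypothetical format; any string that is stable per instance would work.
    return f"redis-memory-{redis_host}-{redis_port}"


def build_event(routing_key, event_action, dedup_key, summary=None, source=None, severity=None):
    """Build an Events API v2 request body for a trigger or resolve event."""
    event = {
        "routing_key": routing_key,
        "event_action": event_action,
        "dedup_key": dedup_key,
    }
    if event_action == "trigger":
        # Only trigger events carry the payload with summary, source, and severity.
        event["payload"] = {"summary": summary, "source": source, "severity": severity}
    return event


if __name__ == "__main__":
    key = memory_dedup_key("your.redis.host", 6379)
    trigger = build_event("YOUR_PAGERDUTY_INTEGRATION_KEY", "trigger", key,
                          summary="HIGH Redis Memory Usage", source="Redis Monitor Script",
                          severity="warning")
    resolve = build_event("YOUR_PAGERDUTY_INTEGRATION_KEY", "resolve", key)
    print(trigger)
    print(resolve)
```

With this approach the script could send the trigger body while usage is above the threshold and the resolve body once it drops back below, using the same POST call it already makes.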
Common Pitfalls
- Incorrect PagerDuty Integration Key: Ensure the Integration Key you're using is for an "Events API v2" integration and is correctly copied. Using a key from a different integration type (e.g., an older API version) will result in events not being processed.
- Redis Connection Errors: Double-check your REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD. Network firewalls or security groups might also block the connection from your monitoring host to the Redis server. Test the connection manually first using redis-cli from the monitoring host.
- Cron Job Environment: Cron jobs run with a minimal environment. If your script relies on specific environment variables (like PATH for python3) or libraries that are not globally accessible, the cron job might fail. Always use absolute paths for scripts in cron, or ensure your wrapper script correctly sets up the environment. For Python, it's often best to use the full path to your Python executable or ensure your Python is in the system PATH. Our script already utilizes environment variables effectively, but ensure they are correctly set for the cron user.
- Redis maxmemory not configured: If your Redis instance doesn't have maxmemory explicitly set, the script cannot calculate a percentage usage and will not trigger alerts based on that threshold. While the script informs you about this, consider setting maxmemory for comprehensive monitoring.
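If setting maxmemory is not an option, one possible fallback is to measure usage against the host's total memory, which INFO memory reports as total_system_memory. This is an assumption about how you might extend the script, not something the script above does, and effective_memory_limit is a hypothetical helper name. A percentage of host memory is a looser signal than a percentage of maxmemory, since other processes share the host.

```python
def effective_memory_limit(info):
    """Pick a limit in bytes from an INFO memory dict, plus the field it came from.

    Prefers maxmemory when it is set; otherwise falls back to total_system_memory.
    Returns (None, None) when neither value is available.
    """
    maxmemory = int(info.get("maxmemory", 0) or 0)
    if maxmemory > 0:
        return maxmemory, "maxmemory"
    total = int(info.get("total_system_memory", 0) or 0)
    if total > 0:
        return total, "total_system_memory"
    return None, None


if __name__ == "__main__":
    # Hypothetical INFO memory excerpt for a server without maxmemory configured.
    sample_info = {"used_memory": 2 * 1024**3, "maxmemory": 0, "total_system_memory": 8 * 1024**3}
    limit, limit_source = effective_memory_limit(sample_info)
    if limit is not None:
        percent = sample_info["used_memory"] / limit * 100
        print(f"{percent:.2f}% of {limit_source}")  # prints 25.00% of total_system_memory
```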
Conclusion
Implementing proactive monitoring for critical components like Redis is fundamental to maintaining system reliability and performance. By following this guide, you've established a robust mechanism to track Redis memory usage and automatically alert your operations team via PagerDuty when issues arise. This integration empowers your team with timely insights, reducing mean time to resolution (MTTR) and ensuring your applications continue to run smoothly. Remember to regularly review your alert thresholds and PagerDuty escalation policies to adapt to the evolving needs of your infrastructure.
Read the original article on TechResolve.blog