
If you’ve ever upgraded Redis in production, you know one thing — it’s never “just a version change.”
We were running multiple Redis clusters on Amazon ElastiCache across environments, and every upgrade followed the same painful routine:
- Find the right cluster
- Double-check cluster mode
- Take a snapshot
- Validate engine compatibility
- Run CLI commands
- Monitor manually
- Hope nothing reconnects in a storm
After doing this a few times, we realized something obvious:
The problem wasn’t Redis.
The problem was the process.
So we built a lightweight web-based dashboard to automate Redis cluster upgrades, snapshot management, and restore workflows — safely and predictably.
This is the story of that tool.
The Real Problem
Redis upgrades in AWS look simple in documentation. But in reality, things get tricky:
- Cluster-mode enabled vs disabled changes the snapshot flow.
- Some versions can’t upgrade directly.
- In-place upgrades may cause failovers.
- Snapshots don’t overwrite existing clusters.
- Restore always creates a new replication group.
- Someone always forgets to take a fresh snapshot.
We didn’t want another “run this CLI carefully at 11 PM” situation.
We wanted:
- Guardrails
- Visibility
- Repeatability
- Zero-downtime options
- Less human error
What We Built
A small web application that sits on top of AWS APIs and acts as a Redis operations control panel.
Under the hood it uses:
- Flask (Python backend)
- Boto3 (AWS SDK)
- Simple HTML/CSS frontend
- IAM roles for authentication
- Server-Sent Events for live progress updates
Nothing fancy. Just clean automation.
The goal was simple: Discover → Snapshot → Upgrade → Restore, all from one clean UI.
Installation Guide
Step 1: Download/Clone Project
git clone https://github.com/gajjarashish007/GenAI.git
cd GenAI
git checkout a206b7598b423946b8dcf25aabe6b0fc3464b24f
cd Redis_Upgrade
Step 2: Install Dependencies
pip install -r requirements.txt
Required packages:
flask==3.0.0
boto3==1.34.0
Step 3: Configure AWS Credentials
Choose one method:
Option A: AWS CLI (Recommended)
aws configure
Enter when prompted:
- AWS Access Key ID
- AWS Secret Access Key
- Default region (e.g., us-east-1)
- Output format (press Enter for default)
Option B: Environment Variables
# Linux/Mac
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export AWS_DEFAULT_REGION="us-east-1"
# Windows PowerShell
$env:AWS_ACCESS_KEY_ID="your_access_key"
$env:AWS_SECRET_ACCESS_KEY="your_secret_key"
$env:AWS_DEFAULT_REGION="us-east-1"
Option C: IAM Role (EC2)
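If running on EC2, attach an IAM role with the required ElastiCache permissions to the instance. No access keys need to be configured: Boto3 picks up the role's temporary credentials automatically.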
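The post doesn't spell out the "required permissions". As a starting point, assuming the dashboard calls only the ElastiCache APIs behind the features described below (discovery, snapshot, upgrade, restore), a policy document could be generated like this. The action list is our assumption, not the project's published policy; narrow the Resource to your own ARNs, and note that creating a replication group may need additional permissions depending on your account setup.

```python
import json

# Assumed minimal action set for discover -> snapshot -> upgrade -> restore.
# This is a sketch; the project's real requirements may differ.
ELASTICACHE_ACTIONS = [
    "elasticache:DescribeReplicationGroups",    # discovery
    "elasticache:DescribeCacheClusters",        # per-node details (engine version)
    "elasticache:DescribeCacheEngineVersions",  # version checks
    "elasticache:CreateSnapshot",               # pre-upgrade snapshot
    "elasticache:DescribeSnapshots",            # poll snapshot status
    "elasticache:ModifyReplicationGroup",       # in-place upgrade
    "elasticache:CreateReplicationGroup",       # restore into a new group
]


def build_policy(resource: str = "*") -> dict:
    """Return an IAM policy document allowing the actions above on `resource`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ELASTICACHE_ACTIONS,
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM console or a template.
    print(json.dumps(build_policy(), indent=2))
```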
Step 4: Start Application
python app.py
Expected output:
* Serving Flask app 'app'
* Debug mode: on
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:3000
* Running on http://YOUR-IP:3000
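The expected output shows Flask's debug mode on and the server listening on all interfaces (0.0.0.0). Flask's documentation warns against exposing the development server and its interactive debugger, which can run arbitrary Python. A minimal sketch that defaults to localhost with debug off, assuming hypothetical DASHBOARD_HOST, DASHBOARD_PORT and DASHBOARD_DEBUG environment variables that are not part of the original project:

```python
import os


def run_options(env=os.environ) -> dict:
    """Build app.run() keyword arguments from hypothetical DASHBOARD_* variables.

    Defaults to localhost-only on port 3000 with the debugger off.
    """
    return {
        "host": env.get("DASHBOARD_HOST", "127.0.0.1"),
        "port": int(env.get("DASHBOARD_PORT", "3000")),
        "debug": env.get("DASHBOARD_DEBUG", "0").lower() in ("1", "true", "yes"),
    }


if __name__ == "__main__":
    # Imported here so run_options() can be used without Flask installed.
    from flask import Flask

    app = Flask(__name__)  # stand-in for the project's existing app object
    app.run(**run_options())
```

With this, binding to 0.0.0.0 or enabling the debugger becomes a deliberate opt-in rather than the default.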
Step 5: Access Dashboard
Open browser and navigate to:
http://localhost:3000
Feature 1: Auto-Discovery of Redis Clusters
The first thing the dashboard does is scan the selected region and list all Redis replication groups.
For each cluster, it shows:
- Engine version
- Node type
- Cluster mode status
- Number of shards
- Replica count
- Current status
This removed the need to dig through the AWS console every time.
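A minimal sketch of the discovery scan with Boto3, assuming the dashboard pages through describe_replication_groups and reads each group's engine version from its member nodes via describe_cache_clusters. The function name and row keys are illustrative; replicas per shard are derived from MemberClusters, since each shard holds one primary plus its replicas.

```python
def discover_redis_groups(client) -> list:
    """Summarize every replication group visible to `client`.

    `client` is assumed to be a Boto3 ElastiCache client, e.g.
    boto3.client("elasticache", region_name=selected_region).
    """
    # Map replication group ID -> engine version, taken from its member nodes.
    versions = {}
    for page in client.get_paginator("describe_cache_clusters").paginate():
        for node in page["CacheClusters"]:
            group_id = node.get("ReplicationGroupId")
            if group_id:
                versions.setdefault(group_id, node.get("EngineVersion"))

    rows = []
    for page in client.get_paginator("describe_replication_groups").paginate():
        for group in page["ReplicationGroups"]:
            shards = len(group.get("NodeGroups", []))
            members = len(group.get("MemberClusters", []))
            rows.append({
                "id": group["ReplicationGroupId"],
                "engine_version": versions.get(group["ReplicationGroupId"]),
                "node_type": group.get("CacheNodeType"),
                "cluster_mode": group.get("ClusterEnabled", False),
                "shards": shards,
                # One primary per shard; the rest of each shard's nodes are replicas.
                "replicas_per_shard": members // shards - 1 if shards else None,
                "status": group["Status"],
            })
    return rows
```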

Feature 2: Snapshot Before Anything Risky
We enforced a rule in the UI:
You cannot upgrade without snapshotting first.
The tool automatically:
- Creates a manual snapshot
- Waits until status becomes “available”
- Logs the snapshot ID
- Proceeds only if backup succeeds
This one rule eliminated most operational anxiety.
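The snapshot gate can be sketched as create-then-poll. As far as we know, Boto3 ships no ElastiCache waiter for snapshots, so this polls describe_snapshots until SnapshotStatus is "available". The naming scheme, timeout and poll interval are assumptions, and sleep is injectable so the loop can be tested without waiting.

```python
import time
from datetime import datetime, timezone


def snapshot_then_wait(client, replication_group_id, timeout_s=3600, poll_s=15, sleep=time.sleep):
    """Create a manual snapshot and block until it is available.

    Any status other than 'creating' or 'available' is treated as a failure.
    Returns the snapshot name so it can be logged before the upgrade proceeds.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    name = f"{replication_group_id}-pre-upgrade-{stamp}"  # hypothetical naming scheme
    client.create_snapshot(ReplicationGroupId=replication_group_id, SnapshotName=name)

    waited = 0
    while waited <= timeout_s:
        snaps = client.describe_snapshots(SnapshotName=name)["Snapshots"]
        status = snaps[0]["SnapshotStatus"] if snaps else "creating"
        if status == "available":
            return name
        if status != "creating":
            raise RuntimeError(f"snapshot {name} ended in status {status!r}")
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"snapshot {name} not available after {timeout_s}s")
```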
Feature 3: Safe Engine Upgrade Workflow
We support two upgrade paths:
1. Direct Upgrade (In-Place)
Best for non-production environments.
The tool:
- Validates version compatibility
- Executes modify-replication-group
- Streams progress logs
- Displays status in real time
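A sketch of the in-place path, assuming "compatibility" means at least that the target is newer than the current version (ElastiCache does not support engine downgrades) and that the region offers it. modify_replication_group with EngineVersion and ApplyImmediately is the Boto3 counterpart of the modify-replication-group CLI command; the helper names are illustrative. A major-version jump on a custom parameter group may also need CacheParameterGroupName, which this sketch leaves out.

```python
def parse_version(v: str) -> tuple:
    """'6.2.6' -> (6, 2, 6); non-numeric parts such as the 'x' in '6.x' are skipped."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())


def upgrade_in_place(client, replication_group_id, current, target):
    """Validate a target Redis version, then start an in-place engine upgrade."""
    if parse_version(target) <= parse_version(current):
        raise ValueError(f"{target} is not newer than {current}; downgrades are not supported")

    offered = client.describe_cache_engine_versions(Engine="redis", EngineVersion=target)
    if not offered["CacheEngineVersions"]:
        raise ValueError(f"engine version {target} is not offered in this region")

    # ApplyImmediately=True starts now instead of waiting for the maintenance window.
    resp = client.modify_replication_group(
        ReplicationGroupId=replication_group_id,
        EngineVersion=target,
        ApplyImmediately=True,
    )
    return resp["ReplicationGroup"]["Status"]  # typically "modifying" while the upgrade runs
```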
2. Blue/Green Restore Strategy (Production)
For production clusters, we rarely do in-place upgrades.
Instead:
- Take snapshot
- Restore snapshot to new cluster
- Validate application connectivity
- Switch endpoint
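- Keep old cluster temporarily
This approach gives near-zero downtime and easy rollback, as long as the application can tolerate (or replay) writes that reach the old cluster between the snapshot and the endpoint switch. The dashboard guides this flow step by step.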
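The "restore snapshot to new cluster" step maps to create_replication_group with SnapshotName, followed by Boto3's replication_group_available waiter. A sketch under those assumptions: the helper name, description text and waiter timing are illustrative, and the networking settings a real restore needs (subnet group, security groups, shard or replica counts) are passed through extra. It returns the endpoint the application would be pointed at once connectivity is validated.

```python
def restore_to_new_group(client, snapshot_name, new_group_id, node_type, engine_version, **extra):
    """Restore a snapshot into a NEW replication group and return its endpoint.

    `extra` passes other create_replication_group arguments straight through,
    e.g. CacheSubnetGroupName, SecurityGroupIds, NumCacheClusters, or
    NumNodeGroups/ReplicasPerNodeGroup for cluster mode.
    """
    client.create_replication_group(
        ReplicationGroupId=new_group_id,
        ReplicationGroupDescription=f"Restored from {snapshot_name}",
        SnapshotName=snapshot_name,
        CacheNodeType=node_type,
        Engine="redis",
        EngineVersion=engine_version,  # the target version for the blue/green upgrade
        **extra,
    )
    # Poll every 30 s for up to an hour until the new group is available.
    client.get_waiter("replication_group_available").wait(
        ReplicationGroupId=new_group_id,
        WaiterConfig={"Delay": 30, "MaxAttempts": 120},
    )
    group = client.describe_replication_groups(ReplicationGroupId=new_group_id)["ReplicationGroups"][0]
    # Cluster mode enabled: one configuration endpoint. Disabled: the shard's primary endpoint.
    if group.get("ConfigurationEndpoint"):
        return group["ConfigurationEndpoint"]["Address"]
    return group["NodeGroups"][0]["PrimaryEndpoint"]["Address"]
```

The old group is left untouched, which is what makes rollback a matter of pointing the application back at the original endpoint.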
Feature 4: Snapshot Restore
One common misconception is:
“Can we restore over the same cluster?”
No.
On Amazon Web Services, restoring a snapshot always creates a new replication group.
The dashboard enforces unique naming and prevents accidental overwrite attempts.
It also validates:
- Node type compatibility
- Engine version match
- Memory requirements
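The three checks can be expressed as a pure function over the snapshot metadata returned by describe_snapshots (its CacheNodeType and EngineVersion fields) and the group IDs already found during discovery. This is our reading of the checks, not the tool's code: the target version must not be older than the snapshot's, and the target node type must not have less memory than the source. Memory sizes have to come from AWS's node-type documentation, so the function takes them as input; comparing node capacity is a conservative proxy for the dataset's actual size.

```python
def validate_restore(snapshot: dict, new_group_id: str, node_type: str,
                     engine_version: str, existing_ids: set, node_memory_gib: dict) -> list:
    """Return a list of problems; an empty list means the restore may proceed."""
    problems = []

    # A restore always creates a new replication group, so the ID must be unused.
    if new_group_id in existing_ids:
        problems.append(f"replication group {new_group_id!r} already exists")

    def ver(v):
        return tuple(int(p) for p in v.split(".") if p.isdigit())

    # Engine version: never restore into an older engine than the snapshot's.
    if ver(engine_version) < ver(snapshot["EngineVersion"]):
        problems.append(f"engine {engine_version} is older than the snapshot's {snapshot['EngineVersion']}")

    # Node type / memory: the target must be known and at least as large as the source.
    src, dst = snapshot["CacheNodeType"], node_type
    if src not in node_memory_gib or dst not in node_memory_gib:
        problems.append(f"unknown memory size for {src!r} or {dst!r}")
    elif node_memory_gib[dst] < node_memory_gib[src]:
        problems.append(f"{dst} has less memory than the source node type {src}")

    return problems
```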
Feature 5: Architecture Behind the Scenes
Under the hood, the system is simple by design:
Browser → Flask API → Boto3 → AWS ElastiCache
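The diagram shows the request path; progress travels back to the browser over Server-Sent Events, as listed in the stack earlier. A minimal sketch of that leg, assuming a hypothetical /progress route and illustrative step messages. Each SSE message here is an event: line and a data: line, terminated by a blank line.

```python
import json
import time


def sse_event(payload: dict, event: str = "progress") -> str:
    """Encode one Server-Sent Event: an event name, a JSON data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def progress_stream(steps, delay_s=0.0):
    """Yield one SSE message per workflow step, then a final 'complete' event."""
    for i, step in enumerate(steps, start=1):
        yield sse_event({"step": i, "total": len(steps), "message": step})
        time.sleep(delay_s)
    yield sse_event({"message": "done"}, event="complete")


if __name__ == "__main__":
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/progress")  # hypothetical route name
    def progress():
        steps = ["Creating snapshot", "Snapshot available", "Modifying replication group"]
        # text/event-stream lets the browser's EventSource parse the stream.
        return Response(progress_stream(steps, delay_s=1.0), mimetype="text/event-stream")

    app.run(port=3000)
```

On the page, new EventSource("/progress") with a listener for the "progress" event would receive each step as the generator yields it.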
Final Thoughts
“Cache Me If You Can” wasn’t just a clever title.
It reflects a mindset shift — from reactive infrastructure to controlled, confident operations.
If you're managing Redis on AWS, consider building (or adopting) something similar.
Because in production systems:
Speed matters.
Stability matters more.
Conclusion
“Cache Me If You Can” ultimately highlights that successful Redis upgrades on AWS aren’t about mastering commands — they’re about designing a reliable process. By replacing manual, error-prone steps with a simple, automated dashboard, we transformed upgrades into a predictable, repeatable, and safe operation.
This tool brought structure through guardrails, confidence through enforced snapshots, and flexibility through blue/green deployment strategies — all while reducing downtime and human error.
In the end, the biggest win wasn’t just automation — it was peace of mind. Because in real-world production systems, it’s not just about moving fast — it’s about moving safely, every single time.




