Ashish Gajjar for AWS Community Builders

Posted on Apr 3

Cache Me If You Can: Building a Web-Based Redis Upgrade Dashboard on AWS

#automation #aws #database #showdev

If you’ve ever upgraded Redis in production, you know one thing — it’s never “just a version change.”

We were running multiple Redis clusters on Amazon ElastiCache across environments, and every upgrade followed the same painful routine:

Find the right cluster
Double-check cluster mode
Take a snapshot
Validate engine compatibility
Run CLI commands
Monitor manually
Hope nothing reconnects in a storm

After doing this a few times, we realized something obvious:

The problem wasn’t Redis.
The problem was the process.

So we built a lightweight web-based dashboard to automate Redis cluster upgrades, snapshot management, and restore workflows — safely and predictably.

This is the story of that tool.

The Real Problem

Redis upgrades in AWS look simple in documentation. But in reality, things get tricky:

Cluster-mode enabled vs disabled changes the snapshot flow.
Some versions can’t upgrade directly.
In-place upgrades may cause failovers.
Snapshots don’t overwrite existing clusters.
Restore always creates a new replication group.
Someone always forgets to take a fresh snapshot.

We didn’t want another “run this CLI carefully at 11 PM” situation.

We wanted:

Guardrails
Visibility
Repeatability
Zero-downtime options
Less human error

What We Built

A small web application that sits on top of AWS APIs and acts as a Redis operations control panel.

Under the hood it uses:

Flask (Python backend)
Boto3 (AWS SDK)
Simple HTML/CSS frontend
IAM roles for authentication
Server-Sent Events for live progress updates

Nothing fancy. Just clean automation.

The goal was simple:

Discover → Snapshot → Upgrade → Restore
All from one clean UI.

Installation Guide

Step 1: Download/Clone Project

git clone https://github.com/gajjarashish007/GenAI.git
cd GenAI
git checkout a206b7598b423946b8dcf25aabe6b0fc3464b24f
cd Redis_Upgrade

Step 2: Install Dependencies

pip install -r requirements.txt

Required packages:

flask==3.0.0
boto3==1.34.0

Step 3: Configure AWS Credentials

Choose one method:

Option A: AWS CLI (Recommended)

aws configure

Enter when prompted:

AWS Access Key ID
AWS Secret Access Key
Default region (e.g., us-east-1)
Output format (press Enter for default)

Option B: Environment Variables

# Linux/Mac
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export AWS_DEFAULT_REGION="us-east-1"

# Windows PowerShell
$env:AWS_ACCESS_KEY_ID="your_access_key"
$env:AWS_SECRET_ACCESS_KEY="your_secret_key"
$env:AWS_DEFAULT_REGION="us-east-1"

Option C: IAM Role (EC2)

If running on EC2, attach an IAM role with the required permissions to the instance. Boto3 picks up the role's credentials automatically, so no further configuration is needed.
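The post does not list the exact permissions the role needs, so the following is only a sketch of what a least-privilege policy for the discover, snapshot, upgrade, and restore flow might look like. The action list is an assumption based on the ElastiCache API calls such a tool would plausibly make. It is not the project's documented policy, and a real deployment may need more actions or tighter resource ARNs.

```python
import json

# ASSUMPTION: actions a discover/snapshot/upgrade/restore tool would likely call.
# This list is illustrative and does not come from the project itself.
ASSUMED_ACTIONS = [
    "elasticache:DescribeReplicationGroups",
    "elasticache:DescribeCacheClusters",
    "elasticache:DescribeSnapshots",
    "elasticache:CreateSnapshot",
    "elasticache:ModifyReplicationGroup",
    "elasticache:CreateReplicationGroup",
]


def build_policy(actions=ASSUMED_ACTIONS, resource="*"):
    """Return an IAM policy document as a plain dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": resource}
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM console or an IaC template.
    print(json.dumps(build_policy(), indent=2))
```

Narrowing `resource` from `"*"` to specific replication group and snapshot ARNs is usually worth the effort once the set of clusters is known.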

Step 4: Start Application

python app.py

Expected output:

 * Serving Flask app 'app'
 * Debug mode: on
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:3000
 * Running on http://YOUR-IP:3000

Step 5: Access Dashboard

Open browser and navigate to:

http://localhost:3000

Feature 1: Auto-Discovery of Redis Clusters

The first thing the dashboard does is scan the selected region and list all Redis replication groups.

For each cluster, it shows:

Engine version
Node type
Cluster mode status
Number of shards
Replica count
Current status

This removed the need to dig through the AWS console every time.
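As a rough illustration of how such a discovery step could be written with Boto3, here is a minimal sketch. The function names and the shape of the summary dict are hypothetical and not taken from the project. `client` is assumed to be a Boto3 ElastiCache client created with `boto3.client("elasticache")`. The engine version is read from the first member cluster via `describe_cache_clusters`, which costs one extra call per group.

```python
def list_replication_groups(client):
    """Yield every replication group in the client's region, following Marker pagination."""
    kwargs = {}
    while True:
        page = client.describe_replication_groups(**kwargs)
        yield from page.get("ReplicationGroups", [])
        marker = page.get("Marker")
        if not marker:
            break
        kwargs["Marker"] = marker


def engine_version(client, group):
    """Look up the engine version from the group's first member cluster."""
    members = group.get("MemberClusters", [])
    if not members:
        return None
    resp = client.describe_cache_clusters(CacheClusterId=members[0])
    return resp["CacheClusters"][0].get("EngineVersion")


def summarize(client, group):
    """Flatten one replication group into the fields shown on the dashboard."""
    shards = group.get("NodeGroups", [])
    # Replicas per shard: members of the first shard minus its primary.
    first_shard = shards[0].get("NodeGroupMembers", []) if shards else []
    return {
        "id": group["ReplicationGroupId"],
        "status": group.get("Status"),
        "engine_version": engine_version(client, group),
        "node_type": group.get("CacheNodeType"),
        "cluster_mode": bool(group.get("ClusterEnabled")),
        "shards": len(shards),
        "replicas_per_shard": max(len(first_shard) - 1, 0),
    }
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub, and lets one process scan several regions by creating one client per region.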

Feature 2: Snapshot Before Anything Risky

We enforced a rule in the UI:

You cannot upgrade without snapshotting first.

The tool automatically:

Creates a manual snapshot
Waits until status becomes “available”
Logs the snapshot ID
Proceeds only if backup succeeds

This one rule eliminated most operational anxiety.
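The snapshot gate described above can be sketched roughly as follows. This is an illustration, not the project's code: the function name, timeout, and polling interval are assumptions, and `client` is assumed to be a Boto3 ElastiCache client. The sketch snapshots by `ReplicationGroupId`. A cluster-mode-disabled flow might instead target a specific node with `CacheClusterId`.

```python
import time


def snapshot_and_wait(client, group_id, snapshot_name,
                      timeout=1800, interval=15, sleep=time.sleep):
    """Create a manual snapshot and block until its status is 'available'.

    Raises TimeoutError if the snapshot is not ready in time, so the caller
    can refuse to continue with the upgrade.
    """
    client.create_snapshot(ReplicationGroupId=group_id, SnapshotName=snapshot_name)
    waited = 0
    while waited <= timeout:
        resp = client.describe_snapshots(SnapshotName=snapshot_name)
        snapshots = resp.get("Snapshots", [])
        status = snapshots[0]["SnapshotStatus"] if snapshots else "unknown"
        if status == "available":
            return snapshot_name
        sleep(interval)
        waited += interval
    raise TimeoutError(f"snapshot {snapshot_name} not available after {timeout}s")
```

An upgrade handler would call `snapshot_and_wait(...)` first and only proceed when it returns. Any exception, including the timeout, aborts the workflow before the cluster is touched.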

Feature 3: Safe Engine Upgrade Workflow

We support two upgrade paths:

1. Direct Upgrade (In-Place)

Best for non-production environments.

The tool:

Validates version compatibility
Executes modify-replication-group
Streams progress logs
Displays status in real-time
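A minimal sketch of the in-place path, under stated assumptions: the version check here only rejects downgrades and same-version requests, which is a simplification. It does not encode ElastiCache's actual supported upgrade paths. Function names, the polling interval, and the message strings are hypothetical. `client` is assumed to be a Boto3 ElastiCache client.

```python
import time


def parse_version(version):
    """Turn '6.2' or '7.0.7' into a tuple of ints; non-numeric parts like 'x' become 0."""
    return tuple(int(p) if p.isdigit() else 0 for p in version.split("."))


def is_upgrade(current, target):
    """True when target is strictly newer than current."""
    cur, tgt = parse_version(current), parse_version(target)
    width = max(len(cur), len(tgt))
    cur += (0,) * (width - len(cur))
    tgt += (0,) * (width - len(tgt))
    return tgt > cur


def upgrade_in_place(client, group_id, current, target,
                     interval=30, max_polls=120, sleep=time.sleep):
    """Generator: request the upgrade, then yield status lines until 'available'."""
    if not is_upgrade(current, target):
        raise ValueError(f"{target} is not newer than {current}")
    client.modify_replication_group(
        ReplicationGroupId=group_id,
        EngineVersion=target,
        ApplyImmediately=True,
    )
    yield f"upgrade to {target} requested"
    for _ in range(max_polls):
        # Sleep before polling so the group has time to leave 'available'.
        sleep(interval)
        resp = client.describe_replication_groups(ReplicationGroupId=group_id)
        status = resp["ReplicationGroups"][0]["Status"]
        yield f"status: {status}"
        if status == "available":
            return
    raise TimeoutError(f"{group_id} did not become available")
```

Writing the workflow as a generator means the same function can feed a log file, a terminal, or a streaming HTTP response without changes.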

2. Blue/Green Restore Strategy (Production)

For production clusters, we rarely do in-place upgrades.
Instead:

Take snapshot
Restore snapshot to new cluster
Validate application connectivity
Switch endpoint
Keep old cluster temporarily

This approach gives near zero downtime and easy rollback. The dashboard guides this flow step by step.
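The "restore snapshot to new cluster" step could look roughly like the sketch below, which seeds a new replication group from a snapshot on the target engine version. The function name and parameters are hypothetical. It assumes the target version accepts the snapshot and that `client` is a Boto3 ElastiCache client. Shard and replica counts, encryption, and parameter groups are left out and would need to match the original cluster in practice.

```python
def restore_to_new_group(client, snapshot_name, new_group_id, engine_version,
                         node_type, subnet_group=None, security_group_ids=None):
    """Create a new ("green") replication group seeded from a snapshot."""
    params = {
        "ReplicationGroupId": new_group_id,
        "ReplicationGroupDescription": f"Restored from {snapshot_name}",
        "SnapshotName": snapshot_name,
        "Engine": "redis",
        "EngineVersion": engine_version,
        "CacheNodeType": node_type,
    }
    # Network settings are optional here; omit them to use account defaults.
    if subnet_group:
        params["CacheSubnetGroupName"] = subnet_group
    if security_group_ids:
        params["SecurityGroupIds"] = list(security_group_ids)
    resp = client.create_replication_group(**params)
    return resp["ReplicationGroup"]["ReplicationGroupId"]
```

Switching the endpoint is deliberately not part of this function. How applications learn the new address (a DNS record, a config value, a secrets store) depends on the environment, and keeping the old group alive until that switch is verified is what makes rollback cheap.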

Feature 4: Snapshot Restore

One common misconception is:

“Can we restore over the same cluster?”

No.

On Amazon ElastiCache, restoring a snapshot always creates a new replication group.

The dashboard enforces unique naming and prevents accidental overwrite attempts.

It also validates:

Node type compatibility
Engine version match
Memory requirements
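One way the unique-naming guard might be implemented is sketched below. It checks the proposed ID against ElastiCache's naming rules for replication groups (1 to 40 letters, digits, or hyphens, starting with a letter, no trailing or consecutive hyphens) and against the IDs that already exist. The function name and messages are hypothetical, and the node type, engine version, and memory checks from the list above are not covered here.

```python
import re

# Letters, digits and hyphens only; first character a letter; 1-40 chars total.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,39}")


def validate_restore_target(new_group_id, existing_ids):
    """Return a list of problems; an empty list means the ID is usable."""
    problems = []
    if not _NAME_RE.fullmatch(new_group_id):
        problems.append("must be 1-40 letters, digits or hyphens, starting with a letter")
    if new_group_id.endswith("-") or "--" in new_group_id:
        problems.append("must not end with a hyphen or contain consecutive hyphens")
    # Compare case-insensitively, since IDs are stored in lowercase.
    if new_group_id.lower() in {i.lower() for i in existing_ids}:
        problems.append("a replication group with this ID already exists")
    return problems
```

Returning every problem at once, rather than raising on the first, lets the UI show all the fixes a user needs in a single pass.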

Feature 5: Architecture Behind the Scenes

Under the hood, the system is simple by design:

Browser → Flask API → Boto3 → Amazon ElastiCache
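To make the Browser → Flask API hop concrete, here is a small sketch of how progress lines could be pushed to the browser with Server-Sent Events. The route path, function names, and JSON payload shape are all assumptions made for illustration. `progress_source` stands in for whatever produces status lines, such as an upgrade workflow written as a generator.

```python
import json


def sse_events(messages):
    """Wrap each message in a Server-Sent Events frame: 'data: ...' plus a blank line."""
    for message in messages:
        yield f"data: {json.dumps({'message': message})}\n\n"


def create_app(progress_source):
    """Build a Flask app; progress_source(group_id) returns an iterable of strings."""
    from flask import Flask, Response  # imported lazily so sse_events works without Flask

    app = Flask(__name__)

    # Hypothetical route: the browser subscribes with `new EventSource(url)`.
    @app.route("/api/upgrade/<group_id>/events")
    def upgrade_events(group_id):
        return Response(sse_events(progress_source(group_id)),
                        mimetype="text/event-stream")

    return app
```

With this in place, something like `create_app(my_source).run(port=3000)` would serve the stream on the same port the post uses. SSE suits this case because updates only flow from server to browser, so plain HTTP is enough and no WebSocket handling is required.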

Final Thoughts

“Cache Me If You Can” wasn’t just a clever title.

It reflects a mindset shift — from reactive infrastructure to controlled, confident operations.

If you're managing Redis on AWS, consider building (or adopting) something similar.

Because in production systems:

Speed matters.
Stability matters more.

Conclusion

“Cache Me If You Can” ultimately highlights that successful Redis upgrades on AWS aren’t about mastering commands — they’re about designing a reliable process. By replacing manual, error-prone steps with a simple, automated dashboard, we transformed upgrades into a predictable, repeatable, and safe operation.

This tool brought structure through guardrails, confidence through enforced snapshots, and flexibility through blue/green deployment strategies — all while reducing downtime and human error.

In the end, the biggest win wasn’t just automation — it was peace of mind. Because in real-world production systems, it’s not just about moving fast — it’s about moving safely, every single time.
