DEV Community

Cache Me If You Can: Building a Web-Based Redis Upgrade Dashboard on AWS


If you’ve ever upgraded Redis in production, you know one thing — it’s never “just a version change.”

We were running multiple Redis clusters on Amazon ElastiCache across environments, and every upgrade followed the same painful routine:

  • Find the right cluster
  • Double-check cluster mode
  • Take a snapshot
  • Validate engine compatibility
  • Run CLI commands
  • Monitor manually
  • Hope nothing reconnects in a storm

After doing this a few times, we realized something obvious:

The problem wasn’t Redis.
The problem was the process.

So we built a lightweight web-based dashboard to automate Redis cluster upgrades, snapshot management, and restore workflows — safely and predictably.

This is the story of that tool.

The Real Problem

Redis upgrades in AWS look simple in documentation. But in reality, things get tricky:

  • Cluster-mode enabled vs disabled changes the snapshot flow.
  • Some versions can’t upgrade directly.
  • In-place upgrades may cause failovers.
  • Snapshots don’t overwrite existing clusters.
  • Restore always creates a new replication group.
  • Someone always forgets to take a fresh snapshot.

We didn’t want another “run this CLI carefully at 11 PM” situation.

We wanted:

  • Guardrails
  • Visibility
  • Repeatability
  • Zero-downtime options
  • Less human error

What We Built

A small web application that sits on top of AWS APIs and acts as a Redis operations control panel.

Under the hood it uses:

  • Flask (Python backend)
  • Boto3 (AWS SDK)
  • Simple HTML/CSS frontend
  • IAM roles for authentication
  • Server-Sent Events for live progress updates

Nothing fancy. Just clean automation.

The goal was simple:

Discover → Snapshot → Upgrade → Restore, all from one clean UI.
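To make the Server-Sent Events piece of the stack concrete, here is a minimal sketch of how a Flask backend might stream progress to the browser. The route name `/progress`, the helper names, and the fixed step list are all assumptions for illustration, not the project's actual code. Flask is imported lazily so the formatting helpers can be exercised on their own.

```python
import json
import time
from typing import Iterable, Iterator


def format_sse(data: dict, event: str = "progress") -> str:
    """Encode one Server-Sent Events message: an event name plus a JSON data line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def progress_stream(steps: Iterable[str], delay: float = 0.0) -> Iterator[str]:
    """Yield one SSE message per workflow step, then a final 'done' event."""
    steps = list(steps)
    for index, step in enumerate(steps, start=1):
        yield format_sse({"step": step, "index": index, "total": len(steps)})
        time.sleep(delay)  # stand-in for real work between steps
    yield format_sse({"status": "complete"}, event="done")


def create_app():
    """Build a Flask app that exposes the stream (hypothetical wiring)."""
    from flask import Flask, Response  # imported lazily on purpose

    app = Flask(__name__)

    @app.route("/progress")  # hypothetical route name
    def progress():
        steps = ["discover", "snapshot", "upgrade", "restore"]
        return Response(progress_stream(steps, delay=1.0),
                        mimetype="text/event-stream")

    return app
```

On the browser side, an `EventSource` pointed at the same route would receive each message as it is yielded, which is what lets a UI show live progress without polling.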

Installation Guide

Step 1: Download/Clone Project

git clone https://github.com/gajjarashish007/GenAI.git
cd GenAI
git checkout a206b7598b423946b8dcf25aabe6b0fc3464b24f
cd Redis_Upgrade

Step 2: Install Dependencies

pip install -r requirements.txt

Required packages:

  • flask==3.0.0
  • boto3==1.34.0

Step 3: Configure AWS Credentials

Choose one method:

Option A: AWS CLI (Recommended)

aws configure

Enter when prompted:

  • AWS Access Key ID
  • AWS Secret Access Key
  • Default region (e.g., us-east-1)
  • Output format (press Enter for default)

Option B: Environment Variables

# Linux/Mac
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export AWS_DEFAULT_REGION="us-east-1"

# Windows PowerShell
$env:AWS_ACCESS_KEY_ID="your_access_key"
$env:AWS_SECRET_ACCESS_KEY="your_secret_key"
$env:AWS_DEFAULT_REGION="us-east-1"

Option C: IAM Role (EC2)

If running on EC2, attach an IAM role with the required permissions to the instance. No further credential configuration is needed.
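As a rough starting point for those permissions, the sketch below builds an IAM policy document in Python. The action list is an assumption inferred from the operations described in this post (discover, snapshot, upgrade, restore), not taken from the project's documentation, and the wildcard resource should be narrowed for real use.

```python
import json

# Assumed action list: inferred from the workflows in this post,
# not confirmed against the project's own requirements.
ASSUMED_ACTIONS = [
    "elasticache:DescribeReplicationGroups",
    "elasticache:DescribeCacheClusters",
    "elasticache:DescribeCacheEngineVersions",
    "elasticache:DescribeSnapshots",
    "elasticache:CreateSnapshot",
    "elasticache:ModifyReplicationGroup",
    "elasticache:CreateReplicationGroup",
]


def build_policy(actions=ASSUMED_ACTIONS, resource="*"):
    """Return an IAM policy document allowing the given ElastiCache actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource}
        ],
    }


def policy_json(**kwargs) -> str:
    """Render the policy as JSON, ready to paste into the IAM console."""
    return json.dumps(build_policy(**kwargs), indent=2)
```

Calling `print(policy_json())` produces a document that can be attached to the instance role and then trimmed to the actions actually used.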

Step 4: Start Application

python app.py

Expected output:

 * Serving Flask app 'app'
 * Debug mode: on
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:3000
 * Running on http://YOUR-IP:3000

Step 5: Access Dashboard

Open a browser and navigate to:

http://localhost:3000

Feature 1: Auto-Discovery of Redis Clusters

The first thing the dashboard does is scan the selected region and list all Redis replication groups.

For each cluster, it shows:

  • Engine version
  • Node type
  • Cluster mode status
  • Number of shards
  • Replica count
  • Current status

This removed the need to dig through the AWS console every time.
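A minimal sketch of how such a discovery pass could look with Boto3 follows. The function names and the shape of the returned rows are assumptions for illustration. The ElastiCache client is passed in, so the same code works with a real `boto3.client("elasticache", region_name=...)` or with a stub. The engine version is read from a member cache cluster via `describe_cache_clusters`.

```python
def summarize_group(group: dict, engine_version: str) -> dict:
    """Flatten one replication group into the fields shown in the dashboard."""
    node_groups = group.get("NodeGroups", [])
    members = [len(ng.get("NodeGroupMembers", [])) for ng in node_groups]
    return {
        "id": group["ReplicationGroupId"],
        "engine_version": engine_version,
        "node_type": group.get("CacheNodeType"),
        "cluster_mode": bool(group.get("ClusterEnabled", False)),
        "shards": len(node_groups),
        # Each shard has one primary; the rest are replicas.
        "replicas_per_shard": max(members[0] - 1, 0) if members else 0,
        "status": group.get("Status"),
    }


def discover_clusters(client) -> list:
    """List every replication group visible to the client's region."""
    rows = []
    paginator = client.get_paginator("describe_replication_groups")
    for page in paginator.paginate():
        for group in page["ReplicationGroups"]:
            member_ids = group.get("MemberClusters", [])
            version = "unknown"
            if member_ids:
                resp = client.describe_cache_clusters(CacheClusterId=member_ids[0])
                version = resp["CacheClusters"][0].get("EngineVersion", "unknown")
            rows.append(summarize_group(group, version))
    return rows
```

With credentials configured, `discover_clusters(boto3.client("elasticache", region_name="us-east-1"))` would return one dictionary per replication group.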

Feature 2: Snapshot Before Anything Risky

We enforced a rule in the UI:

You cannot upgrade without snapshotting first.

The tool automatically:

  • Creates a manual snapshot
  • Waits until status becomes “available”
  • Logs the snapshot ID
  • Proceeds only if backup succeeds

This one rule eliminated most operational anxiety.
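The snapshot-then-wait step could be sketched roughly as below. This is an illustration under assumptions, not the tool's actual code: the function name, the polling interval, and the timeout are made up, and the loop polls `describe_snapshots` by hand until the status is "available".

```python
import time


def snapshot_and_wait(client, replication_group_id, snapshot_name,
                      poll_seconds=15, timeout_seconds=3600, sleep=time.sleep):
    """Create a manual snapshot and block until its status is 'available'."""
    client.create_snapshot(
        ReplicationGroupId=replication_group_id,
        SnapshotName=snapshot_name,
    )
    waited = 0
    while waited <= timeout_seconds:
        snapshots = client.describe_snapshots(SnapshotName=snapshot_name)["Snapshots"]
        status = snapshots[0]["SnapshotStatus"] if snapshots else "creating"
        if status == "available":
            return snapshot_name  # safe to proceed with the upgrade
        if status != "creating":
            raise RuntimeError(f"Snapshot {snapshot_name} in unexpected state: {status}")
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"Snapshot {snapshot_name} not available after {timeout_seconds}s")
```

Because the function either returns the snapshot name or raises, an upgrade step that runs only after it returns gets the "no snapshot, no upgrade" rule for free.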

Feature 3: Safe Engine Upgrade Workflow

We support two upgrade paths:

1. Direct Upgrade (In-Place)

Best for non-production environments.

The tool:

  • Validates version compatibility
  • Executes modify-replication-group
  • Streams progress logs
  • Displays status in real-time
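The in-place path above might be sketched as follows. The compatibility check here is a deliberately simple assumption: the target must be a version that `describe_cache_engine_versions` reports and must be newer than the current one. Real upgrades can have further constraints, such as a parameter group that matches the new engine family, which this sketch does not cover. All function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn '7.1' or '6.x' into a comparable tuple; non-numeric parts count as 0."""
    return tuple(int(p) if p.isdigit() else 0 for p in version.split("."))


def is_valid_target(current: str, target: str, available_versions) -> bool:
    """Simplified rule: target must exist and be newer than the current version."""
    return target in available_versions and parse_version(target) > parse_version(current)


def list_engine_versions(client, engine="redis") -> set:
    """Collect the engine versions ElastiCache currently offers."""
    versions = set()
    paginator = client.get_paginator("describe_cache_engine_versions")
    for page in paginator.paginate(Engine=engine):
        versions.update(v["EngineVersion"] for v in page["CacheEngineVersions"])
    return versions


def upgrade_in_place(client, replication_group_id, current, target,
                     apply_immediately=True):
    """Validate the target version, then request the engine upgrade."""
    if not is_valid_target(current, target, list_engine_versions(client)):
        raise ValueError(f"Cannot upgrade {replication_group_id} from {current} to {target}")
    return client.modify_replication_group(
        ReplicationGroupId=replication_group_id,
        EngineVersion=target,
        ApplyImmediately=apply_immediately,
    )
```

After the modify call returns, the replication group status can be polled and streamed to the UI while the upgrade proceeds.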

2. Blue/Green Restore Strategy (Production)

For production clusters, we rarely do in-place upgrades.
Instead:

  • Take snapshot
  • Restore snapshot to new cluster
  • Validate application connectivity
  • Switch endpoint
  • Keep old cluster temporarily

This approach gives near-zero downtime and easy rollback. The dashboard guides this flow step by step.
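For the "switch endpoint" step, the application needs the address of the newly restored cluster. A small sketch of how that lookup could work is below. The function names are assumptions. It relies on the documented response shape of `describe_replication_groups`, where cluster-mode-enabled groups expose a `ConfigurationEndpoint` and cluster-mode-disabled groups expose a `PrimaryEndpoint` on their single node group.

```python
def connection_endpoint(group: dict) -> tuple:
    """Return the (address, port) that clients should connect to."""
    if group.get("ClusterEnabled"):
        endpoint = group["ConfigurationEndpoint"]
    else:
        endpoint = group["NodeGroups"][0]["PrimaryEndpoint"]
    return endpoint["Address"], endpoint["Port"]


def green_endpoint(client, replication_group_id: str) -> tuple:
    """Look up the new cluster and refuse to hand out its endpoint until it is ready."""
    resp = client.describe_replication_groups(ReplicationGroupId=replication_group_id)
    group = resp["ReplicationGroups"][0]
    if group["Status"] != "available":
        raise RuntimeError(
            f"{replication_group_id} is {group['Status']}, not ready for cutover"
        )
    return connection_endpoint(group)
```

How the endpoint is actually switched (a DNS record, a config value, a service discovery entry) depends on the application and is outside this sketch.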

Feature 4: Snapshot Restore

One common misconception is:

“Can we restore over the same cluster?”

No.

On Amazon ElastiCache, restoring a snapshot always creates a new replication group.

The dashboard enforces unique naming and prevents accidental overwrite attempts.

It also validates:

  • Node type compatibility
  • Engine version match
  • Memory requirements
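A restore with the unique-name guard might look roughly like this. It is a sketch under assumptions: the function names and default description are invented, the node type is passed in by the caller rather than validated against the snapshot, and readiness is awaited with Boto3's `replication_group_available` waiter.

```python
def replication_group_exists(client, replication_group_id: str) -> bool:
    """True if a replication group with this ID already exists."""
    try:
        client.describe_replication_groups(ReplicationGroupId=replication_group_id)
        return True
    except client.exceptions.ReplicationGroupNotFoundFault:
        return False


def restore_snapshot(client, snapshot_name: str, new_group_id: str, node_type: str,
                     description: str = "Restored from snapshot") -> str:
    """Restore a snapshot into a brand-new replication group and wait for it."""
    # Guard against "restoring over" an existing cluster: the ID must be unused.
    if replication_group_exists(client, new_group_id):
        raise ValueError(f"Replication group {new_group_id} already exists")
    client.create_replication_group(
        ReplicationGroupId=new_group_id,
        ReplicationGroupDescription=description,
        SnapshotName=snapshot_name,
        CacheNodeType=node_type,
    )
    # Blocks until the new group reports 'available' (or the waiter gives up).
    client.get_waiter("replication_group_available").wait(
        ReplicationGroupId=new_group_id
    )
    return new_group_id
```

Checks such as node type compatibility and memory headroom could be added before the create call by reading the snapshot's metadata from `describe_snapshots`.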

Feature 5: Architecture Behind the Scenes

Under the hood, the system is simple by design:
Browser → Flask API → Boto3 → AWS ElastiCache

Final Thoughts

“Cache Me If You Can” wasn’t just a clever title.

It reflects a mindset shift — from reactive infrastructure to controlled, confident operations.

If you're managing Redis on AWS, consider building (or adopting) something similar.

Because in production systems:

Speed matters.
Stability matters more.

Conclusion

“Cache Me If You Can” ultimately highlights that successful Redis upgrades on AWS aren’t about mastering commands — they’re about designing a reliable process. By replacing manual, error-prone steps with a simple, automated dashboard, we transformed upgrades into a predictable, repeatable, and safe operation.

This tool brought structure through guardrails, confidence through enforced snapshots, and flexibility through blue/green deployment strategies — all while reducing downtime and human error.

In the end, the biggest win wasn’t just automation — it was peace of mind. Because in real-world production systems, it’s not just about moving fast — it’s about moving safely, every single time.
