How caching sessions and temporary AWS credentials in Redis turned our auth layer from a bottleneck into a near-zero-cost lookup
The Moment We Realized Our Auth Was a DDoS on Ourselves
Every authenticated request in our multi-tenant platform did the same dance:
- Validate the user's session
- Check their role mappings (tenant, use case, environment)
- Call AWS STS to assume the right IAM role
- Return temporary credentials so downstream services could talk to S3, DynamoDB, Bedrock, etc.
Steps 1-3 hit the network. Every. Single. Time.
At modest traffic, it was fine. At scale, we were essentially DDoS-ing our own identity layer: STS throttling kicked in, latency spiked, and users saw login spinners that never stopped spinning.
The fix wasn't a new auth framework. It was Redis.
TL;DR (If You Skim, Skim This)
- Problem: Per-request STS calls + stateless session validation = slow logins + rate limiting at scale.
- Move: Cache session data and STS credentials in Redis with structured keys and smart TTLs.
- Result: Sub-millisecond session lookups, ~90% fewer STS API calls, and a warm credential cache that makes subsequent requests feel instant.
- Tradeoff: You need a cache invalidation strategy and must handle Redis failures gracefully.
Why This Pattern Is Having a Moment
Three trends are colliding right now:
- Multi-tenant platforms are everywhere. Each tenant has its own IAM boundary, its own roles, its own credential scope. That's a lot of AssumeRole calls.
- STS has hard rate limits. AWS throttles AssumeRole at roughly 500 requests/second per account. Hit that in production and you'll learn the meaning of "Rate exceeded" the hard way.
- Users expect instant auth. Nobody waits 2 seconds for a login to "warm up." If the first click feels slow, trust evaporates.
Redis sits at the intersection of all three: it's fast enough to feel like memory, persistent enough to survive pod restarts (in clustered mode), and simple enough that the caching logic doesn't become its own microservice.
The Architecture: Two Caches, One Redis
We use Redis for two distinct but related caching concerns:
1. Session Cache (Identity Layer)
When a user logs in (via OIDC), we create a platform session in Redis:
session_data = {
    "userId": "jane.doe@example.com",
    "roles": [
        {
            "TenantId": "acme-corp",
            "UseCaseId": "doc-search",
            "Environment": "prod",
            "RoleName": "USE_CASE_DEVELOPER",
        },
        {
            "TenantId": "acme-corp",
            "UseCaseId": "chatbot",
            "Environment": "dev",
            "RoleName": "USE_CASE_OWNER",
        },
    ],
    "highest_role": "USE_CASE_OWNER",
    "platform_roles": ["USE_CASE_OWNER", "USE_CASE_DEVELOPER"],
    "sts": {},  # STS credentials are added lazily
}
Key format: session:<uuid>
TTL: 1 hour (configurable via env)
This replaces the classic "hit the database on every request" pattern. Once stored, every downstream service validates auth by reading from Redis, not by calling the IdP or querying a user table.
2. STS Credential Cache (AWS Access Layer)
When a user accesses a specific tenant/use-case/environment, we call sts:AssumeRole to get short-lived credentials. These get cached inside the session object:
session_data["sts"]["acme-corp|doc-search|prod|USE_CASE_DEVELOPER"] = {
    "AccessKeyId": "ASIA...",
    "SecretAccessKey": "wJal...",
    "SessionToken": "FwoG...",
    "Expiration": "2026-02-28T19:00:00+00:00",
}
Key format (composite): TenantId|UseCaseId|Environment|RoleName
TTL: Derived from credential expiry minus a 5-minute safety buffer
This means the second time a user touches the same tenant/environment, we skip STS entirely.
The Code: Session Storage
Here's the core of how we store a session after successful OIDC login:
import json
import os

import redis
from redis.connection import ConnectionPool

DEFAULT_TTL_SECONDS = 3600  # 1 hour

# Singleton connection pool: one per process
_connection_pool: ConnectionPool | None = None


def get_redis_pool() -> ConnectionPool:
    global _connection_pool
    if _connection_pool is None:
        _connection_pool = ConnectionPool(
            host=os.environ.get("REDIS_HOST", "localhost"),
            port=int(os.environ.get("REDIS_PORT", "6379")),
            db=0,
            max_connections=50,
            decode_responses=True,
            socket_keepalive=True,
            socket_connect_timeout=5,
            retry_on_timeout=True,
        )
    return _connection_pool


def get_redis_client() -> redis.Redis:
    return redis.Redis(connection_pool=get_redis_pool())


def store_session(
    session_id: str,
    user_id: str,
    roles: list[dict],
    highest_role: str | None = None,
    platform_roles: list[str] | None = None,
    ttl_seconds: int = DEFAULT_TTL_SECONDS,
) -> bool:
    try:
        client = get_redis_client()
        session_data = {
            "userId": user_id,
            "roles": roles,
            "sts": {},
            "highest_role": highest_role,
            "platform_roles": platform_roles or [],
        }
        client.setex(
            f"session:{session_id}",
            ttl_seconds,
            json.dumps(session_data),
        )
        return True
    except redis.RedisError:
        return False
Why setex instead of set + expire? Atomicity. If the process crashes between set and expire, you get a session that never dies. setex is a single atomic operation.
The Code: STS Credential Caching
The real performance win is here: caching the output of sts:AssumeRole:
import boto3
from datetime import datetime

sts_client = boto3.client("sts")
EXPIRATION_BUFFER_SEC = 300  # 5 minutes


def get_sts_credentials(
    session_id: str,
    platform_role: str,
    user_email: str,
    tenant_id: str,
    use_case_id: str,
    environment: str,
    force_refresh: bool = False,
) -> dict:
    # Step 1: Check the cache first
    if not force_refresh:
        cached = get_credentials_from_session(
            session_id, tenant_id, use_case_id,
            environment, platform_role,
        )
        if cached and is_credential_valid(cached):
            return cached  # Cache hit: skip STS entirely

    # Step 2: Cache miss, call STS
    role_arn = resolve_role_arn(platform_role)
    resp = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"{tenant_id}-{use_case_id}-{environment}"[:64],
        DurationSeconds=3600,
    )
    creds = resp["Credentials"]
    credential_data = {
        "AccessKeyId": creds["AccessKeyId"],
        "SecretAccessKey": creds["SecretAccessKey"],
        "SessionToken": creds["SessionToken"],
        "Expiration": creds["Expiration"].isoformat(),
    }

    # Step 3: Cache with smart TTL (expire before AWS does)
    expiration = datetime.fromisoformat(credential_data["Expiration"])
    ttl = int(
        (expiration - datetime.now(expiration.tzinfo)).total_seconds()
    ) - EXPIRATION_BUFFER_SEC
    if ttl > 0:
        store_credentials_in_session(
            session_id, tenant_id, use_case_id,
            environment, platform_role, credential_data, ttl,
        )
    return credential_data
The EXPIRATION_BUFFER_SEC = 300 is critical. STS credentials expire at a hard boundary. If you serve a credential that's 10 seconds from death, the downstream AWS call will fail with a confusing ExpiredTokenException. The 5-minute buffer ensures we always refresh before the cliff.
Credential Validity Check
A clean helper that prevents serving stale credentials:
def is_credential_valid(credentials: dict) -> bool:
    expiration_str = credentials.get("Expiration")
    if not expiration_str:
        return False
    expiration = datetime.fromisoformat(
        expiration_str.replace("Z", "+00:00")
    )
    now = datetime.now(expiration.tzinfo)
    buffer_seconds = 300
    return (expiration - now).total_seconds() > buffer_seconds
If the credential is within 5 minutes of expiring, we treat it as expired. Simple, defensive, saves you from debugging ExpiredTokenException at 3 AM.
Session Validation: The Hot Path
Every authenticated API request runs through this:
def validate_session_and_role(
    session_id: str,
    tenant_id: str | None = None,
    use_case_id: str | None = None,
    environment: str | None = None,
) -> dict:
    # Single Redis GET: sub-millisecond
    session_data = get_session(session_id)
    if not session_data:
        raise ValueError("Session not found or expired")

    user_email = session_data.get("userId")
    roles = session_data.get("roles", [])
    result = {
        "valid": True,
        "user_email": user_email,
        "all_roles": roles,
        "highest_role": derive_highest_role(roles),
    }

    # Optional: validate specific tenant/use-case access
    if tenant_id and use_case_id and environment:
        matching_role = find_role_for_context(
            roles, tenant_id, use_case_id, environment
        )
        if not matching_role:
            raise ValueError(
                f"No access to {tenant_id}/{use_case_id}/{environment}"
            )
        result["role"] = matching_role
    return result
This is the difference between "every request takes 200ms to validate" and "every request takes <1ms to validate." The session is already in Redis. The role lookup is a JSON parse + list scan. Done.
The Login Flow: Putting It Together
Browser
  │
  │ GET /auth/userinfo
  ▼
ALB (OIDC authenticate)
  │
  │ verified user → forwarded with OIDC headers
  ▼
Backend Login Handler
  │
  ├─ 1. Decode & verify OIDC token (claims extraction)
  ├─ 2. Map IdP groups → platform roles (7-role hierarchy)
  ├─ 3. Build entitlements (tenant → use_case → env → role)
  ├─ 4. Store session in Redis (session:<uuid>)
  └─ 5. Return session_id + tenants to frontend
  │
  ▼
Frontend stores session_id
  │
  │ Subsequent API calls include X-Session-Id header
  ▼
Any Backend Service
  │
  ├─ Validate session from Redis (sub-ms)
  ├─ Check role mapping for requested resource
  └─ If STS credentials needed:
       ├─ Check Redis cache first (sub-ms)
       └─ Call STS only on cache miss (~200ms)
The first login is the "expensive" one (~500ms total including STS). Every subsequent request benefits from the cache.
Connection Pooling: Don't Skip This
A surprisingly common mistake: creating a new Redis connection per request.
# ❌ Don't do this
def get_session(session_id):
    client = redis.Redis(host="localhost", port=6379)  # new connection per call!
    return client.get(f"session:{session_id}")


# ✅ Do this: reuse a connection pool
_pool = ConnectionPool(host="localhost", port=6379, max_connections=50)


def get_session(session_id):
    client = redis.Redis(connection_pool=_pool)  # borrows from the pool
    return client.get(f"session:{session_id}")
Each new TCP connection to Redis costs roughly a millisecond to establish. At 1,000 req/s, that's a full second of pure handshake overhead every second. Connection pooling makes this a non-issue.
Observability: Know Your Hit Ratio
We track cache operations with Prometheus counters:
from prometheus_client import Counter, Gauge

cache_operations_total = Counter(
    "cache_operations_total",
    "Total cache operations",
    ["tenant_id", "service", "operation", "status"],
)

cache_hit_ratio = Gauge(
    "cache_hit_ratio",
    "Rolling cache hit ratio",
    ["tenant_id", "service"],
)
Labels like operation=get_creds and status=hit|miss|expired|error let you build dashboards that answer:
- What's our STS cache hit ratio? (target: >85%)
- Which tenants have the most cache misses? (may indicate config drift)
- Are we seeing Redis errors? (time to check cluster health)
If your hit ratio drops below 80%, something is wrong: either TTLs are too short, sessions are thrashing, or your Redis instance is under memory pressure.
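The gauge needs a number to carry. One way to produce it is an in-process sliding window, sketched below; the class and the window size are my own illustration, not the post's implementation:

```python
from collections import deque


class RollingHitRatio:
    """Track the outcome of the last N cache lookups."""

    def __init__(self, window: int = 1000):
        # deque with maxlen drops the oldest event automatically
        self._events: deque[int] = deque(maxlen=window)

    def record(self, hit: bool) -> None:
        self._events.append(1 if hit else 0)

    def ratio(self) -> float:
        if not self._events:
            return 0.0  # no data yet; avoid dividing by zero
        return sum(self._events) / len(self._events)
```

On every lookup you'd call record(hit) and push ratio() into the cache_hit_ratio gauge via .labels(...).set(...).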
TLS + Secrets Manager: Production Hardening
In production, Redis connections should be encrypted and passwords should never live in env vars:
def _load_password_from_secrets_manager(secret_arn: str) -> str | None:
    """Load Redis auth token from AWS Secrets Manager."""
    sm = boto3.client("secretsmanager")
    resp = sm.get_secret_value(SecretId=secret_arn)
    secret = resp.get("SecretString", "")
    # Support both plain strings and JSON secrets
    if secret.strip().startswith("{"):
        obj = json.loads(secret)
        for key in ("password", "authToken", "token"):
            if key in obj:
                return obj[key]
    return secret.strip()

We also cache the fetched secret in-process: no need to call Secrets Manager on every pool initialization. And we configure TLS via the SSLConnection class from the Redis Python client:
from redis.connection import SSLConnection
pool_kwargs["connection_class"] = SSLConnection
This gives you in-transit encryption for ElastiCache, which is a compliance checkbox you'd rather check early.
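Wiring both together, pool construction might look like the sketch below. The env var names and the load_password callable are assumptions; in practice you'd pass a cached wrapper around the Secrets Manager helper above. SSLConnection is imported only when TLS is enabled:

```python
import os


def build_pool_kwargs(load_password=None) -> dict:
    """Assemble kwargs for redis.connection.ConnectionPool.

    `load_password` is a zero-arg callable returning the auth token
    (e.g. the Secrets Manager helper above, fetched once and reused).
    Env var names here are illustrative.
    """
    kwargs = {
        "host": os.environ.get("REDIS_HOST", "localhost"),
        "port": int(os.environ.get("REDIS_PORT", "6379")),
        "max_connections": 50,
        "decode_responses": True,
        "socket_connect_timeout": 5,
    }
    if load_password is not None:
        kwargs["password"] = load_password()
    if os.environ.get("REDIS_TLS", "true").lower() == "true":
        # In-transit encryption for ElastiCache; imported lazily so the
        # sketch stays runnable even without the redis package installed.
        from redis.connection import SSLConnection
        kwargs["connection_class"] = SSLConnection
    return kwargs
```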
Gotchas (A.K.A. What Bit Us So It Doesn't Bite You)
1. Stale Credentials After Role Changes
If a user's role changes (e.g., promoted from USE_CASE_DEVELOPER to USE_CASE_OWNER), the cached session still has the old role mappings. Our fix: invalidate the session on role change and force a re-login.
def invalidate_session(session_id: str) -> bool:
    client = get_redis_client()
    return client.delete(f"session:{session_id}") > 0
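invalidate_session needs the session id, but on a role change you usually want to kill every session the user holds, which requires a secondary index. A sketch; the key names are my own, and the client is injected for testability:

```python
def index_session_for_user(client, user_id: str, session_id: str,
                           ttl_seconds: int = 3600) -> None:
    """At login time, record the session id in a per-user set."""
    key = f"user_sessions:{user_id}"
    client.sadd(key, session_id)
    client.expire(key, ttl_seconds)  # keep the index roughly in step with session TTLs


def invalidate_user_sessions(client, user_id: str) -> int:
    """Delete every live session for a user; returns how many were removed."""
    key = f"user_sessions:{user_id}"
    removed = 0
    for sid in client.smembers(key):
        removed += client.delete(f"session:{sid}")
    client.delete(key)  # drop the index itself
    return removed
```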
2. Redis Goes Down β What Then?
Redis is fast, but it's not invincible. If the Redis cluster is unreachable:
- Session validation should fail-closed (reject the request, don't silently allow it)
- Log aggressively so ops teams see the outage
- Never fall back to "allow all": that's a security vulnerability disguised as fault tolerance
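In code terms, fail-closed means a cache exception maps to a rejection, never a pass. A sketch with the client injected; in real code you'd catch redis.RedisError specifically rather than the broad Exception used here to keep the example dependency-free:

```python
import json
import logging

logger = logging.getLogger("auth")


def validate_session_fail_closed(session_id: str, client) -> dict:
    """Session check that fails CLOSED when the cache is unreachable."""
    try:
        raw = client.get(f"session:{session_id}")
    except Exception:
        # Redis outage: log loudly, reject the request.
        # Narrow this to redis.RedisError in production code.
        logger.exception("Redis unreachable during session validation")
        raise PermissionError("Auth backend unavailable")
    if raw is None:
        raise PermissionError("Session not found or expired")
    return json.loads(raw)
```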
3. Session Key Collisions
Using predictable keys (like session:<user_email>) opens the door to session hijacking. Use session:<uuid4>; the session ID should be unguessable.
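Generating an unguessable id is one line; uuid4 draws from the OS CSPRNG (secrets.token_urlsafe is an equally good alternative with more random bits):

```python
import uuid


def new_session_id() -> str:
    # uuid4 is generated from os.urandom: 122 random bits, unguessable.
    return str(uuid.uuid4())
```

Never derive the id from user attributes (email, user id, timestamps): anything an attacker can predict, an attacker can replay.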
4. Memory Pressure in Multi-Tenant Environments
Each session stores role mappings for every tenant/use-case the user can access. A platform admin with access to 50 tenants has a bigger session object than a single-tenant end user. Monitor Redis memory usage and set maxmemory-policy to volatile-lru so expired keys get evicted first.
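On self-managed Redis, the eviction policy can be set at runtime (sketch below); ElastiCache doesn't expose CONFIG SET, so there you set maxmemory-policy in a custom parameter group instead:

```shell
# Evict only keys that carry a TTL, least-recently-used first
redis-cli CONFIG SET maxmemory-policy volatile-lru

# Verify the setting took effect
redis-cli CONFIG GET maxmemory-policy
```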
5. Binding Token Replay Attacks
If your auth flow uses one-time binding tokens (e.g., for device code flows), mark them as consumed in Redis with a short TTL:
import hashlib


def _binding_token_key(token: str) -> str:
    # Hash the token so raw token material never lands in Redis
    # (a 16-char prefix can collide and leaks part of the secret).
    digest = hashlib.sha256(token.encode()).hexdigest()
    return f"binding_token:consumed:{digest}"


def mark_binding_token_consumed(token: str, ttl: int = 900) -> bool:
    get_redis_client().setex(_binding_token_key(token), ttl, "1")
    return True


def is_binding_token_consumed(token: str) -> bool:
    return bool(get_redis_client().exists(_binding_token_key(token)))
When You Should Not Use This Pattern
- Single-user apps: if you have 10 users, the extra Redis infrastructure isn't worth it. A signed JWT with short expiry is simpler.
- Stateless-only architectures: if your design principle is "no server-side state," Redis sessions are a philosophical violation. (But also: stateless auth at scale has its own costs.)
- No AWS roles to assume: if you're not using STS, the credential caching half of this pattern doesn't apply. The session caching half still might.
A Practical Implementation Checklist
- [ ] Deploy Redis (ElastiCache Serverless or a self-managed cluster with replication)
- [ ] Enable TLS in-transit (SSLConnection)
- [ ] Store the Redis password in Secrets Manager, not env vars
- [ ] Use connection pooling (ConnectionPool with max_connections)
- [ ] Set session TTL to match your security requirements (we use 1 hour)
- [ ] Add a 5-minute expiration buffer on the STS credential cache
- [ ] Implement health_check(): ping Redis on startup and expose /health
- [ ] Add Prometheus metrics for cache hit/miss/error rates
- [ ] Set maxmemory-policy to volatile-lru on the Redis instance
- [ ] Document your invalidation strategy (when do cached sessions get killed?)
- [ ] Test Redis-down scenarios (your app should fail-closed, not fail-open)
- [ ] Load SSM parameters at startup, not import time (env vars must be populated first)
The Numbers
Before Redis caching:
- Login: ~800ms (OIDC + STS + DB lookups)
- Subsequent API auth: ~200ms per request (session re-validation + STS)
- STS calls: 1 per authenticated request
After Redis caching:
- Login: ~500ms (OIDC + STS + Redis write; the STS result is cached for next time)
- Subsequent API auth: <1ms (Redis GET + JSON parse)
- STS calls: 1 per unique tenant/role/env combination per session lifetime
At 10,000 authenticated requests per hour, that's the difference between 10,000 STS calls and ~50. Your AWS bill notices. Your users notice. Your on-call rotation notices.
Closing: The Fastest Auth Call Is the One You Don't Make
Redis isn't just a cache layer for your database queries. It's the foundation of a fast, secure auth perimeter.
The session cache eliminates per-request identity lookups. The STS credential cache eliminates per-request IAM calls. Together, they turn your auth layer from a distributed systems problem into a local memory read.
And when security is fast, developers stop looking for shortcuts around it.
What's your strategy for caching short-lived AWS credentials? Do you cache at the application layer, use credential providers, or something else entirely? Drop a comment β I'm curious what patterns are working for others.
Resources
- AWS Docs: STS AssumeRole (rate limits and best practices)
- Redis: Connection Pooling in the Python client
- AWS ElastiCache: In-transit encryption
- Prometheus: Client instrumentation for Python
About the Author
Suraj Khaitan, Gen AI Architect | Building scalable platforms and secure cloud-native systems
Connect on LinkedIn | Follow for more engineering and architecture write-ups