Suraj Khaitan
🚀 Stop Calling STS on Every Request: Redis Caching Patterns That Cut Login Latency by 10x

How caching sessions and temporary AWS credentials in Redis turned our auth layer from a bottleneck into a near-zero-cost lookup


The Moment We Realized Our Auth Was a DDoS on Ourselves

Every authenticated request in our multi-tenant platform did the same dance:

  1. Validate the user's session
  2. Check their role mappings (tenant, use case, environment)
  3. Call AWS STS to assume the right IAM role
  4. Return temporary credentials so downstream services could talk to S3, DynamoDB, Bedrock, etc.

Steps 1–3 hit the network. Every. Single. Time.

At modest traffic, it was fine. At scale, we were essentially DDoS-ing our own identity layer—STS throttling kicked in, latency spiked, and users saw login spinners that never stopped spinning.

The fix wasn't a new auth framework. It was Redis.


TL;DR (If You Skim, Skim This)

  • Problem: Per-request STS calls + stateless session validation = slow logins + rate limiting at scale.
  • Move: Cache session data and STS credentials in Redis with structured keys and smart TTLs.
  • Result: Sub-millisecond session lookups, ~90% fewer STS API calls, and a warm credential cache that makes subsequent requests feel instant.
  • Tradeoff: You need a cache invalidation strategy and must handle Redis failures gracefully.

Why This Pattern Is Having a Moment

Three trends are colliding right now:

  1. Multi-tenant platforms are everywhere. Each tenant has its own IAM boundary, its own roles, its own credential scope. That's a lot of AssumeRole calls.
  2. STS has hard rate limits. AWS throttles AssumeRole at a per-account, per-region quota, on the order of hundreds of requests per second. Hit that in production and you'll learn the meaning of Rate exceeded the hard way.
  3. Users expect instant auth. Nobody waits 2 seconds for a login to "warm up." If the first click feels slow, trust evaporates.

Redis sits at the intersection of all three: it's fast enough to feel like memory, resilient enough to survive pod restarts (with replication or AOF persistence), and simple enough that the caching logic doesn't become its own microservice.


The Architecture: Two Caches, One Redis

We use Redis for two distinct but related caching concerns:

1. Session Cache (Identity Layer)

When a user logs in (via OIDC), we create a platform session in Redis:

session_data = {
    "userId": "jane.doe@example.com",
    "roles": [
        {
            "TenantId": "acme-corp",
            "UseCaseId": "doc-search",
            "Environment": "prod",
            "RoleName": "USE_CASE_DEVELOPER",
        },
        {
            "TenantId": "acme-corp",
            "UseCaseId": "chatbot",
            "Environment": "dev",
            "RoleName": "USE_CASE_OWNER",
        },
    ],
    "highest_role": "USE_CASE_OWNER",
    "platform_roles": ["USE_CASE_OWNER", "USE_CASE_DEVELOPER"],
    "sts": {},  # STS credentials are added lazily
}

Key format: session:<uuid>
TTL: 1 hour (configurable via env)

This replaces the classic "hit the database on every request" pattern. Once stored, every downstream service validates auth by reading from Redis—not by calling the IdP or querying a user table.

2. STS Credential Cache (AWS Access Layer)

When a user accesses a specific tenant/use-case/environment, we call sts:AssumeRole to get short-lived credentials. These get cached inside the session object:

session_data["sts"]["acme-corp|doc-search|prod|USE_CASE_DEVELOPER"] = {
    "AccessKeyId": "ASIA...",
    "SecretAccessKey": "wJal...",
    "SessionToken": "FwoG...",
    "Expiration": "2026-02-28T19:00:00+00:00",
}

Key format (composite): TenantId|UseCaseId|Environment|RoleName
TTL: Derived from credential expiry minus a 5-minute safety buffer

This means the second time a user touches the same tenant/environment, we skip STS entirely.


The Code: Session Storage

Here's the core of how we store a session after successful OIDC login:

import json
import os

import redis
from redis.connection import ConnectionPool

DEFAULT_TTL_SECONDS = 3600  # 1 hour

# Singleton connection pool — one per process
_connection_pool: ConnectionPool | None = None


def get_redis_pool() -> ConnectionPool:
    global _connection_pool
    if _connection_pool is None:
        _connection_pool = ConnectionPool(
            host=os.environ.get("REDIS_HOST", "localhost"),
            port=int(os.environ.get("REDIS_PORT", "6379")),
            db=0,
            max_connections=50,
            decode_responses=True,
            socket_keepalive=True,
            socket_connect_timeout=5,
            retry_on_timeout=True,
        )
    return _connection_pool


def get_redis_client() -> redis.Redis:
    return redis.Redis(connection_pool=get_redis_pool())


def store_session(
    session_id: str,
    user_id: str,
    roles: list[dict],
    highest_role: str | None = None,
    platform_roles: list[str] | None = None,
    ttl_seconds: int = DEFAULT_TTL_SECONDS,
) -> bool:
    try:
        client = get_redis_client()
        session_data = {
            "userId": user_id,
            "roles": roles,
            "sts": {},
            "highest_role": highest_role,
            "platform_roles": platform_roles or [],
        }
        client.setex(
            f"session:{session_id}",
            ttl_seconds,
            json.dumps(session_data),
        )
        return True
    except redis.RedisError:
        return False

Why setex instead of set + expire? Atomicity. If the process crashes between set and expire, you get a session that never dies. setex is a single atomic operation.


The Code: STS Credential Caching

The real performance win is here—caching the output of sts:AssumeRole:

import boto3
from datetime import datetime

sts_client = boto3.client("sts")
EXPIRATION_BUFFER_SEC = 300  # 5 minutes


def get_sts_credentials(
    session_id: str,
    platform_role: str,
    user_email: str,
    tenant_id: str,
    use_case_id: str,
    environment: str,
    force_refresh: bool = False,
) -> dict:
    # Step 1: Check the cache first
    if not force_refresh:
        cached = get_credentials_from_session(
            session_id, tenant_id, use_case_id,
            environment, platform_role,
        )
        if cached and is_credential_valid(cached):
            return cached  # 🎯 Cache hit — skip STS entirely

    # Step 2: Cache miss — call STS
    role_arn = resolve_role_arn(platform_role)
    resp = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"{tenant_id}-{use_case_id}-{environment}"[:64],
        DurationSeconds=3600,
    )

    creds = resp["Credentials"]
    credential_data = {
        "AccessKeyId": creds["AccessKeyId"],
        "SecretAccessKey": creds["SecretAccessKey"],
        "SessionToken": creds["SessionToken"],
        "Expiration": creds["Expiration"].isoformat(),
    }

    # Step 3: Cache with smart TTL (expire before AWS does)
    expiration = datetime.fromisoformat(credential_data["Expiration"])
    ttl = int(
        (expiration - datetime.now(expiration.tzinfo)).total_seconds()
    ) - EXPIRATION_BUFFER_SEC

    if ttl > 0:
        store_credentials_in_session(
            session_id, tenant_id, use_case_id,
            environment, platform_role, credential_data, ttl,
        )

    return credential_data

The EXPIRATION_BUFFER_SEC = 300 is critical. STS credentials expire at a hard boundary. If you serve a credential that's 10 seconds from death, the downstream AWS call will fail with a confusing ExpiredTokenException. The 5-minute buffer ensures we always refresh before the cliff.
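The `get_credentials_from_session` and `store_credentials_in_session` helpers referenced above aren't shown in full; a sketch might look like the following. It's parameterized on the client for testability (the article's helpers presumably use `get_redis_client()` internally), and note the read-modify-write is racy under concurrent writers — production code would wrap it in WATCH/MULTI or a Lua script:

```python
import json


def _sts_key(tenant_id: str, use_case_id: str, environment: str, role_name: str) -> str:
    # Composite key matching the article's TenantId|UseCaseId|Environment|RoleName format
    return f"{tenant_id}|{use_case_id}|{environment}|{role_name}"


def get_credentials_from_session(client, session_id, tenant_id,
                                 use_case_id, environment, role_name):
    raw = client.get(f"session:{session_id}")
    if not raw:
        return None
    sts = json.loads(raw).get("sts", {})
    return sts.get(_sts_key(tenant_id, use_case_id, environment, role_name))


def store_credentials_in_session(client, session_id, tenant_id, use_case_id,
                                 environment, role_name, credential_data, ttl):
    key = f"session:{session_id}"
    raw = client.get(key)
    if not raw:
        return False  # session expired — nothing to attach credentials to
    session = json.loads(raw)
    session.setdefault("sts", {})[
        _sts_key(tenant_id, use_case_id, environment, role_name)
    ] = credential_data
    # The credential lives inside the session value, so Redis can't expire it
    # independently of the session: `ttl` is recorded by the caller via the
    # Expiration field and enforced on read by is_credential_valid. Here we
    # rewrite the session while preserving its remaining TTL.
    remaining = client.ttl(key)
    if remaining > 0:
        client.setex(key, remaining, json.dumps(session))
        return True
    return False
```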


Credential Validity Check

A clean helper that prevents serving stale credentials:

from datetime import datetime


def is_credential_valid(credentials: dict) -> bool:
    expiration_str = credentials.get("Expiration")
    if not expiration_str:
        return False

    expiration = datetime.fromisoformat(
        expiration_str.replace("Z", "+00:00")
    )
    now = datetime.now(expiration.tzinfo)

    buffer_seconds = 300
    return (expiration - now).total_seconds() > buffer_seconds

If the credential is within 5 minutes of expiring, we treat it as expired. Simple, defensive, saves you from debugging ExpiredTokenException at 3 AM.


Session Validation: The Hot Path

Every authenticated API request runs through this:

def validate_session_and_role(
    session_id: str,
    tenant_id: str | None = None,
    use_case_id: str | None = None,
    environment: str | None = None,
) -> dict:
    # Single Redis GET — sub-millisecond
    session_data = get_session(session_id)
    if not session_data:
        raise ValueError("Session not found or expired")

    user_email = session_data.get("userId")
    roles = session_data.get("roles", [])

    result = {
        "valid": True,
        "user_email": user_email,
        "all_roles": roles,
        "highest_role": derive_highest_role(roles),
    }

    # Optional: validate specific tenant/use-case access
    if tenant_id and use_case_id and environment:
        matching_role = find_role_for_context(
            roles, tenant_id, use_case_id, environment
        )
        if not matching_role:
            raise ValueError(
                f"No access to {tenant_id}/{use_case_id}/{environment}"
            )
        result["role"] = matching_role

    return result

This is the difference between "every request takes 200ms to validate" and "every request takes <1ms to validate." The session is already in Redis. The role lookup is a JSON parse + list scan. Done.


The Login Flow: Putting It Together

Browser
  │
  │  GET /auth/userinfo
  ▼
ALB (OIDC authenticate)
  │
  │  verified user → forwarded with OIDC headers
  ▼
Backend Login Handler
  │
  ├─ 1. Decode & verify OIDC token (claims extraction)
  ├─ 2. Map IdP groups → platform roles (7-role hierarchy)
  ├─ 3. Build entitlements (tenant → use_case → env → role)
  ├─ 4. Store session in Redis (session:<uuid>)
  ├─ 5. Return session_id + tenants to frontend
  │
  ▼
Frontend stores session_id
  │
  │  Subsequent API calls include X-Session-Id header
  ▼
Any Backend Service
  │
  ├─ Validate session from Redis (sub-ms)
  ├─ Check role mapping for requested resource
  └─ If STS credentials needed:
       ├─ Check Redis cache first (sub-ms)
       └─ Call STS only on cache miss (~200ms)

The first login is the "expensive" one (~500ms total including STS). Every subsequent request benefits from the cache.


Connection Pooling: Don't Skip This

A surprisingly common mistake: creating a new Redis connection per request.

# ❌ Don't do this
def get_session(session_id):
    client = redis.Redis(host="localhost", port=6379)  # new connection!
    return client.get(f"session:{session_id}")

# ✅ Do this — reuse a connection pool
_pool = ConnectionPool(host="localhost", port=6379, max_connections=50)

def get_session(session_id):
    client = redis.Redis(connection_pool=_pool)
    return client.get(f"session:{session_id}")

Each TCP connection to Redis costs roughly a millisecond to establish. At 1,000 req/s, that's a full second of cumulative handshake latency every second. Connection pooling makes this a non-issue.


Observability: Know Your Hit Ratio

We track cache operations with Prometheus counters:

from prometheus_client import Counter, Gauge

cache_operations_total = Counter(
    "cache_operations_total",
    "Total cache operations",
    ["tenant_id", "service", "operation", "status"],
)

cache_hit_ratio = Gauge(
    "cache_hit_ratio",
    "Rolling cache hit ratio",
    ["tenant_id", "service"],
)

Labels like operation=get_creds and status=hit|miss|expired|error let you build dashboards that answer:

  • What's our STS cache hit ratio? (target: >85%)
  • Which tenants have the most cache misses? (may indicate config drift)
  • Are we seeing Redis errors? (time to check cluster health)

If your hit ratio drops below 80%, something is wrong—either TTLs are too short, sessions are thrashing, or your Redis instance is under memory pressure.


TLS + Secrets Manager: Production Hardening

In production, Redis connections should be encrypted and passwords should never live in env vars:

def _load_password_from_secrets_manager(secret_arn: str) -> str | None:
    """Load Redis auth token from AWS Secrets Manager."""
    sm = boto3.client("secretsmanager")
    resp = sm.get_secret_value(SecretId=secret_arn)
    secret = resp.get("SecretString", "")

    # Support both plain strings and JSON secrets
    if secret.strip().startswith("{"):
        obj = json.loads(secret)
        for key in ("password", "authToken", "token"):
            if key in obj:
                return obj[key]

    return secret.strip()

We also cache the fetched secret in-process—no need to call Secrets Manager on every pool initialization. And we configure TLS via the SSLConnection class from the Redis Python client:

from redis.connection import SSLConnection

pool_kwargs["connection_class"] = SSLConnection

This gives you in-transit encryption for ElastiCache, which is a compliance checkbox you'd rather check early.


Gotchas (A.K.A. What Bit Us So It Doesn't Bite You)

1. Stale Credentials After Role Changes

If a user's role changes (e.g., promoted from USE_CASE_DEVELOPER to USE_CASE_OWNER), the cached session still has the old role mappings. Our fix: invalidate the session on role change and force a re-login.

def invalidate_session(session_id: str) -> bool:
    client = get_redis_client()
    return client.delete(f"session:{session_id}") > 0

2. Redis Goes Down β€” What Then?

Redis is fast, but it's not invincible. If the Redis cluster is unreachable:

  • Session validation should fail-closed (reject the request, don't silently allow it)
  • Log aggressively so ops teams see the outage
  • Never fall back to "allow all" — that's a security vulnerability disguised as fault tolerance
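Fail-closed is easy to state and easy to get wrong in an exception handler. One way to make the policy explicit, sketched with an injected client (real code would catch `redis.RedisError` specifically rather than bare `Exception`):

```python
import json
import logging

logger = logging.getLogger("auth")


class AuthUnavailableError(Exception):
    """Session store unreachable — map this to a 503, never to a 200."""


def validate_session_fail_closed(client, session_id: str) -> dict:
    try:
        raw = client.get(f"session:{session_id}")
    except Exception as exc:
        # Log loudly so the outage is visible, then reject the request.
        logger.error("Redis unreachable during session validation: %s", exc)
        raise AuthUnavailableError("session store unavailable") from exc
    if raw is None:
        raise PermissionError("Session not found or expired")
    return json.loads(raw)
```

The key property: a Redis outage and a missing session both end in a raised exception. There is no code path that returns a valid-looking session when the store can't be consulted.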

3. Session Key Collisions

Using predictable keys (like session:<user_email>) opens the door to session hijacking. Use session:<uuid4> — the session ID should be unguessable.
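Generating the unguessable ID is one line with the standard library. `new_session_id` matches the `session:<uuid4>` key format used above; the `secrets` variant is an alternative if you want more entropy in fewer characters (both function names are illustrative):

```python
import secrets
import uuid


def new_session_id() -> str:
    # uuid4 draws ~122 random bits from the OS CSPRNG
    return str(uuid.uuid4())


def new_session_token() -> str:
    # 32 random bytes, URL-safe base64 — 256 bits of entropy
    return secrets.token_urlsafe(32)
```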

4. Memory Pressure in Multi-Tenant Environments

Each session stores role mappings for every tenant/use-case the user can access. A platform admin with access to 50 tenants has a bigger session object than a single-tenant end user. Monitor Redis memory usage and set maxmemory-policy to volatile-lru so expired keys get evicted first.

5. Binding Token Replay Attacks

If your auth flow uses one-time binding tokens (e.g., for device code flows), mark them as consumed in Redis with a short TTL:

def consume_binding_token(token: str, ttl: int = 900) -> bool:
    """Atomically mark a binding token as consumed.

    Returns True on first use, False on replay. A single SET NX EX avoids
    the race a separate exists()-then-setex() check would leave open.
    """
    key = f"binding_token:consumed:{token[:16]}"
    return bool(get_redis_client().set(key, "1", nx=True, ex=ttl))

When You Should Not Use This Pattern

  • Single-user apps — if you have 10 users, the extra Redis infrastructure isn't worth it. A signed JWT with short expiry is simpler.
  • Stateless-only architectures — if your design principle is "no server-side state," Redis sessions are a philosophical violation. (But also: stateless auth at scale has its own costs.)
  • No AWS roles to assume — if you're not using STS, the credential caching half of this pattern doesn't apply. The session caching half still might.

A Practical Implementation Checklist

  • [ ] Deploy Redis (ElastiCache Serverless or self-managed cluster with replication)
  • [ ] Enable TLS in-transit (SSLConnection)
  • [ ] Store Redis password in Secrets Manager, not env vars
  • [ ] Use connection pooling (ConnectionPool with max_connections)
  • [ ] Set session TTL to match your security requirements (we use 1 hour)
  • [ ] Add 5-minute expiration buffer on STS credential cache
  • [ ] Implement health_check() — ping Redis on startup and expose /health
  • [ ] Add Prometheus metrics for cache hit/miss/error rates
  • [ ] Set maxmemory-policy to volatile-lru on the Redis instance
  • [ ] Document your invalidation strategy (when do cached sessions get killed?)
  • [ ] Test Redis-down scenarios (your app should fail-closed, not fail-open)
  • [ ] Load SSM parameters at startup, not import time (env vars must be populated first)
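The health_check() item on the checklist can be as small as a guarded PING, again sketched with an injected client:

```python
def health_check(client) -> dict:
    """Ping Redis for a /health endpoint; degrade loudly rather than lie."""
    try:
        redis_ok = bool(client.ping())
    except Exception:  # redis.RedisError in real code
        redis_ok = False
    return {
        "redis": "up" if redis_ok else "down",
        "status": "ok" if redis_ok else "degraded",
    }
```

Wire this into your readiness probe so pods with a broken Redis connection get rotated out instead of serving auth failures.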

The Numbers

Before Redis caching:

  • Login: ~800ms (OIDC + STS + DB lookups)
  • Subsequent API auth: ~200ms per request (session re-validation + STS)
  • STS calls: 1 per authenticated request

After Redis caching:

  • Login: ~500ms (OIDC + STS + Redis write — the STS result is cached for next time)
  • Subsequent API auth: <1ms (Redis GET + JSON parse)
  • STS calls: 1 per unique tenant/role/env combination per session lifetime

At 10,000 authenticated requests per hour, that's the difference between 10,000 STS calls and ~50. Your AWS bill notices. Your users notice. Your on-call rotation notices.


Closing: The Fastest Auth Call Is the One You Don't Make

Redis isn't just a cache layer for your database queries. It's the foundation of a fast, secure auth perimeter.

The session cache eliminates per-request identity lookups. The STS credential cache eliminates per-request IAM calls. Together, they turn your auth layer from a distributed systems problem into a local memory read.

And when security is fast, developers stop looking for shortcuts around it.


What's your strategy for caching short-lived AWS credentials? Do you cache at the application layer, use credential providers, or something else entirely? Drop a comment β€” I'm curious what patterns are working for others.


About the Author

Suraj Khaitan — Gen AI Architect | Building scalable platforms and secure cloud-native systems

Connect on LinkedIn | Follow for more engineering and architecture write-ups


Top comments (1)

klement Gunndu:

The 5-minute safety buffer on credential TTL is smart. We hit a race condition without one — request starts with 30s left on the token, STS call takes 2s, credential expires mid-flight.