JWT Lifecycle vs. Secret Rotation: Which is More Secure?

#jwt #security #apisecurity #secretmanagement

ℹ️ What You Will Learn in This Post

I will delve into the two main pillars of the JWT security model: token lifecycle management and secret key rotation strategies. I'll explain, based on my own experiences and concrete examples, how these two approaches should be used together in real-world scenarios.

Introduction: Fundamentals of JWT Security

When API security is discussed, JWT (JSON Web Token) is almost always one of the first solutions on the table. I, too, have relied on JWTs in many projects, especially when building microservice architectures or a stateless authentication structure. However, simply using JWTs isn't enough; ensuring their security is only possible with proper lifecycle management and secret key rotation.

Treating a token as something you simply issue and consume misses the bigger picture. How a JWT is created, how long it stays valid, and how the key that signs it is managed are all vital to your overall system security. In this post, I will explain, based on my own experience, what these two critical components (the JWT's lifecycle and the rotation of its signing key) mean and how they should be implemented.

JWT Lifecycle Management: Why Short-Lived Tokens?

The primary purpose of JWTs is to exchange information securely between client and server without maintaining state. Once signed, a token cannot be altered without invalidating its signature, and the JWT standard itself defines no revocation mechanism, so the token stays valid until it expires. While this makes JWTs very useful, it also carries a significant security risk: if a JWT is compromised, it can be used by malicious actors for its entire validity period.

For this reason, my philosophy has always been to use short-lived access tokens. I typically keep an access_token's lifespan between 15 minutes and 1 hour. Alongside this, I use a longer-lived refresh_token to maintain a continuous session without disrupting the user experience. For example, when designing operator screens in a production ERP, we sometimes extended refresh_tokens up to 7 days to avoid constant token renewal requests, but this always requires careful balancing.
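As a rough illustration of these lifetimes, the sketch below builds the claim sets for an access token and a refresh token. The function name `build_claims`, the custom `token_type` claim, and the 15-minute and 7-day values are assumptions chosen for illustration; signing is left to whichever JWT library is in use.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative lifetimes picked from the ranges mentioned above (assumptions).
ACCESS_TOKEN_LIFETIME = timedelta(minutes=15)
REFRESH_TOKEN_LIFETIME = timedelta(days=7)

def build_claims(subject: str, token_type: str, lifetime: timedelta,
                 now: Optional[datetime] = None) -> dict:
    """Return JWT claims for a token that expires `lifetime` after `now`."""
    now = now or datetime.now(timezone.utc)
    return {
        "sub": subject,
        "token_type": token_type,      # hypothetical custom claim: "access" or "refresh"
        "jti": uuid.uuid4().hex,       # unique token ID, handy for later revocation
        "iat": int(now.timestamp()),   # issued-at, in seconds since the epoch
        "exp": int((now + lifetime).timestamp()),  # expiry, in seconds since the epoch
    }

access_claims = build_claims("user-42", "access", ACCESS_TOKEN_LIFETIME)
refresh_claims = build_claims("user-42", "refresh", REFRESH_TOKEN_LIFETIME)
```

With PyJWT, a dictionary like this could then be signed with `jwt.encode(access_claims, secret, algorithm="HS256")`.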

Token Revocation and Blacklisting Mechanisms

While short-lived access_tokens limit the damage in case of token theft, they do not eliminate the risk. Sometimes I need to terminate a user's session urgently or revoke a token before it expires. This is where blacklisting comes into play.

In my setups, I blacklist access_tokens by storing them in a fast, distributed cache such as Redis. When a user logs out or suspicious activity is detected, I add the relevant access_token to Redis with a TTL (Time-To-Live) equal to its validity period, so the entry disappears on its own once the token could no longer be used anyway. On every incoming request, I first check whether the token is in Redis. If the token is blacklisted, the request is rejected.

# Example of a JWT blacklist check in a FastAPI application
from datetime import timedelta

from redis import Redis

# Redis connection
redis_client = Redis(host='localhost', port=6379, db=0)

def blacklist_token(token: str, expires_delta: timedelta):
    """Blacklists the token in Redis."""
    # SETEX needs a whole number of seconds, so a float from total_seconds()
    # is rejected by Redis; redis-py accepts the timedelta directly.
    redis_client.setex(f"blacklist:{token}", expires_delta, "1")

def is_token_blacklisted(token: str) -> bool:
    """Checks if the token is blacklisted."""
    return redis_client.exists(f"blacklist:{token}") == 1

# Usage example (inside a FastAPI dependency or route)
# from fastapi import HTTPException
# access_token_expires = timedelta(minutes=15)
# blacklist_token(my_token, access_token_expires)
# if is_token_blacklisted(my_token):
#     raise HTTPException(status_code=401, detail="Token revoked")

This approach, of course, has a cost. An additional Redis lookup on every request adds some latency. Furthermore, in distributed systems, the availability and performance of the Redis cluster itself become critical. Once, because I had not configured Redis's out-of-memory eviction policy (the maxmemory-policy setting) correctly, our blacklist service could not reliably keep tokens stored under sudden load, and some revoked tokens were accepted again for a short period. Edge cases like this show how much attention to detail system management requires.
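One way to limit the memory pressure described above is to store only the token's `jti` claim instead of the whole token, with a TTL equal to the time the token has left. The sketch below assumes tokens carry `jti` and `exp` claims; `InMemoryStore` is a hypothetical stand-in for the Redis client so the example runs without a server, and the function names are illustrative.

```python
import math
import time
from typing import Optional

class InMemoryStore:
    """Hypothetical stand-in for a Redis client, offering only setex() and exists()."""

    def __init__(self) -> None:
        self._expires_at = {}

    def setex(self, name: str, ttl_seconds: int, value: str) -> None:
        self._expires_at[name] = time.time() + ttl_seconds

    def exists(self, name: str) -> int:
        expires_at = self._expires_at.get(name)
        return int(expires_at is not None and expires_at > time.time())

def remaining_ttl(exp: int, now: Optional[float] = None) -> int:
    """Whole seconds until `exp`, rounded up so the entry never expires early."""
    now = time.time() if now is None else now
    return max(0, math.ceil(exp - now))

def revoke(store, claims: dict) -> None:
    """Blacklist a token by its jti for as long as it could still be used."""
    ttl = remaining_ttl(claims["exp"])
    if ttl > 0:  # an already expired token needs no blacklist entry
        store.setex(f"blacklist:{claims['jti']}", ttl, "1")

def is_revoked(store, claims: dict) -> bool:
    return store.exists(f"blacklist:{claims['jti']}") == 1
```

Because the functions only call `setex` and `exists`, a redis-py `Redis` client could be passed in place of the stand-in. Each entry is then a short key rather than a full token, and it never outlives the token it blocks.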

JWT Secret Key Rotation: How Often Do You Change Your Key?

The security of JWTs largely depends on the secrecy of the secret key (or private key) with which they are signed. If this secret key is leaked, a malicious actor can sign any JWT and bypass authorization in your system as they wish. This is precisely why secret key rotation, i.e., regularly changing the key, is an indispensable security practice.

In my projects, I typically change secret keys every 30 to 90 days. This rotation process is managed through automated scripts or a CI/CD pipeline. However, this is just a number; the actual frequency varies depending on the system's sensitivity, potential attack vectors, and legal requirements. For instance, I opted for more frequent rotation in the backend of one of my financial calculator side products.

Strategies for Seamless Key Transition

While secret key rotation sounds simple, doing it without interruption in a running system is challenging. When you switch to a new secret, tokens signed with the old secret might still be valid, and instantly invalidating them would negatively impact user experience. To solve this problem, I usually use "key rollover" strategies.

In this strategy, multiple secret keys are kept active simultaneously. New tokens are signed with the newest secret, while incoming tokens can be validated with both new and old secrets. This ensures a smooth transition until the old tokens expire.

# A simple example: Managing multiple secret keys
# In a real system, these keys would be retrieved from a Key Management System (KMS) or
# a secure vault.
import jwt  # PyJWT
from fastapi import HTTPException

# Newest secret first; new tokens are signed with ACTIVE_SECRETS[0]
ACTIVE_SECRETS = [
    "super-secret-key-current",
    "super-secret-key-old-1",
    "super-secret-key-old-2"
]

def verify_jwt_with_multiple_keys(token: str) -> dict:
    for secret in ACTIVE_SECRETS:
        try:
            # PyJWT verifies the signature before the exp claim, so an expired
            # token is only reported once the matching secret has been found.
            payload = jwt.decode(token, secret, algorithms=["HS256"])
            return payload
        except jwt.ExpiredSignatureError:
            raise HTTPException(status_code=401, detail="Token expired")
        except jwt.InvalidTokenError:
            continue # Could not verify with this secret, try another
    raise HTTPException(status_code=401, detail="Invalid token signature")
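The list of accepted secrets should not grow forever. A retired key only needs to stay in the verification set until the last token it signed has expired, that is, for one maximum token lifetime after it stopped signing. A minimal sketch of that rule follows; `SigningKey`, `retired_at`, and `keys_for_verification` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

@dataclass
class SigningKey:
    kid: str
    secret: str
    retired_at: Optional[datetime] = None  # None: the key still signs new tokens

def keys_for_verification(keys: List[SigningKey],
                          max_token_lifetime: timedelta,
                          now: Optional[datetime] = None) -> List[SigningKey]:
    """Keep the signing key plus retired keys whose tokens may still be valid.

    `max_token_lifetime` must be the longest lifetime of any token the keys
    sign (the refresh token lifetime, if refresh tokens are JWTs as well).
    """
    now = now or datetime.now(timezone.utc)
    return [
        key for key in keys
        if key.retired_at is None or now < key.retired_at + max_token_lifetime
    ]
```

A rotation job could call this after adding a new key and drop everything that is not returned.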

If asymmetric signing algorithms (RSA, ECDSA) are used, this process can be managed more elegantly via JWKS (JSON Web Key Set) endpoints. The server publishes its public keys at a JWKS endpoint, and clients or other services validate tokens using these public keys, typically picking the right one by the key ID (kid) in the token header. When a new key is introduced, the JWKS endpoint is updated, and old keys continue to be published for a while. This is a structure I often see in protocols like OpenID Connect. I can delve into this topic in more detail in my [related: SSO integration with OpenID Connect] post.
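To make the key lookup concrete, the sketch below reads the `kid` from a token's header and picks the matching entry from a JWKS document. It only selects the key and does not verify any signature; the JWKS content, the key IDs, and the function names are made up for illustration.

```python
import base64
import json
from typing import Optional

# Made-up JWKS document with a current and a previous public key.
EXAMPLE_JWKS = {
    "keys": [
        {"kty": "RSA", "kid": "2024-06", "use": "sig", "alg": "RS256",
         "n": "<modulus>", "e": "AQAB"},
        {"kty": "RSA", "kid": "2024-03", "use": "sig", "alg": "RS256",
         "n": "<modulus>", "e": "AQAB"},
    ]
}

def b64url_decode(segment: str) -> bytes:
    """Decode base64url data whose padding was stripped, as JWTs do."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def read_unverified_header(token: str) -> dict:
    """Parse the first JWT segment. Nothing here is trusted yet."""
    return json.loads(b64url_decode(token.split(".")[0]))

def select_jwk(jwks: dict, token: str) -> Optional[dict]:
    """Return the JWKS entry whose kid matches the token header, if any."""
    kid = read_unverified_header(token).get("kid")
    if kid is None:
        return None
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None
```

In practice a library usually does this for you. PyJWT, for example, ships a `PyJWKClient` helper that fetches a JWKS URL and resolves the signing key for a given token.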

Comparison from a Security Perspective: Who Complements Whom?

JWT lifecycle management and secret key rotation are two different mechanisms that address different security risks. Instead of comparing them, it's a much more accurate approach to view them as complementary elements.

JWT Lifecycle Management (Short-Lived Tokens): This approach limits the damage caused by a compromised token. If an access_token is stolen, its short lifespan narrows the window of time during which a malicious actor can use it. For example, a 15-minute token carries much less risk than a 24-hour token if stolen. Token revocation (blacklisting) further provides the ability to stop this damage even earlier.
Secret Key Rotation: This approach, on the other hand, limits the damage that would occur if the signing key itself were compromised. If your secret key is leaked, an attacker can generate valid tokens at will. Regular rotation restricts the validity period of a leaked key and forces the attacker to constantly obtain new keys, which increases the cost of the attack. While working on an internal banking platform, I experienced how critical such key rotations are and how important it is to automate them without manual intervention.

In short, short-lived tokens are a defense against token theft, while secret rotation is a defense against the theft of the signing key. In an ideal JWT security architecture, both should work together in harmony. Neglecting one significantly weakens the security provided by the other.

Integration and Challenges in My Application Architecture

In my experience, using both short-lived tokens and secret rotation together has always been a balancing act. For example, when developing an operator screen based on FastAPI and Vue.js for a production ERP, I set access_tokens to 30 minutes and refresh_tokens to 3 days. refresh_tokens were stored in a database (PostgreSQL) and replaced with a new refresh_token on each use, with the old token being revoked. This ensures that even if a refresh_token is stolen, it is single-use.
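The single-use behaviour can be sketched as follows. This is not the code from that project: the dictionary is a hypothetical stand-in for the PostgreSQL table, the refresh tokens are assumed to be opaque random strings, and storing only their SHA-256 hash is an extra precaution I am assuming here.

```python
import hashlib
import secrets
from typing import Dict, Tuple

class RefreshTokenReuseError(Exception):
    """Raised when an unknown or already used refresh token is presented."""

# Hypothetical stand-in for the database table: token hash -> user ID.
_active_refresh_tokens: Dict[str, str] = {}

def _hash(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def issue_refresh_token(user_id: str) -> str:
    """Create a random refresh token and remember its hash."""
    token = secrets.token_urlsafe(32)
    _active_refresh_tokens[_hash(token)] = user_id
    return token

def rotate_refresh_token(presented: str) -> Tuple[str, str]:
    """Exchange a refresh token for a new one; the old one stops working."""
    # pop() revokes the presented token in the same step that looks it up.
    user_id = _active_refresh_tokens.pop(_hash(presented), None)
    if user_id is None:
        raise RefreshTokenReuseError("refresh token is unknown or already used")
    return user_id, issue_refresh_token(user_id)
```

A second attempt to use the same refresh token raises `RefreshTokenReuseError`, which is also a useful signal that the token may have been stolen.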

On the secret rotation side, I retrieved the keys either from a Key Management System (KMS) or from environment variables set in my systemd units. In a client project, we used systemd timers to automate key rotation: a script running every 90 days would generate a new key, save it to the KMS, and restart the relevant service.

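# systemd timer example (jwt-key-rotation.timer)
[Unit]
Description=Run JWT Key Rotation every 90 days

[Timer]
# Runs at 03:00 on the 1st of January, April, July, and October (roughly every 90 days).
# systemd does not support trailing comments, so the comment needs its own line.
OnCalendar=*-01,04,07,10-01 03:00:00
Persistent=true

[Install]
WantedBy=timers.target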

# systemd service example (jwt-key-rotation.service)
[Unit]
Description=Rotate JWT Signing Key
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/rotate_jwt_key.sh
User=jwt_rotator
Group=jwt_rotator
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

This rotate_jwt_key.sh script would generate a new secret, securely save it, and reload my Nginx reverse proxy or FastAPI services. However, caution is needed here. Last month, in a similar script, I wrote sleep 360 to wait for the new key to be distributed, only to find the script was OOM-killed after exceeding cgroup memory.high limits. I then switched to a polling-wait mechanism, meaning the script would wait in a loop until it verified that the service had indeed restarted and was using the new key. Such small errors can lead to unexpected outages in large systems.
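The polling-wait idea can be sketched like this, written in Python rather than shell to match the other examples. The `wait_until` helper and its defaults are assumptions; the `check` callable stands for whatever test confirms that the service is up and using the new key.

```python
import time
from typing import Callable

def wait_until(check: Callable[[], bool], timeout: float = 360.0,
               interval: float = 5.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have passed.

    Returns True as soon as the check succeeds and False on timeout, so the
    caller can fail loudly instead of assuming the rollout worked.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A rotation script would call it with its own check, for example `wait_until(service_uses_new_key)`, where `service_uses_new_key` is a hypothetical function that queries the service, and abort or roll back when the result is `False`.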

Distributing secrets securely in distributed systems is a separate challenge. In Docker Compose-based deployments, it is safer to use tools like Docker secrets or Vault than to pass secrets as environment variables. For my own side product's backend, whose Docker Compose services run on a VPS, I adopted a kind of "bare-metal + container hybrid" deployment model: the environment files are encrypted and readable only by authorized users. While not as robust as a fully automated KMS, this offers a practical solution for small-scale projects.
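On the application side, Docker secrets appear inside the container as files under /run/secrets/. A small loader along the following lines can prefer such a file and fall back to an environment variable; the function name, the secret name, and the fallback rule are assumptions for illustration.

```python
import os
from pathlib import Path
from typing import Optional

def load_secret(name: str,
                secrets_dir: Path = Path("/run/secrets")) -> Optional[str]:
    """Read a secret from a mounted file, else from an environment variable.

    The file is looked up as <secrets_dir>/<name>; the fallback variable is
    the upper-cased name. Returns None when neither exists.
    """
    secret_file = secrets_dir / name
    if secret_file.is_file():
        # Secret files often end with a newline, so strip surrounding whitespace.
        return secret_file.read_text(encoding="utf-8").strip()
    return os.environ.get(name.upper())
```

For example, `load_secret("jwt_signing_key")` would read /run/secrets/jwt_signing_key when it exists and otherwise fall back to the JWT_SIGNING_KEY environment variable.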

Conclusion: A Balanced Approach is Essential

Using JWTs is not enough; managing them correctly and securely is a critical skill in modern application architectures. Based on my own experience, I can say clearly that both JWT lifecycle management and secret key rotation are cornerstones of robust API security. One mitigates the risks of token theft, while the other limits the damage of a leaked signing key.

Implementing these two strategies together makes your system more resilient against attacks from both inside and outside. Always use short-lived access_tokens, support them with blacklisting mechanisms when necessary, and don't forget to regularly rotate your secret keys. Automating these processes will reduce operational overhead and decrease the likelihood of errors, especially in large-scale and critical systems. Remember, security is not a one-time task but an ongoing process that requires continuous improvement. In my next post, I will discuss [related: event-sourcing implementations in distributed systems].