Mustafa ERBAY

Posted on May 23 • Originally published at mustafaerbay.com.tr

JWT Refresh and Revocation Mechanisms: The State of Security Practices

#jwt #security #token #authentication

JWT (JSON Web Token) is a compact, self-contained, and verifiable token format that we frequently use for authentication in modern APIs and distributed systems. I've been using JWTs in many of my projects for a while now, and they are especially convenient in microservice architectures and when communicating with mobile application backends. However, the "stateless" nature that is JWT's biggest advantage also complicates token refresh and, above all, revocation. If not managed correctly, these mechanisms can lead to serious security vulnerabilities.

When I started implementing JWTs for the API used by operator screens in a production ERP system, everything seemed smooth at first. But situations like users being abruptly logged out when tokens expired, or our inability to instantly cut off a user's access in an emergency, showed me how critical these mechanisms are. In this post, I explain JWT refresh and revocation strategies, their impact on security, and which approaches I prefer based on my own experience.

Fundamentals of JWT and the Challenges of Stateless Architecture

A JWT is essentially a simple string consisting of three parts: header.payload.signature. The header contains the token type and the signing algorithm used. The payload contains claims such as user identity and permissions. The signature is created by signing the header and payload with a secret key, ensuring the token's integrity. Thanks to this structure, there's no need to store any session information on the server side; the token's validity can be cryptographically verified with each request.
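To make the header.payload.signature structure concrete, here is a minimal sketch of HS256 signing and verification using only the Python standard library. The helper names (`sign_hs256`, `verify_hs256`) are hypothetical, and the sketch checks the signature only; in production a maintained JWT library that also validates claims is the safer choice.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> Optional[dict]:
    """Return the payload if the signature matches, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            return None
        return json.loads(b64url_decode(payload_b64))
    except ValueError:
        # Wrong number of segments, bad base64, or invalid JSON
        return None
```

No server-side lookup is involved: any service that holds the secret can verify the token on its own, which is exactly what makes revocation hard.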

This stateless architecture offers significant advantages, especially in terms of horizontal scalability. Instead of hundreds of servers each managing session state, every server simply verifies the token, which greatly simplifies things. However, there's also the other side of the coin: once a token is issued, it's quite difficult to revoke it before its expiration. For example, if a user's password is stolen or their account is compromised, an attacker holding an active JWT can continue to access the system until that token expires. This leaves us helpless in situations requiring immediate intervention.

⚠️ Security Risk of Stateless Architecture

The stateless nature of JWTs carries significant security risks without a proper revocation strategy. Once a token is issued, it remains valid until it expires, and during this period, it can be used by malicious actors.

So, how do we solve this problem? We typically try to strike a balance by using a pair of access token and refresh token. An access token is usually short-lived (e.g., 15 minutes to 1 hour), while a refresh token has a longer lifespan (e.g., a few days or weeks).

Managing with the Access Token and Refresh Token Model

In my applications, I generally prefer the access token and refresh token model. This model is a widely used method to mitigate the revocation difficulty posed by JWT's stateless nature. The access token is used to directly access API resources and has a short lifespan. This means that even if the token is stolen, the malicious user's access period will be limited. For instance, I typically keep access token lifetimes between 30 minutes and 1 hour.

When an access token expires, the client (mobile app or web frontend) requests a new access token using the refresh token. The refresh token, on the other hand, has a longer lifespan and is used only to obtain new access tokens. This way, the user doesn't have to log in again every 30 minutes, but the short lifespan of the access token minimizes risk in case of a potential leak. In my side product's mobile application, I ensure users don't have to enter their password every half hour using this method.

// Example Access Token Payload
{
  "sub": "1234567890",
  "name": "Mustafa Erbay",
  "iat": 1716384000, // Issued At (2024-05-22 13:20:00 UTC)
  "exp": 1716385800, // Expiration (2024-05-22 13:50:00 UTC), 30 minutes later
  "aud": "my-api",
  "iss": "auth-service",
  "jti": "a1b2c3d4e5f6g7h8i9j0" // JWT ID
}
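The `iat` and `exp` values in a payload like this are plain Unix timestamps, so the lifetime is simple arithmetic. Below is a minimal sketch of building such claims for a 30-minute access token; the function name `build_access_claims` is hypothetical, and the `aud` and `iss` values are taken from the example payload.

```python
import time
import uuid
from typing import Optional

ACCESS_TOKEN_TTL_SECONDS = 30 * 60  # 30 minutes


def build_access_claims(user_id: str, now: Optional[int] = None) -> dict:
    """Return access token claims; `now` is injectable to keep the function testable."""
    issued_at = int(time.time()) if now is None else now
    return {
        "sub": user_id,
        "iat": issued_at,
        "exp": issued_at + ACCESS_TOKEN_TTL_SECONDS,
        "aud": "my-api",
        "iss": "auth-service",
        "jti": uuid.uuid4().hex,  # unique ID, needed later for blacklisting
    }
```

Calling `build_access_claims("1234567890", now=1716384000)` yields an `exp` of 1716385800, matching the payload above.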

In this model, the critical security component is the refresh token. Even if an access token is stolen, it becomes invalid quickly. However, if a refresh token is stolen, an attacker can continuously generate new access tokens and access the system for a long time. Therefore, the security of the refresh token is vital for the overall security of the application.

ℹ️ Importance of Refresh Token

The refresh token is your application's most valuable authentication token. If stolen, an attacker can gain long-term access to the system. Storage and usage strategies should be determined accordingly.

In an ERP project for a manufacturing company, I restricted the refresh token to be used only from specific IP ranges and with specific device fingerprints. This made it difficult for a stolen token to be used in a different environment.
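A minimal sketch of what such a binding check could look like follows. The network ranges, the fingerprint inputs, and the function names are all hypothetical; a real device fingerprint would combine more signals than the two used here.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical office/VPN ranges from which refresh requests are accepted
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.168.50.0/24"),
]


def fingerprint(user_agent: str, device_id: str) -> str:
    """Derive a stable fingerprint; stored next to the refresh token at issue time."""
    return hashlib.sha256(f"{user_agent}|{device_id}".encode("utf-8")).hexdigest()


def refresh_allowed(client_ip: str, presented_fp: str, stored_fp: str) -> bool:
    """Accept a refresh request only from an allowed range and the same device."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address
    in_range = any(address in network for network in ALLOWED_NETWORKS)
    same_device = hmac.compare_digest(
        presented_fp.encode("utf-8"), stored_fp.encode("utf-8")
    )
    return in_range and same_device
```

The trade-off is usability: users on mobile networks or behind changing NAT addresses will fail the IP check, so this fits fixed environments such as a factory floor better than a public app.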

Refresh Token Security and Storage Strategies

Securely storing refresh tokens is one of the most critical steps for system security. Incorrect storage can lead to token theft and unauthorized access. Generally, two main storage areas are discussed: client-side and server-side.

Client-Side Storage:

Local Storage/Session Storage: This method is highly vulnerable to XSS (Cross-Site Scripting) attacks. Malicious JavaScript code running in the browser can easily access these storage areas and steal the refresh token. I made this mistake in an older version of one of my side products; when I detected an XSS vulnerability, I saw how easily tokens could be stolen. I immediately abandoned this approach.
HTTP-Only Cookies: Setting the refresh token as an HTTP-Only cookie reduces the XSS risk by preventing JavaScript access. Additionally, by using the Secure flag to ensure it's only sent over HTTPS, we protect against man-in-the-middle attacks. However, this approach can remain vulnerable to CSRF (Cross-Site Request Forgery) attacks. For CSRF protection, additional measures such as SameSite=Lax or Strict attributes and CSRF tokens are necessary.
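As an illustration of the cookie attributes mentioned above, here is a minimal sketch that builds a `Set-Cookie` value with Python's standard `http.cookies` module. The cookie name `refresh_token` and the path are assumptions; the 7-day lifetime is only an example value.

```python
from http.cookies import SimpleCookie


def build_refresh_cookie(refresh_token_id: str, max_age_seconds: int = 604800) -> str:
    """Return the value for a Set-Cookie header carrying the refresh token."""
    cookie = SimpleCookie()
    cookie["refresh_token"] = refresh_token_id
    morsel = cookie["refresh_token"]
    morsel["httponly"] = True          # not readable from JavaScript (XSS mitigation)
    morsel["secure"] = True            # only sent over HTTPS
    morsel["samesite"] = "Strict"      # not sent on cross-site requests (CSRF mitigation)
    morsel["path"] = "/api/auth/refresh"  # hypothetical: only sent to the refresh endpoint
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()
```

Restricting the path means the browser attaches the cookie only to refresh requests, which keeps the refresh token out of ordinary API traffic.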

Server-Side Storage:
My preference in production environments is generally to manage refresh tokens on the server side. This means the token itself is not stored on the client. Instead, the client is only given a session ID or a one-time use refresh token hash. The actual refresh token and its associated user information are stored in a secure database on the server side (typically Redis or PostgreSQL).

For example, when I create a refresh token, I store it in Redis under a UUID key using the SETEX command, so that it expires automatically:

import uuid
import redis
import time

# Redis connection
r = redis.Redis(host='localhost', port=6379, db=0)

def generate_and_store_refresh_token(user_id: int):
    refresh_token_id = str(uuid.uuid4())
    # Store refresh token in Redis, associated with user_id
    # Lifetime of 7 days (604800 seconds)
    r.setex(f"refresh_token:{refresh_token_id}", 604800, user_id)
    return refresh_token_id

# Example usage
user_id = 123
token_id = generate_and_store_refresh_token(user_id)
print(f"Generated refresh_token_id: {token_id}")

In this scenario, the refresh_token_id is sent to the client. When the client requests a new access token, it sends this refresh_token_id to the server. The server looks up the user_id stored under that refresh_token_id in Redis. If the entry exists, it generates a new access token and, optionally, replaces the old refresh_token_id with a new one (refresh token rotation). This approach limits the damage of refresh token theft: the client only holds an opaque identifier, and the record behind it stays under server control, so the server can delete it at any time.
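The exchange step can be sketched as follows. To keep the example runnable without a Redis server, an in-memory dictionary stands in for the `refresh_token:*` keys; the function names are hypothetical and access token creation is left out.

```python
import time
import uuid
from typing import Dict, Optional, Tuple

REFRESH_TTL_SECONDS = 604800  # 7 days

# token id -> (user_id, expiry timestamp); stand-in for the Redis keys
_refresh_store: Dict[str, Tuple[int, float]] = {}


def issue_refresh_token(user_id: int, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    token_id = str(uuid.uuid4())
    _refresh_store[token_id] = (user_id, now + REFRESH_TTL_SECONDS)
    return token_id


def exchange_refresh_token(
    token_id: str, now: Optional[float] = None
) -> Optional[Tuple[int, str]]:
    """Consume a refresh token; return (user_id, new_refresh_token_id) or None."""
    now = time.time() if now is None else now
    # pop() makes the token single-use: a second presentation finds nothing
    record = _refresh_store.pop(token_id, None)
    if record is None:
        return None
    user_id, expires_at = record
    if now >= expires_at:
        return None
    return user_id, issue_refresh_token(user_id, now)
```

With real Redis the read and delete should happen atomically (for example in a transaction or script), otherwise two concurrent requests could both consume the same token.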

Token Revocation Mechanisms

Because JWTs are stateless, a signed token cannot be directly revoked before it expires. However, we can work around this limitation with the access token and refresh token model described above, or by adding dedicated revocation mechanisms.

1. Blacklisting

One of the most common and simple methods is blacklisting. When a user logs out or changes their password, we record the JTI (JWT ID) of their active access token in a fast-access store (typically Redis). With each API request, we check if the JTI of the incoming access token is on this blacklist. If it is, the token is considered invalid.

# Blacklisting in Redis (reuses the `r` connection and `time` import from the earlier example)
def blacklist_jwt(jti: str, exp_timestamp: int):
    # Keep the token in Redis for its remaining lifetime
    now = int(time.time())
    ttl = exp_timestamp - now
    if ttl > 0:
        r.setex(f"blacklist:{jti}", ttl, "revoked")
        print(f"Token {jti} blacklisted, will be deleted after {ttl} seconds.")

# Check if blacklisted
def is_jwt_blacklisted(jti: str) -> bool:
    # exists() returns the number of matching keys, so convert it to a bool
    return r.exists(f"blacklist:{jti}") > 0

Trade-offs:

Storage Overhead: Blacklisting a large number of access tokens means additional memory usage in Redis. However, the short lifespan of tokens makes this overhead manageable.
Latency: A Redis check with every request adds an extra network call and latency. However, thanks to Redis's speed, this latency is usually in the millisecond range.
Eventual Consistency: In distributed systems, blacklist updates may not reach every server instantly, so a revoked token can still be accepted for a short window. In my experience this was an acceptable level of risk, even in systems like ERPs where fast invalidation is critical, because such very brief delays were not an issue in practice.

2. Refresh Token Rotation

This method involves replacing refresh tokens with a new token after each use. When a client uses its current refresh token to obtain a new access token in place of an expired one, the server sends not only a new access token but also a new refresh token. The old refresh token is immediately invalidated.

If an attacker obtains an old refresh token and tries to use it, the server will detect that this token has already been used and invalidated. Such reuse is a strong signal of theft, because the server cannot tell whether the legitimate user or the attacker holds the newest token. In this case, we not only reject the request but also invalidate every refresh token issued in that session (both the user's and the attacker's), which terminates the session completely and forces a fresh login.
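To recognise a replayed token, the server has to remember used tokens instead of simply deleting them. The following is a minimal sketch of reuse detection with a "token family" per login; the in-memory structures and function names are hypothetical stand-ins for a real store.

```python
import uuid
from typing import Dict, Optional, Set, Tuple

# token id -> (family id, user id, already used?)
_tokens: Dict[str, Tuple[str, int, bool]] = {}
_revoked_families: Set[str] = set()


def _issue(family_id: str, user_id: int) -> str:
    token_id = str(uuid.uuid4())
    _tokens[token_id] = (family_id, user_id, False)
    return token_id


def start_family(user_id: int) -> str:
    """Called at login: creates a new family and its first refresh token."""
    return _issue(str(uuid.uuid4()), user_id)


def rotate(token_id: str) -> Optional[str]:
    """Return the next refresh token, or None if the request must be rejected."""
    record = _tokens.get(token_id)
    if record is None:
        return None
    family_id, user_id, used = record
    if family_id in _revoked_families:
        return None
    if used:
        # Reuse of a rotated token: treat as theft and kill the whole family
        _revoked_families.add(family_id)
        return None
    _tokens[token_id] = (family_id, user_id, True)
    return _issue(family_id, user_id)
```

After a replay is detected, even the newest token of that family stops working, so whoever holds it (user or attacker) has to authenticate again.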

Trade-offs:

Complexity: Requires more complex state management on both the client and server sides.
Experience: I used this method in one of my mobile applications, and initially we experienced synchronization issues on the client side. When the user, due to network latency, tried again with the old token before receiving the new refresh token, errors occurred. We solved this issue with idempotent requests and retry mechanisms.
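One possible way to make rotation tolerant of such retries is a short grace window in which repeating the same request returns the same replacement token. This is an illustrative assumption, not a description of the fix used in that project; the window length and names are hypothetical.

```python
import uuid
from typing import Dict, Optional, Set, Tuple

GRACE_SECONDS = 10.0  # hypothetical retry window

_active: Set[str] = set()                      # currently valid refresh tokens
_recent: Dict[str, Tuple[str, float]] = {}     # old token -> (replacement, rotated at)


def register(token_id: str) -> None:
    """Mark a freshly issued refresh token as valid."""
    _active.add(token_id)


def rotate_idempotent(old_token: str, now: float) -> Optional[str]:
    """Rotate once; a retry within the grace window gets the same answer."""
    if old_token in _active:
        _active.discard(old_token)
        new_token = str(uuid.uuid4())
        _active.add(new_token)
        _recent[old_token] = (new_token, now)
        return new_token
    previous = _recent.get(old_token)
    if previous is not None and now - previous[1] <= GRACE_SECONDS:
        return previous[0]  # lost response, client retried: repeat the result
    return None
```

The window weakens reuse detection slightly, since a token stolen and replayed within those seconds is indistinguishable from a retry, so it should stay as short as the client's retry behaviour allows.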

3. Hybrid Approach (Integration with Session Management)

In large-scale enterprise applications, due to the revocation difficulties of the pure JWT stateless model, I have also considered hybrid approaches that combine traditional session management with JWT. In this model, when a user first authenticates, a server-side session is created, and a session ID linked to this session is provided to the client. The JWT then includes this session ID.

With each request, the JWT is validated, and the session ID within it is checked against the server-side session to ensure it is still active. When a user logs out or is deactivated by an administrator, the server-side session can be instantly terminated. This combines the fast validation of JWT with the robust revocation capabilities of traditional session management.
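A minimal sketch of that per-request check, assuming the session ID travels in a claim named `sid` (a hypothetical choice) and that signature and expiry have already been verified. A dictionary stands in for the server-side session store.

```python
from typing import Dict

# sid -> session record; stand-in for Redis or a database table
_sessions: Dict[str, dict] = {}


def create_session(sid: str, user_id: str) -> None:
    _sessions[sid] = {"user_id": user_id, "active": True}


def terminate_session(sid: str) -> None:
    """Logout or admin deactivation: takes effect on the very next request."""
    if sid in _sessions:
        _sessions[sid]["active"] = False


def is_request_authorized(claims: dict) -> bool:
    """Accept the JWT only while its server-side session is still active."""
    session = _sessions.get(claims.get("sid"))
    return (
        session is not None
        and session["active"]
        and session["user_id"] == claims.get("sub")
    )
```

Unlike a JTI blacklist, one write here invalidates every token that belongs to the session, no matter how many access tokens were issued for it.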

Trade-offs:

Statefulness: It moves away from being "completely stateless" because session state is maintained on the server side.
Performance: Checking the session with every request might require an additional database query, which can impact performance. However, this can be minimized with an in-memory cache system like Redis.

💡 Using JTI for Quick Revocation

If instant token revocation is critical, add a unique JTI (JWT ID) claim to each JWT and blacklist this JTI in a fast-access store like Redis upon revocation. This provides a quick revocation mechanism at the cost of a small, short-lived piece of server-side state, while the rest of token validation stays stateless.

Security Practices to Consider

When designing JWT and token refresh/revocation mechanisms, there are general security practices we must consider. These practices strengthen not only the tokens themselves but also the overall resilience of the system that uses them.

1. Rate Limiting and DDoS Protection

Rate limiting is crucial to prevent unauthorized access attempts or brute-force attacks on refresh tokens. In a production ERP, I implemented a limit of 5 attempts per 5 minutes for the login endpoint and the refresh token endpoint. Nginx's limit_req expresses rates per second or per minute, so the example configuration below uses a similar limit of 5 requests per minute:

http {
    # ...
    limit_req_zone $binary_remote_addr zone=login_req:10m rate=5r/m;
    limit_req_zone $binary_remote_addr zone=refresh_req:10m rate=5r/m;
    # ...

    server {
        # ...
        location /api/auth/login {
            limit_req zone=login_req burst=10 nodelay;
            proxy_pass http://backend_service;
        }

        location /api/auth/refresh {
            limit_req zone=refresh_req burst=5 nodelay;
            proxy_pass http://backend_service;
        }
    }
}

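This configuration limits requests to the /api/auth/login and /api/auth/refresh endpoints to an average of 5 per minute from each IP address. The burst parameter lets a client briefly exceed that rate (by up to 10 requests for login and 5 for refresh), and with nodelay those extra requests are served immediately while anything beyond the burst is rejected. Additionally, using services like Cloudflare for Layer 7 protection against general DDoS attacks reduces malicious traffic reaching the servers.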
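A window such as "5 attempts per 5 minutes" can also be enforced in the application itself. Here is a minimal sliding-window sketch, keyed by client IP; the class name is hypothetical, and the in-memory storage only works for a single process (a shared store would be needed behind a load balancer).

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `max_attempts` per key within the last `window_seconds`."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 300.0) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        attempts = self._attempts[key]
        # Drop attempts that have slid out of the window
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```

Rejected attempts are not recorded, so a blocked client regains access as soon as its oldest counted attempt leaves the window.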

2. Correct JWT Claims and Signing Algorithm

It is essential to have the correct claims within JWTs and to use a secure signing algorithm:

iss (Issuer): The entity that issued the token.
aud (Audience): The intended recipient of the token.
exp (Expiration Time): The time after which the token is no longer valid.
nbf (Not Before): The time before which the token must not be accepted for processing.
iat (Issued At): The time at which the JWT was issued.
jti (JWT ID): Provides a unique identifier for the token, used for blacklisting.
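The checks implied by these claims can be sketched as a small validation function, to be run after the signature has been verified. The function name and the 30-second clock-skew leeway are assumptions; JWT libraries usually offer equivalent options.

```python
import time
from typing import Optional


def validate_claims(
    claims: dict,
    expected_iss: str,
    expected_aud: str,
    now: Optional[int] = None,
    leeway: int = 30,
) -> bool:
    """Check iss, aud, exp and nbf; `leeway` tolerates small clock differences."""
    now = int(time.time()) if now is None else now
    if claims.get("iss") != expected_iss:
        return False
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if expected_aud not in audiences:
        return False
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        return False  # missing or past expiration
    nbf = claims.get("nbf")
    if nbf is not None and now + leeway < nbf:
        return False  # token is not valid yet
    return True
```

Checking `aud` matters most in multi-service setups: without it, a token issued for one API would be accepted by every other service that trusts the same issuer.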

Symmetric algorithms like HS256 (HMAC with SHA-256) are generally sufficient when the same service both signs and verifies tokens. In situations with multiple services, where the signing service is different from the verifying service, asymmetric algorithms like RS256 (RSA with SHA-256) may be preferred: only the issuer holds the private key that signs the token, while any service can verify it with the public key. With HS256, by contrast, every verifying service must hold the shared secret and could therefore also forge tokens. My general preference, if multiple services will verify the token, is RS256.

3. Advanced Security Measures

Multi-Factor Authentication (MFA): Implementing MFA, especially for sensitive operations, adds an extra layer of security even if a token is stolen.
Session Monitoring and Anomaly Detection: Monitoring user behavior to detect abnormal access patterns (e.g., the same user logging in from different geographical locations in a short period) and automatically terminating suspicious sessions is beneficial. In my side product's financial calculators, when I detect abnormal login attempts or a high number of failed transactions on user accounts, I automatically revoke their refresh tokens.
Secure Communication: All token exchange and API communication must strictly occur over HTTPS.
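As one example of the anomaly detection mentioned above, logins from distant locations within a short period can be flagged with an "impossible travel" check. This is a minimal sketch: the speed threshold and function names are hypothetical, and it assumes coordinates have already been derived from the client IP.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # hypothetical threshold, roughly airliner speed

Login = Tuple[float, float, float]  # (latitude, longitude, unix timestamp)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(previous: Login, current: Login) -> bool:
    """True if moving between the two logins would require an implausible speed."""
    distance = haversine_km(previous[0], previous[1], current[0], current[1])
    hours = max(abs(current[2] - previous[2]) / 3600.0, 1e-9)  # avoid division by zero
    return distance / hours > MAX_PLAUSIBLE_SPEED_KMH
```

A positive result is a reason to revoke the user's refresh tokens and ask for re-authentication, not proof of compromise: VPNs and imprecise IP geolocation produce false positives.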

My Approach and Future Perspective

In my nearly two decades of field experience, I've encountered many different scenarios and problems related to JWTs. Especially in a production ERP or an internal banking platform, balancing security and usability was very critical. Therefore, I generally prefer a combination of refresh token rotation and server-side refresh token storage. This ensures access tokens remain short-lived while providing instant detection and revocation capabilities if refresh tokens are stolen.

Regarding token revocation, blacklisting with JTI on Redis offers a sufficient solution for instantly invalidating existing access tokens. This allows us to take quick action, especially when a user needs to be urgently removed from the system or in situations like password changes.

In the future, AI-powered operational monitoring and anomaly detection will take token security to the next level. AI models can learn normal user behavior and detect suspicious token usage much faster and more accurately. For example, sudden changes in a user's API call frequency or type can be examined by an AI agent and interpreted as a potential token leak or misuse signal. I am even experimenting with LLMs to analyze logs and find suspicious token usage patterns.

The conveniences and performance advantages offered by JWT are undeniable. However, when implementing this technology, it is vital not to overlook the challenges posed by its stateless nature and to establish robust refresh and revocation mechanisms for the overall security of our systems. When applying these approaches in your own projects, I recommend choosing the most suitable solution by considering your system's specific needs and risk tolerance.
