DEV Community

ahmet gedik


Rotating JWT Refresh Tokens for Video API Clients Without Logouts

Last quarter our mobile analytics flagged something ugly: a single leaked refresh token was being replayed from three different ASNs across Jakarta, Manila, and a datacenter range in Frankfurt. The token had been minted for a legitimate Android client, but it was now minting access tokens for somebody scraping our trending-video feed at 40 requests a second. Our refresh tokens lived for 30 days and never changed value, so one capture — from a rooted device, a misconfigured proxy, or a logged response body — was a month of free API access. At TopVideoHub we aggregate trending video across the Asia-Pacific region in CJK languages, and a lot of our traffic comes from mobile clients on carrier-grade NAT and flaky 3G, which makes naive token handling actively dangerous: you cannot tell a replay attack apart from a phone that just reconnected to the network.

This post walks through how we moved from static long-lived refresh tokens to refresh-token rotation with reuse detection, the exact schema and PHP we run on SQLite, the race condition that bites you on bad networks, and the Go client logic that survives it. Everything here runs on PHP 8.4, SQLite, LiteSpeed, and Cloudflare — no Redis, no external session store.

Why static refresh tokens are a liability

The standard OAuth2 pattern is short-lived access tokens (we use 15 minutes) plus a long-lived refresh token the client exchanges for a new access token. The access token is a stateless JWT we verify with a signature check — fast, no database hit. The refresh token is the problem child:

  • It is long-lived by design, so a single leak is a long-lived compromise.
  • If it is a stateless JWT too, you literally cannot revoke it without a blocklist, which defeats the point of statelessness.
  • A stolen refresh token is silent. The legitimate user keeps working, the attacker keeps working, and nothing in your logs distinguishes them.

Rotation fixes the third point, which is the one that matters. The idea: every time a client uses a refresh token, you invalidate it and issue a brand-new one. A refresh token becomes a strict single-use credential. If you ever see a refresh token used twice, exactly one of two things happened — a legitimate client retried after a dropped response, or someone is replaying a stolen token. We will handle both.

The rotation model in one paragraph

We group refresh tokens into families. When a user logs in, we open a family and issue the first refresh token in it. Each refresh exchange retires the current token and issues its successor in the same family, forming a chain. The access token stays a normal stateless JWT. The refresh token is an opaque random string whose hash we store server-side — never a JWT, because we need server-side state to make single-use work anyway, so a JWT buys us nothing but a larger token. If a retired token is ever presented again, we treat it as a breach signal and revoke the entire family, forcing a full re-login. That is the whole security model: chains that are append-only, and any fork in the chain nukes the chain.
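To make the family model concrete, here is a minimal in-memory sketch. It is not the production code (that is the PHP further down); the class and method names are hypothetical and exist only to show the chain, the single-use rule, and the "fork burns the family" behavior.

```python
import hashlib
import secrets


class FamilyStore:
    """In-memory model of refresh-token families (illustration only)."""

    def __init__(self):
        self.tokens = {}      # token hash -> {"family": int, "used": bool}
        self.revoked = set()  # ids of revoked families
        self.next_family = 1

    @staticmethod
    def _hash(raw):
        return hashlib.sha256(raw.encode()).hexdigest()

    def _mint(self, family):
        raw = secrets.token_urlsafe(32)  # opaque random string, not a JWT
        self.tokens[self._hash(raw)] = {"family": family, "used": False}
        return raw

    def login(self):
        """Open a new family and return its first refresh token."""
        family = self.next_family
        self.next_family += 1
        return self._mint(family)

    def refresh(self, raw):
        """Return the successor token, or None if the exchange is refused."""
        rec = self.tokens.get(self._hash(raw))
        if rec is None or rec["family"] in self.revoked:
            return None
        if rec["used"]:
            self.revoked.add(rec["family"])  # fork in the chain: burn the family
            return None
        rec["used"] = True
        return self._mint(rec["family"])


store = FamilyStore()
t1 = store.login()
t2 = store.refresh(t1)      # normal rotation: t1 retired, t2 issued
replay = store.refresh(t1)  # retired token presented again: family revoked
after = store.refresh(t2)   # the legitimate successor is dead as well
```

Note that the last call fails even though t2 was never used: once the family is burned, every token in it is refused.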

Schema for token families on SQLite

We keep two tables. One for families, one for the individual tokens. We store only a SHA-256 of each refresh token so a database leak does not hand an attacker live credentials.

CREATE TABLE token_family (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       INTEGER NOT NULL,
    client_id     TEXT NOT NULL,           -- 'android', 'ios', 'web', 'partner-api'
    created_at    INTEGER NOT NULL,        -- unix seconds
    revoked_at    INTEGER,                 -- NULL = live family
    revoke_reason TEXT                     -- 'reuse', 'logout', 'expired'
);

CREATE TABLE refresh_token (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    family_id   INTEGER NOT NULL REFERENCES token_family(id),
    token_hash  TEXT NOT NULL UNIQUE,      -- sha256 hex of the opaque token
    prev_id     INTEGER,                   -- previous token in the chain
    issued_at   INTEGER NOT NULL,
    expires_at  INTEGER NOT NULL,
    used_at     INTEGER,                   -- NULL until exchanged
    superseded_by INTEGER                  -- id of the token that replaced it
);

-- token_hash needs no separate index: its UNIQUE constraint already creates one.
CREATE INDEX idx_refresh_family ON refresh_token(family_id);
CREATE INDEX idx_family_user    ON token_family(user_id, client_id);

A couple of deliberate choices. The token_hash column is UNIQUE, which means even a hash collision or a duplicate insert fails loudly at the storage layer instead of silently corrupting a chain. The used_at and superseded_by columns let us tell, for any token presented, whether it is fresh, already spent, or part of a dead family — which is exactly the information reuse detection needs. On SQLite we run in WAL mode (PRAGMA journal_mode=WAL) so readers verifying access tokens never block the writer rotating refresh tokens.
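The two storage-level behaviors described above can be checked in a few lines. This is a sketch using Python's sqlite3 module against a throwaway database file and a trimmed-down refresh_token table; the store() helper is hypothetical.

```python
import hashlib
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database reports journal mode "memory".
path = os.path.join(tempfile.mkdtemp(), "tokens.db")
db = sqlite3.connect(path)
mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]

db.execute("""
    CREATE TABLE refresh_token (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        family_id  INTEGER NOT NULL,
        token_hash TEXT NOT NULL UNIQUE,
        used_at    INTEGER
    )
""")


def store(raw, family_id):
    """Persist only the SHA-256 hex digest of the opaque token."""
    digest = hashlib.sha256(raw.encode()).hexdigest()
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO refresh_token (family_id, token_hash) VALUES (?, ?)",
            (family_id, digest),
        )


store("example-token", 1)
try:
    store("example-token", 1)  # same token again: the UNIQUE constraint refuses it
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The duplicate insert raises instead of quietly creating a second row, which is the "fails loudly" property the schema relies on.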

Issuing and rotating tokens in PHP 8.4

Here is the core. issueFamily() runs at login; rotate() runs at every refresh. The opaque token is 32 bytes of CSPRNG output, base64url-encoded. We never store it — we store its hash and hand the raw value to the client once.

<?php
declare(strict_types=1);

final class TokenService
{
    private const ACCESS_TTL  = 900;        // 15 minutes
    private const REFRESH_TTL = 2592000;    // 30 days

    public function __construct(
        private readonly PDO $db,
        private readonly string $jwtSecret,
    ) {}

    /** Called on successful login. Returns [accessJwt, refreshToken]. */
    public function issueFamily(int $userId, string $clientId): array
    {
        $now = time();
        $this->db->beginTransaction();
        try {
            $stmt = $this->db->prepare(
                'INSERT INTO token_family (user_id, client_id, created_at)
                 VALUES (:u, :c, :t)'
            );
            $stmt->execute([':u' => $userId, ':c' => $clientId, ':t' => $now]);
            $familyId = (int) $this->db->lastInsertId();

            $refresh = $this->mintRefresh($familyId, null, $now);
            $this->db->commit();
        } catch (Throwable $e) {
            $this->db->rollBack(); // never leave a half-created family or an open transaction
            throw $e;
        }

        return [$this->mintAccess($userId, $clientId), $refresh];
    }

    private function mintRefresh(int $familyId, ?int $prevId, int $now): string
    {
        $raw  = rtrim(strtr(base64_encode(random_bytes(32)), '+/', '-_'), '=');
        $hash = hash('sha256', $raw);

        $stmt = $this->db->prepare(
            'INSERT INTO refresh_token
               (family_id, token_hash, prev_id, issued_at, expires_at)
             VALUES (:f, :h, :p, :i, :e)'
        );
        $stmt->execute([
            ':f' => $familyId,
            ':h' => $hash,
            ':p' => $prevId,
            ':i' => $now,
            ':e' => $now + self::REFRESH_TTL,
        ]);
        return $raw;
    }

    private function mintAccess(int $userId, string $clientId): string
    {
        $now    = time();
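        // JSON_THROW_ON_ERROR: json_encode() otherwise returns false on failure.
        $header = $this->b64(json_encode(['alg' => 'HS256', 'typ' => 'JWT'], JSON_THROW_ON_ERROR));
        $claims = $this->b64(json_encode([
            'sub' => (string) $userId,   // RFC 7519 defines "sub" as a string
            'cid' => $clientId,
            'iat' => $now,
            'exp' => $now + self::ACCESS_TTL,
            'scope' => 'video:read',
        ], JSON_THROW_ON_ERROR));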
        $sig = $this->b64(hash_hmac('sha256', "$header.$claims", $this->jwtSecret, true));
        return "$header.$claims.$sig";
    }

    private function b64(string $data): string
    {
        return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
    }

    /** Public entry points that RotationService::rotate() calls. */
    public function mintRefreshPublic(int $familyId, ?int $prevId, int $now): string
    {
        return $this->mintRefresh($familyId, $prevId, $now);
    }

    public function mintAccessPublic(int $userId, string $clientId): string
    {
        return $this->mintAccess($userId, $clientId);
    }
}

Note that issueFamily() wraps the family insert and the first token insert in one transaction. If the process dies between them you would otherwise have a family with no tokens, which is harmless but messy. The access JWT is hand-rolled HS256 here for clarity; in production we use the same primitives, plus a constant-time comparison (hash_equals() in PHP) when verifying the signature.
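The verify side is not shown in this post, so here is a hedged sketch of what stateless HS256 verification with a constant-time compare can look like. It is written in Python rather than PHP, and the sign() and verify() helpers are hypothetical names, not the production code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    # Restore the padding that base64url encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(claims).encode())
    mac = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(mac)}"


def verify(token: str, secret: bytes, now: Optional[int] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None."""
    now = int(time.time()) if now is None else now
    try:
        header, body, sig = token.split(".")
        expected = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
        # compare_digest runs in constant time, so timing leaks nothing about the MAC.
        if not hmac.compare_digest(expected, b64url_decode(sig)):
            return None
        # Pin the algorithm instead of trusting whatever the header claims.
        if json.loads(b64url_decode(header)).get("alg") != "HS256":
            return None
        claims = json.loads(b64url_decode(body))
        return claims if claims["exp"] > now else None
    except (ValueError, TypeError, AttributeError, KeyError):
        return None  # malformed token


token = sign({"sub": "42", "exp": 1000}, b"demo-secret")
claims = verify(token, b"demo-secret", now=999)
```

The signature is checked before any claim is parsed, so a forged token never reaches the JSON decoder for its body.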

Rotation with reuse detection

This is where the security lives. The rotate() method has to do four things atomically: look up the presented token, confirm it is the live tip of its family, retire it, and mint its successor. If the token is valid but already spent, we are looking at a reuse event and we burn the family.

<?php
declare(strict_types=1);

final class RefreshException extends RuntimeException {}

final class RotationService
{
    public function __construct(private readonly PDO $db, private readonly TokenService $tokens) {}

    /** Exchange a refresh token for a new access + refresh pair. */
    public function rotate(string $presented, int $userId, string $clientId): array
    {
        $hash = hash('sha256', $presented);
        $now  = time();

        // PDO::beginTransaction() issues a deferred BEGIN, which takes the write
        // lock only at the first write. Two concurrent rotations could then both
        // read used_at = NULL, so take the write lock before reading.
        $this->db->exec('BEGIN IMMEDIATE');
        try {
            $row = $this->lockToken($hash);
            if ($row === null) {
                throw new RefreshException('unknown_token');
            }

            $familyId = (int) $row['family_id'];

            // Family already dead? Reject outright.
            if ($this->familyRevoked($familyId)) {
                throw new RefreshException('family_revoked');
            }

            // REUSE: a retired token is being presented again. Burn the family.
            // The COMMIT in the catch block below persists the revoke.
            if ($row['used_at'] !== null) {
                $this->revokeFamily($familyId, 'reuse', $now);
                throw new RefreshException('reuse_detected');
            }

            if ((int) $row['expires_at'] < $now) {
                throw new RefreshException('expired');
            }

            // Retire current token, mint successor in same family.
            $newRaw = $this->tokens->mintRefreshPublic($familyId, (int) $row['id'], $now);
            $newId  = (int) $this->db->lastInsertId();

            $upd = $this->db->prepare(
                'UPDATE refresh_token
                    SET used_at = :n, superseded_by = :s
                  WHERE id = :id AND used_at IS NULL'
            );
            $upd->execute([':n' => $now, ':s' => $newId, ':id' => $row['id']]);

            $this->db->exec('COMMIT');
        } catch (RefreshException $e) {
            $this->db->exec('COMMIT'); // keep the revoke if it happened
            throw $e;
        } catch (Throwable $e) {
            $this->db->exec('ROLLBACK'); // unexpected failure: no partial rotation
            throw $e;
        }

        return [$this->tokens->mintAccessPublic($userId, $clientId), $newRaw];
    }

    private function lockToken(string $hash): ?array
    {
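        // SQLite allows one writer at a time, but this read is only race-free if
        // the caller already holds the write lock (BEGIN IMMEDIATE). Under a
        // deferred BEGIN, two transactions can both read used_at = NULL here.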
        $stmt = $this->db->prepare(
            'SELECT id, family_id, used_at, expires_at
               FROM refresh_token WHERE token_hash = :h'
        );
        $stmt->execute([':h' => $hash]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row ?: null;
    }

    private function familyRevoked(int $familyId): bool
    {
        $stmt = $this->db->prepare('SELECT revoked_at FROM token_family WHERE id = :id');
        $stmt->execute([':id' => $familyId]);
        return $stmt->fetchColumn() !== null;
    }

    private function revokeFamily(int $familyId, string $reason, int $now): void
    {
        $stmt = $this->db->prepare(
            'UPDATE token_family
                SET revoked_at = :n, revoke_reason = :r
              WHERE id = :id AND revoked_at IS NULL'
        );
        $stmt->execute([':n' => $now, ':r' => $reason, ':id' => $familyId]);
    }
}

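The key invariant: a token can only be retired once. That holds only when the used_at check and the UPDATE run under the same write lock. SQLite allows a single writer at a time, but PDO's beginTransaction() issues a plain deferred BEGIN, which takes the write lock at the first write rather than at the SELECT. Under a deferred BEGIN, two concurrent rotations of the same token can both read used_at as NULL, and the loser then fails with a busy error instead of tripping reuse detection. Starting the transaction with BEGIN IMMEDIATE takes the write lock before the SELECT, so one rotation commits and the other sees used_at already set and trips reuse detection. That property comes with SQLite itself and is exactly why we did not reach for a fancier store.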
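The single-retirement invariant is easy to exercise outside PHP. The following is a minimal sketch in Python's sqlite3 module, assuming a trimmed-down version of the schema (no expiry, no client id); the mint() and rotate() helpers are hypothetical and exist only to show the lock-then-check-then-retire order.

```python
import hashlib
import secrets
import sqlite3
import time

# isolation_level=None puts the connection in autocommit mode so that we can
# issue BEGIN IMMEDIATE / COMMIT / ROLLBACK ourselves.
db = sqlite3.connect(":memory:", isolation_level=None)
db.executescript("""
CREATE TABLE token_family (id INTEGER PRIMARY KEY, revoked_at INTEGER);
CREATE TABLE refresh_token (
    id INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES token_family(id),
    token_hash TEXT NOT NULL UNIQUE,
    used_at INTEGER,
    superseded_by INTEGER
);
""")


def sha(raw):
    return hashlib.sha256(raw.encode()).hexdigest()


def mint(family_id):
    raw = secrets.token_urlsafe(32)
    cur = db.execute(
        "INSERT INTO refresh_token (family_id, token_hash) VALUES (?, ?)",
        (family_id, sha(raw)),
    )
    return raw, cur.lastrowid


def rotate(presented):
    """Return the successor token, or None if the exchange is refused."""
    now = int(time.time())
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = db.execute(
            "SELECT t.id, t.family_id, t.used_at, f.revoked_at "
            "FROM refresh_token t JOIN token_family f ON f.id = t.family_id "
            "WHERE t.token_hash = ?",
            (sha(presented),),
        ).fetchone()
        result = None
        if row is not None and row[3] is None:
            token_id, family_id, used_at, _ = row
            if used_at is not None:
                # Retired token presented again: burn the whole family.
                db.execute(
                    "UPDATE token_family SET revoked_at = ? WHERE id = ?",
                    (now, family_id),
                )
            else:
                result, new_id = mint(family_id)
                db.execute(
                    "UPDATE refresh_token SET used_at = ?, superseded_by = ? "
                    "WHERE id = ? AND used_at IS NULL",
                    (now, new_id, token_id),
                )
        db.execute("COMMIT")
        return result
    except Exception:
        db.execute("ROLLBACK")
        raise


db.execute("INSERT INTO token_family (id) VALUES (1)")
first, _ = mint(1)
second = rotate(first)        # succeeds, retires `first`
replayed = rotate(first)      # reuse: revokes family 1
after_burn = rotate(second)   # refused, the family is dead
```

The extra "AND used_at IS NULL" on the UPDATE is a belt-and-braces guard; with the write lock held from the start, the SELECT already guarantees it.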

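When reuse_detected fires, we revoke the family, so every later refresh attempt from that family gets a 401, which the client turns into a fresh login. Access tokens that were already issued stay valid until they expire, at most 15 minutes later, because they are verified statelessly. The attacker and the victim both get logged out, which is correct: we cannot tell which one is which, so we invalidate both and let the human re-authenticate.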

The race condition on flaky networks

Here is the failure mode that will generate angry support tickets if you ignore it. A mobile client in a tunnel sends its refresh token. Our server rotates it, retires the old one, mints the new one — and the response evaporates because the radio dropped. The client never received the new token. It retries with the old token, which we have now marked used_at. Reuse detection fires. We revoke the family. A legitimate user just got logged out because their subway train went under a river.

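The OAuth 2.0 Security Best Current Practice (RFC 9700, formerly draft oauth-security-topics) describes refresh token rotation, and the pragmatic fix for the dropped-response case is a short grace window. Instead of treating any reuse as hostile, we allow the immediately-prior token to be re-presented for a few seconds and return the same successor we already minted, rather than minting another. We add a grace_until column and keep the successor recoverable for that window so the retry is idempotent. The stored SHA-256 hash alone cannot reproduce the raw token, so this means either retaining the successor's raw value (ideally encrypted) until the window closes or deriving it deterministically from the presented token. Concretely, we relax the reuse branch: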

  • If a retired token is presented and its superseded_by successor is still unused and we are within ~10 seconds of used_at, return the existing successor instead of revoking.
  • Outside that window, or if the successor was already used, treat it as a real breach and burn the family.

This cuts false-positive logouts to near zero on our APAC mobile traffic while keeping the window far too short to be useful to an attacker replaying a token hours later. Ten seconds is a network hiccup; it is not an exfiltration-and-replay pipeline.
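The two rules above reduce to a small pure function. This is a sketch of the decision only, not of how the successor is recovered; the function name, the return labels, and the choice to treat exactly 10 seconds as still inside the window are assumptions.

```python
from typing import Optional

GRACE_SECONDS = 10  # the "~10 seconds" from the text; boundary handling is assumed


def classify_reuse(
    now: int,
    used_at: int,
    successor_used_at: Optional[int],
    grace: int = GRACE_SECONDS,
) -> str:
    """Decide what to do when an already-retired refresh token is presented.

    used_at           -- when the presented token was retired (unix seconds)
    successor_used_at -- when its successor was exchanged, or None if unused
    """
    within_grace = 0 <= now - used_at <= grace
    if within_grace and successor_used_at is None:
        # Most likely a retry after a dropped response: hand back the same successor.
        return "return_existing_successor"
    # Too late, or the successor was already spent: treat it as a breach.
    return "revoke_family"


retry_after_drop = classify_reuse(now=1003, used_at=1000, successor_used_at=None)
replay_hours_later = classify_reuse(now=8200, used_at=1000, successor_used_at=None)
successor_already_spent = classify_reuse(now=1003, used_at=1000, successor_used_at=1002)
```

Keeping the decision separate from the database code makes it cheap to unit-test the boundary cases without a server.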

Client retry logic that does not self-DDoS

The client has to cooperate. Our edge ingestion service is written in Go, and it talks to the same token API as our mobile apps, so it is the cleanest example. The rules: only one in-flight refresh at a time (a mutex), and on a reuse_detected or family_revoked response, stop retrying and re-authenticate — do not hammer the endpoint.

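package authclient

import (
    "bytes"
    "encoding/json"
    "errors"
    "net/http"
    "sync"
    "time"
)

var ErrReauth = errors.New("refresh family revoked, full re-login required")

type TokenStore struct {
    mu      sync.Mutex
    access  string
    refresh string
    expiry  time.Time
    http    *http.Client
    baseURL string
}

// refreshResponse is what rotate() hands back to Valid(). The JSON field names
// decoded below are an assumed wire format; match them to your /auth/refresh.
type refreshResponse struct {
    StatusCode int
    Access     string
    Refresh    string
    Expiry     time.Time
}

// rotate POSTs the refresh token and decodes the new pair on a 200.
func (s *TokenStore) rotate(refresh string) (*refreshResponse, error) {
    body, err := json.Marshal(map[string]string{"refresh_token": refresh})
    if err != nil {
        return nil, err
    }
    res, err := s.http.Post(s.baseURL+"/auth/refresh", "application/json", bytes.NewReader(body))
    if err != nil {
        return nil, err // network failure: token untouched, caller may retry
    }
    defer res.Body.Close()

    out := &refreshResponse{StatusCode: res.StatusCode}
    if res.StatusCode != http.StatusOK {
        return out, nil
    }
    var payload struct {
        Access    string `json:"access_token"`  // assumed field name
        Refresh   string `json:"refresh_token"` // assumed field name
        ExpiresIn int64  `json:"expires_in"`    // assumed: access-token TTL in seconds
    }
    if err := json.NewDecoder(res.Body).Decode(&payload); err != nil {
        return nil, err
    }
    out.Access, out.Refresh = payload.Access, payload.Refresh
    out.Expiry = time.Now().Add(time.Duration(payload.ExpiresIn) * time.Second)
    return out, nil
}

// Valid returns a usable access token, refreshing at most once even under
// concurrent callers. The mutex guarantees a single in-flight rotation.
func (s *TokenStore) Valid() (string, error) {
    s.mu.Lock()
    defer s.mu.Unlock()

    if time.Now().Before(s.expiry.Add(-30 * time.Second)) {
        return s.access, nil // still fresh, no network call
    }
    if s.refresh == "" {
        return "", ErrReauth // nothing left to rotate: re-login required
    }

    resp, err := s.rotate(s.refresh)
    if err != nil {
        return "", err
    }
    switch resp.StatusCode {
    case http.StatusOK:
        s.access, s.refresh, s.expiry = resp.Access, resp.Refresh, resp.Expiry
        return s.access, nil
    case http.StatusUnauthorized:
        // reuse_detected or family_revoked: do NOT retry, re-login.
        s.access, s.refresh = "", ""
        return "", ErrReauth
    default:
        // transient 5xx: caller may retry with backoff, token untouched.
        return "", errors.New("refresh transient failure")
    }
}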

The two non-negotiable details: the mutex (so a goroutine storm does not fire ten concurrent rotations and trip your own reuse detection against yourself) and the hard stop on 401 (so a revoked family does not turn into a retry loop that looks like an attack). The grace window on the server and the single-flight mutex on the client are two halves of the same fix — you need both.
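A quick way to convince yourself that the single-flight rule holds is to hammer a token store from several threads and count the exchanges. This is a Python sketch of the same two rules, with a fake exchange function standing in for the HTTP call; every name in it is hypothetical.

```python
import threading
import time


class ReauthRequired(Exception):
    """The refresh family is gone; the user has to log in again."""


class TokenStore:
    def __init__(self, refresh_token, exchange):
        self._lock = threading.Lock()
        self._access = None
        self._refresh = refresh_token
        self._expiry = 0.0
        # exchange(refresh) -> (status, access, refresh, expires_in_seconds)
        self._exchange = exchange

    def valid(self):
        with self._lock:  # at most one rotation in flight
            if self._access and time.time() < self._expiry - 30:
                return self._access
            if not self._refresh:
                raise ReauthRequired()
            status, access, refresh, expires_in = self._exchange(self._refresh)
            if status == 200:
                self._access, self._refresh = access, refresh
                self._expiry = time.time() + expires_in
                return access
            if status == 401:
                # Hard stop: drop the tokens and never retry with them.
                self._access = self._refresh = None
                raise ReauthRequired()
            raise RuntimeError("refresh transient failure")


calls = []


def fake_exchange(refresh):
    calls.append(refresh)
    time.sleep(0.05)  # simulate network latency so the threads pile up
    return 200, "access-1", "refresh-2", 900


store = TokenStore("refresh-1", fake_exchange)
threads = [threading.Thread(target=store.valid) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
exchange_count = len(calls)  # ten callers, one rotation
```

Without the lock, all ten threads would present "refresh-1" at once, and nine of them would look like replays to the server.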

Verifying the chain holds under load

Before shipping, we ran an adversarial test: spin up concurrent rotations of the same token and assert that exactly one succeeds and the family ends up revoked. This Python script is what we keep in CI.

import concurrent.futures as cf
import requests

BASE = "https://api.topvideohub.com"

def rotate(token: str) -> int:
    r = requests.post(f"{BASE}/auth/refresh", json={"refresh_token": token}, timeout=5)
    return r.status_code

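def test_double_use_revokes_family(seed_token: str):
    # Fire the SAME token twice, concurrently. Both calls are submitted before
    # waiting on either; calling first.result() up front would serialize them.
    # Assumes the grace window is disabled in this environment: with it enabled,
    # the second call would legitimately get the already-minted successor back.
    with cf.ThreadPoolExecutor(max_workers=2) as ex:
        futures = [ex.submit(rotate, seed_token) for _ in range(2)]
        codes = sorted(f.result() for f in futures)
    # One rotation succeeds (200), the replay is rejected (401).
    assert codes == [200, 401], f"unexpected: {codes}"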

    # The family must now be dead: the freshly issued token also fails.
    # (fetched from the 200 response in the real test fixture)
    print("chain integrity holds: single-use enforced, family burned on replay")

If that test ever returns [200, 200], your rotation is not atomic and you have a silent double-spend. It is the single most important assertion in the whole system.

Rolling it out behind Cloudflare and LiteSpeed

Two deployment notes specific to our stack. First, the /auth/refresh endpoint must be excluded from any edge caching — we set Cache-Control: no-store and added a Cloudflare cache rule to bypass the auth path entirely, because a cached refresh response is a catastrophe. Second, because we run Cloudflare in front of LiteSpeed, the client IP arrives in CF-Connecting-IP; we log that alongside every family revocation so the security dashboard can correlate reuse events by network. When we replayed the original incident through the new system, the family was revoked on the second request and the scraper's ASN showed up in the revocation log within seconds instead of running for three weeks.

What we measured after a month in production:

  • Token replay window dropped from up to 30 days to a single rotation interval (~15 minutes of access-token life plus a 10-second refresh grace).
  • False-positive logouts from network drops fell to roughly 0.02% of refresh calls once the grace window landed — before it, the naive rotation logged out about 1 in 500 mobile refreshes.
  • Storage cost is trivial: a few hundred bytes per active family, pruned by a nightly job that deletes families revoked or expired more than seven days ago.
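The nightly prune mentioned in the last bullet could look roughly like the sketch below. It is an illustration in Python's sqlite3 module, not the job we run: the prune() name is hypothetical, and it assumes a family counts as expired once none of its tokens has an expires_at inside the retention window.

```python
import sqlite3

RETENTION_SECONDS = 7 * 86400  # keep dead families for seven days


def prune(db: sqlite3.Connection, now: int) -> int:
    """Delete families revoked or fully expired before the cutoff. Returns the count."""
    cutoff = now - RETENTION_SECONDS
    with db:  # one transaction: commit on success, roll back on error
        dead = [row[0] for row in db.execute(
            """
            SELECT f.id FROM token_family f
             WHERE (f.revoked_at IS NOT NULL AND f.revoked_at < ?)
                OR NOT EXISTS (SELECT 1 FROM refresh_token t
                                WHERE t.family_id = f.id AND t.expires_at >= ?)
            """,
            (cutoff, cutoff),
        )]
        for family_id in dead:
            # Tokens first, so no token row ever points at a missing family.
            db.execute("DELETE FROM refresh_token WHERE family_id = ?", (family_id,))
            db.execute("DELETE FROM token_family WHERE id = ?", (family_id,))
    return len(dead)


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE token_family (id INTEGER PRIMARY KEY, revoked_at INTEGER);
CREATE TABLE refresh_token (
    id INTEGER PRIMARY KEY, family_id INTEGER NOT NULL, expires_at INTEGER NOT NULL);
""")
NOW, DAY = 1_000_000, 86400
db.executemany("INSERT INTO token_family (id, revoked_at) VALUES (?, ?)", [
    (1, NOW - 8 * DAY),  # revoked eight days ago: prune
    (2, NOW - 1 * DAY),  # revoked yesterday: keep
    (3, None),           # live, token still valid: keep
    (4, None),           # live, but its only token expired ten days ago: prune
])
db.executemany("INSERT INTO refresh_token (family_id, expires_at) VALUES (?, ?)", [
    (1, NOW + DAY), (2, NOW + DAY), (3, NOW + DAY), (4, NOW - 10 * DAY),
])
db.commit()
removed = prune(db, NOW)
remaining = [row[0] for row in db.execute("SELECT id FROM token_family ORDER BY id")]
```

Keeping revoked families around for a week preserves the revocation log for incident review before the rows disappear.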

Conclusion

Refresh-token rotation is not exotic and it does not need exotic infrastructure. The whole thing rides on three guarantees: refresh tokens are opaque and single-use, rotation is atomic so a token can only be retired once, and any reuse outside a tiny grace window burns the entire family. SQLite's single-writer model gives you the atomicity almost for free, provided the rotation takes the write lock before it reads (BEGIN IMMEDIATE). A 10-second grace window kills the false-positive logouts that plague real mobile users, and a disciplined single-flight client keeps you from attacking yourself. If you are still shipping 30-day static refresh tokens to mobile video clients, you are one logged response body away from somebody else's free API key. Rotate them.
