DEV Community

ahmet gedik


Implementing JWT Refresh Token Rotation for Video API Clients in PHP 8.4

Three months ago our anomaly dashboard flagged something ugly: a single refresh token issued to our Android client was being redeemed from two different autonomous systems within the same hour — one request originating in Frankfurt, the other in São Paulo. The token was valid, the signature checked out, and under our old static-refresh-token scheme there was nothing in the protocol itself that let us tell the legitimate device from the attacker. We run TrendVidStream, a global multi-region video streaming discovery platform, and our API serves mobile clients, partner integrations, and our own fleet of cron workers pulling trending-video data across eight regions. Every one of those clients authenticates with JWTs. That incident is why we moved to refresh-token rotation with reuse detection, and this post is the full implementation: the storage schema, the PHP 8.4 server code, the client-side retry logic, and the operational sharp edges that the RFC drafts do not warn you about.

Why a Static Refresh Token Is a 30-Day Skeleton Key

The standard two-token setup looks reasonable on paper. Access tokens are short-lived JWTs (ours live 10 minutes), validated statelessly by signature, so the hot path never touches the database. Refresh tokens are long-lived opaque strings (ours lived 30 days) that the client exchanges for a new access token when the old one expires.

The problem is what happens when a refresh token leaks. With a static token:

  • The stolen credential works for its entire remaining lifetime — up to 30 days of full API access.
  • The attacker and the legitimate client both keep using the same token, and from the server's perspective the traffic is indistinguishable.
  • Revocation only happens if the user notices something and logs out everywhere, which for a machine-to-machine video API client basically means never.
  • Leaks are more common than people admit: tokens end up in crash reports, in debug logs shipped to third-party aggregators, in backup snapshots of mobile devices, and in partner integrations that store them in plaintext config files.

Rotation changes the economics completely. Every time a refresh token is used, the server invalidates it and issues a brand-new one. Each token is single-use. If a token is ever presented twice, one of the two presenters is not the legitimate client — and the server now has a concrete, protocol-level signal of theft. That signal is the entire point. Rotation is not primarily about shrinking the exposure window; it is about turning a silent compromise into a loud, detectable event.

Token Families and the Rotation Chain

The mechanism that makes reuse detection work is the token family. When a client first authenticates with credentials, the server creates a family — just a random ID — and issues refresh token R1 inside it. When the client redeems R1, the server marks R1 as rotated and issues R2 in the same family. R2 begets R3, and so on. The family is a linked chain of single-use tokens, and at any moment exactly one token in the family should be active.

Now consider theft. An attacker steals R2 while the legitimate client still holds it. Two cases:

  • The attacker redeems R2 first. They get R3. The legitimate client later presents R2, the server sees a rotated token being reused, and revokes the entire family — including the attacker's R3. The attacker is cut off and the real user re-authenticates.
  • The legitimate client redeems R2 first and gets R3. The attacker later presents the stale R2, the server detects reuse, and again the whole family dies.

In both orderings the attacker loses access within one rotation cycle, and the server logs a reuse event with the family ID, client ID, and region attached. Compare that with the static scheme, where the attacker quietly coexists with the real client for a month.

The critical implementation detail is that reuse must revoke the whole family, not just the presented token. If you only reject the stale token, the attacker who rotated first keeps their fresh chain and you have detected the theft without actually stopping it.
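To make the two orderings concrete, here is a deliberately tiny in-memory model in Python. `FamilyStore` and `attacker_redeems_first` are hypothetical names, and the model has no persistence, no grace window and no concurrency control. It only shows that revoking the whole family on reuse is what cuts off the attacker's fresh branch.

```python
import secrets


class ReuseDetected(Exception):
    """A token that was already rotated has been presented again."""


class FamilyStore:
    """Toy in-memory store: single-use refresh tokens grouped into families."""

    def __init__(self):
        self.tokens = {}  # token -> {"family": family_id, "status": status}

    def login(self):
        # Initial authentication starts a new family with its first token.
        return self._issue(secrets.token_hex(8))

    def _issue(self, family):
        token = secrets.token_hex(16)
        self.tokens[token] = {"family": family, "status": "active"}
        return token

    def rotate(self, token):
        row = self.tokens.get(token)
        if row is None or row["status"] == "revoked":
            raise PermissionError("unknown or revoked token")
        if row["status"] == "rotated":
            # Reuse: revoke the whole family, including any branch an attacker holds.
            for other in self.tokens.values():
                if other["family"] == row["family"]:
                    other["status"] = "revoked"
            raise ReuseDetected(row["family"])
        row["status"] = "rotated"
        return self._issue(row["family"])


def attacker_redeems_first():
    """R2 is stolen; the attacker redeems it before the real client does."""
    store = FamilyStore()
    r1 = store.login()
    r2 = store.rotate(r1)      # legitimate rotation: R1 -> R2
    r3 = store.rotate(r2)      # attacker redeems the stolen R2 and gets R3
    try:
        store.rotate(r2)       # real client presents R2: reuse detected
    except ReuseDetected:
        pass
    return store.tokens[r3]["status"]  # status of the attacker's R3
```

Rejecting only the stale R2 would leave the attacker's R3 active; the family sweep is what flips it to `revoked`.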

The Storage Schema in SQLite

Our entire platform runs on SQLite — FTS5 powers the video search index, WAL mode handles concurrent readers under LiteSpeed, and we deploy to shared hosting over FTP, which means there is no Redis, no Postgres, and no daemon we can install. The auth store lives in the same constraints, and honestly SQLite is a great fit here: refresh grants are low-frequency writes (one per client per access-token lifetime), and WAL gives us the transactional guarantees rotation needs.

Two non-negotiable rules before the schema. First, refresh tokens are opaque random strings, not JWTs — there is nothing for the client to introspect, and an unsigned 256-bit random value cannot be forged offline. Second, the database stores only a SHA-256 hash of the token. If the database file ever leaks (and on FTP-deployed shared hosting you should assume it can), the hashes are useless to an attacker because the preimage space is 256 random bits, not a human password — no bcrypt needed, a single fast hash is correct here.
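Both rules fit in a few lines. This Python sketch (hypothetical helper names) mirrors what the PHP issuer does with `bin2hex(random_bytes(32))` and `hash('sha256', ...)`: the client receives the 64-character hex token, and only its SHA-256 hex digest is ever written to the database.

```python
import hashlib
import secrets


def token_hash(token: str) -> str:
    # A single SHA-256 is enough: the input is 256 random bits, so there is
    # no dictionary to brute-force, unlike a human-chosen password.
    return hashlib.sha256(token.encode("ascii")).hexdigest()


def new_refresh_token() -> tuple:
    """Return (opaque_token_for_client, hash_for_database)."""
    token = secrets.token_hex(32)  # 32 random bytes = 256 bits = 64 hex chars
    return token, token_hash(token)
```

Lookups then hash the presented token and query by `token_hash`, so the plaintext never needs to be stored or compared.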

```sql
PRAGMA journal_mode = WAL;

CREATE TABLE refresh_tokens (
    id          INTEGER PRIMARY KEY,
    token_hash  TEXT    NOT NULL UNIQUE,  -- sha256 hex of the opaque token
    family_id   TEXT    NOT NULL,         -- random id shared by the whole chain
    client_id   TEXT    NOT NULL,         -- mobile app, partner, region worker
    user_id     INTEGER NOT NULL,
    status      TEXT    NOT NULL DEFAULT 'active',  -- active | rotated | revoked
    replaced_by INTEGER,                  -- id of the child token, if rotated
    issued_at   INTEGER NOT NULL,
    expires_at  INTEGER NOT NULL,
    rotated_at  INTEGER,
    region      TEXT,                     -- which regional context issued it
    FOREIGN KEY (replaced_by) REFERENCES refresh_tokens(id)
);

CREATE INDEX idx_rt_family ON refresh_tokens(family_id);
CREATE INDEX idx_rt_expiry ON refresh_tokens(expires_at);
```

The region column earns its keep in incident response. When a reuse event fires, knowing that a family bootstrapped by our US trending worker is suddenly being redeemed by something claiming to be the mobile client tells you immediately what kind of leak you are looking at.

Issuing the Token Pair in PHP 8.4

Issuance happens in two places: initial login (new family) and rotation (existing family). It is the same code path; the only difference is whether a family ID is passed in. We use firebase/php-jwt for the access-token signing and PHP 8.3+ typed class constants to keep the TTLs honest.

```php
<?php
declare(strict_types=1);

use Firebase\JWT\JWT; // composer require firebase/php-jwt

final class TokenIssuer
{
    private const int ACCESS_TTL  = 600;        // 10 minutes
    private const int REFRESH_TTL = 2_592_000;  // 30 days

    public function __construct(
        private readonly PDO $db,
        private readonly string $signingKey,
        private readonly string $keyId,
    ) {}

    public function issuePair(
        int $userId,
        string $clientId,
        string $region,
        ?string $familyId = null,
    ): array {
        $familyId   ??= bin2hex(random_bytes(16));
        $refreshToken = bin2hex(random_bytes(32)); // opaque, never a JWT
        $now          = time();

        $stmt = $this->db->prepare(
            'INSERT INTO refresh_tokens
               (token_hash, family_id, client_id, user_id, issued_at, expires_at, region)
             VALUES (:hash, :family, :client, :user, :now, :exp, :region)'
        );
        $stmt->execute([
            'hash'   => hash('sha256', $refreshToken),
            'family' => $familyId,
            'client' => $clientId,
            'user'   => $userId,
            'now'    => $now,
            'exp'    => $now + self::REFRESH_TTL,
            'region' => $region,
        ]);

        $payload = [
            'iss' => 'api.trendvidstream.com',
            'sub' => (string) $userId,
            'cid' => $clientId,
            'iat' => $now,
            'exp' => $now + self::ACCESS_TTL,
        ];

        return [
            'access_token'  => JWT::encode($payload, $this->signingKey, 'HS256', $this->keyId),
            'refresh_token' => $refreshToken,
            'expires_in'    => self::ACCESS_TTL,
        ];
    }
}
```

Note the fourth argument to JWT::encode — the kid header. Put it in from day one. When you eventually rotate the signing key itself (and you will), the kid lets your verifier pick the right key from a keyring instead of forcing a flag-day cutover where every outstanding access token dies at once. We keep the current key plus one predecessor in the keyring and retire the old one after ACCESS_TTL has fully elapsed.
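The verifier side of `kid` is not shown in the post. As an illustration of the mechanism only, here is a stdlib-only Python sketch of HS256 signing and verification against a keyring; `encode_hs256` and `verify_hs256` are hypothetical helpers, and in PHP you would normally let firebase/php-jwt do this (recent versions of `JWT::decode` accept an array of `Key` objects indexed by kid). The verifier reads `kid` from the header, picks that key, and rejects anything signed with a key that has been retired from the ring.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_hs256(payload, key, kid):
    header = {"alg": "HS256", "typ": "JWT", "kid": kid}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_hs256(token, keyring, now=None):
    """keyring maps kid -> secret: the current key plus one predecessor."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")
    key = keyring.get(header.get("kid"))
    if key is None:
        raise ValueError("unknown or retired kid")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("expired")
    return payload
```

During a key rollover the ring holds both entries, so tokens minted under the old kid keep verifying until they expire; dropping the old entry after ACCESS_TTL has elapsed retires it without a flag day.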

The Rotation Endpoint with Reuse Detection

This is the heart of the system, and it is where most implementations go subtly wrong in one of two ways: they forget family revocation, or they ignore concurrency. The whole exchange must be transactional, and the rotated-status update must be a compare-and-swap so that two parallel requests presenting the same token cannot both win.

```php
<?php
declare(strict_types=1);

// Minimal definition of the exception thrown below: the message carries the
// error code, the exception code carries the HTTP status.
final class AuthException extends RuntimeException {}

final class RefreshEndpoint
{
    private const int GRACE_SECONDS = 30;

    public function __construct(
        private readonly PDO $db,            // must be the same PDO the issuer
        private readonly TokenIssuer $issuer, // writes to, so its INSERT joins this transaction
    ) {}

    public function rotate(string $presented, string $region): array
    {
        $hash = hash('sha256', $presented);
        $this->db->beginTransaction();
        try {
            $stmt = $this->db->prepare(
                'SELECT * FROM refresh_tokens WHERE token_hash = :hash'
            );
            $stmt->execute(['hash' => $hash]);
            $row = $stmt->fetch(PDO::FETCH_ASSOC);
            $now = time();

            if ($row === false) {
                throw new AuthException('unknown_token', 401);
            }
            if ($row['status'] === 'revoked' || (int) $row['expires_at'] < $now) {
                throw new AuthException('expired_or_revoked', 401);
            }

            if ($row['status'] === 'rotated') {
                $age = $now - (int) $row['rotated_at'];
                if ($age <= self::GRACE_SECONDS) {
                    // A retry of the same request over a flaky mobile
                    // network, not an attack. Tell the client to re-read
                    // its token store and try again.
                    throw new AuthException('retry_with_new_token', 409);
                }
                // Reuse outside the grace window: assume theft.
                $this->revokeFamily($row['family_id']);
                $this->db->commit();
                $this->alertReuse($row);
                throw new AuthException('token_reuse_detected', 401);
            }

            // Happy path: compare-and-swap the status, then issue a child.
            $upd = $this->db->prepare(
                'UPDATE refresh_tokens
                    SET status = :rot, rotated_at = :now
                  WHERE id = :id AND status = :act'
            );
            $upd->execute([
                'rot' => 'rotated', 'now' => $now,
                'id'  => $row['id'], 'act' => 'active',
            ]);
            if ($upd->rowCount() !== 1) {
                // Lost a race with a parallel refresh in this same window.
                throw new AuthException('retry_with_new_token', 409);
            }

            $pair = $this->issuer->issuePair(
                (int) $row['user_id'],
                $row['client_id'],
                $region,
                $row['family_id'],
            );

            // Link parent -> child so the chain can be walked via replaced_by.
            // lastInsertId() is the row issuePair() just inserted on this connection.
            $link = $this->db->prepare(
                'UPDATE refresh_tokens SET replaced_by = :child WHERE id = :id'
            );
            $link->execute([
                'child' => (int) $this->db->lastInsertId(),
                'id'    => $row['id'],
            ]);

            $this->db->commit();
            return $pair;
        } catch (Throwable $e) {
            if ($this->db->inTransaction()) {
                $this->db->rollBack();
            }
            throw $e;
        }
    }

    private function revokeFamily(string $familyId): void
    {
        $stmt = $this->db->prepare(
            'UPDATE refresh_tokens SET status = :st WHERE family_id = :fam'
        );
        $stmt->execute(['st' => 'revoked', 'fam' => $familyId]);
    }

    private function alertReuse(array $row): void
    {
        error_log(sprintf(
            'TOKEN_REUSE family=%s client=%s user=%d region=%s',
            $row['family_id'],
            $row['client_id'],
            (int) $row['user_id'],
            $row['region'] ?? '-',
        ));
    }
}
```

A few decisions in there deserve justification.

The grace window is the concession you make to the real world. Strict single-use rotation assumes the client always receives the response containing its new token. On mobile networks that assumption fails constantly: the server commits the rotation, the response dies in transit, and the client retries with the token it still has, which is now marked rotated. Without a grace window, every such retry immediately nukes a legitimate family and forces a full re-login. With a 30-second window, a quick retry gets a 409 telling it to reload its token store (where a parallel request may already have saved the new pair) rather than a family-killing 401. Thirty seconds is long enough to absorb any sane retry policy and short enough that an attacker replaying a day-old stolen token still trips the alarm. Some implementations return the cached child token inside the grace window instead of a 409; we chose the 409 because it keeps the endpoint stateless about response bodies and pushes the recovery into the client, where the single-flight lock (below) handles it cleanly. The trade-off is worth stating plainly: the 409 only rescues a client when some local writer actually saved the child pair. If the single response carrying it was lost, reloading turns up the same rotated token, and the first attempt after the window closes is treated as reuse, which is exactly the case the cached-child variant recovers from.
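The branch order in `rotate()` is effectively a small decision table. A pure-function sketch of it in Python (hypothetical `TokenRow` and `classify`, no database) makes the boundaries easy to check: revoked or expired wins first, then a rotated token is either a retry inside the window or reuse outside it.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 30


@dataclass
class TokenRow:
    status: str                       # "active" | "rotated" | "revoked"
    expires_at: int                   # unix seconds
    rotated_at: Optional[int] = None  # set when status becomes "rotated"


def classify(row: Optional[TokenRow], now: int, grace: int = GRACE_SECONDS) -> tuple:
    """Return (http_status, outcome), following RefreshEndpoint::rotate()."""
    if row is None:
        return (401, "unknown_token")
    if row.status == "revoked" or row.expires_at < now:
        return (401, "expired_or_revoked")
    if row.status == "rotated":
        if now - row.rotated_at <= grace:
            return (409, "retry_with_new_token")
        return (401, "token_reuse_detected")  # caller also revokes the family
    return (200, "rotate")
```

A token rotated at t=100 and presented again at t=125 gets the retryable 409; presented at t=131 it is treated as theft.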

The compare-and-swap (WHERE id = :id AND status = :act plus the rowCount() check) matters even under SQLite's writer lock. Two requests can both read the row as active before either writes; the CAS guarantees only one of them transitions it and issues the child, while the loser gets a retryable 409 instead of silently creating a second branch in the family.

And the reuse path commits before throwing. The family revocation must persist even though the request itself fails — rolling it back with the exception would undo the one thing the detection exists to do.
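The compare-and-swap and the commit-before-raise ordering can be tried in isolation with Python's built-in sqlite3 module. This sketch (hypothetical `connect` and `mark_rotated`, trimmed schema, grace window omitted) makes one deliberate change from the PHP: it opens the transaction with `BEGIN IMMEDIATE`, which takes SQLite's write lock before the SELECT. PDO's SQLite driver starts transactions with a plain, deferred `BEGIN`; in WAL mode a deferred transaction whose read snapshot predates another writer's commit can fail with `SQLITE_BUSY` when it tries to write, so the race may surface as a database error rather than a zero `rowCount()`. With `BEGIN IMMEDIATE` the second request waits for the lock and then sees the row as already rotated.

```python
import sqlite3
import time


class ReuseDetected(Exception):
    """A rotated token was presented again; its family has been revoked."""


def connect(path=":memory:"):
    # isolation_level=None: the module never opens transactions implicitly,
    # so the BEGIN/COMMIT/ROLLBACK below are the only transaction control.
    conn = sqlite3.connect(path, isolation_level=None, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS refresh_tokens ("
        " id INTEGER PRIMARY KEY,"
        " token_hash TEXT NOT NULL UNIQUE,"
        " family_id TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'active',"
        " rotated_at INTEGER)"
    )
    return conn


def mark_rotated(conn, token_hash, now=None):
    """Try the active -> rotated transition; True only for the single winner."""
    now = int(time.time()) if now is None else now
    # Take the write lock before reading, so a concurrent refresh on another
    # connection waits (up to `timeout`) instead of using a stale snapshot.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, family_id, status FROM refresh_tokens WHERE token_hash = ?",
            (token_hash,),
        ).fetchone()
        if row is not None and row[2] == "rotated":
            conn.execute(
                "UPDATE refresh_tokens SET status = 'revoked' WHERE family_id = ?",
                (row[1],),
            )
            conn.execute("COMMIT")       # persist the revocation first...
            raise ReuseDetected(row[1])  # ...then report the failure
        updated = 0
        if row is not None:
            # Compare-and-swap: succeeds only if the row is still active.
            cur = conn.execute(
                "UPDATE refresh_tokens SET status = 'rotated', rotated_at = ?"
                " WHERE id = ? AND status = 'active'",
                (now, row[0]),
            )
            updated = cur.rowcount
        conn.execute("COMMIT")
        return updated == 1
    except BaseException:
        if conn.in_transaction:          # false after the reuse path's COMMIT
            conn.execute("ROLLBACK")
        raise
```

Because the reuse branch commits before raising, the `except` block finds no open transaction and the revocation survives the failed request.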

The Client Side Needs to Cooperate

Rotation breaks lazy clients. Any client that fires concurrent requests and refreshes on-demand from multiple threads will trample its own chain. Our region workers — the same Python fleet that pulls trending feeds for eight regions on staggered cron schedules — all share one rule: refreshes are single-flight, and the token pair is persisted atomically before the new access token is used.

```python
import threading
import time

import requests


class ReauthRequired(Exception):
    """The token is unknown or its family was revoked: log in again."""


class RefreshFailed(Exception):
    """Retryable failures persisted through every attempt."""


class TokenManager:
    def __init__(self, base_url, client_id, store):
        self.base_url = base_url
        self.client_id = client_id
        self.store = store              # persists the pair atomically to disk
        self._lock = threading.Lock()

    @staticmethod
    def _fresh(tok):
        # Refresh proactively: anything within 30 s of expiry counts as stale.
        return tok['access_expires_at'] - time.time() > 30

    def access_token(self):
        tok = self.store.load()
        if self._fresh(tok):
            return tok['access_token']
        return self._refresh()

    def _refresh(self):
        with self._lock:                # single-flight: one refresh at a time
            tok = self.store.load()     # re-check after acquiring the lock
            if self._fresh(tok):
                return tok['access_token']

            for attempt in range(3):
                resp = requests.post(
                    f'{self.base_url}/oauth/token',
                    json={
                        'grant_type': 'refresh_token',
                        'refresh_token': tok['refresh_token'],
                        'client_id': self.client_id,
                    },
                    timeout=10,
                )
                if resp.status_code == 200:
                    body = resp.json()
                    # Persist before use: the old refresh token is now dead.
                    self.store.save({
                        'access_token': body['access_token'],
                        'refresh_token': body['refresh_token'],
                        'access_expires_at': time.time() + body['expires_in'],
                    })
                    return body['access_token']
                if resp.status_code == 401:
                    raise ReauthRequired(resp.json().get('error'))
                # 409 (parallel or retried rotation) or a transient error:
                # back off, then reload in case another writer saved a fresh pair.
                time.sleep(1 + attempt)
                tok = self.store.load()
                if self._fresh(tok):
                    return tok['access_token']
            raise RefreshFailed('gave up after 3 attempts')
```

The details that matter, in rough order of how much pain they saved us:

  • Persist before use. The new refresh token must hit durable storage before the old one is forgotten. If the process crashes between receiving the pair and saving it, the client is locked out of its own family.
  • Refresh proactively, not reactively. The 30-second buffer before expiry means in-flight requests almost never race an expired token.
  • Treat 409 as 'reload and retry', 401 as 'stop and re-authenticate'. Conflating them either spams your login endpoint or retries forever against a revoked family.
  • One refresher per token store. Two cron processes sharing one credential file will fork the chain no matter how polite the server is. Each of our region workers gets its own client_id and its own family.
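TokenManager relies on a `store` with `load()` and `save()` that persists the pair atomically to disk; the post does not show it. A minimal file-backed sketch, assuming one JSON file per client_id (hypothetical `FileTokenStore`): write to a temp file in the same directory, fsync it, then `os.replace` it over the real file, so a crash leaves either the old pair or the new one, never a truncated file. It does not lock across processes, which is one more reason for the one-refresher-per-store rule.

```python
import json
import os
import tempfile


class FileTokenStore:
    """Token pair in a JSON file, replaced atomically on every save."""

    def __init__(self, path):
        self.path = path

    def load(self):
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def save(self, pair):
        directory = os.path.dirname(os.path.abspath(self.path))
        # Temp file in the same directory, so os.replace is a same-filesystem rename.
        fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tokens-", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(pair, f)
                f.flush()
                os.fsync(f.fileno())    # bytes are on disk before the rename
            os.replace(tmp, self.path)  # atomic swap: readers see old or new pair
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        if hasattr(os, "O_DIRECTORY"):  # POSIX: make the rename itself durable
            dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
            try:
                os.fsync(dir_fd)
            finally:
                os.close(dir_fd)
```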

Operational Cleanup and the Cron Angle

Rotation multiplies your row count: every refresh leaves a tombstone. At a 10-minute access TTL, one busy client writes ~144 rows a day. Left alone, the table becomes a slowly growing log of every session your API ever had. We prune from the same staggered cron jobs that refresh the regional video feeds — they already run every few hours per region, so the auth maintenance rides along for free.
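The ~144 figure is 86,400 seconds a day divided by the 600-second access TTL. Carrying the arithmetic one step further gives a rough steady-state size for the table. This sketch assumes an always-on client that rotates exactly once per access-token lifetime (the proactive 30-second buffer makes the real number slightly higher) and that each row is kept until its 30-day refresh expiry plus a retention period; with the 90 days the prune job below uses, that is about 144 × 120 = 17,280 rows per busy client.

```python
SECONDS_PER_DAY = 86_400
ACCESS_TTL = 600          # seconds, as in TokenIssuer::ACCESS_TTL
REFRESH_TTL_DAYS = 30     # as in TokenIssuer::REFRESH_TTL


def rows_per_day(access_ttl=ACCESS_TTL):
    # One rotation, and therefore one new row, per access-token lifetime.
    return SECONDS_PER_DAY // access_ttl


def steady_state_rows(retention_days, access_ttl=ACCESS_TTL):
    """Rows held per always-on client when each row lives until its
    refresh expiry plus `retention_days` before being pruned."""
    return rows_per_day(access_ttl) * (REFRESH_TTL_DAYS + retention_days)
```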

```php
<?php
// cron/prune_tokens.php: rides along with the multi-region fetch cron.
$db = new PDO('sqlite:' . __DIR__ . '/../data/auth.db');
$db->exec('PRAGMA journal_mode = WAL');

// Keep every row, rotated and revoked tombstones included, until 90 days
// past its expiry for forensics, then let it go.
$cutoff = time() - 90 * 86400;
$stmt = $db->prepare('DELETE FROM refresh_tokens WHERE expires_at < :cutoff');
$stmt->execute(['cutoff' => $cutoff]);
$db->exec('PRAGMA optimize');
```

Why keep tombstones 90 days? Because the rotated chain is your audit trail. When a reuse event fires, walking the family through replaced_by tells you exactly when the chain forked, which region issued each link, and how long the attacker held the token before redeeming it. Deleting rotated rows eagerly throws away the only forensic record the protocol gives you.
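Walking the chain is a recursive query. A sketch using Python's sqlite3 against the schema above (hypothetical `walk_family` helper), assuming each rotation writes the child's id into the parent's `replaced_by` column. The root is the one row in the family that no other row points to, and each step follows `replaced_by` to the next link, carrying region and timestamps along for the timeline.

```python
import sqlite3


def walk_family(conn, family_id):
    """Return (id, status, region, issued_at, rotated_at) in rotation order."""
    query = """
        WITH RECURSIVE chain(id, status, region, issued_at, rotated_at,
                             replaced_by, depth) AS (
            SELECT id, status, region, issued_at, rotated_at, replaced_by, 0
              FROM refresh_tokens
             WHERE family_id = ?
               AND id NOT IN (SELECT replaced_by FROM refresh_tokens
                               WHERE family_id = ? AND replaced_by IS NOT NULL)
            UNION ALL
            SELECT t.id, t.status, t.region, t.issued_at, t.rotated_at,
                   t.replaced_by, c.depth + 1
              FROM refresh_tokens AS t
              JOIN chain AS c ON t.id = c.replaced_by
        )
        SELECT id, status, region, issued_at, rotated_at
          FROM chain
         ORDER BY depth
    """
    return conn.execute(query, (family_id, family_id)).fetchall()
```

Gaps between one link's `rotated_at` and the next presentation, or a region change mid-chain, are what an incident review usually looks for.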

One deployment note, since our stack is unusual: we ship code to all production hosts over FTP with lftp mirror scripts, which means secrets management has to be deliberate. The JWT signing key and the SQLite auth database are both excluded from the mirror set — the key lives in a per-site env file that is provisioned once by hand, and the database is born on the host. If your deploy pipeline is rsync, FTP, or anything else that mirrors a directory tree, audit the exclusion list before you ship an auth system; the failure mode is overwriting production token state with a stale local copy, which revokes every client at once and looks exactly like an outage.

Finally, monitor the reuse events as a product signal, not just a security one. After we shipped rotation, our first three TOKEN_REUSE alerts were not attackers — they were a partner integration running two replicas against one credential. The protocol caught a misconfiguration that would otherwise have surfaced as intermittent, unexplainable 401s in their logs.
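Turning those alerts into a signal can start as simply as grouping the `alertReuse()` lines by client. A sketch, assuming the error log is readable as text lines in exactly the format `alertReuse()` writes (hypothetical `reuse_by_client` helper). Repeated reuse events from one client_id across several families look more like replicas sharing a credential than like theft; that reading is a heuristic, not a rule.

```python
import re
from collections import Counter

# Matches the sprintf format used by RefreshEndpoint::alertReuse().
REUSE_LINE = re.compile(
    r"TOKEN_REUSE family=(?P<family>\S+) client=(?P<client>\S+) "
    r"user=(?P<user>-?\d+) region=(?P<region>\S+)"
)


def reuse_by_client(lines):
    """Count TOKEN_REUSE events per client_id, ignoring unrelated log lines."""
    counts = Counter()
    for line in lines:
        match = REUSE_LINE.search(line)
        if match:
            counts[match.group("client")] += 1
    return counts
```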

Conclusion

Refresh-token rotation is one of the rare security upgrades where the hard part is not cryptography but state and concurrency: a single-use chain with family revocation, a compare-and-swap on the status transition, a grace window for retried requests, and clients disciplined enough to single-flight their refreshes and persist before forgetting. None of it needs heavyweight infrastructure — the implementation above runs in production on shared hosting with SQLite in WAL mode behind a video discovery API serving eight regions. The payoff showed up in week one: stolen-token replay went from invisible to a logged, family-revoking, alert-firing event. If your API still hands out 30-day static refresh tokens, the schema and two PHP classes above are genuinely an afternoon of work — and the first reuse alert they fire will justify it.
