
Blue Hills

Originally published at jwtshield.com

Three JWT bugs that ship to prod silently — and the 5-line CI test that catches them

Your auth tests pass. Your token verification works. Then your identity provider rotates a key at 02:47, your service hasn't refreshed its JWKS cache for 12 hours, and 8 minutes of production traffic hits 401.

Or worse: the rotation goes smoothly and your cache picks up the new keys, but a service you haven't touched in six months is still pinning the old kid. Now half your fleet validates and half rejects, your error budget bleeds, and the only signal in your dashboard is "auth failures up."

This is the silent-bug class. Your unit tests don't cover it because the tokens you generate in tests don't drift. Your integration tests don't cover it because mocked issuers are eternal. Snyk doesn't catch it because it's not a vulnerability in your code — it's a configuration that goes stale between your last deploy and the moment it matters.

We built jwtshield to catch three concrete failure modes that take down OIDC in production, using a five-line GitHub Actions step. Each bug below is a real incident class with a reproduction and a check you can run in CI.

Bug 1: JWKS rotation without overlap

Your identity provider publishes signing keys at https://login.example.com/.well-known/jwks.json. Your service caches that JWKS for some interval (10 minutes? An hour? Whatever your library defaults to). Tokens are signed by the current private key; verification uses the matching public key from the cache.

The provider rotates keys. Best practice is to publish the new key 24-48 hours before issuing tokens with it, so caches everywhere have time to pick it up. This is "overlap." Without it, the moment the provider switches signing keys, every cached JWKS in the world is stale until it refreshes.

Most identity providers do overlap correctly. Some don't, and some teams misconfigure their own internal IdPs. The result is a window, lasting until each verifier next refreshes its cache, in which new tokens reference a kid that no verifier has seen yet.

Reproduction. Spin up a JWKS server. Sign a token with key A. Verify it. Rotate the JWKS endpoint to key B with no overlap. Sign a new token with key B. Try to verify with the cached JWKS. You'll see one of two failures:

ERR: kid 'b1' not found in JWKS
ERR: signature verification failed
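The reproduction can be simulated without a real JWKS server. The sketch below is a toy Python model, not a real verifier: the JwksCache class, its fetch callback, the 600-second TTL, and the string "keys" are all hypothetical, and signature checking is left out so that only the kid lookup is exercised.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Toy JWKS cache keyed by kid. Hypothetical; keys are plain strings, no crypto."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], ttl_seconds: float) -> None:
        self._fetch = fetch  # stands in for an HTTP GET of the JWKS URI
        self._ttl = ttl_seconds
        self._keys: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None

    def get_key(self, kid: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        # Refetch only when the cache is empty or older than the TTL.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._keys = self._fetch()
            self._fetched_at = now
        if kid not in self._keys:
            raise LookupError(f"kid '{kid}' not found in JWKS")
        return self._keys[kid]


# The "provider" publishes key a1, then rotates to b1 with no overlap.
provider = {"jwks": {"a1": "public-key-A"}}
cache = JwksCache(fetch=lambda: dict(provider["jwks"]), ttl_seconds=600)

cache.get_key("a1", now=0)                  # first lookup populates the cache
provider["jwks"] = {"b1": "public-key-B"}   # rotation: a1 removed, b1 added

try:
    cache.get_key("b1", now=60)             # cache is 60s old and still holds only a1
    rotation_error = None
except LookupError as exc:
    rotation_error = str(exc)

print(rotation_error)  # kid 'b1' not found in JWKS
key_after_refresh = cache.get_key("b1", now=600)  # TTL elapsed, so the cache refetches
```

Between the rotation and the next refresh, every token signed with b1 is rejected. With overlap, b1 would already be in the cache when the provider starts signing with it.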

The check. jwtshield's /v1/validate/jwks-rotation accepts a previous JWKS, a current JWKS, an optional sample token, and an optional overlap policy. It returns one of no_change | safe_overlap | overlap | disjoint. A disjoint result means no key from the previous set is in the current set, which is exactly this failure mode.

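# Requires jq. previous_jwks.json and current_jwks.json are example file names:
# the last-known-good JWKS and a freshly fetched one.
curl -X POST https://api.jwtshield.com/v1/validate/jwks-rotation \
  -H "Authorization: Bearer $JWTSHIELD_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --slurpfile prev previous_jwks.json \
        --slurpfile curr current_jwks.json \
        '{previous_jwks: $prev[0], current_jwks: $curr[0],
          overlap_policy: {min_overlap_count: 1}}')"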

If you run this on every deploy of the service that owns the issuer config, you catch the rotation gap before it ships.
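For intuition, the disjoint case can be approximated locally by comparing kid values between two JWKS documents. This is a rough Python sketch, not jwtshield's implementation: the function names are made up, and matching keys by kid alone is a simplifying assumption (it would miss a key republished under the same kid with different material).

```python
from typing import Any, Dict, Set


def kids(jwks: Dict[str, Any]) -> Set[str]:
    """Collect the kid of every key in a JWKS document ({"keys": [...]})."""
    return {key["kid"] for key in jwks.get("keys", []) if "kid" in key}


def shared_kids(previous: Dict[str, Any], current: Dict[str, Any]) -> Set[str]:
    """Keys that survive the rotation, identified by kid only (an assumption)."""
    return kids(previous) & kids(current)


def is_disjoint(previous: Dict[str, Any], current: Dict[str, Any]) -> bool:
    """True when no key from the previous set appears in the current set."""
    return not shared_kids(previous, current)


previous = {"keys": [{"kty": "RSA", "kid": "a1"}]}
with_overlap = {"keys": [{"kty": "RSA", "kid": "a1"}, {"kty": "RSA", "kid": "b1"}]}
no_overlap = {"keys": [{"kty": "RSA", "kid": "b1"}]}

print(is_disjoint(previous, with_overlap))  # False
print(is_disjoint(previous, no_overlap))    # True
```

How the service tells safe_overlap apart from overlap is not covered here, so the sketch only models the disjoint result.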

Bug 2: Wrong audience claim

The aud claim in a JWT names the service the token is intended for. A token issued for api://billing should not authenticate against api://reporting. This is the audience check, and it is the difference between "we have auth" and "we have authorization."

The bug: a service accepts any well-signed token from a trusted issuer, regardless of aud. A user signs in to billing and receives a token issued for billing, then replays that token against reporting, and reporting hands back the user's data. The signature is valid. The expiry is fresh. The issuer is on the allowlist. The only thing wrong is that this token was never meant for this service.

This is a configuration bug. The verifier on reporting was set up six quarters ago by an engineer who has since left, and it doesn't pin the audience. New endpoints get added; the audience check stays missing.

Reproduction. Configure a verifier with an expected issuer but no expected audience. Issue a token from your IdP for service A. Send it to service B. If B never checks aud, the token is accepted.
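The missing check is easy to see on already-decoded claims. The following is a minimal Python sketch under stated assumptions: check_claims is a hypothetical helper, signature and expiry checks are deliberately omitted, and ISSUER_MISMATCH is an invented code name (only AUDIENCE_MISMATCH appears in jwtshield's examples). The one standard detail is that aud may be a single string or a list of strings.

```python
from typing import Any, Dict, List, Optional


def check_claims(claims: Dict[str, Any], issuer: str,
                 audience: Optional[str] = None) -> List[str]:
    """Return failure codes for decoded claims. Signature is NOT checked here."""
    failures = []
    if claims.get("iss") != issuer:
        failures.append("ISSUER_MISMATCH")  # hypothetical code name
    if audience is not None:
        aud = claims.get("aud", [])
        aud_list = [aud] if isinstance(aud, str) else list(aud)
        if audience not in aud_list:
            failures.append("AUDIENCE_MISMATCH")
    return failures


billing_token = {"iss": "https://login.example.com", "aud": "api://billing", "sub": "user-1"}

# Reporting's verifier as misconfigured: issuer pinned, audience not.
unpinned = check_claims(billing_token, issuer="https://login.example.com")
# The same verifier with the audience pinned.
pinned = check_claims(billing_token, issuer="https://login.example.com",
                      audience="api://reporting")

print(unpinned)  # []
print(pinned)    # ['AUDIENCE_MISMATCH']
```

An empty list means the token is accepted, which is the bug: the billing token passes on reporting until the audience is pinned.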

The check. jwtshield's /v1/test/auth-regression accepts a list of entries, each with a token, a policy, and the expected_failure_codes, and runs each token against its policy. Add one entry per service:

- token: <token issued for api://reporting>
  policy:
    issuer: https://login.example.com
    audiences: [api://billing]
    allowed_algs: [RS256]
  expected_failure_codes: [AUDIENCE_MISMATCH]

The suite passes only if the token correctly fails with AUDIENCE_MISMATCH. If the policy quietly accepts it, the suite fails the PR. The audience configuration drift becomes visible the moment it's introduced.
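The inverted expectation (an entry passes only when the token fails) can be sketched as follows. This Python model is illustrative and not jwtshield's code: run_suite, strict, and lax are hypothetical names, tokens are represented by their decoded claims, and requiring an exact match on the failure codes is an assumption.

```python
from typing import Any, Callable, Dict, List

Entry = Dict[str, Any]
Validator = Callable[[Dict[str, Any], Dict[str, Any]], List[str]]


def run_suite(entries: List[Entry], validate: Validator) -> List[bool]:
    """An entry passes only when validation fails with exactly the expected codes."""
    return [
        sorted(validate(e["claims"], e["policy"])) == sorted(e["expected_failure_codes"])
        for e in entries
    ]


def strict(claims: Dict[str, Any], policy: Dict[str, Any]) -> List[str]:
    """Verifier that enforces the audience list in the policy."""
    aud = claims.get("aud", [])
    aud_list = [aud] if isinstance(aud, str) else list(aud)
    accepted = any(a in policy["audiences"] for a in aud_list)
    return [] if accepted else ["AUDIENCE_MISMATCH"]


def lax(claims: Dict[str, Any], policy: Dict[str, Any]) -> List[str]:
    """Verifier with the bug: the audience is never inspected."""
    return []


entries = [{
    "claims": {"iss": "https://login.example.com", "aud": "api://reporting"},
    "policy": {"audiences": ["api://billing"]},
    "expected_failure_codes": ["AUDIENCE_MISMATCH"],
}]

print(run_suite(entries, strict))  # [True]
print(run_suite(entries, lax))     # [False]
```

The lax verifier accepts the token, so its entry reports False and a CI gate built on this result would fail the build.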

Bug 3: Issuer config drift (the OIDC discovery doc lies)

Every OIDC provider exposes a discovery document at /.well-known/openid-configuration. It lists the issuer URL, JWKS URI, supported algorithms, and the endpoints clients need. Your service reads it once at startup, caches the values, and verifies tokens against the cached config.

The provider updates the discovery doc. The cached config is now stale. The most common drift modes:

  • The issuer changes hostnames (acquisition, rebrand, region split). Tokens carry iss: https://new.example.com, your verifier expects https://old.example.com, validation fails.
  • The supported algorithms change. The provider deprecates RS256 in favor of ES256. Your verifier accepts both, so tokens still validate, but the policy you intended to enforce is now wrong.
  • The JWKS URI moves. Your cached JWKS goes stale because the polling URL no longer returns keys.

Reproduction. Set up a verifier that caches the discovery doc on first call. Update the discovery doc on the provider side. Wait for the next token request. Validation passes against stale config until something visible breaks.

The check. jwtshield's /v1/lint/oidc-config takes the issuer URL, expected audiences, allowed algorithms, JWKS URI, and redirect URIs. It fetches the live discovery doc, fetches the live JWKS, and emits structured findings:

{
  "valid": false,
  "findings": [
    {
      "code": "JWKS_URI_MISMATCH",
      "severity": "high",
      "message": "Configured JWKS URI does not match discovery document",
      "evidence": {
        "configured": "https://login.example.com/.well-known/jwks.json",
        "discovered": "https://login.example.com/oauth/jwks"
      },
      "remediation": "Update your verifier configuration to use the discovered URI..."
    }
  ]
}

Run it nightly against your prod issuer. The first time the discovery doc moves, you find out before your customers do.
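A local approximation of this comparison, given a discovery document that has already been fetched, might look like the Python sketch below. It is an assumption-laden illustration rather than jwtshield's logic: lint_config and the ISSUER_MISMATCH and ALG_NOT_ADVERTISED codes are invented here, as are the severities. The discovery field names (issuer, jwks_uri, id_token_signing_alg_values_supported) are the standard OIDC provider metadata names.

```python
from typing import Any, Dict, List


def _finding(code: str, severity: str, configured: Any, discovered: Any) -> Dict[str, Any]:
    return {"code": code, "severity": severity,
            "evidence": {"configured": configured, "discovered": discovered}}


def lint_config(configured: Dict[str, Any], discovered: Dict[str, Any]) -> Dict[str, Any]:
    """Compare a verifier's pinned config with a fetched discovery document."""
    findings: List[Dict[str, Any]] = []
    # ISSUER_MISMATCH is a hypothetical code; JWKS_URI_MISMATCH comes from the example output.
    for field, code in (("issuer", "ISSUER_MISMATCH"), ("jwks_uri", "JWKS_URI_MISMATCH")):
        if configured[field] != discovered.get(field):
            findings.append(_finding(code, "high", configured[field], discovered.get(field)))
    advertised = discovered.get("id_token_signing_alg_values_supported", [])
    missing = sorted(set(configured["allowed_algs"]) - set(advertised))
    if missing:
        # Hypothetical code: an allowed algorithm the provider no longer advertises.
        findings.append(_finding("ALG_NOT_ADVERTISED", "medium", missing, advertised))
    return {"valid": not findings, "findings": findings}


configured = {
    "issuer": "https://login.example.com",
    "jwks_uri": "https://login.example.com/.well-known/jwks.json",
    "allowed_algs": ["RS256"],
}
discovered = {
    "issuer": "https://login.example.com",
    "jwks_uri": "https://login.example.com/oauth/jwks",
    "id_token_signing_alg_values_supported": ["RS256", "ES256"],
}

report = lint_config(configured, discovered)
print(report["valid"])                           # False
print([f["code"] for f in report["findings"]])   # ['JWKS_URI_MISMATCH']
```

Fetching the live document and JWKS is left out so the sketch stays offline; in practice the discovered dict would come from the issuer's /.well-known/openid-configuration.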

The fix: five lines of CI

All three checks ship in jwtshield-ci, our GitHub Actions wrapper. Add this to any workflow that touches your auth path:

- uses: redbullhorns/jwtshield-ci@v1
  with:
    issuer: https://login.example.com
    audience: api://backend
    fail-on-severity: high

The Action calls jwtshield's regression suite with your policy, prints a structured status table, and fails the build on any high-severity finding. It runs in roughly 800ms. It costs nothing on the free tier (200 verifies/month).

We send synthetic test tokens, never your production tokens. Tokens are validated in memory and discarded — zero retention. Your audit trail lives at https://jwtshield.com/runs/<id> if you want compliance evidence.

The full status table on a passing run:

◼ jwtshield-ci v1.0.0 ─────────────────────────────────
  signature:        ✓ PASS
  issuer:           ✓ PASS
  audience:         ✓ PASS
  algorithm:        ✓ PASS
  time:             ✓ PASS
  required_claims:  ✓ PASS
  ─────────────────────────────────────────────────────
  6/6 checks passed · 0 findings
  evidence: https://jwtshield.com/runs/abc123def456

Try it

Free tier: 200 verifies per month, all algorithms, community support. No credit card.

# 1. Get a key
open https://jwtshield.com/signup

# 2. Run the rotation classifier locally
curl -X POST https://api.jwtshield.com/v1/validate/jwks-rotation \
  -H "Authorization: Bearer $JWTSHIELD_API_KEY" \
  -H "Content-Type: application/json" \
  -d @rotation.json

# 3. Add the Action to your CI
# .github/workflows/auth.yml
- uses: redbullhorns/jwtshield-ci@v1
  with:
    issuer: https://login.example.com
    audience: api://backend
    fail-on-severity: high

Pricing: $0 Starter (200 verifies, 1 issuer) → $49 Developer → $99 Startup → $199 Team → custom Enterprise. The Team tier covers 50,000 verifies a month, 25 issuers, full CI regression suite, and 30-day evidence retention.

If you've shipped a JWT validator in the last five years, you likely have at least one of these three bugs latent in production. The check is five lines.

