Ravi Gupta

Posted on Jun 29

A Deactivated Admin Could Still Use Their Token. That's When Dual-Mode JWT Stopped Being About Speed.

#python #security #backend #authorization

What building cross-service RBAC taught me about the difference between a fast check and a correct one

VaultPay is a wallet microservice I built on top of AuthShield.
Previous parts:
Part 1 is here: I Built AuthShield and Immediately Knew It Wasn't Enough
Part 2 is here: The Silent Failure I Never Saw Coming: What VaultPay Taught Me About Consistency Under Failure
Part 3 is here: I Started With a Blocklist. That Was the Wrong Instinct and VaultPay Taught Me Why.
Part 4 is here: I Watched Money Move Twice From the Same Request. That's When I Understood Idempotency.
Part 5 is here: I Almost Hashed a Document Number That Needed to Be Read Again

When I designed JWT validation for VaultPay, the only thing I was optimising for was speed.

Local verification, no network call, decode the token with the shared secret, read the claims, move on. Every request gets this. It's fast - no round trip to AuthShield, no added latency on the hot path. That felt like the obvious right answer for a system processing financial transactions, where every millisecond on the request path matters.

Then I asked myself a question I hadn't thought through properly: what happens if an admin gets deactivated in AuthShield right now, this second, while they still have a valid token sitting in their browser?

The answer, with pure local validation, is uncomfortable. Nothing happens. The token is still cryptographically valid. The signature checks out. The claims say role: admin. VaultPay has no way of knowing that AuthShield revoked this person's access thirty seconds ago, because VaultPay never asked AuthShield. It just trusted the token.

That's the moment dual-mode validation stopped being a performance optimisation and became a correctness requirement.

Two Services, No Shared Database

VaultPay and AuthShield are separate microservices with separate databases. AuthShield owns user accounts, login, JWT issuance, and role management. VaultPay owns wallets, transactions, KYC, and admin operations on top of that.

They don't share a database, which means VaultPay can never run a query against AuthShield's user table to check "is this role still valid." The only way VaultPay knows anything about a user's identity is through what's encoded in the JWT, or through an explicit HTTP call to AuthShield.

This is a deliberate boundary, and it's the right one - coupling two services through a shared database creates exactly the kind of tangled dependency that makes both services harder to change independently. But the boundary has a cost: VaultPay's view of "who is this user" is only as fresh as the last token issuance, unless it actively asks for an update.

Client logs in → AuthShield issues JWT (HS256)
Client sends JWT to VaultPay on every request
VaultPay decodes JWT using shared JWT_SECRET_KEY
            ↓
    No database call to AuthShield
    No way to know if anything changed
    since the token was issued

That gap between "what the token says" and "what's actually true right now" is the entire problem this post is about.
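To make that gap concrete, here is a minimal sketch of HS256 signing and local verification using only the Python standard library. It is not VaultPay's code: the secret, the `authshield_db` dict, and the helper names are all assumptions for illustration. It shows that a token keeps verifying after the issuer's own record has changed.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-shared-secret"  # hypothetical stand-in for JWT_SECRET_KEY


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes = SECRET) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_locally(token: str, secret: bytes = SECRET, now: Optional[float] = None) -> dict:
    """Check signature and expiry only. No network call, no issuer lookup."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


# The issuer's live record, which the verifying service cannot see.
authshield_db = {"u-1": {"is_active": True, "roles": ["admin"]}}

token = issue_token(
    {"user_id": "u-1", "roles": ["admin"], "is_active": True, "exp": int(time.time()) + 900}
)

# The admin is deactivated at the source of truth...
authshield_db["u-1"]["is_active"] = False

# ...but local verification still succeeds and still reports the old claims.
claims = verify_locally(token)
```

After the last line, `claims` still says `roles: ["admin"]` and `is_active: True`, while `authshield_db` says the opposite. Nothing in the token can reflect a change made after it was signed.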

The Fast Path

For most endpoints, VaultPay validates the JWT entirely locally.

async def get_current_user(token: str = Depends(oauth2_scheme)) -> UserContext:
    try:
        payload = jwt.decode(token, settings.JWT_SECRET_KEY, algorithms=["HS256"])
    except jwt.ExpiredSignatureError:
        raise TokenExpiredError()
    except jwt.InvalidTokenError:
        raise TokenInvalidError()

    # Check token hasn't been explicitly revoked (logout, etc.)
    if await redis.exists(f"revoked:{payload['jti']}"):
        raise TokenRevokedError()

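    # is_active is a claim frozen at issuance: it only catches accounts that
    # were already disabled when the token was minted, not later deactivations.
    if not payload.get("is_active", True):
        raise AccountDisabledError()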

    return UserContext(
        user_id=payload["user_id"],
        email=payload["email"],
        roles=payload["roles"],
    )

No network call to AuthShield. The decode is local, and the revocation check is a Redis lookup rather than a database query. This path handles sending money, viewing wallet balance, checking transaction history - the overwhelming majority of traffic.

The tradeoff is explicit and known: a user deactivated in AuthShield can keep getting requests through until their token naturally expires or its jti is added to the revocation list. For most operations, that window is acceptable. A user viewing their own balance a few extra times before their access catches up to reality isn't a meaningful risk.
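How long that window lasts depends on whether a revocation entry is actually written when the account is deactivated. The sketch below models this with an in-memory set standing in for the Redis `revoked:{jti}` keys. The 15-minute token lifetime, the timestamps, and the `deactivate` helper are assumptions for illustration, not details of VaultPay or AuthShield.

```python
# Hypothetical numbers: a 15-minute token issued at t=1000 (seconds).
ISSUED_AT = 1000.0
TOKEN_TTL = 900.0
DEACTIVATED_AT = ISSUED_AT + 30.0

claims = {"jti": "tok-1", "exp": ISSUED_AT + TOKEN_TTL, "is_active": True}

revoked_jtis: set = set()  # in-memory stand-in for the Redis "revoked:{jti}" keys


def fast_path_accepts(claims: dict, now: float) -> bool:
    """Mirror the fast path's checks, minus signature verification."""
    if now >= claims["exp"]:
        return False  # token expired
    if claims["jti"] in revoked_jtis:
        return False  # explicitly revoked
    return bool(claims.get("is_active", True))  # claim frozen at issuance


def deactivate(jti: str, write_revocation: bool) -> None:
    """Model a deactivation. The issuer's DB update is invisible to the
    verifying service; only the revocation entry reaches the fast path."""
    if write_revocation:
        revoked_jtis.add(jti)


# Case 1: the account is deactivated but the revocation write is lost.
deactivate("tok-1", write_revocation=False)
accepted_without_revocation = fast_path_accepts(claims, now=DEACTIVATED_AT + 1)
stale_window = claims["exp"] - DEACTIVATED_AT  # rest of the token's lifetime

# Case 2: the revocation entry is written as part of the deactivation.
deactivate("tok-1", write_revocation=True)
accepted_with_revocation = fast_path_accepts(claims, now=DEACTIVATED_AT + 1)
```

With these numbers the token is still accepted in case 1 and the stale window is 870 seconds, the entire remaining lifetime. In case 2 the very next request is rejected. The fast path is only as fresh as that one write.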

But "most operations" isn't all operations.

The Strict Path

For high-stakes actions - a super admin promoting another user's role, deactivating an account, anything that changes the permission structure of the system itself - local validation isn't enough. VaultPay makes an HTTP call to AuthShield's /users/me endpoint to verify the token is still live against AuthShield's actual current state.

async def get_current_user_strict(
    token: str = Depends(oauth2_scheme),
    authshield: AuthShieldClient = Depends(get_authshield_client),
) -> UserContext:
    # First, validate locally as normal — catches expired/malformed tokens fast
    local_user = await get_current_user(token)

    # Then verify against AuthShield's live state
    try:
        live_user = await authshield.get_user(local_user.user_id, token)
    except AuthShieldUnavailableError:
        raise ServiceUnavailableError("Cannot verify session — AuthShield unreachable")

    if not live_user.is_active:
        raise AccountDisabledError()

    # Compare as sets so a different ordering of the same roles is not a mismatch
    if set(live_user.roles) != set(local_user.roles):
        raise RoleMismatchError("Role has changed since token issuance")

    return UserContext(
        user_id=live_user.user_id,
        email=live_user.email,
        roles=live_user.roles,
    )

This costs roughly 50ms in round-trip latency. For a super admin operation that happens rarely and carries real consequences if it executes on stale permissions, that 50ms is a fair trade. For sending money - something a user does dozens of times - it would be a needless tax on every request for a risk that doesn't justify it.

The decision of which path an endpoint uses isn't about how "important" the endpoint sounds. It's about how much damage a stale permission check could do, and how quickly that staleness needs to be caught.

Why This Is a Correctness Problem, Not Just a Performance One

It's tempting to frame dual-mode validation as "fast version for common stuff, slow-but-thorough version for sensitive stuff" - a pure performance tradeoff. That framing is incomplete.

The real distinction is between operations where eventual consistency is acceptable and operations where it isn't.

A user checking their balance with a slightly stale token isn't a correctness violation - their balance is still accurate, their identity hasn't changed, nothing about the operation depends on knowing about a change that happened in the last few seconds. The local check is not just fast, it's also correct for what the operation needs.

A super admin promoting a role, or an admin deactivating an account, is different. If that admin's own access was revoked sixty seconds ago, executing their request anyway isn't a performance compromise - it's the system doing the wrong thing. The local check would return a result, but the result would be wrong, because the world has changed since the token was issued and the operation cares about that change.

That's the reframe that mattered to me. Dual-mode validation isn't "give sensitive operations extra care because they're sensitive." It's "ask whether eventual consistency is acceptable for this specific operation, and only pay for strong consistency where it's actually required."
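One way to keep that question explicit is to record, per operation, whether it depends on state that may have changed after the token was issued, and derive the validation mode from that answer alone. This is a sketch under assumptions: the `Operation` type and the operation names are hypothetical, not VaultPay's actual routing table.

```python
from dataclasses import dataclass
from enum import Enum


class Validation(Enum):
    LOCAL = "local"    # decode + revocation lookup, eventual consistency
    STRICT = "strict"  # also ask AuthShield, strong consistency


@dataclass(frozen=True)
class Operation:
    name: str
    # The only question that matters: would a permission change made after
    # token issuance alter whether this operation should be allowed to run?
    needs_post_issuance_state: bool


def validation_for(op: Operation) -> Validation:
    return Validation.STRICT if op.needs_post_issuance_state else Validation.LOCAL


# Hypothetical operation names, classified by the question above.
OPERATIONS = [
    Operation("view_balance", needs_post_issuance_state=False),
    Operation("send_money", needs_post_issuance_state=False),
    Operation("promote_role", needs_post_issuance_state=True),
    Operation("deactivate_account", needs_post_issuance_state=True),
]

modes = {op.name: validation_for(op).value for op in OPERATIONS}
```

Note that the classification has no "sensitivity" field. Sending money is high-stakes but stays on the local path, because the decision rests on staleness tolerance, not on how important the endpoint sounds.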

The 4-Tier Role Hierarchy

RBAC in VaultPay is a strict superset hierarchy - four roles, each one inheriting everything below it.

super_admin
    ↑
  admin           (can call AuthShield admin APIs)
    ↑
moderator         (can review KYC, freeze wallets)
    ↑
  user            (base — create wallet, transact)

A user can create a wallet, send money, set a PIN, submit KYC. A moderator can do all of that plus review KYC submissions and freeze or unfreeze any wallet. An admin adds the ability to edit transaction limits and deactivate users through AuthShield. A super_admin can create new admin accounts and access system-wide stats.
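The "each role inherits everything below it" rule can be sketched as a simple rank comparison. The helper names here are assumptions for illustration. VaultPay's own endpoints list the allowed roles explicitly rather than computing them.

```python
from typing import Iterable, Set

# Lowest to highest privilege, matching the four tiers described above.
ROLE_ORDER = ["user", "moderator", "admin", "super_admin"]
RANK = {role: index for index, role in enumerate(ROLE_ORDER)}


def effective_roles(role: str) -> Set[str]:
    """Return the role plus every role it inherits from below."""
    return set(ROLE_ORDER[: RANK[role] + 1])


def has_at_least(user_roles: Iterable[str], required: str) -> bool:
    """True if any of the user's roles ranks at or above the required one.
    Unknown role names get rank -1 and therefore never grant access."""
    return any(RANK.get(role, -1) >= RANK[required] for role in user_roles)


moderator_can_act_as = effective_roles("moderator")
admin_can_review_kyc = has_at_least(["admin"], "moderator")
user_can_review_kyc = has_at_least(["user"], "moderator")
```

Treating unknown role names as rank -1 means a typo or an unexpected claim value fails closed instead of raising or granting access.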

The roles live in the JWT claims, set by AuthShield at issuance. VaultPay never maintains its own copy of "what role does this user have" - it trusts the claim on the fast path, and re-verifies it against AuthShield on the strict path.

Enforcement is a FastAPI dependency, not scattered conditional checks:

@router.get("/admin/kyc")
async def list_kyc_submissions(
    kyc_service: KYCService = Depends(get_kyc_service),
    user: UserContext = Depends(require_roles(["moderator", "admin", "super_admin"])),
):
    return await kyc_service.list_pending(user)

require_roles() decodes the JWT, checks the user's roles against the allowed list, and raises a 403 if none of them match. Every privileged endpoint declares its required roles in the function signature. There's no separate place where role logic could drift out of sync with what the endpoint actually does - the check is part of the route definition itself.
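The post doesn't show require_roles() itself, so the following is only a guess at its shape: a factory that closes over the allowed roles and returns a checker. To stay runnable without FastAPI, the checker takes an already-decoded `UserContext` as a plain argument. In the real service it would presumably receive it through `Depends(get_current_user)`. `ForbiddenError` is a hypothetical stand-in for an HTTP 403.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


class ForbiddenError(Exception):
    """Hypothetical stand-in for an HTTP 403 response."""
    status_code = 403


@dataclass(frozen=True)
class UserContext:
    user_id: str
    email: str
    roles: Tuple[str, ...]


def require_roles(allowed: Iterable[str]) -> Callable[[UserContext], UserContext]:
    """Build a checker that passes the user through only if they hold
    at least one of the allowed roles."""
    allowed_set = frozenset(allowed)

    def checker(user: UserContext) -> UserContext:
        if allowed_set.isdisjoint(user.roles):
            raise ForbiddenError(f"requires one of {sorted(allowed_set)}")
        return user

    return checker


kyc_reviewer = require_roles(["moderator", "admin", "super_admin"])

moderator = UserContext("u-2", "mod@example.com", ("moderator",))
plain_user = UserContext("u-3", "user@example.com", ("user",))

allowed_user = kyc_reviewer(moderator)  # returns the same UserContext

try:
    kyc_reviewer(plain_user)
    plain_user_status = 200
except ForbiddenError as exc:
    plain_user_status = exc.status_code
```

Because the factory returns a new checker per allowed-role list, each route carries its own requirement in its signature, which is the property the paragraph above describes.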

What "No Shared Database" Actually Costs

The architecture deliberately keeps VaultPay from touching AuthShield's database directly. The only sanctioned paths are the JWT (issued once, trusted until expiry) and explicit HTTP calls through AuthShieldClient for anything that needs live state - getting a user's current details, deactivating an account, promoting a role.

class AuthShieldClient:
    base_url: str
    timeout: float = 10.0  # seconds

    async def get_user(self, user_id: UUID, token: str) -> AuthShieldUser: ...
    async def deactivate_user(self, user_id: UUID, admin_token: str) -> None: ...
    async def promote_user_role(self, user_id: UUID, role: str, admin_token: str) -> None: ...

This client raises AuthShieldUnavailableError on connection failures, timeouts, or non-2xx responses - and that error has to be handled deliberately wherever the strict path is used. If AuthShield is down, VaultPay can't verify live state for sensitive operations, and the honest response is to fail that specific request rather than silently falling back to the (potentially stale) local check. Falling back would defeat the entire reason the strict path exists.
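The fail-closed behaviour can be sketched with a fake client. Everything here is an assumption for illustration: `FakeAuthShieldClient`, the dict-shaped user record, and the `outcome` helper are not part of the real AuthShieldClient. The point is only that an unreachable AuthShield produces an error for that request, never a fallback to the local claims.

```python
import asyncio


class AuthShieldUnavailableError(Exception):
    """Raised by the client on connection failures or timeouts."""


class ServiceUnavailableError(Exception):
    """What the strict path surfaces to the caller (e.g. an HTTP 503)."""


class AccountDisabledError(Exception):
    """The live record says the account is no longer active."""


class FakeAuthShieldClient:
    """Hypothetical in-memory stand-in for the real HTTP client."""

    def __init__(self, users: dict, reachable: bool = True) -> None:
        self.users = users
        self.reachable = reachable

    async def get_user(self, user_id: str) -> dict:
        if not self.reachable:
            raise AuthShieldUnavailableError("connection refused")
        return self.users[user_id]


async def verify_live(client: FakeAuthShieldClient, user_id: str) -> dict:
    try:
        live_user = await client.get_user(user_id)
    except AuthShieldUnavailableError as exc:
        # Fail closed: no fallback to the (possibly stale) token claims.
        raise ServiceUnavailableError("cannot verify session") from exc
    if not live_user["is_active"]:
        raise AccountDisabledError()
    return live_user


def outcome(client: FakeAuthShieldClient, user_id: str) -> str:
    """Run the check and report which way it went."""
    try:
        asyncio.run(verify_live(client, user_id))
    except (ServiceUnavailableError, AccountDisabledError) as exc:
        return type(exc).__name__
    return "ok"


users = {
    "u-1": {"is_active": True, "roles": ["admin"]},
    "u-2": {"is_active": False, "roles": ["admin"]},
}

healthy = outcome(FakeAuthShieldClient(users), "u-1")
disabled = outcome(FakeAuthShieldClient(users), "u-2")
outage = outcome(FakeAuthShieldClient(users, reachable=False), "u-1")
```

During the simulated outage even the still-active admin is refused. That is the intended cost: a sensitive operation that cannot be verified does not run.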

This is the real cost of the no-shared-database boundary: every cross-service correctness guarantee has an explicit failure mode you have to design for, instead of a database transaction quietly handling it for you. The isolation that makes the two services independently deployable and independently scalable is the same isolation that makes "is this still true right now" an actual question instead of a given.

What I Actually Learned

I went into JWT validation thinking about it as one problem with one answer - decode the token, trust the claims, move fast. The deactivated-admin scenario is what split that into two different problems wearing the same costume.

Fast and slow validation aren't a spectrum from "less thorough" to "more thorough." They're answers to a more specific question: does this operation need to know about changes that happened after the token was issued? If no, local validation is not a compromise - it's the correct amount of checking. If yes, nothing short of asking the source of truth directly will actually be correct, and the latency cost is the price of that correctness, not a tax you're paying out of caution.

The 4-tier role hierarchy and the no-shared-database boundary both depend on getting this distinction right. A role system is only as trustworthy as the freshness of the check behind it. Building VaultPay's auth layer made it clear that "how fast should this check be" and "how correct does this check need to be" are two separate questions - and conflating them is how you end up with a deactivated admin who can still act like one.

This is the last post in the technical series for now. Auth secures identity. Consistency protects money in motion. Trust systems protect against unknown access. Idempotency protects against duplicate intent. Encryption and audit logs protect data people are trusting you with. And RBAC across two services protects the boundary between who someone was when they logged in and who they're allowed to be right now.

Every one of those is a different shape of the same underlying question: what does it actually mean for a system to be correct, not just functional.

Engineering docs + code samples: Vaultpay-Engineering

Top comments (1)

Pon • Jun 29

I'm taking the eventual-vs-strong framing with me -- that's the right axis, not sensitive-vs-not. One thing I'd poke at on the fast path: is_active in the JWT is frozen at issuance, so a deactivated admin still reads is_active=True there until the token expires. The Redis revocation check is what closes that window, which means the real load-bearing step is that deactivating a user in AuthShield writes the revocation entry synchronously, not just flips the DB column. If the table update lands but the revoke push lags or fails, your acceptable window stretches from seconds to whole-token-lifetime, and nothing in VaultPay can tell the difference. The strict path is immune because it asks the source; the fast path is only ever as fresh as that one write.