cypher682


Building an IAM Service with FastAPI: Refresh Token Families, TOTP MFA, and RBAC

AuthCore is a production-style Identity and Access Management API built with FastAPI, PostgreSQL, Redis, Celery, Alembic, Docker, GitHub Actions, and Trivy.

The goal was not another "JWT login" tutorial project. The goal was to build a security-sensitive backend service with enough real behavior to force decisions about token lifecycle, failure handling, audit discipline, and operational evidence.

This is a writeup of the implementation — what was built, what decisions were made, and where the edge cases live.


Architecture

Client
  │
  ▼
FastAPI app (Uvicorn)
  │── PostgreSQL  — users, sessions, token families, MFA configs, RBAC, audit logs
  │── Redis       — rate limit counters, failed-login counters, lockout state
  │── Celery      — background task foundation (email delivery)
  ▼
GitHub Actions CI
  ├── Ruff lint
  ├── Black format check
  ├── Docker Compose tests (with Postgres + Redis services)
  ├── Coverage gate (80%)
  ├── Docker image build
  └── Trivy critical vulnerability scan

Auth routes: POST /api/v1/auth/register, /auth/login, /auth/refresh
MFA routes: POST /api/v1/auth/mfa/setup, /mfa/verify, /mfa/disable, /mfa/challenge/verify
Session routes: GET /api/v1/users/me/sessions, DELETE /api/v1/users/me/sessions/{session_id}
Admin routes: RBAC management, audit log queries — superuser-only


The Auth Flow

Registration:

  1. Normalize email.
  2. Enforce the password policy: at least 12 characters, with lowercase, uppercase, number, and symbol required; common weak passwords and email-username substrings are rejected.
  3. If breach checking is enabled, check password against HaveIBeenPwned using k-anonymity (first 5 SHA-1 hash characters only — raw password never leaves the system).
  4. Hash with bcrypt.
  5. Create user.
  6. Create initial session.
  7. Issue access token + refresh token.
  8. Write audit event.

Login:

  1. Check Redis lockout state. If locked: 423 Locked.
  2. Verify password. On failure: increment Redis counter, write failure audit event, return 401.
  3. On success: clear failure counter.
  4. If MFA is disabled: issue tokens immediately.
  5. If MFA is enabled: return a short-lived MFA challenge token instead. Full tokens only after TOTP verification.

Refresh Token Rotation and Reuse Detection

Access tokens are short-lived. Refresh tokens live longer, which means they need stricter handling.

AuthCore tracks refresh tokens by token family. Each family has one valid token at a time — the latest issued one. The previous tokens in the family are invalidated on rotation.

The rotation flow:

  1. Client presents a refresh token.
  2. AuthCore verifies the token signature.
  3. AuthCore checks that the presented token hash matches the latest hash stored for that family.
  4. If it matches: issue a new access token and refresh token, store new hash.
  5. If it does not match — an older refresh token was reused:
    • Revoke the entire family.
    • Write an audit event.
    • Return 401.
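
The rotation steps above can be sketched in a few lines. This is a minimal in-memory illustration, not AuthCore's code: `FamilyStore`, `TokenFamily`, and `TokenReuseError` are hypothetical names, SHA-256 is an assumed choice for the stored hash, and signature verification (step 2) is left out to keep the focus on the family check.

```python
import hashlib
import secrets
from dataclasses import dataclass


class TokenReuseError(Exception):
    """Raised when the presented token is not the family's latest token."""


@dataclass
class TokenFamily:
    latest_hash: str
    revoked: bool = False


def hash_token(token: str) -> str:
    # Assumption: only a SHA-256 hash of the refresh token is stored.
    return hashlib.sha256(token.encode()).hexdigest()


class FamilyStore:
    """In-memory stand-in for the token family table (hypothetical)."""

    def __init__(self) -> None:
        self.families: dict[str, TokenFamily] = {}

    def issue(self, family_id: str) -> str:
        # Start a new family with a single valid token.
        token = secrets.token_urlsafe(32)
        self.families[family_id] = TokenFamily(latest_hash=hash_token(token))
        return token

    def rotate(self, family_id: str, presented: str) -> str:
        family = self.families.get(family_id)
        if family is None or family.revoked:
            raise TokenReuseError("unknown or revoked family")
        if not secrets.compare_digest(hash_token(presented), family.latest_hash):
            # An older token was replayed: revoke the whole family.
            family.revoked = True
            raise TokenReuseError("refresh token reuse detected")
        # Happy path: issue a new token and remember only its hash.
        new_token = secrets.token_urlsafe(32)
        family.latest_hash = hash_token(new_token)
        return new_token
```

Once a family is revoked, even its newest token stops working, so both the attacker and the legitimate client are forced back through login.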

The implementation detail that matters here: on failure paths that write an audit event and then raise HTTPException, the event is explicitly committed before raising. Otherwise the exception would cause the request's transaction to be rolled back, silently losing the audit record of suspicious behavior.

# Commit the audit event before raising — do not let the rollback erase evidence
await session.commit()
raise HTTPException(status_code=401, detail="Refresh token reuse detected")

MFA: Separating Password Verification from Full Authentication

When MFA is enabled, a successful password verification is not enough to issue tokens.

Instead, login returns a short-lived MFA challenge token — not a bearer token. The client must then call:

POST /api/v1/auth/mfa/challenge/verify
Body: { "challenge_token": "...", "totp_code": "123456" }

The challenge token is signed with a short TTL and cannot be used to call any other endpoint. Only after valid TOTP verification does AuthCore issue access and refresh tokens.

This separates "password is correct" from "authentication is complete." Users who have enabled MFA cannot bypass it by replaying a login request.
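
One way to keep a challenge token from working anywhere else is a purpose claim that only the challenge endpoint accepts. The sketch below uses a stdlib HMAC signature as a stand-in for whatever signed token format AuthCore actually uses; `SECRET`, `issue_challenge`, `verify_challenge`, and the `purpose` claim name are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"change-me"  # hypothetical signing key; load from config in practice


def _sign(body: bytes) -> bytes:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()


def issue_challenge(user_id: str, ttl_seconds: int = 300,
                    now: Optional[float] = None) -> str:
    issued = time.time() if now is None else now
    claims = {"sub": user_id, "purpose": "mfa_challenge", "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    return f"{body.decode()}.{_sign(body).decode()}"


def verify_challenge(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is a valid, unexpired MFA challenge."""
    body_text, _, signature = token.partition(".")
    body = body_text.encode()
    # Check the signature before trusting or decoding anything in the body.
    if not hmac.compare_digest(_sign(body), signature.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if claims.get("purpose") != "mfa_challenge" or claims["exp"] < current:
        return None
    return claims["sub"]
```

Endpoints that expect an access token would reject anything whose purpose is `mfa_challenge`, and the challenge endpoint rejects anything else.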

The MFA setup flow:

  1. POST /mfa/setup — creates a pending MFA config, returns the TOTP provisioning URI (for QR code display). Secret is masked in evidence screenshots.
  2. POST /mfa/verify — validates the first TOTP code, marks MFA enabled.
  3. POST /mfa/disable — requires a valid current TOTP code before disabling.
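
For readers who have not looked inside TOTP, here is a stdlib-only sketch of the RFC 6238 calculation that the verify and disable steps rely on. It is illustrative only: a production service would normally use a maintained TOTP library, the function names are hypothetical, and the secret is taken as raw bytes rather than the base32 string found in a provisioning URI.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30) -> str:
    # RFC 6238: the counter is the number of time steps since the Unix epoch.
    now = time.time() if at is None else at
    return hotp(secret, int(now // step))


def verify_totp(secret: bytes, code: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    # Accept codes from adjacent time steps to tolerate small clock drift.
    now = time.time() if at is None else at
    counter = int(now // step)
    candidates = [counter + d for d in range(-window, window + 1) if counter + d >= 0]
    return any(
        hmac.compare_digest(hotp(secret, c).encode(), code.encode())
        for c in candidates
    )
```

The `window` parameter is a design choice: a larger window is friendlier to devices with skewed clocks but gives an attacker more valid codes to guess at any moment.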

Sessions

AuthCore creates a session record on every register and login. Each session stores:

  • IP address
  • User agent string
  • Device fingerprint (derived from IP + user agent)
  • last_active timestamp
  • expires_at

AuthCore also enforces a concurrent session limit, pruning the oldest sessions once the limit is exceeded.

Users can list their active sessions and revoke any session by ID. Session revocation writes an audit event — giving users visibility and control over active access, which is a baseline requirement for any serious IAM system.
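
The fingerprint and pruning behavior described above could look roughly like this. It is a sketch under assumptions: the `Session` dataclass is hypothetical, the fingerprint is taken to be a SHA-256 over IP and user agent, and "oldest" is interpreted as least recently active.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    session_id: str
    ip_address: str
    user_agent: str
    last_active: datetime

    @property
    def fingerprint(self) -> str:
        # Assumption: the fingerprint is a hash of IP + user agent.
        raw = f"{self.ip_address}|{self.user_agent}".encode()
        return hashlib.sha256(raw).hexdigest()


def prune_sessions(sessions: list, max_sessions: int) -> tuple:
    """Return (kept, pruned), keeping the most recently active sessions."""
    ordered = sorted(sessions, key=lambda s: s.last_active, reverse=True)
    return ordered[:max_sessions], ordered[max_sessions:]


# Usage: four sessions against a limit of three drops the least recent one.
base = datetime(2024, 1, 1, tzinfo=timezone.utc)
sessions = [
    Session(f"s{i}", "203.0.113.5", "curl/8.0", base + timedelta(minutes=i))
    for i in range(4)
]
kept, pruned = prune_sessions(sessions, max_sessions=3)
```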


RBAC: Roles, Permissions, and Superuser Bypass

The RBAC model:

User → UserRole → Role → RolePermission → Permission

The permission check is a FastAPI dependency injected on protected routes:

@router.get("/rbac/permission-check")
async def check_permission(
    user: User = Depends(require_permission("admin:manage")),
):
    ...

Superusers bypass named permission checks entirely. Normal users need explicit permission assignment through roles. All admin mutations write audit events.
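
The check behind a dependency like require_permission can be reduced to a small pure function. The sketch below is framework-free so it runs on its own; the dataclasses and `PermissionDenied` are hypothetical stand-ins for the real ORM models and for HTTPException, and in FastAPI the inner `checker` would receive the current user through `Depends(...)` rather than as a plain argument.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str
    permissions: set = field(default_factory=set)


@dataclass
class User:
    email: str
    is_superuser: bool = False
    roles: list = field(default_factory=list)


class PermissionDenied(Exception):
    """Stand-in for an HTTP 403 response."""


def has_permission(user: User, permission: str) -> bool:
    # Superusers bypass named permission checks entirely.
    if user.is_superuser:
        return True
    # Everyone else needs the permission through at least one role.
    return any(permission in role.permissions for role in user.roles)


def require_permission(permission: str):
    """Build a checker bound to one permission name."""
    def checker(user: User) -> User:
        if not has_permission(user, permission):
            raise PermissionDenied(permission)
        return user
    return checker
```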


Rate Limiting and Brute-Force Lockout

Two separate controls:

Route-level rate limiting (SlowAPI): configurable per-route limits. Exceeded: 429 Too Many Requests.

Account and IP lockout (Redis): each failed login increments a Redis counter keyed by email and IP. After the configured threshold: 423 Locked. Clears automatically after TTL.

The choice to use Redis for lockout counters (not PostgreSQL) is deliberate: these are fast-changing, ephemeral values that do not need durability.
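
The lockout logic can be illustrated without a Redis server. In this sketch a dict of (count, expiry) pairs stands in for Redis INCR plus EXPIRE. The key layout, the separate email and IP counters, the threshold, the TTL, and the choice to reset the TTL on every failure are all assumptions, not AuthCore's confirmed behavior.

```python
import time
from typing import Callable


class LockoutTracker:
    """Counts failed logins per key and locks once a threshold is reached."""

    def __init__(self, threshold: int = 5, ttl_seconds: int = 900,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._counters: dict = {}  # key -> (count, expires_at)

    @staticmethod
    def _keys(email: str, ip: str) -> tuple:
        return (f"lockout:email:{email}", f"lockout:ip:{ip}")

    def _count(self, key: str) -> int:
        count, expires_at = self._counters.get(key, (0, 0.0))
        if count and expires_at <= self.clock():
            del self._counters[key]  # TTL elapsed: behave like an expired key
            return 0
        return count

    def record_failure(self, email: str, ip: str) -> None:
        for key in self._keys(email, ip):
            # Equivalent of INCR followed by EXPIRE on the same key.
            self._counters[key] = (self._count(key) + 1,
                                   self.clock() + self.ttl_seconds)

    def is_locked(self, email: str, ip: str) -> bool:
        return any(self._count(k) >= self.threshold for k in self._keys(email, ip))

    def clear(self, email: str, ip: str) -> None:
        # Called after a successful login.
        for key in self._keys(email, ip):
            self._counters.pop(key, None)
```

A login handler would call `is_locked` first and return 423 if it is true, call `record_failure` on a bad password, and call `clear` on success.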

Evidence flow: 401 on bad password → 423 on lockout after multiple 401s → 429 if the route rate limit is hit.


Password Policy and Breach Checking

Local policy: minimum 12 characters, lowercase + uppercase + digit + symbol required, rejects common weak passwords and email-username substrings.
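
A local policy like this is straightforward to express as a function that returns every violated rule. The sketch below is an approximation: `COMMON_PASSWORDS` is a tiny hypothetical sample, the username rule is interpreted as "the password must not contain the email's local part", and the three-character minimum on that check is an added assumption to avoid rejecting passwords over one-letter usernames.

```python
import re

COMMON_PASSWORDS = {"password1234!", "qwerty123456!"}  # hypothetical sample list


def password_policy_errors(password: str, email: str) -> list:
    """Return a list of human-readable policy violations (empty if acceptable)."""
    errors = []
    if len(password) < 12:
        errors.append("must be at least 12 characters")
    if not re.search(r"[a-z]", password):
        errors.append("must contain a lowercase letter")
    if not re.search(r"[A-Z]", password):
        errors.append("must contain an uppercase letter")
    if not re.search(r"[0-9]", password):
        errors.append("must contain a digit")
    if not re.search(r"[^A-Za-z0-9]", password):
        errors.append("must contain a symbol")
    if password.lower() in COMMON_PASSWORDS:
        errors.append("must not be a common password")
    # Assumption: compare against the part of the email before the "@".
    username = email.split("@", 1)[0].lower()
    if len(username) >= 3 and username in password.lower():
        errors.append("must not contain the email username")
    return errors
```

Returning all violations at once, rather than failing on the first, lets the API give the client one complete error response.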

Optional HaveIBeenPwned check (disabled by default): uses k-anonymity — only the first 5 characters of the SHA-1 hash are sent to the API. Matching suffixes are compared locally. The raw password never leaves the system.
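
The k-anonymity split is easy to show without making a network call. In this sketch `split_sha1` and `breach_count` are hypothetical helper names. The range response format, one `SUFFIX:COUNT` pair per line, is what the Pwned Passwords range endpoint returns for a 5-character prefix.

```python
import hashlib


def split_sha1(password: str) -> tuple:
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix is ever sent, e.g. to
    # https://api.pwnedpasswords.com/range/<prefix>
    return digest[:5], digest[5:]


def breach_count(suffix: str, range_body: str) -> int:
    """Look up our suffix locally in a range response of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

The service never learns which suffix the caller cared about, because the comparison happens entirely on the caller's side.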


Email Verification and Password Reset

Both flows use the same security pattern: the raw token is never stored.

  1. Generate a random token.
  2. Store only the SHA-256 hash in PostgreSQL, with an expiry.
  3. Pass the raw token to the Celery email task (foundation implemented — external SMTP delivery not yet wired).
  4. Client presents the raw token. AuthCore hashes it and compares.

For password reset: all existing refresh token families are revoked after reset, preventing an attacker with a captured refresh token from continuing to use it after a password change.
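
The store-only-the-hash pattern from the steps above fits in two small functions. This is a sketch with hypothetical names (`new_token`, `token_matches`) and an assumed one-hour default lifetime; persistence and single-use invalidation are left out.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def new_token(ttl: timedelta = timedelta(hours=1)) -> tuple:
    """Return (raw_token, stored_hash, expires_at). Only the hash is persisted."""
    raw = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(raw.encode()).hexdigest()
    expires_at = datetime.now(timezone.utc) + ttl
    return raw, token_hash, expires_at


def token_matches(presented: str, stored_hash: str, expires_at: datetime,
                  now: Optional[datetime] = None) -> bool:
    """Hash the presented token and compare it to the stored hash in constant time."""
    current = now or datetime.now(timezone.utc)
    if current >= expires_at:
        return False
    presented_hash = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

Because the database holds only hashes, a leaked table does not hand an attacker usable verification or reset links.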


Implementation Notes Worth Calling Out

Direct bcrypt instead of Passlib: Passlib's backend probe had compatibility issues with bcrypt under Python 3.13. The implementation uses bcrypt directly — behavior is identical.

SQLAlchemy metadata column conflict: metadata is a reserved attribute on SQLAlchemy's declarative base. Fix: map the column to a different Python-side name while keeping the database column name unchanged.

event_metadata: Mapped[dict | None] = mapped_column("metadata", JSON, nullable=True)

Testing and CI

18 tests, 81% coverage. Docker-first — tests run inside containers with a real Postgres and Redis:

tests/test_security.py       — bcrypt, JWT, TOTP helpers
tests/test_auth.py           — register, login, refresh, reuse detection
tests/test_mfa.py            — setup, verify, disable
tests/test_sessions.py       — session listing and revocation
tests/test_rbac_admin.py     — role/permission creation and assignment
tests/test_audit.py          — audit log queries
tests/test_lockout.py        — brute-force lockout behavior
tests/test_email_flows.py    — verification and reset token flows

CI pipeline: Ruff lint → Black format check → Docker Compose tests with Alembic migrations → coverage gate (80%) → Docker image build (multi-stage, non-root runtime user) → Trivy critical vulnerability scan with SARIF upload.



Repo

https://github.com/cypher682/authcore-service

Evidence (screenshots of every flow): https://github.com/cypher682/authcore-service/tree/main/docs/evidence
