Keycloak tm1

#architecture #cybersecurity #kubernetes

Scope and assumptions

Deployment: Keycloak running in one or more Kubernetes pods inside a cluster, reachable only via an internal REST API gateway (no direct external access).
Purpose: Keycloak is the authentication provider issuing JWTs consumed by other applications after successful authentication.
Components considered: Keycloak pods, ingress/internal API gateway, Service/ClusterIP, ConfigMaps/Secrets, PersistentVolumes (if used for DB/backups), underlying K8s control plane (API server, etcd), network policies, and client applications that accept JWTs.
Threat model method: STRIDE applied to data flows and components; each threat lists likely MITRE ATT&CK techniques (named rather than cited by technique ID), mitigations, and operational challenges.

Data flow (high level)

User/client → API gateway (TLS) → Keycloak (internal REST) for authentication.
Keycloak authenticates the user (local user store, LDAP/AD federation, social login, or a brokered external IdP) and issues a JWT.
Client presents the JWT to other services behind the gateway; services validate the token signature, issuer, audience, and expiry.
Keycloak relies on backing services (database, user federation such as LDAP, SMTP) and exposes token revocation and introspection endpoints for resource servers.

STRIDE threats, MITRE technique mappings, and mitigations

Spoofing (identity impersonation)

Threats:
- Attacker forges requests to Keycloak or the gateway impersonating internal services or users.
- Compromised service account or API key used to call Keycloak admin endpoints.
- Stolen admin credentials or session cookies allow console access.
MITRE techniques (examples):
- Valid Accounts; Brute Force (against admin login); Steal Web Session Cookie; Use Alternate Authentication Material (stolen tokens or cookies); Unsecured Credentials.
Mitigations:
- Mutual TLS between API gateway and Keycloak; enforce client certs for internal service-to-service calls.
- Use short-lived, audience-scoped Kubernetes ServiceAccount tokens (projected tokens) with minimal RBAC; avoid long-lived static tokens.
- Enforce strong MFA for Keycloak admin console and privileged APIs.
- Rotate and store secrets in a secure vault; do not store admin credentials in ConfigMaps.
- Log and alert on anomalous admin or service-account activity.
Challenges:
- Managing the mTLS certificate lifecycle (issuance, rotation, revocation) across pods and gateways.
- Ensuring zero-trust internal network posture without breaking legacy flows.
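A minimal Python sketch of the server side of the mTLS requirement, e.g. for a sidecar or test harness in front of Keycloak. Keycloak itself is configured through its own HTTPS options (recent Quarkus-based releases have an `https-client-auth` setting; check your version's docs). The function name and any file paths are hypothetical. It starts from `PROTOCOL_TLS_SERVER` rather than `create_default_context`, because the latter also loads the system CA store, which would let any publicly issued certificate authenticate as a client.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side TLS context that only accepts clients presenting a
    certificate from the internal CA (hypothetical helper; paths are illustrative)."""
    # PROTOCOL_TLS_SERVER starts with an empty trust store: no system CAs.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Fail the handshake when the client presents no valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # Server certificate and private key for the Keycloak-facing endpoint.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # Trust only the CA that issues the gateway's client certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

In a real deployment the cert, key, and internal CA bundle would be mounted from Secrets and rotated by whatever certificate tooling the cluster uses; that lifecycle is exactly the challenge noted above.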

Tampering (modification of data, configs, or code)

Threats:
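- Modification of JWT validation logic, Keycloak configuration, or themes via a compromised image or a malicious ConfigMap.
- Tampering with token storage or refresh-token flows (e.g., modifying DB entries).
- Man-in-the-middle interception or modification of traffic if TLS is misconfigured; a signed JWT cannot be altered without failing signature validation, but it can be stolen and replayed, and unsigned parts of the flow (redirects, request parameters) can be altered.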
MITRE techniques:
- Implant Internal Image; Modify Authentication Process; Hijack Execution Flow; Data Manipulation; Adversary-in-the-Middle; Ingress Tool Transfer.
Mitigations:
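- Sign images (e.g., Sigstore/cosign, Notary) and verify signatures with an image-policy admission controller; use in-toto attestations to cover the build pipeline.
- Protect ConfigMaps/Secrets: keep sensitive values in Secrets rather than ConfigMaps, mount them only into pods that need them, and use KMS-backed SecretStores.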
- Immutable infrastructure patterns: avoid manual in-cluster edits; use GitOps for config changes with PR reviews.
- Enable and enforce TLS for all in-cluster traffic; use network policies to limit egress/ingress.
- Run Keycloak with a read-only root filesystem where possible and minimal container privileges.
Challenges:
- Deploying and operating image signing and admission controllers reliably.
- Migrating legacy operational patterns to GitOps and immutable configs.
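Signature verification itself belongs in the admission controller (e.g., a Sigstore policy controller). The Python sketch below only illustrates the decision such a policy makes for a pod spec: every image must be pinned by digest, and that digest must be in a set whose signatures were verified earlier, for example in CI. `approved_digests` and the function name are hypothetical.

```python
import re
from typing import Dict, List, Set

# "<repository>@sha256:<64 hex chars>": an image reference pinned by content digest.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def rejected_images(pod_spec: Dict, approved_digests: Set[str]) -> List[str]:
    """Return image references an admission policy should reject (hypothetical helper)."""
    rejected = []
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        image = container.get("image", "")
        if not DIGEST_REF.match(image):
            # Tag-only references (e.g. ":latest") are mutable and can be swapped.
            rejected.append(image)
        elif image.split("@", 1)[1] not in approved_digests:
            # Pinned, but not a digest whose signature was verified upstream.
            rejected.append(image)
    return rejected
```

Pinning by digest is what makes a signature meaningful at deploy time: a tag can be repointed after signing, a digest cannot.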

Repudiation (denying that actions were taken)

Threats:
- Lack of reliable audit logs for authentication events, admin changes, token issuance/revocation.
- Log tampering or loss (e.g., an attacker deletes or modifies logs stored in the pod).
MITRE techniques:
- Indicator Removal (e.g., clearing or modifying logs); Impair Defenses (e.g., disabling logging or event listeners).
Mitigations:
- Ship logs to a centralized, append-only store (external SIEM or logging cluster) over TLS with authenticated ingestion.
- Enable Keycloak event logging (user/login events and admin events) and protect the log pipeline with integrity checks.
- Retain immutable audit trails (WORM) for a defined retention period.
- Enable Kubernetes audit logging for API server activity; monitor for suspicious RBAC changes.
Challenges:
- Cost and complexity of secure, immutable logging.
- Ensuring logs contain necessary context (JWT IDs, request IDs) without leaking sensitive data.
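One way to implement the log-pipeline integrity checks is a hash chain over audit events: each entry stores SHA-256 over the previous entry's hash plus its own canonical JSON, so editing or deleting any earlier entry breaks verification. A minimal Python sketch with hypothetical function names and event fields. It only detects tampering if the latest hash is also anchored somewhere the attacker cannot write (e.g., the WORM store); an attacker controlling the whole log could recompute the chain.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: List[Dict], event: Dict) -> None:
    """Append an event linked to the previous entry (hypothetical helper)."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every link; any edit, reorder, or deletion returns False."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```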

Information Disclosure (exposure of sensitive data)

Threats:
- Secrets (client secrets, signing keys, database credentials) exposed in etcd, ConfigMaps, or logs.
- JWTs leaked via insecure storage, logs, or overly permissive CORS on services consuming JWTs.
- Backup or snapshot exposure containing Keycloak data.
MITRE techniques:
- Unsecured Credentials; Data from Information Repositories; Data from Cloud Storage (backups and snapshots); Exfiltration Over Alternative Protocol.
Mitigations:
- Encrypt secrets at rest (etcd encryption with a KMS provider) and in transit; restrict etcd access to the control plane only.
- Use hardware-backed key management for JWT signing keys (HSM or cloud KMS). Rotate signing keys with overlap/rollover strategy.
- Avoid logging full JWTs or sensitive claims; mask or redact tokens in logs.
- Apply least privilege to RBAC and network policies; restrict access to backups and PVs.
- Limit token lifetime and scope; validate audience and issuer claims strictly.
Challenges:
- Key rotation without breaking token validation across multiple services.
- Ensuring third-party federated IdPs adhere to the same secrecy standards.
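A minimal Python sketch of masking JWTs before log lines leave the process (or inside a log-pipeline filter). It assumes tokens appear in compact serialization and that the base64url-encoded JSON header starts with `eyJ`, which is what a header such as `{"alg"...` or `{"typ"...` encodes to; other token formats would need extra patterns. The pattern and function names are illustrative.

```python
import re

# Compact JWS: header.payload.signature, base64url segments.
# The signature segment may be empty (alg "none"), hence "*".
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def redact_jwts(line: str) -> str:
    """Replace every JWT-looking token with a fixed marker (hypothetical helper)."""
    return JWT_PATTERN.sub("[REDACTED_JWT]", line)
```

If correlation is needed, log the `jti` claim (or a hash of the token) instead of the token itself.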

Denial of Service (availability)

Threats:
- High authentication request volume (DDoS) to Keycloak or API gateway causing outage.
- Resource exhaustion in the pod (CPU/memory) or backing DB leading to failed auths.
- Misconfigured liveness/readiness probes causing cascading restarts during spikes.
MITRE techniques:
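- Network Denial of Service; Endpoint Denial of Service (e.g., floods against login and token endpoints); Service Stop; Resource Hijacking (e.g., cryptomining on shared nodes starving Keycloak of capacity).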
Mitigations:
- Apply rate limiting/throttling at the API gateway with per-IP and per-client quotas.
- Scale Keycloak horizontally as a cluster (replicas share user sessions through distributed caches, or use sticky sessions) backed by a robust database tier with connection pooling.
- Use resource requests/limits and QoS classes in Kubernetes; reserve node capacity for critical auth components.
- Implement circuit breakers and graceful degradation in downstream services if auth is unavailable.
- Define realistic SLOs for auth latency and availability, monitor them, and alert on anomalies.
Challenges:
- Server-side session state and the need for every replica to share the same signing keys complicate horizontal scaling.
- Balancing rate limits to block abuse without blocking legitimate burst traffic.
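Per-client gateway quotas are usually a token bucket: a sustained rate plus a burst allowance, which addresses the burst-vs-abuse trade-off above. A minimal Python sketch, assuming the key is something like a client ID or source IP (the class name is hypothetical). Real gateways implement this natively and, with several gateway replicas, need shared state or per-replica budgets. The clock is injectable so behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class PerClientTokenBucket:
    """Per-key token bucket: `rate` tokens/second sustained, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._state: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_ts)

    def allow(self, key: str) -> bool:
        now = self.clock()
        # New keys start with a full bucket.
        tokens, last = self._state.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[key] = (tokens - 1.0, now)
            return True
        self._state[key] = (tokens, now)
        return False
```

`capacity` sets how large a legitimate burst can be; `rate` sets the long-run ceiling per client.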

Elevation of Privilege (gain higher privileges)

Threats:
- Exploits in Keycloak or its dependencies allowing admin privilege escalation or remote code execution.
- Misconfigured RBAC or overly broad client roles allowing privilege abuse.
- Compromised container allowing access to K8s node credentials or host filesystem.
MITRE techniques:
- Exploit Public-Facing Application; Exploitation for Privilege Escalation; Escape to Host; Valid Accounts (abuse of over-privileged roles).
Mitigations:
- Keep Keycloak and dependencies patched; subscribe to security advisories and have a patching process.
- Harden container runtime: run as non-root, drop Linux capabilities, use seccomp and AppArmor/SELinux profiles.
- Implement least-privilege RBAC both in Keycloak (clients, roles) and Kubernetes (ServiceAccounts).
- Enforce Pod Security Standards via Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25) or OPA Gatekeeper policies to prevent privileged pods and hostPath mounts.
- Scan images and run vulnerability scanning in CI; block known vulnerable images from deployment.
Challenges:
- Timely patching in environments requiring high stability.
- Legacy integrations that require elevated permissions.
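The container-hardening and admission items above reduce to a handful of pod-spec checks. This Python sketch (hypothetical function name) shows what a policy would flag; the checks overlap with the Pod Security Standards baseline/restricted profiles, while `readOnlyRootFilesystem` is an extra check those profiles do not require. In practice Pod Security Admission or Gatekeeper enforces this. Keycloak may need some writable paths, so a read-only root filesystem is typically paired with explicit writable volumes.

```python
from typing import Dict, List


def security_context_violations(pod_spec: Dict) -> List[str]:
    """Flag pod/container settings that the hardening guidance forbids (hypothetical helper)."""
    issues = []
    pod_sc = pod_spec.get("securityContext", {})
    for vol in pod_spec.get("volumes", []):
        if "hostPath" in vol:
            issues.append(f"volume {vol.get('name')}: hostPath mount")
    if pod_spec.get("hostNetwork") or pod_spec.get("hostPID"):
        issues.append("pod: shares host network or PID namespace")
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "?")
        if sc.get("privileged"):
            issues.append(f"{name}: privileged")
        # runAsNonRoot and seccompProfile may be set per pod; container values win.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            issues.append(f"{name}: runAsNonRoot not enforced")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            issues.append(f"{name}: no seccomp profile")
        # Unset allowPrivilegeEscalation is treated as allowed.
        if sc.get("allowPrivilegeEscalation", True):
            issues.append(f"{name}: allowPrivilegeEscalation not disabled")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            issues.append(f"{name}: capabilities not dropped")
        if not sc.get("readOnlyRootFilesystem", False):
            issues.append(f"{name}: root filesystem writable")
    return issues
```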

Additional JWT-specific threats and mitigations

Threat: Replay of stolen JWTs.
- Mitigation: Short JWT lifetime; refresh tokens with rotation and reuse detection; include jti (and nonce in ID tokens); use token revocation lists or introspection for high-sensitivity flows.
- Challenge: Performance/complexity of token introspection at scale; balancing stateless JWT benefits vs revocation needs.
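A `jti` replay cache suits tokens meant to be used once, such as client-assertion JWTs or single-use high-sensitivity operations. Ordinary access tokens are legitimately reused until expiry, so for them short lifetimes and revocation remain the main controls. A minimal in-memory Python sketch (hypothetical class name); several replicas would need a shared store instead.

```python
import time
from typing import Callable, Dict


class JtiReplayCache:
    """Remember seen `jti` values until their token's `exp`; reject repeats."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._seen: Dict[str, float] = {}  # jti -> exp (epoch seconds)
        self._clock = clock

    def check_and_record(self, jti: str, exp: float) -> bool:
        now = self._clock()
        # Evict expired entries (O(n) per call; acceptable for a sketch).
        for old in [k for k, e in self._seen.items() if e <= now]:
            del self._seen[old]
        if exp <= now or jti in self._seen:
            return False  # expired token or replay
        self._seen[jti] = exp
        return True
```

Entries only need to live until `exp`, which is one reason short lifetimes keep the cache small.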
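Threat: JWT signature algorithm downgrade or misconfiguration (e.g., accepting the "none" algorithm, RS256/HS256 key confusion, or weak keys).
- Mitigation: Enforce strong algorithms (RS256/ES256); validate "alg" against a fixed server-side allowlist instead of trusting the token header; rotate keys securely; keep private keys in KMS/HSM.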
- Challenge: Coordinating key rollover across services and clients.
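A minimal Python sketch of the header pre-check: parse the JOSE header, reject any `alg` not on a fixed allowlist, and require a `kid` for JWKS key selection. It does not verify the signature; that should be done by a maintained library with the algorithm pinned (for example PyJWT's `jwt.decode(token, key, algorithms=["RS256"])`). `ALLOWED_ALGS` and the function names are illustrative.

```python
import base64
import json
from typing import Dict

# Fixed by the verifier; never derived from the token being checked.
ALLOWED_ALGS = {"RS256", "ES256"}


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_jose_header(token: str) -> Dict:
    """Reject disallowed algorithms before any key lookup (hypothetical helper)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWS")
    header = json.loads(_b64url_decode(parts[0]))
    if not isinstance(header, dict):
        raise ValueError("malformed header")
    alg = header.get("alg")
    if alg not in ALLOWED_ALGS:
        raise ValueError(f"algorithm {alg!r} not allowed")
    if not header.get("kid"):
        raise ValueError("missing kid; cannot select a key from the JWKS")
    return header
```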
Threat: Incorrect claim validation (audience, issuer, expiry).
- Mitigation: Standardize validation library usage; publish JWKS endpoint securely behind gateway; enforce claim checks in all consuming services.
- Challenge: Legacy clients may accept tokens leniently.
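A shared claim-validation routine helps keep every consuming service equally strict. A Python sketch of the checks that follow signature verification; the issuer URL and audience are hypothetical placeholders. Keycloak issuers typically look like `https://<host>/realms/<realm>` (older releases used an `/auth` prefix), and a service's client ID only appears in `aud` if an audience mapper or client scope adds it, so confirm what your realm actually emits.

```python
import time
from typing import Any, Dict, Optional


def validate_claims(claims: Dict[str, Any], *, issuer: str, audience: str,
                    leeway: float = 30.0, now: Optional[float] = None) -> None:
    """Raise ValueError unless iss, aud, exp and nbf are acceptable.

    Hypothetical helper; call only after the signature has been verified.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    # "aud" may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else (aud if isinstance(aud, list) else [])
    if audience not in audiences:
        raise ValueError("token not intended for this service")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + leeway:
        raise ValueError("token expired or missing exp")
    nbf = claims.get("nbf")
    if nbf is not None and now + leeway < nbf:
        raise ValueError("token not yet valid")
```

The small `leeway` tolerates clock skew between Keycloak and the consuming services; keep it to seconds, not minutes.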

Mapping to MITRE ATT&CK: practical examples

Credential theft → Unsecured Credentials / Valid Accounts: attacker steals the Keycloak admin password or a service account token and logs in with it.
Lateral movement → Use Alternate Authentication Material / Valid Accounts: a compromised pod uses the cluster network and stolen tokens to call Keycloak admin APIs.
Persistence → Create or Modify System Process: attacker modifies the container startup to maintain access.
Defense evasion → Indicator Removal / Impair Defenses: attacker deletes or tampers with Keycloak logs.

Mitigation tiers and prioritized recommendations

Preventive (highest priority)
- Enforce mTLS between gateway and Keycloak; restrict Keycloak Service to internal cluster network only.
- Protect signing keys in KMS/HSM and rotate keys; enable etcd encryption and KMS-backed secrets.
- Harden containers (non-root, seccomp, read-only FS) and enforce admission policies for images.
- Strong RBAC and MFA for admin operations; no static admin credentials in code.
Detective
- Centralized immutable logging and alerting for admin events, auth anomalies, suspicious token issuance, and configuration changes.
- Runtime monitoring: anomalous authentication rates, failed logins, privilege changes.
Responsive
- Token revocation & introspection capability and incident runbooks for compromised keys/accounts.
- Rapid patching and rollback procedures; blue-green or canary deployments for breaking changes.
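For the runtime-monitoring item under Detective (failed logins), a sliding-window detector is a common starting point. A minimal Python sketch, assuming it is fed Keycloak `LOGIN_ERROR` events keyed by username or source IP and that timestamps arrive in non-decreasing order; the class name and thresholds are hypothetical. Keycloak's built-in brute-force detection handles per-user lockout, while a SIEM-side detector like this can also key on source IP.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginMonitor:
    """Alert when one key exceeds `threshold` failures within `window` seconds."""

    def __init__(self, threshold: int, window: float) -> None:
        self.threshold = threshold
        self.window = window
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, key: str, ts: float) -> bool:
        q = self._events[key]
        q.append(ts)
        # Drop failures that have slid out of the window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q) > self.threshold  # True means "raise an alert"
```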

Operational challenges and residual risks

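Key rotation: signing keys must be rotated so that already-issued, unexpired JWTs remain verifiable while new tokens are signed with the new key; this requires a JWKS that publishes key identifiers (kid) and keeps old and new keys side by side during an overlap window.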
Token revocation vs statelessness: adding revocation/introspection reintroduces state and latency; must be engineered for scale.
Internal trust assumptions: "internal-only" network can be breached; insider threats and lateral movement remain a major risk.
Performance vs security trade-offs: strict validation, introspection, and logging add overhead; must be balanced with SLAs.
Third-party identity providers: security posture of federated IdPs affects the whole system; limited control over external IdP vulnerabilities.
Secret sprawl: multiple clients and environments increase key/secret management complexity.
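The key-rotation overlap can be made concrete: a retired key must stay in the JWKS until the last token it signed has expired. A minimal Python sketch with hypothetical names; `max_token_lifetime` stands for the longest lifetime of any token signed by that key. Keycloak's realm keys expose a similar active/passive distinction (active keys sign, passive keys only verify).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SigningKey:
    kid: str
    created_at: float                   # epoch seconds when the key was introduced
    retired_at: Optional[float] = None  # when it stopped signing new tokens


def keys_to_publish(keys: List[SigningKey], now: float,
                    max_token_lifetime: float) -> List[str]:
    """kids the JWKS must contain: the active key plus retired keys whose
    last-issued tokens may still be unexpired (hypothetical helper)."""
    return [k.kid for k in keys
            if k.retired_at is None or now < k.retired_at + max_token_lifetime]


def active_signing_key(keys: List[SigningKey]) -> SigningKey:
    """Exactly one key should sign new tokens at any time."""
    active = [k for k in keys if k.retired_at is None]
    if len(active) != 1:
        raise ValueError("exactly one key should be active for signing")
    return active[0]
```

Because services cache the JWKS, a new key should also be published some time before it starts signing, so no consumer sees an unknown `kid`.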

Practical checklist (actionable)