If you've tried Cortex or Mimir for multi-tenant Prometheus, you've hit the same wall: every client needs tenant headers. Your existing Grafana dashboards break. CLI tools need updates. API integrations require modification. Your proof-of-concept becomes a migration project.
There's a better approach: query-level access control with zero client changes.
The tenant header problem
Current multi-tenant solutions require the X-Scope-OrgID header on every request:
# Every client needs modification
curl -H "X-Scope-OrgID: team-alpha" "http://prometheus:9090/api/v1/query?query=up"
# Grafana datasources need tenant configuration
# CLI tools need wrapper scripts
# API clients need header injection
This breaks existing infrastructure and creates security vulnerabilities - headers can be spoofed, misconfigured, or simply forgotten.
Prerequisites
Before deploying OpenLBAC, ensure you have:
- Docker and Docker Compose installed
- An existing Prometheus instance (or willingness to deploy one)
- An OIDC provider with group claims (Keycloak, Okta, Auth0)
- Basic understanding of PromQL and observability concepts
Setup: Complete multi-tenant stack
Clone and deploy the full OpenLBAC stack with example Prometheus and Keycloak:
git clone https://github.com/openlbac/openlbac
cd openlbac
# Deploy complete stack: OpenLBAC + Prometheus + Keycloak
docker-compose up -d
This starts the OpenLBAC components alongside Prometheus:
# docker-compose.yml (relevant sections)
services:
  lbac-server:
    image: openlbac/lbac-server:latest
    ports:
      - "8090:8090"
    environment:
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/openlbac
  lbac-proxy:
    image: openlbac/lbac-proxy:latest
    ports:
      - "8080:8080"
    environment:
      - LBAC_CORE_URL=http://lbac-core:9090
      - UPSTREAM_PROMETHEUS_URL=http://prometheus:9090
  lbac-core:
    image: openlbac/lbac-core:latest
    ports:
      - "9090:9090"
    environment:
      - LBAC_SERVER_URL=http://lbac-server:8090
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9091:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
Configuration: OIDC integration
Step 1: Configure OIDC provider
For Keycloak (included in docker-compose):
# Access Keycloak admin console
open http://localhost:8081
# Default credentials: admin/admin
# Create realm: "observability"
# Create groups: "platform-team", "backend-team", "frontend-team"
# Create users and assign group memberships
Step 2: Configure OpenLBAC OIDC
Update the OIDC configuration in config/oidc.yml:
oidc:
  provider_url: "http://localhost:8081/realms/observability"
  client_id: "openlbac-client"
  client_secret: "your-client-secret"
  scopes: ["openid", "profile", "groups"]
  groups_claim: "groups"
policies:
  - name: "platform-team"
    groups: ["platform-team"]
    rules:
      - label: "namespace"
        operator: "=~"
        values: ["production|staging|development"]
  - name: "backend-team"
    groups: ["backend-team"]
    rules:
      - label: "namespace"
        operator: "="
        values: ["production"]
      - label: "service"
        operator: "=~"
        values: ["api|database|cache"]
  - name: "frontend-team"
    groups: ["frontend-team"]
    rules:
      - label: "namespace"
        operator: "="
        values: ["production"]
      - label: "service"
        operator: "="
        values: ["web"]
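To make the mapping from groups to label rules concrete, here is a minimal Python sketch of how a policy lookup could work. It is an illustration only, not OpenLBAC's actual code: the Rule class, the POLICIES table, and the matchers_for function are hypothetical names, and the "first matching group wins, unknown groups are denied" behavior is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    label: str
    operator: str  # "=" or "=~", mirroring the policy file
    value: str


# Hypothetical in-memory copy of two policies from config/oidc.yml.
POLICIES = {
    "backend-team": [
        Rule("namespace", "=", "production"),
        Rule("service", "=~", "api|database|cache"),
    ],
    "frontend-team": [
        Rule("namespace", "=", "production"),
        Rule("service", "=", "web"),
    ],
}


def matchers_for(groups):
    """Return PromQL label matchers for the first group that has a policy."""
    for group in groups:
        rules = POLICIES.get(group)
        if rules is not None:
            return [f'{r.label}{r.operator}"{r.value}"' for r in rules]
    # Assumption: deny by default when no group maps to a policy.
    raise PermissionError(f"no policy for groups {groups!r}")
```

Calling matchers_for(["frontend-team"]) yields the two matchers from the frontend-team policy. How a real deployment combines rules for users in several groups is a design decision this sketch does not settle.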
Testing: Query rewriting in action
Step 1: Verify policy enforcement
Test different user contexts to see query rewriting:
# Platform team member (full access)
curl -H "Authorization: Bearer $PLATFORM_TOKEN" \
"http://localhost:8080/api/v1/query?query=up"
# Backend team member (service-filtered)
curl -H "Authorization: Bearer $BACKEND_TOKEN" \
"http://localhost:8080/api/v1/query?query=up"
# Frontend team member (web service only)
curl -H "Authorization: Bearer $FRONTEND_TOKEN" \
"http://localhost:8080/api/v1/query?query=up"
Step 2: Observe automatic query rewriting
Original query:
rate(http_requests_total{status="200"}[5m])
Rewritten for backend-team:
rate(http_requests_total{status="200", namespace="production", service=~"api|database|cache"}[5m])
Rewritten for frontend-team:
rate(http_requests_total{status="200", namespace="production", service="web"}[5m])
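The rewriting step can be sketched in a few lines of Python. This is a deliberately naive illustration that assumes queries are either a bare metric name or contain selectors with explicit braces. A real proxy would need a full PromQL parser to handle selectors without braces inside functions, or braces inside quoted label values. The inject_matchers function is a hypothetical name, not part of OpenLBAC.

```python
import re

# Matches a brace-delimited selector body such as {status="200"}.
SELECTOR = re.compile(r"\{([^{}]*)\}")
# Matches a query that is nothing but a metric name, such as "up".
BARE_METRIC = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def inject_matchers(query, matchers):
    """Append policy matchers to every brace-delimited selector in the query."""
    extra = ", ".join(matchers)
    stripped = query.strip()
    if BARE_METRIC.fullmatch(stripped):
        return f"{stripped}{{{extra}}}"

    def add(match):
        existing = match.group(1).strip()
        combined = f"{existing}, {extra}" if existing else extra
        return "{" + combined + "}"

    rewritten, count = SELECTOR.subn(add, query)
    if count == 0:
        # Refuse rather than forward a query this toy rewriter cannot constrain.
        raise ValueError("unsupported query shape for this sketch")
    return rewritten
```

Failing closed on unrecognized query shapes matters here: a rewriter that silently passes through what it cannot parse would leak data.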
Step 3: Test with existing tools
Point your existing tools to OpenLBAC instead of Prometheus directly:
# Grafana datasource configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  prometheus.yml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://lbac-proxy:8080 # Changed from prometheus:9090
        access: proxy
        jsonData:
          httpHeaderName1: "Authorization"
        secureJsonData:
          httpHeaderValue1: "Bearer $GRAFANA_OIDC_TOKEN"
Your existing dashboards work unchanged. CLI tools work unchanged. The only change is the endpoint URL.
Optimization: Production deployment
Horizontal scaling
Scale proxy instances for high throughput:
# docker-compose.override.yml
services:
  lbac-proxy:
    deploy:
      replicas: 3
  nginx:
    image: nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
Nginx configuration for load balancing:
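# This file replaces /etc/nginx/nginx.conf, so it needs top-level events and http blocks
events {}

http {
    upstream lbac_proxies {
        server lbac-proxy-1:8080;
        server lbac-proxy-2:8080;
        server lbac-proxy-3:8080;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://lbac_proxies;
            # Forward the caller's bearer token to the proxy unchanged
            proxy_set_header Authorization $http_authorization;
        }
    }
}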
Monitoring and alerting
Monitor OpenLBAC components:
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'lbac-proxy'
    static_configs:
      - targets: ['lbac-proxy:8080']
    metrics_path: '/metrics'
  - job_name: 'lbac-core'
    static_configs:
      - targets: ['lbac-core:9090']
    metrics_path: '/metrics'
Key metrics to monitor:
- lbac_proxy_query_duration_seconds: Query rewriting latency
- lbac_proxy_queries_total: Request rate and error rate
- lbac_core_policy_updates_total: Policy propagation health
- lbac_core_connected_proxies: Proxy connectivity status
Audit and compliance
Enable audit logging for compliance frameworks:
# config/audit.yml
audit:
  enabled: true
  sink_type: "elasticsearch"
  elasticsearch:
    endpoint: "http://elasticsearch:9200"
    index: "openlbac-audit"
  events:
    - "query_execution"
    - "policy_violation"
    - "authentication_failure"
    - "authorization_decision"
Each query execution logs:
- User identity and group membership
- Original query and rewritten query
- Data sources accessed
- Timestamp and request metadata
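As a rough picture of what one such record might contain, here is a Python sketch that serializes the fields listed above to JSON. The AuditEvent class and its field names are hypothetical and chosen for illustration. The actual OpenLBAC audit schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditEvent:
    event: str                # e.g. "query_execution"
    user: str                 # identity taken from the token
    groups: list              # group membership at request time
    original_query: str       # query as sent by the client
    rewritten_query: str      # query after policy matchers were added
    datasource: str           # upstream that served the query
    timestamp: str = field(default_factory=_utc_now)


def to_json(event):
    """Serialize an audit event as one JSON document, ready for a log sink."""
    return json.dumps(asdict(event), sort_keys=True)
```

Keeping the original and rewritten queries side by side in each record is what lets an auditor confirm that the policy was actually applied.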
Troubleshooting common issues
Issue: JWT token validation fails
Symptoms: 401 Unauthorized responses from lbac-proxy
Solution: Verify OIDC configuration and token format:
# Check token structure
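# A JWT is header.payload.signature; decode only the base64url payload segment
# (base64 may warn about missing padding, hence the stderr redirect)
echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .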
# Verify groups claim exists
# Ensure issuer matches provider_url
# Check client_id in token audience
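The same three checks can be scripted. The Python sketch below decodes the token payload and reports which of the checks fail. It does not verify the signature, so it is a debugging aid only and never a substitute for real validation. The function names and the default "groups" claim name are illustrative assumptions.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # base64url drops '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def check_claims(claims, issuer, client_id, groups_claim="groups"):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if claims.get("iss") != issuer:
        problems.append(f"issuer mismatch: {claims.get('iss')!r}")
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if client_id not in audiences:
        problems.append(f"client_id {client_id!r} not in audience {audiences!r}")
    if not claims.get(groups_claim):
        problems.append(f"missing or empty {groups_claim!r} claim")
    return problems
```

For the example setup, issuer would be the provider_url and client_id would be "openlbac-client" from config/oidc.yml.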
Issue: Queries return empty results
Symptoms: Valid queries return no data after policy application
Solution: Check policy rule logic and label presence:
# Verify labels exist in your metrics
curl http://prometheus:9090/api/v1/labels
# Test policy rules manually
curl "http://localhost:8090/api/v1/policies/test" \
  -H "Content-Type: application/json" \
  -d '{"query": "up", "user_groups": ["backend-team"]}'
Issue: High query latency
Symptoms: Increased response times after OpenLBAC deployment
Solution: Optimize policy complexity and caching:
# config/performance.yml
cache:
  policy_cache_ttl: 300s
  query_cache_ttl: 60s
optimization:
  max_rule_complexity: 10
  parallel_policy_evaluation: true
OpenLBAC adds <5ms latency in most configurations. Higher latencies indicate policy complexity issues or network bottlenecks.
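To see why a policy cache TTL helps latency, consider this minimal Python sketch of a time-based cache. It is a generic illustration of the technique, not OpenLBAC's implementation: the TTLCache class and its get_or_load method are hypothetical names.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds, reloading them after expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive lookup
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a 300-second TTL, a policy lookup for a given group reaches the backing store at most once every five minutes. The trade-off is that policy changes can take up to one TTL to reach every proxy.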
Why this approach works better
Zero client modification: Existing tools continue working unchanged. No headers to manage or spoof.
Real-time enforcement: Policies apply at query execution, not just dashboard level. API access and CLI tools automatically controlled.
Enterprise IdP integration: Leverage existing Keycloak, Okta, or Auth0 groups. No custom authentication development required.
Comprehensive audit: Every query decision logged for PCI DSS, GDPR, and SOC 2 compliance requirements.
Five-minute deployment: Docker Compose to production-ready in minutes, not weeks of client migration.
The tenant header approach assumes you can modify every client. The query rewriting approach assumes you can't - and works anyway.
What's your experience with multi-tenant observability? Have you hit the tenant header wall, or found other approaches that preserve existing integrations?