Laith Al Enooz
Building a Stateless Multi-Realm Auth Service: Why Your Auth Gateway Doesn't Need a Database

The Problem: Why Most Auth Services Are Tightly Coupled

Picture this: you're building a platform that serves multiple applications—a mobile app, a web dashboard, and maybe a partner API. Each needs its own authentication realm in Keycloak, but your auth service is hardcoded to a single realm. To support multiple apps, you'd typically:

  1. Deploy separate auth services for each realm (wasteful)
  2. Store realm configurations in a database (unnecessary complexity)
  3. Build an admin UI to manage realm mappings (even more code to maintain)

What if I told you there's a better way? One that requires zero database, zero configuration management, and scales horizontally without any server-side state?

The Insight: Realm and Client as Request Parameters

Think of your auth service like a translator at the UN. The translator doesn't need to remember which language each delegate speaks—each delegate makes it clear in real time. Similarly, your auth service doesn't need to store which realm each application uses; the application tells it with every request.

This is the core insight behind the stateless multi-realm architecture: treat realm and client credentials as request-scoped parameters, not server configuration.

// Traditional approach: hardcoded configuration
type AuthService struct {
    keycloakURL   string
    realm         string  // ❌ Server-wide configuration
    clientID      string  // ❌ Fixed at startup
    clientSecret  string  // ❌ All apps share same client
}

// Stateless approach: per-request parameters
type LoginRequest struct {
    RealmName    string `header:"X-Realm-Name"`     // ✅ Dynamic per request
    ClientID     string `header:"X-Client-Id"`      // ✅ App specifies its client
    ClientSecret string `header:"X-Client-Secret"`  // ✅ No shared secrets
    Username     string `json:"username"`
    Password     string `json:"password"`
}

Domain Modeling: Making Realm a First-Class Citizen

The shift from server configuration to request parameters is more than a technical detail—it's a domain modeling decision that fundamentally changes how your service operates.

The Gateway Pattern

Instead of maintaining user state or realm mappings, the service becomes a pure gateway:

┌─────────────┐
│   Mobile    │─┐
│     App     │ │  X-Realm-Name: mobile-realm
└─────────────┘ │  X-Client-Id: mobile-app
                │
┌─────────────┐ │  ┌──────────────────┐      ┌──────────────┐
│     Web     │─┼─→│  Auth Service    │─────→│  Keycloak    │
│  Dashboard  │ │  │  (Stateless)     │←─────│ (Source of   │
└─────────────┘ │  └──────────────────┘      │   Truth)     │
                │       ↕ Redis                └──────────────┘
┌─────────────┐ │    (Caching Only)
│  Partner    │─┘
│     API     │    X-Realm-Name: partner-realm
└─────────────┘    X-Client-Id: partner-client

Notice what's not in this diagram: there's no database, no admin interface, no configuration service. The auth service is genuinely stateless—it can be scaled up or down instantly without any coordination.

Real-World Usage: How It Works

Let's see how different applications use the same service instance:

Mobile App Login

curl -X POST https://auth.example.com/api/v1/auth/login \
  -H "X-Realm-Name: mobile-realm" \
  -H "X-Client-Id: mobile-app" \
  -H "X-Client-Secret: mobile-secret-xyz" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "user@example.com",
    "password": "secure-password"
  }'

Web Dashboard Login (Same Service!)

curl -X POST https://auth.example.com/api/v1/auth/login \
  -H "X-Realm-Name: company-realm" \
  -H "X-Client-Id: web-dashboard" \
  -H "X-Client-Secret: web-secret-abc" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "admin@company.com",
    "password": "admin-password"
  }'

Same endpoint, same service instance, different realms—no configuration needed.

Architecture Deep Dive

1. The Stateless Keycloak Client

The heart of the service is a Keycloak client that builds request contexts dynamically:

type KeycloakClient struct {
    baseURL     string
    httpClient  *http.Client
    cache       cache.Cache
    tracer      trace.Tracer
    // Note: NO realm, NO clientID stored here
}

func (kc *KeycloakClient) Login(ctx context.Context, req LoginRequest) (*TokenResponse, error) {
    // Build realm-specific URL from request parameters
    tokenURL := fmt.Sprintf("%s/realms/%s/protocol/openid-connect/token",
        kc.baseURL,
        req.RealmName,
    )

    // Use client credentials from request (not from config)
    params := url.Values{
        "client_id":     {req.ClientID},
        "client_secret": {req.ClientSecret},
        "username":      {req.Username},
        "password":      {req.Password},
        "grant_type":    {"password"},
    }

    // Execute request with tracing context; the realm is passed separately
    // because it belongs in the URL and the trace, not the form body
    return kc.makeRequest(ctx, req.RealmName, tokenURL, params)
}

2. Intelligent Caching Strategy

Even though the service is stateless, we still use Redis for performance, not for state management:

// Cache key includes realm context (in practice, also fold a hash of the
// credentials into the key so a request with a wrong password can never
// be served a cached token)
cacheKey := fmt.Sprintf("token:%s:%s:%s",
    req.RealmName,
    req.ClientID,
    req.Username,
)

// Check cache first
if cachedToken, err := kc.cache.Get(ctx, cacheKey); err == nil {
    return cachedToken, nil
}

// Cache miss: fetch from Keycloak
token, err := kc.fetchFromKeycloak(ctx, req)
if err != nil {
    return nil, err
}

// Cache with TTL slightly less than token expiry; skip caching tokens
// too short-lived for the safety margin
if ttl := time.Duration(token.ExpiresIn)*time.Second - 30*time.Second; ttl > 0 {
    kc.cache.Set(ctx, cacheKey, token, ttl)
}

Important distinction: The cache is an optimization, not a requirement. If Redis goes down, the service continues to function—it just makes more calls to Keycloak.

3. Observability: Realm-Aware Tracing

OpenTelemetry tracing automatically includes realm context:

func (kc *KeycloakClient) makeRequest(ctx context.Context, realm, tokenURL string, params url.Values) (*TokenResponse, error) {
    ctx, span := kc.tracer.Start(ctx, "keycloak.request",
        trace.WithAttributes(
            // the realm is not part of the form body sent to Keycloak,
            // so it is passed in explicitly rather than read from params
            attribute.String("realm.name", realm),
            attribute.String("client.id", params.Get("client_id")),
            attribute.String("operation", "token"),
        ),
    )
    defer span.End()

    start := time.Now()

    // Build the HTTP request with the propagated trace context
    httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, tokenURL, strings.NewReader(params.Encode()))
    if err != nil {
        return nil, err
    }
    httpReq.Header.Set("Content-Type", "application/x-www-form-urlencoded")

    resp, err := kc.httpClient.Do(httpReq)

    // Record metrics by realm
    metrics.RecordRequestDuration(ctx, time.Since(start),
        "realm", realm,
        "client", params.Get("client_id"),
    )
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    var token TokenResponse
    if err := json.NewDecoder(resp.Body).Decode(&token); err != nil {
        return nil, err
    }
    return &token, nil
}

This gives you distributed tracing across realm boundaries in tools like Jaeger:

HTTP POST /api/v1/auth/login [realm=mobile-realm, client=mobile-app] (120ms)
  ├─ Cache.Get [key=token:mobile-realm:mobile-app:user] (2ms) MISS
  ├─ Keycloak.GetToken [realm=mobile-realm] (95ms)
  └─ Cache.Set [ttl=3570s] (3ms)

Production Readiness: The Complete Package

Dual Interface: gRPC + REST

// Same business logic, two interfaces
type Server struct {
    keycloakClient *keycloak.Client
    cache          cache.Cache
    metrics        *metrics.Collector
}

// gRPC endpoint
func (s *Server) Login(ctx context.Context, req *pb.LoginRequest) (*pb.LoginResponse, error) {
    token, err := s.keycloakClient.Login(ctx, toLoginRequest(req))
    if err != nil {
        return nil, err
    }
    return toLoginResponse(token), nil // convert back to the protobuf type
}

// HTTP endpoint (Gin handler)
func (s *Server) HandleLogin(c *gin.Context) {
    var req LoginRequest

    // Extract realm/client from headers
    req.RealmName = c.GetHeader("X-Realm-Name")
    req.ClientID = c.GetHeader("X-Client-Id")
    req.ClientSecret = c.GetHeader("X-Client-Secret")

    if err := c.ShouldBindJSON(&req); err != nil {
        c.JSON(400, gin.H{"error": "invalid request"})
        return
    }

    token, err := s.keycloakClient.Login(c.Request.Context(), req)
    if err != nil {
        c.JSON(401, gin.H{"error": "authentication failed"})
        return
    }

    c.JSON(200, token)
}

Health Checks with Dependency Monitoring

type HealthChecker struct {
    keycloakClient *keycloak.Client
    cache          cache.Cache
    // lastCheck/cachedStatus/mu allow Check results to be memoized briefly
    // so frequent probes don't hammer Keycloak (memoization omitted here)
    lastCheck      time.Time
    cachedStatus   *HealthStatus
    mu             sync.RWMutex
}

func (h *HealthChecker) Check(ctx context.Context) *HealthStatus {
    status := &HealthStatus{
        Service: "healthy",
        Dependencies: make(map[string]DependencyStatus),
    }

    // Check Keycloak (source of truth)
    if err := h.keycloakClient.HealthCheck(ctx); err != nil {
        status.Dependencies["keycloak"] = DependencyStatus{
            Status:  "unhealthy",
            Message: err.Error(),
        }
        status.Service = "degraded"
    } else {
        status.Dependencies["keycloak"] = DependencyStatus{
            Status: "healthy",
        }
    }

    // Check Redis (optional dependency)
    if err := h.cache.Ping(ctx); err != nil {
        status.Dependencies["redis"] = DependencyStatus{
            Status:  "unhealthy",
            Message: "cache unavailable (service will continue without caching)",
        }
        // Note: Service remains healthy even if cache is down
    } else {
        status.Dependencies["redis"] = DependencyStatus{
            Status: "healthy",
        }
    }

    return status
}

Graceful Shutdown

func main() {
    // Setup servers
    httpServer := setupHTTPServer()
    grpcServer := setupGRPCServer()

    // gRPC needs its own listener (:9090 is an arbitrary choice)
    listener, err := net.Listen("tcp", ":9090")
    if err != nil {
        log.Fatal("gRPC listen error:", err)
    }

    // Graceful shutdown handling
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, os.Interrupt, syscall.SIGTERM)

    go func() {
        if err := httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatal("HTTP server error:", err)
        }
    }()

    go func() {
        if err := grpcServer.Serve(listener); err != nil {
            log.Fatal("gRPC server error:", err)
        }
    }()

    <-stop
    log.Println("Shutting down gracefully...")

    // Give in-flight requests time to complete
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    httpServer.Shutdown(ctx)
    grpcServer.GracefulStop()

    log.Println("Servers stopped")
}

Performance Characteristics

Horizontal Scalability

Because the service is truly stateless:

Load Balancer
     ↓
┌────┴────┐
│  Pod 1  │  ← Can scale from 1 to 100 pods instantly
├─────────┤
│  Pod 2  │  ← No coordination needed between pods
├─────────┤
│  Pod 3  │  ← No sticky sessions required
└─────────┘
     ↓
  Keycloak

Caching Strategy

Request Flow with Cache:

1. Request arrives → Check cache (2ms)
   ├─ Hit (90% of requests) → Return cached token (total: 2ms)
   └─ Miss (10% of requests) → Fetch from Keycloak (100ms) → Cache (3ms) (total: 103ms)

Result: Average response time = (0.9 × 2ms) + (0.1 × 103ms) = 12.1ms

Load Testing Results

# 12 threads, 1000 concurrent connections, 30-second run
wrk -t12 -c1000 -d30s --latency \
  -H "X-Realm-Name: test-realm" \
  -H "X-Client-Id: test-client" \
  -H "X-Client-Secret: secret" \
  http://localhost:8080/api/v1/auth/login

# Results:
Latency Distribution
  50%    11ms
  75%    15ms
  90%    22ms
  99%    45ms

Requests/sec: 8,234
Transfer/sec: 2.1MB

When to Use This Pattern

✅ Perfect For:

  • Multi-tenant platforms where each tenant has its own Keycloak realm
  • Microservices architectures where services need to scale independently
  • Cloud-native deployments where you want instant scalability
  • Cost-sensitive environments where reducing infrastructure is important
  • High-availability requirements where eliminating single points of failure matters

❌ Not Ideal For:

  • Single-realm applications (though it still works, you're adding complexity without benefit)
  • Services that need complex user data beyond what Keycloak provides (at that point, you probably need a user service)
  • Scenarios with extremely high request rates where even Redis latency is too much (consider in-memory caching with careful cache coherency strategies)

The Trade-offs: What You're Really Giving Up

Let's be honest about the constraints:

1. Client Secret in Headers

Trade-off: Client secrets travel in headers on every request, so any hop that terminates TLS can read them, and there's more sensitive data on the wire.

Mitigation:

  • Use TLS everywhere (you should anyway)
  • Client secrets aren't user passwords—they're app credentials
  • Consider header compression in your load balancer

2. No Custom User Metadata

Trade-off: You can't easily store custom user metadata beyond what Keycloak supports.

Mitigation:

  • Use Keycloak's user attributes (they're quite flexible)
  • If you need complex user profiles, build a separate user service
  • This service focuses on authentication, not user management

3. Trust in Keycloak

Trade-off: Keycloak becomes a critical dependency.

Mitigation:

  • Deploy Keycloak in HA mode (you should anyway)
  • The service continues to work with cached tokens even if Keycloak has brief outages
  • Monitor Keycloak health and set up proper alerting

Getting Started

Prerequisites

# Required
- Go 1.25+
- Redis (for caching)
- Keycloak server

# Optional (for observability)
- Jaeger or any OTLP-compatible collector
- Prometheus
- Grafana

Quick Start

# Clone the repo
git clone https://github.com/laithalenooz/auth-service-go
cd auth-service-go

# Set up environment
cp .env.example .env
# Edit .env with your Keycloak details

# Start with Docker Compose (includes Redis, Keycloak, Jaeger, Prometheus, Grafana)
docker-compose up -d

# Run the service
make run

Your First Request

# Create a user in Keycloak's master realm
curl -X POST http://localhost:8080/api/v1/auth/register \
  -H "X-Realm-Name: master" \
  -H "X-Client-Id: auth-service" \
  -H "X-Client-Secret: your-client-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "testuser",
    "email": "test@example.com",
    "password": "password123",
    "first_name": "Test",
    "last_name": "User"
  }'

# Login
curl -X POST http://localhost:8080/api/v1/auth/login \
  -H "X-Realm-Name: master" \
  -H "X-Client-Id: auth-service" \
  -H "X-Client-Secret: your-client-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "username": "testuser",
    "password": "password123"
  }'

Observability in Action

Jaeger Traces

Visit http://localhost:16686 to see distributed traces:

Each trace shows:

  • Which realm was accessed
  • Which client made the request
  • Cache hit/miss
  • Keycloak response time
  • Total request duration

Prometheus Metrics

# Authentication success rate by realm
sum(rate(auth_login_success_total[5m])) by (realm_name, client_id)

# Cache hit rate
sum(rate(cache_hits_total[5m])) / sum(rate(cache_requests_total[5m]))

# Request latency by realm (p95)
histogram_quantile(0.95, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (realm_name, le)
)

Grafana Dashboards

The project includes pre-built dashboards showing:

  • Request rate and latency by realm
  • Authentication success/failure rates
  • Cache performance
  • Service health status
  • Keycloak response times

What I Learned Building This

1. Statelessness is Liberating

Once you stop trying to maintain state, a lot of complexity disappears. No database migrations, no cache coherency issues, no distributed locks. The service becomes a pure function: requests go in, responses come out.

2. Domain Modeling Matters More Than Technology

The decision to treat realm and client as request parameters wasn't primarily a technical decision—it was a domain modeling insight. Understanding that "realm context" belongs to the request, not the server, simplified everything else.

3. Observability Isn't Optional

In a stateless system where each request might hit a different realm, comprehensive tracing is the only way to debug issues. OpenTelemetry tracing paid for itself within the first week.

4. Caching for Performance, Not Correctness

Using Redis as a pure performance optimization (not a source of truth) means you can reason about the system with or without cache. This makes testing easier and reduces the blast radius when something goes wrong.

Conclusion: Rethinking Authentication Layers

Most authentication services are designed around the assumption that they need to "know" about their users and clients. By inverting this—making the caller provide the context—we eliminate entire classes of complexity:

  • No configuration management
  • No database to maintain
  • No synchronization between instances
  • No limits on horizontal scaling

The result is a service that does one thing well: intelligently proxy authentication requests to Keycloak, with caching and observability included.

If you're building a multi-tenant platform or a microservices architecture, consider whether your authentication layer actually needs to store anything. You might be able to delete more code than you write.


GitHub: laithalenooz/auth-service-go

Tech Stack: Go, gRPC, Gin, Redis, Keycloak, OpenTelemetry, Prometheus

Found this useful? Star the repo and let me know what you think in the comments!

Questions? I'm happy to discuss the architecture, trade-offs, or help with implementation details.
