The Problem: Why Most Auth Services Are Tightly Coupled
Picture this: you're building a platform that serves multiple applications—a mobile app, a web dashboard, and maybe a partner API. Each needs its own authentication realm in Keycloak, but your auth service is hardcoded to a single realm. To support multiple apps, you'd typically:
- Deploy separate auth services for each realm (wasteful)
- Store realm configurations in a database (unnecessary complexity)
- Build an admin UI to manage realm mappings (even more code to maintain)
What if I told you there's a better way? One that requires zero database, zero configuration management, and scales horizontally without any server-side state?
The Insight: Realm and Client as Request Parameters
Think of your auth service like a translator at the UN. The translator doesn't memorize which language each delegate speaks; the delegate makes it clear the moment they start talking. Similarly, your auth service doesn't need to store which realm each application uses; the application declares it with every request.
This is the core insight behind the stateless multi-realm architecture: treat realm and client credentials as request-scoped parameters, not server configuration.
// Traditional approach: hardcoded configuration
type AuthService struct {
keycloakURL string
realm string // ❌ Server-wide configuration
clientID string // ❌ Fixed at startup
clientSecret string // ❌ All apps share same client
}
// Stateless approach: per-request parameters
type LoginRequest struct {
RealmName string `header:"X-Realm-Name"` // ✅ Dynamic per request
ClientID string `header:"X-Client-Id"` // ✅ App specifies its client
ClientSecret string `header:"X-Client-Secret"` // ✅ No shared secrets
Username string `json:"username"`
Password string `json:"password"`
}
Domain Modeling: Making Realm a First-Class Citizen
The shift from server configuration to request parameters is more than a technical detail—it's a domain modeling decision that fundamentally changes how your service operates.
The Gateway Pattern
Instead of maintaining user state or realm mappings, the service becomes a pure gateway:
┌─────────────┐
│ Mobile │─┐
│ App │ │ X-Realm-Name: mobile-realm
└─────────────┘ │ X-Client-Id: mobile-app
│
┌─────────────┐ │ ┌──────────────────┐ ┌──────────────┐
│ Web │─┼─→│ Auth Service │─────→│ Keycloak │
│ Dashboard │ │ │ (Stateless) │←─────│ (Source of │
└─────────────┘ │ └──────────────────┘ │ Truth) │
│ ↕ Redis └──────────────┘
┌─────────────┐ │ (Caching Only)
│ Partner │─┘
│ API │ X-Realm-Name: partner-realm
└─────────────┘ X-Client-Id: partner-client
Notice what's not in this diagram: there's no database, no admin interface, no configuration service. The auth service is genuinely stateless—it can be scaled up or down instantly without any coordination.
Real-World Usage: How It Works
Let's see how different applications use the same service instance:
Mobile App Login
curl -X POST https://auth.example.com/api/v1/auth/login \
-H "X-Realm-Name: mobile-realm" \
-H "X-Client-Id: mobile-app" \
-H "X-Client-Secret: mobile-secret-xyz" \
-H "Content-Type: application/json" \
-d '{
"username": "user@example.com",
"password": "secure-password"
}'
Web Dashboard Login (Same Service!)
curl -X POST https://auth.example.com/api/v1/auth/login \
-H "X-Realm-Name: company-realm" \
-H "X-Client-Id: web-dashboard" \
-H "X-Client-Secret: web-secret-abc" \
-H "Content-Type: application/json" \
-d '{
"username": "admin@company.com",
"password": "admin-password"
}'
Same endpoint, same service instance, different realms—no configuration needed.
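From Go, the same per-request scoping is just three headers on an ordinary HTTP request. A minimal client-side sketch (`newLoginRequest` is illustrative; the endpoint and header names match the curl examples above):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newLoginRequest builds a login call against the shared auth endpoint,
// scoping it to a realm/client pair purely through request headers.
func newLoginRequest(baseURL, realm, clientID, clientSecret, username, password string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{
		"username": username,
		"password": password,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/auth/login", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Realm and client travel with the request, not in server configuration
	req.Header.Set("X-Realm-Name", realm)
	req.Header.Set("X-Client-Id", clientID)
	req.Header.Set("X-Client-Secret", clientSecret)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := newLoginRequest("https://auth.example.com",
		"mobile-realm", "mobile-app", "mobile-secret-xyz",
		"user@example.com", "secure-password")
	fmt.Println(req.Header.Get("X-Realm-Name")) // mobile-realm
}
```

Switching to the partner realm is only a change of arguments, which is the whole point: the client code, not server config, carries the realm context.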
Architecture Deep Dive
1. The Stateless Keycloak Client
The heart of the service is a Keycloak client that builds request contexts dynamically:
type KeycloakClient struct {
	baseURL    string
	httpClient *http.Client
	cache      cache.Cache
	tracer     trace.Tracer
	// Note: NO realm, NO clientID stored here
}
func (kc *KeycloakClient) Login(ctx context.Context, req LoginRequest) (*TokenResponse, error) {
// Build realm-specific URL from request parameters
tokenURL := fmt.Sprintf("%s/realms/%s/protocol/openid-connect/token",
kc.baseURL,
req.RealmName,
)
// Use client credentials from request (not from config)
params := url.Values{
"client_id": {req.ClientID},
"client_secret": {req.ClientSecret},
"username": {req.Username},
"password": {req.Password},
"grant_type": {"password"},
}
// Execute request with tracing context
return kc.makeRequest(ctx, tokenURL, params)
}
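One caveat worth noting: `RealmName` arrives in an untrusted header, and splicing it straight into the URL invites path tricks like `../`. A sketch of a safer builder (`buildTokenURL` is illustrative, not the repo's actual helper):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildTokenURL constructs the realm-scoped token endpoint. The realm name
// comes from a client-supplied header, so it is path-escaped to stop values
// like "../other" from steering the request to a different path.
func buildTokenURL(baseURL, realm string) string {
	return fmt.Sprintf("%s/realms/%s/protocol/openid-connect/token",
		baseURL, url.PathEscape(realm))
}

func main() {
	fmt.Println(buildTokenURL("https://kc.example.com", "mobile-realm"))
	// https://kc.example.com/realms/mobile-realm/protocol/openid-connect/token
}
```

`url.PathEscape` turns any embedded `/` into `%2F`, so a hostile realm name can never escape its path segment.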
2. Intelligent Caching Strategy
Even though the service is stateless, we still use Redis for performance, not for state management:
// Cache key includes realm context
cacheKey := fmt.Sprintf("token:%s:%s:%s",
req.RealmName,
req.ClientID,
req.Username,
)
// Check cache first
if cachedToken, err := kc.cache.Get(ctx, cacheKey); err == nil {
return cachedToken, nil
}
// Cache miss: fetch from Keycloak
token, err := kc.fetchFromKeycloak(ctx, req)
if err != nil {
return nil, err
}
// Cache with TTL slightly less than token expiry
ttl := time.Duration(token.ExpiresIn) * time.Second - 30*time.Second
kc.cache.Set(ctx, cacheKey, token, ttl)
Important distinction: The cache is an optimization, not a requirement. If Redis goes down, the service continues to function—it just makes more calls to Keycloak.
3. Observability: Realm-Aware Tracing
OpenTelemetry tracing automatically includes realm context:
func (kc *KeycloakClient) makeRequest(ctx context.Context, tokenURL string, params url.Values) (*TokenResponse, error) {
	// The realm lives in the URL path (.../realms/<realm>/protocol/...),
	// not in params, so recover it from there for tagging
	realm := ""
	if parts := strings.SplitN(tokenURL, "/realms/", 2); len(parts) == 2 {
		realm = strings.SplitN(parts[1], "/", 2)[0]
	}

	ctx, span := kc.tracer.Start(ctx, "keycloak.request",
		trace.WithAttributes(
			attribute.String("realm.name", realm),
			attribute.String("client.id", params.Get("client_id")),
			attribute.String("operation", "token"),
		),
	)
	defer span.End()
	start := time.Now()

	// Build the HTTP request with the propagated tracing context
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, tokenURL,
		strings.NewReader(params.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	resp, err := kc.httpClient.Do(req)

	// Record metrics by realm, whether or not the call succeeded
	metrics.RecordRequestDuration(ctx, time.Since(start),
		"realm", realm,
		"client", params.Get("client_id"),
	)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return decodeTokenResponse(resp.Body) // JSON-decode into a *TokenResponse
}
This gives you distributed tracing across realm boundaries in tools like Jaeger:
HTTP POST /api/v1/auth/login [realm=mobile-realm, client=mobile-app] (120ms)
├─ Cache.Get [key=token:mobile-realm:mobile-app:user] (2ms) MISS
├─ Keycloak.GetToken [realm=mobile-realm] (95ms)
└─ Cache.Set [ttl=3570s] (3ms)
Production Readiness: The Complete Package
Dual Interface: gRPC + REST
// Same business logic, two interfaces
type Server struct {
keycloakClient *keycloak.Client
cache cache.Cache
metrics *metrics.Collector
}
// gRPC endpoint
func (s *Server) Login(ctx context.Context, req *pb.LoginRequest) (*pb.LoginResponse, error) {
	token, err := s.keycloakClient.Login(ctx, toLoginRequest(req))
	if err != nil {
		return nil, err
	}
	return toLoginResponse(token), nil // map the token onto the proto response
}
// HTTP endpoint (Gin handler)
func (s *Server) HandleLogin(c *gin.Context) {
	var req LoginRequest

	// Extract realm/client context from headers
	req.RealmName = c.GetHeader("X-Realm-Name")
	req.ClientID = c.GetHeader("X-Client-Id")
	req.ClientSecret = c.GetHeader("X-Client-Secret")
	if req.RealmName == "" || req.ClientID == "" {
		c.JSON(400, gin.H{"error": "missing realm or client headers"})
		return
	}

	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(400, gin.H{"error": "invalid request"})
		return
	}

	token, err := s.keycloakClient.Login(c.Request.Context(), req)
	if err != nil {
		c.JSON(401, gin.H{"error": "authentication failed"})
		return
	}
	c.JSON(200, token)
}
Health Checks with Dependency Monitoring
type HealthChecker struct {
keycloakClient *keycloak.Client
cache cache.Cache
lastCheck time.Time
cachedStatus *HealthStatus
mu sync.RWMutex
}
func (h *HealthChecker) Check(ctx context.Context) *HealthStatus {
status := &HealthStatus{
Service: "healthy",
Dependencies: make(map[string]DependencyStatus),
}
// Check Keycloak (source of truth)
if err := h.keycloakClient.HealthCheck(ctx); err != nil {
status.Dependencies["keycloak"] = DependencyStatus{
Status: "unhealthy",
Message: err.Error(),
}
status.Service = "degraded"
} else {
status.Dependencies["keycloak"] = DependencyStatus{
Status: "healthy",
}
}
// Check Redis (optional dependency)
if err := h.cache.Ping(ctx); err != nil {
status.Dependencies["redis"] = DependencyStatus{
Status: "unhealthy",
Message: "cache unavailable (service will continue without caching)",
}
// Note: Service remains healthy even if cache is down
} else {
status.Dependencies["redis"] = DependencyStatus{
Status: "healthy",
}
}
return status
}
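The dependency policy baked into `Check` (Keycloak is required, Redis is optional) can be distilled into a tiny pure function, which also makes the rule trivially testable (`overallStatus` is an illustrative sketch, not part of the repo):

```go
package main

import "fmt"

// overallStatus encodes the dependency policy: Keycloak is required, so its
// failure degrades the service; Redis is a pure optimization, so its health
// is reported but never affects the overall status.
func overallStatus(keycloakOK, redisOK bool) string {
	if !keycloakOK {
		return "degraded"
	}
	return "healthy"
}

func main() {
	fmt.Println(overallStatus(true, false)) // healthy
	fmt.Println(overallStatus(false, true)) // degraded
}
```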
Graceful Shutdown
func main() {
// Setup servers
httpServer := setupHTTPServer()
grpcServer := setupGRPCServer()
// Graceful shutdown handling
stop := make(chan os.Signal, 1)
signal.Notify(stop, os.Interrupt, syscall.SIGTERM)
go func() {
if err := httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
log.Fatal("HTTP server error:", err)
}
}()
go func() {
if err := grpcServer.Serve(listener); err != nil {
log.Fatal("gRPC server error:", err)
}
}()
<-stop
log.Println("Shutting down gracefully...")
// Give in-flight requests time to complete
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
httpServer.Shutdown(ctx)
grpcServer.GracefulStop()
log.Println("Servers stopped")
}
Performance Characteristics
Horizontal Scalability
Because the service is truly stateless:
Load Balancer
↓
┌────┴────┐
│ Pod 1 │ ← Can scale from 1 to 100 pods instantly
├─────────┤
│ Pod 2 │ ← No coordination needed between pods
├─────────┤
│ Pod 3 │ ← No sticky sessions required
└─────────┘
↓
Keycloak
Caching Strategy
Request Flow with Cache:
1. Request arrives → Check cache (2ms)
├─ Hit (90% of requests) → Return cached token (total: 2ms)
└─ Miss (10% of requests) → Fetch from Keycloak (100ms) → Cache (3ms) (total: 103ms)
Result: Average response time = (0.9 × 2ms) + (0.1 × 103ms) = 12.1ms
Load Testing Results
# 1000 concurrent connections for 30 seconds
wrk -t12 -c1000 -d30s --latency \
-H "X-Realm-Name: test-realm" \
-H "X-Client-Id: test-client" \
-H "X-Client-Secret: secret" \
http://localhost:8080/api/v1/auth/login
# Results:
Latency Distribution
50% 11ms
75% 15ms
90% 22ms
99% 45ms
Requests/sec: 8,234
Transfer/sec: 2.1MB
When to Use This Pattern
✅ Perfect For:
- Multi-tenant platforms where each tenant has its own Keycloak realm
- Microservices architectures where services need to scale independently
- Cloud-native deployments where you want instant scalability
- Cost-sensitive environments where reducing infrastructure is important
- High-availability requirements where eliminating single points of failure matters
❌ Not Ideal For:
- Single-realm applications (though it still works, you're adding complexity without benefit)
- Services that need complex user data beyond what Keycloak provides (at that point, you probably need a user service)
- Scenarios with extremely high request rates where even Redis latency is too much (consider in-memory caching with careful cache coherency strategies)
The Trade-offs: What You're Really Giving Up
Let's be honest about the constraints:
1. Client Secret in Headers
Trade-off: Sending client secrets in request headers means the secret crosses the wire on every single call.
Mitigation:
- Use TLS everywhere (you should anyway)
- Client secrets aren't user passwords—they're app credentials
- Consider header compression in your load balancer
2. No Custom User Metadata
Trade-off: You can't easily store custom user metadata beyond what Keycloak supports.
Mitigation:
- Use Keycloak's user attributes (they're quite flexible)
- If you need complex user profiles, build a separate user service
- This service focuses on authentication, not user management
3. Trust in Keycloak
Trade-off: Keycloak becomes a critical dependency.
Mitigation:
- Deploy Keycloak in HA mode (you should anyway)
- The service continues to work with cached tokens even if Keycloak has brief outages
- Monitor Keycloak health and set up proper alerting
Getting Started
Prerequisites
# Required
- Go 1.25+
- Redis (for caching)
- Keycloak server
# Optional (for observability)
- Jaeger or any OTLP-compatible collector
- Prometheus
- Grafana
Quick Start
# Clone the repo
git clone https://github.com/laithalenooz/auth-service-go
cd auth-service-go
# Set up environment
cp .env.example .env
# Edit .env with your Keycloak details
# Start with Docker Compose (includes Redis, Keycloak, Jaeger, Prometheus, Grafana)
docker-compose up -d
# Run the service
make run
Your First Request
# Create a user in Keycloak's master realm
curl -X POST http://localhost:8080/api/v1/auth/register \
-H "X-Realm-Name: master" \
-H "X-Client-Id: auth-service" \
-H "X-Client-Secret: your-client-secret" \
-H "Content-Type: application/json" \
-d '{
"username": "testuser",
"email": "test@example.com",
"password": "password123",
"first_name": "Test",
"last_name": "User"
}'
# Login
curl -X POST http://localhost:8080/api/v1/auth/login \
-H "X-Realm-Name: master" \
-H "X-Client-Id: auth-service" \
-H "X-Client-Secret: your-client-secret" \
-H "Content-Type: application/json" \
-d '{
"username": "testuser",
"password": "password123"
}'
Observability in Action
Jaeger Traces
Visit http://localhost:16686 to explore the distributed traces.
Each trace shows:
- Which realm was accessed
- Which client made the request
- Cache hit/miss
- Keycloak response time
- Total request duration
Prometheus Metrics
# Authentication success rate by realm
sum(rate(auth_login_success_total[5m])) by (realm_name, client_id)
# Cache hit rate
sum(rate(cache_hits_total[5m])) / sum(rate(cache_requests_total[5m]))
# Request latency by realm (p95)
histogram_quantile(0.95,
sum(rate(http_request_duration_seconds_bucket[5m])) by (realm_name, le)
)
Grafana Dashboards
The project includes pre-built dashboards showing:
- Request rate and latency by realm
- Authentication success/failure rates
- Cache performance
- Service health status
- Keycloak response times
What I Learned Building This
1. Statelessness is Liberating
Once you stop trying to maintain state, a lot of complexity disappears. No database migrations, no cache coherency issues, no distributed locks. The service becomes a pure function: requests go in, responses come out.
2. Domain Modeling Matters More Than Technology
The decision to treat realm and client as request parameters wasn't primarily a technical decision—it was a domain modeling insight. Understanding that "realm context" belongs to the request, not the server, simplified everything else.
3. Observability Isn't Optional
In a stateless system where each request might hit a different realm, comprehensive tracing is the only way to debug issues. OpenTelemetry tracing paid for itself within the first week.
4. Caching for Performance, Not Correctness
Using Redis as a pure performance optimization (not a source of truth) means you can reason about the system with or without cache. This makes testing easier and reduces the blast radius when something goes wrong.
Conclusion: Rethinking Authentication Layers
Most authentication services are designed around the assumption that they need to "know" about their users and clients. By inverting this—making the caller provide the context—we eliminate entire classes of complexity:
- No configuration management
- No database to maintain
- No synchronization between instances
- No limits on horizontal scaling
The result is a service that does one thing well: intelligently proxy authentication requests to Keycloak, with caching and observability included.
If you're building a multi-tenant platform or a microservices architecture, consider whether your authentication layer actually needs to store anything. You might be able to delete more code than you write.
GitHub: laithalenooz/auth-service-go
Tech Stack: Go, gRPC, Gin, Redis, Keycloak, OpenTelemetry, Prometheus
Found this useful? Star the repo and let me know what you think in the comments!
Questions? I'm happy to discuss the architecture, trade-offs, or help with implementation details.