We operate a backend API stack on GCP. Currently:
Varnish handles caching and Source/IP authentication. (IP refers to the infrastructure servers’ IPs, while Source is a URL parameter passed in API calls that identifies the property.)
Both Varnish and the backend run on the same GCP server.
As we scale for resilience and performance, we need to implement:
Authentication & Validation
Route requests to the correct backend.
Enforce API key + Source/IP-based auth.
Validate identities via parameters:
Source (URL param, used in backend calls)
IP (frontend server IPs)
USR-ID (end-user ID, from URL param)
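
A minimal sketch of how the combined API key + Source/IP check could look, in Python. The `ALLOWED` table, the `authenticate` function, and the sample key and network values are all hypothetical, made up for illustration; they are not taken from an existing Varnish or gateway config.

```python
import hmac
from ipaddress import ip_address, ip_network

# Hypothetical allow-list: each Source maps to its API key and the
# frontend server networks permitted to call on its behalf.
ALLOWED = {
    "property-a": {"api_key": "key-a", "networks": [ip_network("10.0.0.0/24")]},
}


def authenticate(source, client_ip, api_key, usr_id):
    """Return (ok, reason) for one request's Source, IP, API key and USR-ID."""
    entry = ALLOWED.get(source)
    if entry is None:
        return False, "unknown source"
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(api_key.encode(), entry["api_key"].encode()):
        return False, "bad api key"
    try:
        addr = ip_address(client_ip)
    except ValueError:
        return False, "malformed ip"
    if not any(addr in net for net in entry["networks"]):
        return False, "ip not allowed for source"
    if not usr_id:
        return False, "missing usr-id"
    return True, "ok"
```

Whether this logic lives in a gateway or in VCL, the point is the same: Source selects the policy, and the IP and key are validated against that policy rather than globally.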
Rate Limiting
Based on Source + IP + USR-ID.
Alert at x% of the threshold. Block at y% (API Gateway) or z% (Varnish).
Configurable block durations.
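
To make the alert/block thresholds concrete, here is a rough sketch of a fixed-window limiter keyed on Source + IP + USR-ID. It is an in-memory, single-process illustration only; the class name, parameters, and the choice of a fixed window are assumptions. A real deployment would need shared state across servers (for example an external store).

```python
import time


class RateLimiter:
    """Fixed-window limiter with an alert threshold and a timed block."""

    def __init__(self, limit, window_s, alert_pct, block_pct, block_s,
                 clock=time.monotonic):
        self.limit = limit          # requests allowed per window at 100%
        self.window_s = window_s    # window length in seconds
        self.alert_pct = alert_pct  # "x%" in the requirements
        self.block_pct = block_pct  # "y%" or "z%" in the requirements
        self.block_s = block_s      # configurable block duration
        self.clock = clock
        self._counts = {}           # key -> (window_start, count)
        self._blocked_until = {}    # key -> time at which the block expires

    def check(self, source, ip, usr_id):
        """Count one request and return "allow", "alert" or "block"."""
        key = (source, ip, usr_id)
        now = self.clock()
        if key in self._blocked_until and self._blocked_until[key] > now:
            return "block"
        start, count = self._counts.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0   # previous window expired, start a new one
        count += 1
        self._counts[key] = (start, count)
        usage = 100.0 * count / self.limit
        if usage >= self.block_pct:
            self._blocked_until[key] = now + self.block_s
            return "block"
        if usage >= self.alert_pct:
            return "alert"
        return "allow"
```

With `limit=10`, `alert_pct=80` and `block_pct=100`, the eighth request in a window returns `"alert"` and the tenth returns `"block"` for the next `block_s` seconds.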
Circuit Breaking / Backend Protection
Stop routing on backend failure.
Note: In Varnish, we use a heartbeat health-check mechanism that is similar to a circuit breaker but is not a true implementation.
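
For contrast with the heartbeat health check, a true circuit breaker reacts to failures of real requests and moves between closed, open and half-open states. The sketch below is a simplified illustration; the class, its defaults, and the fact that half-open lets every request through (rather than a single probe) are assumptions made to keep it short.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a retry after a cooldown."""

    def __init__(self, failure_threshold=3, reset_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_s = reset_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_s:
            return "half-open"  # cooldown elapsed, trial traffic is allowed
        return "open"

    def allow_request(self):
        """Stop routing to the backend only while the circuit is open."""
        return self.state != "open"

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        # A failure during half-open, or too many failures while closed,
        # (re)opens the circuit and restarts the cooldown.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

The caller wraps each backend call: check `allow_request()` first, then report the outcome with `record_success()` or `record_failure()`.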
Logging & Observability
Track rate-limit breaches, blocks, circuit-breaking events, and request metadata.
Alerts on abnormal traffic or backend failures.
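
One way to keep these events queryable regardless of which option wins is to emit them as structured JSON log lines. The event kinds, field names, and logger name below are hypothetical, chosen only to illustrate the shape.

```python
import json
import logging
import time

logger = logging.getLogger("edge.events")  # hypothetical logger name

# Hypothetical event kinds covering the items listed above.
EVENT_KINDS = {"request", "rate_limit_alert", "rate_limit_block",
               "circuit_open", "circuit_closed"}


def build_event(kind, source, ip, usr_id, **detail):
    """Assemble one event with the identity fields used for auth and limits."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    return {
        "ts": time.time(),
        "kind": kind,
        "source": source,
        "ip": ip,
        "usr_id": usr_id,
        "detail": detail,  # free-form metadata, e.g. usage_pct or backend name
    }


def emit(event):
    """Serialise the event as one JSON line, log it, and return the line."""
    line = json.dumps(event, sort_keys=True)
    logger.info(line)
    return line
```

Alerting can then be a query over `kind`, for example counting `rate_limit_block` events per Source over a time window.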
Options We’re Considering
- API Gateway (New Development)
  - Central entry point for traffic.
  - Handles auth, routing, rate limiting, logging, and observability.
  - Centralised logic → easier management as APIs grow.
  - Adds an extra hop, but increases visibility + maintainability.
- Enhanced Varnish (Current, with Modifications)
  - Already deployed per server.
  - Would need manual updates for: rate limiting (Source/IP/USR-ID), logging, and backend protection.
  - No true circuit breaker, but can serve stale cache or block during backend downtime.
Key Questions
Centralisation vs Distribution: Is it better to centralise controls in an API Gateway, or to enhance Varnish on each server?
Performance & Maintenance: Does the cost of an API Gateway’s extra hop outweigh its benefits in observability and control?
Scaling with Varnish: How do you avoid config drift and manage scaling in a Varnish-based setup?
Deployment Topology: Should Varnish and the backend run on the same GCP server, or be separated for resilience?
Real-World Experiences: Has anyone migrated from Varnish-based controls to an API Gateway? What worked, and what didn’t?
Looking for:
- Real-world experiences
- Best practices
- Resource recommendations