When we talk about Istio or Envoy, we often hear terms like "dynamic configuration" and "control plane push." But what exactly is the underlying protocol that makes all of this work? The answer is xDS.
In this article, I'll do a deep dive into the xDS protocol — what it is, how it works, and why it matters beyond just Service Mesh.
What is xDS?
xDS is a family of discovery service APIs that allow a data plane proxy (like Envoy) to dynamically fetch configuration from a management server — without any restart or file reload.
The "x" in xDS is a wildcard. It covers:
| API | Full Name | Purpose |
|---|---|---|
| LDS | Listener Discovery Service | Dynamically manage listeners (ports) |
| RDS | Route Discovery Service | Dynamically manage routing rules |
| CDS | Cluster Discovery Service | Dynamically manage upstream clusters |
| EDS | Endpoint Discovery Service | Dynamically manage cluster endpoints |
| SDS | Secret Discovery Service | Dynamically manage TLS certificates |
Key insight: xDS is not just for Service Mesh. gRPC also uses xDS for service discovery. It defines a universal, extensible control API for microservices — any configuration can be resolved through discovery.
The Five Core Resources in Envoy
Each xDS type corresponds to a specific resource type. The type is stored in the TypeUrl field of every DiscoveryRequest and DiscoveryResponse, in the format:
type.googleapis.com/<resource type>
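For example: type.googleapis.com/envoy.api.v2.Cluster means this is a CDS (Cluster) resource. The examples in this article use the v2 type names; in the v3 API the same resource is type.googleapis.com/envoy.config.cluster.v3.Cluster.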
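As a small illustration of the format, the sketch below maps the short API names from the table to v2 type URLs and back. The helper names `type_url_for` and `api_for` are made up for this example and are not part of any Envoy library.

```python
TYPE_URL_PREFIX = "type.googleapis.com/"

# Short xDS names mapped to the v2 resource types used in this article.
V2_RESOURCE_TYPES = {
    "LDS": "envoy.api.v2.Listener",
    "RDS": "envoy.api.v2.RouteConfiguration",
    "CDS": "envoy.api.v2.Cluster",
    "EDS": "envoy.api.v2.ClusterLoadAssignment",
    "SDS": "envoy.api.v2.auth.Secret",
}


def type_url_for(api: str) -> str:
    """Build the TypeUrl string for a short xDS API name such as "CDS"."""
    return TYPE_URL_PREFIX + V2_RESOURCE_TYPES[api]


def api_for(type_url: str) -> str:
    """Map a TypeUrl back to its short xDS API name."""
    if not type_url.startswith(TYPE_URL_PREFIX):
        raise ValueError(f"unexpected type URL: {type_url}")
    resource_type = type_url[len(TYPE_URL_PREFIX):]
    for api, name in V2_RESOURCE_TYPES.items():
        if name == resource_type:
            return api
    raise ValueError(f"unknown resource type: {resource_type}")
```

A management server could use a lookup like this to dispatch an incoming request to the right handler based on its TypeUrl.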
1. LDS — Listener Discovery Service
A Listener is a named network location (an address and port) on which Envoy accepts incoming connections. Each listener is configured with a chain of L3/L4 network filters.
{
"name": "...",
"address": "{...}",
"filter_chains": [],
"listener_filters": [],
"traffic_direction": "...",
"access_log": []
}
2. RDS — Route Discovery Service
Routes act as the bridge between Listeners and Clusters. They define traffic distribution rules, virtual hosts, header manipulation, timeouts, and retries.
{
"name": "...",
"virtual_hosts": [],
"response_headers_to_add": [],
"request_headers_to_add": [],
"validate_clusters": "{...}"
}
3. CDS — Cluster Discovery Service
A Cluster is an abstraction of an upstream service. It includes load balancing policy, health checks, circuit breaker config, and TLS settings.
{
"name": "...",
"type": "...",
"eds_cluster_config": "{...}",
"lb_policy": "...",
"health_checks": [],
"circuit_breakers": "{...}",
"outlier_detection": "{...}"
}
4. EDS — Endpoint Discovery Service
EDS is the actual service discovery layer. It returns the live endpoints (IP + port) for a given cluster.
{
"cluster_name": "...",
"endpoints": [],
"policy": "{...}"
}
5. SDS — Secret Discovery Service
SDS enables dynamic TLS certificate rotation without restarting Envoy. In early Istio versions, certificate updates required a hot restart — SDS eliminated that entirely.
{
"name": "...",
"tls_certificate": "{...}",
"validation_context": "{...}"
}
How xDS Works: gRPC Streaming Subscription
Early xDS (the v1 API) relied on REST/JSON polling. The v2 API introduced gRPC bidirectional streaming, which provides:
- Lower latency for config updates
- Better performance under high churn
- Native support for ACK/NACK flow control
Request Flow for a Typical HTTP Route
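Resources reference one another along a dependency chain, and a request is resolved by following it:
LDS → RDS → CDS → EDS
- Envoy fetches Listeners (LDS) to know which ports to open
- From the Listener config, it gets the Route name → fetches Routes (RDS)
- Routes reference Clusters → fetches Clusters (CDS)
- Clusters need live Endpoints → fetches Endpoints (EDS)
One caveat: this is the order in which resources reference each other, not necessarily the order in which updates should be delivered. Envoy's xDS documentation recommends pushing CDS first, then EDS, LDS, and RDS, so that a listener or route never points at a cluster Envoy has not received yet.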
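The reference chain above can be walked with a few dictionary lookups. This is only a toy model: the snapshot contents, the field names (`route_config_name`, `clusters`, `eds_service_name`), and the `resolve` function are assumptions chosen for illustration, not Envoy's real data structures.

```python
# Hypothetical in-memory snapshot; all names and addresses are illustrative.
LISTENERS = {"http_8080": {"route_config_name": "local_route"}}
ROUTES = {"local_route": {"clusters": ["foo", "bar"]}}
CLUSTERS = {
    "foo": {"eds_service_name": "foo"},
    "bar": {"eds_service_name": "bar"},
}
ENDPOINTS = {
    "foo": ["10.0.0.1:8080", "10.0.0.2:8080"],
    "bar": ["10.0.1.1:9090"],
}


def resolve(listener_name):
    """Walk Listener -> Route -> Cluster -> Endpoint for one listener."""
    # The listener names the route configuration it uses (LDS -> RDS).
    route_name = LISTENERS[listener_name]["route_config_name"]
    # The route configuration names the clusters it sends traffic to (RDS -> CDS).
    cluster_names = ROUTES[route_name]["clusters"]
    resolved = {}
    for cluster_name in cluster_names:
        # Each cluster names the EDS resource holding its endpoints (CDS -> EDS).
        service = CLUSTERS[cluster_name]["eds_service_name"]
        resolved[cluster_name] = ENDPOINTS[service]
    return resolved
```

Calling `resolve("http_8080")` returns the endpoint list for each cluster reachable from that listener.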
Full Subscription vs. Delta (Incremental) Subscription
| Mode | Behavior |
|---|---|
| Full (State of the World, SotW) | Management server returns all subscribed resources on every update |
| Delta (Incremental) | Only changed resources are sent — much more efficient at scale |
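To make the difference concrete, here is a rough sketch that computes what each mode would send when a snapshot changes. The function names and the dictionary-based snapshot are assumptions for illustration; the real protocols use protobuf messages.

```python
def sotw_update(snapshot):
    """State of the World: every update carries the full resource set."""
    return dict(snapshot)


def delta_update(old, new):
    """Delta: send only added or changed resources, plus names that were removed."""
    changed = {
        name: resource
        for name, resource in new.items()
        if name not in old or old[name] != resource
    }
    removed = sorted(name for name in old if name not in new)
    return changed, removed
```

With many resources and frequent small changes, the delta payload stays proportional to the change rather than to the size of the whole snapshot.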
xDS Protocol Deep Dive
Request Message Example
version_info: "" # empty = first request
node:
id: envoy # unique node identifier (e.g. hostname)
resource_names:
- foo
- bar
type_url: type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
response_nonce: "" # empty = first request
Key fields:
- version_info: empty on first request; subsequent requests echo the most recent version Envoy applied successfully
- node.id: only required on the first message per stream
- resource_names: the specific resources being subscribed to
- response_nonce: used to correlate ACK/NACK with a specific server push
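The fields above can be modelled with a small dataclass. This is a simplified stand-in for the real protobuf message, and the helpers `initial_request` and `follow_up` are hypothetical names used only in this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiscoveryRequest:
    """Simplified stand-in for the protobuf DiscoveryRequest."""
    type_url: str
    resource_names: List[str] = field(default_factory=list)
    node_id: str = ""
    version_info: str = ""
    response_nonce: str = ""


def initial_request(node_id, type_url, resource_names):
    """First request on a stream: identifies the node, no version or nonce yet."""
    return DiscoveryRequest(
        type_url=type_url,
        resource_names=list(resource_names),
        node_id=node_id,
    )


def follow_up(previous, version_info, response_nonce):
    """Later request on the same stream: node_id is left empty here."""
    return DiscoveryRequest(
        type_url=previous.type_url,
        resource_names=list(previous.resource_names),
        version_info=version_info,
        response_nonce=response_nonce,
    )
```

For example, `initial_request("envoy", "type.googleapis.com/envoy.api.v2.ClusterLoadAssignment", ["foo", "bar"])` reproduces the request shown earlier, and `follow_up(first, "X", "A")` is what an acknowledgement of version X with nonce A would look like in this model.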
Response Message Example
version_info: X
resources:
- foo ClusterLoadAssignment proto encoding
- bar ClusterLoadAssignment proto encoding
type_url: type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
nonce: A
ACK and NACK
After receiving a push from the management server, Envoy responds with either:
- ✅ ACK: Config applied successfully → sends back the new version_info
- ❌ NACK: Config failed to apply → sends back the old version_info with an error detail
This gives the control plane full visibility into whether each Envoy instance has successfully adopted a new config.
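From the client's side, the ACK/NACK decision can be sketched roughly as follows. The dictionary-based messages and the `reject_empty` validator are assumptions made for this example; a real client validates and applies protobuf resources.

```python
def reject_empty(resources):
    """Hypothetical validator: treat an empty resource list as invalid config."""
    if not resources:
        raise ValueError("no resources in push")


def respond_to_push(current_version, push, apply=reject_empty):
    """Build the request a client sends back after trying to apply a push."""
    try:
        apply(push["resources"])
    except ValueError as exc:
        # NACK: keep the last good version and explain what went wrong.
        return {
            "version_info": current_version,
            "response_nonce": push["nonce"],
            "error_detail": str(exc),
        }
    # ACK: echo the version that was just applied.
    return {
        "version_info": push["version_info"],
        "response_nonce": push["nonce"],
    }
```

In both cases the reply carries the nonce of the push it answers; only the version and the presence of an error detail differ.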
Resource Update & The Nonce Problem
Here's where it gets interesting. Consider this scenario:
- Envoy subscribes to cluster foo (version X, nonce A)
- Management server pushes a new version Y (nonce B) because foo's endpoints changed
- At the same time, Envoy wants to add a new subscription to cluster bar
If Envoy sends a new DiscoveryRequest with version_info=X and resource_names=[foo, bar], the management server might misinterpret this as a NACK for version Y.
Solution: Nonce
The nonce uniquely identifies each push. Envoy's new subscription request carries nonce=A (from its last ACK), while the version Y push has nonce=B. The management server can distinguish between them unambiguously.
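On the server side, the check might look something like the sketch below. The `classify` function and its "stale" / "ack" / "nack" labels are illustrative assumptions rather than code from any real control plane.

```python
def classify(request, last_push):
    """Classify a request relative to the most recent push on this stream."""
    if request["response_nonce"] != last_push["nonce"]:
        # The request was sent before the client saw the latest push,
        # so it says nothing about whether that push was accepted.
        return "stale"
    if request.get("error_detail"):
        return "nack"
    if request["version_info"] == last_push["version_info"]:
        return "ack"
    # Same nonce but an older version: the client did not adopt the push.
    return "nack"
```

In the scenario above, the request with version_info=X and nonce=A is classified as stale relative to the push with nonce=B, so it is not mistaken for a rejection of version Y.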
Istio's pragmatic approach: Istio's control plane (Pilot) doesn't strictly follow the nonce/version_info spec. Instead, it checks whether the resource_names (the Clusters list) has actually changed. If yes, it treats the request as a resource update rather than ACK/NACK. This is simpler and easier to reason about.
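The idea behind that check can be sketched in a couple of lines. This is not Istio's actual code, only an illustration of the heuristic as described here, and `is_resource_update` is a made-up name.

```python
def is_resource_update(previous_names, request_names):
    """Treat a request as a subscription change if the set of names differs."""
    return set(previous_names) != set(request_names)
```

Comparing sets rather than lists means a request that merely reorders the same names is not treated as a change.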
Summary
xDS is the backbone of Envoy's dynamic configuration system. Understanding it is essential for anyone working with Istio, Envoy-based gateways, or building custom control planes.
Key takeaways:
- LDS → RDS → CDS → EDS is the standard dependency chain
- gRPC streaming replaced REST polling for real-time, low-latency updates
- Nonce solves the ACK/NACK ambiguity problem in concurrent update scenarios
- SDS enables zero-downtime certificate rotation
- xDS is not Istio-specific — gRPC and other frameworks use it too
💻 Explore the full Istio + Envoy implementation: github.com/muzinan123/servicemesh
📖 Next in this series: Istio & Envoy Service Mesh Architecture