DEV Community

James Lee

xDS Protocol Deep Dive: The Universal Control Plane API Behind Envoy and Istio

When we talk about Istio or Envoy, we often hear terms like "dynamic configuration" and "control plane push." But what exactly is the underlying protocol that makes all of this work? The answer is xDS.

In this article, I'll do a deep dive into the xDS protocol — what it is, how it works, and why it matters beyond just Service Mesh.


What is xDS?

xDS is a family of discovery service APIs that allow a data plane proxy (like Envoy) to dynamically fetch configuration from a management server — without any restart or file reload.

The "x" in xDS is a wildcard. It covers:

| API | Full Name | Purpose |
| --- | --- | --- |
| LDS | Listener Discovery Service | Dynamically manage listeners (ports) |
| RDS | Route Discovery Service | Dynamically manage routing rules |
| CDS | Cluster Discovery Service | Dynamically manage upstream clusters |
| EDS | Endpoint Discovery Service | Dynamically manage cluster endpoints |
| SDS | Secret Discovery Service | Dynamically manage TLS certificates |

Key insight: xDS is not just for Service Mesh. gRPC can also use xDS for service discovery, with no sidecar proxy involved. In that sense xDS works as a universal, extensible control API for microservices: any configuration that can be modeled as a resource can be delivered through discovery.


The Five Core Resources in Envoy

Each xDS type corresponds to a specific resource type. The type is carried in the type_url field (TypeUrl in the generated Go code) of every DiscoveryRequest and DiscoveryResponse, in the format:

type.googleapis.com/<resource type>

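For example, type.googleapis.com/envoy.api.v2.Cluster identifies a CDS (Cluster) resource. The examples in this article use the v2 names; in the v3 API the same resource is type.googleapis.com/envoy.config.cluster.v3.Cluster.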
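
To make the type URL convention concrete, here is a minimal Python sketch that maps a type URL back to the xDS API it belongs to. The lookup table and the function name `xds_api_for` are my own illustration, not part of any Envoy library, and the table only covers the v2 resource names used in this article.

```python
TYPE_URL_PREFIX = "type.googleapis.com/"

# Illustrative lookup table (an assumption for this sketch):
# v2 resource type name -> xDS API that serves it.
XDS_API_BY_RESOURCE = {
    "envoy.api.v2.Listener": "LDS",
    "envoy.api.v2.RouteConfiguration": "RDS",
    "envoy.api.v2.Cluster": "CDS",
    "envoy.api.v2.ClusterLoadAssignment": "EDS",
    "envoy.api.v2.auth.Secret": "SDS",
}


def xds_api_for(type_url: str) -> str:
    """Return the xDS API name (LDS, RDS, ...) for a type URL."""
    if not type_url.startswith(TYPE_URL_PREFIX):
        raise ValueError(f"not a type URL: {type_url!r}")
    resource_type = type_url[len(TYPE_URL_PREFIX):]
    if resource_type not in XDS_API_BY_RESOURCE:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return XDS_API_BY_RESOURCE[resource_type]


if __name__ == "__main__":
    print(xds_api_for("type.googleapis.com/envoy.api.v2.Cluster"))
```

A real management server would typically dispatch on the type URL in much the same way, routing each request to the handler for that resource type.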

1. LDS — Listener Discovery Service

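A Listener is a named network location (an address and port) on which Envoy accepts incoming connections. Each listener is configured with filter chains of L3/L4 network filters; HTTP handling is itself one of those network filters (the HTTP connection manager).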

{
  "name": "...",
  "address": "{...}",
  "filter_chains": [],
  "listener_filters": [],
  "traffic_direction": "...",
  "access_log": []
}

2. RDS — Route Discovery Service

Routes act as the bridge between Listeners and Clusters. They define traffic distribution rules, virtual hosts, header manipulation, timeouts, and retries.

{
  "name": "...",
  "virtual_hosts": [],
  "response_headers_to_add": [],
  "request_headers_to_add": [],
  "validate_clusters": "{...}"
}

3. CDS — Cluster Discovery Service

A Cluster is an abstraction of an upstream service. It includes load balancing policy, health checks, circuit breaker config, and TLS settings.

{
  "name": "...",
  "type": "...",
  "eds_cluster_config": "{...}",
  "lb_policy": "...",
  "health_checks": [],
  "circuit_breakers": "{...}",
  "outlier_detection": "{...}"
}

4. EDS — Endpoint Discovery Service

EDS is the actual service discovery layer. It returns the live endpoints (IP + port) for a given cluster.

{
  "cluster_name": "...",
  "endpoints": [],
  "policy": "{...}"
}

5. SDS — Secret Discovery Service

SDS enables dynamic TLS certificate rotation without restarting Envoy. In early Istio versions, certificate updates required a hot restart — SDS eliminated that entirely.

{
  "name": "...",
  "tls_certificate": "{...}",
  "validation_context": "{...}"
}

How xDS Works: gRPC Streaming Subscription

The v1 xDS APIs used REST/JSON polling. Starting with v2, the primary transport became gRPC bidirectional streaming, which provides:

  • Lower latency for config updates
  • Better performance under high churn
  • Native support for ACK/NACK flow control

Request Flow for a Typical HTTP Route

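For a typical HTTP route, each resource references the next, so resolution follows the dependency chain:

LDS → RDS → CDS → EDS

  1. Envoy fetches Listeners (LDS) to know which ports to open
  2. From the Listener config, it gets the Route name → fetches Routes (RDS)
  3. Routes reference Clusters → fetches Clusters (CDS)
  4. Clusters need live Endpoints → fetches Endpoints (EDS)

This is the order in which resources reference each other, not necessarily the order in which updates should be delivered. To avoid dropping traffic, the xDS documentation recommends a make-before-break sequence in which clusters and endpoints (CDS, EDS) arrive before the listeners and routes (LDS, RDS) that point at them.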
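
The reference chain can be sketched as a walk over four in-memory tables. This is a toy model in Python, not Envoy's implementation: the table layout, the field names such as `route_config_name` and `eds_service_name`, and every resource name are assumptions made for illustration.

```python
# Toy management-server state. All names and values are made up.
LISTENERS = {"http_8080": {"port": 8080, "route_config_name": "local_route"}}
ROUTES = {"local_route": {"virtual_hosts": [{"domains": ["*"], "cluster": "backend"}]}}
CLUSTERS = {"backend": {"lb_policy": "ROUND_ROBIN", "eds_service_name": "backend"}}
ENDPOINTS = {"backend": [("10.0.0.1", 8080), ("10.0.0.2", 8080)]}


def resolve(listener_name):
    """Follow Listener -> Route -> Cluster -> Endpoints for one listener."""
    listener = LISTENERS[listener_name]                # LDS: which port to open
    route = ROUTES[listener["route_config_name"]]      # RDS: named by the listener
    resolved = []
    for vhost in route["virtual_hosts"]:
        cluster_name = vhost["cluster"]
        cluster = CLUSTERS[cluster_name]               # CDS: named by the route
        endpoints = ENDPOINTS[cluster["eds_service_name"]]  # EDS: named by the cluster
        resolved.append((listener["port"], cluster_name, endpoints))
    return resolved


if __name__ == "__main__":
    for port, cluster_name, endpoints in resolve("http_8080"):
        print(port, cluster_name, endpoints)
```

Each lookup uses a name found in the previous resource, which is why a missing Route or Cluster leaves the rest of the chain unresolved.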

Full Subscription vs. Delta (Incremental) Subscription

| Mode | Behavior |
| --- | --- |
| Full (State of the World, SotW) | Management server returns all subscribed resources on every update |
| Delta (Incremental) | Only changed resources are sent, which is much more efficient at scale |
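
The difference between the two modes is easiest to see side by side. The Python sketch below is a simplified model under my own assumptions: resources are plain dicts with a `version` key, and the helper names `sotw_response` and `delta_response` are hypothetical. It does not reproduce the real protobuf messages.

```python
def sotw_response(subscribed, server_state):
    """State of the world: every subscribed resource, changed or not."""
    return {name: server_state[name]
            for name in sorted(subscribed) if name in server_state}


def delta_response(subscribed, server_state, client_versions):
    """Delta: only resources whose version differs from the client's copy,
    plus the names of resources the client holds that no longer exist."""
    changed = {
        name: server_state[name]
        for name in sorted(subscribed)
        if name in server_state
        and server_state[name]["version"] != client_versions.get(name)
    }
    removed = sorted(
        name for name in client_versions
        if name in subscribed and name not in server_state
    )
    return changed, removed


if __name__ == "__main__":
    state = {"foo": {"version": "2"}, "bar": {"version": "1"}}
    held = {"foo": "1", "bar": "1", "baz": "1"}
    print(sotw_response({"foo", "bar", "baz"}, state))
    print(delta_response({"foo", "bar", "baz"}, state, held))
```

With thousands of endpoints and a single change, the full response grows with the size of the subscription while the delta response grows only with the size of the change.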

xDS Protocol Deep Dive

Request Message Example

version_info: ""          # empty = first request
node:
  id: envoy               # unique node identifier (e.g. hostname)
resource_names:
  - foo
  - bar
type_url: type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
response_nonce: ""        # empty = first request

Key fields:

  • version_info: empty on first request; subsequent requests echo the server's version
  • node.id: only required on the first message per stream
  • resource_names: the specific resources being subscribed to
  • response_nonce: used to correlate ACK/NACK with a specific server push

Response Message Example

version_info: X
resources:
  - foo ClusterLoadAssignment proto encoding
  - bar ClusterLoadAssignment proto encoding
type_url: type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
nonce: A

ACK and NACK

After receiving a push from the management server, Envoy responds with either:

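  • ACK: Config applied successfully → sends a DiscoveryRequest with the new version_info and the nonce of the response it is acknowledging
  • NACK: Config failed to apply → sends a DiscoveryRequest with the old (last accepted) version_info, the same nonce, and an error_detail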

This gives the control plane full visibility into whether each Envoy instance has successfully adopted a new config.
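
The client side of ACK/NACK can be sketched in a few lines of Python. This is a simplified model, assuming messages are plain dicts and that applying a bad config raises `ValueError`; the function name `handle_response` and the `apply` callback are hypothetical. In the real protocol `error_detail` is a structured status message rather than a string.

```python
def handle_response(response, last_good_version, apply):
    """Try to apply a pushed config and build the follow-up DiscoveryRequest.

    `apply` is a caller-supplied function that raises ValueError when the
    resources are invalid (an assumption of this sketch).
    """
    request = {
        "type_url": response["type_url"],
        # Always echo the nonce of the push being answered.
        "response_nonce": response["nonce"],
    }
    try:
        apply(response["resources"])
    except ValueError as exc:
        # NACK: keep reporting the last version that was applied successfully.
        request["version_info"] = last_good_version
        request["error_detail"] = str(exc)
    else:
        # ACK: report the version just applied.
        request["version_info"] = response["version_info"]
    return request


if __name__ == "__main__":
    push = {"version_info": "Y", "nonce": "B",
            "type_url": "type.googleapis.com/envoy.api.v2.ClusterLoadAssignment",
            "resources": ["foo", "bar"]}
    print(handle_response(push, "X", lambda resources: None))
```

Note that both outcomes are ordinary requests on the same stream; the only differences are the version that is echoed and whether an error is attached.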


Resource Update & The Nonce Problem

Here's where it gets interesting. Consider this scenario:

  1. Envoy subscribes to cluster foo (version X, nonce A)
  2. Management server pushes a new version Y (nonce B) because foo's endpoints changed
  3. At the same time, Envoy wants to add a new subscription to cluster bar

If Envoy sends a new DiscoveryRequest with version_info=X and resource_names=[foo, bar] before it has processed the push of Y, a management server that looks at version_info alone might misinterpret this as a NACK for version Y.

Solution: Nonce

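The nonce uniquely identifies each push on a stream. Envoy's new subscription request carries response_nonce=A (the nonce of the last response it processed), while the version Y push has nonce=B. Because A is not the latest nonce the server sent, the server knows the request was written before Envoy saw Y, so it cannot be a NACK of Y.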
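
On the server side, the nonce check might look like the following Python sketch. It is a minimal illustration under my own assumptions: requests are plain dicts, the function name `classify` and the four labels are hypothetical, and what the server does with the resource_names of a stale request is left out.

```python
def classify(request, last_sent_nonce):
    """Classify an incoming DiscoveryRequest relative to the latest push."""
    nonce = request.get("response_nonce", "")
    if nonce == "":
        return "initial"   # first request on the stream
    if nonce != last_sent_nonce:
        return "stale"     # written before the client saw the latest push
    if "error_detail" in request:
        return "nack"      # latest push was rejected
    return "ack"           # latest push was applied


if __name__ == "__main__":
    # The race from the scenario: the server has already pushed Y with
    # nonce B, but the request still carries version X and nonce A.
    racing = {"version_info": "X", "response_nonce": "A",
              "resource_names": ["foo", "bar"]}
    print(classify(racing, last_sent_nonce="B"))
```

Without the nonce, the racing request would be indistinguishable from a rejection of Y, because both carry version_info=X.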

Istio's pragmatic approach: Istio's control plane (Pilot) doesn't strictly follow the nonce/version_info spec. Instead, it checks whether the resource_names (the Clusters list) has actually changed. If it has, Pilot treats the request as a resource update rather than an ACK/NACK. This is simpler and easier to reason about.
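
The idea of comparing resource names can be expressed very compactly. The Python below is only an illustration of the check as described in this article, not Istio's actual code; the function name `is_subscription_change` is hypothetical.

```python
def is_subscription_change(previous_names, request_names):
    """True when the requested resource set differs from the last one seen.

    Order and duplicates are ignored, so only a real change in the
    subscription counts as a resource update.
    """
    return set(previous_names) != set(request_names)


if __name__ == "__main__":
    print(is_subscription_change(["foo"], ["foo", "bar"]))
    print(is_subscription_change(["foo", "bar"], ["bar", "foo"]))
```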


Summary

xDS is the backbone of Envoy's dynamic configuration system. Understanding it is essential for anyone working with Istio, Envoy-based gateways, or building custom control planes.

Key takeaways:

  • LDS → RDS → CDS → EDS is the standard dependency chain
  • gRPC streaming replaced REST polling for real-time, low-latency updates
  • Nonce solves the ACK/NACK ambiguity problem in concurrent update scenarios
  • SDS enables zero-downtime certificate rotation
  • xDS is not Istio-specific — gRPC and other frameworks use it too

💻 Explore the full Istio + Envoy implementation:
github.com/muzinan123/servicemesh

📖 Next in this series: Istio & Envoy Service Mesh Architecture
