Vito Tumas for RippleX Developers

To Squelch or not to Squelch? Optimising XRP Ledger Validator Communication

The XRP Ledger is expanding. As the number of nodes and validators joining the network grows, the Ledger is becoming increasingly resilient and robust. But this success brings a critical challenge: a rising tide of network traffic that, if left unmanaged, can strain the resources of every node operator.

While innovations like Zero-Knowledge Proofs, Real-World Assets, and DeFi take the spotlight, the essential networking that underpins them is often overlooked. It's the plumbing of the digital house: invisible and forgotten until you turn on the shower and the once-mighty stream weakens to a frustrating trickle. Similarly, a blockchain's networking is invisible until its performance degrades to unacceptable levels of cost and delay.

Thus, as a preemptive measure to ensure XRP Ledger performance does not falter, we propose optimising the XRP Ledger's validator communication. In this article, we examine the algorithms and steps we are taking to maintain high water pressure.

Background

Before we dive into the algorithm details, let's briefly recap how communication works in the XRP Ledger. The XRP Ledger network consists of interconnected servers (nodes) running the rippled client. A special subset of these nodes, called validators, participates directly in the consensus process. Each node maintains several connections to other nodes, known as its peers; these peers are only a small subset of all the nodes in the network.

To achieve consensus, validators constantly exchange two critical types of messages, which we'll refer to collectively as "validator messages":

  • Proposals: Messages that contain a set of transactions to be included in the next ledger. Validators use these to agree on a common transaction set.
  • Validations: Messages that serve as a final confirmation, ensuring that a specific ledger version has been agreed upon.

Since validator messages don't have a single destination, they are relayed from node to node across the network. The efficiency of this relay mechanism is critical for the health and performance of the entire XRP Ledger.

Current State: The Great Flood

Currently, the XRP Ledger employs a flooding (or broadcasting) algorithm to disseminate validator messages. Here's how it operates:

When a node receives a message from a peer, it first checks if it has encountered this specific message before.

  • If the message is new, the node:
  1. Caches the message to recognise future duplicates.
  2. Processes the message as required.
  3. Forwards the message to every peer except the one who sent it.
  • If the message is a duplicate, the node drops it. (This relay rule is sketched in code below.)
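
To make the rule concrete, here is a minimal C++ sketch of this flooding logic. The types and names (FloodRelay, Peer, onMessage) are illustrative, not rippled's actual implementation:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

struct Peer { int id = 0; };

class FloodRelay {
    std::unordered_set<std::string> seen_;  // digests of messages already handled
public:
    // Handles an incoming message; returns the peers to forward it to,
    // or an empty list if the message is a duplicate and must be dropped.
    std::vector<Peer> onMessage(std::string const& digest, Peer const& sender,
                                std::vector<Peer> const& peers) {
        if (!seen_.insert(digest).second)
            return {};  // seen before: drop the duplicate

        // New message: cache it (done by the insert above), process it,
        // then forward to every peer except the one that sent it.
        std::vector<Peer> targets;
        for (auto const& p : peers)
            if (p.id != sender.id)
                targets.push_back(p);
        return targets;
    }
};
```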

The flooding algorithm has a few distinct advantages:

  • High Reliability: As long as every node has at least one active peer, flooding ensures that all nodes eventually receive every message.

  • Effectiveness: A flooded message will traverse every possible path through the network. Consequently, it is guaranteed to travel the fastest possible route from the sender to every other node.

However, flooding also has a significant drawback: it is highly inefficient. Because a message travels through all available paths, each node receives the same message multiple times. For example, if a node has 30 peers, it will receive 30 copies of the same validator message—one from each peer. This redundancy is the price paid for reliability, and its scale can be surprising.

To understand the scale of this inefficiency, let's consider some typical figures for the XRP Ledger:

Network & Message Parameters

| Total Message Size | All Validators | UNL Validators | Daily Ledgers | Nodes | Connections |
| --- | --- | --- | --- | --- | --- |
| 432 bytes | 203 | 35 | 20,000 | 1,015 | ~13,000 |

Traffic Calculations under Flooding

Note: These calculations assume one proposal and one validation message per validator per ledger. In reality, validators often produce multiple proposals due to differing transaction sets. This assumption provides a conservative lower bound for comparison.

  • Each validator generates 432 bytes per ledger. Together, all validators generate:

    • 203 validators × 432 bytes/validator = 87.7 KB per ledger
  • Under flooding, each peer connection effectively carries 87.7 KB of data for each ledger. Per day, each connection transfers:

    • 87.7 KB/ledger/connection × 20,000 ledgers/day = 1.754 GB per connection per day
  • Collectively, all connections transfer:

    • 1.754 GB/connection/day × 13,000 total connections = 22.8 TB per day
  • Each of the 1,015 nodes processes this unique set of messages. Therefore, the 'useful' portion of the total traffic is:

    • 1.754 GB/node/day × 1,015 nodes = approx. 1.8 TB per Day

Comparing useful traffic to total traffic:

  • (1.8 TB unique / 22.8 TB total) × 100% = 7.8%, meaning that 92.2% of validator traffic is redundant. (The short program below reproduces these figures.)
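
As a quick sanity check, this self-contained program multiplies out the parameters from the table above. The dailyTerabytes() helper and its parameter names are purely illustrative:

```cpp
#include <cstdio>

// Estimates network-wide validator traffic in TB/day.
// All inputs are the article's rough figures, not measured values.
double dailyTerabytes(double msgBytes, int validators, int ledgersPerDay,
                      double connections) {
    double bytesPerLedger = msgBytes * validators;      // all validators, one ledger
    double bytesPerConnDay = bytesPerLedger * ledgersPerDay;
    return bytesPerConnDay * connections / 1e12;        // bytes -> terabytes
}

int main() {
    double total  = dailyTerabytes(432, 203, 20000, 13000); // every connection floods
    double useful = dailyTerabytes(432, 203, 20000, 1015);  // one copy per node
    std::printf("total: %.1f TB/day, useful: %.1f TB/day (%.1f%% useful)\n",
                total, useful, 100.0 * useful / total);
    // Prints approximately: total: 22.8 TB/day, useful: 1.8 TB/day (7.8% useful)
    return 0;
}
```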

The bright tomorrow: Squelching

The journey to optimise the XRP Ledger's validator traffic began in 2020 with an algorithm called "Squelching." The foundational code for this feature has been part of rippled since version 1.7.0 and was discussed in a previous blog post: Message Routing Optimizations Pt 1: Proposal & Validation Relaying.

Before we explore the specific mechanics of the squelching algorithm, let's consider the name itself.

Squelch (noun):

  1. A soft sucking sound made when pressure is applied to liquid or mud.

  2. A circuit that suppresses the output of a radio receiver if the signal strength falls below a certain level.

While the first definition offers a more amusing image, the second is remarkably apt for our algorithm. It directly points to the core mechanism Base Squelching uses to reduce duplicate traffic: suppression. The underlying philosophy is that a server should decide what traffic it wants to receive from its peers and have a mechanism to suppress traffic it is not interested in.
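
Conceptually, a squelch request needs to carry only three pieces of information. The struct below is an illustrative shape, not rippled's actual wire format (rippled encodes its peer-to-peer messages as protocol buffers):

```cpp
#include <cstdint>
#include <string>

// Illustrative shape of a squelch control message (not the real wire format).
struct SquelchMessage {
    bool        squelch;          // true = suppress, false = lift the suppression
    std::string validatorPubKey;  // the validator whose messages to suppress
    uint32_t    durationSeconds;  // how long the suppression should last
};
```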

The squelching algorithm enables a server to select a subset of peers as sources of validator messages and suppress the remaining peers, thereby significantly reducing the duplicate messages it receives. However, despite its availability, this initial version of Squelching did not see widespread adoption across the network. From our observations, we learned about a few key limitations:

  • Its benefits are only realised when most servers enable the feature.

  • It primarily focused on reducing duplicate traffic from trusted validators.

  • Crucially, it did not address the growing volume of messages from the ever-increasing number of untrusted validators.

This last point is particularly important. The recent growth in untrusted validators, while welcome, has increased the processing load on nodes, highlighting the need for a more comprehensive solution.

With these lessons in mind, we have revisited the original squelching implementation to introduce two key optimisations:

  • "Base Squelching" is an improved version of the original squelching algorithm, designed to suppress duplicate traffic from both trusted and untrusted validators, thereby ensuring improved interoperability across the network.

  • "Enhanced Squelching" is a new, complementary algorithm designed to reduce the volume of unique untrusted validator messages.

Base Squelching

Base Squelching is an improved algorithm designed to drastically reduce duplicate validator messages from all sources—trusted and untrusted. It allows each node to intelligently select its information sources, ensuring seamless operation even when connected to peers that don't use this new logic.

How Base Squelching Works

Base Squelching works through a continuous process of source selection and suppression, managed by each node for each validator individually. Think of it as each node constantly interviewing its peers to find the most reliable messengers for a specific validator (let's call it Validator X).

Here's a detailed look at the process:

1. Initial Learning Phase – Monitoring All Peers

Initially, a node listens to all its peers, receiving messages from Validator X from each of them. For each peer, the node maintains a counter that it increments every time it gets a message from Validator X via that peer.

2. Identifying Potential Sources – The Consideration List

Once a peer has successfully delivered a certain number of unique messages from Validator X, that peer is deemed a potentially reliable source and is added to a "consideration list" for Validator X. The message threshold introduces a tradeoff: a lower threshold squelches peers faster, but at the cost of less information about their reliability. We therefore chose 20 messages, or roughly 20 ledgers, as the criterion.

This process also includes a timeliness condition: a node resets the peer's progress if it fails to deliver a new message within 8 seconds, ensuring that the node considers only fast and well-connected peers.

3. Selecting Primary Sources – Random Selection

When the consideration list contains enough qualified peers, the node conducts a random selection, choosing 5 peers from the list to be its designated primary sources for Validator X's messages. This number balances reliability against traffic reduction: with too few peers, the node may not receive sufficient messages from the validator; with too many, the benefits of reducing duplicate traffic diminish.

4. Squelching Other Peers – Suppressing Duplicates

After choosing the primary sources for Validator X, the node sends a "squelch" control message to all its other peers (i.e., those not selected as primary sources for Validator X).

This squelch message instructs those peers to temporarily stop forwarding messages from that particular validator (Validator X) to the node that sent the squelch. For each peer, the squelch duration is random, between 5 and 10 minutes; for nodes with more than 60 peers, the upper bound increases up to an hour. Because each squelch expires at a different time, peers do not all resume sending simultaneously, which avoids spikes in traffic and load. The sketch below pulls steps 1 through 4 together.
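
A minimal sketch, assuming simple per-peer counters keyed by peer ID. The thresholds are the ones named above; the data structures are illustrative rather than rippled's own:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

constexpr int         kQualifyCount = 20;                      // messages to qualify
constexpr std::size_t kMaxSelected  = 5;                       // primary sources
constexpr auto        kIdleReset    = std::chrono::seconds(8); // timeliness condition

struct PeerState {
    int count = 0;                                  // messages from Validator X
    std::chrono::steady_clock::time_point lastSeen; // last delivery time
};

// Random 5-10 minute squelch so peers do not all resume at once
// (nodes with many peers use a longer upper bound; not modelled here).
std::chrono::seconds randomSquelchDuration(std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(5 * 60, 10 * 60);
    return std::chrono::seconds(dist(rng));
}

// Called for each message from Validator X arriving via `peer`. Returns the
// chosen primary sources once enough peers qualify; every other peer would
// then be sent a squelch lasting randomSquelchDuration().
std::vector<int> onValidatorMessage(std::map<int, PeerState>& peers, int peer) {
    auto now = std::chrono::steady_clock::now();
    auto& st = peers[peer];
    if (st.count > 0 && now - st.lastSeen > kIdleReset)
        st.count = 0;  // too slow: this peer starts over
    st.lastSeen = now;
    ++st.count;

    // The consideration list: peers that delivered enough timely messages.
    std::vector<int> considered;
    for (auto const& [id, s] : peers)
        if (s.count >= kQualifyCount)
            considered.push_back(id);
    if (considered.size() < kMaxSelected)
        return {};  // keep learning

    // Randomly select 5 primary sources from the qualified pool.
    static std::mt19937 rng{std::random_device{}()};
    std::shuffle(considered.begin(), considered.end(), rng);
    considered.resize(kMaxSelected);
    return considered;
}
```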

5. Dynamic Re-evaluation – Ensuring Adaptability

The selection of primary sources is not static. The process continuously adapts to network changes:

  • Squelch Expiry: When a squelch expires, the peer reenters the learning phase to qualify again.

  • Selected Peer Disconnects: If a chosen primary source disconnects, the node sends an "unsquelch" message to all peers, restarting the entire selection process to find a replacement.

  • New Peer Connects: A new peer immediately enters the learning phase, competing against the established sources without triggering a complete reset.

Handling Squelch Requests

The algorithm also defines how a node (call it Node R) must respond to squelch requests from its peers (call one of them Peer S).

  1. Maintaining Squelch Records: Node R keeps a simple record for each peer, listing which validators that peer has squelched and for how long.

  2. Processing an Incoming Squelch Message: When Node R receives a squelch message from Peer S regarding messages from a specific Validator V:

    • Verify Duration: Node R first checks the requested squelch duration, which must be less than a predefined maximum of one hour. This verification is crucial as it ensures that a validator's messages are never indefinitely or excessively silenced by any single peer.
    • Update Records: If the duration is valid, Node R updates its records for Peer S, noting not to send messages from Validator V to Peer S until the squelch duration expires.
  3. Special Case for Own Messages: A validator node has a special condition. If Validator A receives a squelch request from one of its peers concerning its own (Validator A's) messages, it ignores this request. This protective measure ensures a validator can always propagate its messages and maintain its network presence.

  4. Clearing a Squelch: If Node R receives a squelch message from Peer S for Validator V with a squelch duration of zero, Node R clears any existing squelch entry for Validator V related to Peer S. This effectively acts as an "unsquelch" request.

  5. Applying Squelch Rules During Relaying: Before relaying any validator message to a specific peer, Node R consults its records. If Peer S has squelched Validator V, Node R refrains from sending that validator's messages to Peer S, while still relaying them to peers with no active squelch.

This recipient-side logic ensures that squelch requests are respected, reducing redundant traffic across the network overall.
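
A minimal sketch of this recipient-side bookkeeping, assuming a simple map keyed by (peer, validator). The one-hour cap and the duration-zero "unsquelch" rule follow the steps above:

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;
constexpr auto kMaxSquelchDuration = std::chrono::hours(1);

struct SquelchBook {
    // (peer id, validator key) -> when that squelch expires
    std::map<std::pair<int, std::string>, Clock::time_point> entries;
    std::string ownValidatorKey;  // non-empty if this node is itself a validator

    void onSquelchRequest(int peer, std::string const& validator,
                          std::chrono::seconds duration) {
        if (validator == ownValidatorKey)
            return;  // never let a peer silence our own messages
        if (duration.count() == 0) {
            entries.erase({peer, validator});  // duration zero acts as unsquelch
            return;
        }
        if (duration > kMaxSquelchDuration)
            return;  // reject excessive durations
        entries[{peer, validator}] = Clock::now() + duration;
    }

    // Consulted before relaying a message from `validator` to `peer`.
    bool shouldRelayTo(int peer, std::string const& validator) {
        auto it = entries.find({peer, validator});
        if (it == entries.end())
            return true;                // no squelch on record
        if (Clock::now() >= it->second) {
            entries.erase(it);          // squelch has expired
            return true;
        }
        return false;                   // active squelch: hold the message
    }
};
```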

Base Squelching involves each node continuously learning and adapting, identifying useful peers for each specific validator's messages. It randomly chooses a small set of primary sources from a qualified pool and temporarily squelches others for that validator. This dynamic process reduces duplicate messages and keeps the system adaptive to changing network conditions and peer availability.

Quantifying the Impact of Base Squelching

While Base Squelching operates peer by peer and validator by validator, we can model its aggregate effect as reducing redundant pathways for validator messages and estimate the high-level traffic impact. The result is a projection in which each node effectively processes the complete set of validator messages as if receiving them through 5 optimised peer connections.

  • The total effective connections in the network after squelching:

    • 1,015 nodes × 5 effective connections/node = 5,075 connections
  • The baseline data rate from the flooding scenario, 1.754 GB per connection per day, is multiplied by the reduced number of connections. Under Base Squelching, the new daily network traffic is:

    • 1.754 GB/connection/day × 5,075 connections = 8.9 TB per day

Comparing useful traffic to the new total, (1.8 TB unique / 8.9 TB total) × 100% = 20.2%: network redundancy for validator messages drops from 92.2% to 79.8%, a substantial reduction in wasted bandwidth just from enabling Base Squelching.
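
Expressed with the same hypothetical dailyTerabytes() helper from the flooding estimate, only the connection count changes:

```cpp
// 1,015 nodes x 5 effective connections each = 5,075 connections.
double squelched = dailyTerabytes(432, 203, 20000, 1015 * 5);
// ~8.9 TB/day; the 1.8 TB of useful traffic is now ~20% of the total.
```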

Second Improvement: Enhanced Squelching

While Base Squelching organises the existing validator traffic, Enhanced Squelching fundamentally alters the traffic a node accepts. It takes optimisation a step further by dramatically reducing the volume of unique messages a node processes, focusing specifically on the growing crowd of untrusted validators.

The best way to think of this is like a concierge at an exclusive event. Base Squelching acts like an usher, efficiently guiding accepted guests inside without creating a mob. Enhanced Squelching, however, is the concierge at the velvet rope—it decides who gets in. Its job is to maintain a small, high-quality guest list of untrusted validators and politely turn away the rest.

Reducing unique untrusted validator messages is crucial because nodes on the XRP Ledger primarily care about messages from trusted validators on their Unique Node List (UNL). They only relay messages from untrusted validators on the off-chance they might be useful to a peer. Enhanced Squelching applies strict "VIP" criteria, monitoring untrusted validators for their activity and network reach. It's worth noting that rippled servers already limit the relay of proposals from untrusted sources, so Enhanced Squelching focuses on filtering the much more common validation messages.

How Enhanced Squelching Works

Enhanced Squelching enables each node to act like a discerning concierge, identifying a small, active, and well-propagated set of untrusted validators to listen to. It does this by meticulously monitoring their validation messages. Here’s how the algorithm operates:

1. Initial Monitoring & New Information:

Initially, a node accepts validation messages from all untrusted validators via any of its peers. When the node encounters a unique untrusted validation message for the first time, it processes the message and begins tracking the validator's metrics. Tracking remains active as long as the node has not yet selected its full complement of primary untrusted validators and has 'open slots' it is looking to fill with qualified candidates.

Even when a node fills all primary slots, the system remains dynamic. If a currently selected untrusted validator is later deselected, the slot reopens, prompting the node to reconsider other qualifying untrusted validators. This design ensures the network can adapt to new participants while effectively managing load from untrusted sources.

2. Tracking Untrusted Validator Activity

For each untrusted validator (e.g., UntrustedValidatorA), the node maintains several data points based on their validation messages.

  • A counter of the unique validation messages received from UntrustedValidatorA, to gauge how active the validator is.
  • A record of the distinct peers that relayed UntrustedValidatorA's validation messages, to gauge how well-connected and widely seen that validator is.

Finally, the node resets a validator's progress if it does not receive a new unique validation message from that validator within 8 seconds, preventing slow or poorly connected validators from being selected.

3. Qualifying Untrusted Validators for Selection

An untrusted validator must meet several criteria to become eligible for selection.

First, the validator must have originated 20 unique, timely messages. As with Base Squelching, this threshold is a tradeoff: a lower threshold fills slots faster but risks selecting a less reliable validator.

Second, at least 5 different peers must have relayed its validation messages, indicating reasonable network propagation for its validations. If only a single peer were required, a poorly connected validator, or one directly peered with the node, could qualify; if the criterion were much stricter, few validators could ever meet it. (The sketch below combines the tracking and qualification logic.)
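
A compact sketch of this per-validator tracking and qualification check, using the thresholds above; the UntrustedTracker type is illustrative, not rippled's own:

```cpp
#include <chrono>
#include <cstddef>
#include <set>

constexpr int         kMinMessages = 20;  // unique, timely validations required
constexpr std::size_t kMinPeers    = 5;   // distinct peers that must relay them
constexpr auto        kIdleTimeout = std::chrono::seconds(8);

// Per-untrusted-validator bookkeeping.
struct UntrustedTracker {
    int messageCount = 0;
    std::set<int> relayingPeers;  // distinct peers seen for this validator
    std::chrono::steady_clock::time_point lastSeen;

    // Called for each unique validation; returns true once the validator
    // qualifies for one of the open slots.
    bool onValidation(int peer) {
        auto now = std::chrono::steady_clock::now();
        if (messageCount > 0 && now - lastSeen > kIdleTimeout) {
            messageCount = 0;          // too slow: progress starts over
            relayingPeers.clear();
        }
        lastSeen = now;
        ++messageCount;
        relayingPeers.insert(peer);
        return messageCount >= kMinMessages &&
               relayingPeers.size() >= kMinPeers;
    }
};
```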

4. Selecting a Core Set of Untrusted Validators

As validators meet the qualification criteria, the node selects them one at a time, on a first-come, first-served basis, up to a maximum of 5. These become the primary untrusted validators whose validations the node will accept, process, and relay, subject to Base Squelching.

5. Squelching Other Untrusted Validators – Instructing Peers

The node suppresses all other untrusted validators by sending a squelch message to all its peers. Unlike Base Squelching, the duration of an untrusted validator squelch is longer and fixed: 1 hour.

When a node receives a message from a squelched validator via a new or an existing peer, it responds by sending that peer a squelch message. The node keeps track of the last time it squelched each peer to avoid sending an excessive number of squelch messages, as sketched below.
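
A minimal sketch of that rate limit; note that the one-minute resend interval here is an assumed value for illustration, since the exact interval is not specified above:

```cpp
#include <chrono>
#include <unordered_map>

using SquelchClock = std::chrono::steady_clock;
constexpr auto kResendInterval = std::chrono::minutes(1);  // assumed value

// Last time each peer was sent a squelch for some suppressed validator.
std::unordered_map<int, SquelchClock::time_point> lastSquelchSent;

// Returns true if it is time to (re)send a squelch to this peer.
bool shouldResendSquelch(int peer) {
    auto now = SquelchClock::now();
    auto it = lastSquelchSent.find(peer);
    if (it != lastSquelchSent.end() && now - it->second < kResendInterval)
        return false;  // squelched this peer recently: stay quiet
    lastSquelchSent[peer] = now;
    return true;
}
```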

6. Applying Base Squelching to Messages from Selected Untrusted Validators

The selected validators' messages are processed through the Base Squelching algorithm, ensuring that duplicate transmissions from these untrusted sources are also effectively minimised. If a selected untrusted validator becomes idle, Base Squelching frees its slot, and the Enhanced Squelching algorithm picks a new validator.

Enhanced Squelching first narrows the field of untrusted validators to a small, active, and relevant set by meticulously analysing their validation message patterns. Then, Base Squelching deduplicates all messages received from this selected set. This layered approach significantly reduces the processing of less relevant, unique, untrusted validator traffic and minimises the overall redundant messages a node has to handle.

Quantifying the Impact of Enhanced Squelching

Combining Enhanced Squelching, which selects a core set of untrusted validators, with Base Squelching, which reduces duplicates from all validators, significantly reduces overall network traffic. Let's project the traffic based on the parameters defined earlier:

  • Effectively, Enhanced Squelching reduces the number of validators from which a node receives messages:

    • 35 Trusted Validators + 5 Selected Untrusted Validators = 40 Validators
  • By reducing the total number of validators, the algorithm reduces the data amount generated per ledger:

    • 40 validators × 432 bytes/validator = 17.3 KB per ledger
  • Under Enhanced Squelching, each connection transfers per day:

    • 17.3 KB/ledger × 20,000 ledgers/day = 0.346 GB per connection per day
  • Combined with base squelching, the projected total daily traffic is:

    • 0.346 GB/connection/day × 5,075 connections = approx. 1.76 TB per day

This projected traffic of approximately 1.76 TB per day is a dramatic reduction from the 22.8 TB per day calculated for the current flooding model, a decrease of over 92%. The efficiency of processing messages from this selected set of 40 validators approaches optimal levels, significantly improving overall network resource utilisation.
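
Expressed with the same illustrative helper as before, the combined effect amounts to two parameter changes: 40 validators instead of 203, over 5,075 effective connections:

```cpp
// 35 trusted + 5 selected untrusted validators over the squelched topology.
double combined = dailyTerabytes(432, 40, 20000, 1015 * 5);
// Roughly 1.75 TB/day, versus 22.8 TB/day under flooding: over 92% less.
```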

What's Next? The Path to a More Efficient Network

The introduction of Base Squelching and the plans for Enhanced Squelching are steps towards a more efficient, scalable, and robust XRP Ledger network. These algorithms enhance network performance and reduce the resource burden on node operators by reducing duplicate and unnecessary unique validator messages.

Our next step is to start a slow and meticulous rollout plan for these features.

  1. Release of Base Squelching: The upcoming rippled version 2.5.0 will include the improved Base Squelching algorithm. Node operators who upgrade to this version can configure and activate the feature, although it will remain turned off by default. Furthermore, we recommend keeping Base Squelching off initially while we perform canary testing.

  2. Concurrent Canary Testing of Base and Enhanced Squelching: Following the release of version 2.5.0, we will initiate a crucial canary testing phase.

    • Base Squelching: We will test it on a controlled set of nodes running version 2.5.0, monitoring its performance and stability in a live environment.
    • Enhanced Squelching: We will also conduct canary testing for Enhanced Squelching. While the feature will not be released in version 2.5.0, the necessary code will be available to evaluate it on specific nodes. This will allow us to gather real-world data on its effectiveness early and refine it before a wider public release.
  3. Public Release of Enhanced Squelching: Based on its dedicated canary testing outcomes, the Enhanced Squelching algorithm will be publicly available in a subsequent rippled release. This formal release will make the feature available to all node operators.

  4. Gradual Network-Wide Rollout: Following respective public releases and successful canary testing of Base and Enhanced Squelching, the final phase will be a gradual, network-wide activation driven by the community. We will collaborate with infrastructure providers to support this process, recommending a cautious approach where operators enable the features on a few servers at a time. Gradual rollout allows them to monitor the real-world impact and ensures that any issues can be addressed with minimal disruption, thereby safeguarding the entire network.

This meticulous, phased strategy of releasing features, testing them in controlled live environments, and then supporting a gradual, community-involved rollout ensures a smooth and safe transition, bringing these powerful optimisations to the entire XRP Ledger ecosystem.

We are excited about the potential of these improvements and are committed to transparently sharing updates on their progress, deployment, and performance. Stay tuned for more information as we embark on this next phase of network enhancement.

If you would like to participate in testing or the rollout of the algorithms, please reach out to vtumas@ripple.com. Your contributions are key to a smooth activation! :)
