Forward Error Correction, a story about Generation 14 PowerEdge and 25 Gigabit connectivity

#8023by #layer1 #routingswitching #vmware

25 Gigabit implementations

First of all - anyone who assumes Layer 1 is simple is wrong.

That being said, 25G/50G/100G/QSFP28 services differ from 10G in more ways than simply being 2.5x faster. Here's what 802.3by (or 25 Gig, for those who use it) brings:

Full-Duplex is mandatory.
Energy Efficient Operation
Stackable Lanes supporting speeds of up to 28Gbits/s
For those who love it, Twinax maxes out at 5 meters for now (http://www.ieee802.org/3/by/P802_3by_Objectives.pdf)
Currently, cost for 25G/100G silicon appears to be less than for 10G/100G at the switch level.

What has NOT changed

BER requirements are still 10⁻¹² or better (at most one errored bit per 10¹² bits)
All existing 802.1* protocols remain supported
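
To put that BER figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a fully loaded link running exactly at the 10⁻¹² target and uses nominal data rates (encoding overhead is ignored); the function name is just for illustration.

```python
# Rough arithmetic: how often does a link running exactly at the BER target
# see a bit error? Rates are nominal data rates (assumption: encoding
# overhead is ignored and the link is fully loaded).

BER_TARGET = 1e-12  # one errored bit per 10**12 bits transferred


def seconds_between_errors(rate_bits_per_s: float, ber: float = BER_TARGET) -> float:
    """Mean time between bit errors on a fully loaded link."""
    return 1.0 / (rate_bits_per_s * ber)


for name, rate in [("10G", 10e9), ("25G", 25e9), ("100G", 100e9)]:
    print(f"{name}: one bit error roughly every {seconds_between_errors(rate):.0f} s")
```

The same BER target that meant roughly one bit error every 100 seconds at 10G means roughly one every 40 seconds at 25G and one every 10 seconds at 100G, which is part of why error correction matters more at these speeds.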

That being said, I've been working to implement 25G to the server for quite some time now, and we waited with bated breath as the new servers (sporting Broadcom bnxtnet 2x25G NICs) booted up...

and proceeded not to establish any link-level connectivity.

Well, we rounded up the usual suspects, attempting to statically set speed and duplex (which is probably a platform oddity), to no avail.

As it turns out - Forward Error Correction is the culprit. Upon reviewing the IEEE's docs on 802.3by, we found this gem, indicating the difficulties with negotiating different FEC modes:

http://www.ieee802.org/3/by/public/Jan16/hidaka_3by_01_0116.pdf

Clause 73 outlines a set of five bits for FEC auto-negotiation, which let both ends of a link signal what they offer and request so they can agree on which mode to use. Keep in mind that any active connection (all optics, twinax over 5 meters) will require some form of FEC to detect and correct bit errors on the link:

F0: 10G FEC Offered
F1: 10G FEC Requested
F2: 25G RS-FEC Offered (ideal)
F3: 25G RS-FEC Requested
F4: 25G Base-R (Fire Code) FEC Requested

This is important for preventing downstream failures now that we're transmitting data at considerably higher speeds. However, since 802.3by was released as recently as 2016 (and RS-FEC came out in 2017), support for the various modes can be a bit lopsided. Here's the order of preference with a reliability bias - invert the list if latency is the primary goal or you use really good cables:

RS-FEC
FC-FEC
FEC Disabled
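
As a planning aid, the preference order above can be sketched in a few lines of Python. This is not how Clause 73 resolves FEC on the wire; it is only a hypothetical helper (the names `FecMode` and `pick_fec_mode` are made up for this example) for deciding which mode to configure on both ends, given what each end's product sheet says it supports.

```python
from enum import Enum
from typing import Iterable, Optional


class FecMode(Enum):
    RS_FEC = "rs-fec"      # Reed-Solomon FEC
    FC_FEC = "fc-fec"      # Base-R (Fire Code) FEC
    DISABLED = "disabled"  # no FEC


# Reliability-biased preference order, as listed above.
RELIABILITY_ORDER = [FecMode.RS_FEC, FecMode.FC_FEC, FecMode.DISABLED]


def pick_fec_mode(local: Iterable[FecMode],
                  remote: Iterable[FecMode],
                  prefer_latency: bool = False) -> Optional[FecMode]:
    """Return the most preferred mode both ends support, or None if none match."""
    common = set(local) & set(remote)
    # Inverting the list favors latency over reliability.
    order = list(reversed(RELIABILITY_ORDER)) if prefer_latency else RELIABILITY_ORDER
    for mode in order:
        if mode in common:
            return mode
    return None


# Hypothetical example: a server that supports every mode, attached to a
# switch port that cannot do RS-FEC.
server = list(FecMode)
switch_port = [FecMode.FC_FEC, FecMode.DISABLED]
print(pick_fec_mode(server, switch_port))                       # FecMode.FC_FEC
print(pick_fec_mode(server, switch_port, prefer_latency=True))  # FecMode.DISABLED
```

Whatever mode comes out, it still has to be set explicitly on both the NIC and the switch port if auto-negotiation does not work on your platform.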

Currently, Generation 14 Dell PowerEdge appears to support all modes, but defaults to "disabled" and completely fails to auto-negotiate. No matter what, using the Broadcom NICs onboard, you will need to consciously select an option here, and then configure the same mode on your switch.

In addition, early-generation 802.3by switches like Cisco's Nexus EX will not support RS-FEC in single-lane modes, but will support it on multi-lane transceivers:

https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/datasheet-c78-736651.html

This can also be resolved by buying newer-generation switches (FX+), but all generations appear to auto-negotiate with no issues on switch-to-switch links.

What is FEC?

Well, the Wikipedia article is a pretty good start (https://en.wikipedia.org/wiki/Forward_error_correction) but is awfully vague. Long story short, the sender adds redundant data to everything it transmits, so the receiver can detect and correct a limited number of bit errors without asking for a retransmission. The price is about 80-250 nanoseconds of added latency. This is great, especially with twinax, where bit errors are a bit more common than with fiber optics. FEC can also provide feedback on bit errors between endpoints, allowing it to "train" or "self-heal" links, which makes for much higher link reliability.
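
To make the idea concrete, here is a toy sketch of forward error correction in Python using a Hamming(7,4) code. This is not the Reed-Solomon or Fire Code scheme that 25G links actually use; it is a much simpler code, chosen only to show the principle of a receiver repairing a bit error by itself.

```python
# Toy forward error correction with a Hamming(7,4) code: 4 data bits are sent
# as 7 bits, and the receiver can repair any single flipped bit on its own.
# Illustration only; 25G links use much stronger codes.

def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]    # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]    # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]    # parity over positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_pos:
        c[error_pos - 1] ^= 1         # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]


sent = encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                      # simulate one bit error on the wire
print(decode(received) == [1, 0, 1, 1])
```

The trade-off is the same one described above: extra bits on the wire and extra processing time at the receiver, in exchange for errors that never reach the upper layers.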

What this means to me

In this case, the following design decisions follow:

If it's important, use multi-lane slots for it:
- If you're egressing a fabric, you should use QSFP28 transceivers if cost allows. This will provide RS-FEC where it counts.
- If you have spine switches, use QSFP28 transceivers.
If you're buying now, read the product sheets for both your servers and your switches to ensure that RS-FEC is supported, and use optical cabling.