DEV Community

Ripan Deuri


Understanding PCIe Link Training

1. Introduction

PCIe link training is the process by which a Root Complex (RC) and an Endpoint (EP) autonomously negotiate and establish a reliable high-speed serial link. Software does not take part in the handshake itself; it is carried out in hardware by each port's Physical Layer state machine.

The process must solve:

  • Receiver detection: Does anything exist on the other end?
  • Bit lock: Can the receiver lock its clock-data recovery (CDR) circuit to the incoming bit stream?
  • Symbol/block lock: Can the receiver identify symbol or block boundaries?
  • Link configuration: What width and lane ordering to use?
  • Speed negotiation: What is the highest mutually supported data rate?

This article focuses on the physical layer (PHY) and explains the LTSSM (Link Training and Status State Machine).

2. System Setup

Topology:

| RC Lane | EP Lane |
|---------|---------|
| Lane0 | Lane0 |
| Lane1 | Lane1 |
| Lane2 | open (not connected) |
| Lane3 | open (not connected) |

Expected Outcome:

  • Link width: x2 (limited by the EP)
  • Final speed: Gen3 (8 GT/s)

3. Encoding Fundamentals

3.2 8b/10b Encoding (Gen1 and Gen2)

Every 8-bit byte is replaced by a 10-bit symbol. The two extra bits provide the overhead needed for DC balance and transition density.

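Running disparity (RD): The encoder keeps a one-bit running disparity recording whether slightly more 1s (RD+) or more 0s (RD-) have been sent so far. Most symbols have two encodings, and the encoder picks the one for the current RD so the line is pulled back toward balance: a code word with six 1s (sent only from RD-) moves RD to +, a code word with four 1s (sent only from RD+) moves RD to -, and a balanced code word with five 1s leaves RD unchanged.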

Symbol classes:

  • Data symbols Dxx.y: xx = bits [4:0], y = bits [7:5]. Value = y×32 + xx.
  • Control symbols Kxx.y: special characters outside the normal data space.

3.3 Deriving the Symbol Names (Example)

The TS1/TS2 identifier bytes are:

TS1 ID byte = 0x4A

  • 0x4A = 74 decimal = 0100_1010 binary
  • bits [4:0] = 0_1010 = 10, bits [7:5] = 010 = 2 → D10.2

TS2 ID byte = 0x45

  • 0x45 = 69 decimal = 0100_0101 binary
  • bits [4:0] = 0_0101 = 5, bits [7:5] = 010 = 2 → D5.2

K28.5 (COM) = 0xBC

  • 0xBC = 1011_1100 binary
  • Bits [4:0] = 1_1100 = 28, bits [7:5] = 101 = 5 → K28.5
  • K28.5 is the designated "comma" character used to establish symbol alignment: both of its encodings (RD- 001111_1010, RD+ 110000_0101) begin with the 7-bit comma pattern 0011111 or 1100000, a run of five identical bits that cannot occur within or across any sequence of valid data symbols.
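The naming rule above is mechanical enough to express in a few lines. A minimal Python sketch; `symbol_name` is a hypothetical helper, not part of any PCIe tool:

```python
def symbol_name(value: int, control: bool = False) -> str:
    """Return the 8b/10b name (Dxx.y or Kxx.y) of an 8-bit value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected an 8-bit value")
    xx = value & 0x1F          # bits [4:0]
    y = (value >> 5) & 0x07    # bits [7:5]
    return f"{'K' if control else 'D'}{xx}.{y}"


for value, is_k in ((0x4A, False), (0x45, False), (0xBC, True)):
    print(f"0x{value:02X} -> {symbol_name(value, is_k)}")
# 0x4A -> D10.2
# 0x45 -> D5.2
# 0xBC -> K28.5
```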

3.4 8b/10b Encoded Bit Patterns

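8b/10b encoding uses lookup tables: a 5b/6b table for bits [4:0] and a 3b/4b table for bits [7:5]. Below are the encodings for the symbols used in training, written in transmission order as abcdei_fghj; "RD-" and "RD+" are the running disparity before the symbol is sent:

| Byte | Symbol | Code when RD- | Code when RD+ | Notes |
|------|--------|---------------|---------------|-------|
| 0xBC | K28.5 | 001111_1010 | 110000_0101 | COM |
| 0x00 | D0.0 | 100111_0100 | 011000_1011 | Logical Idle |
| 0xF7 | K23.7 | 111010_1000 | 000101_0111 | PAD |
| 0x4A | D10.2 | 010101_0101 | 010101_0101 | TS1 identifier ('J') |
| 0x45 | D5.2 | 101001_0101 | 101001_0101 | TS2 identifier ('E') |
| 0xFF | D31.7 | 101011_0001 | 010100_1110 | N_FTS = 255 |
| 0xC8 | D8.6 | 111001_0110 | 000110_0110 | N_FTS = 200 |
| 0x0E | D14.0 | 011100_1011 | 011100_0100 | Data Rate Identifier: 2.5, 5.0 and 8.0 GT/s |
| 0x8E | D14.4 | 011100_1101 | 011100_0010 | Data Rate Identifier with speed_change (bit 7) set |

Note on D10.2 and D5.2: both of their sub-blocks are balanced, so the RD- and RD+ encodings are identical and sending them leaves RD unchanged. D10.2 encodes as the alternating pattern 0101010101, which gives the receiver's CDR a transition on every bit.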
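The lookup-plus-disparity procedure can be sketched as a small encoder. This is a minimal illustration built from the standard 8b/10b 5b/6b and 3b/4b tables, not production code: it handles every data byte but only the control symbols K28.y and K23/27/29/30.7, and `encode_8b10b` is a hypothetical name. Code words come back in transmission order (abcdei fghj), and RD is represented as -1 or +1:

```python
# 5b/6b codes (abcdei) indexed by bits [4:0]; "minus/plus" lists the RD- and RD+
# variants, a single code is used for both.
FIVE_SIX = [
    "100111/011000", "011101/100010", "101101/010010", "110001",
    "110101/001010", "101001", "011001", "111000/000111",
    "111001/000110", "100101", "010101", "110100",
    "001101", "101100", "011100", "010111/101000",
    "011011/100100", "100011", "010011", "110010",
    "001011", "101010", "011010", "111010/000101",
    "110011/001100", "100110", "010110", "110110/001001",
    "001110", "101110/010001", "011110/100001", "101011/010100",
]
K28_SIX = "001111/110000"

# 3b/4b codes (fghj) indexed by bits [7:5]; entry 7 of the data table is D.x.P7.
D_THREE_FOUR = ["1011/0100", "1001", "0101", "1100/0011",
                "1101/0010", "1010", "0110", "1110/0001"]
D_A7 = "0111/1000"  # alternate D.x.7, avoids a run of five equal bits
K_THREE_FOUR = ["1011/0100", "0110/1001", "1010/0101", "1100/0011",
                "1101/0010", "0101/1010", "1001/0110", "0111/1000"]


def _pick(entry: str, rd: int) -> str:
    """Choose the RD- variant when rd < 0, else the RD+ variant (if any)."""
    minus, _, plus = entry.partition("/")
    return plus if (rd > 0 and plus) else minus


def _next_rd(code: str, rd: int) -> int:
    """Unbalanced sub-blocks set RD to their sign; balanced ones keep it."""
    ones = code.count("1")
    zeros = len(code) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return -1
    return rd


def encode_8b10b(byte: int, rd: int, control: bool = False) -> tuple[str, int]:
    """Encode one byte; return (10-bit code in wire order, RD after the symbol)."""
    x, y = byte & 0x1F, (byte >> 5) & 0x7
    if control:
        if x == 28:
            six_entry = K28_SIX
        elif y == 7 and x in (23, 27, 29, 30):
            six_entry = FIVE_SIX[x]
        else:
            raise ValueError(f"K{x}.{y} is not a valid control symbol")
    else:
        six_entry = FIVE_SIX[x]
    six = _pick(six_entry, rd)
    rd = _next_rd(six, rd)          # RD is updated after each sub-block

    if control:
        four_entry = K_THREE_FOUR[y]
    elif y == 7 and ((rd < 0 and x in (17, 18, 20)) or (rd > 0 and x in (11, 13, 14))):
        four_entry = D_A7
    else:
        four_entry = D_THREE_FOUR[y]
    four = _pick(four_entry, rd)
    rd = _next_rd(four, rd)
    return six + four, rd


# Start of a Polling TS1 (COM, PAD, PAD, N_FTS=200), starting at RD-.
rd = -1
for value, is_k in ((0xBC, True), (0xF7, True), (0xF7, True), (0xC8, False)):
    code, rd = encode_8b10b(value, rd, is_k)
    print(f"0x{value:02X}: {code[:6]}_{code[6:]}  RD after: {'+' if rd > 0 else '-'}")
# 0xBC: 001111_1010  RD after: +
# 0xF7: 000101_0111  RD after: +
# 0xF7: 000101_0111  RD after: +
# 0xC8: 000110_0110  RD after: -
```

Tracking RD per sub-block (rather than per 10-bit symbol) is what makes the two-column tables work: the 4-bit half is always chosen using the disparity left behind by the 6-bit half.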

3.5 128b/130b Encoding (Gen3 and Above)

At Gen3 (8 GT/s), 8b/10b is replaced by 128b/130b encoding:

  • Every 128-bit payload gets a 2-bit sync header prepended.
  • Sync header 10 = data block; 01 = ordered set (control) block.
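  • Overhead: 2/130 ≈ 1.5%, vs. 20% for 8b/10b. Gen1 carries 2.5 × 8/10 = 2.0 Gb/s of data per lane, while Gen3 carries 8 × 128/130 ≈ 7.88 Gb/s, so Gen3 delivers about 3.9× (roughly 4×) the effective bandwidth of Gen1, rather than the 3.2× suggested by the raw signaling rates.
  • Scrambling uses a different LFSR: Gen1/Gen2 use x^16 + x^5 + x^4 + x^3 + 1, while Gen3 uses x^23 + x^21 + x^16 + x^8 + x^5 + x^2 + 1 with a per-lane seed.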

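Gen3 TS1/TS2 format in 128b/130b: An ordered set block (sync header = 01) carries a 128-bit (16-byte) payload, and one TS1 or TS2 fills exactly one block. The payload keeps the 16-symbol order of the 8b/10b format (Link, Lane, N_FTS, Rate ID and Training Control in symbols 1–5), with three differences: symbol 0 is a TS identifier byte instead of COM (128b/130b has no K-characters), symbols 6–9 of a TS1 carry equalization fields, and symbols 14–15 may be replaced by DC-balance symbols. A Gen3 TS1 block is:

[01][TS1 ID 1Eh (1B)][Link (1B)][Lane (1B)][N_FTS (1B)][Rate ID (1B)][Training Ctrl (1B)][EQ fields (4B)][4Ah × 4 (4B)][4Ah or DC balance (2B)]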

Total: 2 bits sync + 128 bits payload = 130 bits per ordered set block.

4. Ordered Set Structure (TS1 / TS2) — Gen1/Gen2

Each ordered set = 16 symbols × 10 bits = 160 bits at Gen1/Gen2.

TS1 Layout (byte values; at Gen1/Gen2 the data symbols of TS1/TS2 bypass the scrambler)

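| Symbol | Field | Byte value (typical) | Notes |
|--------|-------|----------------------|-------|
| 0 | COM | 0xBC | K28.5, always sent |
| 1 | Link Number | 0xF7 (PAD) or assigned number (0x00 here) | PAD until a link number is assigned |
| 2 | Lane Number | 0xF7 (PAD) or 0x00–0x1F | PAD until a lane number is assigned |
| 3 | N_FTS | 0xC8 (200) | Number of Fast Training Sequences the receiver needs to exit L0s |
| 4 | Data Rate Identifier | 0x0E | Bits 1/2/3 = 2.5/5.0/8.0 GT/s supported; bit 7 = speed_change |
| 5 | Training Control | 0x00 | See bit table below |
| 6–15 | TS1 Identifier | 0x4A × 10 | D10.2 (ASCII 'J'), repeated 10 times |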

TS2 Layout

Identical to TS1, except:

| Symbol | Field | Byte value |
|--------|-------|------------|
| 6–15 | TS2 Identifier | 0x45 × 10 (D5.2, ASCII 'E') |
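Because TS1 and TS2 differ only in their identifier symbols, one builder covers both. A sketch with hypothetical names (`build_ts`, `hex_dump`); it assumes PAD is K23.7 (0xF7) and a Data Rate Identifier of 0x0E (bits 1, 2 and 3 set: 2.5, 5.0 and 8.0 GT/s). Each symbol carries an is_control flag because COM and PAD must be encoded as K-characters, not as data:

```python
from typing import Optional

COM = 0xBC                             # K28.5
PAD = 0xF7                             # K23.7
TS_IDS = {"TS1": 0x4A, "TS2": 0x45}    # D10.2 and D5.2


def build_ts(kind: str, link: Optional[int] = None, lane: Optional[int] = None,
             n_fts: int = 0xC8, rate_id: int = 0x0E,
             training_ctrl: int = 0x00) -> list[tuple[int, bool]]:
    """Return the 16 symbols of a Gen1/Gen2 TS1 or TS2 as (byte, is_control) pairs.

    A link or lane of None means "not assigned yet" and is sent as PAD.
    """
    symbols = [(COM, True)]
    for number in (link, lane):
        symbols.append((PAD, True) if number is None else (number, False))
    symbols += [(n_fts, False), (rate_id, False), (training_ctrl, False)]
    symbols += [(TS_IDS[kind], False)] * 10
    return symbols


def hex_dump(symbols: list[tuple[int, bool]]) -> str:
    """Format symbols in the [BC][F7]... style used in this article."""
    return "".join(f"[{value:02X}]" for value, _ in symbols)


print(hex_dump(build_ts("TS1")))                   # Polling: Link and Lane are PAD
print(hex_dump(build_ts("TS2", link=0, lane=1)))   # numbered TS2 on Lane1
```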

Key Training Control Bits (Symbol 5)

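| Bit | Name | Meaning when set |
|-----|------|------------------|
| 0 | Hot Reset | Request hot reset |
| 1 | Disable Link | Request link disable |
| 2 | Loopback | Request loopback mode |
| 3 | Disable Scrambling | Scrambling off at 2.5/5.0 GT/s (test/debug) |
| 4 | Compliance Receive | Enter compliance mode |
| 7:5 | Reserved | Reserved in PCIe 3.0 |

The speed change request is not a Training Control bit: it is the speed_change bit, bit 7 of the Data Rate Identifier (symbol 4).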
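Both symbols are bit fields, so decoding them is a matter of masking. A sketch using the PCIe 3.0 bit assignments (Data Rate Identifier bits 1–3 for 2.5/5.0/8.0 GT/s and bit 7 for speed_change; Training Control bits 0–4 as in the table above). Bit 6 of the Data Rate Identifier (Autonomous Change/Selectable De-emphasis) is not decoded, and the function names are hypothetical:

```python
RATE_BITS = {1: 2.5, 2: 5.0, 3: 8.0}   # Data Rate Identifier bit -> GT/s
SPEED_CHANGE_BIT = 7
TRAINING_CONTROL_BITS = {
    0: "Hot Reset",
    1: "Disable Link",
    2: "Loopback",
    3: "Disable Scrambling",
    4: "Compliance Receive",
}


def decode_rate_id(value: int) -> tuple[list[float], bool]:
    """Return (supported rates in GT/s, speed_change flag) for symbol 4."""
    rates = [gts for bit, gts in RATE_BITS.items() if value & (1 << bit)]
    return rates, bool(value & (1 << SPEED_CHANGE_BIT))


def decode_training_control(value: int) -> list[str]:
    """Return the names of the Training Control (symbol 5) bits that are set."""
    return [name for bit, name in TRAINING_CONTROL_BITS.items() if value & (1 << bit)]


print(decode_rate_id(0x0E))            # ([2.5, 5.0, 8.0], False)
print(decode_rate_id(0x8E))            # ([2.5, 5.0, 8.0], True)
print(decode_training_control(0x01))   # ['Hot Reset']
```

Decoding 0x07 yields only 2.5 and 5.0 GT/s: bit 0 is reserved and bit 3 (8.0 GT/s) is clear, so a port advertising Gen1, Gen2 and Gen3 sends 0x0E.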

5. LTSSM Overview

Detect → Polling → Configuration → L0 (Gen1) → Recovery → L0 (Gen3)
  • Training always begins at Gen1 (2.5 GT/s), regardless of device capability.
  • Speed upgrade happens later via the Recovery state.
  • The LTSSM runs independently and in parallel on the RC and the EP. They converge through the exchange of ordered sets.

6. Detect State

Each PCIe transmitter and receiver is terminated with a nominal 50 Ω single-ended (100 Ω differential) impedance. The link is AC-coupled: a series capacitor on each wire blocks any DC path between the two devices.

Receiver detection relies on that coupling capacitor. The transmitter steps the common-mode voltage on its differential pair and watches how quickly the line follows:

  • Receiver present: the step must charge the coupling capacitor through the receiver's termination → large RC time constant → slow rise.
  • No receiver (open circuit): only small stray capacitance is charged → fast rise.

If the line voltage is still below the detection threshold after the expected charging interval (indicating a load), a receiver is declared present on that lane.
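A rough first-order RC model shows why the two cases are easy to tell apart. The component values below are assumptions chosen only for illustration (coupling capacitor values in particular vary by generation and board design), and a real detect circuit compares the line voltage against a threshold at a fixed time rather than computing time constants:

```python
import math

# Illustrative component values (assumptions, not spec limits).
R_TX = 50.0          # transmitter single-ended impedance, ohms
R_RX = 50.0          # receiver common-mode termination, ohms
C_AC = 100e-9        # AC coupling capacitor, farads (hypothetical value)
C_PARASITIC = 2e-12  # stray capacitance seen with nothing attached, farads (hypothetical)


def time_to_fraction(tau_s: float, fraction: float) -> float:
    """Time for a first-order RC step response to reach `fraction` of its final value."""
    return -tau_s * math.log(1.0 - fraction)


tau_present = (R_TX + R_RX) * C_AC   # coupling cap charges through both terminations
tau_absent = R_TX * C_PARASITIC      # far end open: only stray capacitance charges

for label, tau in (("receiver present", tau_present), ("no receiver", tau_absent)):
    print(f"{label:16s}: tau = {tau:.2e} s, time to 50% = {time_to_fraction(tau, 0.5):.2e} s")
```

With these numbers the two time constants differ by about five orders of magnitude, which is why a single timing threshold separates "load present" from "open" reliably.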

RC Detect Results (per lane):

| RC Lane | Detect result |
|---------|---------------|
| Lane0 | Receiver present (50 Ω load from EP Lane0) |
| Lane1 | Receiver present (50 Ω load from EP Lane1) |
| Lane2 | No receiver (open circuit) |
| Lane3 | No receiver (open circuit) |

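Because a receiver was found on some lanes but not all, the RC waits 12 ms and repeats detection; with the same result, it exits Detect with 2 active lanes: Lane0 and Lane1.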

Lane2 and Lane3 are deactivated for the remainder of training.

EP Detect Results (per lane):

| EP Lane | Detect result |
|---------|---------------|
| Lane0 | Receiver present |
| Lane1 | Receiver present |

EP exits Detect with 2 active lanes.

7. Polling State

On the normal path, Polling uses two sub-states: Polling.Active and Polling.Configuration. (A third, Polling.Compliance, is used only for compliance testing.)

Goal of Polling

  1. Bit lock: The receiver CDR circuit locks its internal clock to the incoming bit transitions.
  2. Symbol lock: The receiver identifies COM (K28.5) symbols and aligns its symbol boundaries.
  3. Configuration capability exchange: Devices advertise their supported speeds in the Rate ID field.

7.1 Polling.Active — TS1 Transmission

Both RC and EP transmit TS1 ordered sets on all active lanes, each side starting independently. At this stage:

  • Link Number = PAD (K23.7, 0xF7): no link number assigned yet
  • Lane Number = PAD (K23.7, 0xF7): no lane number assigned yet
  • Training Control = 0x00

RC → EP on Lane0, TS1 ordered set (byte values):

Symbol:  [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15]
Byte:    [BC][F7][F7][C8][0E][00][4A][4A][4A][4A][4A][4A][4A][4A][4A][4A]
Field:   COM  LNK LAN FTS RAT CTL <-----------TS1 ID ('J') × 10--------->

Bit-level expansion of the first three symbols, assuming the lane starts at RD-:

Symbol 0: COM (K28.5, 0xBC), current RD-
  10-bit: 001111_1010
  Wire:   0 0 1 1 1 1 1 0 1 0   (bit a first, left to right)

Symbol 1: PAD (K23.7, 0xF7), current RD+
  Sending K28.5 in RD- leaves RD+, so the RD+ variant is used.
  K23.7 RD+: 000101_0111
  Wire:   0 0 0 1 0 1 0 1 1 1
  K23.7 has five 1s and five 0s, so RD stays RD+.

Symbol 2: PAD (K23.7, 0xF7), current RD+
  K23.7 RD+: 000101_0111
  Wire:   0 0 0 1 0 1 0 1 1 1

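PCIe bit ordering: An 8b/10b code word is written abcdei fghj, where bit a corresponds to bit 0 (the LSB) of the original byte, and it is transmitted bit a first. The wire sequences above therefore read each code word left to right, in the order the bits appear on the differential pair.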

Same TS1 on Lane1 (identical content, independent CDR):

Symbol:  [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15]
Byte:    [BC][F7][F7][C8][0E][00][4A][4A][4A][4A][4A][4A][4A][4A][4A][4A]

Lane0 and Lane1 carry identical TS1 content during Polling. Each lane's CDR circuit locks independently.

EP → RC (simultaneously, same TS1 format):

Lane0: [BC][F7][F7][C8][0E][00][4A × 10]
Lane1: [BC][F7][F7][C8][0E][00][4A × 10]

Polling.Active Exit Condition:

  • Transmit at least 1024 TS1 ordered sets on all active lanes.
  • AND receive 8 consecutive TS1 or TS2 with Link and Lane = PAD on every lane that detected a receiver.

Both conditions must be met. The 1024-TS1 minimum ensures the link partner's receiver has had enough transitions for bit and symbol lock before either side acts on the received content.

Once both conditions hold, the device transitions to Polling.Configuration.
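The exit rule can be sketched as a small per-port tracker. This is a simplification under stated assumptions: `PollingActiveExit` and `is_polling_ts` are hypothetical names, the check ignores the Compliance Receive and Loopback bits, inverted-polarity sequences and the 24 ms timeout path, and it treats any TS1 or TS2 with PAD Link and Lane numbers as a match:

```python
from dataclasses import dataclass, field

COM, PAD, TS1_ID, TS2_ID = 0xBC, 0xF7, 0x4A, 0x45


def is_polling_ts(ts: bytes) -> bool:
    """TS1 or TS2 whose Link and Lane numbers are still PAD."""
    return (len(ts) == 16 and ts[0] == COM and ts[1] == PAD and ts[2] == PAD
            and ts[6] in (TS1_ID, TS2_ID) and len(set(ts[6:])) == 1)


@dataclass
class PollingActiveExit:
    lanes: list[int]                                   # lanes that detected a receiver
    ts1_sent: int = 0
    run: dict[int, int] = field(default_factory=dict)  # consecutive matches per lane

    def sent_ts1(self) -> None:
        self.ts1_sent += 1

    def received(self, lane: int, ts: bytes) -> None:
        # A non-matching ordered set restarts the consecutive count on that lane.
        self.run[lane] = self.run.get(lane, 0) + 1 if is_polling_ts(ts) else 0

    def ready(self) -> bool:
        return self.ts1_sent >= 1024 and all(self.run.get(n, 0) >= 8 for n in self.lanes)


ts1 = bytes([COM, PAD, PAD, 0xC8, 0x0E, 0x00] + [TS1_ID] * 10)
port = PollingActiveExit(lanes=[0, 1])
for _ in range(1024):
    port.sent_ts1()
for _ in range(8):
    port.received(0, ts1)
print(port.ready())   # False: Lane1 has not yet seen 8 in a row
for _ in range(8):
    port.received(1, ts1)
print(port.ready())   # True
```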

7.2 Polling.Configuration

In Polling.Configuration, each device sends TS2 ordered sets (symbols 6–15 = 0x45), still with Link and Lane = PAD and the same Data Rate Identifier (0x0E). The device exits when it has:

  • Received 8 consecutive TS2 with Link and Lane = PAD
  • AND transmitted at least 16 TS2 after receiving the first TS2

After Polling.Configuration, each side has seen the other's advertised data rates, but the link still runs at 2.5 GT/s; any speed change happens later, in Recovery. Both transition to Configuration.

8. Configuration State

Configuration negotiates link width and lane numbering. It proceeds through several sub-states.

8.1 Configuration.LinkWidth.Start

The RC (the Downstream Port, which leads Configuration) transmits TS1 on all active lanes with:

  • Link Number = 0x00 (the link number it proposes)
  • Lane Number = PAD (0xF7)

This signals: "I propose Link 0; tell me which lanes can join it."

The EP (the Upstream Port) keeps transmitting TS1 with Link = PAD and Lane = PAD until it receives two consecutive TS1 with a non-PAD Link Number. It then echoes Link = 0x00 (Lane still PAD) on those lanes and moves to Configuration.LinkWidth.Accept.

RC → EP per lane:

Lane0: [BC][00][F7][C8][0E][00][4A × 10]
Lane1: [BC][00][F7][C8][0E][00][4A × 10]

Both lanes carry Link=0 / Lane=PAD: the RC has proposed a link number but has not assigned lane numbers yet.

8.2 Configuration.LinkWidth.Accept

The RC receives the EP's echoed Link=0 / Lane=PAD TS1 on Lane0 and Lane1. Those two lanes can form one x2 link, so the RC assigns lane numbers:

| RC Lane | Assigned Link Number | Assigned Lane Number |
|---------|----------------------|----------------------|
| Lane0 | 0x00 | 0x00 |
| Lane1 | 0x00 | 0x01 |

The RC sends TS1 to the EP with these assigned values:

RC → EP on Lane0:

Symbol:  [ 0][ 1][ 2][ 3][ 4][ 5][ 6–15]
Byte:    [BC][00][00][C8][0E][00][4A × 10]
Field:   COM  LNK=0 LAN=0 FTS RAT CTL  TS1_ID

RC → EP on Lane1:

Symbol:  [ 0][ 1][ 2][ 3][ 4][ 5][ 6–15]
Byte:    [BC][00][01][C8][0E][00][4A × 10]
Field:   COM  LNK=0 LAN=1 FTS RAT CTL  TS1_ID

Bit-level for Symbol 2 (Lane Number) on Lane1, D1.0 (0x01), assume RD-:

0x01 = D1.0
bits[4:0] = 00001, bits[7:5] = 000
D1.0 RD-: 011101_0100   (wire, bit a first: 0 1 1 1 0 1 0 1 0 0)

On Lane0, Symbol 2 = D0.0 (0x00):

D0.0 RD-: 100111_0100   (wire: 1 0 0 1 1 1 0 1 0 0)

8.3 EP Echoes Back

The EP receives the RC's numbered TS1 and echoes the same Link=0 / Lane=N assignments in its own TS1 transmissions, confirming acceptance:

EP → RC on Lane0:

[BC][00][00][C8][0E][00][4A × 10]

EP → RC on Lane1:

[BC][00][01][C8][0E][00][4A × 10]

Agreement is reached: Link 0, x2 width, Lane0 and Lane1.
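The downstream port's side of this exchange reduces to numbering the lanes that echoed the link number and then checking that the partner's echo matches. A sketch with hypothetical helper names; it numbers lanes in physical order and ignores lane reversal and link-width fallback:

```python
PAD = 0xF7


def assign_lane_numbers(lanes_with_link: list[int]) -> dict[int, int]:
    """Give the lanes that echoed the link number consecutive lane numbers 0, 1, 2, ..."""
    return {phys: logical for logical, phys in enumerate(sorted(lanes_with_link))}


def numbers_match(sent: dict[int, tuple[int, int]],
                  received: dict[int, tuple[int, int]]) -> bool:
    """True when every lane received the same (link, lane) pair it transmitted."""
    return all(received.get(lane) == pair for lane, pair in sent.items())


LINK = 0x00
lane_map = assign_lane_numbers([1, 0])                  # RC Lane0 and Lane1 echoed Link 0
sent = {phys: (LINK, num) for phys, num in lane_map.items()}
echoed = {0: (0x00, 0x00), 1: (0x00, 0x01)}             # EP's TS1, per physical lane
print(lane_map, numbers_match(sent, echoed))            # {0: 0, 1: 1} True
```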

8.4 Configuration.LaneNum.Wait

Each side now waits for its partner to reflect the agreed Link and Lane numbers. Once the RC has received two consecutive TS1 whose numbers match what it transmitted, it passes through Configuration.LaneNum.Accept into Configuration.Complete and switches from TS1 to TS2; the EP follows when it sees those TS2.

RC → EP on Lane0:

[BC][00][00][C8][0E][00][45 × 10]

RC → EP on Lane1:

[BC][00][01][C8][0E][00][45 × 10]

EP → RC (same, mirrored).

Symbol 15 of TS2 on Lane0, D5.2 (0x45), assume RD- (D5.2 is disparity-neutral, so RD+ gives the same code):

0x45 = D5.2
bits[4:0] = 00101 = 5, bits[7:5] = 010 = 2
D5.2 RD-: 101001_0101   (wire: 1 0 1 0 0 1 0 1 0 1)

8.5 Configuration.LaneNum.Accept

Exit condition: the RC needs 2 consecutive TS1 whose Link and Lane numbers match those it transmitted; the EP needs 2 consecutive TS2 with matching Link and Lane numbers. Both then enter Configuration.Complete.

8.6 Configuration.Complete

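Both sides exchange TS2 until:

  • 8 consecutive TS2 are received with matching Link and Lane numbers and identical Data Rate Identifiers
  • AND at least 16 TS2 have been sent after receiving the first TS2

The 2 ms figure is a timeout, not a minimum: if these conditions are not met within 2 ms of entering Configuration.Complete, the LTSSM abandons this attempt (typically returning to Detect).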
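A per-lane sketch of the Configuration.Complete exit rule, including the 2 ms timeout. `ConfigComplete` is a hypothetical name; the real LTSSM evaluates all configured lanes together and has additional timeout exits that this simplification collapses into a return to Detect:

```python
from dataclasses import dataclass

TIMEOUT_S = 2e-3   # Configuration.Complete timeout


@dataclass
class ConfigComplete:
    rx_matching_run: int = 0      # consecutive TS2 with matching Link/Lane/rate fields
    tx_after_first_rx: int = 0    # TS2 sent since the first TS2 arrived
    got_first_ts2: bool = False

    def received_ts2(self, matches: bool) -> None:
        self.got_first_ts2 = True
        self.rx_matching_run = self.rx_matching_run + 1 if matches else 0

    def sent_ts2(self) -> None:
        if self.got_first_ts2:
            self.tx_after_first_rx += 1

    def next_state(self, elapsed_s: float) -> str:
        if self.rx_matching_run >= 8 and self.tx_after_first_rx >= 16:
            return "Configuration.Idle"
        if elapsed_s >= TIMEOUT_S:
            return "Detect"       # simplified timeout exit
        return "Configuration.Complete"


lane = ConfigComplete()
for _ in range(16):
    lane.received_ts2(matches=True)
    lane.sent_ts2()
print(lane.next_state(elapsed_s=0.5e-3))              # Configuration.Idle
print(ConfigComplete().next_state(elapsed_s=2.5e-3))  # Detect
```

Counting only the TS2 sent after the first one arrives ensures the partner also gets enough TS2 to satisfy its own 8-in-a-row check before this side moves on.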

8.7 Configuration.Idle

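Both sides stop sending TS2 and transmit Idle data: the data byte 0x00 (D0.0). Unlike TS1/TS2 content, Idle data passes through the scrambler, so the symbols on the wire vary. Without scrambling (for example, with Disable Scrambling set), each Idle symbol would encode as:

Logical Idle symbol, D0.0 (0x00), RD-:

10-bit RD-: 100111_0100
Wire (LSB first): 1 0 0 1 1 1 0 1 0 0

Each side moves to L0 once it has received 8 consecutive Idle data symbols on every configured lane and has sent at least 16 Idle data symbols after receiving the first one.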

9. L0 State — Gen1 Active Link

The link is now operational:

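| Parameter | Value |
|-----------|-------|
| Speed | Gen1, 2.5 GT/s per lane |
| Width | x2 |
| Encoding | 8b/10b |
| Raw bit rate | 2.5 Gb/s per lane |
| Effective rate | 2.5 × 0.8 = 2.0 Gb/s per lane (8b/10b overhead) |
| Aggregate bandwidth | 2 lanes × 2.0 Gb/s = 4.0 Gb/s = 500 MB/s per direction |

The Data Link Layer can now run flow-control initialization using DLLPs (Data Link Layer Packets); once it reports the link as up (DL_Up), TLPs (Transaction Layer Packets) can flow. Software can read the negotiated width and speed from the Link Status register.

Either port can now initiate a speed change by entering Recovery. In practice the RC (the downstream port) usually does so, either autonomously or because software set the Target Link Speed field (Link Control 2) and the Retrain Link bit (Link Control).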

10. Recovery State — Speed Upgrade to Gen3

10.1 Recovery.RcvrLock — Requesting Speed Change

The RC (or EP) initiates Recovery by sending TS1 with:

  • speed_change bit set (bit 7 of the Data Rate Identifier, symbol 4: 0x0E | 0x80 = 0x8E)
  • Link = 0x00, Lane = 0x00 or 0x01 per lane
  • Data Rate Identifier still advertising 2.5, 5.0 and 8.0 GT/s (bits 1–3)
  • Training Control = 0x00

RC → EP on Lane0:

Symbol:  [ 0][ 1][ 2][ 3][ 4][ 5][ 6–15]
Byte:    [BC][00][00][C8][8E][00][4A × 10]
                          ^^
                    speed_change = 1 (symbol 4, bit 7)

Symbol 4, Data Rate Identifier = 0x8E:

0x8E = 1000_1110 binary
bits[4:0] = 01110 = 14, bits[7:5] = 100 = 4 → D14.4

D14.4 RD- (assuming RD- at this point): 011100_1101   (wire: 0 1 1 1 0 0 1 1 0 1)

RC → EP on Lane1:

[BC][00][01][C8][8E][00][4A × 10]

The EP receives these and responds in kind: TS1 with the speed_change bit set and its own Link=0, Lane=N.

EP → RC on Lane0:

[BC][00][00][C8][8E][00][4A × 10]

EP → RC on Lane1:

[BC][00][01][C8][8E][00][4A × 10]

Exit condition for Recovery.RcvrLock: receive 8 consecutive TS1 or TS2 on all configured lanes whose Link and Lane numbers match those being transmitted and whose speed_change bit matches the local request (set, in this case).

10.2 Recovery.RcvrCfg — Confirming Speed with TS2

Both sides switch to TS2 (still Gen1, still 8b/10b, speed_change bit still set):

RC → EP on Lane0:

[BC][00][00][C8][8E][00][45 × 10]

RC → EP on Lane1:

[BC][00][01][C8][8E][00][45 × 10]

Exit condition: receive 8 consecutive TS2 with the speed_change bit set on all configured lanes. Because both sides advertise 8.0 GT/s in their Data Rate Identifiers, the target rate is 8 GT/s, the highest rate both support.
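Choosing the new rate amounts to intersecting the two Data Rate Identifiers and taking the highest common rate bit. A sketch under the PCIe 3.0 bit assignments (bit 1 = 2.5, bit 2 = 5.0, bit 3 = 8.0 GT/s); `target_rate` is a hypothetical helper and ignores any other factor that can limit the rate:

```python
RATE_BITS = {1: 2.5, 2: 5.0, 3: 8.0}   # Data Rate Identifier bit -> GT/s


def target_rate(local_rate_id: int, remote_rate_id: int) -> float:
    """Highest data rate advertised by both ports; 2.5 GT/s is always supported."""
    common = local_rate_id & remote_rate_id
    rates = [gts for bit, gts in RATE_BITS.items() if common & (1 << bit)]
    return max(rates, default=2.5)


print(target_rate(0x8E, 0x8E))   # 8.0
print(target_rate(0x8E, 0x86))   # 5.0, the partner does not advertise 8.0 GT/s
```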

10.3 Recovery.Speed — PHY Retrain

Both sides simultaneously:

  1. Assert electrical idle (stop driving differential data).
  2. Retune the PHY PLL/clocking from 2.5 GT/s to 8 GT/s.
  3. Switch the serializer/deserializer (SerDes) to Gen3 signaling parameters (transmitter equalization presets and receiver equalization settings).
  4. Switch framing from 8b/10b to 128b/130b.
  5. Switch to the Gen3 scrambler (a 23-bit LFSR seeded per lane).

There is a mandatory quiet time (Electrical Idle) during this phase. The timeout for this step is 24 ms.

10.4 Recovery.RcvrLock Again (at Gen3)

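After the PHY switches to 8 GT/s, both sides retransmit TS1, now using 128b/130b framing. (On the first switch to 8 GT/s, the link normally also runs the Gen3 equalization procedure, Recovery.Equalization Phases 0–3, which this walkthrough omits.)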

Each TS1 block at Gen3 is a 130-bit unit:

[sync: 01][TS1 payload: 128 bits]
 ^^^^^^^^
 Ordered set block indicator

Payload (128 bits = 16 bytes), logical byte layout. Symbols 6–9 carry equalization fields whose values depend on the equalization state, so they are shown as EQ; the speed_change bit is clear again because the rate change has completed:

Symbol: [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15]
Byte:   [1E][00][00][C8][0E][00][EQ][EQ][EQ][EQ][4A][4A][4A][4A][4A][4A]

Note: In 128b/130b, symbol 0 of a TS1 is the identifier 1Eh (TS2 uses 2Dh). At Gen1/Gen2, symbol 0 is the COM K-character and the ordered-set type is given by the identifier symbols (D10.2 or D5.2). 128b/130b has no K-characters: the sync header marks the block as an ordered set, and symbol 0 names its type. Symbols 14–15 may be replaced by DC-balance symbols.

On the wire (Lane0, first 18 bits of a Gen3 TS1 block; symbol 0 is not scrambled, and symbol 1 is shown before scrambling):

Sync header (2 bits): 0 1
Byte 0 = 0x1E = 0001_1110, wire (LSB first): 0 1 1 1 1 0 0 0
Byte 1 = 0x00 = 0000_0000, wire (LSB first): 0 0 0 0 0 0 0 0
...continues for remaining 14 bytes (112 bits)
Total = 2 + 128 = 130 bits per block

Both lanes (Lane0, Lane1) transmit identical ordered sets simultaneously.
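The block layout and LSB-first byte order can be checked with a short sketch. Assumptions: the TS1 identifier is 1Eh, the equalization bytes (symbols 6–9) are zero placeholders rather than real equalization values, symbols 14–15 are sent as 4Ah rather than DC-balance symbols, no scrambling is applied, and the sync header is written in the same bit order as the text above; the helper names are hypothetical:

```python
TS1_ID_128B = 0x1E       # TS1 identifier in 128b/130b (TS2 uses 0x2D)
OS_SYNC_HEADER = "01"    # ordered-set block, in the bit order used in the text


def ts1_block_payload(link: int, lane: int, n_fts: int = 0xC8, rate_id: int = 0x0E,
                      eq: bytes = bytes(4)) -> bytes:
    """16-byte Gen3 TS1: ID, Link, Lane, N_FTS, Rate ID, Training Control,
    four equalization bytes (symbols 6-9), then 0x4A in symbols 10-15."""
    if len(eq) != 4:
        raise ValueError("symbols 6-9 hold exactly four equalization bytes")
    return bytes([TS1_ID_128B, link, lane, n_fts, rate_id, 0x00]) + eq + bytes([0x4A] * 6)


def block_to_wire(payload: bytes, sync: str = OS_SYNC_HEADER) -> str:
    """Sync header followed by each payload byte, LSB first. No scrambling is applied."""
    if len(payload) != 16:
        raise ValueError("a 128b/130b block carries exactly 16 bytes")
    return sync + "".join(format(b, "08b")[::-1] for b in payload)


bits = block_to_wire(ts1_block_payload(link=0x00, lane=0x00))
print(len(bits))    # 130
print(bits[:18])    # 010111100000000000
```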

10.5 Recovery.RcvrCfg (at Gen3) — TS2

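Both sides switch to TS2 at Gen3. The block format is the same 130-bit unit, with symbol 0 = 2Dh (the 128b/130b TS2 identifier), symbol 6 carrying equalization-related bits, symbols 7–13 = 45h, and symbols 14–15 = 45h or DC-balance symbols.

Exit condition: 8 consecutive TS2 at Gen3 on all active lanes (now with speed_change = 0), and at least 16 TS2 sent after receiving the first one.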

10.6 Transition to L0 (Gen3)

Both sides enter Recovery.Idle, send a Start of Data Stream (SDS) ordered set followed by Idle data in 128b/130b data blocks, and transition to L0 at Gen3.

Final Link State:

| Parameter | Value |
|-----------|-------|
| Width | x2 |
| Speed | Gen3, 8 GT/s per lane |
| Encoding | 128b/130b |
| Raw bit rate | 8 Gb/s per lane |
| Encoding overhead | 2/130 ≈ 1.54% |
| Effective rate | 8 × (128/130) ≈ 7.877 Gb/s per lane |
| Aggregate bandwidth | 2 lanes × 7.877 Gb/s / 8 ≈ 1.97 GB/s per direction |
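The arithmetic in both link-state tables follows one formula: per-lane signaling rate × code efficiency × lanes ÷ 8 bits per byte. A sketch (the function name is hypothetical); it ignores TLP/DLLP framing and flow-control overhead, so real throughput is lower:

```python
def per_direction_gbytes(gt_per_s: float, lanes: int, payload_bits: int, code_bits: int) -> float:
    """Per-direction bandwidth after line-code overhead, in GB/s (before packet overhead)."""
    effective_gbps_per_lane = gt_per_s * payload_bits / code_bits
    return lanes * effective_gbps_per_lane / 8


gen1_x2 = per_direction_gbytes(2.5, 2, 8, 10)
gen3_x2 = per_direction_gbytes(8.0, 2, 128, 130)
print(f"Gen1 x2: {gen1_x2:.3f} GB/s")          # Gen1 x2: 0.500 GB/s
print(f"Gen3 x2: {gen3_x2:.3f} GB/s")          # Gen3 x2: 1.969 GB/s
print(f"ratio:   {gen3_x2 / gen1_x2:.2f}x")    # ratio:   3.94x
```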

11. Complete LTSSM State Transition Summary

| State | Exit condition / activity | Lanes active |
|-------|---------------------------|--------------|
| Detect.Quiet | Transmitter in Electrical Idle; exit after 12 ms or when Electrical Idle is broken | RC: 4, EP: 2 |
| Detect.Active | Receiver detected on Lane0 and Lane1; RC Lane2/3 deactivated | RC: 2, EP: 2 |
| Polling.Active | TX ≥1024 TS1 AND RX 8 consecutive TS1/TS2 (Link/Lane = PAD) on every detected lane | 2 per side |
| Polling.Configuration | RX 8 consecutive TS2 AND TX 16 TS2 after the first TS2 received | 2 per side |
| Configuration.LinkWidth.Start | RC sends TS1 with Link = 0, Lane = PAD; EP echoes Link = 0 | 2 per side |
| Configuration.LinkWidth.Accept | RC assigns Lane = 0/1 in TS1; EP echoes Link/Lane numbers | 2 per side |
| Configuration.LaneNum.Wait / LaneNum.Accept | RC: RX 2 consecutive matching TS1; EP: RX 2 consecutive matching TS2 | 2 per side |
| Configuration.Complete | RX 8 consecutive matching TS2 AND TX 16 TS2 after the first received (2 ms timeout) | 2 per side |
| Configuration.Idle | RX 8 consecutive Idle data symbols AND TX 16 Idle symbols after the first received | 2 per side |
| L0 (Gen1) | Active data transfer at 2.5 GT/s x2 | 2 per side |
| Recovery.RcvrLock | RX 8 consecutive TS1/TS2 with matching Link/Lane and speed_change = 1 | 2 per side |
| Recovery.RcvrCfg | RX 8 consecutive TS2 with speed_change = 1 | 2 per side |
| Recovery.Speed | PHY switches to 8 GT/s; 24 ms timeout | 2 per side |
| Recovery.RcvrLock (Gen3) | RX 8 consecutive TS1/TS2 at 8 GT/s (128b/130b); equalization not shown | 2 per side |
| Recovery.RcvrCfg (Gen3) | RX 8 consecutive TS2 at 8 GT/s | 2 per side |
| L0 (Gen3) | Active data transfer at 8 GT/s x2, ≈ 1.97 GB/s per direction | 2 per side |
