1. Introduction
PCIe link training is the process by which a Root Complex (RC) and an Endpoint (EP) autonomously negotiate and establish a reliable high-speed serial link. No software is involved; everything is done by the Physical Layer state machine.
The process must solve:
- Receiver detection: Does anything exist on the other end?
- Bit lock: Can the receiver lock its clock-data recovery (CDR) circuit to the incoming bit stream?
- Symbol/block lock: Can the receiver identify symbol or block boundaries?
- Link configuration: What width and lane ordering to use?
- Speed negotiation: What is the highest mutually supported data rate?
This article focuses on the physical layer (PHY) and explains the LTSSM (Link Training and Status State Machine).
2. System Setup
Topology:
| RC Lane | EP Lane |
|---|---|
| Lane0 | Lane0 |
| Lane1 | Lane1 |
| Lane2 | open |
| Lane3 | open |
Expected Outcome:
- Link width: x2 (limited by the EP)
- Final speed: Gen3 (8 GT/s)
3. Encoding Fundamentals
3.2 8b/10b Encoding (Gen1 and Gen2)
Every 8-bit byte is replaced by a 10-bit symbol. The two extra bits provide the overhead needed for DC balance and transition density.
Running disparity (RD): The encoder tracks whether the bit stream sent so far has carried more 1s or more 0s. RD+ means a running excess of 1s; RD- means a running excess of 0s. For each symbol the encoder picks the RD+ or RD- variant that pulls the line back toward balance. Symbols with equal numbers of 1s and 0s leave RD unchanged.
Symbol classes:
- Data symbols Dxx.y: xx = bits [4:0], y = bits [7:5]. Value = y×32 + xx.
- Control symbols Kxx.y: special characters outside the normal data space.
3.3 Deriving the Symbol Names (Example)
The TS1/TS2 identifier bytes are:
TS1 ID byte = 0x4A
- 0x4A = 74 decimal = 0100_1010 binary
- bits [4:0] = 0_1010 = 10, bits [7:5] = 010 = 2 → D10.2
TS2 ID byte = 0x45
- 0x45 = 69 decimal = 0100_0101 binary
- bits [4:0] = 0_0101 = 5, bits [7:5] = 010 = 2 → D5.2
K28.5 (COM) = 0xBC
- 0xBC = 1011_1100 binary
- Bits [4:0] = 1_1100 = 28, bits [7:5] = 101 = 5 → K28.5
- K28.5 is the designated "comma" character used to establish symbol alignment because its 10-bit patterns (both RD+ and RD-) contain the 7-bit comma sequence (0011111 or 1100000: two bits of one polarity followed by five of the other), which cannot appear inside any valid data symbol or across the boundary between two of them.
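The naming rule above is mechanical enough to check in a few lines. The following is a minimal sketch; the function name `symbol_name` and its `control` flag are illustrative choices, not part of any PCIe tooling.

```python
def symbol_name(byte: int, control: bool = False) -> str:
    """Return the 8b/10b name (Dxx.y or Kxx.y) for an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    xx = byte & 0x1F          # bits [4:0]
    y = (byte >> 5) & 0x07    # bits [7:5]
    prefix = "K" if control else "D"
    return f"{prefix}{xx}.{y}"


# The three bytes worked through above.
for value, is_control in [(0x4A, False), (0x45, False), (0xBC, True)]:
    print(f"0x{value:02X} -> {symbol_name(value, is_control)}")
```

The same byte value can name either a data or a control symbol, so the caller has to say which one is meant; the byte alone does not carry that information.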
3.4 8b/10b Encoded Bit Patterns
8b/10b encoding uses lookup tables; below are the encoded values for the symbols used in training, written in transmission order as abcdei_fghj (6-bit sub-block, then 4-bit sub-block):
| Byte | Symbol | 10-bit (RD-) | 10-bit (RD+) | Notes |
|---|---|---|---|---|
| 0xBC | K28.5 | 001111_1010 | 110000_0101 | COM |
| 0x00 | D0.0 | 100111_0100 | 011000_1011 | Logical Idle |
| 0xF7 | K23.7 | 111010_1000 | 000101_0111 | PAD symbol |
| 0x4A | D10.2 | 010101_0101 | 010101_0101 | TS1 ID ('J'), disparity-neutral |
| 0x45 | D5.2 | 101001_0101 | 101001_0101 | TS2 ID ('E'), disparity-neutral |
| 0xFF | D31.7 | 101011_0001 | 010100_1110 | N_FTS (255) |
| 0x0E | D14.0 | 011100_1011 | 011100_0100 | Rate ID (Gen1+Gen2+Gen3) |
| 0x8E | D14.4 | 011100_1101 | 011100_0010 | Rate ID with speed_change (bit 7) = 1 |
| 0xC8 | D8.6 | 111001_0110 | 000110_0110 | N_FTS=200 |
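Note on D10.2: Both of its sub-blocks are disparity-neutral, so the code is the alternating sequence 010101_0101 at either running disparity. Its bitwise complement, 101010_1010, is a different symbol (D21.5), which is what a receiver sees in place of the TS1 identifier when the lane polarity is inverted.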
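Encoded values like these are easy to get wrong by hand, so a table-driven encoder is a useful cross-check. The sketch below uses the standard 5b/6b and 3b/4b code tables; the function and table names are my own, control symbols are not validated against the list of legal K-characters, and the output string is in abcdei fghj transmission order.

```python
# (code at RD-, code at RD+) per 5-bit value, bit order abcdei.
FIVE_SIX = [
    ("100111", "011000"), ("011101", "100010"), ("101101", "010010"), ("110001", "110001"),
    ("110101", "001010"), ("101001", "101001"), ("011001", "011001"), ("111000", "000111"),
    ("111001", "000110"), ("100101", "100101"), ("010101", "010101"), ("110100", "110100"),
    ("001101", "001101"), ("101100", "101100"), ("011100", "011100"), ("010111", "101000"),
    ("011011", "100100"), ("100011", "100011"), ("010011", "010011"), ("110010", "110010"),
    ("001011", "001011"), ("101010", "101010"), ("011010", "011010"), ("111010", "000101"),
    ("110011", "001100"), ("100110", "100110"), ("010110", "010110"), ("110110", "001001"),
    ("001110", "001110"), ("101110", "010001"), ("011110", "100001"), ("101011", "010100"),
]
# (code at RD-, code at RD+) per 3-bit value, bit order fghj.
THREE_FOUR_D = [("1011", "0100"), ("1001", "1001"), ("0101", "0101"), ("1100", "0011"),
                ("1101", "0010"), ("1010", "1010"), ("0110", "0110"), ("1110", "0001")]
THREE_FOUR_K = [("1011", "0100"), ("0110", "1001"), ("1010", "0101"), ("1100", "0011"),
                ("1101", "0010"), ("0101", "1010"), ("1001", "0110"), ("0111", "1000")]
K28_SIX = ("001111", "110000")
ALT_D7 = ("0111", "1000")  # D.x.A7, used to avoid a run of five identical bits


def _next_rd(rd: int, bits: str) -> int:
    """Flip running disparity unless the sub-block has equal 1s and 0s."""
    return rd if bits.count("1") * 2 == len(bits) else -rd


def encode_8b10b(byte: int, rd: int = -1, control: bool = False):
    """Return (10-bit string, running disparity after the symbol)."""
    x, y = byte & 0x1F, byte >> 5
    col = 0 if rd < 0 else 1
    six = K28_SIX[col] if (control and x == 28) else FIVE_SIX[x][col]
    rd = _next_rd(rd, six)
    col = 0 if rd < 0 else 1
    if control:
        four = THREE_FOUR_K[y][col]
    elif y == 7 and ((rd < 0 and x in (17, 18, 20)) or (rd > 0 and x in (11, 13, 14))):
        four = ALT_D7[col]
    else:
        four = THREE_FOUR_D[y][col]
    return six + four, _next_rd(rd, four)


for value, is_k in [(0xBC, True), (0x00, False), (0x4A, False), (0x45, False)]:
    print(f"0x{value:02X}", encode_8b10b(value, -1, is_k), encode_8b10b(value, +1, is_k))
```

Returning the updated running disparity alongside the code lets the caller chain symbols, which is how the encoder behaves on a real lane.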
3.5 128b/130b Encoding (Gen3 and Above)
At Gen3 (8 GT/s), 8b/10b is replaced by 128b/130b encoding:
- Every 128-bit payload gets a 2-bit sync header prepended.
- Sync header 10 = data block; 01 = ordered set (control) block.
- Overhead: 2/130 ≈ 1.5%, vs. 20% for 8b/10b. This is why Gen3 at 8 GT/s delivers ~4× the effective bandwidth of Gen1 at 2.5 GT/s, not just 3.2×.
- Scrambling uses a different LFSR polynomial than Gen1/Gen2.
Gen3 TS1/TS2 format in 128b/130b: An ordered set block (sync header = 01) carries a 128-bit payload, and a TS1 or TS2 occupies exactly one such block. The payload keeps the same logical fields (Link, Lane, N_FTS, Rate, Training Control, identifier) but lays them out differently from the 16-symbol 8b/10b format: there is no COM, symbol 0 is an ordered-set type byte, symbols 6–9 carry equalization information, and the TS identifier fills the remaining six symbols. Specifically, a Gen3 TS1/TS2 block is:
[01][TS_ID(1B)][Link(1B)][Lane(1B)][N_FTS(1B)][Rate(1B)][TrainingCtrl(1B)][EQ(4B)][ID×6(6B)]
Total: 2 bits sync + 128 bits payload = 130 bits per ordered set block.
4. Ordered Set Structure (TS1 / TS2) — Gen1/Gen2
Each ordered set = 16 symbols × 10 bits = 160 bits at Gen1/Gen2.
TS1 Layout (pre-scramble byte values)
| Symbol index | Field | Byte value (typical) | Notes |
|---|---|---|---|
| 0 | COM | 0xBC | K28.5, always sent |
| 1 | Link Number | 0xF7 (PAD) or 0x00 | PAD until link assigned |
| 2 | Lane Number | 0xF7 (PAD) or 0x00..n | PAD until lane assigned |
| 3 | N_FTS | 0xC8 (200) | Receiver's FTS requirement |
| 4 | Rate Identifier | 0x0E or 0x8E | Bits 1/2/3 = Gen1/Gen2/Gen3 supported; bit 7 = speed_change request |
| 5 | Training Control | 0x00 | Hot reset, disable, loopback and similar requests (see below) |
| 6–15 | TS1 Identifier | 0x4A × 10 | ASCII 'J', repeated 10 times |
TS2 Layout
Identical to TS1, except:
| Symbol index | Field | Byte value |
|---|---|---|
| 6–15 | TS2 Identifier | 0x45 × 10 |
Key Training Control Bits (Symbol 5)
| Bit | Name | Meaning when set |
|---|---|---|
| 0 | Hot Reset | Request hot reset |
| 1 | Disable Link | Request link disable |
| 2 | Loopback | Request loopback mode |
| 3 | Disable Scrambling | Scrambling off (test/debug) |
| 4 | Compliance Receive | Enter compliance mode |
| 5–7 | Reserved | Sent as 0. The speed_change request is carried in bit 7 of the Rate Identifier (symbol 4), not in Training Control |
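The 16-symbol layout can be sketched as a small builder and parser. This is an illustration only: the `TrainingSet` class and helper names are hypothetical, the Rate Identifier and Training Control bytes are passed through as opaque values, and the K/D distinction of COM and PAD is not modelled because only byte values are handled.

```python
from dataclasses import dataclass

COM = 0xBC      # K28.5, symbol 0 of every Gen1/Gen2 ordered set
TS1_ID = 0x4A   # D10.2
TS2_ID = 0x45   # D5.2

# Training Control request bits 0..4, as listed in the table above.
CONTROL_BITS = ["Hot Reset", "Disable Link", "Loopback",
                "Disable Scrambling", "Compliance Receive"]


@dataclass
class TrainingSet:
    kind: str       # "TS1" or "TS2"
    link: int
    lane: int
    n_fts: int
    rate: int
    control: int


def build_ts(ts: TrainingSet) -> bytes:
    """Pack the fields into the 16-symbol Gen1/Gen2 layout."""
    ident = TS1_ID if ts.kind == "TS1" else TS2_ID
    head = [COM, ts.link, ts.lane, ts.n_fts, ts.rate, ts.control]
    return bytes(head + [ident] * 10)


def parse_ts(raw: bytes) -> TrainingSet:
    """Unpack 16 symbols, using symbols 6..15 to tell TS1 from TS2."""
    if len(raw) != 16 or raw[0] != COM:
        raise ValueError("expected 16 symbols starting with COM")
    ident = set(raw[6:])
    if ident == {TS1_ID}:
        kind = "TS1"
    elif ident == {TS2_ID}:
        kind = "TS2"
    else:
        raise ValueError("symbols 6..15 are not a TS1/TS2 identifier")
    return TrainingSet(kind, *raw[1:6])


def decode_control(control: int) -> list:
    """Names of the Training Control request bits (0..4) that are set."""
    return [name for bit, name in enumerate(CONTROL_BITS) if control & (1 << bit)]


# Example values only; link 0, lane 1, N_FTS = 200.
example = build_ts(TrainingSet("TS1", link=0, lane=1, n_fts=0xC8, rate=0x0E, control=0x00))
print(example.hex(" "))
```

Keeping build and parse symmetric makes a round-trip check trivial, which is handy when comparing against a protocol analyzer capture.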
5. LTSSM Overview
Detect → Polling → Configuration → L0 (Gen1) → Recovery → L0 (Gen3)
- Training always begins at Gen1 (2.5 GT/s), regardless of device capability.
- Speed upgrade happens later via the Recovery state.
- The LTSSM runs independently and in parallel on the RC and the EP. They converge through the exchange of ordered sets.
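The top-level flow above can be written down as a small transition table. This is a deliberately reduced sketch: it covers only the five states this article walks through plus their fall-back-to-Detect paths, and the names `State`, `NEXT` and `is_valid_path` are illustrative. A real LTSSM also has L0s, L1, L2, Disabled, Loopback and Hot Reset.

```python
from enum import Enum


class State(Enum):
    DETECT = "Detect"
    POLLING = "Polling"
    CONFIGURATION = "Configuration"
    L0 = "L0"
    RECOVERY = "Recovery"


# Allowed next states in this reduced model (timeouts fall back to Detect).
NEXT = {
    State.DETECT: {State.POLLING},
    State.POLLING: {State.CONFIGURATION, State.DETECT},
    State.CONFIGURATION: {State.L0, State.DETECT},
    State.L0: {State.RECOVERY},
    State.RECOVERY: {State.L0, State.CONFIGURATION, State.DETECT},
}


def is_valid_path(path) -> bool:
    """True if every consecutive pair of states is an allowed transition."""
    return all(b in NEXT[a] for a, b in zip(path, path[1:]))


bring_up = [State.DETECT, State.POLLING, State.CONFIGURATION,
            State.L0, State.RECOVERY, State.L0]
print(" -> ".join(s.value for s in bring_up), is_valid_path(bring_up))
```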
6. Detect State
The transmitter drives each leg of the pair from a nominal 50 Ω source into a 50 Ω receiver termination, which gives 100 Ω differential. The link is AC-coupled, so there is no DC path between the two devices.
Receiver detection applies a common-mode voltage step to the TX differential pair and watches how quickly the line charges:
- Receiver present: The 50 Ω termination behind the AC coupling capacitor loads the step → slow rise time detected.
- No receiver (open circuit): No termination, only parasitic capacitance → fast rise to rail → detected as absent.
The spec requires the transmitter to charge the line to a voltage and measure the time to reach a threshold. If it stays below the threshold long enough (indicating a load), a receiver is declared present.
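A first-order RC model shows why the two cases are easy to tell apart. The component values below are assumptions chosen for illustration (a 100 nF AC coupling capacitor, 3 pF of pad and trace capacitance, 50 Ω source and termination); real designs and the specification's detection thresholds differ.

```python
import math


def time_to_threshold(r_ohms: float, c_farads: float, fraction: float) -> float:
    """Time for an RC node to reach `fraction` of an applied voltage step."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return -r_ohms * c_farads * math.log(1.0 - fraction)


# Assumed example values, not taken from the specification.
R_SOURCE = 50.0          # transmitter output impedance, ohms
R_TERMINATION = 50.0     # receiver termination, ohms
C_AC_COUPLING = 100e-9   # series AC coupling capacitor, farads
C_PARASITIC = 3e-12      # pad and trace capacitance with nothing attached, farads
THRESHOLD = 0.5          # fraction of the step used as the detection threshold

# Receiver present: the step charges the coupling capacitor through both resistances.
t_present = time_to_threshold(R_SOURCE + R_TERMINATION, C_AC_COUPLING, THRESHOLD)
# Receiver absent: only the small parasitic capacitance has to be charged.
t_absent = time_to_threshold(R_SOURCE, C_PARASITIC, THRESHOLD)

print(f"receiver present: {t_present * 1e6:.2f} us")
print(f"receiver absent:  {t_absent * 1e9:.3f} ns")
```

With these assumed values the two charge times differ by several orders of magnitude, so the detector only needs a coarse time comparison rather than a precise measurement.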
RC Detect Results (per lane):
| RC Lane | Detect result |
|---|---|
| Lane0 | Receiver present (50 Ω load from EP Lane0) |
| Lane1 | Receiver present (50 Ω load from EP Lane1) |
| Lane2 | No receiver (open circuit) |
| Lane3 | No receiver (open circuit) |
RC exits Detect with 2 active lanes: Lane0 and Lane1.
Lane2 and Lane3 are deactivated for the remainder of training.
EP Detect Results (per lane):
| EP Lane | Detect result |
|---|---|
| Lane0 | Receiver present |
| Lane1 | Receiver present |
EP exits Detect with 2 active lanes.
7. Polling State
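This walkthrough covers the two Polling sub-states used in normal bring-up: Polling.Active and Polling.Configuration. (Polling.Compliance is entered only for compliance testing.)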
Goal of Polling
- Bit lock: The receiver CDR circuit locks its internal clock to the incoming bit transitions.
- Symbol lock: The receiver identifies COM (K28.5) symbols and aligns its symbol boundaries.
- Configuration capability exchange: Devices advertise their supported speeds in the Rate ID field.
7.1 Polling.Active — TS1 Transmission
Both RC and EP begin transmitting TS1 ordered sets simultaneously on all active lanes. At this stage:
- Link Number = PAD (K23.7, 0xF7): no link number assigned yet
- Lane Number = PAD (K23.7, 0xF7): no lane number assigned yet
- Training Control = 0x00
RC → EP on Lane0, TS1 ordered set (pre-scramble bytes):
Symbol: [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15]
Byte: [BC][F7][F7][C8][0E][00][4A][4A][4A][4A][4A][4A][4A][4A][4A][4A]
Field: COM LNK LAN FTS RAT CTL <-----------TS1 ID ('J') × 10--------->
Bit-level expansion of the first three symbols (starting from RD-):
Symbol 0, COM (K28.5, 0xBC), sent at RD-:
10-bit: 001111_1010
Wired: 0 0 1 1 1 1 1 0 1 0 (LSB first on differential pair)
Symbol 1, PAD (K23.7, 0xF7), sent at RD+:
K28.5 sent at RD- carries six 1s and four 0s, so it leaves RD+ and the next symbol uses its RD+ variant.
K23.7 RD+: 000101_0111
Wired: 0 0 0 1 0 1 0 1 1 1
Symbol 2, PAD (K23.7, 0xF7), sent at RD+:
K23.7 carries five 1s and five 0s, so RD stays + and the same variant is sent again.
K23.7 RD+: 000101_0111
Wired: 0 0 0 1 0 1 0 1 1 1
PCIe bit ordering: Bits are transmitted LSB-first within each 10-bit symbol. The wired sequence above shows bits as they appear on the differential pair over time, left to right.
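Running-disparity bookkeeping for a short sequence can be checked by counting bits. The sketch below works at whole-symbol granularity, which is a simplification (a real encoder updates disparity after each 6-bit and 4-bit sub-block); the helper names are illustrative.

```python
def disparity(bits: str) -> int:
    """Number of 1s minus number of 0s in an encoded symbol."""
    return bits.count("1") - bits.count("0")


def walk_rd(symbols, rd: int = -1):
    """Return [(symbol, rd_before, rd_after), ...] for 10-bit code strings."""
    trace = []
    for bits in symbols:
        d = disparity(bits)
        if d not in (-2, 0, 2):
            raise ValueError(f"{bits} is not a valid 8b/10b symbol")
        if d != 0 and (d > 0) == (rd > 0):
            raise ValueError(f"{bits} is the wrong variant for RD{'+' if rd > 0 else '-'}")
        after = rd if d == 0 else -rd
        trace.append((bits, rd, after))
        rd = after
    return trace


# COM (K28.5 at RD-) followed by two PAD symbols (K23.7 at RD+).
sequence = ["0011111010", "0001010111", "0001010111"]
for bits, before, after in walk_rd(sequence):
    print(bits, "RD+" if before > 0 else "RD-", "->", "RD+" if after > 0 else "RD-")
```

Because K23.7 has five 1s and five 0s, the disparity left by COM carries through both PAD symbols unchanged.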
Same TS1 on Lane1 (identical content, independent CDR):
Symbol: [ 0][ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][12][13][14][15]
Byte: [BC][F7][F7][C8][0E][00][4A][4A][4A][4A][4A][4A][4A][4A][4A][4A]
Lane0 and Lane1 carry identical TS1 content during Polling. Each lane's CDR circuit locks independently.
EP → RC (simultaneously, same TS1 format):
Lane0: [BC][F7][F7][C8][0E][00][4A × 10]
Lane1: [BC][F7][F7][C8][0E][00][4A × 10]
Polling.Active Exit Condition:
- Transmit at least 1024 TS1 ordered sets on all active lanes.
- AND receive 8 consecutive TS1 or TS2 with Link and Lane set to PAD on every lane that detected a receiver.
Both conditions must be met. The 1024-TS1 minimum ensures the receiver had enough transitions for bit and symbol lock before either side checks the received content.
Once both conditions hold, the device transitions to Polling.Configuration.
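The "N consecutive ordered sets" rule that recurs throughout the LTSSM can be sketched as a small counter. The class and function names here are hypothetical, and an ordered set is represented simply as its byte string, with `None` standing for anything the receiver could not decode.

```python
from typing import Optional


class ConsecutiveCounter:
    """Counts identical ordered sets received back to back on one lane."""

    def __init__(self, needed: int = 8):
        self.needed = needed
        self.last: Optional[bytes] = None
        self.count = 0

    def feed(self, ordered_set: Optional[bytes]) -> bool:
        """Record one received ordered set; return True once the run is long enough."""
        if ordered_set is None:            # decode error breaks the run
            self.last, self.count = None, 0
        elif ordered_set == self.last:
            self.count += 1
        else:                              # different content starts a new run
            self.last, self.count = ordered_set, 1
        return self.count >= self.needed


def polling_active_done(tx_ts1_count: int, rx_run_complete: bool) -> bool:
    """Exit test as described above: enough TS1 sent AND a full run received."""
    return tx_ts1_count >= 1024 and rx_run_complete


counter = ConsecutiveCounter(needed=8)
ts1 = bytes([0xBC, 0xF7, 0xF7, 0xC8, 0x0E, 0x00] + [0x4A] * 10)  # example content
results = [counter.feed(ts1) for _ in range(8)]
print(results, polling_active_done(1024, results[-1]))
```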
7.2 Polling.Configuration
In Polling.Configuration, each device sends TS2 ordered sets (symbols 6–15 = 0x45). The Rate ID field is still 0x0E. The device exits when it has:
- Received 8 consecutive TS2 with matching Rate ID
- AND transmitted at least 16 TS2
After Polling.Configuration, the devices have confirmed mutual speed support. Both transition to Configuration.
8. Configuration State
Configuration negotiates link width and lane numbering. It proceeds through several sub-states.
8.1 Configuration.LinkWidth.Start
The RC transmits TS1 on all active lanes with:
- Link Number = PAD (0xF7)
- Lane Number = PAD (0xF7)
This signals: "I am proposing lanes; tell me what you can accept."
RC → EP per lane:
Lane0: [BC][F7][F7][C8][0E][00][4A × 10]
Lane1: [BC][F7][F7][C8][0E][00][4A × 10]
Both lanes carry PAD/PAD — the RC has not assigned numbers yet.
8.2 Configuration.LinkWidth.Accept
The EP receives the RC's PAD/PAD TS1 on Lane0 and Lane1. It assigns link and lane numbers:
| EP Lane | Assigned Link Number | Assigned Lane Number |
|---|---|---|
| Lane0 | 0x00 | 0x00 |
| Lane1 | 0x00 | 0x01 |
The EP sends TS1 back to the RC with these assigned values:
EP → RC on Lane0 (pre-scramble):
Symbol: [ 0][ 1][ 2][ 3][ 4][ 5][ 6–15]
Byte: [BC][00][00][C8][0E][00][4A × 10]
Field: COM LNK=0 LAN=0 FTS RAT CTL TS1_ID
EP → RC on Lane1 (pre-scramble):
Symbol: [ 0][ 1][ 2][ 3][ 4][ 5][ 6–15]
Byte: [BC][00][01][C8][0E][00][4A × 10]
Field: COM LNK=0 LAN=1 FTS RAT CTL TS1_ID
Bit-level for Symbol 2 (Lane Number) on Lane1 — D1.0 (0x01), assume RD-:
0x01 = D1.0
bits[4:0] = 00001, bits[7:5] = 000
D1.0 RD-: 011101_0100 (10 bits, LSB first on wire: 0 1 1 1 0 1 0 1 0 0)
On Lane0, Symbol 2 = D0.0 (0x00):
D0.0 RD-: 100111_0100 (wire: 1 0 0 1 1 1 0 1 0 0)
8.3 RC Echoes Back
The RC receives the EP's numbered TS1. It now echoes the same Link=0 / Lane=N assignments back in its own TS1 transmissions, confirming acceptance:
RC → EP on Lane0:
[BC][00][00][C8][0E][00][4A × 10]
RC → EP on Lane1:
[BC][00][01][C8][0E][00][4A × 10]
Agreement is reached: Link 0, x2 width, Lane0 and Lane1.
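The outcome of this negotiation can be approximated from the Detect results alone. The sketch below is a simplification with hypothetical names: it assumes lanes are wired straight through (no lane reversal), that the link must start at lane 0, and that the width is the largest standard width that fits the contiguous run of lanes both sides can use.

```python
SUPPORTED_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def negotiate_width(rc_lanes, ep_lanes):
    """Return (width, lane_numbers) from per-lane 'receiver detected' flags."""
    usable = 0
    for rc_ok, ep_ok in zip(rc_lanes, ep_lanes):
        if not (rc_ok and ep_ok):
            break                      # the run of usable lanes ends here
        usable += 1
    candidates = [w for w in SUPPORTED_WIDTHS if w <= usable]
    if not candidates:
        return 0, []                   # no link can form
    width = max(candidates)
    return width, list(range(width))


# This article's topology: the RC has four lanes, the EP only two.
rc_detect = [True, True, False, False]
ep_detect = [True, True]
print(negotiate_width(rc_detect, ep_detect))
```

Three usable lanes still yield an x2 link in this model, because x3 is not a defined link width.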
8.4 Configuration.LaneNum.Wait — Switch to TS2
Both sides transition from TS1 → TS2 to confirm lane numbering.
RC → EP on Lane0:
[BC][00][00][C8][0E][00][45 × 10]
RC → EP on Lane1:
[BC][00][01][C8][0E][00][45 × 10]
EP → RC (same, mirrored).
Symbol 15 of TS2 on Lane0, D5.2 (0x45), assume RD-:
0x45 = D5.2
bits[4:0] = 00101 = 5, bits[7:5] = 010 = 2
D5.2 RD-: 101001_0101 (wire: 1 0 1 0 0 1 0 1 0 1)
8.5 Configuration.LaneNum.Accept
Exit condition: receive 2 consecutive TS2 with matching Link and Lane fields.
8.6 Configuration.Complete
Both sides exchange TS2 until:
- 8 consecutive TS2 received with matching Link and Lane numbers
- AND 16 TS2 transmitted after receiving the first TS2
If this does not happen within 2 ms, the LTSSM abandons Configuration and returns to Detect. The 2 ms figure is therefore a timeout that bounds the exchange, not a minimum dwell time.
8.7 Configuration.Idle
Both sides stop sending TS2 and send Logical Idle symbols (the lanes stay active; they do not enter Electrical Idle here):
Logical Idle symbol — D0.0 (0x00), RD-:
10-bit RD-: 1001_110100
Wire (LSB first): 1 0 0 1 1 1 0 1 0 0
Both sides send 8 consecutive Logical Idle symbols on each lane. On receipt of these, both sides transition to L0.
9. L0 State — Gen1 Active Link
The link is now operational:
| Parameter | Value |
|---|---|
| Speed | Gen1, 2.5 GT/s per lane |
| Width | x2 |
| Encoding | 8b/10b |
| Raw bit rate | 2.5 Gb/s per lane |
| Effective rate | 2.5 × 0.8 = 2.0 Gb/s per lane (8b/10b overhead) |
| Aggregate BW | 2 lanes × 2.0 Gb/s = 4.0 Gb/s = 500 MB/s (per direction) |
TLPs (Transaction Layer Packets) and DLLPs (Data Link Layer Packets) can now flow. Software is notified that a link is up.
The RC or EP (typically the RC, or the driver stack) can now initiate a speed change by entering Recovery.
10. Recovery State — Speed Upgrade to Gen3
10.1 Recovery.RcvrLock — Requesting Speed Change
The RC (or EP) initiates Recovery by sending TS1 with:
- speed_change bit set (Rate Identifier bit 7 = 1 → byte = 0x8E)
- Link = 0x00, Lane = 0x00 or 0x01 per lane
- Supported rates unchanged: Gen1, Gen2, Gen3 (Rate Identifier bits 1–3)
RC → EP on Lane0 (pre-scramble):
Symbol: [ 0][ 1][ 2][ 3][ 4][ 5][ 6–15]
Byte: [BC][00][00][C8][8E][00][4A × 10]
^^
speed_change = 1 (bit 7 of symbol 4)
Symbol 4, Rate Identifier = 0x8E:
0x8E = 1000_1110 binary = D14.4
bits[4:0] = 01110 = 14, bits[7:5] = 100 = 4 → D14.4
D14.4 RD-: 011100_1101 (wire: 0 1 1 1 0 0 1 1 0 1)
RC → EP on Lane1:
[BC][00][01][C8][8E][00][4A × 10]
The EP receives these and responds with the same: TS1 with speed_change = 1, its own Link=0, Lane=N.
EP → RC on Lane0:
[BC][00][00][C8][8E][00][4A × 10]
EP → RC on Lane1:
[BC][00][01][C8][8E][00][4A × 10]
Exit condition for Recovery.RcvrLock: receive 8 consecutive TS1 or TS2 (with or without speed change bit) on all active lanes.
10.2 Recovery.RcvrCfg — Confirming Speed with TS2
Both sides switch to TS2 (still Gen1, still 8b/10b, Speed Change bit = 1):
RC → EP on Lane0:
[BC][00][00][C8][8E][00][45 × 10]
RC → EP on Lane1:
[BC][00][01][C8][8E][00][45 × 10]
Exit condition: receive 8 consecutive TS2 with Speed Change bit set.
10.3 Recovery.Speed — PHY Retrain
Both sides simultaneously:
- Assert electrical idle (stop driving differential data).
- Reprogram and relock the PHY PLL for 8 GT/s instead of 2.5 GT/s.
- Switch the serializer/deserializer (SerDes) to Gen3 signaling parameters (different equalization, different reference voltage levels).
- Switch encoding from 8b/10b to 128b/130b.
- Reset the LFSR scrambler to the Gen3 seed.
There is a mandatory quiet time (Electrical Idle) during this phase. The timeout for this step is 24 ms.
10.4 Recovery.RcvrLock Again (at Gen3)
After the PHY switches to 8 GT/s, both sides retransmit TS1 — now using 128b/130b framing.
Each TS1 block at Gen3 is a 130-bit unit:
[sync: 01][TS1 payload: 128 bits]
^^^^^^^^
Ordered set block indicator
Payload (128 bits = 16 bytes), logical byte layout:
Byte: [TS1_OS_ID][LNK][LAN][N_FTS][RATE][CTRL][EQ×4][4A×6]
[0x1E ][0x00][0x00][0xC8][0x0E][0x00][EQ..EQ][4A..4A]
Note: The Gen3 TS1 ordered set identifier byte is 0x1E. Gen1/Gen2 ordered sets start with the COM K-character; 128b/130b has no K-characters, so a dedicated first byte in the payload identifies the ordered set type. Symbols 6–9 (EQ) carry equalization presets and coefficients, and the last two identifier bytes may be replaced by DC-balance values.
On the wire (Lane0, first 18 bits of a Gen3 TS1 block, before scrambling):
Sync header (2 bits): 0 1
Byte 0 = 0x1E = 0001_1110, wire (LSB first): 0 1 1 1 1 0 0 0
Byte 1 = 0x00 = 0000_0000, wire (LSB first): 0 0 0 0 0 0 0 0
...continues for remaining 14 bytes (112 bits)
Total = 2 + 128 = 130 bits per block
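The bit-level expansion above can be generated rather than written by hand. This sketch (the function name `block_bits` is my own) emits the sync header in the order shown in the listing, followed by each payload byte LSB first; it does not model scrambling, and the payload in the example is arbitrary filler rather than a real ordered set.

```python
def block_bits(payload: bytes, sync: str = "01") -> str:
    """Serialize one 128b/130b block: 2 sync bits, then 16 bytes LSB first."""
    if len(payload) != 16:
        raise ValueError("a 128b/130b block carries exactly 16 payload bytes")
    if sync not in ("01", "10"):
        raise ValueError("sync header must be '01' or '10'")
    # format(b, '08b') is MSB first; reverse it to get LSB-first wire order.
    body = "".join(format(b, "08b")[::-1] for b in payload)
    return sync + body


# Arbitrary example payload, used only to show the bit ordering.
payload = bytes([0xC8, 0x01] + [0x00] * 14)
bits = block_bits(payload)
print(len(bits))       # total bits in the block
print(bits[:2])        # sync header
print(bits[2:10])      # first payload byte, LSB first
print(bits[10:18])     # second payload byte, LSB first
```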
Both lanes (Lane0, Lane1) transmit identical ordered sets simultaneously.
10.5 Recovery.RcvrCfg (at Gen3) — TS2
Both sides switch to TS2 at Gen3. The 130-bit block format is the same, with payload byte 0 = TS2 ordered set identifier (0x2D for TS2 in 128b/130b) and the trailing identifier bytes (symbols 10–15) = 0x45.
Exit condition: 8 consecutive TS2 at Gen3 on all active lanes.
10.6 Transition to L0 (Gen3)
Both sides send Logical Idle in 128b/130b and transition to L0 at Gen3.
Final Link State:
| Parameter | Value |
|---|---|
| Width | x2 |
| Speed | Gen3, 8 GT/s per lane |
| Encoding | 128b/130b |
| Raw bit rate | 8 Gb/s per lane |
| Encoding overhead | 2/130 ≈ 1.54% |
| Effective rate | 8 × (128/130) ≈ 7.877 Gb/s per lane |
| Aggregate BW | 2 lanes × 7.877 Gb/s / 8 ≈ 1.97 GB/s (per direction) |
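The bandwidth figures in both link-state tables follow from the line rate and the encoding efficiency. A short sketch of that arithmetic (function names are illustrative, and protocol overhead above the physical layer is ignored):

```python
ENCODING_EFFICIENCY = {
    "8b/10b": 8 / 10,        # Gen1, Gen2
    "128b/130b": 128 / 130,  # Gen3 and above
}


def effective_gbps(rate_gt: float, encoding: str) -> float:
    """Payload bit rate per lane in Gb/s after encoding overhead."""
    return rate_gt * ENCODING_EFFICIENCY[encoding]


def aggregate_mbytes(rate_gt: float, encoding: str, lanes: int) -> float:
    """Payload bandwidth per direction in MB/s for the whole link."""
    return effective_gbps(rate_gt, encoding) * lanes * 1000 / 8


gen1 = aggregate_mbytes(2.5, "8b/10b", lanes=2)
gen3 = aggregate_mbytes(8.0, "128b/130b", lanes=2)
print(f"Gen1 x2: {gen1:.0f} MB/s")
print(f"Gen3 x2: {gen3:.0f} MB/s")
print(f"speed-up: {gen3 / gen1:.2f}x (raw line rate ratio is {8.0 / 2.5:.1f}x)")
```

The speed-up exceeds the raw 3.2× line-rate ratio because the encoding overhead drops from 20% to about 1.5% at the same time.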
12. Complete LTSSM State Transition Summary
| State | Duration / Exit Condition | Lanes active |
|---|---|---|
| Detect.Quiet | TX held in Electrical Idle; exits after 12 ms or on Electrical Idle exit | All (RC: 4) |
| Detect.Active | Receiver detection runs; receiver seen on Lane0, Lane1; Lane2/3 deactivated | RC: 2, EP: 2 |
| Polling.Active | TX ≥1024 TS1 AND RX 8 consecutive TS1/TS2 | 2 per side |
| Polling.Configuration | RX 8 consecutive TS2 AND TX ≥16 TS2 | 2 per side |
| Configuration.LinkWidth.Start | Send PAD/PAD TS1 | 2 per side |
| Configuration.LinkWidth.Accept | RX TS1 with Link≠PAD, Lane≠PAD | 2 per side |
| Configuration.LaneNum.Wait | Switch to TS2 with Link/Lane assigned | 2 per side |
| Configuration.LaneNum.Accept | RX 2 consecutive matching TS2 | 2 per side |
| Configuration.Complete | RX 8 consecutive TS2 AND TX ≥16 TS2; 2 ms timeout returns to Detect | 2 per side |
| Configuration.Idle | RX 8 Logical Idle symbols | 2 per side |
| L0 (Gen1) | Active data transfer at 2.5 GT/s x2 | 2 per side |
| Recovery.RcvrLock | RX 8 consecutive TS1/TS2 (speed change requested) | 2 per side |
| Recovery.RcvrCfg | RX 8 consecutive TS2 with speed change bit | 2 per side |
| Recovery.Speed | PHY retrains to Gen3; 24 ms timeout | 2 per side |
| Recovery.RcvrLock (Gen3) | RX 8 consecutive TS1/TS2 at 8 GT/s (128b/130b) | 2 per side |
| Recovery.RcvrCfg (Gen3) | RX 8 consecutive TS2 at 8 GT/s | 2 per side |
| L0 (Gen3) | Active data transfer at 8 GT/s x2 ≈ 1.97 GB/s | 2 per side |