DEV Community

Muhammed Shafin P

NDM-TCP vs Reno vs Cubic vs BBR: Testing Summary and Recommendations

GitHub Repository: https://github.com/hejhdiss/lkm-ndm-tcp

Introduction: Multiple Localhost Tests, One Pattern

I conducted multiple 40-second localhost tests comparing NDM-TCP against Reno, Cubic, and BBR (assumed v1) under different network conditions. These were not just two scenarios; they cover a range of localhost environments with different artificial constraints applied using tc (traffic control).

Critical Context: All of these tests were conducted on localhost with artificial environment constraints. This is why the results are interesting for research, but also why community validation on real hardware is absolutely essential.

The pattern that emerged across these multiple localhost test scenarios provides insights about when each algorithm performs best. Remember, though: artificial localhost testing ≠ real-world performance.

Test 1: Constrained Network (Poor WiFi Equivalent)

Test Conditions

This test was one of several localhost scenarios with artificial constraints using tc:

  • 20 Mbps bandwidth limit
  • ±50ms jitter (very high variation)
  • 0.5% packet loss (correlated)
  • Packet duplication and reordering
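For reference, constraints like these can be approximated with tc netem. The following is a minimal sketch, not the exact commands from the original tests: the interface, base delay, loss correlation, and duplication/reordering percentages are all assumptions, since the precise tc rules weren't published.

```shell
# Apply artificial constraints to the loopback interface (assumed: lo).
# "rate 20mbit" caps bandwidth (netem rate needs kernel 3.3+);
# "delay 50ms 50ms" gives a 50ms base delay with ±50ms jitter (base assumed);
# "loss 0.5% 25%" adds 0.5% loss with 25% correlation (correlation assumed);
# duplication and reordering percentages below are placeholders.
sudo tc qdisc add dev lo root netem \
    rate 20mbit \
    delay 50ms 50ms \
    loss 0.5% 25% \
    duplicate 1% \
    reorder 25% 50%

# Verify the active qdisc, then remove it after testing.
tc qdisc show dev lo
sudo tc qdisc del dev lo root
```

Removing the qdisc afterwards matters: a leftover netem rule on lo will silently skew every subsequent local benchmark.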

Real-World Equivalent: This resembles:

  • Poor WiFi connection with interference
  • 3G mobile connection
  • Weak 4G (around two bars) that bursts to roughly 20 Mbps but lacks stability
  • Congested public WiFi
  • Networks with high noise and instability

Results Summary

Algorithm   Throughput   Retransmissions   Key Behavior
NDM-TCP     19.6 Mbps    63                Stable, adaptive
Cubic       18.7 Mbps    144               Burst retransmissions
Reno        18.5 Mbps    145               Similar to Cubic
BBR (v1)    17.4 Mbps    243               Struggled badly

Key Finding: NDM-TCP Excels in Poor Conditions

Even in what should be BBR's territory (lossy, variable networks), BBR showed poor results.

  • BBR had 243 retransmissions vs NDM-TCP's 63 (nearly 4x worse)
  • BBR achieved the lowest throughput (17.4 Mbps)
  • BBR struggled throughout the entire 40 seconds
  • Multiple intervals with 0.00 Mbps throughput

Why BBR Struggled:
BBR is designed for real WAN connections with bufferbloat, not artificial localhost constraints. The simulated high jitter and correlated packet loss confused BBR's bandwidth and RTT models, causing aggressive probing that led to excessive retransmissions.

Why NDM-TCP Succeeded:

  • Entropy-based noise detection distinguished random loss from real congestion
  • Adaptive learning handled chaotic conditions
  • Conservative approach avoided triggering unnecessary retransmissions
  • Maintained stable throughput throughout the test

Clear Conclusion: In these poor, noisy network conditions, NDM-TCP showed dramatically cleaner and more stable performance than all other algorithms, including BBR.

Test 2: Clean Network (Pure Localhost)

Test Conditions

This test had NO artificial constraints:

  • No bandwidth limits
  • No delays or jitter
  • No packet loss
  • No packet manipulation
  • Pure localhost loopback performance
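A 40-second run of this kind can be reproduced with iperf3, which on Linux can select the congestion control per connection with -C. The port and the "ndm_tcp" algorithm name below are assumptions (the actual name registered by the lkm-ndm-tcp module may differ).

```shell
# Terminal 1: start an iperf3 server on localhost.
iperf3 -s -p 5201 &

# Terminal 2: run a 40-second test per algorithm with 1-second intervals.
# -C sets the congestion control for this connection (Linux only);
# "ndm_tcp" is an assumed name for the NDM-TCP module's algorithm.
for algo in reno cubic bbr ndm_tcp; do
    echo "=== $algo ==="
    iperf3 -c 127.0.0.1 -p 5201 -t 40 -i 1 -C "$algo"
done
```

The per-second intervals (-i 1) are what make patterns like BBR's "multiple intervals with 0.00 Mbps throughput" visible; the Retr column in iperf3's output gives the retransmission counts compared in the tables.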

Real-World Equivalent: This resembles:

  • Very good wired connection
  • Clean, stable LAN
  • High-quality fiber broadband
  • Ideal conditions with no interference

Results Summary

Algorithm   Throughput   Retransmissions   Cwnd Behavior
Cubic       60.6 Gbps    20                Stable at 1.06 MB
Reno        59.6 Gbps    8                 Stable at 2.19 MB
NDM-TCP     59.4 Gbps    3                 Grows to 15.1 GB!
BBR (v1)    58.0 Gbps    37                Variable (256 KB to 1.5 MB)

Key Finding: Cubic Best for Raw Throughput, NDM-TCP Still Cleanest

In Cubic's own turf (a clean, stable network), Cubic won on throughput, but only barely.

  • Cubic: 60.6 Gbps (highest throughput)
  • NDM-TCP: 59.4 Gbps (only 1.2 Gbps less = 2% difference)
  • NDM-TCP: 3 retransmissions (still the lowest by far)

Cubic's Victory:
Cubic's cubic growth function is perfectly suited for stable, high-bandwidth environments. Its conservative approach (1.06 MB cwnd) worked efficiently on clean localhost.

NDM-TCP's Behavior:

  • Scaled cwnd to 15.1 GB (gigabytes!), which is extremely aggressive
  • Still achieved only 3 retransmissions despite massive window
  • Showed it can scale to very high performance when conditions allow
  • Still demonstrated cleaner and more stable behavior (fewer retransmissions)

BBR's Continued Struggles:
Even in clean conditions, BBR had 37 retransmissions (worst of all), showing that localhost testing (constrained or not) doesn't suit BBR's design philosophy.
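The cwnd figures above can be observed live on Linux with ss, which prints per-socket TCP internals including the congestion window (in MSS units), the congestion control in use, RTT, and retransmit counts. The destination filter below assumes a localhost iperf3 test is running.

```shell
# Sample TCP internals once per second while a test is in progress.
# ss -t shows TCP sockets, -i adds internal state (cwnd, ca algorithm,
# rtt, retrans); the filter limits output to localhost connections.
watch -n 1 "ss -ti dst 127.0.0.1"
```

Watching cwnd this way is how behavior like NDM-TCP's runaway window growth versus Cubic's steady ~1 MB plateau shows up in real time.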

Practical Recommendations: When to Use What

Based on these artificial localhost test results:

Use TCP Cubic When:

You have a very good, clean connection

  • Stable fiber broadband
  • Quality wired LAN
  • Reliable, low-jitter network
  • You prioritize raw throughput over everything else
  • Network conditions are predictable and stable

Why: Cubic performed best in clean conditions (60.6 Gbps) and its conservative approach works well when the network is reliable.

I recommend Cubic for maximum throughput on good connections.
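For anyone who wants to try these algorithms, switching the system-wide default is a one-line sysctl change. The NDM-TCP module name below is an assumption based on the repository name; check the repo's build instructions for the actual name.

```shell
# List the congestion control algorithms currently registered.
sysctl net.ipv4.tcp_available_congestion_control

# Load an algorithm's module if it isn't listed (tcp_bbr ships with
# mainline kernels; an NDM-TCP module would come from the lkm-ndm-tcp
# build, with an assumed module name).
sudo modprobe tcp_bbr

# Set the system-wide default. Individual applications can still
# override it per socket via setsockopt(TCP_CONGESTION) or iperf3 -C.
sudo sysctl -w net.ipv4.tcp_congestion_control=cubic
```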

Use NDM-TCP When:

You need stability and consistency

  • Poor WiFi with interference
  • Mobile connections (3G, weak 4G)
  • Networks with variable quality
  • High-jitter or lossy environments
  • You prioritize connection stability over peak throughput
  • Retransmissions are costly (satellite, metered connections)

Why: NDM-TCP showed:

  • Dramatically fewer retransmissions in both tests (63 vs 243 in poor conditions, 3 vs 37 in clean)
  • Stable, predictable behavior
  • Adaptive response to network chaos
  • Near-Cubic performance even in clean conditions (only 2% less throughput)

NDM-TCP is for stability—and so far, it has consistently shown stability across different test scenarios.

What We Still Don't Know: Edge Cases and Limitations

Not everything is perfect. NDM-TCP has so far been tested only in artificial localhost scenarios. We need community support to find:

Where NDM-TCP Can't Maintain Stability:

  • Real hardware with real network conditions
  • Very high-speed networks (100 Gbps+) with real NICs
  • Long-distance connections (transcontinental, satellite)
  • Rapidly changing networks (mobile handoffs, route changes)
  • Extreme congestion (thousands of competing flows)
  • Production workloads (not just iperf3)
  • Different bandwidth ranges (1 Mbps? 10 Gbps?)

Edge Cases That Need Testing:

❓ What happens during sudden network failures?
❓ How does it handle asymmetric links (different up/down speeds)?
❓ Does it work with middleboxes, NAT, firewalls?
❓ What about wireless mesh networks?
❓ How does it coexist with other algorithms in shared networks?
❓ How stable is it over long runs (hours, days, weeks of runtime)?

Critical Questions:

  • Where does NDM-TCP break down?
  • What are its failure modes?
  • What conditions confuse its entropy detection?
  • When does adaptive learning make wrong decisions?

We need to know what NDM-TCP cannot manage, not just what it can do well.

The BBR Observation: Even in Its Territory

One surprising finding: Even in conditions that should favor BBR (lossy, variable network), BBR showed poor results in these artificial tests.

The 20 Mbps constrained test was meant to simulate exactly the kind of environment BBR is designed for—bandwidth-limited connections with packet loss. Yet:

  • BBR had the worst retransmissions (243)
  • BBR had the lowest throughput (17.4 Mbps)
  • BBR never stabilized over 40 seconds

Why? Likely because:

  1. Localhost testing doesn't represent real bufferbloat
  2. Artificial tc constraints don't mimic real network behavior
  3. BBR's model-based approach was confused by simulated conditions
  4. BBR v1 (assumed) may not handle these specific patterns well

This reinforces that artificial testing has severe limitations. BBR is proven in real-world Google production—these localhost results don't invalidate that. They just show that simulation ≠ reality.

Honest Assessment: All Tests Were Localhost with Artificial Constraints

Every single test mentioned was conducted on artificial localhost environments:

  • Not real network hardware
  • Not real propagation delays
  • Not real packet processing
  • Not real congestion
  • Running on VMware virtualization
  • All network conditions created artificially using tc (traffic control)

Whether constrained (with tc rules) or unconstrained (pure loopback), these were localhost tests—not real LAN, not real WAN, not real internet connections.

These tests reveal algorithm behavior in controlled localhost conditions, but cannot predict real-world performance on actual network hardware.

This is exactly why community validation is absolutely critical. We need testing on:

  • Real network interfaces (not loopback)
  • Real routers and switches (not tc emulation)
  • Real propagation delays (not artificial tc delays)
  • Real congestion (not simulated constraints)
  • Real hardware packet processing
  • Real production networks

Community Support Needed

To truly understand NDM-TCP's capabilities and limitations, we need:

  1. Real Hardware Testing:

    • Actual network interfaces
    • Physical routers and switches
    • Real fiber, wireless, satellite links
  2. Diverse Scenarios:

    • Different bandwidths (1 Mbps to 100+ Gbps)
    • Various latencies (sub-ms to hundreds of ms)
    • Multiple loss patterns
    • Production traffic (not just iperf3)
  3. Edge Case Discovery:

    • Find where it breaks
    • Identify failure modes
    • Determine limitations
    • Map performance boundaries
  4. Independent Validation:

    • Testing by multiple researchers
    • Different environments
    • Statistical analysis
    • Peer review

Conclusion: Patterns from Multiple Localhost Tests

Across multiple localhost test scenarios with different artificial constraints, a pattern emerged:

For maximum raw throughput on good connections: Use TCP Cubic

For stability and consistency across varying conditions: Use NDM-TCP

For real WAN with bufferbloat: Use BBR (but localhost testing doesn't suit its design)

NDM-TCP, Reno, Cubic, and BBR were all tested in various localhost environments—some with extreme artificial constraints (tc-created packet loss, jitter, bandwidth limits), some without any constraints (pure loopback). The results show interesting patterns, but they're all from localhost with artificial environment constraints.

This is why community validation is essential:

  • We need real hardware, not localhost loopback
  • We need real networks, not tc-emulated conditions
  • We need diverse real-world scenarios, not simulated constraints
  • We need to find NDM-TCP's real limitations, not just localhost behavior

Not everything is perfect. We need to know what NDM-TCP cannot handle in real networks, not just what it does well in localhost simulations.

Community collaboration on real hardware is absolutely critical to move beyond artificial localhost testing and discover NDM-TCP's true capabilities and limitations.


Disclaimer: All results are from multiple localhost test scenarios with artificial environment constraints created using tc (traffic control). BBR is assumed to be v1. Performance on real network hardware may differ significantly. These recommendations are based solely on localhost simulation results with artificial constraints, not production validation on real networks.
