GitHub Repository: https://github.com/hejhdiss/lkm-ndm-tcp
Important Clarification: What Was Updated and Why
There was significant confusion in the original analysis. When I uploaded the test results to Claude Sonnet AI, it compared v1's 6.82 Mbps result from the 100ms ±200ms extreme delay test against the 50ms ±100ms moderate delay results (21.8-27.3 Mbps range). The AI either mixed up the data or misunderstood which test was which.
The truth is more nuanced:
- Extreme delay (100ms ±200ms): v1 performed between good and excellent
- Moderate delay (50ms ±100ms): v1's performance was good but not excellent
The Pure Delay Problem: Two Different Scenarios
NDM-TCP v1 has different performance levels in two distinct pure delay scenarios. Understanding these differences is critical to understanding v1's actual capabilities and limitations.
Test 1: Extreme Pure Delay (100ms ±200ms)
Test Conditions:
- Delay: 100ms with ±200ms variation (0-300ms+ range)
- No packet loss, no bandwidth limits
- Pure delay only
- Duration: 40 seconds
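If you want to reproduce conditions like these, a delay profile of this shape can be emulated with Linux tc/netem. This is a hedged guess at the test harness (the actual scripts live in the repository), and the use of the loopback interface is an assumption:

```shell
# Assumed reproduction of the 100ms ±200ms profile via netem (requires root).
# The loopback device is a guess; substitute your test interface.
sudo tc qdisc add dev lo root netem delay 100ms 200ms

# Verify the qdisc is active
tc qdisc show dev lo

# Tear down after the test run finishes
sudo tc qdisc del dev lo root
```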
Results:
| Algorithm | Throughput | Retransmissions | Performance |
|---|---|---|---|
| NDM-TCP v1 | 6.82 Mbps | 34 | ✅ Between good and excellent |
| Cubic | 6.87 Mbps | 54 | Similar throughput |
| Reno | 7.94 Mbps | 36 | Slightly better throughput |
| BBR | 48.1 Mbps | 427 | ❌ High throughput, poor stability |
Analysis:
NDM-TCP v1 achieved good-to-excellent stability in this extreme delay scenario:
- Lowest retransmissions: 34 (vs Cubic's 54, Reno's 36, BBR's 427)
- Nearly identical throughput to Cubic: 6.82 vs 6.87 Mbps (only 0.05 Mbps difference)
- Only 1.1 Mbps behind Reno: 6.82 vs 7.94 Mbps (14% difference)
- Significantly more stable than BBR: 34 vs 427 retransmissions
Verdict for 100ms ±200ms: NDM-TCP v1 performed between good and excellent. Not perfect, but clearly competitive in both stability and throughput.
Test 2: Moderate Pure Delay (50ms ±100ms)
Test Conditions:
- Delay: 50ms with ±100ms variation (0-150ms range)
- No packet loss, no bandwidth limits
- Pure delay only
- Duration: 60 seconds
Results:
| Algorithm | Throughput | Retransmissions | Performance |
|---|---|---|---|
| Reno | 27.3 Mbps | 76 | Excellent |
| Cubic | 23.5 Mbps | 74 | Excellent |
| NDM-TCP v1 | 21.8 Mbps | 84 | Good, not excellent |
| BBR | 57.6 Mbps | 1,227 | ❌ Very high throughput, poor stability |
Analysis:
This is where v1's weakness becomes clear:
- v1 throughput: 21.8 Mbps - good but not excellent
- Reno throughput: 27.3 Mbps (25% higher than v1)
- Cubic throughput: 23.5 Mbps (8% higher than v1)
- v1 retransmissions: 84 - only slightly higher than Reno (76) and Cubic (74)
- No clear advantage: v1 doesn't excel in stability OR throughput
Verdict for 50ms ±100ms: NDM-TCP v1's performance was good but not excellent - and since NDM-TCP's philosophy is stability first, this means it failed its purpose in cases like this one.
Why I Said v1 "Failed" - The Philosophy Angle
Even though v1's performance was between good and excellent in extreme delay and good in moderate delay, I still say it failed its purpose. Here's why:
NDM-TCP is designed with stability as the core philosophy. For stability-critical scenarios, we need excellence, not just "good" performance.
In the 100ms ±200ms extreme delay case:
- ✅ v1 achieved excellent stability (34 retransmissions - lowest of all)
- ✅ v1 achieved good throughput (6.82 Mbps - nearly identical to Cubic)
- ✅ Overall: Between good and excellent performance
In the 50ms ±100ms moderate delay case:
- ⚠️ v1 achieved good stability (84 retransmissions - comparable to others)
- ⚠️ v1 achieved good throughput (21.8 Mbps - but 25% behind Reno)
- ❌ Overall: Good but not excellent - no clear advantage
The failure: In moderate pure delay scenarios similar to 50ms ±100ms, v1 doesn't demonstrate the excellence expected from a stability-focused algorithm. It's not bad, but it doesn't shine or excel.
When Do Pure Delay Scenarios Actually Occur?
These failure modes only manifest in highly specific edge cases that are extremely rare in normal network conditions:
- High-Frequency Trading (HFT) networks - Ultra-low latency links with minimal buffering where delay variation comes from route changes
- Dedicated fiber connections - Point-to-point links with minimal packet loss but variable delay from temperature or physical changes
- Satellite communication during clear weather - Atmospheric delay variation without signal loss
- Quality wireless links - Good signal strength but variable delay from interference patterns
- Data center cross-connects - Well-maintained links where delay varies but packet loss is extremely rare
These scenarios require a very specific combination: delay variation only, with NO loss, NO queueing, NO bandwidth constraints, and NO other typical network issues. This combination almost never occurs in real-world networks.
Why Not BBR-Inspired Approaches?
I considered implementing a BBR-inspired solution, since BBR excels at raw throughput in pure delay scenarios (81.6 Mbps, far above the other algorithms).
However, BBR demonstrates extreme instability with massive retransmissions:
- Pure delay test: 885 retransmissions
- Extreme delay test: 1,326 retransmissions
- High throughput achieved at the complete sacrifice of stability
This fundamentally conflicts with NDM-TCP's core principle: stability first.
v4 Development: Delay Enhancement Experiments
I attempted to create v4 with delay awareness to address the pure delay problem. Here's a summary of the experimental approaches:
Changes Implemented in v4
Two New Inputs Added (replacing dummy inputs 6 & 7):
Input 6 - Queuing Delay:
```c
/* Calculate absolute difference between current RTT and minimum RTT,
 * normalized to permille of min RTT and capped at 1000 */
u32 q_delay = (rtt_us > ca->min_rtt_us) ? (rtt_us - ca->min_rtt_us) : 0;
inputs[6] = (s32)min_t(u64, (q_delay * 1000ULL) / max(ca->min_rtt_us, 1U), 1000);
```
Input 7 - RTT Gradient:
```c
/* Calculate difference between current RTT and previous RTT,
 * scaled by 100 and clamped to ±1000 */
u16 last_rtt_ms = ca->rtt_history[(ca->history_index + ENTROPY_WINDOW_SIZE - 1) % ENTROPY_WINDOW_SIZE];
s32 rtt_diff = (s32)(rtt_us / 1000) - (s32)last_rtt_ms;
inputs[7] = clamp(rtt_diff * 100, -1000, 1000);
```
Modified Congestion Avoidance Logic
Three new response modes were added to ndm_tcp_cong_avoid():
```c
/* 1. DELAY-ONLY DETECTION: high queuing delay but no loss/entropy signals */
if (inputs[6] > 800 && !ca->congestion_detected) {
    /* Pure delay scenario: cautious growth */
    u32 delta = max(1U, acked * cwnd_delta / 3000);
    tcp_cong_avoid_ai(tp, tp->snd_cwnd, delta);
}
/* 2. REAL CONGESTION: low entropy + high RTT gradient */
else if (ca->has_data && ca->congestion_detected) {
    /* Conservative growth, extra conservative if the gradient is high */
    u32 divisor = (inputs[7] > 500) ? 4000 : 2000;
    u32 delta = max(1U, acked * cwnd_delta / divisor);
    tcp_cong_avoid_ai(tp, tp->snd_cwnd, delta);
}
/* 3. NOISE/CLEAR PATH: high entropy or low delay */
else if (ca->has_data && !ca->congestion_detected) {
    /* High entropy = noise: be aggressive */
    u32 delta = max(1U, acked * cwnd_delta / 1000);
    tcp_cong_avoid_ai(tp, tp->snd_cwnd, delta);
}
```
The Fundamental Issues with v4
Multiple variations were tested:
- BBR-like delay additions → Increased RTT and high retransmissions
- Delay queue + RTT variance → Lower retransmissions but introduced other problems
- RTT jitter + variance (current v4) → Low retransmissions but significantly reduced throughput
v4 Test Results (50ms ±100ms delay, 40 seconds):
- Throughput: 11.6 Mbps
- Retransmissions: 70
v1 Comparison (same delay conditions, 60 seconds):
- v1: 21.8 Mbps with 84 retransmissions
- v4: 11.6 Mbps with 70 retransmissions (linearly extrapolated to 60s: ~105 retransmissions)
The Design Philosophy Conflict
Even though v4's results aren't a complete failure, they create a fundamental philosophical problem with NDM-TCP's design.
NDM-TCP prioritizes stability over raw throughput. This is why we can't simply adopt Cubic or Reno approaches here - they have their own drawbacks in many scenarios where v1 excels (as demonstrated in previous test results).
The conflict: v4 achieves the lowest retransmission count (70), which aligns with stability goals. However, its throughput (11.6 Mbps) creates a problematic trade-off.
Since NDM-TCP follows a stability-focused design and has delivered excellent results in all other tests, I initially expected it to perform well in this delay case too. That expectation was misguided.
The delay-based mathematical additions significantly reduce throughput. The acceptability of this trade-off is unclear because:
- Stability requires balancing both retransmissions AND throughput
- v4 has excellent retransmission stability (70) but poor throughput (11.6 Mbps)
- v1 has good retransmission stability (84) with better throughput (21.8 Mbps)
I prefer v1 over v4 because v1 provides better overall balance based on all localhost tests conducted so far.
Critical Unknown: We haven't tested whether v4 causes problems in the other scenarios where v1 excels.
Why I'm Maintaining v1 as Primary
We cannot specialize for specific edge cases when it degrades general performance.
Based on all localhost tests conducted to date, NDM-TCP v1 performs well in general network scenarios and delivers better overall results than alternatives:
v1's Performance Summary:
- ✅ Constrained networks (loss + delay + bandwidth limits): Excellent
- ✅ Loss-heavy environments: Excellent (26-63 retransmissions)
- ✅ Mixed conditions: Good stability
- ✅ Extreme pure delay (100ms ±200ms): Between good and excellent (34 retransmissions - lowest of all)
- ❌ Moderate pure delay (50ms ±100ms): Good but not excellent - fails the stability-first philosophy
Recommendations for Pure Delay Edge Cases
For Extreme Pure Delay (100ms ±200ms or similar):
NDM-TCP v1 (Recommended):
- Extreme delay test: 6.82 Mbps, 34 retransmissions
- Between good and excellent stability
- Lowest retransmissions of all algorithms
- Competitive throughput
Reno (Alternative for higher throughput):
- Extreme delay test: 7.94 Mbps, 36 retransmissions
- Slightly higher throughput (14% more than v1)
- Still excellent stability
For Moderate Pure Delay (50ms ±100ms or similar):
When v1 fails to excel in moderate delay scenarios, users should consider:
Cubic (Recommended for Balance):
- Moderate delay test: 23.5 Mbps
- Retransmissions: 74 (lowest among traditional algorithms)
- Proven traditional algorithm
- Best compromise for pure delay scenarios
Reno (For Higher Throughput):
- Moderate delay test: 27.3 Mbps
- Retransmissions: 76
- Balanced approach with good throughput
BBR (Maximum Throughput, Stability Sacrifice):
- Moderate delay test: 57.6 Mbps
- Retransmissions: 1,227 (extremely high)
- Use only if raw throughput is critical
v1 Performance in Moderate Delay (For Comparison):
- 21.8 Mbps
- 84 retransmissions (comparable to Reno/Cubic)
- Actually performs reasonably well, just not with maximum throughput
v4 Performance in Moderate Delay:
- 11.6 Mbps (47% slower than v1)
- 70 retransmissions (lowest of all, but at significant cost)
- Too much throughput sacrificed for minimal retransmission improvement
I am not modifying v1 further. It works well for general cases based on all localhost testing conducted to date.
Community Invitation: Experiment with Your Own Version
The complete v4 code is available in the GitHub repository. If you want to experiment with delay enhancements, here is the full v4 congestion avoidance logic:
Complete v4 Congestion Avoidance Implementation
```c
/* Apply congestion control decision */
if (ca->in_slow_start) {
    /* Slow start: exponential growth */
    /* ADAPTIVE DELAY RESPONSE: if the RTT gradient (Input 7) is high,
     * slow down even if no loss has been detected yet. */
    if (ca->congestion_detected || inputs[7] > 400) {
        /* Detected congestion or rising delay: grow slower */
        tcp_slow_start(tp, acked / 2);
    } else {
        /* Normal slow start */
        tcp_slow_start(tp, acked);
    }
} else {
    /* Congestion avoidance */
    /* 1. DELAY-ONLY DETECTION: high queuing delay but no loss/entropy signals */
    if (inputs[6] > 800 && !ca->congestion_detected) {
        /* Pure delay scenario: use a cautious delta to avoid bufferbloat */
        u32 delta = max(1U, acked * cwnd_delta / 3000);
        tcp_cong_avoid_ai(tp, tp->snd_cwnd, delta);
    }
    /* 2. REAL CONGESTION: low entropy + high RTT gradient */
    else if (ca->has_data && ca->congestion_detected) {
        /* Real congestion: be conservative;
         * if the gradient (Input 7) is also high, be extra conservative */
        u32 divisor = (inputs[7] > 500) ? 4000 : 2000;
        u32 delta = max(1U, acked * cwnd_delta / divisor);
        tcp_cong_avoid_ai(tp, tp->snd_cwnd, delta);
    }
    /* 3. NOISE/CLEAR PATH: high entropy or low delay */
    else if (ca->has_data && !ca->congestion_detected) {
        /* High entropy = noise: be aggressive */
        u32 delta = max(1U, acked * cwnd_delta / 1000);
        tcp_cong_avoid_ai(tp, tp->snd_cwnd, delta);
    }
    /* 4. FALLBACK */
    else {
        /* Not enough data: fall back to standard Reno */
        tcp_reno_cong_avoid(sk, ack, acked);
    }
}
```
My Position on v1 vs v4
Based on all localhost tests conducted so far, v1 maintains superiority for general cases.
The pure delay scenario is problematic, but we can't simply borrow approaches from Reno, Cubic, or BBR. NDM-TCP has a fundamentally different philosophy: stability first.
If minimizing retransmissions is your only goal: v4 achieves this (70 retransmissions), but at the cost of poor throughput.
If you want balanced stability: v1 is superior (84 retransmissions with 21.8 Mbps throughput).
My preference remains v1 over v4 because stability means balancing both retransmissions AND throughput, not just minimizing one metric.
Alternative Recommendation for Pure Delay Cases
For pure delay scenarios where v4 is not preferred, use Cubic:
- 23.5 Mbps throughput
- 74 retransmissions (excellent stability)
- Proven traditional algorithm
- Better overall balance than v4 for these specific cases
You can modify v1 with this code if you wish to experiment, or create your own version with different thresholds and approaches.
Be aware: delay-based additions reduce throughput significantly. I am not modifying v1 further because the trade-offs remain unclear and v1 already performs well for general cases.
Final Recommendations
For General Network Use: NDM-TCP v1
- Proven across multiple localhost test scenarios
- Excellent stability in loss-heavy conditions
- Good performance in constrained networks
- Between good and excellent in extreme pure delay (100ms ±200ms): 6.82 Mbps, 34 retransmissions
- Good but not excellent in moderate pure delay (50ms ±100ms): 21.8 Mbps, 84 retransmissions
- Only fails to excel in moderate pure delay scenarios - still performs acceptably
For Extreme Pure Delay (100ms ±200ms or similar):
- NDM-TCP v1 (Recommended) - Between good and excellent: 6.82 Mbps, 34 retransmissions (lowest of all)
- Reno - Slightly higher throughput: 7.94 Mbps, 36 retransmissions
For Moderate Pure Delay (50ms ±100ms or similar):
- Reno (Recommended) - Best throughput: 27.3 Mbps, 76 retransmissions
- Cubic - Excellent balance: 23.5 Mbps, 74 retransmissions (lowest retransmissions)
- NDM-TCP v1 - Works but doesn't excel: 21.8 Mbps, 84 retransmissions
- BBR - Maximum throughput with massive retransmission cost: 57.6 Mbps, 1,227 retransmissions
- v4 - Only if absolute lowest retransmissions (70) is required and very low throughput (11.6 Mbps) is acceptable
I prefer v1 over v4 because stability requires balancing retransmissions AND throughput together, not optimizing a single metric.
Understanding the AI Comparison Confusion
What Happened
When I uploaded the test results to Claude Sonnet AI for analysis, the AI compared:
- v1's 6.82 Mbps from the 100ms ±200ms extreme delay test
- Against the 50ms ±100ms moderate delay test results (21.8-27.3 Mbps range)
The AI concluded v1 "failed catastrophically" because 6.82 Mbps is much lower than 21.8-27.3 Mbps.
The Problem
These are two different tests with different delay ranges. The AI either:
- Got the data mixed up (comparing wrong test scenarios)
- Had difficulty understanding which test result belonged to which scenario
The Reality
When compared within the same test, v1's performance looks very different:
- In 100ms ±200ms test: v1 got 6.82 Mbps, others got 6.87-7.94 Mbps (v1 is competitive)
- In 50ms ±100ms test: v1 got 21.8 Mbps, others got 23.5-27.3 Mbps (v1 is behind)
Why This Clarification Matters
Avoiding negative reader impressions: Some readers might see "6.82 Mbps" and "failed catastrophically" and think NDM-TCP v1 is terrible in all pure delay cases. That's not accurate.
The truth is:
- v1 is between good and excellent when delay variation is extreme (100ms ±200ms)
- v1 is good but fails its purpose when delay variation is moderate (50ms ±100ms)
- The AI comparison confused these two different scenarios
For general network use, v1 remains excellent based on all other localhost tests (loss, congestion, mixed conditions).
Critical Need: Real-World Testing
Community help is desperately needed for real-world testing. All these results come from localhost artificial testing. Real hardware validation is critical to understand:
- How these algorithms actually perform on production networks
- Whether v1's or v4's approach works better in real deployment scenarios
- What the actual trade-offs are outside of localhost simulation
- Which version should be recommended for real-world use
Community contributions are welcome if you can solve the pure delay problem without sacrificing v1's general-case performance.
Disclaimer: All results are from localhost artificial testing. None of these versions have been tested on real hardware. Real hardware validation is critically needed.
Key clarifications:
- v1 performs between good and excellent in extreme pure delay (100ms ±200ms)
- v1 performs good but not excellent in moderate pure delay (50ms ±100ms)
- v4 is experimental and has only been tested in one scenario beyond v1's test suite
- v1 remains the main version for general use based on localhost testing results across multiple scenarios
Community testing on real hardware is essential before production deployment. Pure delay-only scenarios are extremely rare edge cases - even in those, v1 performs between good and excellent in extreme conditions and acceptably (though not excellent) in moderate conditions.