What's New in the Optimized Build
The optimized version of NDM-TCP (ndm_tcp_lkm_optimized.c) introduces several low-level improvements over the standard v1.0 implementation while maintaining the same algorithmic approach.
Key Optimizations
1. AVX/SIMD Acceleration
The optimized version uses x86_64 AVX intrinsics for the neural network forward pass. When AVX is available, the first hidden layer is computed with the vectorized vpmaddwd and vphaddd instructions, which parallelize the multiply-accumulate operations across the input features.
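To make the multiply-accumulate idea concrete, here is a minimal user-space sketch, not the module's actual code. It assumes 16-bit fixed-point weights and six input features zero-padded to eight lanes; the names dot8, dot8_scalar, and hidden_layer_mac are hypothetical. The intrinsic _mm_madd_epi16 is the one that compiles to pmaddwd (vpmaddwd when AVX is enabled). The horizontal sum here uses SSE2 shuffles rather than vphaddd so that the sketch builds without extra compiler flags, and it falls back to scalar code on other architectures.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_madd_epi16 */
#endif

#define N_INPUTS 8  /* assumption: 6 features zero-padded to 8 lanes */
#define N_HIDDEN 4  /* hidden layer size given in the text */

static_assert(N_INPUTS == 8, "one 128-bit register holds eight int16 lanes");

/* Scalar reference: dot product of eight int16 pairs. */
static inline int32_t dot8_scalar(const int16_t *w, const int16_t *x)
{
    int32_t acc = 0;
    for (int i = 0; i < N_INPUTS; i++)
        acc += (int32_t)w[i] * x[i];
    return acc;
}

/* Vectorized dot product; uses the scalar path when SSE2 is unavailable. */
static inline int32_t dot8(const int16_t *w, const int16_t *x)
{
#if defined(__SSE2__)
    __m128i vw = _mm_loadu_si128((const __m128i *)w);
    __m128i vx = _mm_loadu_si128((const __m128i *)x);
    /* Multiply 8 int16 pairs, add adjacent products: 4 x int32. */
    __m128i s = _mm_madd_epi16(vw, vx);
    /* Horizontal sum of the four int32 lanes via two shuffle+add steps. */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
#else
    return dot8_scalar(w, x);
#endif
}

/* Pre-activation sums for the first hidden layer: out[j] = w[j] . x */
static inline void hidden_layer_mac(const int16_t w[N_HIDDEN][N_INPUTS],
                                    const int16_t x[N_INPUTS],
                                    int32_t out[N_HIDDEN])
{
    for (int j = 0; j < N_HIDDEN; j++)
        out[j] = dot8(w[j], x);
}
```

Inside a kernel module, SIMD code like this has to run between kernel_fpu_begin() and kernel_fpu_end() on x86, which is likely why the build depends on FPU support.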
2. Reduced Memory Footprint
- RTT history window compressed from 16 to 8 slots
- Hidden layer reduced from 8 to 4 neurons
- Input features streamlined from 8 to 6
- Total struct size optimized to exactly 64 bytes (ICSK_CA_PRIV_SIZE limit)
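A size budget like this is usually enforced at compile time. The sketch below shows one way the reduced dimensions could fit in 64 bytes; the struct name, every field other than the RTT window, hidden layer, and input counts, and the field widths are all assumptions for illustration, not the module's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDM_PRIV_BUDGET 64  /* byte budget stated in the text */

/* Hypothetical layout; field names and widths are illustrative only. */
struct ndm_priv_sketch {
    uint16_t rtt_hist[8];     /* 16 bytes: 8-slot RTT history window    */
    int16_t  hidden[4];       /*  8 bytes: 4 hidden neuron states       */
    int16_t  inputs[6];       /* 12 bytes: 6 input features             */
    uint32_t min_rtt_us;      /*  4 bytes                               */
    uint32_t last_cwnd;       /*  4 bytes                               */
    int32_t  cached_delta;    /*  4 bytes: reused cwnd delta            */
    uint16_t entropy_q10;     /*  2 bytes: fixed-point entropy          */
    uint16_t plasticity_q10;  /*  2 bytes: fixed-point plasticity       */
    uint8_t  hist_idx;        /*  1 byte : next slot in rtt_hist        */
    uint8_t  cache_hits;      /*  1 byte : consecutive cache reuses     */
    uint8_t  flags;           /*  1 byte                                */
    uint8_t  reserved[9];     /*  9 bytes: explicit padding to 64       */
};

/* Fail the build, not the running kernel, if the struct outgrows its slot. */
static_assert(sizeof(struct ndm_priv_sketch) == NDM_PRIV_BUDGET,
              "private state must fit the congestion-control slot exactly");
```

In kernel code the equivalent guard is BUILD_BUG_ON(sizeof(struct ...) > ICSK_CA_PRIV_SIZE) in the module's registration function, the pattern used by in-tree congestion control modules such as tcp_cubic.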
3. LUT-Based Activation Functions
Pre-computed lookup tables replace runtime calculations for tanh and sigmoid, trading ~500 bytes of read-only memory for faster activation.
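The following sketch illustrates the lookup-table idea under stated assumptions: Q10 fixed point (1024 represents 1.0), a 17-entry tanh table sampled every 0.25 over [0, 4], and no interpolation. The table size, step, and the function names lut_tanh_q10 and lut_sigmoid_q10 are hypothetical; the module's own tables are larger (about 500 bytes per the text) and may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

#define Q10_ONE      1024  /* 1.0 in Q10 fixed point */
#define TANH_LUT_LEN 17    /* tanh(0.00), tanh(0.25), ..., tanh(4.00) */

/* Approximate tanh values scaled by 1024 and rounded. */
static const int16_t tanh_lut_q10[TANH_LUT_LEN] = {
    0, 251, 473, 650, 780, 869, 927, 964, 987,
    1001, 1010, 1016, 1019, 1021, 1022, 1023, 1023
};

static_assert(sizeof(tanh_lut_q10) <= 64, "keep the table within one cache line");

/* tanh(x) for Q10 input, using odd symmetry and saturation beyond |x| = 4. */
static inline int32_t lut_tanh_q10(int32_t x_q10)
{
    uint32_t mag = x_q10 < 0 ? 0u - (uint32_t)x_q10 : (uint32_t)x_q10;
    uint32_t idx = mag >> 8;  /* 256 in Q10 is a step of 0.25 */

    if (idx >= TANH_LUT_LEN)
        idx = TANH_LUT_LEN - 1;
    return x_q10 < 0 ? -(int32_t)tanh_lut_q10[idx] : tanh_lut_q10[idx];
}

/* sigmoid(x) = (1 + tanh(x / 2)) / 2, so one table serves both functions. */
static inline int32_t lut_sigmoid_q10(int32_t x_q10)
{
    return (Q10_ONE + lut_tanh_q10(x_q10 / 2)) >> 1;
}
```

Sharing one table between tanh and sigmoid is a design choice of this sketch; separate tables avoid the extra halving and shift at the cost of more read-only memory.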
4. Fast Entropy Calculation
Histogram binning uses fixed-point arithmetic and bit shifts instead of floating-point division, with an 8-bin LUT for entropy values.
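As a rough illustration of division-free entropy, the sketch below bins an 8-sample RTT window into 8 bins with a right shift and sums per-bin Shannon terms from a table. It assumes Q10 fixed point and a table indexed by bin count (0 to 8 samples), which is one possible reading of the LUT described above; the function name rtt_entropy_q10 and the binning rule are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RTT_WINDOW 8  /* history slots given in the text */
#define N_BINS     8

/* -(c/8) * log2(c/8) scaled by 1024, for a bin holding c of 8 samples. */
static const uint16_t ent_term_q10[RTT_WINDOW + 1] = {
    0, 384, 512, 543, 512, 434, 319, 173, 0
};

static_assert(sizeof(ent_term_q10) / sizeof(ent_term_q10[0]) == RTT_WINDOW + 1,
              "one table entry per possible bin count");

/* Shannon entropy of the RTT window in Q10 bits (0 .. 3072). */
static inline uint32_t rtt_entropy_q10(const uint16_t rtt[RTT_WINDOW])
{
    uint16_t lo = rtt[0], hi = rtt[0];
    uint8_t counts[N_BINS] = {0};
    uint32_t shift = 0, h = 0;

    for (int i = 1; i < RTT_WINDOW; i++) {
        if (rtt[i] < lo) lo = rtt[i];
        if (rtt[i] > hi) hi = rtt[i];
    }
    /* Smallest shift that maps the whole range into bins 0..7. */
    while (((uint32_t)(hi - lo) >> shift) >= N_BINS)
        shift++;
    for (int i = 0; i < RTT_WINDOW; i++)
        counts[(uint32_t)(rtt[i] - lo) >> shift]++;
    for (int b = 0; b < N_BINS; b++)
        h += ent_term_q10[counts[b]];
    return h;
}
```

With a power-of-two bin width the bins may not cover the range evenly, so this trades a little accuracy for avoiding any divide in the per-ACK path.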
5. Computation Caching
When network conditions are stable (low entropy, high plasticity), the module reuses the previous cwnd delta for up to 8 consecutive ACKs, skipping neural network inference.
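The caching rule can be sketched as a small wrapper around the inference call. This is an illustration only: the thresholds, the struct delta_cache layout, and the function names are assumptions, and counting_infer is a stand-in for the real forward pass that simply counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CACHE_REUSE     8    /* reuse limit given in the text        */
#define ENTROPY_LOW_Q10     512  /* hypothetical "low entropy" cutoff    */
#define PLASTICITY_HIGH_Q10 768  /* hypothetical "high plasticity" cutoff */

struct delta_cache {
    int32_t delta;  /* last cwnd delta produced by inference */
    uint8_t reuse;  /* consecutive ACKs served from the cache */
    bool    valid;  /* false until the first inference has run */
};

typedef int32_t (*infer_fn)(void *ctx);

/* Return a cwnd delta, skipping inference while conditions stay stable. */
static inline int32_t cwnd_delta_cached(struct delta_cache *c,
                                        uint32_t entropy_q10,
                                        uint32_t plasticity_q10,
                                        infer_fn infer, void *ctx)
{
    bool stable = entropy_q10 < ENTROPY_LOW_Q10 &&
                  plasticity_q10 > PLASTICITY_HIGH_Q10;

    assert(c && infer);
    if (stable && c->valid && c->reuse < MAX_CACHE_REUSE) {
        c->reuse++;
        return c->delta;
    }
    /* Unstable, cold, or reuse budget spent: run the network again. */
    c->delta = infer(ctx);
    c->reuse = 0;
    c->valid = true;
    return c->delta;
}

/* Stand-in for the forward pass: counts calls via ctx, returns a fixed delta. */
static inline int32_t counting_infer(void *ctx)
{
    (*(int *)ctx)++;
    return 2;
}
```

Under these assumptions a stable connection runs inference once per nine ACKs: one real computation followed by eight cached results.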
Important Disclaimers
This optimized version does NOT deliver network-level performance gains over the real NDM-TCP v1.0 algorithm. Its changes are purely implementation-level improvements to compilation and runtime efficiency. The core congestion control logic and performance characteristics remain conceptually similar to those of the standard version.
These changes primarily affect:
- CPU cycles per packet processed (an expected total reduction of 56% to 62%)
- Memory cache efficiency
- Compilation time and binary size
They do not fundamentally alter the network throughput, latency, or congestion response that users would observe in real-world testing.
Compilation Instructions
To compile the optimized version:
# Option 1: Rename and compile
cp ndm_tcp_lkm_optimized.c ndm_tcp_lkm.c
make
# Option 2: Modify Makefile to target optimized source directly
The module requires kernel headers and FPU support configuration. AVX optimizations activate automatically on compatible x86_64 systems.
Repository
Source code and build instructions: https://github.com/hejhdiss/lkm-ndm-tcp
Note: Both versions implement the same entropy-aware, neural network-based TCP congestion control algorithm. Choose the optimized build for production deployments where CPU efficiency matters, or stick with the standard version for easier debugging and code readability.