<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muhammed Shafin P</title>
    <description>The latest articles on DEV Community by Muhammed Shafin P (@hejhdiss).</description>
    <link>https://dev.to/hejhdiss</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3202400%2Fcadd1e3f-cab3-4084-8a09-ebaa09292ff1.png</url>
      <title>DEV Community: Muhammed Shafin P</title>
      <link>https://dev.to/hejhdiss</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hejhdiss"/>
    <language>en</language>
    <item>
      <title>India Is Collapsing — And Why It Affects the Tech Industry</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Sun, 08 Mar 2026 12:26:38 +0000</pubDate>
      <link>https://dev.to/hejhdiss/india-is-collapsing-and-why-it-affects-the-tech-industry-1kd7</link>
      <guid>https://dev.to/hejhdiss/india-is-collapsing-and-why-it-affects-the-tech-industry-1kd7</guid>
      <description>&lt;p&gt;&lt;strong&gt;GitHub repo:&lt;/strong&gt; &lt;a href="https://github.com/hejhdiss/Truth-Of-India" rel="noopener noreferrer"&gt;github.com/hejhdiss/Truth-Of-India&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Live site:&lt;/strong&gt; &lt;a href="https://truthofindia.hejhdiss.workers.dev/" rel="noopener noreferrer"&gt;https://truthofindia.hejhdiss.workers.dev/&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;You might be wondering what a collection of investigative articles about Indian politics is doing on a tech platform.&lt;/p&gt;

&lt;p&gt;Fair question. Here is the honest answer: &lt;strong&gt;this affects you directly — if you are a developer, startup founder, investor, or anyone building something in or with India.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  India Shut Down the Internet More Than Any Country on Earth — Five Years Running
&lt;/h2&gt;

&lt;p&gt;India is the world's largest democracy. It is also the world's number one perpetrator of internet shutdowns, five years running. In 2022 alone, India was responsible for &lt;strong&gt;84 of the 187 internet shutdowns recorded globally&lt;/strong&gt; — nearly half the world's total. A peer-reviewed study published in the journal &lt;em&gt;Democratization&lt;/em&gt; found this is not politically neutral: shutdowns are &lt;strong&gt;3.5 times more frequent in BJP-governed states&lt;/strong&gt; than in others.&lt;/p&gt;

&lt;p&gt;Every time the internet goes down in a region, developers lose working hours, remote workers lose income, startups lose transactions, and businesses lose customers. The Kashmir Chamber of Commerce estimated &lt;strong&gt;500,000 people lost their jobs&lt;/strong&gt; during a single 2019 shutdown.&lt;/p&gt;

&lt;p&gt;India's own Parliamentary Standing Committee called this "gross misuse causing untold suffering." The government's response? It told Parliament it does not even keep central records of shutdowns.&lt;/p&gt;

&lt;p&gt;If you are building a product that serves Indian users — or building &lt;em&gt;in&lt;/em&gt; India — this is your infrastructure risk. It is not theoretical. It has a dollar cost: &lt;strong&gt;$184 million in economic damage in 2022 alone&lt;/strong&gt;, and that is a documented undercount.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Is Being Posted on a Tech Site — And Why It Affects the Industry
&lt;/h2&gt;

&lt;p&gt;Foreign Direct Investment into India fell from $10.1 billion to $353 million in FY2025 — a &lt;strong&gt;96.5% collapse&lt;/strong&gt;. Startup funding dropped 62%. Fintech funding dropped 63%. Indian startups raised $30.4 billion in 2024. Chinese firms raised $845 billion in a comparable period.&lt;/p&gt;

&lt;p&gt;This does not happen in a vacuum. It happens when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tech companies face police raids for refusing government censorship orders&lt;/li&gt;
&lt;li&gt;IIM and IIT directors resign over political interference in academic leadership&lt;/li&gt;
&lt;li&gt;Research funding is redirected to government-approved topics while independent inquiry is defunded&lt;/li&gt;
&lt;li&gt;The rule of law is applied selectively based on political loyalty&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The innovation pipeline — the universities, research labs, and free intellectual environment that produces your next engineer, your next founder, your next breakthrough — is being systematically hollowed out. When the RSS directs PhD thesis topics and university seminars on democracy get cancelled, the damage shows up years later in the talent pool that never existed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Built This — And Why I'm Sharing It Here
&lt;/h2&gt;

&lt;p&gt;I am Muhammed Shafin P, from Malappuram, Kerala. I am not a journalist. I am a developer and an Indian citizen who spent time reading publicly available government data, Supreme Court records, NCRB statistics, and international reporting — and could not look away from what the numbers said.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://truthofindia.netlify.app/" rel="noopener noreferrer"&gt;truthofindia.netlify.app&lt;/a&gt; as a simple static site: one &lt;code&gt;index.html&lt;/code&gt;, ten &lt;code&gt;.md&lt;/code&gt; files, deployed on Netlify. The entire repo is open: &lt;a href="https://github.com/hejhdiss/Truth-Of-India" rel="noopener noreferrer"&gt;github.com/hejhdiss/Truth-Of-India&lt;/a&gt;. The architecture is deliberately simple so anyone can fork it, translate it, or mirror it.&lt;/p&gt;

&lt;p&gt;The site covers ten documented topics: farmer suicides, electoral corruption, violence against minorities, children's malnutrition, digital censorship — and yes, Palestine. I included Palestine specifically because a significant number of Indians have been fed a manufactured narrative that frames the conflict as religious rather than political. Many Indians — including many in tech communities — repeat pro-Israel talking points without knowing the documented history of the Nakba, the termination of 200,000 Palestinian work permits after October 7, or the fact that Indian workers were sent to fill those jobs at Netanyahu's personal request to Modi. These are not opinions. They are documented facts in publicly available sources, and Indian citizens deserve to know them.&lt;/p&gt;

&lt;p&gt;India has the world's largest population. What happens to Indian democracy, Indian sovereignty, and Indian civil society does not stay inside India's borders. It shapes global tech supply chains, global startup ecosystems, and global politics.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Am Asking
&lt;/h2&gt;

&lt;p&gt;Nothing expensive. Just this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Read it. Share it. Fork it if you want.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every article has its sources listed. Every claim is verifiable. If you find something wrong, say so — the whole point is that people think for themselves based on actual evidence.&lt;/p&gt;

&lt;p&gt;There are hundreds of millions of Indians — including tens of millions in tech — who have never seen this information laid out in one place, clearly sourced. The BJP's media ecosystem is enormous. This site is small. But the internet is still (for now) on.&lt;/p&gt;

&lt;p&gt;If even a few thousand more people read this and ask harder questions of the people asking for their votes, their taxes, and their silence — that is worth every line of code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hosting Change — Netlify → Cloudflare Workers
&lt;/h2&gt;

&lt;p&gt;This project was previously hosted on &lt;strong&gt;Netlify&lt;/strong&gt; (free tier). The Netlify deployment served the article &lt;code&gt;.md&lt;/code&gt; files as static assets directly alongside &lt;code&gt;index.html&lt;/code&gt; — the frontend fetched them at runtime using &lt;code&gt;fetch('filename.md')&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Reader traffic exhausted the free tier's bandwidth limit, so the site has been migrated to &lt;strong&gt;Cloudflare Workers&lt;/strong&gt; (free tier), which has no bandwidth cap.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Changed for &lt;code&gt;.md&lt;/code&gt; Files
&lt;/h3&gt;

&lt;p&gt;On Netlify, &lt;code&gt;.md&lt;/code&gt; article files (e.g. &lt;code&gt;betrayal-of-palastenians.md&lt;/code&gt;, &lt;code&gt;south-india-vs-rss-brainwash.md&lt;/code&gt;) were deployed as plain static files and fetched directly by the browser.&lt;/p&gt;

&lt;p&gt;On Cloudflare Workers, static files must be explicitly bundled and served through the Worker script. Each &lt;code&gt;.md&lt;/code&gt; file is included as part of the Worker's assets so the &lt;code&gt;fetch()&lt;/code&gt; calls in &lt;code&gt;index.html&lt;/code&gt; continue to work without any changes to the frontend code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Cloudflare
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No bandwidth limits on the free tier&lt;/li&gt;
&lt;li&gt;Global CDN edge delivery&lt;/li&gt;
&lt;li&gt;Cloudflare's free tier handles high-traffic sites without throttling or suspension&lt;/li&gt;
&lt;li&gt;Zero config changes needed on the &lt;code&gt;index.html&lt;/code&gt; side — the &lt;code&gt;.md&lt;/code&gt; fetch logic remains identical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;All articles are based on publicly verifiable sources: NCRB data, Supreme Court records, peer-reviewed research, UN reports, and internationally published journalism. Everything can be checked independently. That is the point.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written by Muhammed Shafin P (hejhdiss) · Malappuram, Kerala · &lt;a href="mailto:hejhdiss@gmail.com"&gt;hejhdiss@gmail.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>startup</category>
      <category>india</category>
      <category>techindustry</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Thinking Transformers: A Transformer That Reasons Before It Speaks</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Fri, 06 Mar 2026 09:20:09 +0000</pubDate>
      <link>https://dev.to/hejhdiss/thinking-transformer-4ak9</link>
      <guid>https://dev.to/hejhdiss/thinking-transformer-4ak9</guid>
      <description>&lt;p&gt;Most neural language models work the same way: take in a sequence of tokens, run one forward pass, and spit out a prediction. It's fast, it's well understood, and for many tasks it works well. But there's something fundamentally rushed about it — the model has exactly one shot to "think" before it answers, no matter how hard the problem is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thinking Transformers&lt;/strong&gt; takes a different approach. Before producing any output, the model runs its hidden states through the full transformer stack multiple times — these are called &lt;em&gt;think steps&lt;/em&gt;. Each pass lets the model refine its internal representation, catch contradictions, and build up a richer picture of the input before committing to an answer. The number of think steps is configurable, and crucially, every one of them is part of the computation graph — training uses full Backpropagation Through Time (BPTT) across all think steps and all layers simultaneously. The model doesn't just learn to predict; it learns &lt;em&gt;how&lt;/em&gt; to think.&lt;/p&gt;
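&lt;p&gt;The think-step loop can be illustrated with a toy sketch (plain Python, scalar state, made-up weights; the real model re-runs the full transformer stack over its hidden states rather than a single scalar update):&lt;/p&gt;

```python
import math

def think(x, h, w_h, w_x, steps):
    """Refine a scalar hidden state by re-running the same update
    for a fixed number of think steps. This is a toy stand-in for
    re-running the transformer stack before producing any output."""
    for _ in range(steps):
        h = math.tanh(w_h * h + w_x * x)
    return h

# More think steps let the state settle before an answer is committed.
one_step = think(x=1.0, h=0.0, w_h=0.5, w_x=1.0, steps=1)
settled = think(x=1.0, h=0.0, w_h=0.5, w_x=1.0, steps=8)
```

&lt;p&gt;In the real architecture every pass stays in the computation graph, so BPTT can assign credit across all think steps at once.&lt;/p&gt;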

&lt;p&gt;Alongside the reasoning loop, the architecture includes a small gated memory bank — a set of persistent slots that are read from and written to at each think step. This gives the model a lightweight working memory that can carry context forward across iterations, something a standard single-pass transformer simply cannot do.&lt;/p&gt;
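&lt;p&gt;A minimal sketch of a gated slot update (illustrative only; the slot count and gating details of the actual memory bank may differ):&lt;/p&gt;

```python
import math

def gated_write(memory, candidate, gate_logit):
    """Blend candidate values into memory slots through a sigmoid
    gate: a gate near 1 overwrites, a gate near 0 preserves."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * c + (1.0 - g) * m for m, c in zip(memory, candidate)]

slots = [0.0, 0.0]
slots = gated_write(slots, [1.0, -1.0], gate_logit=4.0)   # open gate: write
slots = gated_write(slots, [9.0, 9.0], gate_logit=-4.0)   # closed gate: keep
```

&lt;p&gt;After the second call the slots still mostly reflect the first write, which is how context can carry forward across think steps.&lt;/p&gt;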

&lt;p&gt;The whole thing is built from scratch in plain C with no external dependencies beyond &lt;code&gt;libm&lt;/code&gt;. It compiles into a shared library (&lt;code&gt;transformer.so&lt;/code&gt; on Linux/macOS, &lt;code&gt;transformer.dll&lt;/code&gt; on Windows) in a single GCC command. The Python layer wraps this library via &lt;code&gt;ctypes&lt;/code&gt;, exposing a clean, minimal API through two classes: &lt;code&gt;TransformerConfig&lt;/code&gt;, which holds all architecture hyperparameters (vocab size, embedding dimension, number of heads, feed-forward width, layers, sequence length, think steps, memory slots), and &lt;code&gt;ThinkingTransformer&lt;/code&gt;, which is the model itself.&lt;/p&gt;

&lt;p&gt;The Python API is deliberately straightforward. You call &lt;code&gt;model.train_step(tokens, targets, lr)&lt;/code&gt; for a single training iteration — it handles zeroing gradients, running the full BPTT backward pass, and applying an Adam update internally. For more control, &lt;code&gt;zero_grad()&lt;/code&gt;, &lt;code&gt;backward()&lt;/code&gt;, and &lt;code&gt;step()&lt;/code&gt; are all exposed separately. Inference is equally simple: &lt;code&gt;model.generate(prompt, max_new_tokens)&lt;/code&gt; does greedy decoding, while &lt;code&gt;model.generate_with_thinking(prompt)&lt;/code&gt; wraps the prompt in explicit &lt;code&gt;THINK&lt;/code&gt; and &lt;code&gt;PLAN&lt;/code&gt; tokens and returns not just the output tokens but the full reasoning structure and logits. Checkpoints save and load cleanly with &lt;code&gt;model.save(path)&lt;/code&gt; and &lt;code&gt;model.load(path)&lt;/code&gt;.&lt;/p&gt;
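&lt;p&gt;Greedy decoding itself is simple enough to sketch in a few lines (toy logits function and a hypothetical four-token vocabulary; the library's &lt;code&gt;generate&lt;/code&gt; does the equivalent with real model logits):&lt;/p&gt;

```python
def greedy_decode(logits_fn, prompt, max_new_tokens):
    """At each step, append the highest-scoring token.
    logits_fn stands in for a forward pass of the model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = logits_fn(tokens)
        tokens.append(max(range(len(logits)), key=lambda i: logits[i]))
    return tokens

def toy_logits(tokens):
    # A toy "model" that always favours the token after the last one.
    nxt = (tokens[-1] + 1) % 4
    return [1.0 if i == nxt else 0.0 for i in range(4)]

out = greedy_decode(toy_logits, [0], max_new_tokens=3)  # [0, 1, 2, 3]
```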

&lt;p&gt;Everything — architecture, training, and the reasoning loop — is open source and available at &lt;strong&gt;&lt;a href="https://github.com/hejhdiss/Thinking-Transformers" rel="noopener noreferrer"&gt;https://github.com/hejhdiss/Thinking-Transformers&lt;/a&gt;&lt;/strong&gt;. The project is licensed under GNU GPL v3.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>deeplearning</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>CodeLearn AI: Learn by Building</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Wed, 04 Mar 2026 08:29:01 +0000</pubDate>
      <link>https://dev.to/hejhdiss/codelearn-ai-learn-by-building-9n9</link>
      <guid>https://dev.to/hejhdiss/codelearn-ai-learn-by-building-9n9</guid>
      <description>&lt;p&gt;&lt;a href="https://codelearnai.netlify.app/" rel="noopener noreferrer"&gt;CodeLearn AI&lt;/a&gt; is an innovative, AI-powered educational platform designed to help users master programming through a hands-on, "Learn by Building" approach. Created by developer hejhdiss, the platform integrates modern AI capabilities directly into a coding environment to provide real-time assistance, explanations, and interactive learning tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is CodeLearn AI?
&lt;/h2&gt;

&lt;p&gt;CodeLearn AI is a comprehensive coding playground and tutor that leverages Large Language Models (LLMs) to guide users through the development process. It serves as an all-in-one workspace where you can write code, chat with an AI assistant, and track your learning progress through a gamified interface.&lt;/p&gt;

&lt;p&gt;The platform is highly customizable, allowing users to connect their own AI providers such as OpenAI, Anthropic, Google Gemini, or even local models via Ollama.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI Chat System:&lt;/strong&gt; Includes a "Global Chat" for general inquiries and "Project Chats" specifically tailored to the context of your current coding project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrated Code Editor:&lt;/strong&gt; A built-in editor that supports multiple programming languages and stacks, allowing you to build and manage files within the browser.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Live Preview:&lt;/strong&gt; For web projects (HTML/CSS/JS), the platform offers a real-time preview mode to see your changes instantly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gamified Learning:&lt;/strong&gt; Tracks your progress using an XP (Experience Points) system, levels, and skill bars that visualize your proficiency in different technologies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI-Powered Code Analysis:&lt;/strong&gt; Features tools to "Explain" specific files, "Optimize Code," and generate context-aware tips to improve your build.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive Quizzes:&lt;/strong&gt; Automatically generates quizzes based on your code or specific topics to test and reinforce your knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Modern UI:&lt;/strong&gt; Offers a sleek, responsive interface with both Dark and Light theme options to suit your coding preference.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is it for?
&lt;/h2&gt;

&lt;p&gt;The primary goal of CodeLearn AI is to lower the barrier to entry for new programmers while providing a powerful sandbox for experienced developers to experiment with AI-driven development. It is designed for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Beginners&lt;/strong&gt; who need clear explanations of complex code structures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Students&lt;/strong&gt; who want to test their knowledge via AI-generated quizzes and track their skill growth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developers&lt;/strong&gt; looking for a quick, AI-integrated environment to prototype ideas and optimize snippets.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining an editor, a chat assistant, and a gamified progression system, CodeLearn AI turns the solitary act of coding into an interactive, guided experience.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>learning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Qeltrix V6 — A Practical Guide to Seekable Encrypted Containers</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Mon, 02 Mar 2026 07:12:32 +0000</pubDate>
      <link>https://dev.to/hejhdiss/qeltrix-v6-a-practical-guide-to-seekable-encrypted-containers-3fl6</link>
      <guid>https://dev.to/hejhdiss/qeltrix-v6-a-practical-guide-to-seekable-encrypted-containers-3fl6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Full interactive documentation:&lt;/strong&gt; &lt;a href="https://qeltrix-v6.hejhdiss.workers.dev/" rel="noopener noreferrer"&gt;https://qeltrix-v6.hejhdiss.workers.dev/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Is Qeltrix V6?
&lt;/h2&gt;

&lt;p&gt;Most encryption tools work the same way they did decades ago. You hand over a file, it gets locked, and to read a single byte you must decrypt the whole thing first. Seeking through an encrypted video means downloading it in full. Extracting a single log entry from a 500 GB encrypted backup means unpacking the whole archive.&lt;/p&gt;

&lt;p&gt;Qeltrix V6 is a new encrypted container format that throws out that assumption. It is built around one core idea: an encrypted file should behave like a seekable stream, not a locked safe. You should be able to jump to any position, request any byte range, and receive exactly those bytes — decrypted — without the system touching anything else.&lt;/p&gt;

&lt;p&gt;The full technical documentation, CLI reference, Python API guide, deployment patterns, and security analysis live at &lt;a href="https://qeltrix-v6.netlify.app/" rel="noopener noreferrer"&gt;qeltrix-v6.netlify.app&lt;/a&gt;. This article explains the ideas behind it and why they matter.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Idea: Independent Blocks
&lt;/h2&gt;

&lt;p&gt;The reason traditional encryption isn't seekable is that it treats the entire file as one object. Decrypt the end of the file and you've necessarily processed everything before it.&lt;/p&gt;

&lt;p&gt;V6 takes a different approach. Every file is split into fixed-size blocks — 1 MB by default — and each block is encrypted &lt;em&gt;independently&lt;/em&gt; with its own unique key, its own IV, its own authentication tag. The blocks are then packed into a &lt;code&gt;.qltx&lt;/code&gt; container with a footer index that maps original byte offsets to block positions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional:  [  one big encrypted blob — all or nothing  ]

Qeltrix V6:   [ Header ] [ Block 0 ] [ Block 1 ] ... [ Block N ] [ Footer/Index ]
                          each block independently encrypted + authenticated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This changes what seeking costs. Instead of "decrypt everything up to the point you want," the cost becomes "look up the block containing your byte offset in the footer index, decrypt that block, slice out the bytes you need." For a 20 GB video, seeking to the 18-minute mark might decrypt two or three blocks — a few megabytes — instead of gigabytes.&lt;/p&gt;
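&lt;p&gt;The lookup arithmetic is straightforward. A sketch of the offset-to-block mapping, using the 1 MB default block size (the real footer index also maps each block to its physical position in the container):&lt;/p&gt;

```python
BLOCK_SIZE = 1024 * 1024  # 1 MB default block size

def blocks_for_range(start, end, block_size=BLOCK_SIZE):
    """Return the indices of the blocks that must be decrypted to
    serve bytes [start, end] of the original file."""
    first = start // block_size
    last = end // block_size
    return list(range(first, last + 1))

# Seeking ~16 GB into a 20 GB file touches one or two blocks, not 16 GB.
needed = blocks_for_range(16_383_900_000, 16_384_948_575)
```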




&lt;h2&gt;
  
  
  The Key Hierarchy
&lt;/h2&gt;

&lt;p&gt;The encryption design uses two layers of keys. Understanding both is important for understanding what V6 actually protects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Master Key (MK)&lt;/strong&gt; is a 32-byte symmetric secret that acts as the root of trust for the entire container. It is never written to disk in plaintext. The container header stores an encrypted copy of it (wrapped with a passphrase-derived key), but that wrapped blob is useless without the passphrase. Every operation that touches the container — packing, unpacking, seeking, serving — requires the caller to supply the MK at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content Derived Keys (CDKs)&lt;/strong&gt; are per-block keys derived from the MK using HKDF-SHA256. Each CDK is derived from a combination of the master key, the block's sequential index, a SHA-256 hash of the block's raw content, and a random salt. This dual binding — to both &lt;em&gt;position&lt;/em&gt; and &lt;em&gt;content&lt;/em&gt; — has a meaningful security consequence: two identical blocks at different positions will have different keys, and if a block is moved to a different position without re-encryption, authentication will fail. The system detects it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Passphrase
    ↓  PBKDF2-SHA256
Master Key (MK)
    ↓  HKDF-SHA256 + block_index + data_hash + salt
CDK for Block N                 CDK for Block M
    ↓  AES-256-GCM or ChaCha20-Poly1305
Encrypted Block N               Encrypted Block M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critically, compromising one block's CDK reveals nothing about any other block's key. The CDKs are mathematically independent.&lt;/p&gt;
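&lt;p&gt;The derivation can be sketched with the Python standard library alone. HKDF-SHA256 and its four inputs (master key, block index, content hash, salt) come from the description above; the exact byte layout of the &lt;code&gt;info&lt;/code&gt; field here is an assumption, not the project's wire format:&lt;/p&gt;

```python
import hashlib
import hmac

def hkdf_sha256(key, salt, info, length=32):
    """Minimal HKDF-SHA256 (RFC 5869): extract, then expand."""
    prk = hmac.new(salt, key, hashlib.sha256).digest()
    okm, t = b"", b""
    rounds = -(-length // 32)  # ceiling division
    for i in range(1, rounds + 1):
        t = hmac.new(prk, t + info + bytes([i]), hashlib.sha256).digest()
        okm = okm + t
    return okm[:length]

def derive_cdk(master_key, block_index, block_data, salt):
    """CDK bound to both position (index) and content (hash)."""
    info = (b"qltx-cdk"
            + block_index.to_bytes(8, "big")
            + hashlib.sha256(block_data).digest())
    return hkdf_sha256(master_key, salt, info)

mk, salt = b"\x01" * 32, b"\x02" * 16
k0 = derive_cdk(mk, 0, b"identical block", salt)
k1 = derive_cdk(mk, 1, b"identical block", salt)  # same bytes, new position
```

&lt;p&gt;Identical plaintext at two positions yields two unrelated keys, which is exactly the dual binding described above.&lt;/p&gt;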




&lt;h2&gt;
  
  
  What the V6-C Metadata Actually Is
&lt;/h2&gt;

&lt;p&gt;Each block in a &lt;code&gt;.qltx&lt;/code&gt; container carries a structure called &lt;strong&gt;V6-C metadata&lt;/strong&gt; alongside the ciphertext. This metadata holds everything needed to independently decrypt and verify that one block:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The per-block IV (nonce for the AEAD cipher)&lt;/li&gt;
&lt;li&gt;The HKDF salt used to derive this block's CDK&lt;/li&gt;
&lt;li&gt;The SHA-256 hash of the block's original (unencrypted) content&lt;/li&gt;
&lt;li&gt;The CDK itself, wrapped (re-encrypted) with the master key&lt;/li&gt;
&lt;li&gt;The cipher identifier (AES-GCM or ChaCha20)&lt;/li&gt;
&lt;li&gt;The original and compressed block sizes&lt;/li&gt;
&lt;/ul&gt;
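&lt;p&gt;In code form, one block's record might look like this (field names are illustrative, not the on-disk layout; remember the whole structure is stored encrypted under the MK):&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class V6CMetadata:
    """One block's self-contained decryption record (illustrative)."""
    iv: bytes            # per-block AEAD nonce
    hkdf_salt: bytes     # salt used to derive this block's CDK
    content_hash: bytes  # SHA-256 of the plaintext block
    wrapped_cdk: bytes   # CDK re-encrypted under the master key
    cipher_id: str       # "aes-256-gcm" or "chacha20-poly1305"
    orig_size: int       # plaintext block size
    comp_size: int       # size after compression, before encryption

meta = V6CMetadata(b"\x00" * 12, b"\x01" * 16, b"\x02" * 32,
                   b"\x03" * 48, "aes-256-gcm", 1_048_576, 912_345)
```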

&lt;p&gt;The V6-C metadata is itself encrypted with the master key. This means that without the MK, you can't even learn anything about the block's internal structure — the IV, the salt, and the content hash are all opaque to an observer who doesn't hold the key.&lt;/p&gt;

&lt;p&gt;This is what makes the container format genuinely seekable without compromising security: each block is a self-contained, independently verifiable unit, and the footer index is the table of contents that maps byte offsets to those units.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cipher Choices: AES-256-GCM and ChaCha20-Poly1305
&lt;/h2&gt;

&lt;p&gt;V6 supports two AEAD ciphers, and the choice matters depending on your hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AES-256-GCM&lt;/strong&gt; is the default. On any CPU with AES hardware acceleration (AES-NI instructions, present in virtually all processors manufactured after 2010), it runs at several gigabytes per second. If you're deploying on modern server or desktop hardware, this is almost certainly the right choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChaCha20-Poly1305&lt;/strong&gt; is the mobile-friendly alternative. It was designed to be fast in pure software, without hardware acceleration. On IoT devices, older ARM processors, or embedded systems where AES-NI isn't available, ChaCha20 can be meaningfully faster. It provides the same security guarantees — 256-bit keys, 16-byte AEAD authentication tags, identical tamper-detection properties.&lt;/p&gt;

&lt;p&gt;Both ciphers authenticate every block with a 16-byte tag. Any modification to any byte of any block — even flipping a single bit — causes the authentication check to fail and decryption to abort with an error. This tamper-evidence is not a separate step or an add-on; it is baked into the encryption itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  HTTP Range Requests: Seeking Over the Network
&lt;/h2&gt;

&lt;p&gt;The most practically significant feature of V6 is its native HTTP Range Request support. Range Requests are the mechanism that allows you to seek in a YouTube video without buffering from the beginning — your browser sends a request for &lt;code&gt;bytes=50000000-60000000&lt;/code&gt; and the server returns just those bytes with a &lt;code&gt;206 Partial Content&lt;/code&gt; response.&lt;/p&gt;

&lt;p&gt;V6's SeekServer brings this capability to encrypted files. The server receives a Range Request, consults the footer's VFS (Virtual File System) index to identify which blocks cover the requested byte range, fetches and decrypts only those blocks, and returns exactly the requested bytes. The client — a media player, a browser, a download manager — never needs to know about the block structure. It just gets its bytes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Seek cost for a 20 GB encrypted video:

Byte range:     bytes=16,384,000,000–16,385,048,575
Blocks needed:  2  (2 MB of 20 GB read from disk)
Data touched:   0.01% of the container
Response time:  milliseconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The documentation at &lt;a href="https://qeltrix-v6.netlify.app/" rel="noopener noreferrer"&gt;qeltrix-v6.netlify.app&lt;/a&gt; covers the full SeekServer setup, including how to put it behind nginx or Caddy for HTTPS, and how to test it with curl's &lt;code&gt;-H "Range: bytes=X-Y"&lt;/code&gt; flag.&lt;/p&gt;
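&lt;p&gt;The request-handling idea can be sketched as a small planner (simplified; the real SeekServer's VFS index also maps blocks to physical container offsets, and full Range parsing handles suffix ranges too):&lt;/p&gt;

```python
BLOCK = 1024 * 1024  # 1 MB blocks

def plan_range_request(header, total_size, block_size=BLOCK):
    """Turn an HTTP 'Range: bytes=X-Y' value into a decryption plan:
    which blocks to fetch, and how to slice the joined plaintext."""
    spec = header.split("=", 1)[1]
    lo_s, hi_s = spec.split("-", 1)
    lo = int(lo_s)
    hi = int(hi_s) if hi_s else total_size - 1
    first, last = lo // block_size, hi // block_size
    return {
        "status": 206,  # Partial Content
        "blocks": list(range(first, last + 1)),
        "skip": lo - first * block_size,  # bytes to drop in first block
        "length": hi - lo + 1,
    }

plan = plan_range_request("bytes=2097152-2097161", total_size=20 * 1024 ** 3)
```

&lt;p&gt;Here a ten-byte request lands entirely inside block 2, so only that block is fetched and decrypted.&lt;/p&gt;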




&lt;h2&gt;
  
  
  The Gateway: Encrypting Live Streams
&lt;/h2&gt;

&lt;p&gt;Beyond static files, V6 includes a &lt;strong&gt;GatewayServer&lt;/strong&gt; — a TCP encryption router that can wrap any stream of data into V6's block-encrypted format in real time. Raw bytes come in one side; authenticated, block-framed V6 ciphertext goes out the other.&lt;/p&gt;

&lt;p&gt;The Gateway runs in one of three topologies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reflect mode&lt;/strong&gt; is for testing. The gateway encrypts incoming data and sends the V6 stream back to the same sender. Useful for benchmarking and verifying that the encryption pipeline is working.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Router mode&lt;/strong&gt; is the production pattern. A data source sends a raw TCP stream to the gateway; the gateway encrypts it and forwards the V6 stream to a destination. The destination — a storage backend, another service, a relay — receives only ciphertext. Even if the destination is compromised, it has no key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chained mode&lt;/strong&gt; connects multiple gateways in series. Each gateway wraps the stream with its own master key. Data passing through three gateways is triple-encrypted under three independent keys; compromising any one node doesn't expose the plaintext.&lt;/p&gt;

&lt;p&gt;The GatewayServer uses a thread pool to handle simultaneous connections, with each connection maintaining its own independent block counter and CDK chain. Encryption throughput scales approximately linearly with available CPU cores up to the pool limit.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Parallel Processing Works
&lt;/h2&gt;

&lt;p&gt;Both the packer and the gateway are designed for throughput. Packing a file doesn't encrypt blocks one at a time in sequence — it dispatches block encryption jobs to a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;, letting multiple blocks be encrypted simultaneously across all available CPU cores. On modern hardware with AES-NI, a well-parallelised V6 pack operation can saturate memory bandwidth rather than CPU compute.&lt;/p&gt;

&lt;p&gt;The same design applies to unpacking and to the gateway. Each worker thread maintains its own state — its own cipher instance, its own block counter, its own CDK derivation — so threads never need to coordinate or synchronise on cryptographic operations. The shared state is limited to the master key, which is read-only after startup.&lt;/p&gt;
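&lt;p&gt;The dispatch pattern can be sketched like this (SHA-256 stands in for the real per-block encrypt-and-authenticate step, which needs an AEAD library):&lt;/p&gt;

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def pack_blocks(data, block_size, workers=4):
    """Split data into fixed-size blocks and process them in parallel.
    pool.map preserves block order even though work runs concurrently."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: hashlib.sha256(b).digest(), blocks))

sealed = pack_blocks(b"x" * 10_000, block_size=4096)  # 3 blocks
```

&lt;p&gt;Because each job needs only read-only shared state (the master key in the real packer), no locks are required on the hot path.&lt;/p&gt;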




&lt;h2&gt;
  
  
  What Makes V6 Useful
&lt;/h2&gt;

&lt;p&gt;The combination of seekable blocks, AEAD authentication, and the HTTP Range Request interface makes V6 a practical building block for a specific set of problems that conventional encryption doesn't handle well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Encrypted video streaming.&lt;/strong&gt; Store video libraries in encrypted &lt;code&gt;.qltx&lt;/code&gt; containers on S3, GCS, or any cloud object store. Serve them with the SeekServer. Users can seek and play. The storage provider sees only ciphertext. CDN nodes can cache and deliver the encrypted blocks without ever holding a key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial restore from encrypted backups.&lt;/strong&gt; Back up large datasets into &lt;code&gt;.qltx&lt;/code&gt; containers. When you need to restore a single file from a 500 GB backup, seek directly to that file's blocks and decrypt only them. No full extraction needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-time encrypted logging.&lt;/strong&gt; Route log streams through the GatewayServer before writing to disk or forwarding to a log aggregator. Logs are stored in V6 format. Analysis tools that hold the key can seek to any time window. Anyone without the key sees ciphertext.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero-trust microservice data buses.&lt;/strong&gt; In a microservices environment, service pairs can route their data traffic through V6 gateway pairs. The internal network fabric sees only V6 ciphertext. A compromised internal service cannot read traffic addressed to other services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tamper-evident archival.&lt;/strong&gt; Every block carries a SHA-256 content hash and an AEAD authentication tag. Any modification to any byte of a container — even a single bit flip — causes the affected block's decryption to fail. This makes V6 containers a natural fit for compliance environments (HIPAA, PCI-DSS, SOC 2) where tamper-evidence is a requirement.&lt;/p&gt;




&lt;h2&gt;
  
  
  Capacity Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Block-level overhead (per block):
  Block prefix:     28 bytes
  V6-C metadata:    ~128 bytes (encrypted)
  AEAD tag:         16 bytes
  ──────────────────────────────
  Overhead/block:   ~172 bytes

For 1 MB blocks:
  Overhead ratio:   ~0.017%  (172 bytes / 1,048,576 bytes)

Seek cost (20 GB file, 1 MB blocks):
  Worst case:       2 blocks decrypted (~2 MB read)
  Best case:        1 block decrypted  (~1 MB read)
  Index lookup:     O(1) via footer VFS index

Container index (footer):
  Per-block entry:  ~96 bytes
  1000-block file:  ~96 KB index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The overhead is low by design: encryption should not impose a performance penalty on seekable access, and for most workloads the per-block overhead is negligible compared to the data being stored.&lt;/p&gt;
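&lt;p&gt;The figures above are easy to reproduce; a quick sketch using the constants from the table:&lt;/p&gt;

```python
# Per-block overhead, using the figures from the capacity table
BLOCK_PREFIX = 28      # bytes
V6C_METADATA = 128     # bytes (approximate, encrypted)
AEAD_TAG = 16          # bytes

overhead = BLOCK_PREFIX + V6C_METADATA + AEAD_TAG   # ~172 bytes per block
block_size = 1 << 20                                # 1 MiB blocks

ratio_pct = 100 * overhead / block_size             # ~0.016%

# Footer index size for a 1000-block container at ~96 bytes per entry
index_bytes = 1000 * 96                             # ~96 KB
```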




&lt;h2&gt;
  
  
  The PoC Status: What to Know Before Deploying
&lt;/h2&gt;

&lt;p&gt;V6 is explicitly a &lt;strong&gt;proof of concept&lt;/strong&gt;. The block encryption, CDK derivation, AEAD authentication, gateway architecture, and Range Request support are all production-quality in design. But there are three things that must change before real deployment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The PBKDF2 salt is hardcoded.&lt;/strong&gt; In the current codebase (&lt;code&gt;cli.py&lt;/code&gt;), the passphrase-to-MK derivation uses a fixed salt (&lt;code&gt;b"QeltrixV6Salt"&lt;/code&gt;). This means two users with the same passphrase derive the same master key, and an attacker could precompute a table of common passphrases. The fix is simple — generate a random 32-byte salt per container and store it in the header — but it hasn't been implemented yet.&lt;/p&gt;
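&lt;p&gt;The described fix is straightforward to sketch with the standard library (the function name and iteration count here are illustrative choices, not the project's API):&lt;/p&gt;

```python
import hashlib, os

def derive_master_key(passphrase, salt=None, iterations=600_000):
    """Sketch of the fix: a random 32-byte salt per container,
    stored in the header instead of being hardcoded in cli.py."""
    if salt is None:
        salt = os.urandom(32)        # fresh salt for a new container
    mk = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt,
                             iterations, dklen=32)
    return salt, mk
```

&lt;p&gt;On container creation the salt is generated and written to the header; on open, the stored salt is passed back in and the same passphrase reproduces the same MK, while identical passphrases in different containers yield unrelated keys.&lt;/p&gt;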

&lt;p&gt;&lt;strong&gt;There is no MK distribution mechanism.&lt;/strong&gt; V6 does not solve the problem of how two nodes (a GatewayServer and a SeekServer, for example) securely share the same master key. That's deliberate — it's a routing component, not a key authority — but it means you need to design this layer yourself. The documentation covers three approaches: ECDH ephemeral key exchange, RSA key encapsulation, and integration with an external KMS (AWS KMS, HashiCorp Vault, etc.).&lt;/p&gt;
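&lt;p&gt;Of the three, ephemeral ECDH is the simplest to sketch. A minimal illustration with X25519 from the &lt;code&gt;cryptography&lt;/code&gt; package (the &lt;code&gt;info&lt;/code&gt; label is an arbitrary choice): both sides derive the same MK without it ever crossing the wire.&lt;/p&gt;

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def shared_mk(own_private, peer_public):
    # Both sides run ECDH, then HKDF the shared secret down to a 32-byte MK
    secret = own_private.exchange(peer_public)
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=None, info=b"qeltrix-v6 master key").derive(secret)

gateway_key = X25519PrivateKey.generate()    # GatewayServer side
seek_key = X25519PrivateKey.generate()       # SeekServer side

# Each side uses its own private key and the peer's public key
mk_gateway = shared_mk(gateway_key, seek_key.public_key())
mk_seek = shared_mk(seek_key, gateway_key.public_key())
```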

&lt;p&gt;&lt;strong&gt;There is no key rotation.&lt;/strong&gt; A compromised master key exposes all data encrypted under it, past and future, with no re-key mechanism in place. This is the most significant operational gap between the PoC and a production system.&lt;/p&gt;

&lt;p&gt;The full PoC-to-production checklist is in the Security Notes section of the documentation at &lt;a href="https://qeltrix-v6.netlify.app/" rel="noopener noreferrer"&gt;qeltrix-v6.netlify.app&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The quickest way to try V6 is via PyPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;qeltrix-v6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pack a file, inspect the container, and unpack it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; qeltrix_v6 pack   myfile.mp4 myfile.qltx  &lt;span class="nt"&gt;--passphrase&lt;/span&gt; secret
python &lt;span class="nt"&gt;-m&lt;/span&gt; qeltrix_v6 info   myfile.qltx
python &lt;span class="nt"&gt;-m&lt;/span&gt; qeltrix_v6 unpack myfile.qltx recovered.mp4 &lt;span class="nt"&gt;--passphrase&lt;/span&gt; secret
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Serve it over HTTP with Range Request support:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; qeltrix_v6 serve myfile.qltx &lt;span class="nt"&gt;--port&lt;/span&gt; 7621 &lt;span class="nt"&gt;--passphrase&lt;/span&gt; secret
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then seek to any position with any HTTP client that supports range requests — curl, VLC, a browser, ffmpeg. The documentation has full examples for all of these.&lt;/p&gt;
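&lt;p&gt;For example, a range request from Python's standard library (the &lt;code&gt;/myfile.qltx&lt;/code&gt; path is an assumption; see the CLI docs for the exact route the SeekServer exposes):&lt;/p&gt;

```python
import urllib.request

# Ask for one mebibyte starting at byte 50,000,000; the SeekServer
# decrypts only the blocks covering this range.
req = urllib.request.Request(
    "http://localhost:7621/myfile.qltx",
    headers={"Range": "bytes=50000000-51048575"},
)

# with urllib.request.urlopen(req) as resp:   # requires the server running
#     chunk = resp.read()                     # expects 206 Partial Content
```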

&lt;p&gt;The full CLI reference, Python API documentation, deployment guides, and security analysis are at &lt;a href="https://qeltrix-v6.hejhdiss.workers.dev/" rel="noopener noreferrer"&gt;https://qeltrix-v6.hejhdiss.workers.dev/&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Qeltrix V6 — by Muhammed Shafin P (&lt;a class="mentioned-user" href="https://dev.to/hejhdiss"&gt;@hejhdiss&lt;/a&gt;) · Licensed CC BY-SA 4.0&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>file</category>
      <category>hejhdiss</category>
      <category>programming</category>
    </item>
    <item>
      <title>Qeltrix V6: Rethinking Encrypted Storage for the Streaming Era</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Sun, 01 Mar 2026 14:32:05 +0000</pubDate>
      <link>https://dev.to/hejhdiss/qeltrix-v6-rethinking-encrypted-storage-for-the-streaming-era-2cmm</link>
      <guid>https://dev.to/hejhdiss/qeltrix-v6-rethinking-encrypted-storage-for-the-streaming-era-2cmm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/Qeltrix/Qeltrix-v6" rel="noopener noreferrer"&gt;https://github.com/Qeltrix/Qeltrix-v6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/qeltrix-v6/" rel="noopener noreferrer"&gt;https://pypi.org/project/qeltrix-v6/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;⚠️ Proof-of-concept. Not for production or security-critical use without an independent cryptographic audit.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem with Encryption Today
&lt;/h2&gt;

&lt;p&gt;Most encryption tools work the same way they did decades ago: you hand over a file, it gets locked, and to access any part of it — even a single byte — you must decrypt the entire thing first. Want to stream a 4K video stored securely in the cloud? Download it all. Want to jump to chapter 12 of an encrypted audiobook? Decrypt everything up to that point. This "all-or-nothing" model was acceptable when files were small and bandwidth was cheap. Neither of those things is true anymore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qeltrix V6&lt;/strong&gt; is a new kind of encrypted container format that breaks this assumption entirely. It is built from the ground up to behave like a live, seekable stream — meaning you can jump to any point inside an encrypted file and begin reading &lt;em&gt;immediately&lt;/em&gt;, without touching the rest of the data.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes V6 Different: Stream-First Architecture
&lt;/h2&gt;

&lt;p&gt;The core idea behind Qeltrix V6 is deceptively simple: instead of encrypting a file as one monolithic blob, V6 splits data into discrete, independently encrypted &lt;strong&gt;blocks&lt;/strong&gt;. Each block carries its own cryptographic identity — its own key, its own nonce, its own integrity tag, and its own metadata. The result is a container that behaves less like a locked safe and more like an encrypted database: you can query exactly the part you need.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional Encryption:
┌─────────────────────────────────────────────┐
│         ONE BIG ENCRYPTED BLOB              │
│  (Must decrypt all to access any part)      │
└─────────────────────────────────────────────┘

Qeltrix V6 Block Architecture:
┌──────────┬──────────┬──────────┬──────────┐
│ Block 0  │ Block 1  │ Block 2  │ Block N  │
│ [Header] │ [V6-C]   │ [V6-C]   │ [Footer] │
│ [MK Env] │ [Data]   │ [Data]   │ [Index]  │
└──────────┴──────────┴──────────┴──────────┘
       ↑ Each block independently seekable and verifiable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
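&lt;p&gt;The split step itself is plain fixed-size chunking. An illustrative sketch (not the library's API):&lt;/p&gt;

```python
def split_blocks(data: bytes, block_size: int = 1 << 20) -> list[bytes]:
    """Chunk a byte stream into fixed-size blocks; the last may be short.
    Each block is then encrypted independently with its own key and nonce."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```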






&lt;h2&gt;
  
  
  How the System is Organized
&lt;/h2&gt;

&lt;p&gt;Qeltrix V6 uses a &lt;strong&gt;hybrid C + Python architecture&lt;/strong&gt; that gives you the best of both worlds: raw C speed for the heavy structural work, and Python's mature cryptography ecosystem for the security layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────┐
│                    Qeltrix V6 System                     │
│                                                          │
│  ┌─────────────────────┐    ┌──────────────────────────┐ │
│  │  C Shared Library   │    │  Python Crypto Layer     │ │
│  │  (libqeltrix_v6)    │    │                          │ │
│  │                     │    │  ● AES-256-GCM           │ │
│  │  ● Block framing    │◄──►│  ● ChaCha20-Poly1305     │ │
│  │  ● Permutation      │    │  ● HKDF-SHA256 (CDK)     │ │
│  │  ● Header/footer    │    │  ● SHA-256 hashing       │ │
│  │  ● TCP networking   │    │  ● Master key wrapping   │ │
│  │  ● HTTP parsing     │    │  ● Metadata encryption   │ │
│  │  ● Seek math        │    │                          │ │
│  └─────────────────────┘    └──────────────────────────┘ │
│                                                          │
│  ┌───────────────┐  ┌───────────────┐  ┌──────────────┐  │
│  │  pack/unpack  │  │ GatewayServer │  │  SeekServer  │  │
│  │  (container)  │  │ (TCP encrypt) │  │ (HTTP+Range) │  │
│  └───────────────┘  └───────────────┘  └──────────────┘  │
└──────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The C library handles everything that needs to be &lt;em&gt;fast&lt;/em&gt;: splitting data into blocks, framing them for the wire, computing the mathematics of seeking, and managing TCP connections. Python takes care of everything that needs to be &lt;em&gt;correct&lt;/em&gt;: key derivation, AEAD ciphers, and the authentication chain that ensures no one has tampered with your data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Security Model: Layers All the Way Down
&lt;/h2&gt;

&lt;p&gt;V6 doesn't rely on a single encryption step. It implements a &lt;strong&gt;dual-layer key hierarchy&lt;/strong&gt; that binds every block of data to both its content and its position in the file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Security Hierarchy:

  Passphrase / User Credential
         │
         ▼ (PBKDF2 / HKDF)
  ┌─────────────────────┐
  │   Master Key (MK)   │  ← Root of trust; stored encrypted in container header
  └─────────────────────┘
         │
         ▼ (HKDF-SHA256 + block index + data hash + salt)
  ┌─────────────────────┐
  │  Content Derived    │  ← Per-block key, unique to BOTH this block's
  │  Key (CDK)          │    position AND its content
  └─────────────────────┘
         │
         ▼ (AES-256-GCM or ChaCha20-Poly1305 AEAD)
  ┌─────────────────────┐
  │  Encrypted Block    │  ← Authenticated ciphertext; tamper-evident
  │  + V6-C Metadata    │
  └─────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What makes this powerful is the &lt;strong&gt;Content Derived Key (CDK)&lt;/strong&gt;. This per-block key is derived using HKDF and incorporates both the block's sequential index and a SHA-256 hash of its raw content. This means two identical blocks in different positions get different keys, and moving a block to a different position without re-encrypting will cause authentication to fail — the system detects it.&lt;/p&gt;
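&lt;p&gt;A compact way to see this is a standard HKDF-SHA256 sketch (RFC 5869) whose &lt;code&gt;info&lt;/code&gt; input mixes in the block index and the content hash; the exact field layout V6 uses is an assumption here:&lt;/p&gt;

```python
import hashlib, hmac

def hkdf_sha256(key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal HKDF-SHA256 (RFC 5869): extract, then expand."""
    prk = hmac.new(salt, key, hashlib.sha256).digest()
    okm, block = b"", b""
    for i in range((length + 31) // 32):
        block = hmac.new(prk, block + info + bytes([i + 1]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

def derive_cdk(master_key: bytes, block_index: int, block_data: bytes,
               salt: bytes) -> bytes:
    # The CDK binds BOTH position (index) and content (hash of the raw block)
    info = block_index.to_bytes(8, "big") + hashlib.sha256(block_data).digest()
    return hkdf_sha256(master_key, salt, info)
```

&lt;p&gt;With this shape, two byte-identical blocks at different indexes derive different keys, which is exactly the property that makes block relocation detectable.&lt;/p&gt;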

&lt;p&gt;The &lt;strong&gt;V6-C metadata&lt;/strong&gt; — which holds the per-block IV, salt, integrity hash, and the wrapped CDK — is itself encrypted with the master key. Even an attacker who intercepts the container cannot learn anything about its internal structure without the master key.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-Time Seekable Access: How Range Requests Work
&lt;/h2&gt;

&lt;p&gt;The most practically powerful capability of V6 is its native support for &lt;strong&gt;HTTP Range Requests&lt;/strong&gt;. This is the same mechanism that allows you to seek in a YouTube video without buffering from the start. V6 brings this capability to &lt;em&gt;encrypted&lt;/em&gt; files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP Range Request Workflow:

  Client                        SeekServer                    Container
    │                               │                              │
    │  GET /video.qltx              │                              │
    │  Range: bytes=50000000-       │                              │
    │──────────────────────────────►│                              │
    │                               │  1. Parse range header       │
    │                               │  2. Calculate which blocks   │
    │                               │     cover byte 50,000,000    │
    │                               │─────────────────────────────►│
    │                               │  3. Read only those blocks   │
    │                               │◄─────────────────────────────│
    │                               │  4. Decrypt blocks           │
    │                               │  5. Slice to exact range     │
    │  206 Partial Content          │                              │
    │◄──────────────────────────────│                              │
    │  (Only requested bytes)       │                              │
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The container's footer stores a &lt;strong&gt;VFS (Virtual File System) index&lt;/strong&gt; — a table that maps byte ranges in the original file to their corresponding block positions in the container. When a range request arrives, the SeekServer consults this index, fetches only the minimum necessary blocks, decrypts them, and returns exactly the bytes that were requested. A 20 GB encrypted video file can serve a seek to the 18-minute mark in milliseconds.&lt;/p&gt;
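&lt;p&gt;The lookup itself is simple integer arithmetic over the block size (an illustrative sketch):&lt;/p&gt;

```python
BLOCK_SIZE = 1 << 20  # 1 MiB blocks

def blocks_for_range(start: int, end: int, block_size: int = BLOCK_SIZE) -> range:
    """Block indices whose plaintext spans overlap [start, end] inclusive.
    The SeekServer decrypts only these, then slices to the exact bytes."""
    return range(start // block_size, end // block_size + 1)
```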




&lt;h2&gt;
  
  
  The Gateway: A Full Encryption Router for the Network Layer
&lt;/h2&gt;

&lt;p&gt;Beyond static files, Qeltrix V6 includes a &lt;strong&gt;GatewayServer&lt;/strong&gt; — a standalone TCP encryption router that can wrap any stream of data in V6's block-encrypted format as it moves across the network. This is not a VPN, and it is not a proxy in the traditional sense. It is something more specific and more powerful: a &lt;strong&gt;transparent encryption boundary&lt;/strong&gt; that can be inserted between any two points in a network topology without modifying the applications on either side.&lt;/p&gt;

&lt;p&gt;This capability elevates V6 from a file format to a &lt;strong&gt;network security primitive&lt;/strong&gt; — a building block for constructing encrypted data infrastructure.&lt;/p&gt;




&lt;h3&gt;
  
  
  What the Gateway Actually Does
&lt;/h3&gt;

&lt;p&gt;When data arrives at the GatewayServer over TCP, it does not simply forward it. It performs the full V6 encryption pipeline — in real time, on every byte — before the data leaves the gateway's other side. The receiving end gets a proper V6 block stream: framed, authenticated, and encrypted. If the receiving end is another V6-aware component (such as a storage backend or another gateway), it can verify and decrypt each block independently as it arrives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The V6 Gateway Processing Pipeline (per connection):

  Incoming raw bytes
         │
         ▼
  ┌──────────────────────────────────────────────────────┐
  │                  GatewayServer                        │
  │                                                       │
  │  ┌─────────────┐                                      │
  │  │ Accumulate  │  Buffer incoming bytes until a full  │
  │  │ into blocks │  block boundary is reached           │
  │  └──────┬──────┘                                      │
  │         │                                             │
  │         ▼                                             │
  │  ┌─────────────┐                                      │
  │  │ Hash block  │  SHA-256 of raw block content        │
  │  │ content     │                                      │
  │  └──────┬──────┘                                      │
  │         │                                             │
  │         ▼                                             │
  │  ┌─────────────┐                                      │
  │  │ Derive CDK  │  HKDF(MasterKey, block_idx, hash,    │
  │  │             │  salt) → unique per-block key        │
  │  └──────┬──────┘                                      │
  │         │                                             │
  │         ▼                                             │
  │  ┌─────────────┐                                      │
  │  │  AEAD       │  AES-256-GCM or ChaCha20-Poly1305    │
  │  │  Encrypt    │  → ciphertext + 16-byte auth tag     │
  │  └──────┬──────┘                                      │
  │         │                                             │
  │         ▼                                             │
  │  ┌─────────────┐                                      │
  │  │ Frame as V6 │  Write block prefix + V6-C metadata  │
  │  │ wire block  │  + ciphertext to output stream       │
  │  └──────┬──────┘                                      │
  └─────────┼────────────────────────────────────────────┘
            │
            ▼
  Encrypted V6 block stream (to destination or storage)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pipeline runs concurrently across multiple connections using a thread pool, meaning a single gateway instance can serve many simultaneous clients without serializing their encryption work.&lt;/p&gt;
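&lt;p&gt;Condensed into code, one pass through the pipeline looks roughly like this (using the &lt;code&gt;cryptography&lt;/code&gt; package; the frame layout and the HKDF &lt;code&gt;info&lt;/code&gt; encoding are assumptions, simplified from the real V6-C metadata):&lt;/p&gt;

```python
import hashlib, os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_cdk(master_key: bytes, index: int, content_hash: bytes,
               salt: bytes) -> bytes:
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=salt,
                info=index.to_bytes(8, "big") + content_hash).derive(master_key)

def encrypt_block(master_key: bytes, index: int, block: bytes,
                  salt: bytes) -> bytes:
    content_hash = hashlib.sha256(block).digest()            # 1. hash content
    cdk = derive_cdk(master_key, index, content_hash, salt)  # 2. per-block key
    nonce = os.urandom(12)
    ciphertext = AESGCM(cdk).encrypt(nonce, block, None)     # 3. AEAD (tag appended)
    # 4. frame for the wire: index + nonce + ciphertext (simplified)
    return index.to_bytes(8, "big") + nonce + ciphertext
```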




&lt;h3&gt;
  
  
  Gateway Topology: Three Deployment Modes
&lt;/h3&gt;

&lt;p&gt;The GatewayServer supports three distinct deployment topologies, selected at startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode 1 — Reflect Mode (Loopback Encryption Testing)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In reflect mode, the gateway receives a raw stream, encrypts it into V6 blocks, and sends the encrypted stream &lt;em&gt;back to the sender&lt;/em&gt;. This is useful for testing, benchmarking, and any scenario where you want to measure V6 overhead in isolation without setting up a full two-sided deployment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌──────────────────────────────────────────────────┐
  │               Reflect Mode                       │
  │                                                  │
  │   Client App                GatewayServer        │
  │       │                          │               │
  │       │    raw TCP stream ──────►│               │
  │       │                          │  encrypt      │
  │       │◄─── encrypted V6 ────────│               │
  │       │     stream back          │               │
  │                                                  │
  │  Use case: local testing, encryption benchmarks  │
  └──────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mode 2 — Forward/Router Mode (Encryption Proxy)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In router mode, the gateway sits between a data source and a destination. Raw data comes in from the source, gets encrypted into V6 blocks, and is forwarded onward to the configured destination host and port. The destination receives only ciphertext — it never sees the plaintext.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌────────────────────────────────────────────────────────────┐
  │                    Router Mode                             │
  │                                                            │
  │  Data Source      GatewayServer          Destination       │
  │  (app / service)      │                 (storage, relay)   │
  │        │              │                       │            │
  │        │──raw TCP────►│                       │            │
  │        │              │──V6 encrypted stream─►│            │
  │        │              │                       │            │
  │        │         [encrypt + frame]        [receives        │
  │        │                                   ciphertext]     │
  │                                                            │
  │  Use case: zero-trust data pipelines, encrypted relay      │
  └────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mode 3 — Chained Gateway Mode (Multi-Hop Encryption)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multiple GatewayServer instances can be chained together. The output of one gateway (an encrypted V6 stream) feeds into the input of the next, which re-encrypts it under a different master key. This creates a layered encryption topology where no single node holds the full decryption capability — each hop peels one layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌────────────────────────────────────────────────────────────────┐
  │                 Chained Gateway Mode                           │
  │                                                                │
  │  Source     Gateway A        Gateway B        Destination      │
  │    │       (MK-Alpha)       (MK-Beta)             │           │
  │    │            │               │                 │           │
  │    │──raw──────►│               │                 │           │
  │    │            │──V6[MK-α]────►│                 │           │
  │    │            │               │──V6[MK-β]──────►│           │
  │    │            │               │     (double     │           │
  │    │            │               │      encrypted) │           │
  │                                                                │
  │  Each gateway adds an independent encryption layer.            │
  │  Compromise of one node does not expose plaintext.             │
  └────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
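&lt;p&gt;The layering property is easy to demonstrate: wrap the payload twice with two independent keys, and recovery requires peeling in reverse order. A sketch with AES-GCM, with the per-hop framing simplified to nonce plus ciphertext:&lt;/p&gt;

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def add_layer(key: bytes, data: bytes) -> bytes:
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, data, None)

def peel_layer(key: bytes, data: bytes) -> bytes:
    return AESGCM(key).decrypt(data[:12], data[12:], None)

mk_alpha, mk_beta = os.urandom(32), os.urandom(32)   # one MK per gateway

payload = b"raw source bytes"
wire = add_layer(mk_beta, add_layer(mk_alpha, payload))  # Gateway A, then B

# The destination must peel in reverse order; either key alone is useless.
recovered = peel_layer(mk_alpha, peel_layer(mk_beta, wire))
```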






&lt;h3&gt;
  
  
  The Gateway as an Encryption Router: Network-Level Placement
&lt;/h3&gt;

&lt;p&gt;One of the more significant capabilities of the GatewayServer is where it can be placed in a network topology. Because it operates at the TCP socket level and is completely transparent to the applications on either side, it can be dropped into virtually any network path without code changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌───────────────────────────────────────────────────────────────────┐
  │         V6 Gateway as a Network Encryption Router                │
  │                                                                   │
  │                         UNTRUSTED NETWORK                        │
  │                      ┌─────────────────────┐                     │
  │                       │  (ISP / Cloud / WAN)│                    │
  │                       └─────────────────────┘                    │
  │                                ▲  ▼                              │
  │  TRUSTED ZONE A                │                TRUSTED ZONE B   │
  │  ┌────────────────┐            │           ┌────────────────┐    │
  │  │  App Server    │            │           │  Storage /     │    │
  │  │  Database      │            │           │  Analytics     │    │
  │  │  Log Source    │            │           │  Backup Node   │    │
  │  └───────┬────────┘            │           └────────┬───────┘    │
  │          │ raw                 │                raw │            │
  │          ▼                     │                    ▼            │
  │  ┌───────────────┐             │           ┌───────────────┐     │
  │  │  V6 Gateway   │─────────────┘           │  V6 Gateway   │     │
  │  │  (Encrypt)    │  encrypted V6 stream    │  (Decrypt)    │     │
  │  └───────────────┘ ───────────────────────►└───────────────┘     │
  │                                                                   │
  │  ● The untrusted network sees only V6 ciphertext                 │
  │  ● No application changes required on either side                │
  │  ● Each block is independently authenticated                      │
  └───────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is particularly relevant for cloud deployments where data must traverse shared infrastructure. The V6 gateway pair creates an encrypted tunnel at the application-data level — distinct from TLS (which secures the transport) in that the &lt;em&gt;data itself&lt;/em&gt; is encrypted in a structured, auditable format that can be stored, replayed, and verified independently of the connection.&lt;/p&gt;




&lt;h3&gt;
  
  
  Concurrent Connection Handling
&lt;/h3&gt;

&lt;p&gt;The GatewayServer uses a ThreadPoolExecutor to handle multiple simultaneous connections. Each connection gets its own encryption context — its own block counter, its own per-block key derivation chain — so connections are completely isolated from each other at the cryptographic level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  GatewayServer — Concurrent Architecture:

  ┌─────────────────────────────────────────────────────────┐
  │                                                         │
  │  Connection 1 ──► Worker Thread 1 ──► V6 Stream Out 1  │
  │  Connection 2 ──► Worker Thread 2 ──► V6 Stream Out 2  │
  │  Connection 3 ──► Worker Thread 3 ──► V6 Stream Out 3  │
  │  Connection N ──► Worker Thread N ──► V6 Stream Out N  │
  │                                                         │
  │  Each worker maintains:                                 │
  │  ● Independent block counter (block_index)              │
  │  ● Independent CDK derivation chain                     │
  │  ● Independent AEAD cipher state                        │
  │  ● Independent stats (bytes_in, bytes_out)              │
  │                                                         │
  │  Shared across all workers (read-only after init):      │
  │  ● Master Key                                           │
  │  ● Cipher selection (AES or ChaCha20)                   │
  │  ● Block size configuration                             │
  └─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design means encryption throughput scales linearly with the number of available CPU cores up to the thread pool limit. A gateway instance running on an 8-core server can simultaneously encrypt 8 independent streams at full CPU speed.&lt;/p&gt;
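&lt;p&gt;The isolation model can be sketched in a few lines (class and field names here mirror the diagram but are illustrative, not the project's API):&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

class ConnectionContext:
    """Per-connection state, as in the diagram: nothing shared but the MK."""
    def __init__(self):
        self.block_index = 0   # independent block counter
        self.bytes_in = 0      # independent stats

    def handle(self, chunks):
        for chunk in chunks:
            self.bytes_in += len(chunk)
            self.block_index += 1     # each chunk becomes one encrypted block
        return self.block_index, self.bytes_in

connections = [[b"a" * 10] * 3, [b"b" * 10] * 5]   # two client streams

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each connection gets its own context; counters never interleave
    stats = list(pool.map(lambda c: ConnectionContext().handle(c), connections))
```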




&lt;h3&gt;
  
  
  Gateway vs. SeekServer: Complementary Roles
&lt;/h3&gt;

&lt;p&gt;It is worth distinguishing the two server components clearly, since they solve related but different problems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌───────────────────────────────────────────────────────────────┐
  │          GatewayServer vs. SeekServer                        │
  │                                                               │
  │  ┌──────────────────────┐   ┌──────────────────────────────┐  │
  │  │   GatewayServer      │   │        SeekServer            │  │
  │  │                      │   │                              │  │
  │  │  Protocol:  TCP      │   │  Protocol:  HTTP/1.1         │  │
  │  │  Direction: Encrypt  │   │  Direction: Decrypt + Serve  │  │
  │  │  Input:     Raw      │   │  Input:     .qltx container  │  │
  │  │             stream   │   │  Output:    Plaintext bytes  │  │
  │  │  Output:    V6       │   │  Seeking:   Range Requests   │  │
  │  │             stream   │   │                              │  │
  │  │  Use when:           │   │  Use when:                   │  │
  │  │  ● Encrypting data   │   │  ● Serving encrypted files   │  │
  │  │    in transit        │   │    to media players,         │  │
  │  │  ● Building zero-    │   │    browsers, apps            │  │
  │  │    trust pipelines   │   │  ● Partial extraction from   │  │
  │  │  ● Transparent       │   │    large archives            │  │
  │  │    encryption proxy  │   │                              │  │
  │  └──────────────────────┘   └──────────────────────────────┘  │
  │                                                               │
  │  Together: Encrypt at source (Gateway) → Store → Serve       │
  │            on demand with seeking (SeekServer)                │
  └───────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a complete deployment, the GatewayServer encrypts data as it is produced and streams it to storage. The SeekServer then serves that stored, encrypted data to consumers — decrypting only the portions they request. The two components form a complete end-to-end pipeline.&lt;/p&gt;




&lt;h3&gt;
  
  
  Full End-to-End Pipeline Diagram
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌──────────────────────────────────────────────────────────────────────┐
  │              Complete V6 Encrypted Data Infrastructure               │
  │                                                                      │
  │  PRODUCTION          │  UNTRUSTED        │  CONSUMER                │
  │  SIDE                │  STORAGE/NET      │  SIDE                    │
  │                      │                   │                          │
  │  ┌──────────────┐    │                   │    ┌──────────────────┐  │
  │  │ App / Camera │    │                   │    │ Media Player /   │  │
  │  │ Sensor / DB  │    │                   │    │ Browser / Client │  │
  │  └──────┬───────┘    │                   │    └────────┬─────────┘  │
  │         │ raw data   │                   │             │ HTTP Range  │
  │         ▼            │                   │             ▼ Request     │
  │  ┌──────────────┐    │   ┌───────────┐   │    ┌──────────────────┐  │
  │  │  V6 Gateway  │────┼──►│  .qltx    │◄──┼────│   SeekServer     │  │
  │  │  (Encrypt)   │    │   │  Storage  │   │    │   (Decrypt +     │  │
  │  └──────────────┘    │   │  (Cloud / │   │    │    Serve range)  │  │
  │                      │   │   Disk)   │   │    └──────────────────┘  │
  │  Master Key held     │   │           │   │    Master Key held       │
  │  by Gateway only     │   │ Sees only │   │    by SeekServer only    │
  │                      │   │ ciphertext│   │                          │
  └──────────────────────┴───┴───────────┴───┴──────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The storage layer — whether a cloud object store, a NAS, or a CDN — holds only encrypted V6 containers. It has no access to the master key, and therefore no ability to read the data it stores. The encryption and decryption capabilities are held exclusively by the gateway (on ingress) and the SeekServer (on egress), both of which operate within the trusted boundary.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gateway-Specific Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Zero-Trust Internal Data Bus
&lt;/h3&gt;

&lt;p&gt;In a microservices architecture, services that exchange sensitive data (user records, financial transactions, health data) can route their traffic through V6 gateway pairs. The internal network sees only V6 ciphertext. Even if an internal service is compromised, it cannot read traffic it wasn't issued a key for.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌──────────────────────────────────────────────────────────┐
  │            Zero-Trust Microservice Mesh                  │
  │                                                          │
  │  Service A      GW-A     Network Fabric    GW-B  Svc B  │
  │    │             │            │             │      │    │
  │    │──plaintext─►│            │             │      │    │
  │    │             │──V6 crypt─►│──V6 crypt──►│      │    │
  │    │             │            │             │─plain►│    │
  │                                                          │
  │  Service C      GW-C          │            GW-D  Svc D  │
  │    │             │            │             │      │    │
  │    │──plaintext─►│──V6 crypt─►│──V6 crypt──►│─plain►│   │
  │                                                          │
  │  Each service pair has its own Master Key.               │
  │  GW-A cannot decrypt GW-C's traffic.                     │
  └──────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Encrypted Sensor / IoT Data Collection
&lt;/h3&gt;

&lt;p&gt;IoT devices and sensors often transmit data over networks with little to no security. A V6 gateway deployed at the network edge collects raw sensor streams, encrypts them in real time into V6 containers, and forwards them to a cloud backend. The backend stores only ciphertext. Analysis infrastructure that holds the key can then seek into specific time windows without pulling entire archives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Sensors ──► Edge Gateway (V6 Encrypt) ──► Cloud Storage (.qltx)
                                                      │
                                          Analytics seeks specific
                                          time windows via SeekServer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Encrypted Audit Trail / Compliance Logging
&lt;/h3&gt;

&lt;p&gt;Compliance environments (HIPAA, PCI-DSS, SOC 2) require that sensitive logs be tamper-evident and access-controlled. Running audit log streams through a V6 gateway produces containers where every block is authenticated — any deletion or modification breaks the authentication chain and is immediately detectable. Auditors with the key can seek to any time window; no one else can read the logs at all.&lt;/p&gt;
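&lt;p&gt;V6's exact chaining construction isn't reproduced here, but the property it relies on (delete or alter any entry and verification fails from that point onward) can be illustrated with a minimal HMAC chain in pure Python; the key and log entries below are illustrative:&lt;/p&gt;

```python
import hashlib, hmac

def append_entry(chain, key, payload):
    """MAC each entry over its payload plus the previous entry's tag."""
    prev_tag = chain[-1][1] if chain else b"\x00" * 32
    tag = hmac.new(key, prev_tag + payload, hashlib.sha256).digest()
    chain.append((payload, tag))

def verify_chain(chain, key):
    prev_tag = b"\x00" * 32
    for payload, tag in chain:
        expect = hmac.new(key, prev_tag + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expect):
            return False
        prev_tag = tag
    return True

key = b"k" * 32  # illustrative audit key
log = []
for entry in (b"login alice", b"read record 7", b"logout alice"):
    append_entry(log, key, entry)
assert verify_chain(log, key)

del log[1]                         # silently drop the middle entry
assert not verify_chain(log, key)  # the gap is immediately detectable
```

&lt;p&gt;Because each tag commits to the tag before it, an auditor needs only the key and the final tag to establish that the whole history is intact.&lt;/p&gt;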




&lt;h2&gt;
  
  
  Possible Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Secure Cloud Video Streaming
&lt;/h3&gt;

&lt;p&gt;Store large video libraries encrypted in cloud storage (S3, GCS, etc.) and serve them with HTTP Range Request support. Users can seek and play without the server — or the storage provider — ever seeing the plaintext. V6's block structure means scrubbing to any point in a 2-hour film requires decrypting at most a few hundred kilobytes.&lt;/p&gt;
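&lt;p&gt;The "few hundred kilobytes" figure falls out of simple block arithmetic. V6's real block size and header layout aren't shown here; assuming a hypothetical fixed plaintext block size, mapping a requested byte range onto the blocks that must be fetched looks like this:&lt;/p&gt;

```python
def blocks_for_range(start, end, block_size):
    """Map an inclusive byte range onto the container blocks that must be
    fetched and decrypted, plus the offset of `start` in the first block."""
    first = start // block_size
    last = end // block_size
    return first, last, start - first * block_size

BLOCK = 256 * 1024  # hypothetical 256 KiB plaintext blocks

# A Range request for plaintext bytes 1,000,000-1,500,000:
first, last, skip = blocks_for_range(1_000_000, 1_500_000, BLOCK)
# → first=3, last=5, skip=213568: three blocks (~768 KiB) to decrypt
```

&lt;p&gt;A SeekServer can fetch exactly blocks 3 through 5 from storage, decrypt them, and discard the first 213,568 bytes of block 3: roughly 768 KiB of work for an arbitrary seek into a multi-gigabyte file.&lt;/p&gt;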

&lt;h3&gt;
  
  
  2. Encrypted Media CDN
&lt;/h3&gt;

&lt;p&gt;Build a content delivery network where the CDN nodes hold only encrypted blocks. The keys never leave your infrastructure. Even if a CDN node is compromised, an attacker gets only ciphertext — and individual blocks are useless without the master key.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Secure Backup with Partial Restore
&lt;/h3&gt;

&lt;p&gt;Back up large datasets block by block. When you need to restore a single file from a 500 GB backup archive, seek directly to that file's blocks and decrypt only those. No full-archive extraction needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Real-Time Encrypted Logging
&lt;/h3&gt;

&lt;p&gt;Use the Gateway Server in front of a log aggregation system. Log data is encrypted in transit and stored in V6 containers, but log analysis tools that hold the key can still seek to specific time windows within a day's logs without downloading the full day.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Secure File Sharing with Selective Access
&lt;/h3&gt;

&lt;p&gt;Distribute an encrypted container where different authorized parties can access different byte ranges (corresponding to different logical sections). Since each block has its own CDK derived from the master key, access control can be scoped to specific block ranges.&lt;/p&gt;
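&lt;p&gt;The per-block derivation is HKDF-SHA256 keyed by the MK. The info-string format below is an assumption (the codebase's real strings live in &lt;code&gt;crypto.py&lt;/code&gt;), but HKDF itself is small enough to write out with the standard library:&lt;/p&gt;

```python
import hashlib, hmac

def hkdf_sha256(key, info, salt=b"\x00" * 32):
    """RFC 5869 HKDF-SHA256, specialised to a 32-byte output."""
    prk = hmac.new(salt, key, hashlib.sha256).digest()             # extract
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()  # expand: T(1)

mk = bytes(32)  # illustrative master key
cdk_0 = hkdf_sha256(mk, b"QeltrixV6-CDK" + (0).to_bytes(8, "big"))
cdk_1 = hkdf_sha256(mk, b"QeltrixV6-CDK" + (1).to_bytes(8, "big"))
assert cdk_0 != cdk_1  # every block gets an independent key
```

&lt;p&gt;Because HKDF is one-way, handing a party one block's CDK reveals nothing about the MK or about any sibling block's key.&lt;/p&gt;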

&lt;h3&gt;
  
  
  6. Encrypted Game Asset Streaming
&lt;/h3&gt;

&lt;p&gt;Game engines stream assets from disk or network as they're needed. V6 enables those assets to be stored encrypted without performance penalties — the engine requests the exact byte range for a texture or audio clip, and only that range is decrypted.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Tamper-Evident Archival
&lt;/h3&gt;

&lt;p&gt;Every block carries a SHA-256 integrity hash and an AEAD authentication tag. Any modification to any byte of a V6 container — even a single bit flip — causes decryption of the affected block to fail with an authentication error. This makes V6 containers a natural fit for archival storage where data integrity must be provable.&lt;/p&gt;
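&lt;p&gt;This failure mode is easy to demonstrate with AES-256-GCM via the same &lt;code&gt;cryptography&lt;/code&gt; package the Python layer builds on; the key, nonce, and associated data here are illustrative, not V6's real block framing:&lt;/p&gt;

```python
import os
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)
nonce = os.urandom(12)

# Encrypt one block; the block header rides along as associated data.
sealed = aead.encrypt(nonce, b"archival payload", b"block-header")

tampered = bytearray(sealed)
tampered[0] ^= 0x01  # flip a single bit of the ciphertext
try:
    aead.decrypt(nonce, bytes(tampered), b"block-header")
    raise AssertionError("tampering went undetected")
except InvalidTag:
    pass  # authentication fails, exactly as the container format relies on
```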

&lt;h3&gt;
  
  
  8. Privacy-Preserving Data Pipelines
&lt;/h3&gt;

&lt;p&gt;In data processing pipelines, intermediate datasets can be stored as V6 containers. Each pipeline stage holds only the master key it needs to access its assigned blocks, and the encrypted metadata ensures that even the schema and structure of the data remain confidential.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance Design
&lt;/h2&gt;

&lt;p&gt;V6 is built to be fast. The block architecture is inherently parallelizable: each block is independent, so packing and unpacking use a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; to process multiple blocks simultaneously across all CPU cores. On modern hardware, this approach can fully saturate the memory and I/O bandwidth available, rather than being bottlenecked by a single encryption thread.&lt;/p&gt;
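&lt;p&gt;The parallel packing pattern is ordinary &lt;code&gt;concurrent.futures&lt;/code&gt;. The sketch below uses a hashing stand-in for the per-block work rather than V6's real encryption path:&lt;/p&gt;

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 64 * 1024  # hypothetical block size for the sketch

def process_block(args):
    """Stand-in for per-block work: real code would derive the block's
    CDK and run an AEAD cipher; here we just hash the chunk."""
    index, chunk = args
    return index, hashlib.sha256(chunk).digest()

data = bytes(1_000_000)
blocks = [(i, data[off:off + BLOCK])
          for i, off in enumerate(range(0, len(data), BLOCK))]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_block, blocks))  # output order preserved

assert [i for i, _ in results] == list(range(len(blocks)))
```

&lt;p&gt;Threads give real parallelism here because the &lt;code&gt;cryptography&lt;/code&gt; package's cipher operations (like &lt;code&gt;hashlib&lt;/code&gt; on large buffers) release the GIL while they run.&lt;/p&gt;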

&lt;p&gt;For cipher choice, V6 offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AES-256-GCM&lt;/strong&gt; — the hardware-accelerated default. On CPUs with AES instructions (AES-NI on x86 since roughly 2010, the ARMv8 Cryptography Extensions on most modern phones), this is extremely fast.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChaCha20-Poly1305&lt;/strong&gt; — the mobile-friendly alternative. Designed to be fast in software on CPUs without AES hardware acceleration, making it ideal for IoT devices or older mobile hardware.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both options provide &lt;strong&gt;AEAD (Authenticated Encryption with Associated Data)&lt;/strong&gt;, meaning every decrypt operation simultaneously verifies that the data has not been altered.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Master Key Problem: How Does V6 Actually Share It?
&lt;/h2&gt;

&lt;p&gt;This is the most important practical question about the system, and it deserves a direct answer.&lt;/p&gt;

&lt;p&gt;The Master Key (MK) is a 32-byte symmetric secret. It lives &lt;strong&gt;only in process memory&lt;/strong&gt; — never written to disk in plaintext, never transmitted by V6 itself. The container header stores an &lt;em&gt;encrypted&lt;/em&gt; copy of the MK (wrapped with AES-GCM using a passphrase-derived key), but that encrypted blob is useless without the passphrase or raw key that created it. Every operation — &lt;code&gt;pack&lt;/code&gt;, &lt;code&gt;unpack&lt;/code&gt;, &lt;code&gt;seek_extract&lt;/code&gt;, &lt;code&gt;GatewayServer&lt;/code&gt;, &lt;code&gt;SeekServer&lt;/code&gt; — requires the caller to supply the MK at runtime.&lt;/p&gt;
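&lt;p&gt;The wrap step can be sketched in a few lines of Python; the parameter choices and layout below are illustrative, not V6's actual header format:&lt;/p&gt;

```python
import hashlib, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

ITERATIONS = 200_000  # matches the PoC's PBKDF2 iteration count

def wrap_mk(mk, passphrase, salt):
    """Encrypt the MK under a passphrase-derived key-encryption key."""
    kek = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, ITERATIONS)
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, mk, None)

def unwrap_mk(blob, passphrase, salt):
    kek = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, ITERATIONS)
    return AESGCM(kek).decrypt(blob[:12], blob[12:], None)

mk = os.urandom(32)
salt = os.urandom(16)
blob = wrap_mk(mk, "correct horse", salt)  # this blob is what a header stores
assert unwrap_mk(blob, "correct horse", salt) == mk
```

&lt;p&gt;The stored blob is an AES-GCM ciphertext, so a wrong passphrase doesn't yield a wrong key silently; the unwrap fails with an authentication error.&lt;/p&gt;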

&lt;p&gt;This raises an unavoidable question: &lt;strong&gt;how do two parties — a GatewayServer on one machine and a SeekServer on another — both end up holding the same MK without transmitting it over the wire in a way an attacker could intercept?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;V6 intentionally does not solve this problem. The gateway is a routing and encryption component. Key distribution is a separate concern, and deliberately left to the operator. This is the right separation of responsibilities — but it does mean you need to design the key exchange layer yourself when moving toward production.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Gateway Does Not Manage Keys — By Design
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ┌──────────────────────────────────────────────────────────────────┐
  │         What V6 Gateway Does vs. Does NOT Do                     │
  │                                                                  │
  │  ✓ DOES:                       ✗ DOES NOT:                      │
  │  ● Accept MK at startup        ● Generate or negotiate MK        │
  │  ● Hold MK in process memory   ● Transmit MK over the wire       │
  │  ● Derive CDKs from MK         ● Store MK anywhere on disk       │
  │  ● Encrypt/decrypt streams     ● Authenticate remote parties     │
  │  ● Route encrypted data        ● Manage key rotation             │
  │  ● Track stats per connection  ● Distribute MK to new nodes      │
  │                                                                  │
  │  The MK must arrive from OUTSIDE, via an external mechanism.     │
  └──────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design is correct for a routing component — it mirrors how servers like nginx handle TLS: nginx doesn't generate or sign your certificate; you bring the cert and key to it. The gateway is an encryption engine, not a key authority.&lt;/p&gt;




&lt;h3&gt;
  
  
  How to Distribute the MK Safely: Three Approaches
&lt;/h3&gt;

&lt;p&gt;When two V6 nodes need to share the same MK, the MK itself must travel over some channel. That channel needs to be secure. Here are the standard approaches, from simplest to most production-grade:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach 1 — Diffie-Hellman Key Exchange (ECDH)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two parties can arrive at a shared secret without ever transmitting the secret itself. Using Elliptic Curve Diffie-Hellman (ECDH), each side generates an ephemeral key pair. They exchange only public keys. Each side independently computes the same shared secret, which is then used as (or to derive) the MK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  ECDH Master Key Exchange for V6:

  Node A (Gateway)              Node B (SeekServer)
       │                               │
       │  Generate ephemeral           │  Generate ephemeral
       │  keypair (privA, pubA)        │  keypair (privB, pubB)
       │                               │
       │────── send pubA ─────────────►│
       │◄───── send pubB ──────────────│
       │                               │
       │  shared = ECDH(privA, pubB)   │  shared = ECDH(privB, pubA)
       │  MK = HKDF(shared, context)   │  MK = HKDF(shared, context)
       │                               │
       │  Both now hold identical MK.  │
       │  The MK was never on the wire.│
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how TLS establishes session keys. The public key exchange happens in the clear; the secret never leaves either machine. For V6, a pre-connection ECDH handshake between gateway nodes could establish the MK before the stream begins.&lt;/p&gt;
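&lt;p&gt;The handshake above maps almost line-for-line onto X25519 plus HKDF. A sketch with the &lt;code&gt;cryptography&lt;/code&gt; package (the &lt;code&gt;info&lt;/code&gt; context string is an assumption):&lt;/p&gt;

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_mk(shared_secret):
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"QeltrixV6-MK").derive(shared_secret)

# Each side generates an ephemeral keypair; only public halves cross the wire.
priv_a = X25519PrivateKey.generate()   # gateway
priv_b = X25519PrivateKey.generate()   # seek server
pub_a, pub_b = priv_a.public_key(), priv_b.public_key()

mk_a = derive_mk(priv_a.exchange(pub_b))
mk_b = derive_mk(priv_b.exchange(pub_a))
assert mk_a == mk_b  # identical 32-byte MK; the secret itself never transmitted
```

&lt;p&gt;One caveat: raw ECDH does not authenticate the parties, so the exchanged public keys must be signed or pinned to rule out a man-in-the-middle.&lt;/p&gt;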

&lt;p&gt;&lt;strong&gt;Approach 2 — RSA Key Encapsulation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If one party has an RSA public key for the other, they can encrypt the MK under that public key and transmit the ciphertext. Only the holder of the RSA private key can recover the MK. This is simpler to implement than a full DH exchange but requires a pre-existing PKI (public key infrastructure) — each node needs a key pair, and the parties need a way to trust each other's public keys.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  RSA MK Distribution:

  Key Authority / Operator
         │
         │  Encrypt MK with Node B's RSA public key
         │  → sends ciphertext to Node B
         │
  Node B decrypts with private key → recovers MK
  Node A receives MK via secure out-of-band channel
         │
  Both nodes now hold MK. RSA ciphertext was safe to transmit.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
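&lt;p&gt;In code, the encapsulation step is a single OAEP encryption of the 32-byte MK. In a real deployment Node B's public key would be loaded from a trusted store rather than generated inline:&lt;/p&gt;

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Node B's long-lived keypair (pre-provisioned in practice; generated here
# only to keep the sketch self-contained).
node_b = rsa.generate_private_key(public_exponent=65537, key_size=2048)

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

mk = os.urandom(32)
encapsulated = node_b.public_key().encrypt(mk, oaep)  # safe to transmit
assert node_b.decrypt(encapsulated, oaep) == mk       # only Node B recovers it
```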



&lt;p&gt;&lt;strong&gt;Approach 3 — External Key Management Service (KMS)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In cloud environments, services like AWS KMS, HashiCorp Vault, or Google Cloud KMS act as trusted third parties that hold and distribute secrets. Neither the gateway nor the seek server ever holds the MK long-term; they request it from the KMS at startup, authenticated by their cloud identity (IAM role, service account, etc.).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  KMS-Based MK Distribution:

  ┌─────────────┐       ┌─────────────┐       ┌─────────────┐
  │  V6 Gateway │       │     KMS     │       │ V6 SeekSrv  │
  │             │       │  (Vault /   │       │             │
  │  startup:   │──────►│  AWS KMS /  │◄──────│  startup:   │
  │  "give me   │       │  etc.)      │       │  "give me   │
  │  MK for     │       │             │       │  MK for     │
  │  stream X"  │◄──────│  verifies   │──────►│  stream X"  │
  │             │  MK   │  identity   │  MK   │             │
  └─────────────┘       └─────────────┘       └─────────────┘
       MK held in memory only. Never on disk. Rotatable on demand.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  The PoC Hardcoding: What It Is and Why It Matters
&lt;/h3&gt;

&lt;p&gt;The current V6 codebase is explicitly a &lt;strong&gt;proof of concept&lt;/strong&gt;, and it contains several hardcoded values that are safe for testing and demonstration but would create real vulnerabilities in production. It is worth being specific about what these are, why they exist, and what would need to change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  PoC Hardcoding Inventory:

  ┌────────────────────────┬──────────────────────────┬──────────────────────┐
  │ What's hardcoded       │ Where                    │ Risk if kept in prod │
  ├────────────────────────┼──────────────────────────┼──────────────────────┤
  │ PBKDF2 salt            │ cli.py:                  │ Two users with same  │
  │ b"QeltrixV6Salt"       │ _mk_from_passphrase()    │ passphrase get same  │
  │                        │                          │ MK. Rainbow tables   │
  │                        │                          │ become possible.     │
  ├────────────────────────┼──────────────────────────┼──────────────────────┤
  │ HKDF info strings      │ crypto.py: multiple      │ Low risk alone, but  │
  │ e.g. "QeltrixV6-CDK"   │ derivation calls         │ domain separation    │
  │                        │                          │ relies on these      │
  │                        │                          │ being unique/secret  │
  ├────────────────────────┼──────────────────────────┼──────────────────────┤
  │ PBKDF2 iteration count │ cli.py: 200,000          │ Acceptable today,    │
  │                        │                          │ should be tunable    │
  │                        │                          │ and higher for HSMs  │
  ├────────────────────────┼──────────────────────────┼──────────────────────┤
  │ No MK rotation         │ Entire codebase          │ Compromised MK =     │
  │                        │                          │ all data exposed.    │
  │                        │                          │ No re-key mechanism. │
  └────────────────────────┴──────────────────────────┴──────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The hardcoded PBKDF2 salt is the most significant issue.&lt;/strong&gt; A salt's purpose is to make every passphrase derivation unique, so that the same passphrase on two different containers produces two different MKs. With a fixed salt, if an attacker knows a user's passphrase (or can guess common ones), they can precompute a table of &lt;code&gt;passphrase → MK&lt;/code&gt; mappings and apply it to any V6 container ever created with that passphrase. The fix is simple: generate a random 32-byte salt per container, store it in the header alongside the encrypted MK, and use it during derivation.&lt;/p&gt;
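&lt;p&gt;That fix is only a few lines. A sketch (the function name mirrors the PoC's &lt;code&gt;_mk_from_passphrase()&lt;/code&gt; but is otherwise illustrative):&lt;/p&gt;

```python
import hashlib, secrets

def mk_from_passphrase(passphrase, salt=None):
    """Derive the MK with a fresh random salt per container. The salt is
    not secret; it is stored in the header next to the wrapped MK."""
    if salt is None:
        salt = secrets.token_bytes(32)  # new container: fresh salt
    mk = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000)
    return mk, salt

mk1, salt1 = mk_from_passphrase("hunter2")
mk2, salt2 = mk_from_passphrase("hunter2")
assert mk1 != mk2              # same passphrase, different containers
mk_again, _ = mk_from_passphrase("hunter2", salt1)
assert mk_again == mk1         # stored salt keeps unpacking reproducible
```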

&lt;p&gt;&lt;strong&gt;The symmetric MK itself is the second structural concern.&lt;/strong&gt; AES-256-GCM is a symmetric cipher — the same key encrypts and decrypts. This means every party that can &lt;em&gt;decrypt&lt;/em&gt; V6 data can also &lt;em&gt;encrypt&lt;/em&gt; it and forge valid blocks. For many use cases (a single operator controlling both gateway and storage) this is fine. For use cases where you want to separate write and read authorization — for example, a sensor that can only encrypt, never decrypt — symmetric encryption is insufficient. RSA or an ECDH-based scheme with separate keys for encryption and decryption roles would be needed.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Security That Exists Despite These Constraints
&lt;/h3&gt;

&lt;p&gt;It is important to be precise: the PoC hardcoding creates &lt;em&gt;theoretical&lt;/em&gt; vulnerabilities, but the system is not simply insecure. Several strong protections remain regardless:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Security Analysis: What Holds Even with PoC Constraints

  ┌──────────────────────────────────────────────────────────────────┐
  │  STRONG regardless of PoC status:                               │
  │                                                                  │
  │  ● AES-256-GCM / ChaCha20-Poly1305 are cryptographically sound  │
  │  ● Per-block CDK derivation: compromising one block's key does  │
  │    not expose any other block's key                              │
  │  ● AEAD authentication: any tampering with any block is         │
  │    detected and decryption fails with an error                   │
  │  ● V6-C metadata encryption: container structure is opaque      │
  │    without the MK                                                │
  │  ● MK never written to disk in plaintext                        │
  │                                                                  │
  │  WEAKER due to PoC choices:                                      │
  │                                                                  │
  │  ● Fixed PBKDF2 salt weakens passphrase-to-MK derivation        │
  │  ● No MK distribution mechanism: operator must solve this       │
  │  ● Symmetric MK: no separation between encrypt/decrypt roles    │
  │  ● No key rotation: long-lived MK increases exposure window     │
  └──────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The bottom line: if you control the MK — if it was generated securely and distributed through a trusted channel — the data is strongly protected. The PoC weaknesses are primarily in the &lt;em&gt;convenience layers&lt;/em&gt; around the MK (the passphrase-to-key derivation and the lack of a key exchange protocol), not in the core encryption itself.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Production Would Look Like
&lt;/h3&gt;

&lt;p&gt;Moving V6 from proof of concept to production-grade would require changes in two areas: the key derivation layer and the key distribution layer. The block encryption, block framing, CDK hierarchy, and AEAD authentication are already production-quality in design.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  PoC → Production: Required Changes

  PoC (current)                     Production
  ─────────────────────────────────────────────────────────────
  Hardcoded PBKDF2 salt         →   Random per-container salt,
                                    stored in header
  Passphrase via CLI arg        →   MK via KMS / HSM / ECDH
                                    handshake at startup
  No key distribution           →   ECDH or RSA key encapsulation
                                    between gateway nodes
  No key rotation               →   MK rotation via re-key API;
                                    old containers re-encrypted
                                    on a schedule
  Symmetric-only MK             →   Optional asymmetric wrap:
                                    encrypt-only role vs. full
                                    decrypt role
  Single MK per deployment      →   Per-stream or per-container
                                    MK, managed by KMS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The architecture of V6 — its block model, its CDK hierarchy, its gateway topology — is fully compatible with all of these production hardening steps. The PoC is not a wrong design that needs to be replaced; it is a correct design that needs its key management layer completed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Qeltrix V6 is a practical answer to a real problem: how do you store large, sensitive data in a way that is both strongly encrypted &lt;em&gt;and&lt;/em&gt; efficiently accessible? By treating encryption not as a one-time transformation but as a structured, seekable container format, V6 makes it feasible to build encrypted video platforms, secure backup systems, privacy-preserving data pipelines, and real-time network encryption gateways — all from a single, unified architecture.&lt;/p&gt;

&lt;p&gt;Its block encryption, CDK key hierarchy, and AEAD authentication are production-quality in design. What remains for a real-world deployment is completing the key management layer: replacing the hardcoded PBKDF2 salt with per-container random salts, adding an ECDH or RSA-based MK exchange between gateway nodes, and integrating with a KMS for key lifecycle management. The routing gateway intentionally leaves this to the operator — it is an encryption engine, not a key authority — which means the design is open to being paired with any key distribution mechanism that fits the deployment context.&lt;/p&gt;

&lt;p&gt;The combination of a fast C core, a rigorous Python cryptography layer, HTTP Range Request support, and a fully separable key management model makes Qeltrix V6 a strong foundation for the next generation of secure streaming infrastructure.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Explore the project:&lt;/strong&gt; &lt;a href="https://github.com/Qeltrix/Qeltrix-v6" rel="noopener noreferrer"&gt;https://github.com/Qeltrix/Qeltrix-v6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Created by Muhammed Shafin P (&lt;a class="mentioned-user" href="https://dev.to/hejhdiss"&gt;@hejhdiss&lt;/a&gt;) | License: CC BY-SA 4.0&lt;/em&gt;&lt;/p&gt;

</description>
      <category>qeltrix</category>
      <category>hejhdiss</category>
      <category>container</category>
      <category>python</category>
    </item>
    <item>
      <title>Qeltrix V6 — Network-Native Encrypted Streaming Container</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Sun, 01 Mar 2026 13:48:54 +0000</pubDate>
      <link>https://dev.to/hejhdiss/qeltrix-v6-network-native-encrypted-streaming-container-4f45</link>
      <guid>https://dev.to/hejhdiss/qeltrix-v6-network-native-encrypted-streaming-container-4f45</guid>
      <description>&lt;p&gt;License: CC BY-SA 4.0&lt;/p&gt;

&lt;p&gt;Architecture: C shared library (.so/.dll) + Python cryptography library&lt;/p&gt;

&lt;p&gt;Repository: &lt;a href="https://github.com/Qeltrix/Qeltrix-v6" rel="noopener noreferrer"&gt;https://github.com/Qeltrix/Qeltrix-v6&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ PROOF-OF-CONCEPT ONLY. Not for production or security-critical use without an independent cryptographic audit. Created fully with Claude Sonnet 4.6.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is Qeltrix V6?
&lt;/h2&gt;

&lt;p&gt;Qeltrix V6 represents the network-native evolution of the Qeltrix encrypted archiving system. Unlike traditional encryption methods that require an entire file to be processed before it can be used, V6 is designed to turn any data stream into a live, seekable, and encrypted container in real-time. This "stream-first" approach allows for the encryption of data as it arrives from any source, making it an ideal foundation for modern cloud storage and secure file-sharing applications.&lt;/p&gt;

&lt;p&gt;The system’s standout capability is its native support for HTTP Range Requests. This allows users to "seek" into massive encrypted files—such as high-definition videos—and begin playback or extract specific data segments instantly without downloading or decrypting the entire container. By breaking data into discrete, authenticated blocks, Qeltrix V6 ensures that only the requested information is processed, drastically reducing latency and bandwidth consumption in media streaming scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;The project utilizes a hybrid architecture that combines the performance of low-level C with the robust security ecosystem of Python. The C core handles the "heavy lifting," including high-speed block framing, memory-efficient permutations, and the complex mathematics required for random-access seeking. Meanwhile, the Python layer leverages the cryptography library to manage high-level logic, such as key derivation and the orchestration of AES-256-GCM or ChaCha20-Poly1305 ciphers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────┐
│                    Qeltrix V6 System                     │
│                                                          │
│  ┌─────────────────────┐    ┌──────────────────────────┐ │
│  │  C Shared Library   │    │  Python (cryptography)   │ │
│  │  libqeltrix_v6.so   │    │                          │ │
│  │  qeltrix_v6.dll     │    │  ● AES-256-GCM           │ │
│  │                     │    │  ● ChaCha20-Poly1305     │ │
│  │  ● Block framing    │◄──►│  ● HKDF-SHA256 (CDK)     │ │
│  │  ● Permutation      │    │  ● SHA-256 hashing       │ │
│  │  ● Header/footer    │    │  ● Master key wrapping   │ │
│  │  ● TCP networking   │    │  ● V6-C metadata crypto  │ │
│  │  ● HTTP parsing     │    │                          │ │
│  │  ● Seek math        │    │                          │ │
│  └─────────────────────┘    └──────────────────────────┘ │
│                                                          │
│  ┌───────────────┐  ┌───────────────┐  ┌──────────────┐  │
│  │ pack/unpack   │  │ GatewayServer │  │  SeekServer  │  │
│  │ container.py  │  │ gateway.py    │  │ (HTTP+Range) │  │
│  └───────────────┘  └───────────────┘  └──────────────┘  │
└──────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Performance and Security Features
&lt;/h2&gt;

&lt;p&gt;Qeltrix V6 is engineered for high-performance environments. It utilizes a ThreadPoolExecutor to provide true parallel processing across all platforms, including Windows. This multi-threaded approach allows the system to encrypt or decrypt multiple data blocks simultaneously, fully saturating modern multi-core CPUs. On Windows, the system is fully supported via a .dll built with MinGW, ensuring consistent performance across different operating systems.&lt;/p&gt;

&lt;p&gt;Security is implemented through a multi-layered approach. Each data block maintains its own integrity via SHA-256 hashing and AEAD (Authenticated Encryption with Associated Data) tags. The system employs a dual-layer key hierarchy in which per-block Content Derived Keys (CDKs) are derived from a Master Key (MK), and the MK itself is stored only in wrapped (encrypted) form. Furthermore, the V6-C metadata is itself encrypted, ensuring that even the internal structure and properties of the container remain hidden from unauthorized parties. Whether using the hardware-accelerated AES-256-GCM or the mobile-friendly ChaCha20-Poly1305, Qeltrix V6 provides a flexible and powerful toolkit for the next generation of secure networking.&lt;/p&gt;

</description>
      <category>qeltrix</category>
      <category>hejhdiss</category>
      <category>container</category>
    </item>
    <item>
      <title>Running Machine Learning on Microcontrollers — A Sample Usage of embml</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Sun, 01 Mar 2026 10:06:28 +0000</pubDate>
      <link>https://dev.to/hejhdiss/running-machine-learning-on-microcontrollers-a-sample-usage-of-embml-56n4</link>
      <guid>https://dev.to/hejhdiss/running-machine-learning-on-microcontrollers-a-sample-usage-of-embml-56n4</guid>
      <description>&lt;p&gt;Most embedded developers have heard the pitch for "TinyML" by now. Train a model in Python, quantize it, convert it, flash a frozen blob to your device. The microcontroller runs inference. It never learns. It never adapts. It just executes.&lt;/p&gt;

&lt;p&gt;That's fine for a class of problems — but it leaves a lot on the table. What if your sensor drifts after six months in the field? What if you want the device to tune itself to the specific motor it's attached to, not a generic one from a training dataset? What if there's simply no server in the loop?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;embml&lt;/strong&gt; is a sample repository exploring what it looks like to do machine learning &lt;em&gt;on the device itself&lt;/em&gt; — in pure C, with no dynamic allocation, no external dependencies beyond the standard library, and no Python runtime anywhere in the chain.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📦 &lt;strong&gt;Sample Repo:&lt;/strong&gt; &lt;a href="https://github.com/hejhdiss/embml" rel="noopener noreferrer"&gt;https://github.com/hejhdiss/embml&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is not a production framework. It is a well-structured, readable starting point — a reference that embedded developers can clone, read, understand, and adapt. Every algorithm is implemented from scratch in C99, with the caller owning every buffer.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's in the Repo
&lt;/h2&gt;

&lt;p&gt;The library covers eight modules, all in &lt;code&gt;src/&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_linear&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Online linear regression via SGD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_logistic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Binary logistic regression via SGD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_lms&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LMS and Normalised LMS adaptive filter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_rls&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Recursive Least Squares with forgetting factor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_iqr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Incremental QR via Givens rotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_nn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Feedforward MLP — backprop, Xavier init, gradient clipping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_gru&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Minimal GRU cell for time-series inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embml_esn&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Echo State Network — fixed reservoir, RLS-trained readout&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each module is a &lt;code&gt;.c&lt;/code&gt; and &lt;code&gt;.h&lt;/code&gt; pair. Drop them directly into your firmware project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sample Usage
&lt;/h2&gt;

&lt;p&gt;The examples below show what real usage looks like. These aren't pseudocode — they compile and run on ESP32, STM32F4, RP2040, and Arduino Mega class hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linear Regression — On-Device Temperature Compensation
&lt;/h3&gt;

&lt;p&gt;A sensor reading drifts linearly with board temperature. Train a correction model live, sample by sample, with no server in the loop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define N_FEAT 2   &lt;/span&gt;&lt;span class="cm"&gt;/* [raw_reading, board_temp] → corrected_value */&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N_FEAT&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;LinearModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;linear_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N_FEAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mo"&gt;01&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N_FEAT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;read_sensor&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;read_board_temp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_reference&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;   &lt;span class="cm"&gt;/* calibration reference */&lt;/span&gt;

    &lt;span class="cm"&gt;/* learn from each sample — no batch needed */&lt;/span&gt;
    &lt;span class="n"&gt;linear_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;corrected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;linear_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;log_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corrected&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few hundred samples the model converges to the compensation curve. No laptop. No Python. The device taught itself.&lt;/p&gt;
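&lt;p&gt;To make the convergence claim concrete, here is a from-scratch sketch of the same per-sample SGD update rule, independent of the embml API; the correction coefficients (0.9 and -0.05) and the fake sensor traces are made up purely for illustration:&lt;/p&gt;

```c
#include <assert.h>

/* Plain SGD linear regression: w += lr * (y - w.x) * x.
   A host-side sketch of the update rule, not the embml API itself. */
static void sgd_update(float *w, const float *x, float y, float lr, int n) {
    float y_hat = 0.0f;
    for (int i = 0; i < n; i++) y_hat += w[i] * x[i];
    float err = y - y_hat;
    for (int i = 0; i < n; i++) w[i] += lr * err * x[i];
}

/* Simulate a sensor whose true correction is
   corrected = 0.9*raw - 0.05*temp (hypothetical coefficients);
   returns the worst absolute coefficient error after 500 samples. */
float linear_demo(void) {
    float w[2] = { 0.0f, 0.0f };
    for (int k = 0; k < 500; k++) {
        float x[2] = { (float)(k % 10),          /* fake raw reading */
                       (float)((k * 7) % 40) };  /* fake board temp  */
        float y = 0.9f * x[0] - 0.05f * x[1];
        sgd_update(w, x, y, 0.001f, 2);
    }
    float e0 = w[0] - 0.9f;  if (e0 < 0) e0 = -e0;
    float e1 = w[1] + 0.05f; if (e1 < 0) e1 = -e1;
    return e0 > e1 ? e0 : e1;
}
```

&lt;p&gt;Run on a host, this settles close to the true coefficients within a few hundred samples, matching the on-device behaviour described above.&lt;/p&gt;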




&lt;h3&gt;
  
  
  Logistic Regression — Fault Detection
&lt;/h3&gt;

&lt;p&gt;Classify whether a motor is healthy (0) or showing early fault signs (1) from three features: RMS vibration, peak frequency, and motor temperature.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define N_FEAT 3   &lt;/span&gt;&lt;span class="cm"&gt;/* [rms_vibration, peak_freq, temp] */&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N_FEAT&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;LogisticModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;logistic_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N_FEAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mo"&gt;005&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N_FEAT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;rms&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;peak_freq&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;motor_temp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="cm"&gt;/* During a known-good commissioning window, label = 0 */&lt;/span&gt;
    &lt;span class="n"&gt;logistic_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* In operation: */&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;fault&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logistic_classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt;   &lt;span class="n"&gt;prob&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logistic_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prob&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;trigger_alert&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  LMS — Background Noise Cancellation
&lt;/h3&gt;

&lt;p&gt;The Least Mean Squares filter adapts to reject a periodic noise source from a signal, updating every sample with a single multiply-accumulate per weight — the lightest possible online learner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define FILTER_LEN 16
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FILTER_LEN&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;LMSModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* Normalised LMS: stable without tuning step size manually */&lt;/span&gt;
    &lt;span class="n"&gt;lms_init_nlms&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FILTER_LEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;noisy_signal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FILTER_LEN&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* circular buffer of ADC samples */&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;desired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_reference_mic&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="n"&gt;lms_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;noisy_signal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desired&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;clean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lms_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;noisy_signal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;output_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
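&lt;p&gt;For reference, the NLMS update itself is only a few lines of arithmetic. The sketch below is not the embml API, and the tone frequency, step size, and filter length are illustrative. It learns to predict a pure tone from its recent history, which is the core mechanism behind adaptive noise cancellation:&lt;/p&gt;

```c
#include <assert.h>

#define LEN 8

/* Normalised LMS step: w += (mu / (eps + ||x||^2)) * e * x */
static float nlms_step(float *w, const float *x, float desired,
                       float mu, float eps, int n) {
    float y = 0.0f, power = eps;
    for (int i = 0; i < n; i++) { y += w[i] * x[i]; power += x[i] * x[i]; }
    float e = desired - y;
    float g = mu * e / power;
    for (int i = 0; i < n; i++) w[i] += g * x[i];
    return e;   /* a-priori error, before the update */
}

/* Learn to predict a pure tone from its past LEN samples; a sinusoid
   obeys a 2-tap linear recurrence, so a perfect predictor exists.
   Returns the mean |error| over the last 100 of 2000 steps. */
float nlms_demo(void) {
    float w[LEN] = { 0 }, buf[LEN] = { 0 };
    float s1 = 0.2955f, s2 = 0.0f;    /* ~sin(0.3) and sin(0.0)       */
    float err_sum = 0.0f;
    for (int k = 0; k < 2000; k++) {
        float s = 1.9107f * s1 - s2;  /* next sample: 2*cos(w0)*s1-s2 */
        s2 = s1; s1 = s;
        float e = nlms_step(w, buf, s, 0.5f, 1e-6f, LEN);
        if (k >= 1900) err_sum += (e < 0 ? -e : e);
        for (int i = LEN - 1; i > 0; i--) buf[i] = buf[i - 1];
        buf[0] = s;                   /* shift in newest sample       */
    }
    return err_sum / 100.0f;
}
```

&lt;p&gt;The normalisation by input power is what lets the same step size work across signal levels, which is why the library exposes NLMS alongside plain LMS.&lt;/p&gt;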






&lt;h3&gt;
  
  
  RLS — Fast Converging System Identification
&lt;/h3&gt;

&lt;p&gt;RLS typically converges far faster than SGD and has no learning rate to tune. Here it identifies the coefficients of an unknown plant (e.g. a motor transfer function) in real time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define N 5
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;P&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;k_scratch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;RLSModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* lambda=0.98: moderate forgetting for a slowly drifting system */&lt;/span&gt;
    &lt;span class="cm"&gt;/* delta=1000: weak prior — trust the data quickly               */&lt;/span&gt;
    &lt;span class="n"&gt;rls_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;98&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;P&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;u_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;u_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                     &lt;span class="n"&gt;y_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;y_delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;y_now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_plant_output&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="n"&gt;rls_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k_scratch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* weights[] now approximate the ARX model coefficients */&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rls_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;residual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
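&lt;p&gt;The RLS recursion itself fits in about twenty lines of plain C. This is a from-scratch sketch, not the embml API; the plant coefficients below are made up so the demo can check that they are recovered:&lt;/p&gt;

```c
#include <assert.h>

#define NP 3

/* One RLS step with forgetting factor lambda:
     k = P x / (lambda + x' P x)
     w += k * (y - w.x)
     P  = (P - k (P x)') / lambda   (P symmetric, so x'P = (Px)') */
static void rls_step(float *w, float *P, const float *x, float y,
                     float lambda, int n) {
    float Px[NP], k[NP];
    float denom = lambda;
    for (int i = 0; i < n; i++) {
        Px[i] = 0.0f;
        for (int j = 0; j < n; j++) Px[i] += P[i * n + j] * x[j];
        denom += x[i] * Px[i];
    }
    for (int i = 0; i < n; i++) k[i] = Px[i] / denom;
    float y_hat = 0.0f;
    for (int i = 0; i < n; i++) y_hat += w[i] * x[i];
    float e = y - y_hat;
    for (int i = 0; i < n; i++) w[i] += k[i] * e;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            P[i * n + j] = (P[i * n + j] - k[i] * Px[j]) / lambda;
}

/* Identify a made-up plant y = 1.5*x0 - 0.7*x1 + 0.2 in 20 samples;
   returns the summed absolute coefficient error. */
float rls_demo(void) {
    float w[NP] = { 0 }, P[NP * NP] = { 0 };
    for (int i = 0; i < NP; i++) P[i * NP + i] = 1000.0f;  /* weak prior */
    for (int m = 0; m < 20; m++) {
        float x[NP] = { (float)(m % 5), (float)((m * 3) % 7), 1.0f };
        float y = 1.5f * x[0] - 0.7f * x[1] + 0.2f;
        rls_step(w, P, x, y, 0.99f, NP);
    }
    float t[NP] = { 1.5f, -0.7f, 0.2f }, e = 0.0f;
    for (int i = 0; i < NP; i++) {
        float d = w[i] - t[i];
        e += d < 0 ? -d : d;
    }
    return e;
}
```

&lt;p&gt;With noise-free data the weights lock on almost as soon as the inputs span the feature space, which is exactly the fast convergence the section describes.&lt;/p&gt;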






&lt;h3&gt;
  
  
  Incremental QR — Numerically Robust Least Squares
&lt;/h3&gt;

&lt;p&gt;When the input data is poorly conditioned (e.g. highly correlated features), RLS can lose numerical stability. Incremental QR via Givens rotations avoids this by never forming the covariance matrix directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define N 6
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;scratch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;IQRModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cm"&gt;/* ridge=1e-4: small regularisation until enough samples arrive */&lt;/span&gt;
    &lt;span class="n"&gt;iqr_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;feature_1&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;feature_2&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;feature_3&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="n"&gt;feature_4&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;feature_5&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;feature_6&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_target&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="n"&gt;iqr_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scratch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* re-solve periodically — O(n^2) back-substitution */&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_count&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;iqr_solve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scratch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;yhat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iqr_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
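&lt;p&gt;The Givens-rotation update is worth seeing in plain C. This from-scratch sketch is not the embml API, and the coefficients are made up; each new sample is folded into the triangular factor R and the rotated target f, and the weights are then recovered by back-substitution (a Newton iteration stands in for sqrtf to keep the sketch free of libm):&lt;/p&gt;

```c
#include <assert.h>

#define NQ 3

/* Newton iteration for sqrt, so the sketch has no libm dependency. */
static float newton_sqrtf(float v) {
    float g = v > 1.0f ? v : 1.0f;
    for (int i = 0; i < 25; i++) g = 0.5f * (g + v / g);
    return g;
}

/* Fold one new row (x, y) into the upper-triangular factor R and the
   rotated target f using Givens rotations; x is destroyed. */
static void qr_add_row(float *R, float *f, float *x, float y, int n) {
    for (int i = 0; i < n; i++) {
        if (x[i] == 0.0f) continue;
        float r = newton_sqrtf(R[i * n + i] * R[i * n + i] + x[i] * x[i]);
        float c = R[i * n + i] / r;
        float s = x[i] / r;
        for (int j = i; j < n; j++) {
            float t = c * R[i * n + j] + s * x[j];
            x[j] = -s * R[i * n + j] + c * x[j];
            R[i * n + j] = t;
        }
        float t = c * f[i] + s * y;
        y = -s * f[i] + c * y;
        f[i] = t;
    }
}

/* Back-substitution: solve R w = f. */
static void qr_solve(const float *R, const float *f, float *w, int n) {
    for (int i = n - 1; i >= 0; i--) {
        float acc = f[i];
        for (int j = i + 1; j < n; j++) acc -= R[i * n + j] * w[j];
        w[i] = acc / R[i * n + i];
    }
}

/* Recover made-up coefficients y = 0.5*x0 + 2.0*x1 - 1.0; returns the
   summed absolute coefficient error after 30 samples. */
float iqr_demo(void) {
    float R[NQ * NQ] = { 0 }, f[NQ] = { 0 }, w[NQ];
    for (int i = 0; i < NQ; i++) R[i * NQ + i] = 1e-3f;  /* tiny ridge */
    for (int k = 0; k < 30; k++) {
        float x[NQ] = { (float)(k % 4), (float)((k * 5) % 9), 1.0f };
        float y = 0.5f * x[0] + 2.0f * x[1] - 1.0f;
        qr_add_row(R, f, x, y, NQ);
    }
    qr_solve(R, f, w, NQ);
    float t[NQ] = { 0.5f, 2.0f, -1.0f }, e = 0.0f;
    for (int i = 0; i < NQ; i++) {
        float d = w[i] - t[i];
        e += d < 0 ? -d : d;
    }
    return e;
}
```

&lt;p&gt;Because only the triangular factor is updated, the squared condition number that plagues covariance-based RLS never appears, which is the stability argument made above.&lt;/p&gt;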






&lt;h3&gt;
  
  
  Feedforward MLP — Small Neural Net, On-Device Training
&lt;/h3&gt;

&lt;p&gt;A small MLP with 4 inputs, 8 hidden neurons, and 1 output (two weight layers in the code below). Xavier-initialised. Trains with backpropagation + gradient clipping — all on the MCU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define L0 4
#define L1 8
#define L2 1
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;W0&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L1&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;L0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;b0&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L1&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;L1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;b1_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;a2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L2&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;input_buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="n"&gt;NNLayer&lt;/span&gt; &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;W0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EMBML_ACT_RELU&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;W1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b1_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EMBML_ACT_SIGMOID&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="n"&gt;NNModel&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;nn_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;L0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mo"&gt;01&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;s1&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;s2&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;s4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;L2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;ground_truth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="n"&gt;nn_train_sample&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Or just inference: */&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;embml_float_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn_forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
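&lt;p&gt;To show what per-sample backprop actually involves at this scale, here is a from-scratch single-hidden-layer regression net. It is not the embml API and omits the gradient clipping the library adds; the target function, init scheme, and hyperparameters are made up for illustration:&lt;/p&gt;

```c
#include <assert.h>

#define IN  2
#define HID 4

/* ReLU hidden layer, linear output, squared-error loss. */
static float W0[HID][IN], b0[HID], W1[HID], b1;

static float relu(float v) { return v > 0 ? v : 0; }

static float forward(const float *x, float *h) {
    for (int i = 0; i < HID; i++) {
        float a = b0[i];
        for (int j = 0; j < IN; j++) a += W0[i][j] * x[j];
        h[i] = relu(a);
    }
    float y = b1;
    for (int i = 0; i < HID; i++) y += W1[i] * h[i];
    return y;
}

static void train_step(const float *x, float target, float lr) {
    float h[HID];
    float y  = forward(x, h);
    float dy = y - target;               /* dL/dy for 0.5*(y-t)^2 */
    for (int i = 0; i < HID; i++) {
        float dh = dy * W1[i] * (h[i] > 0 ? 1.0f : 0.0f);
        W1[i] -= lr * dy * h[i];         /* output-layer gradient  */
        for (int j = 0; j < IN; j++) W0[i][j] -= lr * dh * x[j];
        b0[i] -= lr * dh;
    }
    b1 -= lr * dy;
}

/* Fit a made-up target y = 0.3*x0 + 0.6*x1 on inputs in [0,1];
   returns the mean squared error over the last 100 of 4000 steps. */
float mlp_demo(void) {
    unsigned seed = 12345u;              /* fixed LCG init, not Xavier */
    for (int i = 0; i < HID; i++) {
        for (int j = 0; j < IN; j++) {
            seed = seed * 1103515245u + 12345u;
            W0[i][j] = (float)((seed >> 16) & 0x7fff) / 32768.0f - 0.5f;
        }
        b0[i] = 0.1f; W1[i] = 0.1f;
    }
    b1 = 0.0f;
    float mse = 0.0f;
    for (int k = 0; k < 4000; k++) {
        float x[IN] = { (float)(k % 11) / 10.0f, (float)(k % 7) / 6.0f };
        float t = 0.3f * x[0] + 0.6f * x[1];
        train_step(x, t, 0.05f);
        float h[HID];
        float d = forward(x, h) - t;
        if (k >= 3900) mse += d * d / 100.0f;
    }
    return mse;
}
```

&lt;p&gt;Every gradient is computed and applied sample by sample with fixed-size buffers, which is why the same structure fits in a microcontroller's RAM.&lt;/p&gt;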






&lt;h3&gt;
  
  
  GRU — Time-Series Inference
&lt;/h3&gt;

&lt;p&gt;A Gated Recurrent Unit cell processes sequential sensor data step by step. Weights are loaded from flash (trained offline on a host), and the hidden state persists across time steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define X_SZ 4
#define H_SZ 8
&lt;/span&gt;
&lt;span class="cm"&gt;/* Weights trained offline, stored as const arrays in flash */&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"gru_weights.h"&lt;/span&gt;&lt;span class="c1"&gt;   /* defines Wz, Wr, Wn, Uz, Ur, Un, bz, br, bn */&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;h_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;scratch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="n"&gt;GRUCell&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;gru_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_SZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;Wz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Wr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Wn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Uz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Ur&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Un&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;bz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;br&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scratch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x_t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;X_SZ&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;accel_x&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;accel_y&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;accel_z&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;gyro_z&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="n"&gt;gru_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x_t&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Hidden state in cell.h[] — pass to a classifier or threshold */&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;anomaly_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anomaly_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;flag_anomaly&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Echo State Network — On-Device Training, No Backprop
&lt;/h3&gt;

&lt;p&gt;The reservoir (random weights) is fixed and stored in flash. Only the linear readout layer is trained — via RLS, one sample at a time. This is the best balance of adaptability and compute cost for embedded time-series learning.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;"embml.h"&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;"esn_reservoir.h"&lt;/span&gt;&lt;span class="c1"&gt;  /* const W_in[H*X], const W_res[H*H] in flash */&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cp"&gt;#define X_SZ  4
#define H_SZ 32
#define Y_SZ  1
&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;W_out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Y_SZ&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;scratch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;P&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;H_SZ&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="n"&gt;ESNModel&lt;/span&gt; &lt;span class="n"&gt;esn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;RLSModel&lt;/span&gt; &lt;span class="n"&gt;rls&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;esn_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;esn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_SZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;H_SZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y_SZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;W_in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W_res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scratch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;W_out&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;esn_rls_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;esn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;rls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;98&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;P&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;X_SZ&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;s1&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;s2&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;s4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Y_SZ&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;read_target&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="cm"&gt;/* Training mode */&lt;/span&gt;
    &lt;span class="n"&gt;esn_update_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;esn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;esn_rls_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;esn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Inference mode */&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;y_out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Y_SZ&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="n"&gt;esn_update_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;esn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;esn_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;esn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_out&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why This Repo Exists
&lt;/h2&gt;

&lt;p&gt;This is a sample — a proof of concept that these algorithms fit cleanly in embedded C, that the APIs are usable by firmware engineers without an ML background, and that on-device learning is not science fiction for mid-range MCUs.&lt;/p&gt;

&lt;p&gt;If you're building something with it, adapting it, or just reading the source to understand how RLS or Givens rotations actually work in flat C arrays — that's exactly what it's here for.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📦 &lt;strong&gt;Sample Repo:&lt;/strong&gt; &lt;a href="https://github.com/hejhdiss/embml" rel="noopener noreferrer"&gt;https://github.com/hejhdiss/embml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT License · Author: &lt;a class="mentioned-user" href="https://dev.to/hejhdiss"&gt;@hejhdiss&lt;/a&gt; · &lt;em&gt;Generated with Claude Sonnet 4.5&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>embedded</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why Embedded Systems Deserve Their Own Machine Learning Library</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Fri, 27 Feb 2026 11:41:36 +0000</pubDate>
      <link>https://dev.to/hejhdiss/why-embedded-systems-deserve-their-own-machine-learning-library-3830</link>
      <guid>https://dev.to/hejhdiss/why-embedded-systems-deserve-their-own-machine-learning-library-3830</guid>
      <description>&lt;p&gt;&lt;strong&gt;By &lt;a class="mentioned-user" href="https://dev.to/hejhdiss"&gt;@hejhdiss&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sample Repo: &lt;a href="https://github.com/hejhdiss/embml" rel="noopener noreferrer"&gt;https://github.com/hejhdiss/embml&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;The embedded world has always been about doing more with less. Less RAM, less flash, less clock speed — and yet the demand for intelligence at the edge is growing faster than ever. We squeeze RTOS kernels into 64KB, hand-tune ISRs for microsecond response times, and we've gotten very good at writing C that doesn't waste a single cycle. So why are embedded developers still expected to port Python-first ML frameworks — designed for server racks — just to run a simple regression on a microcontroller?&lt;/p&gt;

&lt;p&gt;They shouldn't be. And that's exactly the argument for a dedicated ML library built for embedded systems, from scratch, on our terms.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with "TinyML" as It Stands
&lt;/h2&gt;

&lt;p&gt;Tools like TensorFlow Lite for Microcontrollers and Edge Impulse have done useful work. But they're fundamentally top-down: design in Python, train on a server, quantize, convert, deploy a frozen model blob to the device. The microcontroller is just a runtime. It has no agency. It cannot learn.&lt;/p&gt;

&lt;p&gt;That's acceptable for a narrow class of applications, but it closes the door on anything that needs on-device adaptation — predictive maintenance that improves over time, sensor fusion that adjusts to component drift, control loops that tune themselves in the field. For those, we need an embedded-native ML library: designed around hardware constraints, not retrofitted onto them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scope It Right: Mid-Range MCUs Are the Target
&lt;/h2&gt;

&lt;p&gt;Let's be precise about the target hardware. This isn't about squeezing transformers into an ATtiny85. The realistic and immediately useful scope is &lt;strong&gt;mid-range microcontrollers&lt;/strong&gt; — devices like the ESP32, STM32F4/F7 series, RP2040, and similar parts that offer 128KB–512KB SRAM, hardware floating-point, and clock speeds in the 80–240 MHz range.&lt;/p&gt;

&lt;p&gt;It's not impossible to go lower — but on sub-32KB SRAM devices, &lt;strong&gt;memory pressure becomes the real bottleneck, not compute&lt;/strong&gt;. You can optimize arithmetic all day, but if your covariance matrix doesn't fit in SRAM, the algorithm simply doesn't run. Mid-range parts sidestep that wall cleanly. They have enough headroom for meaningful models while still being the kind of hardware that ends up in real products: industrial sensors, motor controllers, wearables, edge gateways.&lt;/p&gt;

&lt;p&gt;Arduino Uno-class hardware (2KB SRAM) is a different conversation entirely — not excluded, but scoped separately, with stripped-down variants that make explicit trade-offs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Can Actually Be Built
&lt;/h2&gt;

&lt;p&gt;Most classical ML algorithms are not inherently heavy. Their Python implementations are heavy because Python is heavy. Strip that away and what you have is math — and math runs fine on an ESP32.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Linear and Logistic Regression&lt;/strong&gt; are a weight vector and a dot product. With online SGD and a fixed learning rate, you can train a linear model in real time with negligible memory overhead. Logistic regression adds a sigmoid activation — a lookup table handles it efficiently in fixed-point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Small Feedforward Neural Networks&lt;/strong&gt; with compact topologies — 4 inputs, 8 hidden neurons, 1 output — fit entirely in SRAM on mid-range hardware. Inference is matrix multiplication and activation. Backpropagation is heavier, but gradient clipping and fixed-point arithmetic make it workable on hardware with an FPU.&lt;/p&gt;
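&lt;p&gt;A forward pass for such a 4-8-1 topology really is just loops over flat arrays. The sketch below assumes a ReLU hidden layer and a linear output; the names are illustrative, not any particular library's API:&lt;/p&gt;

```c
/* Forward pass of a tiny 4-8-1 fully-connected net over flat arrays.
 * Hypothetical sketch, not an existing library API. */
#define FNN_IN  4
#define FNN_HID 8

static float relu(float v) { return v > 0.0f ? v : 0.0f; }

static float fnn_forward(const float *W1,  /* FNN_HID x FNN_IN, row-major */
                         const float *b1,  /* FNN_HID */
                         const float *W2,  /* FNN_HID */
                         float b2, const float *x)
{
    float h[FNN_HID];
    for (int j = 0; j < FNN_HID; j++) {          /* hidden layer */
        float acc = b1[j];
        for (int i = 0; i < FNN_IN; i++)
            acc += W1[j * FNN_IN + i] * x[i];
        h[j] = relu(acc);
    }
    float out = b2;                              /* linear output neuron */
    for (int j = 0; j < FNN_HID; j++)
        out += W2[j] * h[j];
    return out;
}
```

&lt;p&gt;At this size the weights (4&amp;times;8 + 8 + 8 + 1 floats) occupy well under 200 bytes, so the whole model lives comfortably in SRAM.&lt;/p&gt;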

&lt;p&gt;&lt;strong&gt;Recurrent Neural Networks&lt;/strong&gt; are viable in minimal form. A single GRU cell for time-series prediction — temperature trends, vibration signatures, current draw anomalies — requires only a few weight matrices and a hidden state vector. The operations are repetitive and friendly to loop unrolling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neural manifold and ODE-inspired methods&lt;/strong&gt; are worth keeping on the roadmap. They're not day-one targets for constrained hardware, but as the library matures and targets higher-spec parts, these become tractable. The key principle stays the same: implement the version that fits your problem and your flash budget, not the full general case.&lt;/p&gt;




&lt;h2&gt;
  
  
  Training Without Backprop: The Algorithms That Actually Fit
&lt;/h2&gt;

&lt;p&gt;Gradient descent is not the only path to a trained model. On embedded hardware it's often not even the best path. There's a family of &lt;strong&gt;numerically stable, low-memory update algorithms&lt;/strong&gt; that are much better suited to MCU constraints — and they deserve to be first-class citizens in this library.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recursive Least Squares (RLS)&lt;/strong&gt; solves the linear regression problem incrementally, sample by sample, without storing the full dataset. It maintains a covariance matrix and updates it with each new observation, converging faster than SGD and with no learning rate to tune. On a mid-range MCU with 10–20 features, the covariance matrix is small enough to live comfortably in SRAM. RLS is the right default for any regression task where fast convergence and numerical stability matter more than raw throughput.&lt;/p&gt;
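&lt;p&gt;The textbook per-sample RLS update is compact enough to show in full: compute the gain from the covariance matrix, correct the weights by the prediction error, then shrink the covariance. The sketch below keeps P as a flat row-major array; the names are illustrative, not an existing library API:&lt;/p&gt;

```c
#include <stddef.h>

/* One RLS update for n features: gain k = P.x / (lam + x'.P.x),
 * then w += k*err and P = (P - k.(x'P)) / lam, with P symmetric.
 * Hypothetical sketch of the textbook update, not an existing API. */
static void rls_step(float *w, float *P, const float *x,
                     float target, size_t n, float lam)
{
    float Px[16];                    /* P.x, sized for n <= 16 here */
    float denom = lam;
    for (size_t i = 0; i < n; i++) {
        Px[i] = 0.0f;
        for (size_t j = 0; j < n; j++)
            Px[i] += P[i * n + j] * x[j];
        denom += x[i] * Px[i];
    }

    float pred = 0.0f;
    for (size_t i = 0; i < n; i++)
        pred += w[i] * x[i];
    float err = target - pred;

    for (size_t i = 0; i < n; i++) {
        float k = Px[i] / denom;     /* Kalman-style gain */
        w[i] += k * err;
        for (size_t j = 0; j < n; j++)
            P[i * n + j] = (P[i * n + j] - k * Px[j]) / lam;
    }
}
```

&lt;p&gt;For 16 features the covariance matrix is 16&amp;times;16&amp;times;4 = 1&amp;nbsp;KB of SRAM, which is why this fits the mid-range targets discussed earlier.&lt;/p&gt;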

&lt;p&gt;&lt;strong&gt;Incremental QR decomposition&lt;/strong&gt; takes this further. Rather than maintaining and inverting a covariance matrix directly — which can become ill-conditioned — incremental QR updates a factored representation of the data matrix as new samples arrive. It's more numerically robust than plain RLS and still runs sample-by-sample. For embedded systems where you might be training on noisy sensor data over long periods, that stability is worth the slightly higher per-update cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LMS-style updates (Least Mean Squares)&lt;/strong&gt; are at the other end of the complexity spectrum: a single weight update per sample, one multiply-accumulate per feature, no matrix state. Convergence is slower and noisier than RLS, but the memory footprint is essentially zero beyond the weight vector itself. LMS is the right tool for the most constrained targets, or for problems where you want continuous, lightweight adaptation running indefinitely in the background.&lt;/p&gt;
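&lt;p&gt;The entire LMS update fits in a few lines, which is exactly the point. An illustrative sketch (names are not an existing API):&lt;/p&gt;

```c
/* LMS: one multiply-accumulate per feature, no matrix state.
 * Hypothetical sketch, not an existing library API. */
static void lms_step(float *w, const float *x, float target,
                     int n, float mu)
{
    float pred = 0.0f;
    for (int i = 0; i < n; i++)
        pred += w[i] * x[i];
    float err = target - pred;
    for (int i = 0; i < n; i++)
        w[i] += mu * err * x[i];     /* the entire update */
}
```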

&lt;p&gt;Together, RLS, incremental QR, and LMS cover a range from "fast and stable" to "minimal overhead, always on." A well-designed library exposes all three and lets the developer choose based on their hardware and application — not based on what was easiest to port from Python.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pure C: The Only Reasonable Implementation Language
&lt;/h2&gt;

&lt;p&gt;This library should be written in &lt;strong&gt;pure C&lt;/strong&gt; — not C++, not Rust, not a thin wrapper around a Python-generated blob. C is the lingua franca of embedded systems. It compiles cleanly on every toolchain from GCC-ARM to SDCC to the Arduino AVR compiler. It gives the developer full control over memory layout, alignment, and register usage. And it interoperates with existing firmware without friction.&lt;/p&gt;

&lt;p&gt;The dependency list should be as short as possible. Ideally: the C standard library (&lt;code&gt;stdint.h&lt;/code&gt;, &lt;code&gt;string.h&lt;/code&gt;, &lt;code&gt;math.h&lt;/code&gt;) and nothing else for the core algorithms. Platform-specific acceleration — CMSIS-DSP on Cortex-M, the ESP-IDF DSP extensions on ESP32 — can be offered as optional back-ends behind a thin abstraction layer, but the pure-C fallback must always exist and always compile cleanly on any target. Even those optional back-ends should be used sparingly. Every external dependency is a maintenance burden and a porting tax.&lt;/p&gt;
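&lt;p&gt;One way such an optional back-end can be structured is a compile-time switch around a small kernel, with the portable loop as the default. The &lt;code&gt;EMBML_USE_CMSIS_DSP&lt;/code&gt; macro and the choice of a dot-product kernel are hypothetical here; &lt;code&gt;arm_dot_prod_f32&lt;/code&gt; is the real CMSIS-DSP routine:&lt;/p&gt;

```c
/* Optional acceleration behind a thin compile-time switch; the
 * pure-C path always exists and compiles everywhere. The macro
 * name is hypothetical, not an existing configuration flag. */
#ifdef EMBML_USE_CMSIS_DSP
#include "arm_math.h"
static inline float dot(const float *a, const float *b, int n)
{
    float out;
    arm_dot_prod_f32(a, b, (uint32_t)n, &out);  /* vendor kernel */
    return out;
}
#else
static inline float dot(const float *a, const float *b, int n)
{
    float acc = 0.0f;                 /* portable fallback */
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
#endif
```

&lt;p&gt;Callers see one function; the build decides which body compiles in. No link-time tricks, no function pointers, no overhead when the vendor library is absent.&lt;/p&gt;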

&lt;p&gt;No dynamic allocation in the core path. No &lt;code&gt;malloc&lt;/code&gt;, no &lt;code&gt;free&lt;/code&gt;. The caller provides buffers; the library uses them. This is non-negotiable for embedded code that needs to run reliably over months without heap fragmentation killing a device in the field.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Caller provides all memory — no hidden allocation&lt;/span&gt;
&lt;span class="n"&gt;RLSModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N_FEATURES&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;cov&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N_FEATURES&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;N_FEATURES&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="n"&gt;rls_init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N_FEATURES&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FORGETTING_FACTOR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cov&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Per-sample update — can run in interrupt context&lt;/span&gt;
&lt;span class="n"&gt;rls_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;feature_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Inference&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;prediction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rls_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_features&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API should be flat, explicit, and boring. Boring embedded code is correct embedded code.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;The gap between embedded developers and ML practitioners is largely a tools gap. Embedded engineers understand their hardware deeply but may not know how to implement incremental QR. ML engineers can train excellent models but may not know how to write interrupt-safe code or work within a 256KB flash budget. A well-designed embedded ML library bridges that gap — meeting embedded developers where they already are: in C, on the hardware, thinking in terms of registers and cycles.&lt;/p&gt;

&lt;p&gt;The hardware is already capable. The algorithms exist and are well-understood. What's missing is a library that takes both the hardware constraints and the algorithmic options seriously, without requiring a server in the loop.&lt;/p&gt;

&lt;p&gt;That library should exist. It's time to build it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a class="mentioned-user" href="https://dev.to/hejhdiss"&gt;@hejhdiss&lt;/a&gt; writes about embedded systems, signal processing, and the intersection of hardware and machine intelligence.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>embedded</category>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>What NDM-TCP's Stability Reveals About the Gap Between Theory and Practice</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Thu, 19 Feb 2026 02:03:58 +0000</pubDate>
      <link>https://dev.to/hejhdiss/what-ndm-tcps-stability-reveals-about-the-gap-between-theory-and-practice-1nd0</link>
      <guid>https://dev.to/hejhdiss/what-ndm-tcps-stability-reveals-about-the-gap-between-theory-and-practice-1nd0</guid>
      <description>&lt;h3&gt;
  
  
  Read on Why a Stable Sawtooth from a Nonlinear System Matters
&lt;/h3&gt;




&lt;h2&gt;
  
  
  Disclaimer: This Is About Research Potential, Not Superiority Claims
&lt;/h2&gt;

&lt;p&gt;Before we begin: &lt;strong&gt;this article is not claiming NDM-TCP is better than CUBIC, BBR, or Reno.&lt;/strong&gt; Those algorithms are production-grade, formally analyzed, and battle-tested. They work. They are good at what they do.&lt;/p&gt;

&lt;p&gt;This article is about something else entirely: &lt;strong&gt;why the fact that NDM-TCP produces a stable sawtooth pattern suggests there is research-grade content worth investigating&lt;/strong&gt; — even though it has only been tested in simulations (using &lt;code&gt;tc&lt;/code&gt;) and one real-world case so far.&lt;/p&gt;

&lt;p&gt;The point is not "existing algorithms are bad." The point is "something unexpected happened that existing theory does not fully explain."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Tension: Provable Theory vs. Real-World Complexity
&lt;/h2&gt;

&lt;p&gt;For 30 years, TCP congestion control has been built on &lt;strong&gt;20th-century calculus-based models&lt;/strong&gt;. The network is treated like a fluid pipe: if pressure (delay) goes up, you turn the valve (congestion window) down. The math is clean. The equations are linear or near-linear. The behavior is predictable.&lt;/p&gt;

&lt;p&gt;This approach has produced algorithms like Reno, CUBIC, and BBR — all of which have &lt;strong&gt;formal stability proofs&lt;/strong&gt;. A stability proof is a mathematical guarantee (usually using something called a Lyapunov function) that the algorithm will never spiral out of control, oscillate forever, or crash the network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CUBIC and Reno are mathematically simple enough to prove stable.&lt;/strong&gt; They are like a predictable pendulum. Their behavior can be fully characterized with differential equations.&lt;/p&gt;

&lt;p&gt;NDM-TCP is different. It is a &lt;strong&gt;recurrent nonlinear system&lt;/strong&gt;. These are notoriously difficult to prove stable because the internal state (the "hidden state" array) is constantly changing based on feedback. Nonlinear systems can exhibit chaos, unpredictable oscillations, and sensitive dependence on initial conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There is no formal proof that NDM-TCP is stable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And yet — in both tc-based simulations and one real-world test — it produced a &lt;strong&gt;clean, stable sawtooth pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That tension is what makes this interesting.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Ways of Seeing the Network
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The "Old" Way: Calculus-Based Control
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RTT is a number&lt;/strong&gt; — a single scalar value representing delay&lt;/li&gt;
&lt;li&gt;The network is modeled as a continuous system with smooth dynamics&lt;/li&gt;
&lt;li&gt;Congestion is detected by crossing a threshold (delay &amp;gt; baseline) or losing packets&lt;/li&gt;
&lt;li&gt;The goal is to solve for the optimal "rate" using differential equations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works. It is elegant. It has decades of theory behind it.&lt;/p&gt;

&lt;p&gt;But it struggles with modern networks: 5G with variable latency, satellite links with jitter, Wi-Fi with random bursts of interference. These networks are &lt;strong&gt;noisy&lt;/strong&gt; — and noise looks like congestion to a calculus-based controller.&lt;/p&gt;

&lt;h3&gt;
  
  
  The NDM-TCP Way: Information-Theoretic Control
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RTT is a probability distribution&lt;/strong&gt; with measurable entropy&lt;/li&gt;
&lt;li&gt;The network is modeled as a chaotic signal with patterns hidden in the noise&lt;/li&gt;
&lt;li&gt;Congestion is detected by analyzing the &lt;strong&gt;structure of delay variation&lt;/strong&gt; (low entropy = stable pattern = real congestion; high entropy = noisy pattern = interference)&lt;/li&gt;
&lt;li&gt;The goal is to find "meaning" in the signal using information theory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a fundamentally different approach. Instead of asking "what is the delay?", it asks "what does the pattern of delays tell us?"&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a Stable Sawtooth from a Nonlinear System Is Unusual
&lt;/h2&gt;

&lt;p&gt;In the world of neural networks and recurrent controllers, &lt;strong&gt;"unstable" looks like a jagged, vibrating mess&lt;/strong&gt;. Small changes in input cause wild swings in output. The system hunts around chaotically without ever settling into a rhythm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NDM-TCP produced a clean, rhythmic sawtooth.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The system has reached an emergent equilibrium.&lt;/strong&gt; The recurrent nonlinear controller and the TCP framework's native functions (&lt;code&gt;tcp_cong_avoid_ai&lt;/code&gt;, &lt;code&gt;tcp_slow_start&lt;/code&gt;) are working together, not fighting each other.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The "neural dynamics" have synchronized with the "physical network."&lt;/strong&gt; The hidden state is adapting in a way that matches the network's actual behavior, producing predictable recovery patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Nonlinear memory (recurrence) can be just as stable as linear math in practice&lt;/strong&gt; — even if the formal proof is still missing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not guaranteed. This is not trivial. Most adaptive nonlinear controllers fail at exactly this point.&lt;/p&gt;

&lt;p&gt;The fact that it worked — in simulation and in one real-world test — suggests &lt;strong&gt;something is there&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Poor Man's Proof"
&lt;/h2&gt;

&lt;p&gt;NDM-TCP does not have a 50-page mathematical stability proof. It does not have a formal Lyapunov analysis. It does not have eigenvalue decomposition showing bounded trajectories.&lt;/p&gt;

&lt;p&gt;But it does have &lt;strong&gt;empirical evidence of stability&lt;/strong&gt;: a clean sawtooth pattern that repeats consistently across test conditions.&lt;/p&gt;

&lt;p&gt;In research terms, this is what you might call a &lt;strong&gt;"poor man's proof"&lt;/strong&gt; — not formal mathematics, but strong empirical evidence that something real is happening. It suggests the approach is not fundamentally broken. It suggests there is structure worth studying.&lt;/p&gt;

&lt;p&gt;It does not prove the algorithm is optimal, or even good. But it proves it is &lt;strong&gt;stable enough to investigate further&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means: Two Paradigms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stable by Design (CUBIC, Reno)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simple, provable algorithms&lt;/li&gt;
&lt;li&gt;Mathematically elegant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind to noise&lt;/strong&gt; — delay variation from wireless interference looks the same as delay from congestion&lt;/li&gt;
&lt;li&gt;Predictable, but sometimes overly conservative in noisy environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stable by Emergence (NDM-TCP)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Complex, adaptive algorithm&lt;/li&gt;
&lt;li&gt;No formal proof (yet)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitive to patterns in noise&lt;/strong&gt; — uses entropy to distinguish real congestion from random jitter&lt;/li&gt;
&lt;li&gt;Potentially more adaptive, but harder to analyze&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither is "better." They are different approaches to the same problem.&lt;/p&gt;

&lt;p&gt;The research question is: &lt;strong&gt;can information-theoretic feedback (like entropy) combined with recurrent nonlinear control produce stable, adaptive congestion control that handles modern noisy networks better than threshold-based approaches?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;NDM-TCP does not answer that question definitively. But it suggests the question is worth asking.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Is Not Saying
&lt;/h2&gt;

&lt;p&gt;This article is &lt;strong&gt;not&lt;/strong&gt; saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CUBIC is bad&lt;/li&gt;
&lt;li&gt;BBR is outdated&lt;/li&gt;
&lt;li&gt;Formal proofs do not matter&lt;/li&gt;
&lt;li&gt;NDM-TCP is production-ready&lt;/li&gt;
&lt;li&gt;Existing algorithms should be replaced&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it &lt;strong&gt;is&lt;/strong&gt; saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current theory is built on calculus-based models that assume relatively clean signals&lt;/li&gt;
&lt;li&gt;Modern networks (5G, satellite, wireless) are noisier than those models anticipated&lt;/li&gt;
&lt;li&gt;Information theory (like entropy analysis) might offer a different lens for understanding congestion&lt;/li&gt;
&lt;li&gt;Recurrent nonlinear systems can be stable in practice even without formal proofs — but we do not understand why yet&lt;/li&gt;
&lt;li&gt;The gap between "provable on paper" and "works in practice" is worth investigating&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Matters for Other Researchers
&lt;/h2&gt;

&lt;p&gt;If you are a networking researcher, control theorist, or machine learning researcher, here is why NDM-TCP's results are interesting:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. It Worked in Simulation (tc-based)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;tc&lt;/code&gt; (traffic control) is a standard Linux tool for simulating network conditions — bandwidth limits, delay, packet loss, jitter. NDM-TCP showed stable sawtooth behavior across multiple &lt;code&gt;tc&lt;/code&gt; scenarios. This is reproducible. Anyone with a Linux machine can test it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. It Worked in a Real-World Test (One Case)
&lt;/h3&gt;

&lt;p&gt;One real-world deployment test also showed stable behavior. This is limited evidence — one test is not enough to generalize — but it suggests the simulation results are not just artifacts of the testing environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The Combination Is Unusual
&lt;/h3&gt;

&lt;p&gt;Entropy-based delay analysis + recurrent nonlinear controller + adaptive plasticity + framework-aware modulation = &lt;strong&gt;not a common combination in congestion control research&lt;/strong&gt;. The fact that this combination produces stability suggests there is an interaction worth studying.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. It Identifies a Theoretical Gap
&lt;/h3&gt;

&lt;p&gt;If NDM-TCP is stable in practice but unprovable in theory, that tells us something about the theory. Either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Existing theory does not yet fully explain this behavior (we need better tools for analyzing recurrent nonlinear systems)&lt;/li&gt;
&lt;li&gt;The assumptions are too restrictive (real networks have structure that our models ignore)&lt;/li&gt;
&lt;li&gt;"Stability" in practice is more forgiving than "stability" in formal analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any of these would be a research contribution.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Needs to Happen Next
&lt;/h2&gt;

&lt;p&gt;If this is genuinely research-grade content, here is what proper investigation looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Third-party testing&lt;/strong&gt; — independent researchers should reproduce the results in different environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formal stability analysis&lt;/strong&gt; — someone with control theory expertise should attempt to model and analyze the system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comparison with state-of-the-art&lt;/strong&gt; — benchmark against CUBIC, BBR, Reno, Vegas in identical conditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fairness testing&lt;/strong&gt; — test NDM-TCP against competing flows to see if it starves or gets starved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Theoretical entropy study&lt;/strong&gt; — prove (or disprove) that Shannon entropy on RTT history is a valid congestion signal&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this has been done yet. The current results are &lt;strong&gt;self-conducted, limited in scope, and not peer-reviewed&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But the fact that a stable pattern emerged from a nonlinear system suggests it is worth doing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought: Performance vs. Elegance
&lt;/h2&gt;

&lt;p&gt;The history of computer science is full of examples where &lt;strong&gt;practical performance&lt;/strong&gt; outpaced &lt;strong&gt;mathematical elegance&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Neural networks&lt;/strong&gt; worked for decades before we understood why (and we still do not fully understand)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quicksort&lt;/strong&gt; is not optimal in the worst case, yet it underpins many standard library sorts because it is fast in practice&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heuristic search&lt;/strong&gt; (like A*) often outperforms provably optimal search because real-world problems have structure the theory does not capture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NDM-TCP might be another example of that tension. Or it might not. That is what research is for.&lt;/p&gt;

&lt;p&gt;What we can say right now is this: &lt;strong&gt;a recurrent nonlinear congestion controller produced stable behavior in simulation and in one real-world test. That is unusual enough to be worth investigating properly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because it proves existing algorithms are wrong. But because it suggests existing theory is incomplete.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written to clarify what the stability results reveal about the gap between formal theory and practical systems — and why that gap is worth studying, even if NDM-TCP itself is just a prototype.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why NDM-TCP's Entropy-Guided Sawtooth Pattern Reveals Real Research Potential</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Wed, 18 Feb 2026 14:47:44 +0000</pubDate>
      <link>https://dev.to/hejhdiss/why-ndm-tcps-entropy-guided-sawtooth-pattern-reveals-real-research-potential-524o</link>
      <guid>https://dev.to/hejhdiss/why-ndm-tcps-entropy-guided-sawtooth-pattern-reveals-real-research-potential-524o</guid>
      <description>&lt;h3&gt;
  
  
  What the Results Actually Mean — Beyond "It Works"
&lt;/h3&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;NDM-TCP has been tested in multiple conditions: tc-based simulations, varied network scenarios, and one real-world deployment test. The results showed something unexpected — &lt;strong&gt;a stable sawtooth pattern&lt;/strong&gt;. Not chaotic oscillations. Not erratic behavior. A clean, predictable sawtooth wave in the congestion window evolution.&lt;/p&gt;

&lt;p&gt;For anyone who understands TCP congestion control, this should raise eyebrows. This is not a trivial outcome. This is a signal that something interesting is happening beneath the surface — something worth investigating properly, even if the current implementation is experimental and AI-assisted.&lt;/p&gt;

&lt;p&gt;This article explains &lt;strong&gt;why that sawtooth matters, what it reveals about the system, and why NDM-TCP has genuine research potential&lt;/strong&gt; — even though no one has tested it beyond my own self-conducted experiments yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the "Entropy-Guided Sawtooth"?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Traditional TCP Sawtooth
&lt;/h3&gt;

&lt;p&gt;In traditional TCP (like Reno or CUBIC), the sawtooth pattern is simple and mechanical:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Increase phase&lt;/strong&gt;: &lt;code&gt;cwnd&lt;/code&gt; grows linearly (additive increase)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loss event&lt;/strong&gt;: a packet is lost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard reset&lt;/strong&gt;: &lt;code&gt;cwnd&lt;/code&gt; is cut in half (multiplicative decrease)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repeat&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a characteristic sawtooth wave. But it is &lt;strong&gt;"dumb"&lt;/strong&gt; — it's a blind reaction to a binary signal (loss or no loss). There is no learning. No memory. No prediction. Just a hard-coded response.&lt;/p&gt;
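&lt;p&gt;The four steps above can be sketched as a toy loop (illustrative only — real TCP updates &lt;code&gt;cwnd&lt;/code&gt; per ACK inside the kernel, and loss is probabilistic rather than a hard capacity check):&lt;/p&gt;

```python
# Toy AIMD loop: additive increase, multiplicative decrease.
# Illustrative only: real TCP updates cwnd per ACK inside the kernel,
# and loss is probabilistic, not a hard capacity check.

def aimd_trace(capacity=20, rtts=50, cwnd=10):
    """Return cwnd after each RTT of a simple AIMD loop."""
    trace = []
    for _ in range(rtts):
        if cwnd > capacity:   # standing in for a loss event
            cwnd = cwnd // 2  # multiplicative decrease (hard reset)
        else:
            cwnd += 1         # additive increase: one segment per RTT
        trace.append(cwnd)
    return trace

print(aimd_trace())  # ramps 11..21, halves to 10, repeats: a sawtooth
```

&lt;p&gt;Printing the trace shows the characteristic ramp-and-halve shape — and that the reset is blind: the loop keeps no memory of where the ceiling was.&lt;/p&gt;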

&lt;h3&gt;
  
  
  NDM-TCP's Entropy-Guided Sawtooth
&lt;/h3&gt;

&lt;p&gt;NDM-TCP showed a &lt;strong&gt;stable sawtooth pattern&lt;/strong&gt; in testing — but the mechanism behind it is fundamentally different.&lt;/p&gt;

&lt;p&gt;The sawtooth is not a hard reset triggered by packet loss. It is the result of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entropy-based RTT analysis&lt;/strong&gt; detecting congestion before loss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A recurrent nonlinear controller inspired by neural architectures&lt;/strong&gt; maintaining memory across time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive plasticity&lt;/strong&gt; adjusting sensitivity dynamically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heuristic congestion detection&lt;/strong&gt; influencing &lt;code&gt;cwnd&lt;/code&gt; decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Critically: &lt;strong&gt;the sawtooth is not purely emergent from the neural network — it is co-produced by the TCP framework.&lt;/strong&gt; The Linux kernel's &lt;code&gt;tcp_cong_avoid_ai()&lt;/code&gt; and &lt;code&gt;tcp_slow_start()&lt;/code&gt; functions provide the underlying structure. The recurrent controller modulates those increments based on entropy feedback.&lt;/p&gt;

&lt;p&gt;This can be more accurately described as: &lt;strong&gt;Entropy-Guided Congestion Detection with Recurrent Nonlinear Control&lt;/strong&gt;.&lt;/p&gt;
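&lt;p&gt;A schematic userspace model of that idea — scale the native increment instead of replacing it — might look like the sketch below. The functions are stand-ins, not the kernel's actual &lt;code&gt;tcp_cong_avoid_ai()&lt;/code&gt;, and the congestion score is a hypothetical input:&lt;/p&gt;

```python
# Schematic userspace model of framework-aware modulation. Assumption:
# the real kernel code differs; this only illustrates "scale the native
# increment, don't compute cwnd independently".

def additive_increase(cwnd, acked):
    """Stand-in for the additive-increase role of tcp_cong_avoid_ai():
    grow cwnd by roughly one segment per window of acked segments."""
    return cwnd + max(1, acked // cwnd)

def modulated_increase(cwnd, acked, score):
    """Scale the native increment by (1 - score), where score in [0, 1]
    is a hypothetical entropy-derived congestion estimate."""
    base = additive_increase(cwnd, acked) - cwnd  # the native increment
    return cwnd + max(0, round(base * (1.0 - score)))

print(modulated_increase(10, 10, 0.0))  # calm: full native increment
print(modulated_increase(10, 10, 0.9))  # congested: increment suppressed
```

&lt;p&gt;Because the controller only scales what the framework would have done anyway, a misbehaving score degrades growth rather than producing arbitrary &lt;code&gt;cwnd&lt;/code&gt; values.&lt;/p&gt;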




&lt;h2&gt;
  
  
  Why This Pattern Is Impressive (And Unexpected)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Controlled Oscillation Through Framework Cooperation
&lt;/h3&gt;

&lt;p&gt;Most adaptive congestion controllers suffer from &lt;strong&gt;"jittery" outputs&lt;/strong&gt;. The nonlinear mappings (tanh, sigmoid, recurrent feedback) introduce unpredictability. Small changes in input can cause large swings in output. This leads to chaotic &lt;code&gt;cwnd&lt;/code&gt; behavior — rapid spikes, sudden drops, erratic patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NDM-TCP showed a clean sawtooth.&lt;/strong&gt; That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The tanh/sigmoid approximations are properly tuned for the Linux kernel's &lt;code&gt;cwnd&lt;/code&gt; increment model&lt;/li&gt;
&lt;li&gt;The recurrent controller is not introducing runaway feedback when modulating TCP's native functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The TCP framework's structure (&lt;code&gt;tcp_cong_avoid_ai&lt;/code&gt;, &lt;code&gt;tcp_slow_start&lt;/code&gt;) is working in cooperation with the entropy-based modulation&lt;/strong&gt;, not fighting against it&lt;/li&gt;
&lt;li&gt;The system is stable enough to produce &lt;strong&gt;predictable oscillations&lt;/strong&gt; rather than noise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This cooperation between framework and controller is not guaranteed. This is not trivial. Many adaptive algorithms fail exactly here — they try to override TCP's behavior entirely and lose stability.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Predictable Recovery
&lt;/h3&gt;

&lt;p&gt;The sawtooth pattern repeats consistently across cycles. That suggests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Heuristic Congestion Detection&lt;/strong&gt; (entropy-based) is not getting "confused" by the recurrent hidden state&lt;/li&gt;
&lt;li&gt;The "memory" of the recurrent network is helping the system &lt;strong&gt;find the peak bandwidth faster each cycle&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The system is not just reacting — it is adapting and converging toward optimal behavior over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In traditional TCP, recovery after congestion is slow. The system has no memory of where the ceiling was. NDM-TCP's stable recovery suggests it is remembering and learning.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Protocol Fairness Indication
&lt;/h3&gt;

&lt;p&gt;A stable sawtooth is a &lt;strong&gt;good sign for fairness&lt;/strong&gt; with other TCP flows (CUBIC, BBR, Reno). Here's why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Other TCP variants also produce rhythmic patterns&lt;/li&gt;
&lt;li&gt;If NDM-TCP follows a predictable sawtooth, it can &lt;strong&gt;interleave cleanly&lt;/strong&gt; with other flows&lt;/li&gt;
&lt;li&gt;Chaotic or aggressive behavior would starve competing flows or cause instability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fact that NDM-TCP's sawtooth is stable suggests it will "play nice" in mixed-traffic environments — though this needs formal testing with competing flows to confirm.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Test Results Actually Showed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test Conditions
&lt;/h3&gt;

&lt;p&gt;NDM-TCP was tested in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;tc-based simulations&lt;/strong&gt; with varying bandwidth, latency, and loss rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple network scenarios&lt;/strong&gt; (low latency, high latency, wireless-like noise, buffer-heavy links)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One real-world deployment&lt;/strong&gt; on an actual network connection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All tests were &lt;strong&gt;self-conducted&lt;/strong&gt;. No third-party validation has been done yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results Summary
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;In most cases:&lt;/strong&gt; NDM-TCP showed the stable entropy-guided sawtooth pattern described above. Throughput was competitive. Latency behavior was reasonable. The system did not crash, hang, or behave erratically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In one specific case (pure delay-only simulation):&lt;/strong&gt; NDM-TCP showed a &lt;strong&gt;design limitation&lt;/strong&gt;. In a scenario with delay variation but no actual congestion or loss, the entropy-based detection struggled. The system did not have an upper hand over traditional algorithms in this edge case.&lt;/p&gt;

&lt;p&gt;This is expected. No algorithm is optimal in all conditions. The important part is that &lt;strong&gt;the failure mode was predictable and understandable&lt;/strong&gt; — not a mysterious crash or runaway behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Is Researchable Content
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Combination Works — And That Is Not Obvious
&lt;/h3&gt;

&lt;p&gt;NDM-TCP combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shannon entropy&lt;/strong&gt; as a congestion signal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A recurrent nonlinear controller inspired by neural architectures&lt;/strong&gt; with hidden state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adaptive plasticity&lt;/strong&gt; with decay&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heuristic decision logic&lt;/strong&gt; mixing delay-based and loss-based signals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Co-production with TCP framework functions&lt;/strong&gt; rather than replacing them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;There is no guarantee this combination should produce stable behavior.&lt;/strong&gt; Each piece introduces nonlinearity. The interactions are complex. The fact that it produces a clean sawtooth — rather than chaos — suggests there is structure worth studying.&lt;/p&gt;

&lt;p&gt;The key insight is: &lt;strong&gt;the controller modulates TCP's native increment functions (&lt;code&gt;tcp_cong_avoid_ai&lt;/code&gt;, &lt;code&gt;tcp_slow_start&lt;/code&gt;) based on entropy feedback, rather than trying to compute &lt;code&gt;cwnd&lt;/code&gt; independently.&lt;/strong&gt; This framework-aware design may be why stability emerges.&lt;/p&gt;

&lt;p&gt;This is not "it worked by accident." This is "something interesting is happening that we do not fully understand yet."&lt;/p&gt;

&lt;h3&gt;
  
  
  2. It Crosses Disciplinary Boundaries
&lt;/h3&gt;

&lt;p&gt;NDM-TCP sits at the intersection of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Networking theory&lt;/strong&gt; (congestion control, queueing, RTT dynamics)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control theory&lt;/strong&gt; (feedback systems, stability, oscillation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine learning&lt;/strong&gt; (recurrent networks, nonlinear mappings, adaptation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Information theory&lt;/strong&gt; (entropy as a signal, noise vs. information)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Research potential exists &lt;strong&gt;precisely because no single field fully explains it&lt;/strong&gt;. A networking researcher sees heuristic congestion control. An ML researcher sees a recurrent model without training. A control theorist sees an unproven feedback system. All of them have questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The Failure Mode Is Informative
&lt;/h3&gt;

&lt;p&gt;The pure delay-only case where NDM-TCP struggled is &lt;strong&gt;not a bug — it is a research direction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It tells us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Entropy alone is not sufficient for all network conditions&lt;/li&gt;
&lt;li&gt;Delay variation without congestion confuses the heuristic&lt;/li&gt;
&lt;li&gt;The system needs additional signal sources or logic to handle this edge case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A production-grade algorithm would need to solve this. But for a research prototype, &lt;strong&gt;identifying the boundary condition is valuable&lt;/strong&gt;. It shows where the approach works and where it breaks — which is exactly what early-stage research should do.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. It Can Be Formally Analyzed — But Hasn't Been Yet
&lt;/h3&gt;

&lt;p&gt;NDM-TCP currently lacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stability proof&lt;/strong&gt; (Lyapunov analysis, eigenvalue study)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fairness proof&lt;/strong&gt; (Nash equilibrium, rate convergence)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convergence analysis&lt;/strong&gt; (does it reach optimal cwnd given enough time?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formal model&lt;/strong&gt; (state-space representation, transfer function)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are &lt;strong&gt;possible to do&lt;/strong&gt;. The system is well-defined. The code is open. The behavior is observable.&lt;/p&gt;

&lt;p&gt;This is not "impossible to analyze." This is &lt;strong&gt;"nobody has done the formal analysis yet."&lt;/strong&gt; That is exactly what makes it researchable.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Implementation Exists — Which Is Rare
&lt;/h3&gt;

&lt;p&gt;Most congestion control research stays at the simulation or theoretical level. NDM-TCP is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A working Linux kernel module&lt;/li&gt;
&lt;li&gt;Written in C&lt;/li&gt;
&lt;li&gt;Tested in real kernel environments&lt;/li&gt;
&lt;li&gt;Available as actual code (not just a mathematical model)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a huge advantage for research. &lt;strong&gt;You can test it. You can modify it. You can run experiments.&lt;/strong&gt; You do not need to reimplement it from a paper. The artifact already exists.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Someone Should Care (From a Research Perspective)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  For Networking Researchers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; Can entropy-based delay analysis improve congestion detection over pure RTT thresholding?&lt;/p&gt;

&lt;p&gt;NDM-TCP provides a working testbed. You can compare it against BBRv2's delay gradient or CUBIC's RTT-based heuristics. The fact that it produces stable behavior suggests entropy might be a viable signal — but needs formal validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Control Theory Researchers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; Can a recurrent nonlinear system with heuristic feedback achieve stability without formal tuning?&lt;/p&gt;

&lt;p&gt;NDM-TCP is an accidental proof-of-concept. The tanh/sigmoid mappings and recurrent state were designed heuristically, not mathematically. Yet the system is stable. Understanding why would contribute to control theory knowledge — especially for systems that need to adapt without retraining.&lt;/p&gt;

&lt;h3&gt;
  
  
  For Machine Learning Researchers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; Can a recurrent nonlinear structure be useful without training, purely from heuristic design?&lt;/p&gt;

&lt;p&gt;NDM-TCP does not train. There is no gradient descent. The weights are pseudo-random based on indices. Yet the recurrent structure seems to contribute to adaptive behavior (the predictable recovery pattern). This challenges assumptions about what "learning" means in practical systems — perhaps structured recurrence alone, without training, can provide useful memory effects when coupled with the right feedback signals (like entropy).&lt;/p&gt;

&lt;h3&gt;
  
  
  For Systems Researchers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; How do you build adaptive algorithms in constrained environments (like the Linux kernel)?&lt;/p&gt;

&lt;p&gt;NDM-TCP is implemented in kernel space with strict memory limits, no floating point, and hard real-time constraints. The design choices (16-bit storage, fixed-point arithmetic, simplified activations) are all compromises. Studying how those compromises affect behavior is valuable for any adaptive system in constrained environments.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Needs to Happen Next (If Anyone Cares)
&lt;/h2&gt;

&lt;p&gt;If NDM-TCP has real research potential, what would proper investigation look like?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Third-Party Testing
&lt;/h3&gt;

&lt;p&gt;Self-conducted tests are valuable but limited. Independent testing would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify the results are reproducible&lt;/li&gt;
&lt;li&gt;Test in conditions I did not think of&lt;/li&gt;
&lt;li&gt;Identify failure modes I missed&lt;/li&gt;
&lt;li&gt;Provide unbiased performance comparisons&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Formal Stability Analysis
&lt;/h3&gt;

&lt;p&gt;Someone with control theory background should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model the system as a dynamical system&lt;/li&gt;
&lt;li&gt;Analyze eigenvalues of the recurrent feedback loop&lt;/li&gt;
&lt;li&gt;Prove or disprove stability under bounded inputs&lt;/li&gt;
&lt;li&gt;Identify conditions where oscillations grow unbounded&lt;/li&gt;
&lt;/ul&gt;
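&lt;p&gt;A first step in that direction, sketched below: estimate the spectral radius of the recurrent weight matrix by power iteration. A linearized recurrent update is contracting near the origin only if the spectral radius is below 1. The matrix shown is a made-up example, not NDM-TCP's weights, and the real update is nonlinear, so this is only a starting point:&lt;/p&gt;

```python
# Power iteration on a recurrent weight matrix (sketch). A linearization
# of a hidden-state update h = tanh(W h + ...) is contracting near the
# origin only if the spectral radius of W is below 1. The matrix here is
# a made-up example, not NDM-TCP's actual weights.

def spectral_radius(W, iters=200):
    n = len(W)
    v = [1.0] * n
    norm = 1.0
    for _ in range(iters):
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w) or 1.0
        v = [x / norm for x in w]
    return norm

W = [[0.4, 0.1],
     [0.2, 0.3]]
print(spectral_radius(W))  # 0.5: below 1, so the linearized loop contracts
```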

&lt;h3&gt;
  
  
  3. Fairness Analysis
&lt;/h3&gt;

&lt;p&gt;Test NDM-TCP against competing flows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does it starve CUBIC flows?&lt;/li&gt;
&lt;li&gt;Does it get starved by BBR?&lt;/li&gt;
&lt;li&gt;Does it converge to fair bandwidth sharing?&lt;/li&gt;
&lt;li&gt;How does it behave in multi-flow scenarios?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Theoretical Entropy Study
&lt;/h3&gt;

&lt;p&gt;Prove (or disprove) that Shannon entropy on RTT history is a valid congestion signal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Under what network conditions does it work?&lt;/li&gt;
&lt;li&gt;When does it fail?&lt;/li&gt;
&lt;li&gt;Can it be combined with other signals (loss rate, queue delay) for better detection?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Comparison with State-of-the-Art
&lt;/h3&gt;

&lt;p&gt;Benchmark against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CUBIC (current Linux default)&lt;/li&gt;
&lt;li&gt;BBRv2 (Google's delay-based approach)&lt;/li&gt;
&lt;li&gt;Reno (baseline)&lt;/li&gt;
&lt;li&gt;Vegas (another delay-based algorithm)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run the same test scenarios. Measure throughput, latency, fairness, and stability. Publish results.&lt;/p&gt;




&lt;h2&gt;
  
  
  Honest Positioning: What This Is and Isn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;This is:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A working prototype with interesting emergent behavior&lt;/li&gt;
&lt;li&gt;A research artifact that can be studied and extended&lt;/li&gt;
&lt;li&gt;Evidence that entropy + recurrence + plasticity can produce stable congestion control&lt;/li&gt;
&lt;li&gt;A starting point for formal investigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is not:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production-ready code&lt;/li&gt;
&lt;li&gt;Academically validated&lt;/li&gt;
&lt;li&gt;Proven stable under all conditions&lt;/li&gt;
&lt;li&gt;Better than existing algorithms (not claimed, not tested rigorously)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The research potential comes from the fact that &lt;strong&gt;something unexpected works&lt;/strong&gt; — and we do not fully understand why yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought: Why This Matters
&lt;/h2&gt;

&lt;p&gt;Most congestion control research is incremental. Take an existing algorithm, tweak one parameter, publish a paper showing 5% improvement in one metric. That is valuable. That is how fields advance.&lt;/p&gt;

&lt;p&gt;But occasionally, something weird happens. Someone mixes ideas that should not obviously work together — entropy as a signal, recurrent nonlinear control, framework-aware modulation — and they do work. The result is not optimal. It is not proven. But it is &lt;strong&gt;interesting&lt;/strong&gt; — in the sense that it raises new questions.&lt;/p&gt;

&lt;p&gt;NDM-TCP is that kind of thing.&lt;/p&gt;

&lt;p&gt;The stable entropy-guided sawtooth is not just "it works." It is &lt;strong&gt;"this combination of techniques produced stable behavior through co-production with the TCP framework, and we did not design that interaction formally."&lt;/strong&gt; That is worth investigating properly — by someone with the mathematical tools and research resources to do it right.&lt;/p&gt;

&lt;p&gt;I built a prototype. Someone else can turn it into science.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written to clarify what the results actually mean, and why that matters for research — even if I am not the one to carry it forward.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>networking</category>
      <category>performance</category>
    </item>
    <item>
      <title>Building with AI: What I Know, What I Built, and Where I Stand</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Tue, 17 Feb 2026 17:40:09 +0000</pubDate>
      <link>https://dev.to/hejhdiss/building-with-ai-what-i-know-what-i-built-and-where-i-stand-3l23</link>
      <guid>https://dev.to/hejhdiss/building-with-ai-what-i-know-what-i-built-and-where-i-stand-3l23</guid>
      <description>&lt;h3&gt;
  
  
  A Personal Experience with AI-Assisted System Development — Using NDM-TCP as a Case Study
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;An honest technical reflection — not a research paper.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This article is a transparent look at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What it actually feels like to build a real system with heavy AI assistance&lt;/li&gt;
&lt;li&gt;Where AI genuinely helps, and where it quietly fails you&lt;/li&gt;
&lt;li&gt;What I understand, what I built, and how honest I can be about the gap between those two things&lt;/li&gt;
&lt;li&gt;Where I stand right now, and what I plan to do next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system I built is called &lt;strong&gt;NDM-TCP&lt;/strong&gt; — a Linux kernel TCP congestion control module. But this article is not really about NDM-TCP. NDM-TCP is just the thing that happened when I mixed curiosity, AI assistance, limited time, and a willingness to experiment. The real subject is what that process taught me about using AI as a building tool — what it gives you, what it takes from you, and how to use it without losing the thing that matters most: your own understanding.&lt;/p&gt;

&lt;p&gt;This is not a research paper. It is an honest reflection written by a teenager who built something real and wants to be clear about how.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Know and Where I Stand
&lt;/h2&gt;

&lt;h3&gt;
  
  
  My Background
&lt;/h3&gt;

&lt;p&gt;I am not at an advanced engineering level. I am not yet in college. I am still a teenager.&lt;/p&gt;

&lt;p&gt;But I understand things at an abstract and conceptual level — and I have built real things with that understanding. My Python and C knowledge can be considered &lt;strong&gt;intermediate&lt;/strong&gt;. I have already built functional, real-world applications &lt;strong&gt;without AI assistance&lt;/strong&gt; — so my experience is genuine, not just theoretical exposure.&lt;/p&gt;

&lt;p&gt;I have self-taught some basics of &lt;strong&gt;x86/x64 assembly&lt;/strong&gt;, working from documentation. On the networking side, I have self-studied &lt;strong&gt;ARP, VLANs, and STP (Spanning Tree Protocol)&lt;/strong&gt;, which gave me a practical mental model of how networks actually function below the application layer.&lt;/p&gt;

&lt;p&gt;My interest in all of this came from curiosity and self-motivation, not a curriculum.&lt;/p&gt;




&lt;h3&gt;
  
  
  What I Know Clearly (About Congestion Control)
&lt;/h3&gt;

&lt;p&gt;I understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic concept of &lt;code&gt;cwnd&lt;/code&gt; (congestion window)&lt;/li&gt;
&lt;li&gt;Relationship between throughput and RTT&lt;/li&gt;
&lt;li&gt;Basic differential-style modeling such as:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  dW/dt = 1/R − (W/2)·p
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;AIMD (Additive Increase, Multiplicative Decrease)&lt;/li&gt;
&lt;li&gt;Queue growth and buffer behavior&lt;/li&gt;
&lt;li&gt;RTT as a delay signal&lt;/li&gt;
&lt;li&gt;Difference between delay-based and loss-based control&lt;/li&gt;
&lt;li&gt;Bufferbloat concept&lt;/li&gt;
&lt;li&gt;Basic idea of AQM (like RED/CoDel, conceptually)&lt;/li&gt;
&lt;li&gt;That congestion is a feedback system&lt;/li&gt;
&lt;/ul&gt;
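&lt;p&gt;As a quick sanity check on the differential model &lt;code&gt;dW/dt = 1/R − (W/2)·p&lt;/code&gt; listed above, it can be integrated numerically. Setting the derivative to zero gives the equilibrium &lt;code&gt;W* = 2/(p·R)&lt;/code&gt;. This treats p as constant and ignores units, which real networks violate — schematic only:&lt;/p&gt;

```python
# Euler integration of the fluid model dW/dt = 1/R - (W/2)*p.
# Schematic: p is held constant and units are simplified, which real
# networks violate; this only shows where the model settles.

def integrate(R=0.1, p=0.2, W=1.0, dt=0.01, T=100.0):
    for _ in range(int(T / dt)):
        dW = 1.0 / R - (W / 2.0) * p
        W += dW * dt
    return W

W_star = 2.0 / (0.2 * 0.1)  # analytic equilibrium: set dW/dt = 0
print(integrate(), W_star)  # the trajectory settles near W* = 100
```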

&lt;p&gt;I understand these at a &lt;strong&gt;conceptual and intuitive level&lt;/strong&gt; — not at the level of deep mathematical proof.&lt;/p&gt;




&lt;h3&gt;
  
  
  What I Do NOT Know (Yet)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Engineering-level linear algebra&lt;/li&gt;
&lt;li&gt;Eigenvalues and eigenvectors in depth&lt;/li&gt;
&lt;li&gt;Formal control theory&lt;/li&gt;
&lt;li&gt;Rigorous stability proofs&lt;/li&gt;
&lt;li&gt;Advanced queueing theory&lt;/li&gt;
&lt;li&gt;Formal ML theory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am working mostly from abstraction and intuition, not full mathematical rigor. I know that. Being transparent about it is the point of this article.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built: NDM-TCP
&lt;/h2&gt;

&lt;p&gt;NDM-TCP is a &lt;strong&gt;Linux kernel congestion control module&lt;/strong&gt;, implemented in C.&lt;/p&gt;

&lt;p&gt;It combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Entropy-based RTT analysis&lt;/li&gt;
&lt;li&gt;Adaptive congestion window logic&lt;/li&gt;
&lt;li&gt;A small recurrent neural-style structure&lt;/li&gt;
&lt;li&gt;Plasticity decay mechanism&lt;/li&gt;
&lt;li&gt;Heuristic congestion detection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How It Works (Technically)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Entropy-Based RTT Analysis:&lt;/strong&gt; The module stores a small window of RTT samples and computes Shannon entropy over the distribution. Low entropy → stable delay → likely real congestion. High entropy → noisy delay → possibly wireless fluctuation. This is a hypothesis. Not formally proven.&lt;/p&gt;
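
&lt;p&gt;As a rough illustration of that hypothesis, here is a userspace sketch (the bin count and sample window are invented here; a real kernel module would also need fixed-point math, since floating point is generally unavailable in kernel context):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;math.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

#define NBINS 8  /* invented bin count, for illustration only */

/* Shannon entropy (in bits) of RTT samples binned over [min, max].
   Low entropy: samples cluster (stable delay). High entropy: spread out. */
static double rtt_entropy(const double *rtt, int n)
{
    double lo = rtt[0], hi = rtt[0];
    for (int i = 1; i &amp;lt; n; i++) {
        if (rtt[i] &amp;lt; lo) lo = rtt[i];
        if (rtt[i] &amp;gt; hi) hi = rtt[i];
    }
    if (hi - lo &amp;lt; 1e-9)
        return 0.0;                     /* all samples equal: zero entropy */
    int count[NBINS] = {0};
    for (int i = 0; i &amp;lt; n; i++) {
        int b = (int)((rtt[i] - lo) / (hi - lo) * NBINS);
        count[b == NBINS ? NBINS - 1 : b]++;   /* clamp the max sample */
    }
    double h = 0.0;
    for (int b = 0; b &amp;lt; NBINS; b++)
        if (count[b]) {
            double p = (double)count[b] / n;
            h -= p * log2(p);
        }
    return h;
}

int main(void)
{
    double stable[8] = {50, 51, 50, 52, 51, 50, 51, 50}; /* ms, clustered */
    double noisy[8]  = {20, 80, 35, 70, 25, 90, 40, 60}; /* ms, jittery */
    printf("stable link: %.2f bits\n", rtt_entropy(stable, 8));
    printf("noisy link:  %.2f bits\n", rtt_entropy(noisy, 8));
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The clustered samples produce a noticeably lower entropy than the jittery ones — that gap is the signal the classifier hypothesis relies on.&lt;/p&gt;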

&lt;p&gt;&lt;strong&gt;Adaptive cwnd Behavior:&lt;/strong&gt; Slows growth when entropy suggests congestion, becomes more aggressive when entropy suggests noise, adjusts reduction factor in the &lt;code&gt;ssthresh&lt;/code&gt; phase. Mixes delay-based and loss-based thinking.&lt;/p&gt;
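
&lt;p&gt;A toy version of that decision logic might look like this (the entropy thresholds and reduction factors are invented for illustration; the real module's values differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* Toy ssthresh choice driven by RTT entropy (thresholds invented):
   stable delay -&amp;gt; treat loss as real congestion, halve;
   noisy delay  -&amp;gt; loss is probably random, back off gently. */
static unsigned int adaptive_ssthresh(unsigned int cwnd, double entropy)
{
    unsigned int s;
    if (entropy &amp;lt; 1.0)
        s = cwnd / 2;          /* classic loss-based halving */
    else if (entropy &amp;gt; 2.0)
        s = cwnd * 4 / 5;      /* gentle cut under noisy delay */
    else
        s = cwnd * 7 / 10;     /* blend in between */
    return s &amp;gt; 2 ? s : 2;  /* never drop below 2 segments */
}

int main(void)
{
    printf("stable (H=0.5): %u\n", adaptive_ssthresh(100, 0.5));
    printf("mixed  (H=1.5): %u\n", adaptive_ssthresh(100, 1.5));
    printf("noisy  (H=2.5): %u\n", adaptive_ssthresh(100, 2.5));
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;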

&lt;p&gt;&lt;strong&gt;Recurrent Neural Structure:&lt;/strong&gt; Includes a hidden state array, recurrent update, tanh approximation, and sigmoid output mapping. This introduces memory across time and nonlinear feedback. &lt;strong&gt;Not mathematically proven stable. This is experimental.&lt;/strong&gt;&lt;/p&gt;
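
&lt;p&gt;In spirit, one recurrent step looks like this (the weights and hidden size are toy values, not the module's; the kernel version approximates tanh rather than calling libm):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;math.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

#define NH 4  /* hidden units; size is illustrative */

/* One recurrent step: h[i] = tanh(wx[i]*x + wh[i]*h[i]),
   output = sigmoid(sum of hidden state), always in (0, 1).
   Because h persists across calls, the output carries memory of past inputs. */
static double rnn_step(double x, double h[NH])
{
    static const double wx[NH] = { 0.5, -0.3, 0.8, 0.1 };  /* toy weights */
    static const double wh[NH] = { 0.4,  0.6, -0.2, 0.7 };
    double sum = 0.0;
    for (int i = 0; i &amp;lt; NH; i++) {
        h[i] = tanh(wx[i] * x + wh[i] * h[i]);  /* nonlinear recurrent update */
        sum += h[i];
    }
    return 1.0 / (1.0 + exp(-sum));             /* sigmoid squash */
}

int main(void)
{
    double h[NH] = {0};
    /* feed the same input twice: outputs differ because h carries state */
    printf("step 1: %.4f\n", rnn_step(1.0, h));
    printf("step 2: %.4f\n", rnn_step(1.0, h));
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That state-carrying behavior is exactly what makes the dynamics hard to reason about formally — the same RTT input can yield different cwnd adjustments depending on history.&lt;/p&gt;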

&lt;p&gt;&lt;strong&gt;Plasticity Concept:&lt;/strong&gt; A variable that increases during congestion and decays slowly over time — simulating adaptive sensitivity. Heuristic-based, not derived from control theory.&lt;/p&gt;
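
&lt;p&gt;A minimal sketch of that mechanism (the gain, decay rate, and cap are invented constants):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

/* Plasticity: jumps up when congestion is detected, then decays
   geometrically each tick. Gain, decay rate, and cap are invented. */
static double plasticity(double p, int congested)
{
    if (congested)
        p += 0.5;                 /* sensitize on a congestion event */
    p *= 0.95;                    /* slow exponential decay */
    return p &amp;gt; 1.0 ? 1.0 : p; /* cap at 1.0 */
}

int main(void)
{
    double p = 0.0;
    for (int t = 0; t &amp;lt; 20; t++) {
        p = plasticity(p, t == 3 || t == 4);  /* congestion at ticks 3 and 4 */
        printf("t=%2d  p=%.3f\n", t, p);
    }
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;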

&lt;h3&gt;
  
  
  What Is Proven vs. Experimental
&lt;/h3&gt;

&lt;p&gt;Proven components: AIMD concepts, &lt;code&gt;cwnd&lt;/code&gt; mechanics, TCP congestion avoidance principles, RTT measurement logic, Linux TCP integration model.&lt;/p&gt;

&lt;p&gt;Experimental: entropy as congestion classifier, recurrent hidden state influence, plasticity-based adaptation, neural-style nonlinear mapping. None of these are backed by formal proofs. They are engineering experiments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It Has Unpredictable Behavior
&lt;/h3&gt;

&lt;p&gt;Recurrent systems create dynamic feedback loops. Nonlinear functions introduce oscillation possibilities. No eigenvalue stability analysis was done. No formal Lyapunov proof exists. So theoretically: it may be stable, it may oscillate, or it may overreact in delay-only environments. &lt;strong&gt;This is expected and known.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Published Results
&lt;/h3&gt;

&lt;p&gt;All results I have published are from my own &lt;code&gt;tc&lt;/code&gt;-based network simulations (one real-world case is also included). They are honest and accurate to my testing conditions. I am not claiming they generalize beyond those conditions. NDM-TCP showed &lt;strong&gt;promising results&lt;/strong&gt; in those simulations — which is meaningful even for a hobby experiment at this stage.&lt;/p&gt;




&lt;h2&gt;
  
  
  The AI-Assisted Build: What Really Happened
&lt;/h2&gt;

&lt;h3&gt;
  
  
  My Honest Contribution: 20–30%
&lt;/h3&gt;

&lt;p&gt;I need to be clear about this: &lt;strong&gt;my personal contribution to the actual coding of NDM-TCP was roughly 20–30% of the full process.&lt;/strong&gt; The implementation relied heavily on AI assistance.&lt;/p&gt;

&lt;p&gt;The reason is straightforward. The Linux kernel TCP congestion control API involves headers like &lt;code&gt;net/tcp.h&lt;/code&gt;, &lt;code&gt;tcp_cong.h&lt;/code&gt;, and other low-level kernel interfaces. Manually reading through all of that documentation from scratch — while having limited time — was not realistic for me at this stage. I did not want to spend weeks navigating kernel API structures before getting to the part I actually cared about.&lt;/p&gt;

&lt;p&gt;If I had the time — if I had first completed the mathematics properly, then studied the kernel internals deeply, then built — that is the order I would have followed. But that window may not come on a predictable schedule. More on that below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI did:&lt;/strong&gt; handled the kernel API boilerplate, translated my conceptual intentions into valid kernel C, and helped me navigate documentation I didn't have time to absorb manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I did:&lt;/strong&gt; provided the ideas, the design decisions, the conceptual structure (entropy, recurrence, plasticity), the understanding of what I was building, and the judgment to evaluate whether results made sense.&lt;/p&gt;

&lt;p&gt;The concepts are mine. The implementation process was heavily assisted. That distinction matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Usefulness of AI in Building Systems
&lt;/h2&gt;

&lt;p&gt;When used correctly, AI assistance is genuinely powerful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It removes the blank page problem.&lt;/strong&gt; Starting a kernel module from scratch requires knowing where to begin — which structs to register, which callbacks to implement, how the module lifecycle works. AI can generate a valid skeleton in seconds. That is real value, especially when you understand what the skeleton is doing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It compresses documentation.&lt;/strong&gt; Reading through &lt;code&gt;net/tcp.h&lt;/code&gt; line by line takes time. AI can answer targeted questions about it and let you understand what you need without wading through everything at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It accelerates the feedback loop.&lt;/strong&gt; Instead of spending two days wiring up boilerplate before you can test an idea, you spend two hours. More ideas get tested. More things get learned from doing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It keeps your curiosity alive.&lt;/strong&gt; For someone like me — a teenager with limited time but genuine curiosity — AI let me actually build the thing I was thinking about, instead of watching the window close before I ever started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It is genuinely experience-building when used right.&lt;/strong&gt; Working with AI-generated code that you then read, understand, modify, and test is not the same as copy-pasting code you don't understand. The former builds real capability. I came out of this project knowing significantly more about kernel module structure, TCP internals, and congestion control mechanics than when I started — because I engaged with the code even though I didn't write all of it from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Drawbacks of AI in Building Systems
&lt;/h2&gt;

&lt;p&gt;These are real and worth naming clearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can build faster than you understand.&lt;/strong&gt; This is the core risk. AI can generate code that works — compiles, runs, produces results — faster than your understanding of that code can keep up. If you are not careful, you end up with something functional that you cannot fully explain. That is a fragile position to be in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It can mask gaps in your knowledge.&lt;/strong&gt; If I had written every line manually, I would have immediately hit walls that told me exactly what I didn't know. With AI assistance, those walls become invisible. You bypass them — and the gap stays.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The code may be correct without you understanding why.&lt;/strong&gt; This is especially dangerous in systems work. A kernel module that runs without crashing is not the same as a kernel module you understand. AI-generated low-level code can pass surface-level checks while containing subtle assumptions you are unaware of.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You cannot debug what you don't understand.&lt;/strong&gt; This is where AI assistance most often comes back to hurt people. When something breaks — and it will break — you need to understand the system to fix it. If your understanding is shallow because AI did the heavy lifting, debugging becomes guesswork.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It can create false confidence.&lt;/strong&gt; Building something that works feels good. It should. But the feeling of "I built this" can blur the line between "I designed and implemented this" and "I directed an AI to build this while I supervised." Both have value, but they are not the same thing. Confusing them leads to overestimating where you actually stand.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means: How to Use AI Without Losing Yourself
&lt;/h2&gt;

&lt;p&gt;The lesson from building NDM-TCP is not "AI is bad" or "AI is great." It is more specific.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI is a tool that amplifies what you bring to it.&lt;/strong&gt; If you bring ideas, conceptual understanding, and critical judgment — AI makes you faster and more capable. If you bring nothing but a vague goal — AI produces something you cannot own, cannot debug, and cannot build on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At minimum, understand at the abstract level.&lt;/strong&gt; I built NDM-TCP without full mathematical rigor, but I understood what entropy measures, what a recurrent structure does, what plasticity is trying to simulate. That abstract understanding was what made the project real rather than just generated code. You do not need a PhD to build something meaningful. But you need &lt;em&gt;something&lt;/em&gt; — some genuine understanding of what you are building and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build things yourself without AI too.&lt;/strong&gt; The fact that I had already built real-world applications in Python and C without AI assistance meant I had a reference point. I knew what it felt like to actually write code, hit real errors, navigate real documentation. That context made the AI-assisted experience useful rather than just a shortcut. Without it, I would have had no baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The boilerplate reduction is real value — use it.&lt;/strong&gt; Nobody needs to manually write the same Makefile structure every time, or look up every kernel callback signature from scratch. AI handling that is a genuine productivity gain. The key is knowing that what AI is doing is boilerplate reduction — and staying mentally engaged with everything above boilerplate level.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Stand Now
&lt;/h2&gt;

&lt;p&gt;I am currently at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Abstract theoretical understanding of congestion control&lt;/li&gt;
&lt;li&gt;Basic mathematical modeling (conceptual, not rigorous)&lt;/li&gt;
&lt;li&gt;Functional kernel implementation level (with heavy AI assistance)&lt;/li&gt;
&lt;li&gt;Intermediate Python and C&lt;/li&gt;
&lt;li&gt;Basic assembly, self-taught from documentation&lt;/li&gt;
&lt;li&gt;Self-taught networking fundamentals: ARP, VLANs, STP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not&lt;/strong&gt; at advanced engineering math level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that is okay — as long as I am honest about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;I am freezing NDM-TCP for now. This is a deliberate choice.&lt;/p&gt;

&lt;p&gt;What comes next is &lt;strong&gt;completing self-study in the foundational areas I am missing&lt;/strong&gt; — mathematics, linear algebra, calculus, and eventually control theory and stability analysis. Once that self-study is done, will I revisit NDM-TCP and rewrite it properly? &lt;strong&gt;Maybe. Maybe not.&lt;/strong&gt; I am not making that promise to myself or anyone else.&lt;/p&gt;

&lt;p&gt;There is an honest tension worth naming: &lt;strong&gt;I do not know if formal college study will take me where I actually want to go.&lt;/strong&gt; Curricula have their own direction. The things I am genuinely curious about — kernel internals, eBPF, network systems theory, low-level AI — may not be on the syllabus. My curiosity pulls toward technical depth that a standard engineering program might not allow. So "I will do this properly after formal study" is not a reliable plan. It might never happen if I leave it entirely to the system.&lt;/p&gt;

&lt;p&gt;That is part of why I built NDM-TCP now, imperfectly, with heavy AI assistance — because the curiosity was here, the time was limited, and waiting for perfect conditions is how ideas die.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-study areas planned&lt;/strong&gt; (more may be added as needed):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear algebra (Gilbert Strang, MIT OpenCourseWare)&lt;/li&gt;
&lt;li&gt;Calculus (properly)&lt;/li&gt;
&lt;li&gt;Eventually: eigenvalues, stability analysis, control theory basics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Future domains I want to explore:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Operating systems and Linux internals&lt;/li&gt;
&lt;li&gt;eBPF&lt;/li&gt;
&lt;li&gt;Networking systems&lt;/li&gt;
&lt;li&gt;AI systems&lt;/li&gt;
&lt;li&gt;Cybersecurity and reverse engineering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;C remains my core language for OS-level work.&lt;/p&gt;




&lt;h2&gt;
  
  
  Honest Positioning of NDM-TCP
&lt;/h2&gt;

&lt;p&gt;To be clear about what this project is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Experimental&lt;/strong&gt; — built to explore ideas, not to deploy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Educational&lt;/strong&gt; — I learned more from building it than from any documentation alone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heavily AI-assisted in implementation&lt;/strong&gt; — concepts mine, coding process roughly 20–30% mine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not academically proven&lt;/strong&gt; — no stability proof, no fairness analysis, no convergence guarantee&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not production-certified&lt;/strong&gt; — not meant to compete with Reno, CUBIC, or BBR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Promising in self-tested simulations&lt;/strong&gt; — results are honest within their scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is a learning system built from curiosity. That is what it is.&lt;/p&gt;




&lt;h2&gt;
  
  
  For Anyone Reading This
&lt;/h2&gt;

&lt;p&gt;If you are self-taught, still young, curious about systems and networking and AI, and wondering whether you can build something real — you can. AI makes that more accessible than ever before.&lt;/p&gt;

&lt;p&gt;But understand clearly what you are doing when you use it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Functional code is not the same as theoretical proof.&lt;/strong&gt; Building something that works does not mean you fully understand it. Both things can be true at once — that is fine — but do not confuse them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using AI to build is experience — if you stay engaged.&lt;/strong&gt; If you read the code, modify it, test it, break it, and understand what each part is trying to do — even at an abstract level — you are learning. If you just copy and run output without engaging, you are not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Know where you actually stand.&lt;/strong&gt; This is the hardest and most important thing. I know roughly what I understand deeply, what I understand abstractly, and what I assisted rather than authored. That clarity is more valuable than the project itself.&lt;/p&gt;

&lt;p&gt;Building something is step one. Understanding it properly is step two. Both matter. Neither replaces the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Reflection
&lt;/h2&gt;

&lt;p&gt;NDM-TCP showed promising results in simulation. It is a real kernel module running on a real Linux system. The results I published are honest.&lt;/p&gt;

&lt;p&gt;But what I am most proud of is not the module — it is that I can write this article. That I can say exactly where the AI contribution ends and mine begins. That I know what the gaps in my understanding are and can name them. That I built something while being fully aware of what I was and was not doing.&lt;/p&gt;

&lt;p&gt;That clarity is the most important thing a self-taught builder can develop.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>experience</category>
      <category>programming</category>
    </item>
    <item>
      <title>Real-World Analysis of TCP Congestion Control: Reno vs. NDM TCP vs Cubic in a Home Network Environment</title>
      <dc:creator>Muhammed Shafin P</dc:creator>
      <pubDate>Mon, 16 Feb 2026 16:31:03 +0000</pubDate>
      <link>https://dev.to/hejhdiss/real-world-analysis-of-tcp-congestion-control-reno-vs-ndm-tcp-vs-cubic-in-a-home-network-593c</link>
      <guid>https://dev.to/hejhdiss/real-world-analysis-of-tcp-congestion-control-reno-vs-ndm-tcp-vs-cubic-in-a-home-network-593c</guid>
      <description>&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;This report documents a real-world comparative analysis of three TCP congestion control algorithms: TCP Reno, TCP Cubic, and ndm_tcp. Unlike theoretical simulations or controlled lab environments, this test was conducted within a live home network during active usage. The primary objective was to observe the impact of high-throughput iperf3 transfers on simultaneous real-time traffic—specifically a streaming YouTube video—to evaluate fairness and aggression in standard consumer-grade hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Experimental Setup
&lt;/h2&gt;

&lt;p&gt;The test utilized a client-server architecture within a standard residential layout.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware and Software Configuration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server:&lt;/strong&gt; Laptop running Debian 13 (Trixie).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Client:&lt;/strong&gt; Laptop running Windows Subsystem for Linux (WSL).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Router:&lt;/strong&gt; Scopus Router (Version 3.0), operating on the 2.4 GHz Wi-Fi band, located in the hall.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cabling:&lt;/strong&gt; The client was connected via a 15-meter Fedus Cat 6 Ethernet cable, so the wired leg would not be a bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Network Path:&lt;/strong&gt; The server was located in a separate room from the router, introducing a physical wall as a signal attenuation factor on the wireless leg of the path. The client was also in another room, but since it was connected over the 15 m Cat 6 cable, its room placement does not affect the wired leg.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Measurement Tool:&lt;/strong&gt; iperf3, version 3.12 on the client and 3.18 on the server.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layout
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                [ Room 1 ]
    +----------------------------------+
    |                                  |
    |  Server Laptop                   |
    |  Debian 13 (Trixie)              |
    |                                  |
    +---------------+------------------+
                    )))))
                2.4 GHz Wi-Fi
                    )))))
            (Physical Wall Barrier)
                    )))))
    +---------------+------------------+
    |       Scopus Router v3.0        |
    |        (Located in Hall)        |
    +---------------+------------------+
                    |
                    | 15m Fedus Cat 6 Ethernet Cable
                    |
    +---------------+------------------+
    |                                  |
    |  Client Laptop                   |
    |  WSL (Windows Subsystem Linux)  |
    |  iperf3 v3.12 (Client)          |
    |                                  |
    +----------------------------------+
                [ Room 2 ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Background Traffic
&lt;/h2&gt;

&lt;p&gt;The test was conducted while a high-priority background task was active: a streaming YouTube class for SSLC (10th grade) exam preparation (my sister was studying). This provided a realistic metric for "fairness"—if the iperf3 test caused the stream to buffer, the algorithm was considered highly aggressive or "unfair" to existing flows.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Test Scenarios and Observations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario A: ndm_tcp (Initial Test)
&lt;/h3&gt;

&lt;p&gt;The first test was conducted using the ndm_tcp algorithm at the server side.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Average Bitrate:&lt;/strong&gt; ~101 Mbits/sec (sender); 101 Mbits/sec (receiver).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Total Retransmissions:&lt;/strong&gt; 1,158.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stability:&lt;/strong&gt; The congestion window (Cwnd) fluctuated between 300 KB and 1.2 MB.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Impact on Background Traffic: The YouTube stream remained stable throughout the 100-second transmission phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario B: TCP Cubic
&lt;/h3&gt;

&lt;p&gt;TCP Cubic, the modern standard for Linux, was tested for baseline comparison.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Average Bitrate:&lt;/strong&gt; ~99.9 Mbits/sec (sender); 99.6 Mbits/sec (receiver).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Total Retransmissions:&lt;/strong&gt; 1,164.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Behavior: Showed consistent throughput, slightly lower than ndm_tcp in this specific instance, with retransmissions remaining comparable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario C: TCP Reno (The Aggressive Phase)
&lt;/h3&gt;

&lt;p&gt;The test was then switched to the legacy TCP Reno algorithm.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Average Bitrate:&lt;/strong&gt; 19.0 Mbits/sec (sender); 0 Mbits/sec (receiver).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observations:&lt;/strong&gt; While Reno initially showed high Cwnd values (up to 2.97 MB), it experienced a massive surge in retransmissions (1,197 within the first 6 seconds). After the 18-second mark, the receiver reported 0.00 bits/sec, suggesting a significant collapse or stall in the link under the specific congestion conditions of the router's buffer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Key Findings: The "YouTube Stalls"
&lt;/h2&gt;

&lt;p&gt;A critical observation occurred regarding the fairness of these algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reno Aggression:&lt;/strong&gt; Approximately 7–11 seconds after the Reno test completed, the YouTube class on my sister's device stopped entirely, showing a loading/buffering spinner. Simultaneously, attempts to load YouTube on the client system also failed. This suggests that Reno's behavior (likely through bufferbloat or aggressive window scaling prior to the stall) saturated the Scopus router's resources to the point where existing flows were starved.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recovery with ndm_tcp:&lt;/strong&gt; Once the system was switched back to ndm_tcp, the YouTube class resumed playback within approximately 5–8 seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hardware vs. Simulation:&lt;/strong&gt; It is important to emphasize that these results were obtained on real hardware. The Scopus router's version and the physical wall between the router and server likely contributed to the packet loss and latency patterns that triggered these behaviors.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Statistical Summary (100s Transmissions)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Avg Bitrate (Sender)&lt;/th&gt;
&lt;th&gt;Retransmissions&lt;/th&gt;
&lt;th&gt;Key Event&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ndm_tcp&lt;/td&gt;
&lt;td&gt;101 Mbits/sec&lt;/td&gt;
&lt;td&gt;1,158&lt;/td&gt;
&lt;td&gt;Stable background stream.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cubic&lt;/td&gt;
&lt;td&gt;99.9 Mbits/sec&lt;/td&gt;
&lt;td&gt;1,164&lt;/td&gt;
&lt;td&gt;Minimal impact on stream.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reno&lt;/td&gt;
&lt;td&gt;19.0 Mbits/sec*&lt;/td&gt;
&lt;td&gt;1,264&lt;/td&gt;
&lt;td&gt;YouTube stream stalled; link collapsed.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Reno bitrate is averaged over the full duration, but actual transmission stalled after 18 seconds.&lt;/em&gt;&lt;/p&gt;
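
&lt;p&gt;A quick sanity check of that footnote, assuming iperf3's usual unit conventions (base-2 MBytes, base-10 Mbits/sec):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#include &amp;lt;stdio.h&amp;gt;

int main(void)
{
    double bits = 226.0 * 1024 * 1024 * 8;   /* 226 MBytes reported sent */
    double avg  = bits / 100.0 / 1e6;        /* averaged over the full 100 s */
    double pre  = bits / 18.0 / 1e6;         /* if it all went out before the ~18 s stall */
    printf("average over 100 s: %.1f Mbit/s\n", avg);  /* ~19.0, matching the table */
    printf("implied pre-stall:  %.0f Mbit/s\n", pre);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The implied pre-stall sending rate of roughly 105 Mbit/s is consistent with the aggressive ramp-up described above.&lt;/p&gt;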

&lt;h2&gt;
  
  
  iperf3 Output (final summary lines)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. ndm_tcp
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-100.00 sec  1.18 GBytes   101 Mbits/sec  1158             sender
[  5]   0.00-100.01 sec  1.17 GBytes   101 Mbits/sec                  receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. cubic
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-100.00 sec  1.16 GBytes  99.9 Mbits/sec  1164             sender
[  5]   0.00-100.03 sec  1.16 GBytes  99.6 Mbits/sec                  receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. reno
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-100.00 sec   226 MBytes  19.0 Mbits/sec  1264             sender
[  5]   0.00-100.00 sec  0.00 Bytes  0.00 bits/sec                  receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Limitations and Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sample Size:&lt;/strong&gt; This report is based on a single real-world test case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Environment:&lt;/strong&gt; Testing was limited to a residential 2.4 GHz Wi-Fi environment. Results may differ significantly in 5 GHz, WiFi-6, or high-performance enterprise/telecom infrastructures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lack of 5G/Data Center Data:&lt;/strong&gt; I do not have access to large-scale data center environments or 5G infrastructure to validate whether these findings scale. &lt;strong&gt;This is where community support is needed&lt;/strong&gt;. (Even this test meant creating problems for other users on the network.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Based on this hardware test, ndm_tcp appeared to maintain better fairness toward background real-time traffic (YouTube) compared to TCP Reno. Reno's behavior under these specific conditions was highly aggressive, leading to a temporary denial of service for other devices on the network. However, further community validation and large-scale testing are required before concluding that ndm_tcp is excellent across various networking scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Data Access
&lt;/h2&gt;

&lt;p&gt;The complete raw iperf3 logs (TXT format) can be accessed here:&lt;br&gt;
&lt;a href="https://drive.google.com/file/d/1AFkaVhC38giY4LKho566PDkZE2yJWa4k/view?usp=sharing" rel="noopener noreferrer"&gt;Full Test Results - Google Drive&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>hejhdiss</category>
      <category>network</category>
    </item>
  </channel>
</rss>
