Thuangf45

Posted on
Stop Thinking of HTTP as Request/Response. It's a Universal Data Layout — and It's Faster Than Binary Protocol.

Everyone "knows" binary protocols are faster than HTTP.

I used to believe that too. Until I stopped looking at HTTP as a wire protocol and started looking at it as what it actually is — a layout engine for the CPU.

That reframe changed everything.


0. The Mental Model That's Costing You Performance

The industry standard narrative goes like this:

"Binary protocols are fast because they're compact and machine-readable. HTTP is slow because it's human-readable text."

On the surface, it sounds reasonable. So engineers reach for Protocol Buffers, MessagePack, custom binary frames — anything to "get away from HTTP overhead."

But here's the question nobody asks: What exactly is the CPU doing when it parses your binary protocol?

Let's answer that honestly.


1. How Your CPU Actually Reads Data

Modern CPUs don't read byte by byte. They don't read int by int. They read 128 to 512 bits at a time — thanks to SIMD (Single Instruction, Multiple Data) registers. AVX2 can scan 256 bits per instruction. AVX-512 does 512 bits. The hardware wants to eat data in large chunks and run fast.
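To make that concrete: in .NET, span searches like `IndexOf` are internally vectorized on supported hardware, so a single call sweeps the buffer in wide chunks. A minimal standalone sketch (not part of HttpModel):

```csharp
using System;

// Span.IndexOf is internally vectorized on supported CPUs:
// one call sweeps the buffer in wide chunks, not byte by byte.
ReadOnlySpan<byte> buffer = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"u8;
int crlf = buffer.IndexOf("\r\n"u8);   // hardware-speed pattern search
Console.WriteLine(crlf);               // prints 14 — the first delimiter's offset
```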

Now look at what a typical binary protocol demands:

// Hypothetical fixed binary protocol — 20 fields
[offset 0,  size 4]  = MessageType
[offset 4,  size 4]  = SequenceId
[offset 8,  size 2]  = Version
[offset 10, size 2]  = Flags
[offset 12, size 4]  = Timestamp
[offset 16, size 8]  = UserId
// ... 14 more fields

To read this, the CPU must:

  1. Go to offset 0, read 4 bytes → MessageType
  2. Go to offset 4, read 4 bytes → SequenceId
  3. Go to offset 8, read 2 bytes → Version
  4. ...repeat 17 more times

That's 20 sequential reads, each guided by hardcoded human-defined offsets. The CPU is operating at human speed — one field at a time, exactly as the programmer specified.

And if those 20 fields are pointers rather than inline values? The CPU reads a pointer at offset N, then jumps to a completely different memory location to fetch the actual value. Minimum 40 memory operations — each with potential cache misses. The pipeline stalls. Performance collapses.

A binary protocol forces a 512-bit-capable CPU to act like a human reading a checklist.


2. What HttpModel Does Instead

HttpModel is not HTTP/1.1. Let me be clear about this upfront.

RequestModel and ResponseModel are just two specific instances of HttpModel. The model itself is general. Its structure is:

[Token1] [Token2] [Token3]\r\n
[Key]: [Value]\r\n
[Key]: [Value]\r\n
\r\n
[Body]

That's it. A start-line with three tokens, an arbitrary number of key-value header pairs, and a body. Nothing is fixed. Nothing is locked.

For HTTP requests, the three tokens happen to be Method, URL, Protocol. For HTTP responses, they're Protocol, StatusCode, StatusPhrase. For a game server? They could be Hp, Atk, Def. For a custom RPC? They could be ServiceName, MethodName, RequestId. The layout is yours to fill.

This is what I mean by HttpModel as a universal data layout.


3. The Parse That Runs at Machine Speed

Now here's where it gets interesting. When HttpModel receives bytes, this is what happens:

// From ReceiveHeader() in HttpModel.cs
var headerEnd = Lucifer.IndexOf(span, CrLfCrLf);      // One SIMD scan
var firstLineEnd = Lucifer.IndexOf(span, CrLf);         // One SIMD scan

// Split start-line into three tokens
Lucifer.TrySplitAt(startLine, Space, out var first, out var rest1);
Lucifer.TrySplitAt(rest1, Space, out var second, out var third);

// Mark positions — no allocation
_first  = new Position(offset, length);
_second = new Position(offset, length);
_third  = new Position(offset, length);

The CPU runs IndexOf on the full buffer using SIMD. It doesn't inspect each byte individually — it sweeps 256 or 512 bits at a time looking for \r\n. The delimiter pattern is simple enough that the hardware can detect it at maximum throughput.

What does "marking positions" mean? It means storing an (offset, size) pair — two integers — pointing into the existing buffer. No string is created. No object is allocated. No copy happens. The data lives where it landed in the receive buffer.

// Position is just two ints
private Position _first;   // (Offset: 0, Size: 3)   → "GET"
private Position _second;  // (Offset: 4, Size: 1)   → "/"
private Position _third;   // (Offset: 6, Size: 8)   → "HTTP/1.1"

Accessing any field is a ReadOnlySpan<byte> slice — zero cost, zero allocation:

public ReadOnlySpan<byte> FirstSpan
{
    get => Cache.AsSpan(_first.Offset, _first.Size);  // span slice, no copy
}

4. Binary Protocol vs. HttpModel — The Real Comparison

Let me make this concrete.

Binary protocol, 20 fixed fields:

  • CPU reads field 1 at offset 0
  • CPU reads field 2 at offset 4
  • ...
  • CPU reads field 20 at offset N
  • 20 sequential operations, pace set by the programmer

Binary protocol, 20 pointer fields:

  • CPU reads pointer at offset 0, jumps to memory address X to get value
  • CPU reads pointer at offset 8, jumps to memory address Y to get value
  • ...
  • 40+ operations, many with cache misses, pipeline stalls

HttpModel, 20 headers:

  • CPU sweeps the entire buffer in one SIMD pass
  • \r\n delimiters are found; positions are marked
  • One fast scan, pace set by the hardware

Then the human reads:

// You asked for 3 specific headers. You jump to exactly those positions.
// You read zero-copy span slices. The other 17 headers? Never touched.
model.TryGetHeader(0, out var contentType, out var ctValue);
model.TryGetHeader(5, out var userId, out var userIdValue);
model.TryGetHeader(11, out var requestId, out var reqIdValue);

This is the key insight:

Parse is the machine's job. Read is the human's job. Don't mix them.

Binary protocol conflates the two. The programmer decides what to read, and that decision dictates how the CPU must parse. The machine works at human pace.

HttpModel separates them. The machine scans everything at full hardware speed — SIMD, no branching, no tiny offset math. The programmer then reads only what they need, from marked positions, with zero allocation.
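The separation can be sketched in a few lines of standalone C# (illustrative only — not the LuciferCore API): one forward sweep marks (offset, size) pairs, then reads are lazy slices from those marks.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Parse: one forward sweep; each IndexOf call is a vectorized scan.
ReadOnlySpan<byte> buf = "Hp: 80\r\nAtk: 120\r\nDef: 60\r\n"u8;
var marks = new List<(int Offset, int Size)>();
int pos = 0;
while (pos < buf.Length)
{
    int rel = buf[pos..].IndexOf("\r\n"u8);
    if (rel < 0) break;
    marks.Add((pos, rel));        // mark position: two ints, nothing copied
    pos += rel + 2;               // skip past the delimiter
}

// Read: slice only the field you need; the bytes never move.
// (GetString here is just for display.)
var (off, size) = marks[1];
Console.WriteLine(Encoding.ASCII.GetString(buf.Slice(off, size)));  // prints "Atk: 120"
```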


5. Zero Allocation by Architecture

Most parsers produce objects. You send bytes in, you get a ParsedMessage struct out — with strings, arrays, boxed values, GC pressure.

HttpModel produces nothing. The parsed "result" is just a set of (offset, size) pairs sitting in the same memory as the received bytes.

// No alloc. Just two ints per token/header, pointing into Cache.
internal List<(Position, Position)> _headers = [];

// Reading is a span slice — no heap involved
public bool TryGetHeader(int i, out ReadOnlySpan<byte> key, out ReadOnlySpan<byte> value)
{
    key   = _headers[i].Item1.GetData(Cache);   // span slice
    value = _headers[i].Item2.GetData(Cache);   // span slice
    return true;
}

This is what Buffer-Model Architecture means. The buffer is the source of truth. The model is a set of rules — offsets and sizes — layered on top. No materialization. No copying. Virtualized access to the same memory region.
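The idea fits in a few standalone lines (a sketch of the concept, not the actual LuciferCore types): the buffer holds the bytes; the "model" is just an (offset, size) pair layered on top.

```csharp
using System;
using System.Text;

// Sketch of the Buffer-Model idea: the buffer is the source of truth;
// the model is (offset, size) rules layered on top of it.
byte[] cache = "GET / HTTP/1.1"u8.ToArray();
(int Offset, int Size) first = (0, 3);   // the "Position": two ints, no object

// Virtualized access: a span slice into the same memory region, no copy.
ReadOnlySpan<byte> data = cache.AsSpan(first.Offset, first.Size);
Console.WriteLine(Encoding.ASCII.GetString(data));  // prints "GET"
```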

Even the Clone() operation is clean:

// Clone copies the cache bytes once, then shares position metadata
clone.Cache.Append(Cache.AsSpan());   // one memcpy
clone._first   = _first;              // two ints
clone._second  = _second;             // two ints
clone._headers = [.. _headers];       // list of int pairs

6. Unlimited Extensibility — The Part Binary Protocol Can Never Match

Here's the thing about a fixed binary schema: it's fixed. If you need a new field, you version the protocol, update all clients, deploy everywhere, handle backward compatibility. It's an engineering project just to add a field.

HttpModel has no schema. Adding a header is one line:

model.SetHeader("X-Game-Season"u8, "3"u8);
model.SetHeader("X-Player-Guild"u8, "ShadowBlade"u8);
model.SetHeader("X-Latency-Budget-Ms"u8, "50"u8);

These headers exist if present. If absent, they're absent. No version bump. No migration. No client update. The layout is infinitely extensible because it is not a schema — it's a pattern.

The same parser handles all of it. One function. ReceiveHeader() doesn't care how many headers you have or what they're called. It scans, marks, returns.


7. Nested Models — One Parser for Everything

Here's a capability that surprises people: HttpModel supports nesting.

The body of any HttpModel can itself contain one or more HttpModel instances. Each sub-model follows the same structure — start-line, headers, body. The body of a sub-model can contain further sub-models.

[Root Model]
  Token1: BatchRequest
  Token2: /api/game/sync
  Token3: v2
  Content-Type: multipart/model
  Model-Count: 3
  \r\n
  [Sub-Model 1]
    Hp: 80
    Atk: 120
    Def: 60
    \r\n
    [body data]
  [Sub-Model 2]
    ...
  [Sub-Model 3]
    ...

The same ReceiveHeader() / position-marking logic applies at every level. You don't write a new parser per payload type. You write one parser and reuse it recursively.

This means: one TCP connection, one buffer, one parse pass, heterogeneous payload types, multiplexed — and zero allocation on the parse side.
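A toy sketch of the recursion (a hypothetical helper, not the LuciferCore API): each level finds its own header terminator with one scan, then the same function descends into the body.

```csharp
using System;

// Count header blocks at every nesting level with one reusable function.
static int CountHeaderBlocks(ReadOnlySpan<byte> buf)
{
    int end = buf.IndexOf("\r\n\r\n"u8);            // one scan per level
    if (end < 0) return 0;                          // leaf body: no header block
    return 1 + CountHeaderBlocks(buf[(end + 4)..]); // recurse into the body
}

ReadOnlySpan<byte> msg = "Model-Count: 1\r\n\r\nHp: 80\r\nAtk: 120\r\n\r\ndata"u8;
Console.WriteLine(CountHeaderBlocks(msg));  // prints 2: root headers + one sub-model
```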


8. Real-World Demo: RequestModel and ResponseModel

The two most familiar instances of HttpModel are RequestModel and ResponseModel. Here's how they work in practice:

Building a request:

using var req = Lucifer.Rent<RequestModel>();

req.SetBegin("POST"u8, "/api/score/submit"u8)
   .SetHeader("X-Player-Id"u8, "player_98421"u8)
   .SetHeader("X-Season"u8, "3"u8)
   .SetBody("{\"score\":9800,\"level\":42}"u8);

// req.Cache is now the complete HTTP-formatted bytes
// ready to send over the wire — no serialization, no allocation

Building a response:

using var res = Lucifer.Rent<ResponseModel>();

res.SetBegin(200)
   .SetHeader("X-Request-Id"u8, requestId)
   .SetBody("{\"accepted\":true}"u8);

Parsing incoming bytes — the zero-alloc path:

// Incoming raw bytes land in a receive buffer
// ReceiveHeader() does one SIMD scan, marks positions, returns
bool headersDone = model.ReceiveHeader(buffer, offset, size);

// Now read only what you need — zero copy, zero alloc
var method  = model.MethodSpan;    // span slice into receive buffer
var url     = model.UrlSpan;       // span slice
var body    = model.BodySpan;      // span slice

// Need a specific header?
model.TryGetHeader(i, out var key, out var value);  // span slices

The tokens can be anything. This is what makes HttpModel general:

// For a game server session protocol:
model.SetBegin("CONNECT"u8, "room_44"u8, "GameProto/1"u8);

// For a pub/sub event stream:
model.SetBegin("PUBLISH"u8, "topic/sensor/temp"u8, "EventStream/2"u8);

// For a custom RPC:
model.SetBegin("CALL"u8, "UserService.GetProfile"u8, "RPC/1"u8);

Same model. Same parser. Same zero-alloc path. Different semantics — yours.


9. The Truth Nobody Says Out Loud: HttpModel IS a Binary Protocol

Here's the reframe that changes everything.

People draw a sharp line: "HTTP is text. Binary protocols are bytes."

That line is wrong.

HttpModel is all bytes. The receive buffer is raw bytes. The Position struct is two integers pointing into raw bytes. ReadOnlySpan<byte> accesses raw bytes. There is no string, no text, no Unicode anywhere in the hot path. HttpModel is a binary protocol.

The only difference from a "traditional" binary protocol is this:

|        | Traditional Binary    | HttpModel                         |
| ------ | --------------------- | --------------------------------- |
| Offset | Hardcoded constant    | Computed dynamically by SIMD scan |
| Size   | Hardcoded constant    | Computed dynamically by delimiter |
| Schema | Fixed at compile time | None — unlimited headers          |
| Fields | N fixed fields        | Infinite                          |

Traditional binary protocol: Offset = constant. Carved into the code at design time.

HttpModel: Offset = variable. Computed by the CPU at maximum hardware speed via delimiter scanning.

Same concept. Dynamic execution. Zero schema constraint.

This is the real definition of HttpModel: a binary protocol with variable offset/size, where the CPU computes the layout at runtime instead of the programmer hardcoding it at compile time.
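Side by side in standalone C# (illustrative, with hypothetical field names): the fixed protocol bakes the offset into the code; the delimiter-based layout computes it at runtime.

```csharp
using System;
using System.Buffers.Binary;

// Fixed binary protocol: the offset is a compile-time constant.
ReadOnlySpan<byte> fixedMsg = stackalloc byte[] { 0x2A, 0, 0, 0, 7, 0, 0, 0 };
int messageType = BinaryPrimitives.ReadInt32LittleEndian(fixedMsg);      // offset 0, hardcoded
int sequenceId  = BinaryPrimitives.ReadInt32LittleEndian(fixedMsg[4..]); // offset 4, hardcoded

// Delimiter-based layout: the offset/size pair is computed by a scan.
ReadOnlySpan<byte> delimited = "42 7 Proto/1\r\n"u8;
int space = delimited.IndexOf((byte)' ');           // boundary found at runtime
ReadOnlySpan<byte> firstToken = delimited[..space]; // (offset 0, size 2) → "42"

Console.WriteLine($"{messageType} {sequenceId} {space}");  // prints "42 7 2"
```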


10. Why HTTP Got a Bad Reputation — and Who's Actually Responsible

If HttpModel is this fast and this flexible, why does everyone say HTTP is slow?

Because the world turned it into an OOP nightmare.

Look at what a standard framework does when an HTTP request arrives:

// The OOP way — what most frameworks actually do
HttpContext context = new HttpContext(request);         // heap allocation
HttpRequest req     = new HttpRequest(context);         // heap allocation
Dictionary<string, string> headers = new();             // heap allocation
foreach (var header in rawHeaders)
{
    string key   = Encoding.UTF8.GetString(keyBytes);   // heap allocation
    string value = Encoding.UTF8.GetString(valueBytes); // heap allocation
    headers[key] = value;                               // heap allocation
}
string body = await new StreamReader(req.Body).ReadToEndAsync(); // heap allocation
MyDto dto   = JsonSerializer.Deserialize<MyDto>(body);           // heap allocation

Every single line allocates on the heap. One HTTP request → dozens of objects → GC pressure → slowdown.

Then engineers benchmark this and conclude: "HTTP is slow. Binary protocol is faster."

That conclusion is comparing the wrong things. They are not comparing HTTP vs. binary protocol. They are comparing OOP HTTP vs. DOD binary protocol. The variable being changed is not the wire format — it is the programming paradigm.

Flip it. Implement the binary protocol in OOP style:

// Binary protocol, OOP style — equally slow
var message = new Message();
message.Type      = BitConverter.ToInt32(buffer, 0);
message.UserId    = BitConverter.ToInt64(buffer, 4);
message.Timestamp = BitConverter.ToInt64(buffer, 12);
message.Name      = Encoding.UTF8.GetString(buffer, 20, nameLen); // heap
// ...20 more fields, all materialized into object properties

Same allocation pattern. Same GC pressure. Same performance collapse. Because the bottleneck was never the wire format. The bottleneck was object allocation.

Now implement HttpModel in DOD style — which is exactly what LuciferCore does — and the allocation count drops to zero. No strings. No dictionaries. No objects. Just two integers per field, pointing into a buffer that already exists.

HTTP was never slow. The OOP wrapper around it was slow. Those are not the same thing.

The binary protocol community earned its performance reputation by adopting DOD early — no alloc, no copy, span-based access. HttpModel takes that same DOD discipline and applies it to a layout that is infinitely more extensible. You get the speed of DOD binary protocol. You get the freedom of an unlimited schema. You get both, simultaneously.


11. Layout Is Not Just a Frontend Concept

We talk about "layout" in UI all the time — flex, grid, constraints, templates.

Backend engineers rarely use that word. But a data layout is precisely what HttpModel provides. It's a template with named slots: three start-line tokens, N key-value pairs, a body. Your job is to fill the slots. The model handles everything else — parsing, position tracking, span access, memory management.

Frontend has layout engines. Backend has binary protocols. HttpModel brings layout thinking to the backend — and that's why it parses at machine speed.

The layout doesn't constrain you. It liberates you. You define what the three tokens mean. You define what headers exist. You define the body format. The infrastructure never changes — only the semantics you layer on top.


12. Summary: Let the Machine Do Machine Work

Binary protocols make the programmer define exactly how the CPU reads memory. This is the CPU working at human speed — structured by human decisions, capped by human granularity.

HttpModel inverts this. The CPU scans at full hardware throughput, guided only by delimiters. The programmer reads from marked positions, on demand, touching only what they need.

|               | OOP HTTP (frameworks)          | DOD Binary Protocol      | HttpModel (DOD)                 |
| ------------- | ------------------------------ | ------------------------ | ------------------------------- |
| Parse unit    | Deserialize to object          | Per field (fixed offset) | Per delimiter (SIMD scan)       |
| Parse speed   | Slow (alloc-bound)             | Fast                     | Machine-paced                   |
| Allocation    | Massive (string, dict, object) | Low                      | Zero by architecture            |
| Extensibility | Limited by object model        | Schema change required   | Add a header, done              |
| Nesting       | Framework-dependent            | Requires new parser      | Recursive, one parser           |
| Universality  | HTTP semantics only            | Fixed per protocol       | Tokens are yours to define      |
| Offset/Size   | Object properties              | Hardcoded constants      | Dynamically computed at runtime |

The core philosophy:

HttpModel is a binary protocol. The offset and size are not hardcoded by the programmer — they are computed by the CPU at full SIMD speed. That is the only difference. And that difference gives you infinite extensibility for free.

Parse is the CPU's job. Do it at CPU speed — all at once, no human-defined order.
Read is the programmer's job. Do it at human speed — lazily, only what you need.

Binary protocol merges these two jobs. OOP HTTP drowns both of them in allocation. HttpModel keeps them separate and lets each run at its natural speed.


Implementation

Everything described here is implemented in LuciferCore: HttpModel, RequestModel, ResponseModel, Buffer, Position, and the full Buffer-Model Architecture.

"Let the machine scan. Let the human choose."

Top comments (3)

Thuangf45

Worth clarifying: HttpModel here is not tied to HTTP/1.1 the transport protocol. It's the layout pattern — start-line, headers, body, delimited by \r\n. HTTP/2 and HTTP/3 changed the framing layer, but the layout thinking described here operates one level below that. You can apply this same DOD approach regardless of which version sits underneath.

Thuangf45

FlatBuffers and Cap'n Proto are excellent — and yes, they achieve zero-copy too. The key difference is schema flexibility. Those formats require a compiled schema and a version contract. HttpModel requires neither. You add a header in one line, no codegen, no recompile, no client update. The zero-alloc guarantees are comparable; the extensibility is not.

Thuangf45

Raw Span gives you the memory access but not the structure. You still need to decide how to scan, where delimiters are, which offsets map to which fields, and how to handle nesting. HttpModel is that structure — a reusable, zero-alloc layout engine built on top of spans. It's the difference between having a fast car and having a road to drive it on.