Price feeds hit your server: about forty thousand messages per second, each carrying an order ID, a timestamp, a price and a quantity. Node.js receives them over TCP, and before your application logic even touches them, you're already paying a cost, several times over. Understanding what that cost is, and how to eliminate it, is what separates a Node.js service that struggles under load from one that holds up.
## What happens when data arrives?
When a TCP packet lands, your OS writes it into a kernel buffer. libuv (the library that gives Node its asynchronous I/O) copies that packet from the kernel buffer into a Node Buffer: that's copy number one. Then the `data` event fires. So far, so good. Most code then does something like this:
That is fine.
Under load, it's a problem. `chunk.toString()` allocates a new string: copy number two. `JSON.parse` then walks that string character by character, allocating a new object for every key-value pair: copies three, four, five, and counting. You get the gist. Each allocation lands on the V8 heap, and each object is something the garbage collector has to deal with.
For a low-frequency API, this is a non-issue. For a market feed pushing tens of thousands of messages per second, you're churning memory faster than you're processing data, and garbage collection pauses will start showing up in your latency.
We have to stop copying.
## Buffers, ArrayBuffers and the memory model
Before working with binary data directly, let's understand how Node organizes memory. A Buffer in Node is a piece of memory allocated outside the V8 heap. That matters because off-heap memory doesn't go through the same garbage collection cycles as regular JS objects. Internally, every Buffer is backed by an ArrayBuffer.
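A quick sketch of the two allocation paths, and the ArrayBuffer sitting underneath:

```javascript
// Zero-filled allocation: the safe default.
const safe = Buffer.alloc(25);

// Uninitialized allocation: faster, but holds whatever bytes were in memory.
const fast = Buffer.allocUnsafe(25);

// Every Buffer is backed by an ArrayBuffer.
console.log(fast.buffer instanceof ArrayBuffer); // true
```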
`allocUnsafe` skips zeroing the memory, which is appropriate when you're about to overwrite every byte anyway. An ArrayBuffer itself has no methods for reading or writing; it's just raw memory. What gives you access to it are TypedArrays and DataView. These are views over an existing ArrayBuffer: they don't own or copy data, they simply reference a region of memory and give you typed access to the bytes in it.
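A sketch of a DataView over a Buffer. The `byteOffset` and `byteLength` arguments matter: small Buffers are slices of a shared allocation pool, so `buf.byteOffset` is often nonzero.

```javascript
const buf = Buffer.allocUnsafe(25);

// Pass byteOffset and byteLength so the view covers exactly this Buffer,
// not the whole shared pool behind it.
const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);

view.setUint8(0, 1);          // message type
view.setFloat64(13, 101.25);  // price, big-endian by default
console.log(view.getFloat64(13)); // 101.25
```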
This single pattern, constructing a DataView over a Buffer's underlying ArrayBuffer, is the basis of zero-copy parsing in Node.
## Parsing a binary frame without copying
A simplified binary order message on a market feed might look like this:
| Field | Type | Size |
|---|---|---|
| Message type | Uint8 | 1 byte |
| Timestamp | BigUint64 | 8 bytes |
| Order id | Uint32 | 4 bytes |
| Price | Float64 | 8 bytes |
| Quantity | Uint32 | 4 bytes |
Total: 25 bytes for each message
The simpler approach would be to convert the buffer to a string and split on delimiters, or to wrap each message in JSON on the producer side (that is, where the data is coming from). Neither helps performance at scale. Here's the zero-copy approach:
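A sketch of the parser, with offsets following the 25-byte layout in the table above; the field names are illustrative, not from a real protocol:

```javascript
// Reads one 25-byte order frame without converting it to a string.
function parseOrderMessage(buf) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  return {
    messageType: view.getUint8(0),      // offset 0, 1 byte
    timestamp:   view.getBigUint64(1),  // offset 1, 8 bytes
    orderId:     view.getUint32(9),     // offset 9, 4 bytes
    price:       view.getFloat64(13),   // offset 13, 8 bytes
    quantity:    view.getUint32(21),    // offset 21, 4 bytes
  };
}
```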
There is no string conversion or JSON parsing. DataView reads bytes directly from the underlying memory, interpreting them as the type you specify. `getUint32`, `getFloat64` and the other accessors take an explicit endianness flag (defaulting to big-endian), so byte order is handled for you.
One caveat: the returned object literal is itself still allocated on the V8 heap. If you're allocating a new object on every message at forty thousand messages per second, you still have load on the heap, just less of it.
## Sharing memory across threads with SharedArrayBuffer
Zero-copy becomes truly powerful in a parallelism context. Normally, when you pass data to a worker thread, Node serialises it with the structured clone algorithm, a copy with costs comparable to `JSON.stringify` and `JSON.parse`. For a 25-byte message that cost is small, but when you're dealing with a large order book or a batch of historical tick data, it compounds fast.
SharedArrayBuffer removes that cost entirely. Instead of copying data into the worker, both threads access the same physical memory.
The postMessage call transfers the SharedArrayBuffer reference, an object pointer, not 25 bytes of market data. The worker reads directly from the same physical memory the main thread wrote into. No serialization and no copy.
If you need true zero-copy transfer (relinquishing ownership rather than sharing it), a regular ArrayBuffer can be transferred with postMessage:
The tradeoff: transferred buffers are immediately unusable (detached) in the main thread, while shared buffers require you to manage concurrent access between the main and worker threads; if both read and write simultaneously, you get data races. For a producer-consumer pattern where the main thread writes and the worker thread reads, data races are typically avoidable by design.
## Pre-allocation: the other half of the equation
Zero-copy parsing eliminates unnecessary data duplication, but the allocation cost is still real. If parseOrderMessage returns a new object literal on every call, V8 is creating and discarding forty thousand objects per second. The garbage collector will pause to collect them.
The solution is pre-allocation: create your result object once and mutate it in place.
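One way to sketch it, reusing the hypothetical 25-byte layout from earlier:

```javascript
// One reusable result object with a fixed shape; its fields are
// overwritten on every message instead of allocating a new literal.
const order = { messageType: 0, timestamp: 0n, orderId: 0, price: 0, quantity: 0 };

function parseOrderMessageInto(buf, out) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  out.messageType = view.getUint8(0);
  out.timestamp   = view.getBigUint64(1);
  out.orderId     = view.getUint32(9);
  out.price       = view.getFloat64(13);
  out.quantity    = view.getUint32(21);
  return out;
}
```

The flip side is that the caller must consume or copy any fields it needs to keep before the next message overwrites them.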
Now your hot path mutates a stable object rather than building and discarding new ones. V8's hidden class optimizations also work best on objects with a fixed shape; property reads and writes become significantly cheaper.
## Where this leaves you
Zero-copy buffer manipulation in Node doesn't make V8's garbage collector disappear; garbage collection pauses remain the ceiling on the latency guarantees you can make. What zero-copy does is remove the overhead you can control: serialization costs, redundant data copies, and allocation churn on the hot path.
For a high-throughput market data pipeline, that matters. A DataView over a Buffer's underlying ArrayBuffer gives you direct binary access at almost zero cost, SharedArrayBuffer lets you hand that data to worker threads without copying, and pre-allocated result objects stop you from generating garbage collector work on every message.
Together, these patterns get Node as close to its latency floor as the runtime allows. It is not a replacement for C++ where deterministic microsecond execution is a hard requirement, but a substantial, practical improvement over the string-and-JSON default that most services start with.
