Ishtmeet Singh

Posted on May 17

Buffer: Bytes, Encoding, and TypedArray Memory


Note: this is a chapter from Volume 1 of my book, NodeBook. You can read all of Volume 1 for free on the site.

A Buffer is not a string with different methods. It is a fixed-size, mutable sequence of bytes, meant for the part of I/O that exists before data has been interpreted as text, JSON, an image, a hash, a protocol field, or anything else.

That byte-level boundary appears everywhere. Files, TCP sockets, TLS records, compressed payloads, database frames, and image headers all cross into the runtime as bytes first. A Buffer keeps those bytes intact: it stores values from 0 to 255, exposes direct indexed access, supports explicit encoding and decoding, and inherits from Uint8Array so it fits the modern typed-array model.

The hard part is rarely the call to Buffer.from() or buf.toString(). It is knowing whether the value in front of you is still raw data, or whether some earlier API has already assigned it a meaning.

Bytes Before Meaning

A byte is eight bits, which gives it 256 possible values: 0 through 255. That is why byte-oriented APIs keep returning numbers in that interval. At this level, the number is only a stored value.

For example, the byte 0x49 is the decimal value 73. Under ASCII, 73 maps to the character I. In an image format, the same value could be a color component. In a compressed stream, it could be part of a length code. In a network packet, it could be part of an address or checksum. The byte does not carry that meaning by itself. The format supplies it.

Hexadecimal is the usual way to print bytes because two hex digits line up with one byte:

0x00 -> 0
0x0a -> 10
0x49 -> 73
0xff -> 255

That is why Node prints Buffer.from("HI") as <Buffer 48 49>. The Buffer has not become a hex string; Node is displaying the two bytes in fixed-width hexadecimal notation. Under UTF-8 or ASCII, 0x48 is H and 0x49 is I.
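A minimal sketch of that distinction, relying only on the global Buffer: the same two bytes can be listed as decimal values, rendered as hex text, or decoded as characters, and none of those views changes what is stored. The variable names are illustrative.

```javascript
// The global Buffer is used here, so no import is needed in Node.
const greeting = Buffer.from("HI");

// Decimal view of the stored byte values.
const decimalValues = [...greeting];
console.log(decimalValues); // -> [ 72, 73 ]

// Hex is only a text rendering of those same values.
const hexText = greeting.toString("hex");
console.log(hexText); // -> 4849

// Decoding applies a character mapping to the bytes.
const decoded = greeting.toString("ascii");
console.log(decoded); // -> HI
```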

As soon as a format combines bytes into larger values, one more rule appears: byte order. A two-byte integer made from 0x12 and 0x34 can be read as 0x1234 or 0x3412, depending on whether the format uses big-endian or little-endian order. Node exposes that choice directly with methods such as buf.readUInt16BE() and buf.readUInt16LE(). The bytes are the same; the interpretation changes.
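The two readings described above can be checked directly. This is a small sketch using the global Buffer; the two-byte payload is just the example from the paragraph, not a real protocol field.

```javascript
// Two bytes in memory order: 0x12 first, 0x34 second.
const pair = Buffer.from([0x12, 0x34]);

// Big-endian: the first byte is the most significant.
const bigEndian = pair.readUInt16BE(0);
console.log(bigEndian.toString(16)); // -> 1234

// Little-endian: the first byte is the least significant.
const littleEndian = pair.readUInt16LE(0);
console.log(littleEndian.toString(16)); // -> 3412

// The stored bytes did not change between the two reads.
console.log(pair.toString("hex")); // -> 1234
```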

Characters Are Not Bytes

Most JavaScript application code works after that interpretation has already happened. A request body has become a string, JSON has become an object, and form fields have been normalized by a framework. Buffer code usually sits one layer earlier, where the runtime still has to preserve the bytes exactly as they arrived.

A JavaScript string is an immutable sequence of UTF-16 code units. That language-level model is useful for text, but it is not a raw-byte container. Some characters occupy one code unit, others occupy two, and their UTF-8 byte length is a separate measurement again.

import { Buffer } from "node:buffer";

const text = "\u00e9\u{1f642}";

console.log(text.length); // -> 3 UTF-16 code units
console.log(Buffer.byteLength(text, "utf8")); // -> 6 bytes
console.log(Buffer.from(text, "utf8").length); // -> 6 bytes

Buffer.byteLength() answers the byte question: how much space the string will occupy under a specific encoding. text.length answers the string question: how many UTF-16 code units the language sees.

Bugs begin when code treats those two questions as interchangeable. If arbitrary file bytes are decoded into a string, the runtime has to apply text rules to data that may not be text at all.

Text Decoding Corrupts Binary Data

The next example writes a tiny binary fixture that starts with the PNG file signature. It is not a complete image; the first eight bytes are enough to show the failure. Real PNG files hit the same path because the first byte is 0x89, which is not valid as the first byte of a UTF-8 sequence.

import { Buffer } from "node:buffer";
import { readFileSync, writeFileSync } from "node:fs";

const bytes = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
writeFileSync("sample.bin", bytes);

const text = readFileSync("sample.bin", "utf8");
writeFileSync("sample-corrupted.bin", text);

console.log(JSON.stringify(text)); // -> "\ufffdPNG\r\n\u001a\n"
console.log(readFileSync("sample-corrupted.bin").toString("hex"));

The corrupted output starts with efbfbd504e47.... The original started with 89504e47..., so the damage happened at the first byte.

That change is not cosmetic. The 'utf8' argument told Node to decode the bytes as UTF-8 text. Node's normal string-decoding path is non-fatal: invalid UTF-8 byte sequences are represented with U+FFFD, the Unicode replacement character. When that replacement character is written back as UTF-8, it becomes the three bytes ef bf bd.

At that point, the original 0x89 byte is gone. No later API can recover it from the string.

This is why file size can change after a "copy" that routes binary data through text. Invalid bytes may expand into replacement-character bytes, valid text-looking regions may survive, and the final payload is no longer the same byte sequence. A real image viewer, decompressor, crypto verifier, or protocol parser will reject that output because the bytes no longer match the format.
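The size change can be reproduced in memory without touching the file system. The sketch below also shows one way to detect the problem early: the standard TextDecoder, available as a global in Node, accepts a fatal option that throws on invalid input instead of substituting U+FFFD. Whether a strict decoder belongs in a given code path is a design decision this example does not settle.

```javascript
// Same eight signature bytes as the fixture above, kept in memory.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Route the bytes through a UTF-8 string and back.
const roundTripped = Buffer.from(original.toString("utf8"), "utf8");

console.log(original.length); // -> 8
console.log(roundTripped.length); // -> 10 (0x89 became ef bf bd)
console.log(original.equals(roundTripped)); // -> false

// A strict decoder refuses the input instead of silently replacing bytes.
const strict = new TextDecoder("utf-8", { fatal: true });
let rejected = false;
try {
  strict.decode(original);
} catch (err) {
  rejected = true;
}
console.log(rejected); // -> true
```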

Changing encodings does not fix the design error. Node's latin1 encoding maps each byte value from 0 to 255 to a code point from U+0000 to U+00FF, so reading with latin1 and writing with latin1 can round-trip byte values:

const bytes = Buffer.from([0x41, 0x89, 0xff]);
const text = bytes.toString("latin1");

console.log(Buffer.from(text, "latin1")); // -> <Buffer 41 89 ff>
console.log(Buffer.from(text)); // -> <Buffer 41 c2 89 c3 bf>

The default string-to-Buffer path encodes strings as UTF-8, so the second conversion changes the bytes. latin1 also hides the real type of the value. Code now holds image data, compressed data, or protocol data in a string, and the next API that treats the value as text can corrupt it.

The correct fix is simpler: do not decode data that is not text.

import { Buffer } from "node:buffer";
import { readFileSync, writeFileSync } from "node:fs";

const data = readFileSync("sample.bin");

console.log(Buffer.isBuffer(data)); // -> true
writeFileSync("sample-copy.bin", data);

With no encoding argument, readFileSync() returns a Buffer. writeFileSync() writes those bytes back out. No text decoder runs, and no replacement characters are introduced.

Where Buffer Memory Is Accounted

A Buffer object is a JavaScript value, but its payload is not stored as a normal graph of JavaScript objects. Ordinary Node-created Buffers are views over ArrayBuffer-backed storage. V8 and Node account for that storage as external memory; process.memoryUsage().arrayBuffers reports memory allocated for ArrayBuffer, SharedArrayBuffer, and Buffer backing stores, and that amount is included in external.

That accounting difference matters when debugging memory. A service can have stable heapUsed while external, arrayBuffers, or RSS grows because large binary payloads live outside the ordinary object heap. The JavaScript Buffer object keeps the backing store reachable, but the payload bytes are not object properties for the garbage collector to scan as a JavaScript object graph.
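One rough way to see this accounting is to sample process.memoryUsage() around a large allocation. The sketch below is only an observation aid: the exact numbers depend on the Node version, platform, and garbage-collection timing, so no specific output is shown. The 32 MiB size and the grewBy helper are arbitrary choices for illustration.

```javascript
// Snapshot the counters before allocating.
const before = process.memoryUsage();

// 32 MiB of zero-filled bytes, held by a live reference.
const payload = Buffer.alloc(32 * 1024 * 1024);

// Snapshot again while the Buffer is still reachable.
const after = process.memoryUsage();

// Hypothetical helper: difference for one memoryUsage() field.
const grewBy = (field) => after[field] - before[field];

console.log("heapUsed delta:", grewBy("heapUsed"));
console.log("external delta:", grewBy("external"));
console.log("arrayBuffers delta:", grewBy("arrayBuffers"));
console.log("payload bytes:", payload.length);
```

On a typical run the arrayBuffers delta is close to the payload size while the heapUsed delta stays comparatively small, which matches the pattern described above.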

For this chapter, the lifecycle can stay at a high level. If live JavaScript objects still reference a Buffer or a view into its backing store, the backing bytes must stay alive. After those references are gone, the backing store can be released to the allocator. That does not mean the process immediately returns resident memory to the operating system. Native allocator behavior, pooling, memory pressure, and platform policy decide when RSS falls.

This explains why Buffer exists as a dedicated API rather than as plain arrays of numbers. Node needs a byte container that can cross native I/O boundaries without turning every payload into a string or an array of boxed numeric values. History is the other reason for the name: Buffer predates today's standardized typed-array APIs, and Node still keeps the API because it carries Node-specific allocation, encoding, and binary-reading methods.

Creating Buffers Without Guesswork

Once code has decided that a value should remain bytes, the next question is how those bytes are allocated. The deprecated new Buffer(...) constructor should not appear in modern code. Its overloads are ambiguous: the same call allocates, copies, or encodes depending on the argument type, and before Node.js 8 new Buffer(size) returned uninitialized memory. Use explicit factory methods instead.

Buffer.alloc(size) creates a Buffer of the requested length and fills every byte with zero:

import { Buffer } from "node:buffer";

const buf = Buffer.alloc(10);

console.log(buf);
// -> <Buffer 00 00 00 00 00 00 00 00 00 00>

Zero-filling costs work, but it gives the caller a known starting state. Buffer.allocUnsafe(size) skips that initialization. It may return memory whose contents are unknown until your code overwrites them. Use it only when the next operation writes every byte before any read, log, send, or exception path can observe the contents.
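As a sketch of that rule, the following builds a small frame with Buffer.allocUnsafe() and writes every byte before anything reads it. The "HDR1" tag and the 32-bit value are made-up fields for illustration, not part of any real format.

```javascript
// Hypothetical 4-byte tag followed by a 4-byte big-endian value.
const header = Buffer.from("HDR1", "ascii");

// Uninitialized memory: contents are unknown until overwritten.
const frame = Buffer.allocUnsafe(header.length + 4);

// Bytes 0..3 are overwritten by the tag.
header.copy(frame, 0);

// Bytes 4..7 are overwritten by the integer, so no stale byte remains.
frame.writeUInt32BE(0xdeadbeef, header.length);

console.log(frame.toString("hex")); // -> 48445231deadbeef
```

If any code path could skip one of the writes, for example an early return or a thrown error, Buffer.alloc() is the safer default.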

Buffer.from(value) starts from existing data instead of reserving empty space. The important detail is ownership, and that depends on the input type:

const original = Buffer.from("hello");
const copy = Buffer.from(original);

copy[0] = 0x48;

console.log(copy.toString()); // -> "Hello"
console.log(original.toString()); // -> "hello"

Buffer.from(buffer) copies the bytes into new storage, so mutating the copy does not mutate the original.

Buffer.from(arrayBuffer) is different:

const store = new ArrayBuffer(3);
const bytes = new Uint8Array(store);
const buf = Buffer.from(store);

buf[0] = 0x41;
bytes[1] = 0x42;

console.log(buf.toString("utf8")); // -> "AB\u0000"

Here the Buffer and the Uint8Array share the same backing memory. Writes through either view are visible through the other. That behavior is useful when the API contract says ownership is shared, and dangerous when the caller expected a copy.
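When the caller does expect a copy, one option is to pass a typed-array view instead of the ArrayBuffer itself, because Buffer.from(typedArray) copies the contents. The sketch below contrasts that with a shared window created through the byteOffset and length arguments. Variable names are illustrative.

```javascript
const store = new ArrayBuffer(4);

// Shared window over bytes 1 and 2 of the ArrayBuffer.
const shared = Buffer.from(store, 1, 2);

// Independent copy of the current contents (all zeros at this point).
const copied = Buffer.from(new Uint8Array(store));

// Mutate the original storage after both Buffers exist.
new Uint8Array(store).set([1, 2, 3, 4]);

console.log([...shared]); // -> [ 2, 3 ]
console.log([...copied]); // -> [ 0, 0, 0, 0 ]
```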

Reading and Writing Bytes

After a Buffer exists, indexed access follows Uint8Array behavior. Reads return byte values. Writes coerce values into the byte range rather than validating the application's intent.

const buf = Buffer.alloc(2);

buf[0] = 300;
buf[1] = -1;

console.log([...buf]); // -> [44, 255]

When invalid input should throw instead of being coerced, use the range-checked methods:

const buf = Buffer.alloc(1);

buf.writeUInt8(255, 0); // ok

try {
  buf.writeUInt8(300, 0);
} catch (err) {
  console.log(err.code); // -> ERR_OUT_OF_RANGE
}

Decoding a Buffer with toString() is safe when the encoding matches the data. Hex and Base64 are text representations of arbitrary bytes; UTF-8 is safe only when the bytes contain valid UTF-8 text.

const data = Buffer.from("my-super-secret-password");

console.log(data.toString("hex"));
// -> 6d792d73757065722d7365637265742d70617373776f7264

console.log(data.toString("base64"));
// -> bXktc3VwZXItc2VjcmV0LXBhc3N3b3Jk
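The same holds for bytes that are not text at all. This sketch round-trips four arbitrary byte values, including ones that are invalid as UTF-8, through hex and Base64 and checks that they come back unchanged.

```javascript
// Arbitrary bytes; 0x89 and 0xff are not valid UTF-8 on their own.
const raw = Buffer.from([0x00, 0x89, 0xff, 0x10]);

const asHex = raw.toString("hex");
const asBase64 = raw.toString("base64");

console.log(asHex); // -> 0089ff10
console.log(asBase64); // -> AIn/EA==

// Decoding the text representations restores the exact bytes.
const fromHex = Buffer.from(asHex, "hex");
const fromBase64 = Buffer.from(asBase64, "base64");

console.log(raw.equals(fromHex)); // -> true
console.log(raw.equals(fromBase64)); // -> true
```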

Writing text into a Buffer performs the opposite operation: the string is encoded into bytes at a chosen offset. The next snippet is a protocol-layout example, not how application code should normally send HTTP responses from Node.

const response = Buffer.alloc(128);

let offset = response.write("HTTP/1.1 200 OK\r\n");
offset += response.write("Content-Type: text/plain\r\n", offset);
offset += response.write("\r\n", offset);

console.log(offset); // -> 45
console.log(response.toString("utf8", 0, offset));

The return value from write() is the number of bytes written, not the number of JavaScript characters consumed. That difference starts to matter as soon as non-ASCII text appears in a binary layout.
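A short sketch of that difference, using a string with one non-ASCII character. The 16-byte field size is an arbitrary choice for the example.

```javascript
const field = Buffer.alloc(16);
const label = "caf\u00e9"; // "café"

// write() returns the number of bytes stored, not string length.
const written = field.write(label, 0, "utf8");

console.log(label.length); // -> 4 UTF-16 code units
console.log(written); // -> 5 bytes

// Advancing an offset by label.length would cut the last byte off.
console.log(field.subarray(0, written).toString("hex")); // -> 636166c3a9
```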

Buffer and Uint8Array

The indexed behavior comes from a larger relationship: modern Buffer deliberately aligns with the standard typed-array model. Buffer has inherited from Uint8Array since v3.0.0, which was an io.js release; the change reached Node.js-branded releases in v4.0.0.

import { Buffer } from "node:buffer";

const buf = Buffer.alloc(10);

console.log(buf instanceof Buffer); // -> true
console.log(buf instanceof Uint8Array); // -> true

That inheritance means a Buffer can be passed to APIs whose contract accepts arbitrary Uint8Array instances. The contract language still matters. Some APIs check exact constructors, transfer ownership, run in browser-only environments where Buffer is unavailable, or attach semantic requirements beyond "has bytes." A Buffer is a Uint8Array; that does not make every byte-taking API a Buffer API.
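As an illustration, a function written against the Uint8Array contract accepts a Buffer without any conversion. The sumBytes helper below is hypothetical and exists only to show the type check passing for both kinds of input.

```javascript
// Hypothetical helper whose contract is "any Uint8Array".
function sumBytes(bytes) {
  if (!(bytes instanceof Uint8Array)) {
    throw new TypeError("expected a Uint8Array");
  }
  let total = 0;
  for (const value of bytes) {
    total += value;
  }
  return total;
}

// A Buffer satisfies the check because it is a Uint8Array subclass.
const fromBuffer = sumBytes(Buffer.from([1, 2, 3]));
console.log(fromBuffer); // -> 6

// A plain Uint8Array works the same way.
const fromTypedArray = sumBytes(new Uint8Array([250, 10]));
console.log(fromTypedArray); // -> 260
```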

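The inheritance also leaves one legacy incompatibility to watch for: Buffer.prototype.slice() creates a view over the same memory, while TypedArray.prototype.slice() creates a copy. Node's documentation now marks buf.slice() as deprecated in favor of buf.subarray() for that reason.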

const source = Buffer.from("abcd");
const view = source.slice(1, 3);

view[0] = 0x5a;

console.log(source.toString()); // -> "aZcd"

Use buf.subarray() when you want a view and want the name to match modern typed-array code. When you need independent bytes, use Buffer.from(buf.subarray(start, end)) or Uint8Array.prototype.slice.call(buf, start, end).
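A minimal side-by-side sketch of the two choices, with illustrative variable names: the subarray() view writes through to the source, while the Buffer.from() copy does not.

```javascript
const source = Buffer.from("abcd");

// View: shares memory with source.
const view = source.subarray(1, 3);

// Copy: Buffer.from(buffer) allocates new storage for "bc".
const copy = Buffer.from(source.subarray(1, 3));

view[0] = 0x5a; // "Z", visible through source
copy[1] = 0x21; // "!", visible only through copy

console.log(source.toString()); // -> aZcd
console.log(copy.toString()); // -> b!
```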

The Backing Store Is Not Always the View

A Buffer is a view with three pieces of location information: the underlying backing buffer, a byte offset, and a byte length. The backing store may be larger than the Buffer view, especially for small pooled allocations.

const buf = Buffer.from("abc");
const backing = buf.buffer;

console.log(buf.byteOffset); // -> 0 or an offset into a pool
console.log(buf.byteLength); // -> 3
console.log(backing.byteLength >= buf.byteLength); // -> true

const exact = backing.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
console.log(exact.byteLength); // -> 3

For that reason, never pass buf.buffer by itself when the receiver needs exactly the bytes in buf. Pass the Buffer, pass a Uint8Array view, or pass the backing buffer together with byteOffset and byteLength.
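A sketch of the safe hand-off: constructing a Uint8Array from the backing buffer plus byteOffset and byteLength yields exactly the Buffer's bytes, whereas a view over the whole backing buffer may expose unrelated pool contents. How large the backing store is in practice depends on Node's pooling, so only the relationship is checked here.

```javascript
const buf = Buffer.from("abc");

// Exact window: same backing memory, same three bytes.
const exactView = new Uint8Array(buf.buffer, buf.byteOffset, buf.byteLength);

// Whole backing store: may be much larger than the Buffer.
const wholeStore = new Uint8Array(buf.buffer);

console.log(exactView.length); // -> 3
console.log(wholeStore.length >= exactView.length); // -> true
console.log(Buffer.from(exactView).toString()); // -> abc
```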

Most Buffers created through Node's ordinary allocation paths expose an ArrayBuffer through .buffer, but the API is broader than that. A Buffer can also be created over a SharedArrayBuffer:

const shared = new SharedArrayBuffer(4);
const buf = Buffer.from(shared);

console.log(buf.buffer instanceof SharedArrayBuffer); // -> true
console.log(buf.buffer instanceof ArrayBuffer); // -> false

Code that works with .buffer should therefore treat it as the underlying backing buffer, not as a guarantee that the value is always an ArrayBuffer.

The same view model also allows different typed views to expose the same backing memory at the same time:

const backing = new ArrayBuffer(4);
const bytes = new Uint8Array(backing);
const view = new DataView(backing);

bytes.set([0xff, 0xff, 0xff, 0x7f]);

console.log(view.getInt32(0, true)); // -> 2147483647

The Uint8Array writes four individual bytes. The DataView reads those same bytes as a little-endian 32-bit signed integer. No copy happens between the two operations; both views expose the same memory.
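Byte order applies to these shared views too. As a sketch, the same four bytes can be read as big-endian through the DataView, or through a Buffer created over the same ArrayBuffer, and the result is a different integer.

```javascript
const backing = new ArrayBuffer(4);
new Uint8Array(backing).set([0xff, 0xff, 0xff, 0x7f]);

const view = new DataView(backing);
const buf = Buffer.from(backing); // shares memory, no copy

// Little-endian: 0x7fffffff
console.log(view.getInt32(0, true)); // -> 2147483647
console.log(buf.readInt32LE(0)); // -> 2147483647

// Big-endian: 0xffffff7f, which is negative as a signed value.
console.log(view.getInt32(0, false)); // -> -129
console.log(buf.readInt32BE(0)); // -> -129
```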

The Boundary Streams Build On

Buffer sits at the boundary between JavaScript's string model and byte-oriented I/O. It gives Node a fixed-size, mutable byte sequence, participates in the Uint8Array ecosystem, and keeps backing bytes visible to native I/O without pretending that arbitrary bytes are text.

That model becomes the basis for the next layer of I/O. Streams do not move one giant payload through the process. They move sequences of Buffer chunks, and bugs in chunk ownership, decoding, copying, and retention trace back to the Buffer rules this chapter established.
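One chunk-level decoding bug can be sketched without any stream API. Here a two-byte UTF-8 character is split across two hypothetical chunks: decoding each chunk on its own produces replacement characters, while concatenating the bytes first preserves the text.

```javascript
// "é" is two bytes in UTF-8: c3 a9.
const whole = Buffer.from("\u00e9", "utf8");

// Simulate a chunk boundary falling in the middle of the character.
const chunks = [whole.subarray(0, 1), whole.subarray(1)];

// Decoding per chunk applies text rules to incomplete sequences.
const naive = chunks.map((chunk) => chunk.toString("utf8")).join("");
console.log(naive === "\ufffd\ufffd"); // -> true

// Keeping bytes as bytes until the payload is complete avoids the damage.
const joined = Buffer.concat(chunks).toString("utf8");
console.log(joined === "\u00e9"); // -> true
```

The point is the same one made earlier for files: the bytes should stay bytes until there is enough of them to interpret.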
