Codeguage

Posted on Jul 19 • Originally published at codeguage.com

A Quick Primer on Buffers in Node.js

#webdev #javascript #node #backend

Introduction
What are buffers?
- Buffer wrapped by new, core JavaScript APIs
Creating a buffer
Writing data
- Don't assign characters!
- write()
Reading data
- Reading via bracket notation
- Reading via toString()
Buffers, strings, and encodings

Introduction

Node brings system handling capabilities into JavaScript. Things like working with files (including binary files of course), with network sockets, with multithreading, and so on, are all normal for Node.

Much of this relies on working with binary data efficiently and that's precisely where buffers enter the game.

In this article, we shall learn about buffers in Node; how they work under the hood; the Buffer class; how to work with it; and much more.

Let's get started.

What are buffers?

At the core, the concept of a buffer in programming is a pretty simple one.

A buffer is a chunk of memory where given data is stored.

And that's just it — a small area of memory that we use to store data and obviously also work with it.

In Node, a buffer represents the same concept. It provides us with a chunk of memory to work with and to efficiently store binary data in it.

We can easily store binary data, read individual bytes, transform those bytes, overwrite certain bytes, and whatnot. (A buffer's size is fixed once it's created, so bytes can be overwritten but not removed.)

As stated earlier, because the environment in which Node operates intrinsically revolves around binary data, having a robust API (and the skills to work with it) is important.

This API — a fairly low-level one — is Buffer.

Buffer wrapped by new, core JavaScript APIs

Ever since the advent of ES6, JavaScript has had native provision of array buffers and typed arrays (to lay out views over those buffers). In other words, core JavaScript already provides us with a plethora of interfaces to work with buffers.

But Node historically had its own implementation for buffers — that is, the Buffer API — which got merged with the native implementation of buffers in core JavaScript following ES6.

So today, Buffer in Node is basically just an extension of the built-in Uint8Array class in JavaScript. And Buffer exists to date and is the one that Node uses internally in its different modules.

So while you could also directly interface with the core buffer APIs in JavaScript, using the Buffer API feels more at home in the Node environment.
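As a quick sanity check of this relationship, here is a minimal sketch (the variable names are mine, purely for illustration) showing that a buffer passes instanceof checks for both classes and can use ordinary Uint8Array methods:

```javascript
// Buffer is available as a global in Node, so no import is needed for this sketch.
const buf = Buffer.alloc(4);

// A buffer is recognized as both a Buffer and a Uint8Array.
const isBuffer = buf instanceof Buffer;
const isUint8Array = buf instanceof Uint8Array;

// Typed array methods such as fill() and reduce() work on it as well.
buf.fill(7);
const total = buf.reduce((sum, byte) => sum + byte, 0);

console.log(isBuffer, isUint8Array, total); // true true 28
```

This is also why a buffer can be handed to any API that expects a Uint8Array.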

Creating a buffer

There are a handful of ways of creating a buffer in Node.

We can either:

- Provide an integer representing the size of the buffer, in bytes, to create.
- Provide a string to store in the buffer.
- Provide an array of integers to store in the buffer.
- Copy an existing buffer.

And there are even more granular ways but I'll avoid making this discussion complex and rather focus on the most common approaches.

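💡 Notice: In the past, another way to create a buffer was to invoke Buffer() as a constructor, i.e. new Buffer(). However, this has long been deprecated because its behavior depended on the type of its argument and, given a number, older versions of Node returned uninitialized memory.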

Providing a size in bytes

One of the most straightforward ways to create a new buffer is to provide a size, in units of bytes, to the Buffer.alloc() static method.

Syntactically, this could be expressed as follows:

Buffer.alloc(size)

size is an integer representing the size of the buffer to create, in bytes.

For example, let's say you want to create a buffer spanning 4 bytes of memory.

Here's how you could do it:

import { Buffer } from 'node:buffer';

let buffer = Buffer.alloc(4);
console.log(buffer);

Output:

<Buffer 00 00 00 00>

Notice how the buffer is logged...

When we log a buffer (belonging to the Buffer class) in Node, its contents are dumped into the console, although to a certain limit. Each byte's value is converted into a hexadecimal number and this number is printed.

In the log shown above, notice the four 00s. This means that there are a total of 4 bytes in the buffer, each holding the number 0 in it (which is denoted as 00 in hexadecimal).

💡 Notice: The hexadecimal notation is used for the sake of compactness. For example, 255 (three digits) in decimal is equivalent to ff (two digits) in hexadecimal.

An important thing to note regarding Buffer.alloc() is that it returns a Buffer instance whose individual bytes are prefilled with 0 if we don't specify any other fill value at the time of invocation.

💡 Notice:
If you look into the source code of Node, Buffer.alloc(), and all other methods that return buffers, don't directly construct a Buffer but rather an instance of an internal FastBuffer class which extends the Uint8Array class. We can't access this class because it has been shadowed by Node's source code into the Buffer class (the returned objects still pass an instanceof Buffer check).
For simplicity, I just refer to buffers as Buffer instances in my discussion but it's important to note that, precisely speaking, that's not the case.

Speaking of which, there's another overloaded form of Buffer.alloc() where we can specify the prefill value:

Buffer.alloc(size, fill)

fill here specifies the value to fill the buffer with. If it's shorter than the buffer, it's effectively repeated until the buffer fills to its capacity.

It can be a number or a string (in which case, it's encoded into a sequence of bytes; we'll learn more about this later on in this article when we understand the notion of encoding in buffers).

For example, consider the following:

import { Buffer } from 'node:buffer';

let buffer = Buffer.alloc(10, 'ab');
console.log(buffer);

Output:

<Buffer 61 62 61 62 61 62 61 62 61 62>

The repeated pattern 61 62 here represents the bytes for the characters a and b. Where these numbers come from is something I'll discuss very soon below.

Generally, we don't need this form of Buffer.alloc() because a prefill of 0 is more than sufficient for most cases.
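To make the repetition rule concrete, the following sketch (variable names are illustrative) compares the default fill, a numeric fill, and a string fill that doesn't divide the buffer size evenly:

```javascript
// Buffer is a global in Node, so no import is needed here.
const zeros = Buffer.alloc(3);          // default fill of 0
const ones = Buffer.alloc(3, 255);      // numeric fill
const pattern = Buffer.alloc(5, 'abc'); // string fill, repeated and cut off at 5 bytes

console.log(zeros);   // <Buffer 00 00 00>
console.log(ones);    // <Buffer ff ff ff>
console.log(pattern); // <Buffer 61 62 63 61 62>
```

Notice that the last repetition of 'abc' is cut short once the buffer is full.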

Providing a string

Another possible way to create a new buffer is to use a string and translate it to a series of bytes. This can be done using Buffer.from().

There are many forms of Buffer.from(). The one that we're interested in right now is when the given argument is a string:

Buffer.from(str)

str is obviously the string that defines the contents of the buffer.

Being able to directly interface with buffers in Node in terms of strings is a feature that's not currently enjoyed by browsers.

That is, JavaScript in the browser comes with other, separate APIs to deal with strings when they ought to be converted into buffer data and when buffer data ought to be converted into strings.

💡 Notice: These specialized APIs are TextEncoder and TextDecoder, respectively, that allow us to transition back and forth between a string and a buffer in JavaScript (in the browser).
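For comparison, here is a rough sketch of that two-step approach. TextEncoder and TextDecoder are also available as globals in Node, which lets us run it side by side with Buffer; the variable names are just for illustration:

```javascript
// String -> bytes with TextEncoder (it always produces UTF-8).
const encoder = new TextEncoder();
const bytes = encoder.encode('hello'); // a plain Uint8Array

// Bytes -> string with TextDecoder.
const decoder = new TextDecoder('utf-8');
const text = decoder.decode(bytes);

// Buffer (a global in Node) produces the same bytes in a single call.
const viaBuffer = Buffer.from('hello');
const sameBytes = viaBuffer.equals(bytes);

console.log(text);      // hello
console.log(sameBytes); // true
```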

But talking about Node, one of the most notable things to appreciate is how we can interface with buffers directly in terms of strings without requiring any additional APIs.

The way this interface happens is really simple. Honestly, like really simple!

Remember that a buffer always stores numbers — bytes, so to speak (and each byte is just a number). So when we try to store a string in a buffer, we don't really store the string as it is but rather store its individual byte values.

For example, storing the string 'ab' means storing the individual numbers 97 and 98 (0x61 and 0x62, respectively, in hexadecimal).

Similarly, when we access the contents of a buffer as a string, each number is converted into a character.

For example, the number 100 becomes the character 'd' whereas 65 becomes 'A', and so on. Likewise a buffer with these two bytes, 100 followed by 65, will become the string 'dA'.

So where are these numbers obtained from?

Well, they are the code units corresponding to the characters 'a', 'b', 'd', and 'A' in UTF-8.
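The two directions described above can be verified with a small sketch (toString() is covered in detail later in this article; the variable names are illustrative):

```javascript
// Buffer is a global in Node, so no import is needed here.

// String -> numbers: 'ab' is stored as the bytes 97 and 98.
const fromString = Buffer.from('ab');
const storedBytes = [...fromString];

// Numbers -> string: the bytes 100 and 65 read back as 'dA'.
const fromBytes = Buffer.from([100, 65]);
const readBack = fromBytes.toString();

console.log(storedBytes); // [ 97, 98 ]
console.log(readBack);    // dA
```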

Further reading:
A detailed discussion around this topic and Unicode in general can be found at JavaScript Strings — Unicode.

We'll take a look into this in more detail later on in this article.

Anyways, let's consider a quick example.

Below we create a buffer from the string 'hello':

import { Buffer } from 'node:buffer';

let buff = Buffer.from('hello');
console.log(buff);

Output:

<Buffer 68 65 6c 6c 6f>

There is no need to specify the byte length for the buffer that's being created; Node itself figures this out based on the number of bytes that the string encodes to.

In this case, the buffer spans 5 bytes because the given string's length is 5 and, in the UTF-8 encoding, each of the shown characters takes up 1 byte.

Providing an array of integers

The third way of creating a buffer in Node is to use an array of integers, where each integer represents a byte value.

Buffer.from(arr)

For example, let's say we have the following array of numbers with us and wish to create a buffer using it:

let bytes = [1, 10, 2, 2];

We'll do the following for this:

import { Buffer } from 'node:buffer';

let bytes = [1, 10, 2, 2];

let buff = Buffer.from(bytes);
console.log(buff);

Output:

<Buffer 01 0a 02 02>

The numbers in bytes are precisely what get stored in buff. The hexadecimal representation of 1 is 01 (in two digits), hence the first byte in the buffer's log; the one for 10 is 0a, hence the second byte in the buffer's log, and so on.

Practically speaking, it's not very common to have an array of integers (each representing a byte) with us that ultimately needs to be transformed into a buffer.

But if you ever want to do so, at least you know there's a pretty trivial way to go for it.

Copying an existing buffer

Another way to create a new buffer is to copy an existing buffer. This might be a practical thing if you wish to transform the contents of a buffer without affecting the original data (and so you create a copy).

Copying an existing Buffer instance is as simple as passing it to Buffer.from(). This copies the buffer's bytes into a newly created buffer.

Shown below is an example:

import { Buffer } from 'node:buffer';

let buff = Buffer.alloc(4, 10);
let buff2 = Buffer.from(buff);

console.log(buff);
console.log(buff2);
console.log(buff.buffer === buff2.buffer);

First, a buffer buff is created, spanning 4 bytes and initialized to have the number 10 filled throughout. Next up, this buffer is copied into buff2.

The following two logs simply print the contents of both the buffers, buff and buff2, to confirm whether their contents are the same. The third log confirms whether the internal chunks of memory assigned to the two buffers are different, since ideally we don't want the same memory to be re-used.

Here's the output of the code above:

Output:

<Buffer 0a 0a 0a 0a>
<Buffer 0a 0a 0a 0a>
false

Firstly, as can be seen, the contents of both the buffers are identical as per expectation.

Secondly, the log false clearly indicates that the internal ArrayBuffer instances, i.e. the internal chunks of memory, belonging to both buff and buff2 are different. To access the ArrayBuffer instance, we use the buffer property.

💡 Notice: ArrayBuffer is a core JavaScript API. Recall that the Buffer class in Node is basically a wrapper on top of Uint8Array and so it's merely a view over an ArrayBuffer. ArrayBuffer is the actual low-level representation in JavaScript of a chunk of memory.
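Another way to see that a copy is independent is to mutate it and check the original. The sketch below does that and, for contrast, also uses subarray(), a method not covered in this article that returns a view sharing the original's memory:

```javascript
// Buffer is a global in Node, so no import is needed here.
const original = Buffer.from('soot');

// Buffer.from() copies the bytes, so changing the copy leaves the original alone.
const copy = Buffer.from(original);
copy[1] = 0x65; // 'e'
copy[2] = 0x61; // 'a'
const originalAfterCopyEdit = original.toString();

// subarray() does NOT copy; the view shares memory with the original.
const view = original.subarray(0, 2);
view[0] = 0x62; // 'b'
const originalAfterViewEdit = original.toString();

console.log(copy.toString());       // seat
console.log(originalAfterCopyEdit); // soot
console.log(originalAfterViewEdit); // boot
```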

Ain't that simple?

Now that we know of a multitude of ways of creating a buffer in Node, let's move to the very next logical step: how to write data into a buffer and then read data out of it.

Writing data

As stated earlier, Buffer in Node is an extension of the native Uint8Array class in JavaScript. So naturally all the operations that are supported on Uint8Array are supported on Buffer too.

This means that we can leverage the very familiar bracket notation — as we use with arrays — to access individual bytes, and also to write to them.

However, be wary of the fact that when assigning a value to a byte, it must be an integer in the range 0 to 255.

Don't assign characters!

JavaScript, by default, coerces the value assigned to a Buffer's element to a number and then further normalizes the number before assigning the resulting value.

For example, NaN becomes 0, a fractional value loses its fraction, and an out-of-range value wraps around modulo 256 (so 256 becomes 0, 257 becomes 1, and -1 becomes 255).

This means that you won't get any benefit of doing the following:

let buff = Buffer.alloc(4, 1);
console.log(buff);

buff[0] = 'a';

console.log(buff);

Here, you might be thinking that assigning 'a' to buff[0] will put the character code of 'a' automatically at the given location but NO, that's not going to happen!

Instead, 'a' first gets converted into a number and then this number is normalized and ultimately assigned to the given location in the buffer.

In this case, 'a' converts to the number NaN which normalizes to 0. Consequently, the first byte will become 0 following the execution of buff[0] = 'a'.

Let's even confirm this by taking a glimpse into the console logs:

Output:

<Buffer 01 01 01 01>
<Buffer 00 01 01 01>

See? Before the assignment, each byte holds the decimal number 1 (which is 01 in hex) but after the assignment, the first byte becomes 0 (00 in hex).

So, if you want to assign a character to a given byte, don't forget that buffers do NOT entertain character assignments and that you instead need to manually call charCodeAt() on the character before doing so. (This works as shown for ASCII characters, whose character codes fit in a single byte.)

Something as follows:

let buff = Buffer.alloc(4, 1);
console.log(buff);

buff[0] = 'a'.charCodeAt();

console.log(buff);

Output:

<Buffer 01 01 01 01>
<Buffer 61 01 01 01>

Notice the value of first byte post-assignment now: it's 61 (in hex) which is the character code of 'a'.

Voila!
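To get a feel for how element assignment normalizes different kinds of values, here's a small sketch. It assumes nothing beyond the standard typed array conversion rules that Buffer inherits from Uint8Array:

```javascript
// Buffer is a global in Node, so no import is needed here.
const buf = Buffer.alloc(5);

buf[0] = 'a';  // Number('a') is NaN, which is stored as 0
buf[1] = '65'; // a numeric string is converted to the number 65 (0x41)
buf[2] = 257;  // out of range: wraps around modulo 256 to 1
buf[3] = -1;   // negative: wraps around to 255 (0xff)
buf[4] = 3.9;  // the fraction is dropped, leaving 3

console.log(buf); // <Buffer 00 41 01 ff 03>
```

None of these assignments throws an error, which is exactly why such mistakes can go unnoticed.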

This granular way of mutating a buffer is really helpful but oftentimes we want to write data all at once. For this, we have the write() instance method of the Buffer class. Let's explore it quickly.

write()

The write() method is one of the more low-level methods exposed by the Buffer class. First, let's see its straightforward syntax:

buff.write(str[, offset[, length[, encoding]]])

- str is the string to write. write() operates around strings, which is why the value to write is the very first argument.
- offset represents the position where the write has to begin. By default, it's the very first byte, i.e. offset 0.
- length specifies the maximum number of bytes to write. By default, it's everything from offset to the end of the buffer. A write never goes past the last byte of the buffer: if the string doesn't fit, only a part of it is written (and a length that's out of range may lead to an error being thrown).
- encoding specifies the encoding of the string. By default, it's 'utf8' and we don't need to typically worry about changing it.

Let's consider an example to help clear the mist off of this jargon syntax.

In the code below, we create a fresh buffer spanning 5 bytes and then write the string 'hello' to it with the help of write():

import { Buffer } from 'node:buffer';

let buff = Buffer.alloc(5);
buff.write('hello');

console.log(buff);

Output:

<Buffer 68 65 6c 6c 6f>

Notice how we skip the last three arguments to buff.write() — that's because there's no need for them.

- offset is omitted since we need 'hello' to be written starting at the very first byte in buff, which is the default.
- length is omitted because the entire length of the buffer needs to be written to, starting at position offset.
- encoding is omitted because... well... who worries about encoding that much!

Let's consider another example to become more confident with write().

In the code below, we write to a portion of a buffer:

import { Buffer } from 'node:buffer';

let buff = Buffer.from('soot');
console.log(buff);

buff.write('ea', 1); // Change 'soot' to 'seat'

console.log(buff);

The goal is to change the buffer's content from the string representation of 'soot' to the one for 'seat'. Of course, this requires the writing to begin at offset 1 and go up to the point the string 'ea' is written completely.

And this is exactly what the call to buff.write() is doing above.

Following is the output:

Output:

<Buffer 73 6f 6f 74>
<Buffer 73 65 61 74>

Notice how the middle two bytes are changed after the call to buff.write().
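One more detail that the examples above don't show is that write() returns the number of bytes it actually wrote. The following sketch (with illustrative variable names) uses that return value to show the effect of the length argument and of a string that's too long for the buffer:

```javascript
// Buffer is a global in Node, so no import is needed here.

// Write at most 2 bytes of 'hello', starting at offset 1.
const buf = Buffer.alloc(5);
const written = buf.write('hello', 1, 2);

// A string longer than the buffer is written only partially.
const small = Buffer.alloc(3);
const fitted = small.write('hello');

console.log(buf, written);              // <Buffer 00 68 65 00 00> 2
console.log(small.toString(), fitted);  // hel 3
```

Checking the return value is a cheap way to detect that a string got cut off.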

The story of writing data to a buffer doesn't end here. There are many more specific methods to write to a buffer in Node.

I strongly encourage you to look up the documentation of these methods because they'll be quite handy if you'll be writing data in terms of number types when dealing with binary data.
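As a small taste of these methods, here is a sketch using a few of the numeric writers along with their reading counterparts. The byte layout is made up for this example and doesn't correspond to any real file format:

```javascript
// Buffer is a global in Node, so no import is needed here.
const buf = Buffer.alloc(8);

buf.writeUInt8(255, 0);           // 1 byte at offset 0
buf.writeUInt16BE(0x1234, 1);     // 2 bytes at offset 1, big endian
buf.writeUInt32LE(0xdeadbeef, 3); // 4 bytes at offset 3, little endian
buf.writeInt8(-1, 7);             // 1 signed byte at offset 7

console.log(buf); // <Buffer ff 12 34 ef be ad de ff>

// Each writer has a matching reader that takes the same offset.
const u16 = buf.readUInt16BE(1);
const u32 = buf.readUInt32LE(3);
const i8 = buf.readInt8(7);

console.log(u16.toString(16), u32.toString(16), i8); // 1234 deadbeef -1
```

The BE and LE suffixes stand for big endian and little endian, i.e. the order in which the bytes of a multi-byte number are laid out.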

Reading data

After we write some data to a buffer in memory, the immediate next desire is to be able to read it.

How to do so?

Well, there are a handful of ways to read data from a buffer in Node of which two common ones are:

- Read a given byte via bracket notation.
- Read the contents as a string, using the toString() method.

💡 Notice: If we wish to, we can even read a whole chunk of bytes based on a given numeric type, e.g. reading a sequence of bytes as a uint32 element, or as a float64 element, and so on.

Let's dive into each of these...

Reading via bracket notation

As we learned above, being an extension of the core Uint8Array API, Buffer supports bracket notation to access individual bytes.

Therefore, one of the most trivial ways to access the contents of a buffer in Node is to use the bracket notation, with the position of the byte to access.

For instance, in the following code, we create a buffer holding the string data for 'hello' before accessing its third byte (at index 2).

import { Buffer } from 'node:buffer';

let buff = Buffer.from('hello');
console.log(buff[2]);

Output:

108

The character code corresponding to 'l' is 108 (6c in hex), hence the output shown.
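A few related behaviors of bracket notation are worth a quick look. The sketch below (variable names are illustrative) reads a byte, turns it back into a character, reads past the end of the buffer, and collects all the bytes into a regular array:

```javascript
// Buffer is a global in Node, so no import is needed here.
const buf = Buffer.from('hello');

const third = buf[2];                     // the byte at index 2
const asChar = String.fromCharCode(third); // back to a character (fine for ASCII)
const beyond = buf[10];                   // no such index in a 5-byte buffer
const allBytes = Array.from(buf);         // every byte as a plain array

console.log(third);    // 108
console.log(asChar);   // l
console.log(beyond);   // undefined
console.log(allBytes); // [ 104, 101, 108, 108, 111 ]
```

Just like with arrays, reading an index outside the buffer gives undefined rather than throwing an error.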

There isn't really any more to explore in this naive approach so let's move over to the second one.

Reading via toString()

If you remember, most classes, if not all, in JavaScript provide a toString() method for coercion of their underlying values into strings.

Following suit, Buffer also supports a toString() method which helps read the contents of a buffer as a string.

Perhaps, one of the most common ways of reading the contents of a buffer in Node is using this toString() method.

Syntactically, toString() isn't a parameterless function unlike most toString()s in JavaScript. Instead, it allows us to control what portion of the buffer we want to read as a string.

buff.toString([encoding[, start[, end]]])

- encoding specifies the encoding of the string. By default, it's 'utf8'.
- start specifies the starting position of the portion to read. Default is 0. (Negative indexes don't work!)
- end specifies the ending position (not inclusive) of the portion to read. Default is buff.length. (Negative indexes don't work!)

Without any sort of arguments to toString(), the default behavior, as you can guess, is to read the entire buffer as a string.

💡 Notice:
One thing I particularly dislike about toString() is its awkward signature. Naturally, encoding shouldn't concern us much when trying to read a portion of a buffer, but with this signature, if we wish to read a given portion, we still need to provide a value for the encoding parameter.
We can provide undefined as the encoding (in which case, it's assumed to be 'utf8') but the point is that we have to provide it regardless. I feel that a better signature would've been to have encoding at the very end.

Time for an example.

In the following code, we have the same buffer as before (for the data corresponding to 'hello'), where we read twice: first the entire data and then only the last two bytes:

import { Buffer } from 'node:buffer';

let buff = Buffer.from('hello');

// Read the entire data
console.log(buff.toString());

// Read the last two bytes
console.log(buff.toString(undefined, 3));

Output:

hello
lo

To reemphasize, the toString() method is a really useful instance method of the Buffer class so make sure to get well-versed with it.
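Here are a few more calls to toString() to experiment with. This sketch reads a middle portion of the buffer and, since negative indexes aren't supported, computes the start of the tail from the buffer's length:

```javascript
// Buffer is a global in Node, so no import is needed here.
const buf = Buffer.from('hello');

// Bytes 1 up to (but not including) 4.
const middle = buf.toString('utf8', 1, 4);

// The last two bytes, computed from the length instead of a negative index.
const tail = buf.toString('utf8', buf.length - 2);

// The same parameters work with other encodings, such as 'hex'.
const firstTwoHex = buf.toString('hex', 0, 2);

console.log(middle);      // ell
console.log(tail);        // lo
console.log(firstTwoHex); // 6865
```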

Buffers, strings, and encodings

If you've worked with Uint8Array before, you'll be aware of the fact that there is NO way to interface with it in terms of strings.

Node, on the other hand, with its own Buffer class does allow this. But behind the scenes, Node also normalizes strings to sequences of numbers.

For example, when we do something like the following:

let buff = Buffer.from('hello');

we're creating a new buffer that has as its contents the bytes corresponding to the characters in the string 'hello'.

'h' corresponds to the numeric code (or code point) 68 (in hexadecimal); 'e' corresponds to 65; 'l' corresponds to 6c; and 'o' corresponds to 6f.

Here's how the buffer created above looks in the console when logged:

Output:

<Buffer 68 65 6c 6c 6f>

Internally, in Buffer.from(), Node takes the string 'hello' and gets the numeric code of each of its characters and then dumps this code into the buffer at the respective location, for every character.

Now, whenever we go from a character to its corresponding numeric code, this happens through a process referred to as encoding. The reverse, which is to go from the numeric code to the character, is referred to as decoding.

When we supply string data to Buffer.from(), or to any buffer utility in Node, it first needs to be encoded into a sequence of numbers. Then this sequence of numbers ought to be passed on to the buffer utility.

Similarly, if we have a buffer with us and wish to output its contents as a string, we need to take the sequence of numbers stored in it and decode them to form a string.

Most importantly, for both encoding and decoding, we need an encoding format (which is often concisely referred to as encoding scheme or even just as 'encoding'). Of course, there needs to be some way to know which character becomes which number and which number becomes which character.

By far, the most widely used encoding format is UTF-8, whose underlying character set is Unicode. Each character in UTF-8 spans a minimum of 1 byte (8 bits) and a maximum of 4 bytes (32 bits).
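The 1-to-4-byte range can be observed with Buffer.byteLength(), a static method that reports how many bytes a string needs in a given encoding. The sample characters below are my own picks, written as escape sequences to avoid any ambiguity:

```javascript
// Buffer is a global in Node, so no import is needed here.
const samples = [
  'a',         // U+0061, Latin small letter a
  '\u00e9',    // U+00E9, e with an acute accent
  '\u20ac',    // U+20AC, the euro sign
  '\u{1f600}', // U+1F600, a grinning face emoji
];

// Number of bytes each character takes up in UTF-8.
const byteLengths = samples.map((ch) => Buffer.byteLength(ch, 'utf8'));

console.log(byteLengths); // [ 1, 2, 3, 4 ]
```

This is why the length of a string and the length of the buffer created from it can differ.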

Without going too deep into the implementation details of UTF-8, it's sufficient to know that UTF-8 is a very commonly used encoding format across the modern-day computing world, and in Node too.

By default, all buffer utilities assume the encoding format to be UTF-8, unless stated otherwise (which isn't really required that much unless you're in an advanced setting, like working with base64-encoded strings).

The encoding format is also specified as a string value. UTF-8 is denoted as 'utf8'.

💡 Notice: Since it's common to include the hyphen in the name, Node also allows 'utf-8'; it's the same thing.

For completeness, shown below are the most notable encoding formats that Node supports at the time of this writing, in addition to UTF-8 (there are a few others as well, such as 'ascii' and 'base64url'):

- 'utf16le': represents UTF-16LE (little endian) whereby each character takes up a minimum of 2 bytes (16 bits) and a maximum of 4 bytes (32 bits); and the most significant byte of each 16-bit unit comes last (a consequence of being little endian).
- 'latin1': represents the character encoding scheme ISO-8859-1 which always spans 1 byte for every character. Characters that don't fit into this range are truncated to a single byte and then the corresponding character is used.
- 'base64': converts back and forth between the sequence of numbers in the buffer and a string in the popular base64 encoding format.
- 'hex': converts back and forth between the sequence of numbers in the buffer and a string of hexadecimal digits (two per byte).
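To see these encodings in action, consider the following sketch, where the encoding is passed as the second argument to Buffer.from() and as the first argument to toString() (the sample strings are mine):

```javascript
// Buffer is a global in Node, so no import is needed here.
const buf = Buffer.from('hello');

// Bytes -> text representations.
const hex = buf.toString('hex');
const base64 = buf.toString('base64');

// Text representations -> bytes -> the original string.
const fromHex = Buffer.from(hex, 'hex').toString();
const fromBase64 = Buffer.from(base64, 'base64').toString();

// The same string produces different bytes under different encodings.
const utf16 = Buffer.from('hi', 'utf16le');
const latin1 = Buffer.from('\u00e9', 'latin1'); // e with an acute accent

console.log(hex, base64);         // 68656c6c6f aGVsbG8=
console.log(fromHex, fromBase64); // hello hello
console.log(utf16);               // <Buffer 68 00 69 00>
console.log(latin1);              // <Buffer e9>
```

Notice how 'utf16le' spends 2 bytes on each character of 'hi', with the zero byte coming second.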

As I stated earlier, you'll mostly not need to even worry about specifying an encoding because almost all use cases are pretty nicely dealt with by the UTF-8 encoding scheme.

It's only in advanced tasks, such as computing cryptographic hashes, that you may need to resort to encodings like base64 or hexadecimal.

In general, you're all good! (That's a big relief, isn't it?)

💡 Notice: Think of the ability to interface in terms of strings as a wrapping functionality of the Buffer class on top of Uint8Array.

🧠 Time to practice

Now that you know what exactly are buffers in Node, and how to interface with the Buffer class, it's time to get into practice mode.

Consider the following:
