Umakant Vashishtha

Originally published at umakantv.com

Buffered vs Streaming Data Transfer

Introduction

In this article, we will look at different methods of data transfer and see how to efficiently transfer data to and from a server over the HTTP protocol.

With the HTTP protocol, the client can either send data to the server (upload) or request data from the server (download).

The client or server can choose to transfer the data in multiple chunks, especially when the payload is larger than a few kilobytes, to keep individual transfers small and minimize the impact of packet loss.

For download operations, the server can choose to send the data in chunks, while for upload operations, the client can choose to send the data in chunks.
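
As a quick illustration of the download side, a Node.js server can send a large file in chunks simply by piping a readable file stream into the response. A minimal sketch (the file path and port here are only examples):

import fs from "fs";
import { createServer } from "http";

// The response is sent chunk by chunk as the file is read from disk,
// so the whole file never has to sit in memory at once.
createServer((req, res) => {
  fs.createReadStream("./data/large-file.txt").pipe(res);
}).listen(3001);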

When the request data is accumulated in memory and only then processed, we call it buffered processing.

In contrast, when the data is processed as each chunk is received, we call it streaming processing.

Processing here could be anything depending on the request, from storing the data in the file system to forwarding it to another server. For simplicity, we will only consider storing the data in the file system.

We will compare the two approaches and see which one is better in terms of performance and memory usage.
We will also see how to implement both approaches in Node.js.

Buffered Data Transfer

Fig: Buffered Data Transfer Flow Diagram

As soon as the client starts sending the data, the server stores every byte chunk in memory (the chunks array in the code below). When the client has finished sending all chunks, a close event is emitted, the chunks are concatenated into a single buffer, and the result is saved to the file system. Eventually, the garbage collector frees up the memory on a later run.

Let’s take a look at the following code snippet that implements the buffered upload.

import fs from "fs";
import crypto from "crypto";

async function bufferedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let bytesRead = 0;
  let chunks = [];

  req.on("data", (chunk) => {
    bytesRead += chunk.length;
    // collect chunks in an array; the chunks arrive in order, so file content is preserved
    chunks.push(chunk);
  });

  req.on("close", () => {
    // Put all the chunks into a buffer together
    let data = Buffer.concat(chunks).toString();
    // Save the content to the file on disk
    fs.writeFileSync(`./data/file-buffered-${uuid}.txt`, data, {
      encoding: "utf-8",
    });
    res.end(`Upload Complete, ${bytesRead} bytes uploaded.`);
  });
}
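
One thing to keep in mind with this handler is that fs.writeFileSync blocks the event loop while the file is written. If you do buffer, an asynchronous write keeps the server responsive; here is a minimal sketch of the same idea using the request's end event and fs/promises (the handler name is just for illustration):

import { writeFile } from "fs/promises";
import crypto from "crypto";

async function bufferedUploadHandlerAsync(req, res) {
  const uuid = crypto.randomUUID();
  const chunks = [];

  req.on("data", (chunk) => chunks.push(chunk));

  // "end" fires once the full request body has been received
  req.on("end", async () => {
    const data = Buffer.concat(chunks);
    // Asynchronous write: the event loop stays free for other requests
    await writeFile(`./data/file-buffered-${uuid}.txt`, data);
    res.end(`Upload Complete, ${data.length} bytes uploaded.`);
  });
}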

Streaming Data Transfer

Fig: Streaming Data Transfer Flow Diagram

As can be seen from the flow diagram, the chunks from the request body are written directly to the file system with the help of fs.createWriteStream and pipe, and are not held in memory for the entire lifetime of the request.

This is possible because the request implements the Readable stream interface, so its content can be written directly to the file system through the Writable stream interface from fs.

The code snippet below implements the streaming upload.

import fs from "fs";
import crypto from "crypto";

async function streamedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();
  let bytesRead = 0;

  let fsStream = fs.createWriteStream(`./data/file-streamed-${uuid}.txt`);
  req.pipe(fsStream); // req.read() -> fsStream.write()

  // Count the bytes as they pass through, without keeping them in memory
  req.on("data", (chunk) => {
    bytesRead += chunk.length;
  });

  req.on("close", () => {
    fsStream.close();
    res.end(`Upload Complete, ${bytesRead} bytes uploaded.`);
  });
}
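
A note on error handling: req.pipe(fsStream) does not forward errors between the two streams, so a failed write can leave the request hanging. The stream.pipeline helper (available in modern Node.js, shown here in its promise form) propagates errors and cleans up both streams; a sketch of the same handler, with an illustrative name:

import fs from "fs";
import crypto from "crypto";
import { pipeline } from "stream/promises";

async function streamedUploadHandlerPipeline(req, res) {
  const uuid = crypto.randomUUID();
  const fsStream = fs.createWriteStream(`./data/file-streamed-${uuid}.txt`);

  try {
    // pipeline wires req -> fsStream, forwards errors from either side,
    // and destroys both streams on failure
    await pipeline(req, fsStream);
    res.end("Upload Complete.");
  } catch (err) {
    res.statusCode = 500;
    res.end("Upload failed.");
  }
}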

Let’s use the handlers above in a basic HTTP server.

import { createServer } from "http";

const server = createServer(async (req, res) => {
  const { method, url } = req;
  console.log(req.method, req.url);

  if (method === "POST" && url == "/upload-buffer") {
    await bufferedUploadHandler(req, res);
    return;
  } else if (method === "POST" && url == "/upload-stream") {
    await streamedUploadHandler(req, res);
    return;
  }
});

server.listen(3000);

console.log("Server listening on http://localhost:3000");

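For a quick manual test before running the full benchmark, a short Node.js script can stream any local file to one of the endpoints (./data.csv here stands for whatever file you want to upload):

import fs from "fs";
import http from "http";

// Stream a local file to the upload endpoint and print the server's reply
const req = http.request(
  { method: "POST", hostname: "localhost", port: 3000, path: "/upload-stream" },
  (res) => res.pipe(process.stdout)
);

fs.createReadStream("./data.csv").pipe(req);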

Benchmark

We will now compare the two approaches in terms of performance (response time) and memory usage by firing a number of requests concurrently.

To also analyze the server's memory usage in each case, we can change the response to return the server's peak memory usage, as below:

let peakMemory = 0;

// Sample memory usage every two milliseconds and track the peak
setInterval(() => {
  let memoryUsage = process.memoryUsage();
  if (memoryUsage.rss > peakMemory) {
    peakMemory = memoryUsage.rss; // Resident Set Size
  }
}, 2);

async function streamedUploadHandler(req, res) {
  let uuid = crypto.randomUUID();

  let fsStream = fs.createWriteStream(`./data/file-streamed-${uuid}.txt`);
  req.pipe(fsStream); // req.read() -> fsStream.write()
  req.on("close", () => {
    fsStream.close();
    res.end(peakMemory.toString());
  });
}

We are interested in the peak memory usage and not just the current memory usage.

Client Side Code

For firing parallel requests to the server, I used the following code which fires 200 requests in parallel, each containing a file of roughly 15 MB.

Once all requests are complete, we calculate the average, minimum and maximum time taken for the requests to complete, and also the maximum memory usage of the server.

The script is run multiple times with a different number of parallel requests each time, for both buffered and streaming upload, to compare the results.

import http from "http";
import fs from "fs";

async function fileUploadRequest(fileName, requestPath = "/upload-stream") {
  return new Promise((resolve, reject) => {
    let start = Date.now();
    const req = http.request(
      {
        method: "POST",
        hostname: "localhost",
        port: "3000",
        path: requestPath,
        headers: {
          Accept: "*/*",
        },
      },
      function (res) {
        const chunks = [];

        res.on("data", function (chunk) {
          chunks.push(chunk);
        });

        res.on("end", function () {
          const body = Buffer.concat(chunks).toString();
          let end = Date.now();
          resolve({
            time: end - start,
            memoryUsage: Number(body),
          });
        });
      }
    );

    // Stream the file from disk into the request body
    const fileStream = fs.createReadStream(fileName);

    fileStream.pipe(req, { end: false });

    fileStream.on("end", () => {
      req.end();
    });

    req.on("error", function (error) {
      reject(error);
    });
  });
}

async function fireParallelRequests(count, path = "/upload-stream") {
  const promises = [];
  for (let i = 0; i < count; i++) {
    promises.push(fileUploadRequest("./data.csv", path));
  }

  let metrics = await Promise.all(promises);

  let latencies = metrics.map((m) => m.time);
  let min = Math.min(...latencies);
  let max = Math.max(...latencies);
  let avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;
  let maxMemoryUsage = Math.max(...metrics.map((m) => m.memoryUsage));

  console.log("Total Requests:", count);
  console.log("URL:", path);
  console.log("Min Time:", min);
  console.log("Max Time:", max);
  console.log("Avg Time:", avg);
  console.log(
    "Max Memory Usage:",
    `${Math.round((maxMemoryUsage / 1024 / 1024) * 100) / 100} MB`
  );
}

async function main() {
  await fireParallelRequests(200, "/upload-stream");
  // await fireParallelRequests(200, "/upload-buffer");
}

main();

Here are the results from 200 concurrent requests for both buffered and streaming upload:

Parameter             | Buffered Data Transfer | Streaming Data Transfer
--------------------- | ---------------------- | -----------------------
Total Requests        | 200                    | 200
Min Time (ms)         | 1975                   | 1297
Max Time (ms)         | 34609                  | 31680
Avg Time (ms)         | 13061                  | 3995
Max Memory Usage (MB) | 2889.66                | 276.39

As can be seen, the difference in memory usage is stark: the buffered upload uses roughly 10 times more memory than the streaming upload. In fact, the memory usage for the buffered upload is so high that it can crash the server if the number of concurrent requests grows large enough.

The difference in average latency is also significant: the streaming upload is more than 3 times faster than the buffered upload.

The following charts depict the time and memory usage for both buffered and streaming upload for different numbers of concurrent requests.

Fig: Buffered vs Streaming Memory Usage

Fig: Buffered vs Streaming Average Latency

The memory usage for the buffered upload increases roughly linearly with the number of requests, which is expected: 200 concurrent uploads of about 15 MB each amount to roughly 3 GB held in memory at the peak, matching the measured 2889.66 MB. For the streaming upload, memory usage remains almost constant.

Conclusion

In this article, we saw how to implement buffered and streaming data transfer in Node.js and compared the two approaches in terms of performance and memory usage.

We saw that streaming upload is much faster than buffered upload and also uses much less memory.

We usually have limited memory available on the server and want to handle as many requests as possible, so streaming upload is generally the way to go.
But sometimes we may want buffered upload instead, for example when the data has to be processed as a whole before it is saved to the file system, as in the sketch below.
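
For instance, if the request body is a single JSON document that must be parsed and validated as a whole before anything is written to disk, buffering is the natural fit. A minimal sketch (the handler name and validation step are illustrative):

import { writeFile } from "fs/promises";
import crypto from "crypto";

async function jsonUploadHandler(req, res) {
  const chunks = [];
  req.on("data", (chunk) => chunks.push(chunk));

  req.on("end", async () => {
    try {
      // The whole body is needed before JSON.parse can run
      const payload = JSON.parse(Buffer.concat(chunks).toString());
      await writeFile(`./data/payload-${crypto.randomUUID()}.json`, JSON.stringify(payload));
      res.end("Upload Complete.");
    } catch (err) {
      res.statusCode = 400;
      res.end("Invalid JSON.");
    }
  });
}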
