丁久

Posted on • Originally published at dingjiu1989-hue.github.io

Node.js Streams: Complete Guide to Efficient Data Processing

This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.

Node.js Streams are one of the most powerful and underused features of the platform. They enable processing large amounts of data without loading everything into memory — critical for file uploads, data pipelines, and HTTP responses. Yet most Node.js developers avoid streams because the API (even the modern pipeline-based one) has non-obvious patterns. This guide covers streams from the ground up with practical examples you can use today.

The Four Stream Types

| Type | What It Does | Examples | Key Events/Methods |
| --- | --- | --- | --- |
| Readable | Produces data that can be consumed | fs.createReadStream, HTTP request (req), process.stdin | data, end, error, pipe(), readable.read() |
| Writable | Consumes data that is written to it | fs.createWriteStream, HTTP response (res), process.stdout | write(), end(), drain, finish |
| Transform | Both reads and writes; modifies data in transit | zlib.createGzip, crypto.createCipheriv, CSV parsers | Same as Readable + Writable, plus the _transform() method |
| Duplex | Independent read and write sides (like a telephone) | net.Socket, TLS socket, WebSocket | read() + write(); data flows in both directions |
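
To make the table concrete, here is a minimal sketch of a custom Transform stream; the uppercase example and variable names are illustrative, not from the original post:

const { Transform } = require('node:stream');

// A Transform that uppercases every chunk passing through it
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    // callback(null, data) pushes data to the readable side and signals completion
    callback(null, chunk.toString().toUpperCase());
  }
});

process.stdin.pipe(upperCase).pipe(process.stdout);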

Pipeline API (Modern, Recommended)

Best for: Any time you connect streams together. pipeline() handles cleanup and error propagation automatically — raw .pipe() does not.

const { pipeline } = require('node:stream/promises');
const { createReadStream, createWriteStream } = require('node:fs');
const { createGzip } = require('node:zlib');

// Top-level await requires ESM, so wrap the call in an async function for CommonJS
async function compressFile() {
  await pipeline(
    createReadStream('input.json'),
    createGzip(),
    createWriteStream('input.json.gz'),
  );
  console.log('Pipeline succeeded: file compressed');
}

compressFile().catch((err) => console.error('Pipeline failed:', err));
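
For contrast, here is a sketch of what goes wrong with raw .pipe() (the file names are hypothetical): an error in any stage is not forwarded, and downstream streams are never destroyed unless you wire handlers yourself.

// With raw .pipe(), an 'error' on one stage does NOT propagate:
createReadStream('does-not-exist.json')   // emits 'error'
  .pipe(createGzip())                     // never destroyed automatically
  .pipe(createWriteStream('out.gz'));     // may be left open and half-written
// You would need an .on('error', ...) handler on each of the three streams.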

Real-World Use Cases

1. Streaming CSV Processing (Avoid OOM on Large Files)

const { createReadStream } = require('node:fs');
const { parse } = require('csv-parse');
const { Transform } = require('node:stream');

// Process a 5GB CSV file with constant memory (~50MB).
// Note: do NOT accumulate rows into an array; that defeats constant-memory processing.
let activeRows = 0;
createReadStream('massive-file.csv')
  .pipe(parse({ columns: true }))
  .pipe(new Transform({
    objectMode: true,
    transform(row, encoding, callback) {
      // Process and optionally filter each row
      if (row.status === 'active') {
        this.push({ id: row.id, name: row.name });
      }
      callback();
    }
  }))
  .on('data', (row) => { activeRows++; /* handle each row here */ })
  .on('error', (err) => console.error('CSV processing failed:', err))
  .on('end', () => console.log(`Processed ${activeRows} active rows`));
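
The same flow also works under pipeline(), which adds error propagation and automatic cleanup. A sketch; the output file name is illustrative, and the async generator is used as a transform stage (supported as a pipeline step since Node.js 13.10):

const { pipeline } = require('node:stream/promises');
const { createWriteStream } = require('node:fs');

async function exportActiveRows() {
  await pipeline(
    createReadStream('massive-file.csv'),
    parse({ columns: true }),
    // Async generators are valid pipeline stages in modern Node.js
    async function* (rows) {
      for await (const row of rows) {
        if (row.status === 'active') {
          yield JSON.stringify({ id: row.id, name: row.name }) + '\n';
        }
      }
    },
    createWriteStream('active-rows.ndjson'),
  );
}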

2. HTTP Streaming Large Responses

// Instead of res.json(allData), which loads all data into memory,
// stream data to the client as you produce it:
const { once } = require('node:events');

app.get('/api/export', async (req, res) => {
  res.setHeader('Content-Type', 'application/json');
  res.write('[');
  let first = true;
  const cursor = db.collection('events').find().stream();
  for await (const doc of cursor) {
    const chunk = (first ? '' : ',') + JSON.stringify(doc);
    first = false;
    // Respect backpressure: wait for 'drain' when the response buffer is full
    if (!res.write(chunk)) await once(res, 'drain');
  }
  res.write(']');
  res.end();
});
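
An alternative sketch uses pipeline() with an async generator as the source, which gets backpressure and cleanup on client disconnect for free. Here toJsonArray is a helper introduced for illustration, and db is the same assumed database handle:

const { pipeline } = require('node:stream/promises');

// Frame documents as a JSON array, one chunk per document
async function* toJsonArray(cursor) {
  yield '[';
  let first = true;
  for await (const doc of cursor) {
    yield (first ? '' : ',') + JSON.stringify(doc);
    first = false;
  }
  yield ']';
}

app.get('/api/export', async (req, res) => {
  res.setHeader('Content-Type', 'application/json');
  await pipeline(toJsonArray(db.collection('events').find().stream()), res);
});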

3. Handling Backpressure

Best practice: Respect the return value of write(). When write() returns false, the writable stream's internal buffer is full — pause reading until the drain event fires.

const { createReadStream, createWriteStream } = require('node:fs');

const readStream = createReadStream('huge-file.bin');
const writeStream = createWriteStream('copy.bin');

readStream.on('data', (chunk) => {
  const canContinue = writeStream.write(chunk);
  if (!canContinue) {
    readStream.pause(); // Stop reading — buffer is full
    writeStream.once('drain', () => readStream.resume()); // Resume when drained
  }
});
// Note: pipeline() handles this automatically — prefer it over manual piping
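
For reference, the pipeline() equivalent of the snippet above is a single call (wrapped in an async function, since top-level await requires ESM):

const { pipeline } = require('node:stream/promises');

async function copyFile() {
  // pipeline() pauses and resumes the source automatically on backpressure
  await pipeline(createReadStream('huge-file.bin'), createWriteStream('copy.bin'));
}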

Bottom line: Streams are essential for processing data that exceeds memory limits. The pipeline() API should be your default — it handles backpressure, error propagation, and cleanup correctly. Avoid raw .pipe() and .on('data') patterns unless you have a specific reason. See also: Caching Strategies and REST API Best Practices.


