Modern applications rarely deal only with small files anymore.
They process:
- Gigabytes of CSV data
- Real-time payments
- Live chat messages
- Streaming APIs
- Video uploads
- Database records
- AI responses
- Continuous logs and analytics
If you load everything into memory first, your application gets slower and more expensive to run, and it can eventually crash when memory runs out.
This is where Streams, Async Iterators, Workers, and Web Streams become critical.
In this article, we’ll go from the fundamentals to real-world production patterns using Node.js.
What Are JavaScript Events?
Before understanding Streams, we first need to understand events.
JavaScript is event-driven.
An event is simply:
“Something happened.”
Examples:
- User clicked a button
- File finished loading
- Payment completed
- Stream received data
- Socket disconnected
Node.js internally uses an event system called the Event Emitter.
Example:
import EventEmitter from 'events';
const emitter = new EventEmitter();
emitter.on('payment-approved', (data) => {
  console.log('Payment approved:', data);
});
emitter.emit('payment-approved', {
  user: 'John',
  amount: 100
});
Output:
Payment approved: { user: 'John', amount: 100 }
This event-based architecture is one of the foundations behind Streams.
Observer Pattern in Practice — E-commerce Payments
Streams and events are heavily related to the Observer Pattern.
The Observer Pattern means:
One object emits updates, many objects react independently.
Example:
import EventEmitter from 'events';
class PaymentService extends EventEmitter {}
const payment = new PaymentService();
// Each listener reacts to the same event independently.
payment.on('approved', (data) => {
  console.log('Sending email...', data);
});
payment.on('approved', (data) => {
  console.log('Updating analytics...', data);
});
payment.on('approved', (data) => {
  console.log('Creating invoice...', data);
});
payment.emit('approved', {
  orderId: 10,
  total: 300
});
One event triggered:
- Email service
- Analytics
- Invoice generation
Many real-world systems decouple their components this way.
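One detail worth knowing about the observers above: EventEmitter calls listeners synchronously, in the order they were registered, and a listener can be removed again with off(). Here is a minimal sketch of that behavior. The names sendEmail, updateAnalytics, and log are made up for illustration and are not part of any library.

```javascript
import EventEmitter from 'events';

class PaymentService extends EventEmitter {}
const payment = new PaymentService();
const log = [];

// Hypothetical observers; real ones would call an email or analytics service.
const sendEmail = (data) => log.push(`email:${data.orderId}`);
const updateAnalytics = (data) => log.push(`analytics:${data.orderId}`);

payment.on('approved', sendEmail);
payment.on('approved', updateAnalytics);

// emit() runs every listener before it returns.
payment.emit('approved', { orderId: 10 });
log.push('after first emit');

// Unsubscribe one observer; the other keeps reacting.
payment.off('approved', sendEmail);
payment.emit('approved', { orderId: 11 });

console.log(log);
// [ 'email:10', 'analytics:10', 'after first emit', 'analytics:11' ]
```

Because listeners run synchronously, a slow listener delays every listener registered after it.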
Buffers — The Core Concept Behind Streams
Streams process data in small chunks.
These chunks are usually represented using Buffers.
A Buffer is raw binary data.
Example:
const buffer = Buffer.from('Hello');
console.log(buffer);
Output:
<Buffer 48 65 6c 6c 6f>
Streams move these chunks progressively instead of loading everything at once.
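Chunk boundaries fall on bytes, not on characters. The sketch below splits the three-byte UTF-8 encoding of '€' across two chunks by hand, then compares naive decoding with Node's StringDecoder, which holds back incomplete byte sequences. The manual split is an assumption made for the demo; real streams split wherever the chunk size happens to land.

```javascript
import { StringDecoder } from 'string_decoder';

// '€' is three bytes in UTF-8: e2 82 ac
const euro = Buffer.from('€');
const first = euro.subarray(0, 2);
const second = euro.subarray(2);

// Decoding each chunk on its own corrupts the character.
const naive = first.toString('utf8') + second.toString('utf8');

// StringDecoder buffers the incomplete bytes until the rest arrives.
const decoder = new StringDecoder('utf8');
const safe = decoder.write(first) + decoder.write(second) + decoder.end();

console.log(naive === '€'); // false
console.log(safe === '€');  // true
```

Passing an encoding to a readable stream, as in the Readable example further down, applies the same kind of decoding for you.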
What Are Streams?
A Stream is a way to process data progressively.
Instead of:
read entire file -> process -> send
Streams do:
read chunk -> process -> send
read chunk -> process -> send
read chunk -> process -> send
This makes applications:
- Faster
- More memory efficient
- Scalable
Types of Streams in Node.js
Node.js has 4 main stream types.
| Stream Type | Purpose |
|---|---|
| Readable | Read data |
| Writable | Write data |
| Duplex | Read + write |
| Transform | Modify data while streaming |
Readable Stream Example
import fs from 'fs';
const stream = fs.createReadStream('./bigfile.txt', {
  encoding: 'utf8'
});
stream.on('data', (chunk) => {
  console.log(chunk);
});
The file is read piece by piece.
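To see the pieces, you can shrink the chunk size with the highWaterMark option. This is a small sketch that writes a throwaway 100-byte file to the OS temp directory (the file name is arbitrary) and records the size of every chunk it reads back.

```javascript
import fs from 'fs';
import os from 'os';
import path from 'path';

// Throwaway demo file; the name is arbitrary.
const file = path.join(os.tmpdir(), `chunks-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'x'.repeat(100));

const sizes = [];
// highWaterMark caps how many bytes are read from disk per chunk.
for await (const chunk of fs.createReadStream(file, { highWaterMark: 32 })) {
  sizes.push(chunk.length);
}
fs.unlinkSync(file);

const totalBytes = sizes.reduce((sum, n) => sum + n, 0);
console.log(sizes, totalBytes);
```

The 100 bytes arrive as several small chunks instead of one large read.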
Writable Stream Example
import fs from 'fs';
const writable = fs.createWriteStream('./output.txt');
writable.write('Hello\n');
writable.write('World\n');
writable.end();
Transform Stream Example
Transform streams modify data while it flows.
import { Transform } from 'stream';
class UpperCase extends Transform {
  _transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  }
}
process.stdin
  .pipe(new UpperCase())
  .pipe(process.stdout);
Input:
hello
Output:
HELLO
Duplex Streams
Duplex streams can read and write at the same time.
Examples:
- TCP sockets
- WebSockets
- Chat systems
Duplex Streams in Practice — Chat Between Servers
Node.js has a native TCP module called net.
Server:
import net from 'net';
const server = net.createServer((socket) => {
  socket.write('Connected!\n');
  socket.on('data', (data) => {
    console.log('Client:', data.toString());
    socket.write(`Server received: ${data}`);
  });
});
server.listen(3000);
Client:
import net from 'net';
const client = net.connect(3000);
client.write('Hello server');
client.on('data', (data) => {
  console.log(data.toString());
});
This is a real duplex communication channel.
.pipe() vs pipeline()
Many developers reach for .pipe() first.
Example:
readable.pipe(writable);
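But .pipe() has a problem:
- weak error handling: errors are not forwarded from one stream to the next, and the other streams are not destroyed when one of them fails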
The safer approach is pipeline().
import { pipeline } from 'stream/promises';
// readable, transform, and writable stand for any compatible streams
await pipeline(
  readable,
  transform,
  writable
);
Benefits:
- automatic cleanup
- proper error propagation
- prevents memory leaks
In production systems, pipeline() is usually preferred.
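The error propagation can be seen in a small experiment. This sketch uses a source that fails on purpose after one chunk (failingSource is a made-up name) and a sink that discards everything. It assumes the promise-based pipeline() from 'stream/promises'.

```javascript
import { Readable, Writable } from 'stream';
import { pipeline } from 'stream/promises';

// Hypothetical source that fails after its first chunk.
async function* failingSource() {
  yield 'first chunk';
  throw new Error('simulated read failure');
}

const source = Readable.from(failingSource());
const sink = new Writable({
  // Discard every chunk; a real sink would write it somewhere.
  write(chunk, encoding, callback) {
    callback();
  }
});

let caught = null;
try {
  await pipeline(source, sink);
} catch (err) {
  // The source's error surfaces here, in one place.
  caught = err;
}

console.log(caught.message);
console.log(sink.destroyed);
```

With source.pipe(sink), the same failure would only be visible through an 'error' listener on source, and sink would be left open.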
Massive CSV Processing — CSV to NDJSON
Imagine:
- 10GB CSV file
- millions of records
Bad approach:
const data = fs.readFileSync('./huge.csv');
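This tries to load the entire file into RAM before any processing starts. For a file this size it will likely fail outright, because the whole content has to fit into a single Buffer.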
Instead, use streams.
Streaming CSV Parser
import fs from 'fs';
import csv from 'csv-parser';
fs.createReadStream('./huge.csv')
  .pipe(csv())
  .on('data', (row) => {
    console.log(row);
  });
Now rows are processed progressively.
Converting CSV to NDJSON
NDJSON (newline-delimited JSON) means one JSON object per line.
Example:
{"id":1}
{"id":2}
{"id":3}
Transform stream:
import { Transform } from 'stream';
class ToNDJSON extends Transform {
  constructor() {
    super({ objectMode: true });
  }
  _transform(chunk, enc, cb) {
    cb(null, JSON.stringify(chunk) + '\n');
  }
}
Pipeline:
import fs from 'fs';
import csv from 'csv-parser';
import { pipeline } from 'stream/promises';
await pipeline(
  fs.createReadStream('./huge.csv'),
  csv(),
  new ToNDJSON(),
  fs.createWriteStream('./output.ndjson')
);
Many large-scale data ingestion pipelines follow this same pattern.
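Reading NDJSON back is just as incremental, because every line is a complete JSON document. A minimal sketch using Node's built-in readline module follows. The readNDJSON helper is a made-up name, and the in-memory input stands in for something like fs.createReadStream('./output.ndjson').

```javascript
import { Readable } from 'stream';
import readline from 'readline';

// Hypothetical helper: yields one parsed object per NDJSON line.
async function* readNDJSON(input) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    // Skip blank lines such as a trailing newline.
    if (line.trim() !== '') yield JSON.parse(line);
  }
}

// In-memory stand-in for a file stream; note the record split across chunks.
const input = Readable.from(['{"id":1}\n{"id"', ':2}\n{"id":3}\n']);

const records = [];
for await (const record of readNDJSON(input)) {
  records.push(record);
}
console.log(records);
```

readline reassembles the line that was split between the two chunks, so each JSON.parse call sees a full record.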
Async Iterators and Generator Functions
Readable streams are async iterables, so they work directly with for await...of.
Example:
for await (const chunk of stream) {
  console.log(chunk.toString());
}
This creates clean, readable stream processing.
Generator Functions
async function* generateNumbers() {
  yield 1;
  yield 2;
  yield 3;
}
for await (const num of generateNumbers()) {
  console.log(num);
}
Async generators are useful for:
- on-demand processing
- pagination
- streaming APIs
- lazy loading
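Pagination is a good fit because the generator only asks for the next page when the consumer wants more items. The sketch below fakes an API with an in-memory array. fetchPage, its cursor format, and the page size are all assumptions made for the demo.

```javascript
// Hypothetical in-memory "API": returns one page of items and the next cursor.
const ITEMS = ['a', 'b', 'c', 'd', 'e'];
let pagesFetched = 0;

async function fetchPage(cursor, pageSize = 2) {
  pagesFetched += 1;
  const items = ITEMS.slice(cursor, cursor + pageSize);
  const next = cursor + pageSize < ITEMS.length ? cursor + pageSize : null;
  return { items, next };
}

// Yields items one by one, fetching a new page only when needed.
async function* paginate() {
  let cursor = 0;
  while (cursor !== null) {
    const page = await fetchPage(cursor);
    yield* page.items;
    cursor = page.next;
  }
}

const seen = [];
for await (const item of paginate()) {
  seen.push(item);
  if (item === 'c') break; // stop early; the third page is never requested
}

console.log(seen);         // [ 'a', 'b', 'c' ]
console.log(pagesFetched); // 2
```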
SQL Streaming on Demand
Instead of loading 1 million rows:
SELECT * FROM users;
Some database drivers support streaming results progressively.
Example concept:
// db.queryStream is a placeholder; the actual method name depends on the driver
const stream = db.queryStream('SELECT * FROM users');
for await (const row of stream) {
  console.log(row);
}
Benefits:
- lower memory usage
- faster time to first row
- scalable ETL systems
Aborting Async Operations
Modern Node.js (v15 and later) provides AbortController as a global.
Example:
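const controller = new AbortController();
// Abort the request if it takes longer than 3 seconds
setTimeout(() => {
  controller.abort();
}, 3000);
// url is a placeholder for the endpoint being requested
fetch(url, {
  signal: controller.signal
}).catch((err) => {
  // An aborted fetch rejects with an error named 'AbortError'
  console.error(err.name);
});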
Useful for:
- canceling requests
- timeouts
- stopping streams
- shutting down workers
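The same signal works for more than fetch. As a network-free sketch, this example cancels a promise-based timer from 'timers/promises', which accepts a signal option. The 5-second wait and 50 ms cutoff are arbitrary values chosen for the demo.

```javascript
import { setTimeout as sleep } from 'timers/promises';

const controller = new AbortController();
// Cancel the long wait after 50 ms.
const timer = setTimeout(() => controller.abort(), 50);

let outcome;
try {
  // Stand-in for any slow operation that accepts an AbortSignal.
  await sleep(5000, 'done', { signal: controller.signal });
  outcome = 'completed';
} catch (err) {
  // Aborted operations reject with an error named 'AbortError'.
  if (err.name !== 'AbortError') throw err;
  outcome = 'aborted';
} finally {
  clearTimeout(timer);
}

console.log(outcome); // aborted
```

Checking err.name keeps real failures separate from deliberate cancellation.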
Consuming Web APIs as Streams
Some APIs stream data progressively.
Example:
const response = await fetch(url);
// In Node.js 18+, response.body is an async-iterable stream of Uint8Array chunks
for await (const chunk of response.body) {
  console.log(chunk);
}
This is used heavily in:
- AI streaming responses
- video streaming
- live analytics
- stock market feeds
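The chunks of a response body are raw bytes, so text usually needs decoding on the way in. This sketch assumes Node.js 18 or later, where Response and TextDecoderStream are globals, and uses a locally constructed Response as a stand-in for the result of fetch(url) so that it runs without a network.

```javascript
// Stand-in for `await fetch(url)`; the body text is made up for the demo.
const response = new Response('line one\nline two\n');

// Decode bytes to text incrementally as chunks arrive.
const textStream = response.body.pipeThrough(new TextDecoderStream());

let received = '';
for await (const chunk of textStream) {
  received += chunk;
}

console.log(received);
```

TextDecoderStream also handles multi-byte characters that are split across two chunks.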
Web Streams API
Node.js now supports the Web Streams standard (available as globals since v18).
This makes backend and frontend streaming more unified.
Main stream types:
- ReadableStream
- WritableStream
- TransformStream
Creating a Web Readable Stream
const stream = new ReadableStream({
  start(controller) {
    controller.enqueue('Hello');
    controller.enqueue('World');
    controller.close();
  }
});
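A Web ReadableStream like the one above can be combined with a TransformStream through pipeThrough(), much like Node's Transform streams. Here is a minimal sketch, assuming Node.js 18+ or a modern browser where both classes are globals. The names source and upperCase are chosen for illustration.

```javascript
const source = new ReadableStream({
  start(controller) {
    controller.enqueue('Hello');
    controller.enqueue('World');
    controller.close();
  }
});

// Upper-cases every chunk that passes through.
const upperCase = new TransformStream({
  transform(chunk, controller) {
    controller.enqueue(chunk.toUpperCase());
  }
});

const reader = source.pipeThrough(upperCase).getReader();
const output = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  output.push(value);
}

console.log(output); // [ 'HELLO', 'WORLD' ]
```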
Consuming Massive Data — Frontend + Backend Streaming
Backend:
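// app is assumed to be an Express-style application
app.get('/stream', (req, res) => {
  const file = fs.createReadStream('./huge.txt');
  // Without this handler, a read error would leave the response hanging
  file.on('error', () => res.destroy());
  file.pipe(res);
});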
Frontend:
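const response = await fetch('/stream');
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // value is a Uint8Array; decode it to text as it arrives
  console.log(decoder.decode(value, { stream: true }));
}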
This enables progressive rendering.
Large applications rely on the same idea of progressive delivery, even when the underlying protocols differ:
- Netflix
- YouTube
- ChatGPT-like streaming UIs
- dashboards
Parallel Processing with Child Processes
Node.js is single-threaded for JavaScript execution.
Heavy CPU tasks block the event loop.
Solution:
- child processes
- workers
Example:
import { fork } from 'child_process';
// Named "child" to avoid shadowing the global process object
const child = fork('./worker.js');
child.send({ start: true });
child.on('message', (msg) => {
  console.log(msg);
});
Useful for:
- image processing
- video compression
- large parsing jobs
- AI inference
Worker Threads
Worker threads are lighter than child processes: they run inside the same process and can share memory through SharedArrayBuffer.
Example:
import { Worker } from 'worker_threads';
const worker = new Worker('./worker.js');
worker.on('message', (data) => {
  console.log(data);
});
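The example above needs a separate worker.js file. For a single-file sketch, the worker source can be passed as a string with the eval: true option. The summing loop is only a placeholder for real CPU-heavy work, and runInWorker is a made-up helper name.

```javascript
import { Worker } from 'worker_threads';

// Worker code as a string; with eval: true it runs as a CommonJS script.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.limit; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Hypothetical helper: runs the loop off the main thread and resolves with the result.
function runInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

const sum = await runInWorker(1000000);
console.log(sum); // 500000500000
```

While the worker is counting, the main thread's event loop stays free to handle other work.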
N-Tier Architecture with Workers
Modern frontend architectures increasingly use workers.
Examples:
- parsing CSV in browser workers
- background search indexing
- image editing
- AI tokenization
The main UI thread stays responsive while the worker does the heavy lifting.
Parsing CSV to JSON On Demand
Streaming parser:
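// transform is a placeholder for your own chunk-to-rows conversion.
// Chunk boundaries do not align with row boundaries, so it must handle partial rows.
async function* parseCSV(stream) {
  for await (const chunk of stream) {
    yield transform(chunk);
  }
}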
This pattern enables:
- lazy processing
- scalable pipelines
- real-time analytics
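Because a chunk can end in the middle of a row, an on-demand parser has to carry the unfinished line over to the next chunk. The sketch below shows one way to do that. It assumes string chunks, '\n' line endings, a header row, and no quoted fields or embedded commas, so it is not a full CSV parser. splitLines and parseSimpleCSV are made-up names.

```javascript
import { Readable } from 'stream';

// Re-chunks arbitrary string chunks into complete lines.
async function* splitLines(source) {
  let rest = '';
  for await (const chunk of source) {
    const lines = (rest + chunk).split('\n');
    rest = lines.pop(); // last piece may be an incomplete line
    yield* lines;
  }
  if (rest !== '') yield rest;
}

// Turns each data line into an object keyed by the header row.
async function* parseSimpleCSV(source) {
  let header = null;
  for await (const line of splitLines(source)) {
    const cells = line.split(',');
    if (header === null) {
      header = cells;
      continue;
    }
    yield Object.fromEntries(header.map((name, i) => [name, cells[i]]));
  }
}

// Chunks deliberately cut through the header and a row.
const source = Readable.from(['id,na', 'me\n1,Ada\n2,Li', 'nus\n']);
const rows = [];
for await (const row of parseSimpleCSV(source)) {
  rows.push(row);
}
console.log(rows);
// [ { id: '1', name: 'Ada' }, { id: '2', name: 'Linus' } ]
```

For real-world CSV with quoting and escapes, a dedicated parser such as the csv-parser package used earlier is the safer choice.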
Finding Occurrences and Reporting Progress
Example:
- scanning 100GB logs
- finding errors progressively
let count = 0;
// stream is any readable stream over the log file
for await (const chunk of stream) {
  const matches = chunk.toString().match(/ERROR/g);
  if (matches) {
    count += matches.length;
  }
  // Report progress after every chunk
  console.log('Current count:', count);
}
This allows:
- live progress reporting
- streaming dashboards
- real-time monitoring
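One caveat when counting per chunk: an occurrence that is split across two chunks ("ERR" at the end of one, "OR" at the start of the next) is missed. A possible fix is to carry the unmatched tail of each chunk into the next one, as in this sketch. countOccurrences and its onProgress callback are made-up names, and string chunks are assumed.

```javascript
import { Readable } from 'stream';

// Counts non-overlapping occurrences of needle, including ones split across chunks.
async function countOccurrences(source, needle, onProgress = () => {}) {
  if (needle === '') throw new Error('needle must not be empty');
  const keep = needle.length - 1;
  let count = 0;
  let tail = '';
  for await (const chunk of source) {
    const text = tail + chunk;
    let searchFrom = 0;
    let index;
    while ((index = text.indexOf(needle, searchFrom)) !== -1) {
      count += 1;
      searchFrom = index + needle.length;
    }
    // Carry over the unmatched end of this chunk so a match that
    // straddles the boundary is found on the next iteration.
    tail = text.slice(Math.max(searchFrom, text.length - keep));
    onProgress(count);
  }
  return count;
}

// Two of the three occurrences are split across chunk boundaries.
const logStream = Readable.from(['xxERR', 'ORyyERROR', 'ERRO', 'R']);
const progress = [];
const found = await countOccurrences(logStream, 'ERROR', (n) => progress.push(n));

console.log(found);    // 3
console.log(progress); // [ 0, 2, 2, 3 ]
```

The carried tail is at most needle.length - 1 characters, so memory use stays constant regardless of the log size.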
Why Streams Matter in Modern Systems
For data-heavy systems, streams are no longer optional.
They power:
- AI systems
- video platforms
- analytics pipelines
- ETL systems
- chat applications
- distributed systems
- server-side rendering
- real-time dashboards
Without streams:
- memory explodes
- latency increases
- scalability collapses
Final Thoughts
Learning Streams changes how you think about backend systems.
You stop thinking:
Load everything first
and start thinking:
Process data progressively
That mindset is essential for building scalable modern applications.
Once you combine:
- Streams
- Async Iterators
- Web Streams
- Workers
- Pipelines
- AbortControllers
you begin building systems that behave like real production infrastructure instead of small demo applications.