Streams are a built-in feature of Node.js and represent an asynchronous flow of data. Streams are also a way to handle reading and/or writing files. A Node.js stream can process files larger than the free memory of your computer, since it handles the data in small chunks.
Streams in Node.js
This is the first article of a series about streams in Node.js. It aims to give an overview of the different types of streams and their limitations, benefits and use cases.
Streams in Node.js
- What is a Stream in Node.js? (this article)
- Connect streams with the pipe method (planned)
- Handle stream errors (planned)
- Connect streams with the pipeline method (planned)
What are streams?
Streams are an interface for working with streaming data. Think of a Unix pipe | as a mental model for streams. Essentially, a stream is a collection of data that isn't available all at once. The streamed data arrives in small chunks, and each chunk is handled asynchronously as it arrives.
In Node.js, streams are used in many built-in modules to handle async data processing; for example, the http module uses streaming interfaces with ClientRequest and ServerResponse. Stream data is a Buffer by default, unless the stream is configured to work with objects (object mode). This means chunks of data are buffered in memory as they move through the stream.
Why use streams?
Streams let us work with data that is too large to fit into memory, because we only handle one chunk of data at a time. For instance, imagine you are working with a 50 GB file of analytics data with millions of rows. If you read this file into memory, it would take a very long time and eventually hit the memory limit of Node.js or of your machine. By handling this file with a stream, we can process one row of the dataset at a time and never have to read the entire file into memory. Hence, streams are memory efficient.
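A minimal sketch of this idea, assuming a large file named analytics.csv exists next to the script:

```js
const fs = require('fs');
const readline = require('readline');

// Read the file as a stream and process it one line at a time,
// instead of loading the whole file into memory at once.
const fileStream = fs.createReadStream('./analytics.csv'); // hypothetical file
const lines = readline.createInterface({ input: fileStream });

let rowCount = 0;
lines.on('line', (line) => {
  rowCount += 1; // process each row here
});

lines.on('close', () => {
  console.log(`Processed ${rowCount} rows`);
});
```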
Streams are also useful in other scenarios. For example, when reading a large file into memory (assuming it fits), it takes a while before any of the data is available. When consuming data from a stream, the data is readable the moment a chunk arrives. This means streams are time efficient compared to reading data into memory.
Streams can also be combined with other streams. For instance, the output of one stream can be used as the input for another stream. This allows us to connect streams into a pipeline through which data can flow between them. Hence, streams are composable.
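A minimal sketch of such a pipeline, assuming a placeholder file input.txt exists (error handling is covered later in the series):

```js
const fs = require('fs');
const zlib = require('zlib');

// The readable file stream flows into the gzip transform stream,
// and the compressed output flows into the writable file stream.
fs.createReadStream('./input.txt')       // hypothetical input file
  .pipe(zlib.createGzip())               // transform: compress the data
  .pipe(fs.createWriteStream('./input.txt.gz'));
```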
Types of streams
There are five types of streams in the built-in stream module of Node.js (see the docs).
- Readable: You receive data from a readable stream.
- Writable: You stream data to a writable stream. Also referred to as a sink, because it is the end destination of the streaming data.
- Duplex: A duplex stream implements both interfaces, readable and writable. An example of a duplex stream is a TCP socket, where data flows in both directions.
- Transform: A transform stream is a type of duplex stream where the data passing through is transformed, so the output will be different from the input. Data can be sent to a transform stream and read back after it has been transformed.
- PassThrough: The PassThrough stream is a transform stream that doesn't change the data passing through it. It's mainly used for testing and examples.
Out in the wild, you will most likely encounter readable, writable and transform streams.
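To give a feel for how a transform stream looks in code, here is a minimal sketch of a transform stream that upper-cases the chunks passing through it:

```js
const { Transform } = require('stream');

// A transform stream receives chunks, modifies them,
// and pushes the transformed chunks to its readable side.
const upperCase = new Transform({
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  },
});

// stdin (readable) flows through the transform into stdout (writable).
process.stdin.pipe(upperCase).pipe(process.stdout);
```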
Stream Events
All streams are instances of EventEmitter. EventEmitters are used to emit and respond to events asynchronously. Read more about EventEmitters in the article Event Emitters in Node.js. Events emitted by streams can be used to read and/or write data, manage the stream state, and handle errors.
Though streams are instances of EventEmitter, it is not recommended to handle streams like plain event emitters and just listen to their events. Instead, the recommended way is to use the pipe and pipeline methods, which consume streams and handle the events for you.
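As a preview of what the later articles in this series cover, a minimal sketch of the pipeline method from the built-in stream module, again with placeholder file names:

```js
const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');

// pipeline connects the streams and forwards errors
// from any of them to a single callback.
pipeline(
  fs.createReadStream('./input.txt'),    // hypothetical input file
  zlib.createGzip(),
  fs.createWriteStream('./input.txt.gz'),
  (err) => {
    if (err) {
      console.error('Pipeline failed:', err);
    } else {
      console.log('Pipeline succeeded');
    }
  }
);
```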
Working with stream events is useful when you need more control over how the stream is consumed, for instance, triggering something when a particular stream begins or ends. Have a look at the official Node.js docs on streams for more information on this.
Readable stream events
- data - emitted when the stream outputs a chunk of data.
- readable - emitted when there is data ready to be read from the stream.
- end - emitted when no more data is available.
- error - emitted when an error has occurred within the stream; an error object is passed to the handler. Unhandled stream errors can crash the application.
Writable stream events
- drain - emitted when the writable stream's internal buffer has been cleared and it is ready to have more data written into it.
- finish - emitted when all data has been written.
- error - emitted when an error occurred while writing data; an error object is passed to the handler. Unhandled stream errors can crash the application.
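And a minimal sketch of the writable events, again with a placeholder file name:

```js
const fs = require('fs');

const writable = fs.createWriteStream('./output.txt'); // hypothetical file

// 'drain' is only emitted after a write() call has returned false,
// signalling that the internal buffer has been cleared again.
writable.on('drain', () => {
  console.log('Internal buffer drained, safe to write again');
});

writable.on('finish', () => {
  console.log('All data has been written');
});

writable.on('error', (err) => {
  console.error('Stream error:', err);
});

writable.write('some data\n');
writable.end();
```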
TL;DR
- Streams are an interface for working with streaming data.
- Stream data is a buffer by default.
- Streams are memory efficient. They consume only minimal amounts of memory.
- Streams are time efficient: data is readable as soon as the first chunk arrives.
- Streams are composable: they can be connected and combined with other streams.
- All streams are instances of EventEmitter, but listening to stream events is not the recommended way of consuming a stream.
- Listening to stream events is useful when you want to trigger something when the stream starts or ends.
Thanks for reading and if you have any questions, use the comment function or send me a message @mariokandut.
If you want to know more about Node, have a look at these Node Tutorials.
References (and Big thanks):
HeyNode, Node.js - Streams, MDN - Writable Stream, MDN - Streams