DEV Community

tq-bit

How to use node.js streams for fileupload

TL;DR - Skip the theory - Take me to the code

Prerequisites

Note: To follow this article, you need a working installation of Node.js on your machine. You will also need an HTTP client to send requests. I will use Postman for this purpose.

What are streams for Node.js?

Streams are a basic method of data transmission. In a nutshell, they divide your data into smaller chunks and transfer (pipe) them, one by one, from one place to another. Whenever you watch a video on Netflix, you experience them first hand: the browser does not receive the whole video up front, only parts of it, piece by piece.

A lot of npm and native node modules are using them under the hood, as they come with a few neat features:

  • Asynchronously sending requests and responses
  • Reading data from one physical location and writing it to another
  • Processing data without loading all of it into memory at once

The processing part makes streams particularly charming: it makes dealing with bigger files more efficient and fits the spirit of Node's event loop and its non-blocking I/O.

To visualize streams, consider the following example.

You have a single file with a size of 4 GB. Without streams, processing this file means loading all of it into your computer's memory. That would be quite a boulder to digest all at once.

an image that shows a visualization of how data is loaded into a buffer, sent, buffered again and then saved to memory

Buffering means loading data into RAM. Only after the full file has been buffered is it sent to a server.

Streams, in comparison to the example above, would not read/write the file as a whole, but rather split it into smaller chunks. These can then be sent, consumed or worked through one by one, lowering stress for the hardware during runtime. And that's exactly what we'll build now.

an image that shows, in comparison to the image above, how data is streamed and therefore not loaded into memory as a whole.

Instead of loading the whole file, streams process parts (chunks) of it one by one.

To sum up, streams split a computer resource into smaller pieces and work through them one by one instead of processing the resource as a whole.
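To make the chunking visible, here is a minimal sketch that reads a file through a read stream and counts the chunks it arrives in. The helper name countChunks and the 64 KiB default chunk size are assumptions chosen for illustration; they are not part of the project built in this article.

```javascript
const fs = require('fs');

// Hypothetical helper: read a file chunk by chunk and report what arrived.
// chunkSize is passed as highWaterMark, the upper bound for a single chunk.
function countChunks(filePath, chunkSize = 64 * 1024) {
  return new Promise((resolve, reject) => {
    let chunks = 0;
    let bytes = 0;
    const readStream = fs.createReadStream(filePath, { highWaterMark: chunkSize });

    // Each 'data' event delivers one chunk; the file is never held in memory as a whole
    readStream.on('data', (chunk) => {
      chunks += 1;
      bytes += chunk.length;
    });
    readStream.on('end', () => resolve({ chunks, bytes }));
    readStream.on('error', reject);
  });
}
```

Calling countChunks with the path of a large file and logging the result should show many chunks adding up to the file's full size.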


Get started

... or skip to the full example right away

Let's formulate the features we'd like to have:

  • To keep it simple, we will work with a single index file that opens an express server.
  • Inside of it, there's a route that reacts to POST requests and in which the streaming will take place.
  • The file sent will be uploaded to the project's root directory.
  • (Optional): We are able to monitor the streaming progress while the upload takes place.

Also, let's do the following to get started:

  1. Open up your favourite text editor and create a new folder.
  2. Initialize an npm project and install the necessary modules.
  3. Add an index.js file, which we'll populate with our code in a moment.


# Initialize the project
$ npm init -y

# Install the express module
$ npm i express

# Optionally add nodemon as dev dependency
$ npm i -D nodemon

# Create the index.js file
# On Windows PowerShell, use: New-Item index.js
# On Linux / macOS:
$ touch index.js



When everything is done, you should have a folder structure that looks like this:



project-directory
| - node_modules
| - package.json
| - index.js



Create the server

Add the following to your index.js file to create a server that listens for requests:



// Load the necessary modules and define a port
const app = require('express')();
const fs = require('fs');
const path = require('path');
const port = process.env.PORT || 3000;

// Add a basic route to check if server's up
app.get('/', (req, res) => {
  res.status(200).send(`Server up and running`);
});

// Mount the app to a port
app.listen(port, () => {
  console.log(`Server running at http://127.0.0.1:${port}/`);
});



Then open the project directory in a terminal / shell and start the server up.



# If you're using nodemon, go with this
# in the package.json: 
# { ...
#   "scripts": {
#     "dev": "nodemon index.js"
#   }
# ... } 

# Then, run the dev script
$ npm run dev

# Else, start it up with the node command
$ node index.js



Navigate to http://localhost:3000. You should see the expected response.

Writing a basic stream to save data to a file

The fs module provides two methods to create file streams: one for reading and one for writing. A very simple example of how to use them looks like this, where whereFrom and whereTo are the paths the streams read from and write to. Each can be a path string, a Buffer that holds the path, or a file: URL object.



const fs = require("fs");

const readStream = fs.createReadStream(whereFrom)
const writeStream = fs.createWriteStream(whereTo)

// You could achieve the same with destructuring:
const {createReadStream, createWriteStream} = require("fs");



From its creation until it closes, the stream emits a series of events that we can hook callback functions up to. One of these events is 'open', which a file stream fires once its underlying file has been opened.
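Put together, a read stream, a write stream and the 'open' event are enough to copy a file. The following is a minimal sketch under the assumption that both paths point to the local file system; the helper name copyFile is made up for this illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: copy a file by piping a read stream into a write stream
function copyFile(whereFrom, whereTo) {
  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(whereFrom);
    const writeStream = fs.createWriteStream(whereTo);

    // If reading fails, stop writing as well and report the error
    readStream.on('error', (err) => {
      writeStream.destroy();
      reject(err);
    });
    writeStream.on('error', reject);

    // 'open' fires once the destination file has been opened
    writeStream.on('open', () => readStream.pipe(writeStream));

    // 'close' fires after all data has been written and the file is closed
    writeStream.on('close', () => resolve(whereTo));
  });
}
```

The promise resolves with the destination path once the write stream has closed, which is the same pattern the upload route uses later on.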

Add the following below the app.get() method in the index.js file:



app.post('/', (req, res) => {
  const filePath = path.join(__dirname, `/image.jpg`);
  const stream = fs.createWriteStream(filePath);

  stream.on('open', () => req.pipe(stream));
});



What I found particularly interesting about this one is:

Why does the req argument have a pipe method?

The answer is noted in the documentation of the http module, which express builds on: the request is an http.IncomingMessage object, which inherits from the stream.Readable class and therefore has all of its methods available.
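The same behaviour can be observed without express. The sketch below uses only Node's built-in http module; the factory name createUploadServer and the response text are assumptions made for this illustration, not part of the article's project.

```javascript
const http = require('http');
const fs = require('fs');
const { Readable } = require('stream');

// Hypothetical factory: a server that streams every request body to targetPath
function createUploadServer(targetPath) {
  return http.createServer((req, res) => {
    // req is an http.IncomingMessage, which extends stream.Readable
    const isReadable = req instanceof Readable;
    const fileStream = fs.createWriteStream(targetPath);

    fileStream.on('error', () => {
      res.statusCode = 500;
      res.end('upload failed');
    });
    // Start piping once the file is open, answer once it is closed
    fileStream.on('open', () => req.pipe(fileStream));
    fileStream.on('close', () => res.end(`readable: ${isReadable}`));
  });
}
```

Starting it with createUploadServer('./image.jpg').listen(3000) and sending a binary POST request from Postman should save the body to that file, just like the express route.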

Having added the stream, let us now reload the server, move to Postman and do the following:

  1. Change the request method to POST and add the URL localhost:3000.
  2. Select the 'Body' tab, check the binary option and choose a file you would like to upload. As we've hardcoded the name to be 'image.jpg', an actual image would be preferable.
  3. Click on 'Send' and check back to the code editor.

If everything went well, you'll notice the file you just chose is now available in the project's root directory. Try to open it and check whether the streaming was successful.

If that was the functionality you were looking for, you could stop reading here. If you're curious to see what else a stream has in store, read on.

Use stream events and methods

Streams, after being created, emit events. In the code above, we're using the 'open' event to pipe data from the request to its destination only after the stream is opened. Handlers for these events are registered as callbacks, much like the ones you know from app.use(), and they run on Node's event loop. Let's now take a look at some of the events that can be used to control the code flow.

Event 'open'

As soon as the stream's file has been opened, it fires the 'open' event. That is the perfect opportunity to start processing data, just as we've done previously.

Event 'drain'

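Written chunks are collected in the stream's internal buffer. When that buffer fills up, the stream signals backpressure, and once the buffered data has been flushed ('drained') to its destination, it fires the 'drain' event. You can use this event to e.g. monitor how many bytes have been written so far. Small uploads that never fill the buffer may not trigger it at all.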
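In Node's stream API, 'drain' is tied to backpressure: write() returns false when the internal buffer is full, and 'drain' fires once it is safe to write again. pipe() handles this automatically; the sketch below does it by hand to show the mechanism. The helper name writeWithBackpressure is an assumption for this illustration.

```javascript
const { once } = require('events');

// Hypothetical helper: write chunks one by one and pause whenever write()
// reports a full internal buffer. Resolves with the number of 'drain' waits.
async function writeWithBackpressure(writable, chunks) {
  let drains = 0;
  for (const chunk of chunks) {
    const canContinue = writable.write(chunk);
    if (!canContinue) {
      // Buffer is full: wait until the stream has flushed it
      await once(writable, 'drain');
      drains += 1;
    }
  }
  // Register the 'finish' listener before ending so the event cannot be missed
  const finished = once(writable, 'finish');
  writable.end();
  await finished;
  return drains;
}
```

Passing a stream from fs.createWriteStream() and an array of Buffers to this helper writes them in order without letting the internal buffer grow unbounded.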

Event 'close'

After all data has been sent, the stream closes. A simple use case for 'close' is to notify a calling function that the file has been completely processed and can be considered available for further operations.

Event 'error'

If things go sideways, the 'error' event can be used to handle the failure, for example by logging it and responding with an error.
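A quick way to see the 'error' event in action is to open a write stream on a path that cannot be created. This is a minimal sketch; the helper name openForWriting and the shape of its result object are assumptions for this illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: try to open a write stream and report the outcome
function openForWriting(filePath) {
  return new Promise((resolve) => {
    const stream = fs.createWriteStream(filePath);

    // Fired e.g. when the target directory does not exist
    stream.on('error', (err) => resolve({ ok: false, code: err.code }));

    // Fired when the file could be opened; close it again right away
    stream.on('open', () => {
      stream.end();
      resolve({ ok: true, code: null });
    });
  });
}
```

Without an 'error' listener, the same failure would be thrown as an uncaught exception and take the process down, which is why the upload route below registers one.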

Let us now integrate the three new events with some basic features. Add the following to your index.js file, inside the app.post() callback, below the 'open' event handler:



stream.on('drain', () => {
 // Calculate how much data has been piped so far
 const written = stream.bytesWritten;
 const total = parseInt(req.headers['content-length'], 10);
 const pWritten = ((written / total) * 100).toFixed(2);
 console.log(`Processing  ...  ${pWritten}% done`);
});

stream.on('close', () => {
 // Send a success response back to the client
 const msg = `Data uploaded to ${filePath}`;
 console.log('Processing  ...  100%');
 console.log(msg);
 res.status(200).send({ status: 'success', msg });
});

stream.on('error', err => {
 // Send an error message to the client
 console.error(err);
 res.status(500).send({ status: 'error', err });
});
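One caveat of the progress calculation: the Content-Length header is not guaranteed to be present, for example when a client uses chunked transfer encoding. The following sketch shows one way to guard against that; the helper name formatProgress and its fallback message are assumptions for this illustration.

```javascript
// Hypothetical helper: format upload progress from the bytes written so far
// and the raw Content-Length header value (a string, or undefined if absent).
function formatProgress(bytesWritten, contentLengthHeader) {
  const total = Number.parseInt(contentLengthHeader, 10);
  if (!Number.isFinite(total) || total <= 0) {
    // No usable total, so fall back to an absolute byte count
    return `${bytesWritten} bytes written`;
  }
  // Cap at 100 in case more bytes arrive than the header announced
  const percent = Math.min((bytesWritten / total) * 100, 100);
  return `${percent.toFixed(2)}% done`;
}
```

Inside the 'drain' handler, it could be called as formatProgress(stream.bytesWritten, req.headers['content-length']).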



Wrap up & modularization

Since you probably would not drop your functions right into a .post() callback, let's go ahead and move the logic into its own function to wrap this article up. I'll spare you the details; you can find the finalized code below.

Also, if you skipped from above, the following is happening here:

  • The code below creates an express server that handles incoming post requests.
  • When a client sends a file stream to the route, its contents are uploaded.
  • During the upload, four events are fired.
  • In these, functions are called to process the file's content and provide basic feedback on the upload progress.

Now it's your turn. How about building a user interface that takes over the job of sending a file to the root path? To make it more interesting, try using the browser's FileReader API and send the file asynchronously, instead of using a form. Or use a module like Sharp to process an image before streaming it back to the client.

PS: In case you try the former method, make sure to send the file as an ArrayBuffer



// Load the necessary modules and define a port
const app = require('express')();
const fs = require('fs');
const path = require('path');
const port = process.env.PORT || 3000;

// Take in the request & filepath, stream the file to the filePath
const uploadFile = (req, filePath) => {
 return new Promise((resolve, reject) => {
  const stream = fs.createWriteStream(filePath);
  // With the open - event, data will start being written
  // from the request to the stream's destination path
  stream.on('open', () => {
   console.log('Stream open ...  0.00%');
   req.pipe(stream);
  });

  // Drain is fired whenever the stream's internal buffer has been flushed.
  // When that happens, print how much data has been written so far.
  stream.on('drain', () => {
   const written = stream.bytesWritten;
   const total = parseInt(req.headers['content-length'], 10);
   const pWritten = ((written / total) * 100).toFixed(2);
   console.log(`Processing  ...  ${pWritten}% done`);
  });

  // When the stream is finished, print a final message
  // Also, resolve the location of the file to calling function
  stream.on('close', () => {
   console.log('Processing  ...  100%');
   resolve(filePath);
  });
  // If something goes wrong, reject the promise
  stream.on('error', err => {
   console.error(err);
   reject(err);
  });
 });
};

// Add a basic get - route to check if server's up
app.get('/', (req, res) => {
 res.status(200).send(`Server up and running`);
});

// Add a route to accept incoming post requests for the fileupload.
// Also, attach two callback functions to handle the response.
app.post('/', (req, res) => {
 const filePath = path.join(__dirname, `/image.jpg`);
 uploadFile(req, filePath)
  // savedPath avoids shadowing the imported path module
  .then(savedPath => res.send({ status: 'success', path: savedPath }))
  // Respond with a 500 status so clients can detect the failure
  .catch(err => res.status(500).send({ status: 'error', err }));
});

// Mount the app to a port
app.listen(port, () => {
 console.log(`Server running at http://127.0.0.1:${port}/`);
});
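If you would rather test the route from a script than from Postman, the upload can be streamed from Node as well. This is a minimal sketch using only the built-in http module; the helper name sendFile is an assumption, and the URL is expected to point at a server like the one above.

```javascript
const fs = require('fs');
const http = require('http');

// Hypothetical client helper: stream a local file to an upload endpoint
function sendFile(filePath, url) {
  return new Promise((resolve, reject) => {
    // Sending the size lets the server calculate the upload progress
    const { size } = fs.statSync(filePath);
    const req = http.request(
      url,
      { method: 'POST', headers: { 'Content-Length': size } },
      (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => resolve({ statusCode: res.statusCode, body }));
      }
    );
    req.on('error', reject);

    const readStream = fs.createReadStream(filePath);
    readStream.on('error', (err) => {
      req.destroy();
      reject(err);
    });
    // http.ClientRequest is a writable stream, so the file can be piped into it
    readStream.pipe(req);
  });
}
```

With the server running on port 3000, sendFile('./some-image.jpg', 'http://localhost:3000/') should resolve with the status code and the response body sent by the route; the file name here is only a placeholder.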




This post was originally published at https://q-bit.me/use-node-streams-to-upload-files/
Thank you for reading. If you enjoyed this article, let's stay in touch on Twitter 🐤 @qbitme

Top comments (10)

aderchox

This article is really valuable, I also want to ask you, just as @longbotton_dev did, to write more of such amazing articles on Node.js core/fundamentals. I have a few questions though:
Question 1. When I do the above, the file, which is an image, is uploaded successfully, and the size of the upload file matches the original file as well, but when I open the uploaded image (the one on the server) it's just blank. I tried with a few different images to make sure it's not an issue of a certain image.
My code:
Client:

        fileinput.onchange = (e) => {
            const formData = new FormData();
            console.log({ file: e.target.files[0] });
            formData.append('file', e.target.files[0]);
            upload(formData);
        }
        async function upload(data) {
            const response = await fetch("http://localhost:1234/upload", {
                method: "POST",
                body: data
            })
        }

Server:

app.post("/upload", (req, res) => {
  const stream = fs.createWriteStream(path.resolve(__dirname, "file.png")); // name is hard-coded
  stream.on("open", () => req.pipe(stream));
});

Question 2. Towards the end of the article you've said:

To make it more interesting, try using the browser's filereader API and send the file asynchronously, instead of using a form.

But according to this SO answer:

When doing a file upload from a File on disk, the browser doesn't load the full file in memory but streams it through the request. This is how you can upload gigs of data even though it wouldn't fit in memory. This also is more friendly with the HDD since it allows for other processes to access it between each chunk instead of locking it.
When reading the File through a FileReader you are asking the browser to read the full file to memory, and then when you send it through XHR the data from memory is being used. You are thus limited by the memory available, bloating it for no good reasons, and even asking the CPU to work here while the data could have gone from the disk to the network card almost directly.

So it seems that using FileReader is not interesting at all... or maybe I'm getting it wrong.
I hope you're still checking dev.to 😁 Thanks a lot.

tq-bit • Edited

Hi there. Thank you for your reply :-)

I'm still here, will try and replicate your first case.

Good point on the filereader as well. This one was one of the first posts I made when learning Javascript & Node.js and wasn't very familiar with what's good for performance and what's not. If you have an input field available, you don't necessarily need the file reader. I just figured it'd be helpful to include because it'd be the next thing I took a look at.

PS: Out of curiosity: What about Node.js core content would you like to read? I thought about writing an article on how the http module works, but I feel like that'd be a bit trivial.

aderchox • Edited
  1. This might be ridiculous (or funny?), but after I added this comment here, I read a chapter on streams from a node.js book and now that I check node.js docs again, I don't see much more that I personally need to learn about node.js itself. But I think the main reason I asked for more is because the first part of your article (the theory) was so well explained that excited me :D and I thought your articles will be valuable for future readers about whatever they should be.

  2. Did you manage to replicate the issue (the first case)?

  3. I also want to recommend a few things about the article:

    • Mention that this is exactly what packages like Multer and Formidable use under the hood.
    • Mention that <input type="file"> does not load the files in RAM and it is things like fetch that create the stream automatically internally as soon as they are making the request (so no need to use the FileReader API explicitly). I know it's not directly related to your article, but realizing this connected some vagueness dots for me personally.
    • Explain a bit more clearly where the "open" event is documented, I couldn't find it on Writeable Streams docs on Node.js docs. Maybe this was for an older version of Node.js?

Thanks for your response.

tq-bit • Edited
  1. Noted. I do try to make my articles easily graspable. Sometimes it works, sometimes it doesn't.

  2. Yes. All you have to do is to leave the form data out. Or implement a form parser on the backend. You're basically handling the raw binary data without the form wrapper. I'm not exactly sure how form data is parsed, but I did change your code so it looks like so and it worked (same server code). I attached the full statically served index.html file:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8" />
        <meta http-equiv="X-UA-Compatible" content="IE=edge" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
        <title>Document</title>
    </head>
    <body>
        <h1>Hello World!</h1>
        <input type="file" />

        <script>
            const input = document.querySelector('input');
            input.onchange = (e) => {
                upload(e.target.files[0]);
            };
            async function upload(data) {
                const response = await fetch('http://localhost:1234/upload', {
                    method: 'POST',
                    body: data,
                });
            }
        </script>
    </body>
</html>

  3. Formidable works a bit differently. It's a form parser, more standardised than what's going on here. Since I discovered fastify, I favour Busboy over Multer, but I believe it serves the same purpose.
aderchox • Edited

I found the answer to my third point too, I'll add here for future readers:
Based on the documentation, createWriteStream returns an instance of <fs.WriteStream> that has an 'open' event which is emitted when the <fs.WriteStream>'s file is opened: nodejs.org/api/fs.html#event-open_1
(btw, this is weird, I did the exact same thing as you and passed the data leaving form data out, but still not working for me, but thanks anyways).

satya nishanth

Just to be on the same page: many file upload websites preview the input. I believe that has to load the file into RAM for the preview, right? (I mean if the preview is enabled)

tq-bit

Yes. Images are always loaded into memory when rendering a page. Instead of instantly uploading the img and providing a link, some pages store images as base64 on the user's computer and permit uploading only after a confirmation.

tq-bit • Edited

In case you guys who commented here are still with me - I've written another article on Node.js fundamentals. It somewhat builds up on streams. I intend to write more under the series 'Node.js fundamentals'. Again, thank you so much for your feedback, I really appreciate it.

Check out how to implement Server-Sent Events with Node here: dev.to/tqbit/how-to-use-nodejs-for...

Sudarshan K J

Thanks for the article, it is well written!

YourDreamGuy #EndSwat ✊🏽

This is amazing
Please keep making more core node js content