So you want to handle large files. Perhaps you have a web app that accepts file uploads, and you want to let users upload several gigabytes without choking the server. What's to be done?
Streams to the rescue!
You might well be aware of Node's file streaming capabilities; I was not, until I had to accept uploads with no known size limit. I wanted to use the fantastic HTTP library got to stream a file from a remote server as specified by an API call. What was the best way to handle this? Well, if you're using Node and Express, it's a combination of node:fs's createWriteStream and Multer. But for general purposes, the write stream will do just fine.
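For the got piece, here's a minimal sketch of what that remote-to-disk streaming can look like (assuming got v12+ in an ESM context; the URL is a placeholder):
import got from 'got';
import { createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';

// Stream the response body straight to disk; pipeline() handles
// backpressure and cleans up both streams if either errors.
await pipeline(
  got.stream( 'https://example.com/some/large.file' ),
  createWriteStream( 'path/to/your/new/large.file' )
);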
Streams in Node.js
In Node, there are a few types of streams. Below I have a simple example for two, Readable
and Writeable
.
import { createWriteStream, createReadStream } from 'node:fs';
import { pipeline } from 'node:stream';

const writer = createWriteStream( 'path/to/your/new/large.file' );
const reader = createReadStream( 'path/to/your/saved/large.file' );

reader.on( 'data', ( chunk ) => {
  console.log( chunk.toString() );
} );

writer.on( 'finish', () => {
  console.log( 'Finished writing a large file!' );
} );

// Readable streams emit 'end' (not 'finish') when they're done
reader.on( 'end', () => {
  console.log( 'Finished reading' );
} );

reader.pipe( writer );

// Or, alternatively
pipeline(
  reader,
  writer,
  ( err ) => {
    if ( err ) {
      console.error( 'Pipeline failed', err );
    }
  }
);

// pipe() and pipeline() end the writer automatically once the reader
// is exhausted; only close the streams yourself if you bail out early
// reader.close();
// writer.close();
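If you'd rather await the copy than pass callbacks around, node:stream/promises ships a promise-returning pipeline. A quick sketch:
import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';

// Same copy as above, but awaitable; the promise rejects if either stream errors.
await pipeline(
  createReadStream( 'path/to/your/saved/large.file' ),
  createWriteStream( 'path/to/your/new/large.file' )
);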
There are two other types of stream, called Duplex and Transform. These function as both a readable and a writable stream, the difference being that Transform streams, you guessed it, transform the data passed through them. They can take in data in the form of a file or a Buffer like a readable stream, and they can also output data like a writable stream. This can be useful for things like data manipulation (e.g. encrypting files). Here's how you'd do it with a readStream and a writeStream individually.
import { createReadStream, createWriteStream, statSync } from 'node:fs';
import crypto from 'crypto';

const infile = createReadStream( 'path/to/in/file' );
const outfile = createWriteStream( 'path/to/out/file' );

// aes-256-cbc needs a 32-byte key and a 16-byte IV; note createCipher
// (which takes no IV) is deprecated, so use createCipheriv
const secretKey = process.env.SECRET_KEY;
const iv = crypto.randomBytes( 16 );
const encrypt = crypto.createCipheriv( 'aes-256-cbc', Buffer.from( secretKey ), iv );

const size = statSync( 'path/to/in/file' ).size;

infile.on( 'data', ( data ) => {
  const percentage = infile.bytesRead / size;
  console.log( `${percentage * 100}%` );
  const encrypted = encrypt.update( data );
  if ( encrypted.length ) {
    outfile.write( encrypted );
  }
} );

infile.on( 'close', () => {
  outfile.write( encrypt.final() );
  outfile.close();
} );
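A Cipher is itself a Transform stream, so the same job can be done with pipeline() and no manual 'data' handling. For the reverse direction, here's a hedged sketch of decrypting the file back; it assumes the same 32-byte key and that you persisted the 16-byte IV somewhere, here as hex in a hypothetical FILE_IV env var:
import { createReadStream, createWriteStream } from 'node:fs';
import { pipeline } from 'node:stream/promises';
import crypto from 'crypto';

// The IV must be the exact bytes used at encryption time
const iv = Buffer.from( process.env.FILE_IV, 'hex' );
const decrypt = crypto.createDecipheriv( 'aes-256-cbc', Buffer.from( process.env.SECRET_KEY ), iv );

await pipeline(
  createReadStream( 'path/to/out/file' ),
  decrypt,
  createWriteStream( 'path/to/decrypted/file' )
);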
And now to write an actual Transform stream of our own.
import { Transform, TransformCallback } from 'stream';

export class Base64DecodeStream extends Transform {
  extra: string;

  constructor() {
    super( { decodeStrings: false } );
    this.extra = '';
  }

  _transform( chunk: Buffer | string, encoding: BufferEncoding, cb: TransformCallback ) {
    let c = `${chunk}`;
    // Strip newlines and prepend any leftover characters from the previous chunk
    c = this.extra + c.replace( /(\r\n|\n|\r)/gm, '' );
    // Base64 decodes in groups of 4 characters, so hold back the remainder
    const remaining = c.length % 4;
    this.extra = c.slice( c.length - remaining );
    c = c.slice( 0, c.length - remaining );
    const buf = Buffer.from( c, 'base64' );
    this.push( buf );
    cb();
  }

  _flush( cb: TransformCallback ) {
    if ( this.extra.length ) {
      this.push( Buffer.from( this.extra, 'base64' ) );
    }
    cb();
  }
}
And you can use the above in the same manner as the other pipes. More specifically, since it is a Transform, it can sit between other streams, such as a response stream from an Axios request:
response.pipe( new Base64DecodeStream() ).pipe( decipher ).pipe( stream );
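To make that one-liner concrete, here's a hedged sketch of where response might come from, using axios with responseType: 'stream' (URL and output path are placeholders, the decipher stage is dropped for brevity, and Base64DecodeStream is imported from wherever you defined it):
import axios from 'axios';
import { createWriteStream } from 'node:fs';

const response = await axios.get( 'https://example.com/file.b64', { responseType: 'stream' } );

// With axios, the readable stream is response.data
response.data
  .pipe( new Base64DecodeStream() )
  .pipe( createWriteStream( 'path/to/decoded/file' ) );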
Using streams in an Express app
Now for something real-world that you can use. A simple Multer middleware for saving form-data files without loading them entirely into memory.
// Node Modules
import { randomUUID } from 'crypto';
import { Request } from 'express';
import multer, { StorageEngine } from 'multer';
import { Writable } from 'stream';
import { createWriteStream, existsSync, mkdirSync } from 'fs';
import { log } from '../common/winston.js';
/**
* Extended Multer storage class to save a file from a stream.
*
* The idea is that this allows for large file uploads to the API without strangling the system.
*/
export class MulterStorage implements StorageEngine {
  mediaRoot: string;

  destination: string;

  /**
   * Constructor method.
   * @param {string} mediaRoot
   */
  constructor( mediaRoot: string ) {
    this.mediaRoot = mediaRoot;
    this.destination = mediaRoot;
  }

  /**
   * Used to send the request filename to a callback function.
   * @param {Request} req
   * @param {Express.Multer.File} file
   * @param {function} cb
   */
  // eslint-disable-next-line class-methods-use-this
  filename(
    req: Request,
    file: Express.Multer.File,
    cb: ( error?: Error | string | null, filename?: string ) => void,
  ) {
    cb( null, file.originalname );
  }

  /**
   * Handle the file upload.
   * @param {Request} req - The request object
   * @param {Express.Multer.File} file - The uploaded file
   * @param {function} cb
   */
  _handleFile(
    req: Request,
    file: Express.Multer.File,
    cb: ( error?: Error | string | null | unknown, info?: Partial<Express.Multer.File> ) => void,
  ) {
    const uuid = randomUUID();
    // Begin handling file
    try {
      if ( !existsSync( `${this.destination}/${uuid}` ) ) {
        mkdirSync( `${this.destination}/${uuid}`, { recursive: true } );
      }
      const intermediateFile: Partial<Express.Multer.File> = {
        path: `${this.destination}/${uuid}/${file.originalname.replace( /\s/g, '_' )}`,
      };
      const finalFile: Partial<Express.Multer.File> = { ...file, ...intermediateFile };
      const writeStream: Writable = createWriteStream( `${finalFile.path}` );
      const fileReadStream = file.stream;
      fileReadStream
        .pipe( writeStream )
        .on( 'finish', () => {
          cb( null, finalFile );
        } )
        .on( 'error', ( e ) => {
          writeStream.end();
          cb( e );
        } );
    } catch ( e ) {
      if ( typeof e === 'string' ) {
        log( 'error', e.toUpperCase() );
      } else if ( e instanceof Error ) {
        log( 'error', e.message );
      }
      cb( e );
    }
  }
  /**
   * Method to remove a file.
   * @param {Request} req - the request object
   * @param {Express.Multer.File} file - The specified file to remove
   * @param {function} callback
   */
  // eslint-disable-next-line class-methods-use-this
  _removeFile(
    req: Request,
    file: Express.Multer.File,
    callback: ( error: ( Error | null ) ) => void,
  ): void {
    // Multer calls this to roll back after an error; always invoke the
    // callback so the request doesn't hang. Deleting the partial file
    // (e.g. fs.unlink on file.path) is left as an exercise.
    callback( null );
  }
}
export const upload = multer( { storage: new MulterStorage( '/media' ) } );
Then on your route, you can do something like this:
import express, { Request, Response, Router, NextFunction } from 'express';
import { upload } from '../../../middleware/multer-storage.js';

const api = Router();

api.post(
  '/api/:version/',
  upload.single('file'),
  // This is an example of how to upload several different files with different names
  //
  // upload.fields([{
  //   name: 'video', maxCount: 1
  // }, {
  //   name: 'subtitles', maxCount: 1
  // }])
  async (req: Request, res: Response, next: NextFunction) => {
    const file = req.file;
    // EncodeBody is an app-specific type for this endpoint's body
    const data = req.body as EncodeBody;
    // Note the ? after file. Since the file should exist if saved
    // correctly, but might not, we should check or allow nullable in
    // the `process` function (your own handler, not Node's global)
    const resp = await process(data, file?.path);
    return next();
  },
);

const app = express();
app.use( '/', api );
The string passed to single will be the name of the form field, and the uploaded file will be available on the Request object as req.file. For single there will be just one file, but you can accept multiple fields and specify more information (like a maxCount) by using the fields method, as can be seen in the comment above.
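If you go the fields() route instead, the uploaded files land on req.files keyed by field name. A hedged sketch, reusing the field names from the commented example:
api.post(
  '/api/:version/',
  upload.fields( [
    { name: 'video', maxCount: 1 },
    { name: 'subtitles', maxCount: 1 },
  ] ),
  async ( req: Request, res: Response, next: NextFunction ) => {
    // With fields(), req.files is a map of field name -> array of files
    const files = req.files as { [field: string]: Express.Multer.File[] };
    console.log( files.video?.[ 0 ]?.path, files.subtitles?.[ 0 ]?.path );
    return next();
  },
);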
Conclusion
Using all of the above combined, you can safely store, access, and download large files without blowing past your memory limit.
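For the download side, the same trick works in reverse: pipe a read stream into the response instead of loading the file first. A hedged sketch (the route and media layout are illustrative, matching the MulterStorage above; validate user-supplied path segments in real code):
import { createReadStream, statSync } from 'node:fs';
import { pipeline } from 'node:stream';

api.get( '/api/:version/download/:uuid/:name', ( req: Request, res: Response ) => {
  const filePath = `/media/${req.params.uuid}/${req.params.name}`;
  res.setHeader( 'Content-Length', statSync( filePath ).size );
  res.setHeader( 'Content-Disposition', `attachment; filename="${req.params.name}"` );
  // res is a writable stream, so pipe straight into it
  pipeline( createReadStream( filePath ), res, ( err ) => {
    if ( err ) {
      res.destroy();
    }
  } );
} );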