This is the result of days of trial and error. I didn't have a clue about streams and whatnot, hence why it took so long :D
You need the following libraries: multer, streamifier and @fast-csv/parse.
Create form to upload CSV files - multipart/form-data
File upload needs to be done via multipart/form-data. This is something I only got acquainted with recently too, and it will probably be the subject of another post. For now, I will skip it.
multer
multer will grab the file and place it in req.file. Don't expect to find the file in req.body - that one will only contain the text form fields. 90% of the multer tutorials out there explain how to save the incoming file to a directory. I could not care less about that, because this will live on a server where I don't have write rights, so I want the file to live in memory.
const multer = require("multer");
const parseCsv = multer().single("whatever-name-you-gave-to-the-input-field-in-your-form");
module.exports = {parseCsv}
This is the middleware that will place the file in req.file.
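In case it helps, this is roughly how the exported middleware gets mounted on a route (the require path is just a placeholder):

const express = require("express");
const { parseCsv } = require("./middleware/parse-csv"); // wherever you exported it from

const router = express.Router();

// parseCsv runs first and puts the uploaded file in req.file
router.post("/upload", parseCsv, (req, res) => {
  res.json({ bytesReceived: req.file.buffer.length });
});

module.exports = router;

The string you pass to .single() has to match the name attribute of the file input in your form.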
fast-csv and streamifier
req.file will have a buffer property, but it is not readable by Node's createReadStream. If you try fs.createReadStream(buffer) you will most likely get an error saying that this is not a file, or something like that. Node's createReadStream does accept a Buffer as its argument, but it treats it as a file path rather than as the data you want to stream, so it won't give you a stream over the buffer's contents. I learnt about that in this SO answer. The solution I found? streamifier, which I first learnt about here. If you look at its source code, it does some magic to wrap the buffer from req.file in a readable stream when you pass it to its createReadStream. I was glad to have found this library.
So, you create the stream like so:

const streamifier = require("streamifier");

const { buffer } = req.file;
// this returns a Readable stream you can pipe into the CSV parser
streamifier.createReadStream(buffer);
@fast-csv/parse
@fast-csv/parse takes a stream with the data from the CSV and emits a couple of events while parsing the contents of the file. It calls .on('data', row => callback) for every row, so you can do whatever you want with it. Once all the rows have been parsed, it calls .on('end', rowCount => callback). There is also an .on('error', callback) event, which I suppose is related to their validation capabilities, but I haven't tried it yet.
You can import fast-csv as csv and then call .pipe(csv.parse()) (see example below). You can also pass options to csv.parse(); the ones I have used so far are headers: true (treats the first line of the CSV file as headers instead of data, so each row comes back as an object keyed by them, see docs here) and ignoreEmpty: true (ignores empty lines, see docs here).
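Outside of Express, the event flow looks roughly like this (a minimal sketch using parseString from the same package, with made-up inline data):

const csv = require("@fast-csv/parse");

const sample = "name,age\nAda,36\n\nGrace,45\n";

csv
  .parseString(sample, { headers: true, ignoreEmpty: true })
  .on("error", (error) => console.error(error))
  .on("data", (row) => console.log(row)) // { name: 'Ada', age: '36' }, then { name: 'Grace', age: '45' }
  .on("end", (rowCount) => console.log(`Parsed ${rowCount} rows`));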
My first iteration was to create a document on every row parse. That was a mistake, because of the async nature of saving data to a DB and the sync nature of parsing a CSV. I found myself with the 'end' event being triggered before the first document was saved, and that screwed up my strategy and my server responses.
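For reference, that first iteration looked more or less like this:

// DON'T do this: the stream keeps emitting 'data' and fires 'end'
// without waiting for the pending MyModelName.create() promises
streamifier
  .createReadStream(buffer)
  .pipe(csv.parse({ headers: true, ignoreEmpty: true }))
  .on("data", async (row) => {
    await MyModelName.create(row); // still in flight when 'end' fires
  })
  .on("end", (rowCount) => {
    res.status(200).json({ rowCount }); // may run before the inserts have finished
  });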
I did a bit of research and found a strategy that works well: push each parsed row (which comes back as an object) into an array in memory, and call Mongoose's Model.create([ARRAY_OF_OBJECTS]) in the 'end' event handler. You need to make that handler async and decide your server response to the client in there. Like so - it seems to work well for me:
const csv = require("@fast-csv/parse");
const streamifier = require("streamifier");
// somewhere below
router.post("/endpoint", [multerMiddlewareExplainedAbove], (req, res) => {
const { buffer } = req.file;
const dataFromRows = [];
streamifier
.createReadStream(buffer)
.pipe(csv.parse({ headers: true, ignoreEmpty: true })) // <== this is @fast-csv/parse!!
.on("data", (row) => {
dataFromRows .push(row);
})
.on("end", async (rowCount) => {
try {
const data = await MyModelName.create(dataFromRows );
res.status(200).json({ rowCount, data });
} catch (error) {
res.status(400).json({ error});
}
});
});
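If you want to poke at the endpoint without building the form first, something like this should do it (a quick sketch for Node 18+, which has fetch, FormData and Blob built in; the URL, file name and field name are placeholders that need to match your setup):

const fs = require("fs");

async function upload() {
  const form = new FormData();
  form.append(
    "whatever-name-you-gave-to-the-input-field-in-your-form",
    new Blob([fs.readFileSync("./rows.csv")]),
    "rows.csv"
  );

  const response = await fetch("http://localhost:3000/endpoint", {
    method: "POST",
    body: form,
  });
  console.log(await response.json()); // { rowCount, data } on success
}

upload();

The field name in form.append() has to be the same one you passed to multer's .single(), otherwise multer will complain about an unexpected field.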
Hope it makes sense. I will be adding stuff as I discover stuff. Thanks for reading (: