I'm Venkatesh. I have been working as a web developer for quite some time. This is a simple explanation of a specific case of reduce that I have learnt in practice.
I am a big fan of Array.reduce. I was a Java developer for quite some time and later started learning JavaScript because of a new project requirement. I was a little familiar with Java Collections, but not very comfortable with them. Since I didn't understand lambda functions (Java's version of arrow functions) well, I couldn't get what map/filter/reduce meant. I read almost every available article to understand the difference. Finally, reduce came to the rescue via a wonderful article that was something like "implement your own map/filter using reduce". I read that article and found it super crazy.
It was like a boon for me. I started using reduce extensively every time I had to do any map/filter operation. I loved it because of the control it offered me. People thought I was crazy for using reduce everywhere, which was understandable. This is the simplest implementation I remember, for doubling a number array and filtering even numbers using reduce.
const nums = [1, 2, 3, 4, 5, 6];
// Double the nums array
const numsDoubled = nums.reduce((acc, num) => {
  const temp = [...acc]; // I used Array.from(acc) at that time though
  temp.push(num * 2);
  return temp;
}, []);
// find the even numbers
const evenNums = nums.reduce((acc, num) => {
  const temp = [...acc];
  if (num % 2 === 0) temp.push(num); // Didn't know 0 was falsy back then
  return temp;
}, []);
Being who I was at the time, I loved it like anything. Slowly, I understood what map and filter were and what they are supposed to do. I thought, "Finally, I will use these things for the right reason."
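For comparison, here is a minimal sketch of the same two operations written with the methods that are actually meant for them: map for a one-to-one transformation and filter for selecting elements. Unlike the reduce versions above, neither one copies a growing accumulator on every step.

```javascript
const nums = [1, 2, 3, 4, 5, 6];

// map: transform every element, producing a new array of the same length
const numsDoubled = nums.map((num) => num * 2); // [2, 4, 6, 8, 10, 12]

// filter: keep only the elements for which the predicate returns true
const evenNums = nums.filter((num) => num % 2 === 0); // [2, 4, 6]
```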
The Problem
That is the history of how I came to the problem. Now, coming to the actual problem I faced: I received a CSV file from a client with somewhere around 70k-90k rows and 30+ columns. I had to do some calculations, run a few conditional checks and pick out a few important fields. So I started using my favourite, reduce, again.
const fs = require("fs");
function extractData(records) {
  return records.reduce((acc, record) => {
    const { id, ...rest } = record;
    const others = computeRestFields(rest); // some mapping function
    const temp = { ...acc }; // copies every key collected so far, on every record
    temp[id] = others;
    return temp;
  }, {});
}
const file = fs.readFileSync("client-feed.csv", "utf8");
const parsedData = csvParse(file); // CSV parser returning an array of row objects
extractData(parsedData);
I tested this on about 100 rows, was satisfied that it worked as expected, and deployed it to a serverless function. However, it kept running out of memory. When I started debugging, I realized that my code was far too memory intensive, so I began looking for alternatives.
Alternative 1:
function extractData(records) {
  return records
    .map((record) => {
      const { id, ...rest } = record;
      const others = computeRestFields(rest);
      return { id, others };
    })
    .reduce((acc, record) => {
      const t = { ...acc }; // still spreads the whole accumulator every time
      const { id, others } = record;
      t[id] = others;
      return t;
    }, {});
}
My first thought was to split the work into a map followed by a reduce, instead of doing everything in one reduce. After some digging around, I suspected that the number of spread operations might be what was pushing me past the memory limit, because I was creating a new object with thousands of keys on every iteration. Splitting the work into map and then reduce, as in Alternative 1 above, was my attempt to ease that. As you might expect, it didn't work: the job still exceeded my serverless provider's 2GB memory limit, so I was forced to try another approach.
I also tried making it more functional with lodash, breaking the work into multiple operations, each with a small footprint (or so I thought at the time). None of those worked out. So I decided to give the traditional for loop a final try. The result is Alternative 2.
Alternative 2:
function extractData(records) {
  const recordsCount = records.length;
  const result = {};
  for (let i = 0; i < recordsCount; i += 1) {
    const { id, ...rest } = records[i];
    result[id] = computeRestFields(rest); // some mapping function
  }
  return result;
}
The code is pretty self-explanatory: I pluck out the id and store the computed fields on a single result object, keyed by id. To my surprise, it actually worked. I was completely baffled by the result, so I started analyzing what the difference between the two approaches could be.
The Result
When I was using reduce, I was creating a new object on every iteration, i.e., for every record I copied the entire accumulator into a new object and then added one more value to it. The amount of data that had to be copied and stored in memory grew every time the iteration ran. So the real culprit was not just the reduce function, which is what I had blamed at first.
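To put a rough number on it: if step i copies the i keys already in the accumulator, n records cost 0 + 1 + ... + (n - 1) = n(n - 1)/2 property copies, which is about four billion for 90k rows. The sketch below counts those copies directly; the `id${i}` keys are made-up stand-ins for real record ids, and it measures copying work only, not peak memory, which also depends on how quickly the garbage collector reclaims the discarded intermediate objects.

```javascript
// Sketch: count how many existing properties the `{ ...acc }` pattern copies.
// The `id${i}` keys are made-up stand-ins for real record ids.
function countSpreadCopies(n) {
  let copies = 0;
  const ids = Array.from({ length: n }, (_, i) => `id${i}`);
  const result = ids.reduce((acc, id) => {
    copies += Object.keys(acc).length; // the spread copies every key already in acc
    return { ...acc, [id]: true }; // fresh object with one more key
  }, {});
  return { copies, keys: Object.keys(result).length };
}

// Closed form of 0 + 1 + ... + (n - 1)
const expectedCopies = (n) => (n * (n - 1)) / 2;

console.log(countSpreadCopies(1000)); // { copies: 499500, keys: 1000 }
console.log(expectedCopies(90000)); // 4049955000
```

The for loop in Alternative 2 writes each key once, so its copying work grows linearly with the number of rows instead of quadratically.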
Later on, I understood that the main culprit was (obviously me! 😁) the combination of reduce and spread. Readers may wonder why the accumulator gets spread every time. The reason is that I was a big fan of ESLint back then, and it told me that modifying a function parameter was bad. I still like ESLint, but these days I am more of a "check whether the rule is actually needed here" guy. I have also come to know that reduce/map/filter are all achievable with just a plain for loop (which I was used to before, with conditional blocks). However, each of them exists for a specific purpose, and using them for things they are not intended for causes problems.
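Staying with reduce is also possible, as long as the accumulator is not copied. Below is a minimal sketch of two linear-time variants of `extractData`: one mutates the single accumulator object, and one uses `Object.fromEntries`, as a commenter below suggests. `computeRestFields` here is a hypothetical identity stand-in for the real mapping helper. With `props: true`, ESLint's `no-param-reassign` rule reports property writes like `acc[id] = ...` (presumably the kind of warning described above), so the sketch disables it for that one line.

```javascript
// Hypothetical stand-in: the real helper computes derived fields from the row.
const computeRestFields = (rest) => rest;

// Variant 1: reduce, but mutate one accumulator instead of spreading it.
function extractDataReduce(records) {
  return records.reduce((acc, record) => {
    const { id, ...rest } = record;
    // eslint-disable-next-line no-param-reassign
    acc[id] = computeRestFields(rest); // one property write per record
    return acc;
  }, {});
}

// Variant 2: build [key, value] pairs with map, then build the object once.
function extractDataEntries(records) {
  return Object.fromEntries(
    records.map(({ id, ...rest }) => [id, computeRestFields(rest)]),
  );
}

const sample = [
  { id: "a", x: 1, y: 2 },
  { id: "b", x: 3, y: 4 },
];
console.log(extractDataReduce(sample));
console.log(extractDataEntries(sample));
```

The mutation in variant 1 stays local: the accumulator object is created by `reduce`'s initial value and never escapes until it is returned, so nothing outside the function observes it changing.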
That is why I would recommend learning the semantic meaning of the constructs you use frequently. I used the word "frequently" intentionally, because I don't think it's worthwhile digging into things we use once in a decade. I hope you found something to take away from this article.
Please do correct me if any of my assumptions are wrong.
Cheers
Top comments
Your problem is that you were using reduce to push all the results into the same object. To collect them, you can use Object.assign or Object.fromEntries.
I noticed this as well.
Reducing this many fields into the same object will blow up your memory.
Exactly! I've seen it going crazy high
Yeah, I realized that soon after. Thanks
Restrictions and size is what makes us better developers 😉
💯
Maybe a transducer could work better for you here, specifically for this type of problem.
See this video about transducers in JavaScript.
That would be the exact name: "transducer".