

Quick Tip: Transform an Array into an Object using .reduce()

I'll spare you the time of reading a long, boring intro; here's the meat of the article:

Let's say you have an array like this:

[
    {id: 1, category: "frontend", title: "All About That Sass"},
    {id: 2, category: "backend", title: "Beam me up, Scotty: Apache Beam tips"},
    {id: 3, category: "frontend", title: "Sanitizing HTML: Going antibacterial on XSS attacks"}
]

And you'd like to get an object with categories as keys, each mapping to the ids of the articles in that category, like this:

{
    frontend: [1, 3],
    backend: [2]
}

You can use our friend Array.prototype.reduce for this.

const posts = [
    {id: 1, category: "frontend", title: "All About That Sass"},
    {id: 2, category: "backend", title: "Beam me up, Scotty: Apache Beam tips"},
    {id: 3, category: "frontend", title: "Sanitizing HTML: Going antibacterial on XSS attacks"}
];

const categoryPosts = posts.reduce((acc, post) => {
    let {id, category} = post;
    return {...acc, [category]: [...(acc[category] || []), id]};
}, {});

Alright, let's see how this works.

I think of reduce as turning my array into a pipeline. This pipeline takes some initial value, applies each element of my array to it as a separate step, and returns the new value. The value that is passed from step to step is often called the accumulator, because it accumulates changes as it moves through the pipeline. The initial value for the accumulator is passed as the second argument to reduce; in this case, it's an empty object.

So how are the elements of our array applied to the accumulator? That depends on the function you give to reduce as its first argument. Whatever you return from that function is used as the new value of the accumulator.
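
To make that flow concrete, here's a minimal sketch (separate from the posts example) that sums an array of numbers:

const sum = [1, 2, 3].reduce((acc, n) => acc + n, 0);
// steps: 0 + 1 = 1, then 1 + 2 = 3, then 3 + 3 = 6
console.log(sum); // 6

Back to our posts example, this is the function we pass to reduce: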

(acc, post) => {
    let {id, category} = post;
    return {...acc, [category]: [...(acc[category] || []), id]};
}

This function takes the accumulator as its first argument, and an element from the array as its second. The first line extracts the post's category and id into their own variables using object destructuring. This is just to give us nice short variable names to work with, making the next line a little bit neater.

return {...acc, [category]: [...(acc[category] || []), id]};

I used a lot of ES6 syntax in here that not everyone might be familiar with, so let's dig in.

return {...acc}

If we were to just return this, we'd simply return a copy of the accumulator, because the ... in front of it is the spread syntax. In an object literal, it takes all the properties and values of the given object and puts them into the newly created object. So all the line above does is take the properties our accumulator has and put them into the object we return.
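
As a quick illustration (the values here are made up for the example):

const acc = {frontend: [1]};
const copy = {...acc}; // {frontend: [1]}, a new object with the same properties as acc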

return {...acc, [category]: [...(acc[category] || []), id]};

The next thing you'll probably notice is this [category]: syntax. It's a computed property name. The idea is that you can define a property in an object literal without knowing the property name in advance. In the line above, the property name is whatever the category is.
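
For example (a tiny sketch with a made-up key variable):

const key = "frontend";
const obj = {[key]: [1, 3]}; // {frontend: [1, 3]}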

We want this property to eventually contain an array with all the ids of posts that have this category, so let's have a look at the value we're giving this property:

[...(acc[category] || []), id]

Here we have that spread syntax again, but this time in an Array literal. Similar to the object spread syntax, this takes all the values from the array it is given, and acts as if they were written inside this array literal, inserting them at that position in the newly created array.

This gives us a neat way of defining an array that is just some other array with one or more values appended to it.

const a = [1, 2, 3];
const b = [...a, 4]; // b = [1, 2, 3, 4]

So in our posts example, we'd like to append the post's id to whatever ids our accumulator already has, so we'd just write:

[...acc[category], id]

But what if our accumulator doesn't have any posts for that category yet? (This will be true for every category at the start.) Then acc[category] would be undefined, and since the spread syntax only works on iterable values like arrays, we'd get a TypeError.

[...(acc[category] || []), id]

So instead, we spread the expression acc[category] || [] (enclosed in parentheses so the spread syntax applies to the entire expression). The || operator returns its second operand when the first one is falsy (which undefined is), so if our accumulator doesn't have any posts with the given category yet, we spread an empty array, resulting in no values being added before our new id.
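
In isolation, for a category that hasn't been seen yet, that fallback looks like this (a quick sketch with a made-up id):

const existing = undefined || []; // []
const withId = [...existing, 2];  // [2]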

So, let's put it all together:

const posts = [
    {id: 1, category: "frontend", title: "All About That Sass"},
    {id: 2, category: "backend", title: "Beam me up, Scotty: Apache Beam tips"},
    {id: 3, category: "frontend", title: "Sanitizing HTML: Going antibacterial on XSS attacks"}
];

const categoryPosts = posts.reduce((acc, post) => {
    let {id, category} = post;
    return {...acc, [category]: [...(acc[category] || []), id]};
}, {});

Calling reduce on the posts array with an empty object as the initial accumulator, for each post we:

  • extract the post's id and category
  • copy all of the accumulator's existing properties into the object we return
  • if the accumulator already has an array of ids for the post's category, append the post's id to it; otherwise, create a new empty array for that category containing just the post's id
  • the value we return from the function passed to reduce becomes the accumulator for the next post in the array, and is returned from reduce after all posts have been processed (the final result is shown below)
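
Logging categoryPosts gives us exactly the object we were after:

console.log(categoryPosts);
// { frontend: [1, 3], backend: [2] }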

Feedback 💬

This is my first somewhat beginner-oriented post, and I'd love any constructive feedback you have! I feel like I overcomplicated this post by using so much ES6 syntax and then feeling the need to explain it all. I think I should have kept that to a minimum and stuck to the core concept of using reduce. I might still write a more focused version of this post, but for now, this is what I have.

This way of using reduce is probably also incredibly obvious to people with a good understanding of functional programming, but most of my programming life has been spent writing procedural and object-oriented code. map felt intuitive to me quite quickly, but I'm still having little light-bulb moments with all the ways I can use reduce.

Top comments (19)

Basti Ortiz

Honestly, I really don't like naming the first parameter of the callback function as acc. It's very ambiguous, at least for me. I prefer to name it prev because it's much clearer to me that the prev contains the result of the previous iteration (or the initialized value if it is the first iteration).

Harry Dennen

That makes sense for .map(), but for .reduce() the previous value is also the accumulated value which will eventually be returned. Making that distinction in the naming convention is a nice visual cue imo.

Mihail Malo

That's the imperative name for it though, no?
You're not supposed to know that iteration is taking place, just that the values are being absorbed into the accumulator :v

Basti Ortiz

That is indeed the true and only technical name for it, but semantically speaking, I prefer naming it prev. To each their own, I suppose.

Mihail Malo • Edited

Yeah, I'm totally just being a smartass.
Many, most even, reducers we write are not commutative.
This one could be parallelizable, actually, if we add a merging function, but still we start with an empty object for acc/prev, not one of the items.

@_bigblind You've seen people write something like

const allStr = strings.reduce((acc, next)=>acc+next, '')
// instead of
const allStr = strings.reduce((acc, next)=>acc+next)

right?

Basti Ortiz

Excuse my lack of knowledge on the subject, but what does it mean for a reducer to be "commutative" and "parallelizable"? And what do you mean by "merging function"?

Frederik 👨‍💻➡️🌍 Creemers

Oh, now I understand your point about not thinking about the fact that iteration is being used! If they're not done in parallel, you don't get the previous value :).

Mihail Malo • Edited

If we don't care about the order of the incoming ids, and just want to get the sets of ids of each article, we could split the counting between multiple threads or even machines.
Something like this silly thing:

const posts = [
  { id: 0, category: "fairy tales", title: "Gommunist Manifesto" },
  { id: 1, category: "frontend", title: "All About That Sass" },
  { id: 2, category: "backend", title: "Beam me up, Scotty: Apache Beam tips" },
  { id: 3, category: "frontend", title: "Sanitizing HTML: Going antibacterial on XSS attacks" },
  { id: 4, category: "frontend", title: "All About That Sass" },
  { id: 5, category: "backend", title: "Beam me up, Scotty: Apache Beam tips" },
  { id: 6, category: "frontend", title: "Sanitizing HTML: Going antibacterial on XSS attacks" },
  { id: 7, category: "frontend", title: "All About That Sass" },
  { id: 8, category: "backend", title: "Beam me up, Scotty: Apache Beam tips" },
  { id: 9, category: "frontend", title: "Sanitizing HTML: Going antibacterial on XSS attacks" }
]

const idsByCategory = posts => {
  const categories = new Map()
  for (const { category, id } of posts) {
    const existing = categories.get(category)
    if (!existing) categories.set(category, [id])
    else existing.push(id)
  }
  return categories
}

const mergingFunction = ([result, ...results]) => {
  for (const other of results)
    for (const [category, ids] of other) {
      const existing = result.get(category)
      if (!existing) result.set(category, ids)
      else existing.push(...ids)
    }
  return result
}

const parallel = posts => {
  const { length } = posts
  const results = []
  for (let i = 0; i < length; i += 2)
    results.push(idsByCategory(posts.slice(i, i + 2)))
  return results
}

const results = parallel(posts)
console.log(results)
const categoryPosts = mergingFunction(results)
console.log(categoryPosts)

And by "commutative" I mean that if you pushed an array into a number you'd get an error, and that 'a'+'b' and 'b'+'a' give you different strings.
Whereas integer addition without overflow is commutative: 1+2 gives the same result as 2+1 and const s = new Set; s.add(1); s.add(2) as well.

Basti Ortiz

Oh, wow. You're right about calling it "silly". 😂

Mihail Malo

It's silly in the sense we have only 10 items instead of billions, they are in memory at once, and it doesn't actually spawn threads or workers.

Jason Rice

This is really inefficient because of the nested spread operator in the reduce function. Here's some more info: prateeksurana.me/blog/why-using-ob...

Frederik 👨‍💻➡️🌍 Creemers

Hey, thanks for your comment, and great post! I had no idea that the spread operator was O(n) in terms of number of properties, though that totally makes sense. I wonder if some JS engines would optimize this code into a mutation if they could somehow make sure that the value before mutation is never accessed, but of course, we shouldn't rely on JS engine optimizations to fix our bad JS habits :).
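
A version that avoids the repeated copying could mutate the accumulator instead; something like this sketch (not taken from the linked article):

const categoryPosts = posts.reduce((acc, {id, category}) => {
    // reuse the same accumulator object instead of copying it on every iteration
    (acc[category] = acc[category] || []).push(id);
    return acc;
}, {});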

Mihail Malo

That's a really cool fully immutable solution.
I understand it, but I would personally avoid it in JS, if not for readability by juniors then at least because I have an addiction to micro optimization.

const idsByCategory = posts => {
  const categories = new Map() // I prefer Map
  for (const { category, id } of posts) {
    const existing = categories.get(category)
    if (!existing) categories.set(category, [id])
    else existing.push(id)
  }
  return categories
}


// This is what Prettier does to it
const categoryPosts = posts.reduce(
  (acc, { id, category }) => ({
    ...acc,
    [category]: [...(acc[category] || []), id]
  }),
  {}
)

// Yes, this is short.
const categoryPosts = posts.reduce((acc, { id, category }) => ({...acc, [category]: [...(acc[category] || []), id]}), {})
DarkteK

Amazing thx!

Lee

Solid explanation, thanks!

Comment deleted
Frederik 👨‍💻➡️🌍 Creemers

Thx, fixing that now :)

Lars Rye Jeppesen

Loved it, thank you! I'm dipping my toes into reduce and it's hard to wrap my head around.

Your example makes it so much easier to grasp. Thanks.

Araceli

Would help if you showed the result... I know I can just do a console.log but, yeah