Pragmatic Maciej

Data mutation in functional JavaScript

Let's talk about immutability and mutability. The whole web is full of articles describing how bad mutation is. Even I wrote a fairly popular article about the benefits of immutability - take a look here: The State of Immutability - and sure enough, the article you are reading now is partially contrary to my previous work. So let's ask the question: should we always follow immutability, is it a silver bullet, is this approach suitable in every situation in a language like JavaScript? It looks like not, but let me try to explain before you close this article and say - what an ignorant 😎

Let's recall what benefits the lack of mutation gives us:

  • predictability
  • safety
  • code trust
  • less complexity

And yes, this is true as hell. But when do these benefits show their strength? In shared state, of course. Still, what about state that is hidden from the outside world? What about local state used only for a computation done inside? Does it make any sense to force immutability there? Let's investigate and think more deeply.

Data normalization as an example

Nothing will tell more than code. In the example I will show a typical normalize function, whose purpose is to change a collection data structure into a key->value map structure, in order to have fast access to elements by key. In other words, such a key->value map gives us constant access time to every record in the map, no matter how big the map is. In contrast, in a standard collection the time for accessing an element has complexity n, where n is the size of the collection. Normalizing is very typical for Redux-like architectures, where a flat structure also has a direct positive impact on performance and complexity.
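
To make the difference concrete, here is a minimal sketch (the users data is made up for illustration) contrasting the linear scan over an array with the constant-time lookup on a normalized map:

// Hypothetical collection of records, each identified by id
const users = [
  { id: 'a1', name: 'Ann' },
  { id: 'b2', name: 'Bob' },
];

// Lookup in a plain array scans the collection - O(n)
const fromArray = users.find((user) => user.id === 'b2');

// Lookup in a normalized key->value map is a single property access - O(1)
const usersById = {
  a1: { id: 'a1', name: 'Ann' },
  b2: { id: 'b2', name: 'Bob' },
};
const fromMap = usersById['b2'];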

As the purpose of normalization is to transform/reduce one data structure into another, it is straightforward to use Array.prototype.reduce here:

function normalize(data) {
    return data.reduce((result, record) => ({...result, [record.id]: record}), {});
}

Works like a charm. But we need to understand that for each element inside the collection, this implementation does a shallow copy of the whole map created in the previous iteration. That means the total number of copy operations grows roughly as n * (n - 1) / 2, so we can simplify and present the complexity as O(n²), where n is the size of the collection.

Now, the version with local mutation.

function normalizeWithMutation(data) {
    return data.reduce((result, record) => {
      result[record.id] = record;
      return result;
  }, {});
}

In contrast to the previous example, the second one does no shallow copy inside the reducer (the function passed to reduce); it just sets fields in the newly created object. Thanks to that, the complexity of the latter is linear, O(n).

(Image: Big O notation)

Clearly the anonymous function passed to reduce is not a pure one; it mutates the state given to it. So it breaks the immutability rule, but is that a flaw?

Pure outside, impure inside

Let's take a look at these two functions from the outside: what is passed in and what is returned.

(Image: Input - Output of functions)

Looking at the functions' input and output, there is no difference. Both functions are pure, and therefore referentially transparent; in other words, both functions return the same output for the same input, no matter the time, system, or outside state. Both functions are independent units, without any relation to the environment.
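
A quick check (with made-up sample data) shows that, from the caller's perspective, both variants give structurally identical results:

const data = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }];

// Both calls return { 1: { id: 1, name: 'a' }, 2: { id: 2, name: 'b' } }
const copied = normalize(data);
const mutated = normalizeWithMutation(data);

console.log(JSON.stringify(copied) === JSON.stringify(mutated)); // true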

The conclusion then is that both functions are equal, and the inequality lies in the implementation. We can call it an implementation detail; it remains an implementation detail as long as the function doesn't change any state outside.

Having said that, both functions remain pure.

(Image: Implementation detail)

Performance does matter

However, I stated that the complexity of these functions is not the same, and that it has a direct impact on system performance. Yes, performance, and yes, it matters. It matters more on the server, but still, even when working on the front-end side, the difference should be understandable here.

I performed a few simple performance tests of both implementations; the time of execution differs dramatically, and it is directly proportional to the size of the collection.

Below is the ratio showing how much faster the mutable version was for a given size of the collection.

  • 100 elements - 2 times faster
  • 1000 elements - 30 times faster
  • 10 000 elements - 80 times faster
  • 100 000 elements - 6000 times faster

And to be clear, for the first two cases there was no difference I could spot; for the last two there was a visible lag, and for the last one the browser hung for 30 seconds.

The test was performed on Chrome version 76. The utility used to check execution time was window.performance.
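
The exact numbers will of course differ per machine and browser; a rough timing sketch along the lines below (the size and record shape are only illustrative) is enough to reproduce the trend:

const size = 100000;
const data = Array.from({ length: size }, (_, i) => ({ id: i, value: `record ${i}` }));

const start = performance.now();
normalizeWithMutation(data); // swap in normalize(data) to measure the copying version
const end = performance.now();

console.log(`normalized ${size} records in ${end - start} ms`);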

Significantly, there is a difference: the functions are not equal, the complexity difference is n versus n², and it is also evident in the test. Yet, I cannot say that the first implementation should be considered just bad in every case. No, as I always say, there are no silver bullets and no best solutions for every problem (yes, talking to you, best practices follower 😉). For small collections it is hard to spot that we did anything wrong, and small collections are typical in front-end apps; it's a rare thing to work with collections bigger than 1000 rows. So no worries if such an implementation exists in the code base. But if it exists in Node.js, then it should really be checked and considered a potential bottleneck, as Node.js apps need to serve not one user, but many. To put it another way, there is an additional factor k, where k represents how many clients are currently being processed by the app, so our real processing time could be expressed as n² * k. If one client blocks the event loop, even for a small amount of time, then other clients cannot perform any action, because of the single-threaded nature of the JavaScript run-time. The time of execution is a product of the execution time of the algorithm and the number of connected clients.

(Image: Immutable car)

Where does the immutable version fit better?

Functional programmers like to compose bigger functions from smaller ones. We solve small problems, and compose these solutions into functions solving bigger problems. And yes, this is great! However, in JavaScript it can have some pitfalls, as JS has no tail-call optimization and no immutable data structures. Consider the following code.

const appendKeyValue = (key) => (product, value) => ({...product, [value[key]]: value});  
const normalize = (data) => data.reduce(appendKeyValue('id'), {});


The code is written in a functional style. The normalize function is created as a composition of reduce and the appendKeyValue function. As appendKeyValue remains a generic, standalone function, it should be a pure one; to be pure, it cannot modify the input or have any side effects, and it does not - it creates a copy every time.

Thanks to this, appendKeyValue is just a predictable utility function which can be used for any transformation from any collection to a map.
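
For example (a hypothetical usage, assuming records that also carry an email field), the very same helper can index a collection by a different key:

// The same generic helper, reused to build a map keyed by email instead of id
const normalizeByEmail = (data) => data.reduce(appendKeyValue('email'), {});

const users = [
  { id: 1, email: 'ann@example.com' },
  { id: 2, email: 'bob@example.com' },
];

// { 'ann@example.com': { id: 1, ... }, 'bob@example.com': { id: 2, ... } }
const usersByEmail = normalizeByEmail(users);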

Having said that, this implementation has the same complexity as the first one, so it is O(n²), sorry.

(Image: Crying JS dev)

Provided that, I can state that everything based on copying will just be inefficient - what a discovery 😂 - in the places where high performance matters most (all places 😝). Don't cry, functional programmer; as always there is a trade-off, and for most front-end operations fully functional compositions of pure functions will probably work fine, but as already said, we should know their flaws.

Be functional and know when mutation is allowed

How can we still compose functions, but benefit from the performance of mutation?

  1. Mutate only local and not shared state
  2. Create mutable/unsafe functions with clear descriptions of the risk

As for the first, an example of local state mutation was presented before. The normalizeWithMutation is a fully pure function, and naming it just normalize would be fully acceptable:

// pure function with local mutation being only an implementation detail
function normalize(data) {
    return data.reduce((result, record) => {
      result[record.id] = record;
      return result;
  }, {});
}

Or, the second possibility - create re-usable mutating functions with proper naming:

const IMPURE_appendKeyValue = (key) => (product, value) => {
  product[value[key]] = value;
  return product;
};


and compose them:

const normalize = (data) => data.reduce(IMPURE_appendKeyValue('id'), {});

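Used like this, the only thing being mutated is the accumulator that reduce itself creates, so nothing outside the normalize call is ever touched. A short usage sketch (the sample data is hypothetical):

const records = [{ id: 'x', label: 'first' }, { id: 'y', label: 'second' }];

// { x: { id: 'x', label: 'first' }, y: { id: 'y', label: 'second' } }
const byId = normalize(records);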

The former proposition, local mutation as an implementation detail, should be used without any hesitation, as there is no difference and no risk for the functional control flow. Notably, local mutation should still remain in our toolbox, as there is nothing to lose here.

The latter, mutable functions marked by a special prefix, is also a nice idea which aims at preserving code re-use. The most important thing here is to explicitly say which function is impure; this explicit marking allows the caller to understand the impact.

All things considered, it is crucial to understand what we are doing, and to recognize whether the solution is enough for the particular problem. Even if we consider ourselves functional programmers, it is still a good thing to understand what flaws a fully immutable implementation can have in a language like JS. Maybe it is a good idea to mutate. However, these mutations should always be controlled; the worst situation is to allow mutation to spread and be shared. To avoid that, I gave two solutions here - keep mutation only local and never mutate what doesn't belong to the function, or clearly name impure units to keep them explicit. When we follow that, the code paradigm still remains functional, the default is purity and immutability, and where mutation is needed, it causes no issues, because it stays local or is clearly defined and explicit.
