loading...

Immutable Data Structures

martinhaeusler profile image Martin Häusler ・6 min read

Functional programming is currently on the rise due to its ability to prevent a lot of errors right out of the gate. The two corner stones of the functional paradigm are pure functions and immutable data structures. Today we will look at immutable collections - and why there's more to them than you might think.

Levels of Immutability

There are actually several different "levels" of immutability. It's difficult to keep them separated because the terminology is often used in the wrong way. I try to be as clear and consistent as possible.

Mutable Collections

Mutable collections are the ones we most commonly see in languages like Java, C# and JavaScript. They are usually built into the standard library. Their primary characteristic is that we can create an instance of those collections, and then afterwards modify it as we please.

List<String> list = new ArrayList<>();
list.add("Hello");
list.contains("Hello"); // -> true

Pros

  • Very high flexibility. There's no use case you cannot cover with mutable collections.
  • Fast insertion and modification.

Cons

  • Very often when you receive a collection as a method argument, you have to make a defensive copy. A full copy is O(n) in memory and time complexity.
  • If you do not create defensive copies, you might end up using a collection and sharing it with another piece of code. If either party makes a change, that change is visible to the other piece of code, potentially breaking it. Such bugs can be extremely subtle and hard to debug.
  • Mutable collections do not lend themselves well to concurrent programs. Concurrent modifications on shared mutable collections require proper external locking.

Use them for...

  • Local variables which are not shared.
  • Locally aggregating computation results.
  • Fields which get in touch with any reflection-based framework (most prominently, Object-Relational Mappers such as Hibernate).
  • Collections which are subject to frequent changes by a single thread.
  • As a fallback when all other types of collections don't work.

Unmodifiable Collections

Unmodifiable collections are merely views on underlying mutable collections. You cannot modify the view, but any modification on the underlying mutable collection will be visible in the view.

List<String> original = new ArrayList<>();
original.add("Test");
List<String> view = Collections.unmodifiableList(original);

// changing the view is forbidden
view.add("x"); // -> Runtime exception: the view is unmodifiable!

// reading from the view is fine
view.contains("Test"); // -> true

// changes in the original collection...
original.add("Foo");
// ... are visible in the view
view.contains("Foo"); // -> true

Pros

  • Useful if you have one object who "owns" the list, and merely lets others read from it.
  • If you create an unmodifiable view, and throw away all references to the original, you end up with a truly immutable collection (see below).

Cons

  • The interface of an unmodifiable view is the same as the original collection. Which means: view.add(...) will be valid according to the compiler. It will appear in your code completion. If you don't know that you are dealing with an unmodifiable view, you're in for a lot of runtime exceptions. Unmodifiable views can fool the user of your API into believing that they are working with mutable collections. Some languages, such as Kotlin, actually differentiate between a List and a MutableList interface for this very reason.
  • There is no efficient way to create a "changed copy" of an unmodifiable view which contains an additional change. The only way to do that is to copy the entire view content into a new (mutable) collection, and apply the change there. That's O(n) time and space complexity.
  • Massive Pitfall: If you receive a collection as an argument, and you are calling e.g. Collections.unmodifiableList on it, you are not safe from external modifications! Remember: you only create a view. The original collection you received may still be changed from the outside. This is therefore not a replacement for defensive copies!

Use them for...

  • Sharing your object state with the world in a read-only fashion. Don't forget to document in your API that the collections you return are unmodifiable views.

Persistent Collections

This category of (im)mutability is extremely important, but often overlooked or even mislabeled. Persistent collections have nothing to do with "persistence" as in "storing to disk". Persistent collections are completely immutable structures, except that they offer smart copy-transforms. The primary principle of a persistent data structure is:

If you modify the data structure, you get a new data structure which contains the old data, as well as your change.

Sounds familiar? That's exactly what ImmutableJS does. The name of this library itself is extremely misleading, because what it actually provides are persistent collections, which are much more useful than strictly immutable ones. For Java, there's for example the (aptly named) PCollections library:

PSet<String> original = HashTreePSet.empty(); 
PSet<String> mod1 = original.plus("Hello");
PSet<String> mod2 = mod1.plus("World");

// the original does not know what happened in mod1 and mod2
original.contains("Hello"); // -> false
original.contains("World"); // -> false
original.size(); // -> 0

// mod1 knows the original and itself, but does not know about mod2
mod1.contains("Hello"); // -> true
mod1.contains("World"); // -> false
mod1.size(); // -> 1

// mod2 knows original and mod1, as well as its own modification
mod2.contains("Hello"); // -> true
mod2.contains("World"); // -> true
mod2.size(); // -> 2

The easiest way to think about persistent collections is: make a change, get a copy that includes the change (copy being the operative word). You may think that this is extremely inefficient, as a copy would entail an O(n) cost in runtime and space complexity. While this naive "copy and change" approach would satisfy the interface, that is acutally not what happens in persistent collections. The trick is that all existing data doesn't change; therefore we are free to reuse it as often as we like. In the example above, mod2 will internally reuse the contents of mod1. There's no actual copying going on, but you will never realize it, because you cannot change mod1 anymore after creating mod2.

Pros

  • True, reliable immutability. No need for defensive copies, ever.
  • Still relatively easy to work with.
  • Safe under concurrent read/write, write/write concurrency can sometimes require additional synchronization.
  • All operations on persistent collections are pure functions. No side effects, no surprises.

Cons

  • Nesting of persistent collections can be tricky to pull off (hello, Redux!).
  • If you apply a large amount of modifications to a single persistent data structure, performance will suffer. It might be more efficient to create a mutable copy, apply all your modifications, and then create a persistent version of it (resulting in O(n) + O(n) space and time complexity).
  • If you are unaware that you are working with a persistent collection, you might forget to assign the return value of list.plus("test") to a variable, and then wonder why "test" is not in your list.
  • Some collection frameworks, such as PCollections, implement the regular (mutable) collection interfaces (e.g. PSet<T> extends Set<T>) for compatibility reasons. Thus, the persistent collections inherit the mutating methods, such as void add(E element) which throw runtime exceptions when called.

Use them for...

  • Representing shared state under read-write access.
  • Scenarios with a single producer and multiple concurrent consumers.
  • Scenarios where you have to compute multiple alternative solutions which are based on an initial solution.

(Truly) Immutable Collections

Truly immutable collections only offer read access through their API, and have to be filled at creation time. Afterwards, their contents are sealed and set in stone forever. You can think of truly immutable collections as "persistent collections without copy-on-write helpers".

To the best of my knowledge, there is no truly immutable collection interface in Java. In Kotlin, you can do this:

// "listOf" creates a truly immutable list in Kotlin.
val list = listOf("Test")
list.contains("Test") // -> true 
list.add("foo") // -> compile error: interface "List" has no method "add"

Pros

  • Easy to implement efficiently (all data has to be known in advance)
  • Iteration / analysis may be faster than with a persistent collection

Cons

  • There is no use case for truly immutable collections which would not work with persistent collections.
  • Very limited flexibility.

Use them for...

  • Defining constants.

Closing Thoughts

I hope you enjoyed this tour of (im)mutable collections. It is sometimes hard to properly communicate about this topic because the terminology is often misused. I hope that this blog post will help you in the future to distinguish the types of (im)mutable collection, their pros and cons, and when to use them.

Posted on by:

martinhaeusler profile

Martin Häusler

@martinhaeusler

Primarily a Software Engineer who feels at home on the JVM. OOP enthusiast, testing addict and Software Architect.

Discussion

markdown guide
 

Thanks! I'm not familiar with the apple world at all, but from your description it sounds like Kotlin does the same thing - there's Set<T> which is immutable, and MutableSet<T> which extends Set<T> and adds methods such as add(T) and remove(T). In general, I think this is good practice, it works really well in Kotlin and conveys a lot more information than in Java, where you only have one Set<T> interface, and it may or may not be modifiable for you.