DEV Community

Cover image for Lists: do you know the nature of yours? The strange story of a data container in Java
Stefano Fago
Stefano Fago

Posted on • Originally published at linkedin.com

Lists: do you know the nature of yours? The strange story of a data container in Java

Foreword

Lists are among the most used data structures in various programs but, in Java, the choices are limited even if the evolution of the language has led to new elements. To date, the main type of list used is the ArrayList. Other types of lists would also be present (such as SkipLists) but they have been used for different purposes and, given their nature, to effectively build other types of data structures. Still others such as elements deriving from the Dynamic Arrays are hidden in the internals of the JDK!

How many lists we can find in Java?

The concept of List is part of the Java Collection Framework (JCF) and starts from the List interface that is a direct extension of two concepts: Collection and Iterable (so a provider of external Iterator but also an entry point to stream).

We can briefly summarize JCF with the following image:

Java Collection Framework structure

We have public classes that express the main list data structure available in Java:

The most used list in Java is ArrayList which makes it possible to express the concept of the list based on an array implementation: this realization is more effective than the LinkedList as stated in different tests ( caused also by CPU and Memory mechanics)
An ArrayList makes it possible to have random access to the element of the list since it's also an implementation of interface RandomAccess; this implies that having a reference to a List, we can test if we can use the get([index]) method or not. ( e.g. ArrayList vs LinkedList)

private static void ifNullThrows(Object data,int idx)
{
  if (data == null) {
      throw new IllegalArgumentException(
         "Collection contains null values "
         +(idx==-1?"":" at idx: "+idx));
  }  
}

public static final void checkNullElements(
                           Collection<?> collection)
{
    if (collection instanceof RandomAccess && 
        collection instanceof List) {
        final List<?> list = (List<?>)collection;
        for (int i = 0,
             size = list.size(); 
             i < size; 
             ifNullThrows(list.get(i),i),
             i++);
        return;  
    } 
    for (final Object element : collection) {
        ifNullThrows(element,-1);
    }
}
Enter fullscreen mode Exit fullscreen mode

In how many ways You can build a List?

At the moment I'm writing, if we talk about the List interface and we don't consider public classes, we can obtain an instance using different invocations as follows:

  • Arrays.asList
  • Collections.unmodifiableList
  • Collections.nCopies
  • Collections.emptyList / Collections.EMPTY_LIST
  • Stream.toList
  • Collectors.toUnmodifiableList
  • List.of

Once You have a reference List to manage how can You say what characteristics are owned?
Or more simply how can You say if the List can mutate or not? And How? And...
When designing software, what kind of list You'll return, and how will be created?

Why is it a Problem?

The first problem is at the level of Type System, given that a situation more correct would allow us to distinguish through the Collection Type which abstraction we are operating with, species if definable as mutable or immutable.
The JCF was born at a time when great care was taken to offer immediate operational data structures, and with attention to performance, but with less attention to constructs or uses that are now seen as common.
These concepts have been taken up by other infrastructures from which we certainly cannot fail to mention: Eclipse Collection, Guava Collections, and VAVR.

A second aspect arises from the fact that immutable data structures generally support a fluent API; this style of API allows for better development in the presence of functional programming constructs, as in the following example:

static List<String> flattenListsWithMilestone(
                                 List<List<String>> data,
                                 String milestone)
{
  return data.stream()
             .map(list->
              { 
               list.add(milestone);
               return list;
              })
         .flatMap(List::stream)
         .collect(toList());     
}
Enter fullscreen mode Exit fullscreen mode

Even if this code is a bit contrived, it demonstrates how, without fluent API, as for Scala, we are forced to use lambda with a body, a style which is generally not a good practice and which does not allow us to make the code more readable thanks, also, to the possible use of a reference method. ( Oh Yes, You can however use a private commodity method as a surrogate!)

Another problem is that the differentiation of data structures allows the implementation of specific design and optimization strategies for the given structure. This aspect is evident if we consider the new lists that derive from the List.of factory methods. The JCF allows the creation of any new data structure, in our case a list, but always of a mutable nature rather than a general vision.

Mutable, Immutable, Not Modifiable List

It's easy to find out that mutable lists are ArrayList, LinkedList, and with specific concurrency support, Vector, Stack, and CopyOnWriteArrayList but what can say of the other kind of List<T>?
As a preamble about Immutability, we need to accept that in Java we have unmodifiable lists more than properly immutable ones, and than things can be difficult to change. So let's explore the ways of creating List with the different invocations!

Arrays.asList returns a fixed-size list backed by the specified array. The returned list is serializable and implements RandomAccess. Changes to the returned list write through to the array. It seems unmodifiable but, also if You can't use add method, You can use List.set changing element at the given index. The inner class used is called ArrayList but is not the public one as the following signature:

private static class ArrayList<E> extends AbstractList<E> implements RandomAccess, java.io.Serializable
Enter fullscreen mode Exit fullscreen mode

We also need to notice that the generic signature of asList create problems/ambiguities with arrays of primitive types, like for int[] that can't be passed to Arrays.asList to obtain a List<Integer>. In the future, the support of primitive in generics will change more things but today a safer way to bypass the problem is:

int[] numbers = new int[]{1, 2, 3, 4, 5};
   List<Integer> numberList = 
        Arrays.stream(numbers)
              .boxed()
              .toList();
Enter fullscreen mode Exit fullscreen mode

but can be also useful to use Collectors.toCollection([Supplier]) since base implementation of Stream.toList:

default List<T> toList(){ 
        return (List<T>) Collections.unmodifiableList(
                         new ArrayList<>(
                         Arrays.asList(this.toArray()))); }
Enter fullscreen mode Exit fullscreen mode

Collections.emptyList and Collections.nCopies are commodities methods that return specialized unmodifiable lists. All modification methods will throw Exceptions. Collections.nCopies is a lazy list where only the original element is used and repeated only in the case of toArray methods (logically the copy of provided elements in construction is a reference copy)
Collections.unmodifiableList is similar to the other methods, returning a specialized unmodifiable list; a decoration approach is used by design and all modification methods will throw Exceptions.

public static <T> List<T> unmodifiableList(
                               List<? extends T> list)
{ 
    return (list instanceof RandomAccess ?
             new UnmodifiableRandomAccessList<>(list) :
             new UnmodifiableList<>(list));
}
Enter fullscreen mode Exit fullscreen mode

Stream.toList create unmodifiable list, also If Javadoc ask to not make any assumption on the nature of the list, and remember that a better control on the result is in using Collectors.toCollection([supplier])

List.of represent a set of static method that uses package class ImmutableCollections.java: this class hosts different inner implementation returning fixed-size list backed by a specified array. All modification methods will throw Exceptions. Some aspects need to be considered or reiterated:

  • wrapping class as Arrays.asList, Collections.unmodifiableList can be modified by altering the original collection or array used in construction
  • It is possible to obtain an empty list in many ways while actually obtaining different types of lists by:
Arrays.asList()
Collections.emptyList()
Collections.EMPTY_LIST
Collections.nCopies(0, [type])
List.of()
Enter fullscreen mode Exit fullscreen mode

On this last topic, the singletons Collections.emptyList() and Collections.EMPTY_LIST are conceptually aliases that allow a performance gain in returning an empty list but it should be considered whether it is always correct to use these elements rather than returning a new empty list immutable or not, or create, in a specific namespace, another singleton based on one of the above invocations.

...but what if you wanted to recognize the nature of a list?

How can I recognize if a list is mutable or not modifiable? Let's say that from a pure design point of view it shouldn't interest us because by playing on typing it should be possible to act differently but this is not the situation. As a pure exercise let's try to see some solutions.

A first approach can be very simple and take advantage of the basic knowledge we have:

public static final <T> boolean isMutable(List<T> subject)
{
      return(subject instanceof ArrayList  || 
         subject instanceof LinkedList || 
         subject instanceof Vector     || 
         subject instanceof CopyOnWriteArrayList);
}
Enter fullscreen mode Exit fullscreen mode

A second solution, a little more elaborate, could be to memorize the reference classes of the various possible lists and take advantage of the instance-of operator:

static HashSet<Class<? extends List>> set = null;

static
{
   set = new HashSet<>();   
   set.add(List.of("", "").getClass());
   set.add(List.of("", "","").getClass());
   set.add(Collections.emptyList().getClass());
   set.add(Collections.nCopies(0, "").getClass());
   set.add(Collections.unmodifiableList(
                           new ArrayList<>()).getClass());
   set.add(Arrays.asList().getClass());
}

public static <T> boolean isMutable(List<T> subject)
{
  return !set
    .stream()
    .anyMatch(clazz -> subject.getClass()
                              .isAssignableFrom(clazz)); 
}
Enter fullscreen mode Exit fullscreen mode

Note: ...what will happen when having Structural Pattern Matching? We can derive some solutions?

A third solution takes advantage of the fact that the main unmodifiable classes are actually inner classes unlike the mutable ones which are top classes; using this knowledge we can write the following code:

public static <T> boolean isMutable(List<T> subject)
{
  return subject==null?false:
         subject.getClass().getEnclosingClass()==null;
}
Enter fullscreen mode Exit fullscreen mode

An interesting proposition, to make things compatible with the actual List interface, comes from Java Guru Heinz Kabutz: adding default method characteristics as for Spliterator, it would be possible to understand the nature of the given List implementation.

What are my dreams about Lists?

The Java Collection Framework has a high level of extensibility but its design is based on hierarchically defined relationships with interfaces, which define roles of data structures, and abstract classes that act as extension hot spots. In the near future, it will be further modified to meet the needs of well-established use cases.

My personal view is to have a different packaging of data structures to allow for better isolation of them since not all of them are used in all applications, even with the current JCF design.

In relation to the lists it would be nice to have:

to complete the set of array-based dynamic lists, so as to improve performance in the light of mechanical sympathy or optimization of memory use.

It would be useful to have self-organizing lists to approximate use cases such as local caches and allow some sorting of data, based on customizable logic. This in conjunction with use with other data structures would be a way to expand the JCF power and modeling possibilities without resorting to many infrastructures outside the JDK.

Immutable lists should probably appear in a dedicated package and it should be really nice to have persistent data structures to have: the advantages of immutability, a set of fluent APIs, the optimizations in the use of the memory, normally present for this type of data structures.

Conclusions

There are no perfect solutions but perfectible solutions are possible. When choosing a data structure it is necessary to take a moment to reflect on the choice made. In the case of lists, most part of the use cases are covered by what is offered by the JDK.

Logically in the ecosystem of open source projects, there are good frameworks that make it possible to further meet the needs of the day by day (both OO and FP).

What can we learn from what we examined?

  • To date, we cannot tell if a list is idiomatically mutable or not.
  • To date, in the design of an API we need to understand when it is possible to reduce the abstraction level or widen it using Iterable or list implementation types, to avoid creating confusion with the only concept of List (however in many cases unavoidable)
  • Always use Javadoc to be able to describe the type of lists actually returned or that you want to manage in the various functional elements
  • We can certainly say that Arrays.asList as well as the lists deriving from the Collections class must be used in testing contexts and only as commodities elements and, therefore, their use-cases seem to see them hidden in internal portions of higher-level implementations or realizations.
  • It is possible to use more structured or effective frameworks, for performance or definition of types, present in the open source world, taking care not to leak anything of the specific framework on the boundaries of a system or in any case on the interfacing elements of the realizations: this to avoid to find our-self with problems of dependencies and inconvenient or "sticky" maintenance
  • Currently, the lists deriving from List.of must be the ones to be used in case of unmodifiable needs (and check for the absence of nulls), to make fallback on the composition of lists in situations of aggregate fragments of lists from different points of the system.
  • Consider that the various types of lists are serializable: this possibility must further reflect on what we are offering to a third party. This must bring to reason when, whether or not to actually use an aggregate type instead of a generic data container.

Top comments (0)