loading...

De-throning the List: Part SC4K

robinheghan profile image Robin Heggelund Hansen Updated on ・5 min read

In the last post we looked at how we could make Array more expressive. By which I mean we added a way for people to extend the data structure to support more use cases. But while third parties now can add missing functionality, there are certain functions which should be supported out of the box, both for performance and convenience reasons.

Elm has four built-in collection types: List, Array, Dict and Set. With the exception of Dict, these types look similar from a birds eye view. You can insert an item into the collection, get it back out, and even perform some task for every item stored. Of course, they do have different trade-offs which make them suitable for different situations.

The List is great when adding and retrieving items at the beginning (left side) of the collection. Array is great when you need fast access to any element, and Set is good when you need a sorted collection with no duplicates.

The API for these collections are different as well. Sometimes this makes sense, but not always. For instance, there is no sort function for Set, which is reasonable as Set is a sorted collection. However, there also isn't a sort function for Array, which means that you have to convert it to a List or a Set and then back again if you want it sorted.

Converting from one collection type to another is a costly operation. If you use an Array because it suits your use case most of the time, you don't want to convert to another collection just to perform that one operation.

Let's look at how we can improve the API of List, Array and Set to reduce the need for converting from one collection type to another to support the functionality you need.

Differences which should remain

Certain operations aren't supported on a collection type with good reason. For instance, there is not get or set operation on List because the performance would be bad. If one need to get or set a random item in a collection, Array should be used. Likewise, union, intersect and diff should only be supported for Set. Here are some functions that are not supported by all three collection types, which I believe shouldn't be supported either:

get, set, insert, remove, :: and push can only be supported efficiently by their respective collections. If you're in need of one of these functions, it's a tell-tale sign of which collection you should use. However, with the improvements I've mentioned in earlier posts, Array could support all of these functions efficiently, which is yet another reason as to why Array should be the default collection type.

indexedMap and toIndexedList make no sense for Set, as it doesn't operate by indexes and doesn't retain insertion order. repeat doesn't make much sense for Set either, as it cannot contain duplicates.

union, intersect and diff can only be efficiently implemented by Set.

head can be supported efficiently by the other collections, but the operation itself really only makes sense for List. Array would use get 0, and it isn't quite clear what head means in the case of Sets, is it the first/lowest/minimal element or the root/middle element?

sort, sortBy and sortWith doesn't make sense for Set, but Array should definitely get these functions.

reverse can be implemented efficiently for Array, but makes little sense for Set.

Where we could bridge the gap

Below is a list of functions which can be easily supported across all three collections:

  • singleton
  • initialize
  • range
  • concat
  • intersperse
  • member
  • map2, map3, map4, map5
  • concatMap
  • filterMap
  • partition
  • take
  • drop
  • unzip
  • slice
  • sum
  • product
  • maximum
  • minimum
  • all
  • any
  • scanl

While these functions could be supported across all three collections, does that mean they should? Since we're already messing about, let's ask ourselves if there are certain functions that shouldn't be in core to begin with.

sum and product are essentially shortcuts for List.foldl (+) 0 and List.foldl (*) 1. Are those operations common enough to grant them dedicated functions in the core library? If we introduce slice for all three collections, do we need take and drop?

Do people actually use map4 and map5? How about scanl and unzip? I mean, there probably are, but are those operations so common that one would expect to find them in core? Not that it necessarily does any harm, but there is something to be said of having a small, but powerful, package where you can read the entire documentation in a short sitting.

Anyway, here's a list of which functions I, personally, see no reason for having in core. I expect people will tell me that these functions definitely belong, but I think it would be nice to at least have the discussion.

  • intersperse
  • map4
  • map5
  • concatMap
  • unzip
  • sum
  • product
  • scanl
  • take
  • drop

Finishing touches

Here are some minor things I think shouldn't be left out of a API renovation:

List and Array has a length function, Set has size. This is likely due to the fact that Set is a thin abstraction on top of Dict and so has just copied the name. I would rename Set.size to Set.length out of consistency if nothing else.

Now that I'm on the topic, since Set is thin abstraction on top of Dict, maybe everything discussed up til this point applies to Dict as well?

List has :: and tail. The analogous for Array is push and pop. pop is missing from the API, so I would add it.

I would add find for all three collections. List and Set already supports member, which is essentially find except that it returns a boolean instead of the value for which the predicate function returns true. It's a more useful function in general, and so if member should be a part of the API (which I believe it should) then find belongs as well.

For Set and Dict I would throw in min and max to retrieve the minimum/first and maximum/last element.

I think Stack is a better name than List and so would rename that type. It's a name that more clearly explains what that collection type is good at.

Finally, as mentioned in the previous post, I think stoppableFoldl and stoppableFoldr, or something with the same functionality but with better names, should be added to the API of Array. I think Dict and Set could be well served with that functionality as well.

What do you think?

The goal I'm trying to reach is to make the Array type more general purpose, but when improving the Array API, it makes sense to look at the other collection types as well. This article lays out what I believe those APIs should look like.

First, I want to say that I'm not authority on what makes it into Elm. If I actually code up the suggestions in this article and submit it as a proposal, all of it could make it in, or none of it.

Second, all these suggestions are based on my own experience. What I need is your experience. Does these suggestions make sense, or am I off my rockers? Let me know in the discussion thread on the Elm Discourse.

And now, for something quite different

So, is this the end of this series? Not quite. There's a summary left the be published, but that will have to wait until after we go over the top (you didn't think I'd forgot, did you?) in the next post, which is about the literal representation of List and Array.

Discussion

pic
Editor guide
Collapse
dubyabrian profile image
W. Brian Gourlie

Functional languages tend to get the distinction between arrays and lists right because List is generally understood to mean "linked list" and array is generally understood to mean "array-backed list." They also tend not to expose random-access methods on linked lists, unlike Java which is happy to provide this particular foot-gun. In other words, I feel like imperative/OO languages should change their nomenclature, not functional languages.

Collapse
robinheghan profile image
Robin Heggelund Hansen Author

You're talking about my interest in renaming List to Stack?

Generally understood amongst whom? Most people who come to Elm come from JS. Depending on what other language they have experience with, List is a very ambiguous term. In python and C#, List actually refers to Array. Also, when a JS developer sees [1, 2, 3] it's reasonable to expect List to work as an Array.

Stack is not only more descriptive of what performance one can expect, it also avoids confusion. While List often means linked list in other functional languages, most people new to Elm have no experience with other functional languages, and so they are more likely to expect that List works like an Array.

Collapse
dubyabrian profile image
W. Brian Gourlie

You're talking about my interest in renaming List to Stack?

Yep.

Generally understood amongst whom?

I probably could have phrased it better: Those who work in functional languages tend to have a less ambiguous understanding of what a list is, and are unlikely to conflate linked lists with arrays or array-backed lists.

A more pragmatic approach to introducing people to functional languages is to make this distinction clear to them from the start, as opposed to introducing ambiguous terminology in the one paradigm where little ambiguity exists for these terms.

Collapse
mcapodici profile image
Martin Capodici

Funnily enough it is called ArrayList in C#1.0.

That type is still there, although most will now opt to use the newer List<> which is a generic type.

Once everyone is conditioned to think "A list is an array" it's OK for Microsoft to shorten the name in the newer type.

In Elm LinkedList would be a good type name to avoid confusion.

To me it doesn't feel like a stack. I mean technically it is. But then what's with the (++) operation? That's a weird operator for stacks, right? It feels weird to use any infix and would rather have definitions like this:

push : a -> Stack a -> Stack a
peek : Stack a -> Maybe a
pop : Stack a -> (a, Stack a)

Thread Thread
robinheghan profile image
Robin Heggelund Hansen Author

You’re right, (++) is not an efficient operation for a single-linked-list, and so one could make a (good, in my oppinion) argument for it’s removal.

Thing is, Haskell’s List also has (++), and Elm hasn’t had a stable Array implementation (it gets one in 0.19).

If my proposal is accepted, it would make sense to remove bad operations from List.

Collapse
josephthecoder profile image
Joseph Stevens

Interesting topic, thanks for the post