DEV Community

loading...

Advanced Arrays

entomy profile image Patrick Kelly ・5 min read

When discussing the various data structures implemented within Collectathon, the best starting point is the simplest. Explaining how a Trie is a variation of Multiway-Tree which itself uses a dynamically resizing array to hold its various child references makes absolutely no sense if you're not familiar with the base components. And what is the absolute most basic data structure? Ignoring things implemented directly within the language, like a class or struct, it would be the array. You're certainly familiar with the most basic of all arrays, the fixed-size array, as it's easily written in almost every programming language as something like Int32[]. But did you know there's more advanced behaviors that can be implemented on top of what is quite literally still an array?

BoundedArray<TElement> and BoundedArray<TIndex, TElement>

I'm introducing two types at once, and that'll be a general trend throughout the majority of collections. Why? Because both behave the exact same way, with the only difference being one is associative and the other isn't.

Associativity

As a quick aside if you're unfamiliar with associativity, because I've been told new learners especially struggle with it, associativity just means the element is entered into the collection with a specific index. I'll have examples as this goes on that should make it clear.

Bounding

Bounding is the idea that the array has a maximum boundary or size, also known as its capacity, but that it can freely resize below this bound. This is the single trait that differentiates it from the standard fixed-sized array, as instead of having an array of 5 elements, you have an array that can be 0, 1, 2, 3, 4, or 5 elements! If you work with SQL a lot, you should recognize that SQL character fields are bounded arrays.

Use

First, let's look at BoundedArray<TElement> since it's a little more straight forward.

How does declaring and constructing one look compared to a fixed-size array?

public readonly Int32[] fixedArray =
    new Int32[] { 1, 2, 3, 4, 5 };
public readonly BoundedArray<Int32> boundedArray =
    new BoundedArray<Int32>(5) { 1, 2, 3, 4, 5 };
Enter fullscreen mode Exit fullscreen mode

Looks pretty similar, which should make sense considering these are nearly the same thing. But let me show you something else. BoundedArray<TElement> supports Implicit(TElement[] to BoundedArray<TElement>).

public readonly BoundedArray<Int32> boundedArray =
    new Int32[] { 1, 2, 3, 4, 5 };
Enter fullscreen mode Exit fullscreen mode

In fact, any fixed-sized array can be made into a bounded-sized array, because we have all the information needed to create it.

You'll be happy to know that bounded arrays support all the members you'd expect from a fixed array.

Assert.Equal(3, boundedArray[2]);
Assert.Equal(new[] { 1, 2, 3 }, boundedArray[0..2]);
Assert.Equal(5, boundedArray.Length);
Enter fullscreen mode Exit fullscreen mode

And so on...

So why use a bounded array instead of a fixed array? That all comes down to the additional flexibility that bounded arrays provide. Consider these operations which simply don't exist for fixed arrays:

boundedArray.Remove(5); // Remove every 5
boundedArray.Remove((x) => x % 2 == 0); // Remove every even
// boundedArray is now: [1, 3]
boundedArray.Add(4);
// boundedArray is now: [1, 3, 4]
boundedArray.Insert(1, 2);
// boundedArray is now: [1, 2, 3, 4]
Enter fullscreen mode Exit fullscreen mode

Note: if you add or insert an element when the array is at its maximum capacity, you get an exception!

Where BoundedArray<TIndex, TElement> goes different is that it's associative, and we now have a known index type as a result. Consider construction:

public readonly BoundedArray<Char, String> boundedArray =
    new BoundedArray<Char, String>(5);
Enter fullscreen mode Exit fullscreen mode

In this instance, we can use an array initializer assuming we have an array of (Char, String)[]. You can call Zip() to join two arrays of equal length into this, but you could also just specify like so:

new BoundedArray<Char, String>(5) {
    ('a', "Alfa"),
    ('b', "Bravo"),
    ('c', "Charlie"),
    ('d', "Delta"),
    ('e', "Echo"),
};
Enter fullscreen mode Exit fullscreen mode

And then, hopefully it's obvious that with associative collections, since we're explicitly passing an TIndex type and parameter, that you use that to, ya know, index.

Assert.Equal("Bravo", boundedArray['b']);
Assert.Equal("Echo", boundedArray['e']);
Assert.Equal(null, boundedArray['f']); // Not in the collection, returns default
Enter fullscreen mode Exit fullscreen mode

This also means that you can use the associative variant as a sparse array, just by using Int32 or another integer type as the index!

The strengths of bounded arrays is they offer a decently level of flexibility without a large degree of implementation complexity. In fact, the implementation just relies on an additional integer inside of the data structure, and absolutely no fancy logic.

The drawback to bounded arrays is they always use up their maximum capacity in memory, regardless of how much they are actually using. You don't get a choice. Potentially need both very large and very small arrays? You gotta use the largest capacity for all of them regardless. That's not okay, and it lead to the development of:

DynamicArray<TElement> and DynamicArray<TIndex, TElement>

Believe it or not, we're already largely showed off everything that would need to be showed off. How is the dynamic-sized array different then? That comes down to what it's doing internally. While the bounded-size array will throw an exception if you add or insert an element while at capacity, dynamic-size arrays do something very different. They grow! This operation is extremely inefficient, and there's other more suitable collections if you're going to be changing size a lot. But it does this by creating a new array with a larger capacity, copying the elements over, swapping the internal reference, and then discarding the old array. All of this, done for you! Dynamic-size arrays do have their use cases, and are used internally quite a bit, but typically speaking you're gonna want a list or tree. If you know size changes are going to be a rare thing, dynamic arrays just might be what you need.

While this is it for complete, ready to use, data structures, there's more you may be interested in, like:

Array<TElement, TSelf> and Array<TIndex, TElement, TSelf>

Since Collectathon isn't just a collections library, but also a framework, we should also dive into how the framework types work and how they can be used to add additional behavior if you so wish. In both of these variants, it assumes little beyond the type just being an array of some kind. This does still provide quite a few things for implementers however, such as forward and reverse enumerators (yes, Collectathon supports reverse enumerators on all its types that can support it), indexers and slicers, equality methods and operators, element replacement, shifts, and more. Quite a lot of functionality for not being able to assume anything other than "it's some kind of array".

If you're unfamiliar with CRTP, that TSelf parameter is just saying "hey, pass the implementing type into me, because I'm gonna use it to provide stuff for you". Basically, it helps you get even more out of the framework, for free.

But we can go further!

FlexibleArray<TElement, TSelf> and FlexibleArray<TIndex, TElement, TSelf>

Both the bounded and dynamic arrays derive from this. It adds the assumption that "this array can be resized, but I don't know the limitations and mechanics of that".

This assumption allows a lot more than you might expect. Inserting elements, clearing the entire array, queue and stack behaviors, and removing elements, are all automatically provided for you. In fact, even by this point so much has been defined that the DynamicArray<TElement only needs to implement... 4 methods! No, seriously!

Alt Text

Collection frameworks are that effective.

Discussion (0)

pic
Editor guide