Klemen Slavič

Some lists are not like the others

So far in this series, we've been dealing with arrays as natural containers of values that allow us to treat them as a sequence. But what is an array, really? What makes it tick? Let's find out!

Impostors, impostors everywhere

In JavaScript, an array is a special type of object with a magical property called length and integer strings for keys, starting with 0. A special syntax allows you to create an array by specifying the sequence of values in square brackets:

const realArray = ['a', 'b', 'c'];

If you inspect an array like any other object in JavaScript, you'll notice it has approximately the same shape as the following object:

const fakeArray = {
  '0': 'a',
  '1': 'b',
  '2': 'c',
  length: 3
};

This array will work just fine if we loop over it. 🎵 Don't believe me? Ask the dishes! 🎵

const printArray = (name, arr) => {
  const report = [];
  for (let i = 0; i < arr.length; i++) report.push(i + " => '" + arr[i] + "'");
  console.log(name, '[' + report.join(', ') + ']');
};

const realArray = ['a', 'b', 'c'];
const fakeArray = { '0': 'a', '1': 'b', '2': 'c', length: 3 };

printArray('real array', realArray);
printArray('fake array', fakeArray);

This, by the way, is called duck typing, if you've ever wondered where the term comes from or what it means: if it looks like a duck and quacks like a duck, you can treat it as a duck. Many languages support duck typing in various forms through interfaces, which enable loose coupling while still enforcing object shapes.

Some JavaScript and DOM objects are also array-like but aren't real arrays, like arguments or NodeList. Some libraries took the dynamic nature of objects even further, and appended methods directly onto arrays for convenience (hello, jQuery!).
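
For instance, arguments inside a regular function is one of those impostors: it has indexed keys and a length, so Array.from() will happily turn it into the real thing. A quick sketch:

// `arguments` inside a regular (non-arrow) function is array-like, but not an Array
function shout() {
  console.log(Array.isArray(arguments)); // false
  // Array.from() accepts anything with indexed keys and a length
  return Array.from(arguments, word => word.toUpperCase());
}

console.log(shout('a', 'b', 'c')); // [ 'A', 'B', 'C' ]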

As long as it looks like an array (and quacks like an array), any code using it will be none the wiser. Well, at least any code that uses integer keys and length to loop over the properties. It won't work with for...of or the spread operator (Array.from() gets a pass here, since it also accepts non-iterable array-likes), which is what we're going to fix next.
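
You can see the gap in the disguise with a quick console check using the fake array from above (a small sketch):

const fakeArray = { '0': 'a', '1': 'b', '2': 'c', length: 3 };

// Array.from() falls back to the array-like path (indexed keys + length)...
console.log(Array.from(fakeArray)); // [ 'a', 'b', 'c' ]

// ...but array spread requires the iteration protocol, so this throws
try {
  console.log([...fakeArray]);
} catch (err) {
  console.log(err.message); // something like "fakeArray is not iterable"
}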

Iterators, iterables and Symbol.iterator

To improve our disguise, we'll implement the API required for JavaScript to provide iteration capability on our fake array. To do this, let's first take a look at what an iterator is.

An iterator is any object with a method called next(). When you want to get the values from the iterator, you call next() to get an object with two properties:

  • value: the next value in sequence,
  • done: a boolean that tells you whether there are more values to give

Given those requirements, let's build a function that creates an iterator that counts from 1 to 5:

const createIterator = max => {      // take an upper bound to count to
  let count = 1;                     // set the initial value to 1

  const iterator = {                 // create an object...
    next() {                         // ...that has a next() method
      if (count > max)               // if the current value exceeds the upper bound...
        return { done: true };       // ...tell the caller that there are no more values

      const value = count;           // if not, grab the current value...
      count += 1;                    // ...increment the counter...

      return {                       // ...and return an object
        value,                       // with the current value
        done: false                  // and tell the caller we're not done yet
      };
    }
  };

  return iterator;                   // oh yeah, and give the iterator to the caller.
};

const iterator = createIterator(5);

console.log(iterator.next()); // 1
console.log(iterator.next()); // 2
console.log(iterator.next()); // 3
console.log(iterator.next()); // 4
console.log(iterator.next()); // 5
console.log(iterator.next()); // no more values!

Okay, that looks kinda painful to use directly. You could write a while() loop, but it's easy to accidentally cause an infinite loop or make an off-by-one error. We can make this easier to use by creating an iterable object.
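
For the record, driving the iterator from the previous example by hand with a while loop would look something like this sketch, which is exactly the kind of code we'd rather not write everywhere:

const countToThree = createIterator(3);

let result = countToThree.next(); // prime the loop with the first result
while (!result.done) {            // keep pulling values until the iterator says it's done
  console.log(result.value);      // 1, 2, 3
  result = countToThree.next();
}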

An iterable object can be consumed in a for...of loop, by Array.from() or the spread operator.

The difference between an iterator and an iterable is that an iterable has a method stored under a special key, Symbol.iterator, which returns an iterator when called. That is quite a mouthful, so let's write it down step by step:

const createIterator = max => {
  let count = 1;
  const iterator = {
    next: () => {
      if (count > max) return { done: true };
      const value = count;
      count += 1;
      return { value, done: false };
    }
  };
  return iterator;
};

const createIterable = max => {     // start by taking the upper bound
  const iterable = {                // create an object...
    [Symbol.iterator]: () => {      // ...with a [Symbol.iterator] method...
      return createIterator(max);   // ...that creates and returns an iterator
    }
  };
  return iterable;                  // finally, return the iterable
};

// create an iterable that can count to three
const oneToThree = createIterable(3);

// for...of?
for (const n of oneToThree) console.log(n);

// spreading?
console.log([...oneToThree]);

So, in order for our fake array to become iterable, we have to add a method that will return an iterator:

const fakeArray = {
  '0': 'abc',
  '1': 'def',
  '2': 'ghi',
  '3': 'jkl',
  length: 4,
  [Symbol.iterator]: () => {        // implement the iterable interface
    let i = 0;                      // start counting at 0
    return {                        // return an object...
      next() {                      // ...with a next() method (the iterator)
        const value = fakeArray[i]; // get the current value
        i += 1;                     // increment the counter
        return i <= fakeArray.length // if we're not out of bounds yet...
          ? { value, done: false }  // ...give the value back...
          : { done: true };         // ...else, signal we're done.
      }
    };
  }
};

for (const element of fakeArray) console.log(element);

const realArray = [...fakeArray];
console.log(realArray);

There are three more methods that return iterables, which need to be implemented for our fake array to behave as closely to the real thing as possible:

  • keys(): returns an iterable for the array's keys,
  • values(): returns an iterable for the array's values,
  • entries(): returns an iterable that returns arrays of key-value pairs ([key, value]).

I'll leave it as an exercise for the reader to implement those, along with the other array methods, like map(), filter(), slice(), etc.
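
If you'd like a head start on that exercise, here's one possible sketch of entries() for the iterable fake array above, written as a generator so that it returns an iterable iterator; keys() and values() follow the same pattern:

// one possible sketch: entries() as a generator function; keys() and values()
// would yield i and fakeArray[i] respectively
fakeArray.entries = function* () {
  for (let i = 0; i < fakeArray.length; i++) {
    yield [i, fakeArray[i]]; // [key, value] pairs, just like a real array's entries()
  }
};

for (const [key, value] of fakeArray.entries()) {
  console.log(key, value); // 0 'abc', 1 'def', 2 'ghi', 3 'jkl'
}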

There's one last thing to be aware of, though: you'll find it very hard to fool code using Array.isArray() and instanceof Array to check for array types. For our purposes, we only wanted to replicate the behaviour of arrays, not fool JavaScript into believing it's an actual array when it really isn't.
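
A quick check against the fake array from above shows exactly where the disguise ends:

console.log(Array.isArray(fakeArray));      // false
console.log(fakeArray instanceof Array);    // false

// a copy made by spreading the iterable, on the other hand, is a genuine array
console.log(Array.isArray([...fakeArray])); // true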

Arrays: the fast and easy parts

Because of the way arrays are constructed, there are certain properties that make arrays preferable over other data structures in some situations. Arrays are wonderful data structures when you want:

  • a known number of values in a list,
  • to preserve the sequence of values,
  • to access values directly by their index position in the list,
  • a fast way to append to or pop elements off the end of the list.

If those properties match the requirements of the problem you're trying to solve, then arrays are a perfect fit. Go ahead and use them! But that last property is called out specifically because there's a fundamental trade-off hiding there that you may not be aware of. Let's take a look at why that is.

Arrays: the costly parts

Our fake array looks like this:

const a = {
  '0': 'first',
  '1': 'second',
  '2': 'third',
  length: 3
};

What would it take to append a new value onto that object?

a['3'] = 'fourth';    // set index 3 to equal the 'fourth' value
a.length = 4;         // update length to 4

With 4 elements in the array, how would we pop the last element off?

delete a['3'];        // remove index 3
a.length = 3;         // update length to 3

Each of those operations only takes two changes. So what if we decided to shift the first element off the start of the array? Well, let's try:

const first = a['0'];  // we take the first element out
a['0'] = a['1'];       // we move the second element into first position...
a['1'] = a['2'];       // ...and the third element into second position...
delete a['2'];         // ...then delete the now-stale last index
a.length = 2;          // finally, we update the length to 2

// this is what the array looks like now:
{
  '0': 'second',
  '1': 'third',
  length: 2
}

Now think about what this means in terms of the number of operations when the size of the array grows. If we have n elements in the array, how many operations do we need to perform for each of the following:

  • get the number of values in the collection,
  • get a specific value by index position from the array,
  • append a single value,
  • prepend a single value,
  • remove a value off the end of the array,
  • remove a value off the start of the array,
  • search for a value in the array.

Let's go through them one by one.

length

The first one is easy to determine; the array already has a value stored that keeps the count of values: length. Accessing it costs us about the same as accessing an object property:

a.length;

This operation is independent of the array size, since we don't have to count the size of the collection each time we access that property, so let's assign that a cost of 1.

[index]

The second one is similar to the first; accessing a string property on a JavaScript object carries a fixed cost similar to length, so let's assign that the same cost, 1.

push()

Appending a single value requires two updates: assigning a value to a new index and adding 1 to the length property. That makes the cost equal to 2.

pop()

Removing a value from the end of the array also requires two updates (deleting the last index and subtracting 1 from length), so it gets a cost of 2.

unshift()

Prepending a value to the array is a little trickier. For each element added to the front of an array of length n, we have to:

  • increment all index positions of existing values (n operations)
  • assign the new element to the 0 index (1 operation)
  • increment length by 1 (1 operation)

Sum it all up, and you get a total cost of n + 2.
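
To make that concrete, here's what a manual prepend on the three-element fake object from earlier would look like, mirroring the append and pop snippets above ('zeroth' is just a placeholder value):

// a = { '0': 'first', '1': 'second', '2': 'third', length: 3 }
a['3'] = a['2'];   // move every existing value up by one index...
a['2'] = a['1'];   // ...costing one move per element (n operations)
a['1'] = a['0'];
a['0'] = 'zeroth'; // put the new value at index 0 (1 operation)
a.length = 4;      // bump length by 1 (1 operation)

// n + 2 operations: for n = 3, that's five steps in total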

shift()

Removing a value off the start of the array is similar in cost. For each element removed from an array of n elements:

  • store the first element (1 operation)
  • decrement all index positions of the rest of the values (n - 1 operations)
  • decrement length by 1 (1 operation)

The total cost therefore comes down to n + 1.

indexOf()

Searching is a more interesting problem to estimate, since it depends on three factors: where you start searching, the way you iterate over the indices and where the found value is. If we could take a reasonable guess about the probable location of the value, we might improve our odds, but let's say that the value has an evenly spread probability among n indices. Assuming we start from the beginning of the array, we have to:

  • take value at current index (each loop costs 1 operation)
  • compare reference to the value at selected index
    • if found, return index
    • else, select next index

In the best case scenario, the first element is the value we're looking for, so we have a total of 1 loop. In the worst case, we would have to reach the very last index to find the value, so the cost would be n. If we average out all the possible scenarios and their costs, we get an average of n / 2 operations.

For reference, when we have to go through a collection one item at a time, without skipping any elements, to guarantee finding a value, it's called a linear search. This will be important later.
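
As a sketch, a hand-rolled indexOf() over anything array-like is exactly that linear scan (indexOfLike is just an illustrative name, not a built-in):

// a minimal linear search over anything with indexed keys and a length
const indexOfLike = (arrayLike, target) => {
  for (let i = 0; i < arrayLike.length; i++) { // visit every index in order...
    if (arrayLike[i] === target) return i;     // ...and stop at the first match
  }
  return -1;                                   // worst case: n comparisons, no match
};

console.log(indexOfLike({ '0': 'a', '1': 'b', '2': 'c', length: 3 }, 'b')); // 1
console.log(indexOfLike(['a', 'b', 'c'], 'z'));                             // -1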

The final cost table

So, let's break down the costs again:

| Array method | Cost  |
|--------------|-------|
| length       |     1 |
| push()       |     2 |
| pop()        |     2 |
| shift()      | n + 1 |
| unshift()    | n + 2 |
| indexOf()    | n / 2 |

And in case you wanted to get a feel for how these methods perform in your chosen JavaScript environment, try out this benchmark that illustrates the difference in performance on an array of 1000 elements.
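
If the embedded benchmark isn't visible where you're reading this, a rough console.time() sketch along these lines gives a similar feel; absolute numbers will vary wildly between engines and runs:

const SIZE = 1000;   // same array size as the benchmark mentioned above
const RUNS = 10000;  // repeat enough times for the timings to be measurable

const makeArray = () => Array.from({ length: SIZE }, (_, i) => i);

console.time('push');    // constant cost per call: should stay flat as SIZE grows
for (let run = 0; run < RUNS; run++) {
  const arr = makeArray();
  arr.push(-1);
}
console.timeEnd('push');

console.time('unshift'); // cost grows with SIZE: every call moves all existing elements
for (let run = 0; run < RUNS; run++) {
  const arr = makeArray();
  arr.unshift(-1);
}
console.timeEnd('unshift');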

The Big (and Scary) O Notation

You may have heard of Big O when people discuss runtime performance of algorithms. It is a mathematical expression that allows you to compare the time it takes for algorithms to complete a task given the size of the input, n.

Think of it as a rating, like the ratings we assign to chess players. A rating allows you to compare two chess players to see how well matched they would be if they ever played a match. A chess player with a high rating would probably wipe the floor with someone from a lower tier (assuming they played enough games for their ratings to reflect their real skill).

We can use Big O as a rating for algorithms, with a simple rule: smaller is faster.

Big O is written as O(...) where the parens contain an expression involving the size of the input. To derive this expression, you can count how many steps an algorithm performs for a given size n. Let's update our table by using the Cost column as our starting point:

| Array method | Cost  | Big-ish O |
|--------------|-------|-----------|
| length       |     1 | O(1)      |
| push()       |     2 | O(2)      |
| pop()        |     2 | O(2)      |
| shift()      | n + 1 | O(n + 1)  |
| unshift()    | n + 2 | O(n + 2)  |
| indexOf()    | n / 2 | O(n / 2)  |

There is a rule for Big O: we don't care about small inputs, we only want to know how to compare performance for big inputs. You know, inputs the size of bank bailouts, as n approaches ridiculous. There are three steps to perform when reducing the expression to Big O:

  1. expand all expressions,
  2. anything times n^x is just n^x (a * n^x ~ n^x)
  3. cross out everything but the highest power of n

Note: this is a very naive and simplistic shortcut. Formally, the task is to characterize how f(n) behaves in the limit as n approaches infinity and keep only the dominant term. The steps above are good enough for most algorithms you'll encounter on a daily basis.

Let's take a hypothetical example. Say we have a list of n values, we have to compare each element to every other element in the list, and we have to go through the entire list twice. To do that, we need to:

  1. for every element, we perform n - 1 comparisons (cost 1 each),
  2. we repeat this for all n elements (n times the cost of step 1),
  3. we repeat the whole process once more (doubling the cost).

So our final cost is 2 * (n * (n - 1)) operations. First we expand that expression by multiplying the two factors:

2 * (n * (n - 1)) = 2n * (n - 1) = 2n^2 - 2n

Next, we drop the constant factors in front of the powers of n:

2n^2 - 2n  ~~~  n^2 - n

And finally, we cross out everything but the highest power of n, and we're left with Big O notation:

n^2 - n   ~~~  O(n^2)
      ^ ignore
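
To tie the hypothetical back to code, here's a deliberately silly sketch whose only job is to count those comparisons (countComparisons is purely illustrative):

// the same shape as the hypothetical above: two full passes, each comparing
// every element against every other element
const countComparisons = list => {
  let comparisons = 0;
  for (let pass = 0; pass < 2; pass++) {      // "go through the entire list twice"
    for (let i = 0; i < list.length; i++) {   // for every element...
      for (let j = 0; j < list.length; j++) { // ...compare it to every other element
        if (i !== j) comparisons += 1;        // n * (n - 1) comparisons per pass
      }
    }
  }
  return comparisons;                         // 2 * n * (n - 1) in total
};

console.log(countComparisons([1, 2, 3, 4])); // 24, i.e. 2 * 4 * 3
console.log(countComparisons(Array(100)));   // 19800, growing roughly as n^2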

Now we can derive real Big O values for our array methods:

| Array method | Cost  | Big O |
|--------------|-------|-------|
| length       |     1 | O(1)  |
| push()       |     2 | O(1)  |
| pop()        |     2 | O(1)  |
| shift()      | n + 1 | O(n)  |
| unshift()    | n + 2 | O(n)  |
| indexOf()    | n / 2 | O(n)  |

Anticipating problems

Big O allows us to estimate how long something will take when the input grows in size. For O(1), no matter how large the input grows, it should not noticeably impact our performance (unless limited by hardware or the JS runtime).

It also allows us to estimate how slow our program will be when the size of our input data grows. Let's say that generating a report currently takes 30 seconds for a thousand customers. If our report generation complexity is O(n), then growing the company by 100% should increase that time by 100% as well. This may or may not be acceptable, but at least you can now anticipate problems and predict how soon you might be hitting your limits.
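
As a bit of back-of-the-envelope arithmetic (a sketch, assuming report time really does scale linearly with customer count):

// linear projection: time grows in proportion to the input size
const projectSeconds = (currentSeconds, currentCustomers, futureCustomers) =>
  currentSeconds * (futureCustomers / currentCustomers);

console.log(projectSeconds(30, 1000, 2000));  // 60 seconds after doubling the customer base
console.log(projectSeconds(30, 1000, 10000)); // 300 seconds at ten times the customers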

Sometimes, algorithms can be changed to leverage other types of data structures that perform better than arrays on some tasks, making O(n) seem painfully slow in comparison.

Wrapping up

We've now seen how the array works in JavaScript. By carefully reasoning about what the built-in methods do, we've been able to derive Big O performance envelopes that we can use to estimate how fast our programs will run when using arrays as the primary data structure.

Up next, we'll look at some of the other built-in data structures and see how we can improve on some of the shortcomings of arrays, and dip our toes into more interesting problems.

Until next time!

Photo by Mike Alonzo on Unsplash

Top comments (2)

JavaScript Joel

You can take the fake array even further by forcing Array's prototype onto it:

const a = {
  '0': 'first',
  '1': 'second',
  '2': 'third',
  length: 3
};

a.__proto__ = Object.create(Array.prototype)

a.push('fourth')
//  \
//    Now we can use push!
//
//    We can also use map!
//  /
a.map(x => x.toUpperCase())
// [ 'FIRST', 'SECOND', 'THIRD', 'FOURTH' ]

btw... never write code like this ^

Cheers!

Klemen Slavič

Correct, the power of prototypal inheritance allows you to quickly compose objects like that. 👌

The point of the exercise was to implement the methods directly, rather than just piggybacking off of the existing ones, though. It could have easily been an article about how to use prototypes to build your custom arrays, something that I might pick up a later date.