loading...

Lazy Iterators From Scratch

emnudge profile image EmNudge Updated on ・7 min read

I really like functional programming paradigms. Not necessarily functional programming. I've never quite got into it.

But things such as higher-order functions, pattern matching, immutable data structures, pure functions, and so on are really nice to use and reason about. These aspects allow for cleaner and readable code, but may come at the expense of performance if not implemented properly.

One of the easiest ways to sacrifice performance for readable code in javascript is with the higher-order functions. They're fantastic, but you can land yourself in some situations that could have been avoided with a less functional approach.

Let's create a bit of a contrived but somewhat practical example. We need to do some shopping. To simplify things, we won't include any named for the products, just the price. We will try to calculate which items we can afford to buy via filtering out the ones that go over the total.

// constants
const TAX_RATE = 1.08875;
const BASE_SHIPPING = 8;
const BANK_BALANCE = 40; //

// all the items we want to buy
const itemPrices = [2, 4, 5, 9, 10, 13];


let currTotal = BASE_SHIPPING;

const newPrices = itemPrices
    .map(n => n * TAX_RATE)
    .filter(n => {
        if (currTotal + n > BANK_BALANCE) return false;
        currTotal += n;
        return true;
    });

console.log(newPrices)

Did you catch the problem? No? Let's pretend our array had a thousand elements. A million elements, maybe. Let's also keep our bank balance the same. We're a child with a piggy bank and big dreams.

Each method call takes in a higher-order function and loops through the entire array. Is there any way to stop looping prematurely? Yes, but not with this approach.

We are checking if the current total is greater than our bank balance. Once the total exceeds the balance, there isn't really a need to continue. We know the rest of the items are not within our budget.

(This wouldn't necessarily be the case if the items weren't sorted. They are in this snippet.)

Let's now write the same code with a for-loop:

// snip...

const newPrices = [];

for (const n of itemPrices) {
    const priceWithTax = n * TAX_RATE;

    if (currTotal + priceWithTax > BANK_BALANCE) break;

    currTotal += priceWithTax;

    newPrices.push(priceWithTax);
}

// snip...

Our object-oriented code, aside for the keyword, is faster by virtue of not creating a new array each time. We combined both map and filter into statements in our for loop. Only one array is created.

But did you notice that keyword?

break

It lets us prematurely exit the loop. Not only are we no longer checking if we reached our total, but we also aren't even adding the tax! We have skipped 2 operations that we otherwise couldn't!

The functional approach using Array.prototype.map and Array.prototype.filter are just less performant due to the very nature of the methods themselves.

One thing you also may have noticed is that our less functional approach is almost objectively less readable. It's harder to scan and realize what's going on. Yes, it's less performant, but it may need to be a sacrifice taken when writing clean code for smaller arrays where a couple of extra operations is insignificant.

However, there is a way to satisfy the performance problem while still applying a clean-code/imperative paradigm. This is with lazy iterators.

Lazy Iterators

One thing that may seem obvious to use about these higher-order functions is that they do something when you call them. When you tell it to map, it maps. When you tell it to filter, it filters.

What other way can we make these work? We can probably envision a system where the filter method is provided another parameter - one that tells it when to stop iterating. This would involve moving the method provided to filter into a for loop.

We can also probably envision a system where the map and filter are combined as we did in our object-oriented approach.

This is all possible with lazy iterators. We can take in methods such as map and filter, but not execute them until we're told to. We take the functions passed into these methods and execute them in a for loop in order to break iteration early.

A lazy iterator library in JS might looks something like:

const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9];
const arrIter = new LazyIter(arr);

const newArr = arrIter
    .map(n => n ** 2)
    .filter(n => n < 30)
    .collect();

Although the code looks very similar to the functional version, it uses a for loop under the hood where all functions are executed on each element, one by one. This also provides some other benefits.

// snip...
const arrIter = new LazyIter(itemPrices); // use lazy iter library

const newPrices = arrIter
    .map(n => n * TAX_RATE)
    .takeWhile(n => {
        if (currTotal + n > BANK_BALANCE) return false;
        currTotal += n;
        return true;
    })
    .collect();

// snip...

takeWhile is a method that stops the iteration when it returns false on a given element. Because every function gets executed once per element instead of each one iterating over the entire set, we can also ensure the map is only getting executed for the elements returned.

As it is an iteratable, we can also use it in a for loop without collecting and then stop prematurely using break, saving on function calls yet again, since the functions are only called when each element is retrieved.

const arr = new LazyIter(
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
).map(expensiveOperation);

for (const item of arr)  {
    break; // only called expensiveOperation once
}

Let's Make It

Well, this wouldn't be a "from scratch" article if we didn't go over how to make one. It's surprisingly simple.

Let's first create our class.

class LazyIter {
    constructor(arr) {
        this.arr = arr;
        this.funcs = [];
    }
}

Nothing especially important here. We are storing the array provided to us and then creating an array to store all the functions that users will add via the methods provided.

class LazyIter {
    // snip..

    map(func) {
        this.funcs.push({ type: 'map', func })
        return this;
    }

    filter(func) {
        this.funcs.push({ type: 'filter', func })
        return this;
    }

    takeWhile(func) {
        this.funcs.push({ type: 'take_while', func })
        return this;
    }
}

Here we have functions that add the parameter to the funcs array, with a string identifying what kind of function via the type property. You may also notice the return this at the end of each function. This is to allow method chaining, but is not strictly necessary.

These are the only 3 function methods we're going to provide. Other ones should be simply as trivial, but I'll leave the minutia up to you.

class LazyIter {
    // snip...

    *[Symbol.iterator]() {
        for (const item of this.arr) {
            yield item;
        }
    }
}

So this might look a bit odd. It's not finished, don't worry.

This here is a [Symbol.iterator] method. If there exists a Symbol.iterator method that returns an iterator, the class/object is known as an iterable, which lets us use it in for loops and other areas where iterables can be used.

We can alternatively create a generator instead of the weird mess hand-implementing the iterator-protocol would require. That's what the * means. Any expression we yield will be an item in our iterator.

That means our class currently can be shoved into a for loop and give us the elements in our array. Since we could have done that without shoving the array into this class, this isn't especially helpful.

class LazyIter {
    // snip...

    *[Symbol.iterator]() {
        outer:
        for (const item of this.arr) {
            let val = item;

            for (const { type, func } of this.funcs) {
                if (type === 'map') {
                    val = func(val);
                    continue;
                }

                if (type === 'filter') {
                    if (!func(val)) continue outer;
                    continue;
                }

                if (!func(val)) break outer;
            }

            yield val;
        }
    }
}

You'll find a bit of an odd coding style here, like how I use continue instead of else, but it's easy if you take it slowly.

Essentially, we have 2 loops - one to loop over the array and an inner one to apply all the functions to each item.

We are labeling the outer loop with outer: in order to break out of both loops from the innermost one without making things a little too complicated.

Take note of continue outer and break outer. This is how we skip out of the inner loop and perform some action continue/break on the outer loop. A filter would skip the outer loop from yielding the value, essentially filtering out the item. A takeWhile would break the outer loop, removing all subsequent items.

We are going to use this iterator protocol to create our collect method, finishing up our entire class.

class LazyIter {
    // snip...

    collect() { 
        return [...this];
    }

    // snip...
}

Yup. Simple as that. Since we are now an iterable, we can spread ourselves into an array. This lets us keep our code nice and simple.

We can create other methods similar to collect, such as take(num) which accepts a number of elements to retrieve. It's simple enough to code, so I'll leave that up to you.

Here is the class in its entirety:

class LazyIter {
    constructor(arr) {
        this.arr = arr;
        this.funcs = [];
    }

    map(func) {
        this.funcs.push({ type: 'map', func })
        return this;
    }

    filter(func) {
        this.funcs.push({ type: 'filter', func })
        return this;
    }

    takeWhile(func) {
        this.funcs.push({ type: 'take_while', func })
        return this;
    }

    collect() { 
        return [...this];
    }

    *[Symbol.iterator]() {
        outer:
        for (const item of this.arr) {
            let val = item;

            for (const { type, func } of this.funcs) {
                if (type === 'map') {
                    val = func(val);
                    continue;
                }

                if (type === 'filter') {
                    if (!func(val)) continue outer;
                    continue;
                }

                if (!func(val)) break outer;
            }

            yield val;
        }
    }
}

Closing

I don't usually make tutorials. My articles are more conceptual than anything.

I wanted to write this one to outline the type of performance improvements that developers may want to focus on. While micro-optimizations and language-specific optimizations are never a very wise choice in JS, algorithmic improvements work across languages and are very difficult for the engine to optimize.

A developer's primary concern should be code clarity, with performance coming at a close second, depending on the situation. When performance benefits can be achieved without sacrificing code clarity, there often isn't much of an excuse.

If your goal is more important in your situation, the object-oriented approach will always be faster than using our abstraction class. It is simply much more difficult to read and reason about.

EDIT: After writing this (and yes, after) I decided to go ahead and put a lazy iterable class on github. This one is made with TS, so there are some code changes and additions.

Posted on Aug 6 '19 by:

emnudge profile

EmNudge

@emnudge

Web dev, voice actor, and water enthusiast. Explaining the complicated bits of JS.

Discussion

markdown guide