DEV Community

Anton Alexandrenok


Lazy data processing using Undercut

Undercut is a JavaScript library for processing data in a lazy or deferred manner by building pipelines.

The library focuses on leveraging existing JavaScript features like Iterators/Generators while keeping a balanced API: neither Java/C#-influenced nor heavily functional. Undercut also aims to avoid prototype extension and the situation where you have to name a method flat instead of flatten. You may also use it as an alternative to Lodash's chain functionality, with support for lazy execution, tree shaking, etc.

Imagine a conveyor in a car factory: a chain of operations, from welding body parts and painting doors to gluing on a logo and inflating the wheels. Every operation is independent and relies only on a protocol: a car comes in from this side and leaves on that side once the operation is complete.

In JavaScript we may represent this as an array of functions:

const pipeline = [
    skip,
    map,
    filter,
    take,
];

Of course, those operations have some input data: this car should have 17" wheels, that car should have 16" wheels. We can do this too:

const pipeline = [
    skip(1),
    map(x => x - 3),
    filter(x => x !== 4),
    take(100),
];

Calling skip(1) creates a function (operation) that knows how to skip exactly 1 item (car).

Sometimes you need to make a new model with an additional equipment package. It may be as simple as adding a couple of steps to the conveyor:

const pipeline_2 = [
    ...pipeline,
    filter(x => x < 1000)
];

Or replacing some steps in an existing one:

pipeline[1] = map(x => x - 4);

Arrays give you this flexibility to concatenate, merge, copy, and modify existing pipelines.
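
These array manipulations can be tried without Undercut at all. The sketch below is only an illustration: the operations (`addOne`, `double`, `belowEight`) and the `run` helper are hypothetical stand-ins written for this example, not part of the library. It shows that a pipeline is an ordinary array, and that copying it before a modification leaves the original intact.

```javascript
// Hypothetical stand-in operations: each takes an iterable and returns an iterable.
function* addOne(items) { for (const x of items) yield x + 1; }
function* double(items) { for (const x of items) yield x * 2; }
function* belowEight(items) { for (const x of items) if (x < 8) yield x; }

// Hypothetical helper (not Undercut's API): feed the source through every step in order.
const run = (pipeline, source) =>
    [...pipeline.reduce((items, operation) => operation(items), source)];

const base = [addOne, double];

// Extend: spreading creates a new array, so `base` is untouched.
const extended = [...base, belowEight];

// Replace a step in a copy instead of mutating `base`.
const replaced = [...base];
replaced[1] = belowEight;

console.log(run(base, [1, 2, 3, 4]));     // [4, 6, 8, 10]
console.log(run(extended, [1, 2, 3, 4])); // [4, 6]
console.log(run(replaced, [1, 2, 3, 4])); // [2, 3, 4, 5]
```

Assigning into the original array, as in the snippet above, also works; copying first is simply the safer choice when the same pipeline is shared between several places.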

To finish the conveyor, there should be some mechanism, like a moving belt, that transports a car from one operation to another. This is where Undercut tries to help (not to mention a pack of 40+ prebuilt common operations like filter/map/skip/etc).

Core pull functions allow you to quickly run a pipeline and acquire the result or combine it into something self-contained and reusable like an Iterable.

Having a list of numbers called source:

const source = [1, 2, 3, 4, 5, 6, 7];

And a pipeline of operations:

const pipeline = [
    skip(1),
    map(x => x - 3),
    filter(x => x !== 4),
    take(100),
];

We could pull items out of the source through the pipeline and get an array of result items:

const result = pullArray(pipeline, source);

In our case result will be:

[ -1, 0, 1, 2, 3 ]

Everything is done lazily, so map won't run for the skipped item. There is also pullValue for when your result is a single value rather than a sequence, and the more generic pull, which takes a target function that receives the result items and converts them into whatever you want.
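
To make the laziness visible, here is a minimal sketch that records which items actually reach the mapper. The `skip`, `map`, and `take` below are simplified stand-ins written for this example, and `pullArraySketch` is a hypothetical helper; Undercut's real implementations may well differ. The assumption is only that operations are chained generator functions and that items are pulled one at a time from the end of the chain.

```javascript
// Simplified stand-ins for this sketch; not Undercut's actual code.
const skip = count => function* (items) {
    let skipped = 0;
    for (const item of items) {
        if (skipped < count) { skipped++; continue; }
        yield item;
    }
};
const map = mapper => function* (items) {
    for (const item of items) yield mapper(item);
};
const take = count => function* (items) {
    if (count <= 0) return;
    let taken = 0;
    for (const item of items) {
        yield item;
        if (++taken >= count) return; // Stop pulling as soon as enough items were taken.
    }
};

// Assumed shape of a pull function: chain the operations, then drain the result.
const pullArraySketch = (pipeline, source) =>
    [...pipeline.reduce((items, operation) => operation(items), source)];

const calls = []; // Every item the mapper was invoked with.

const lazyResult = pullArraySketch([
    skip(1),
    map(x => { calls.push(x); return x - 3; }),
    take(2),
], [1, 2, 3, 4, 5, 6, 7]);

console.log(lazyResult); // [-1, 0]
console.log(calls);      // [2, 3]
```

The mapper never sees 1 (skipped) or 4 through 7 (not needed once two items were taken), which is the behavior described above.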

Because pull is built around Iterables, and many native objects are Iterable out of the box (arrays, strings, maps, sets, etc), you can easily transform a Map that stores ids under usernames into an Object that stores usernames under ids:

const namesById = new Map([
    ["root", 0],
    ["sam", 1000],
    ["kate", 1004],
]);

const pipeline = [
    filter(entry => entry[1] > 0), // An entry is [username, id]; drop root (id 0).
    map(entry => [entry[1], entry[0]]),
];

const idsByNameObj = pull(Object.fromEntries, pipeline, namesById);

// idsByNameObj == Object {"1000":"sam","1004":"kate"}

Moreover, you may create a reusable view of this data:

const idsByName = pullLine(pipeline, namesById);

The pullLine function binds together a pipeline and a source into an Iterable. Every time you iterate over it, the pipeline will be executed again, giving you a fresh view on processed data.

namesById.set("sam", 1111);

console.log(Object.fromEntries(idsByName)); // Object {"1004":"kate","1111":"sam"}
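
One way such a reusable view could work is an object whose iterator method rebuilds the chain on every iteration. The sketch below is an assumption about the general idea, not Undercut's implementation: `makeLine`, `swap`, and `idsByUsername` are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper mimicking the idea of a reusable view; not Undercut's code.
function makeLine(pipeline, source) {
    return {
        // A new chain is built on every iteration, so the source is read again each time.
        [Symbol.iterator]() {
            const items = pipeline.reduce((iterable, operation) => operation(iterable), source);

            return items[Symbol.iterator]();
        },
    };
}

// Stand-in operation for the sketch: turns [key, value] entries into [value, key].
function* swap(entries) {
    for (const [key, value] of entries) yield [value, key];
}

const idsByUsername = new Map([["sam", 1000], ["kate", 1004]]);
const view = makeLine([swap], idsByUsername);

const before = Object.fromEntries(view); // 1000 maps to "sam", 1004 maps to "kate"

idsByUsername.set("sam", 1111);

const after = Object.fromEntries(view);  // 1004 maps to "kate", 1111 maps to "sam"

console.log(before, after);
```

Nothing is cached between iterations, so every pass pays the full cost of the pipeline; that is the trade-off for always seeing current data.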

Every operation is just a function, so you can create your own, or even build a whole library of your own operations and reuse it across projects. The protocol that operations rely on is similar to car-in/car-out, but with Iterables instead of cars: an operation gets an Iterable of items to process and returns an Iterable of processed items. Returning an Iterable sounds complicated, but it isn't with JavaScript Generators.

Let's build a pow operation:

function* powOperation(iterable) {
    for (const item of iterable) {
        const newItem = Math.pow(item, exponent);

        yield newItem;
    }
}

Take an Iterable, go through its items, calculate new values, and put them into another Iterable with yield.

If you aren't familiar with generators (functions marked with an asterisk, *): the return value of such a function is not what you return, but an implicit Iterable that you put items into with the yield keyword. Please read MDN for a more detailed description. I also recommend the awesome book Exploring ES6 by Dr. Axel Rauschmayer.
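
For readers new to generators, a tiny standalone example may help. It uses only standard JavaScript; `countTo` is a name made up for this illustration. It shows that calling a generator function runs none of its body, and that items are produced only when someone asks for them.

```javascript
// `countTo` is an illustrative generator, unrelated to Undercut.
function* countTo(limit) {
    for (let i = 1; i <= limit; i++) {
        console.log(`producing ${i}`);
        yield i;
    }
}

const numbers = countTo(3);   // Nothing is logged yet: the body has not started.

const first = numbers.next(); // Logs "producing 1".
console.log(first);           // { value: 1, done: false }

const rest = [...numbers];    // Logs "producing 2" and "producing 3".
console.log(rest);            // [2, 3]
```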

One important piece is still missing: exponent isn't defined. It should be supplied when the pipeline is built, like those 17" wheels. To fix this, wrap the generator in another function that accepts exponent and returns the operation:

function pow(exponent) {
    function* powOperation(iterable) {
        for (const item of iterable) {
            const newItem = Math.pow(item, exponent);

            yield newItem;
        }
    }

    // Return the operation so it can be placed into a pipeline.
    return powOperation;
}

And this pow we can actually use:

const source = [0, 1, 2];
const pipeline = [
    map(x => x + 1),
    pow(2),
];

const result = pullArray(pipeline, source);

console.log(result); // [1, 4, 9]
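
Because an operation is a plain function from Iterable to Iterable, it can also be exercised without any pull function, which is handy in unit tests. A minimal sketch, assuming the `pow` factory returns its inner generator function; `square` is just a local name for the example:

```javascript
// Self-contained copy of the operation factory for this sketch.
function pow(exponent) {
    return function* powOperation(iterable) {
        for (const item of iterable) {
            yield Math.pow(item, exponent);
        }
    };
}

const square = pow(2);

// Any native Iterable works as input: an array here, a Set below.
console.log([...square([1, 2, 3])]);          // [1, 4, 9]
console.log([...pow(3)(new Set([2, 2, 3]))]); // [8, 27]
```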

This was only a brief overview of Undercut, but it should be enough for basic use cases. If you want to learn more, please visit undercut.js.org for documentation and tutorials.
