Improving Javascript functional programming with Polyethylene

Daniel Escoz ・ 9 min read

If you've worked with Javascript and arrays, you've surely used some of the functional utilities packed with them: filter, map, reduce and others. They're incredibly useful tools in a ton of situations, but come with a few notable drawbacks:

  • Every call to a functional method returns a new Array. This creates unnecessary intermediate arrays, wasting time and memory.
  • These utilities are only available for Arrays. This wasn't a problem a few years ago, but with the introduction of Symbol.iterator and for...of it's now insufficient.
  • There is no support whatsoever for asynchronous operations. No callbacks, no promises, no events, no nothing: Your code must be sync and your data must be already in memory.

Thanks to for...of we can solve all of these issues by kind of re-implementing the methods ourselves, catered to each situation, but that defeats the point of having functional utilities in the first place. What can we do?
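
To make that concrete, here's a hand-rolled sketch (plain JavaScript, not Polyethylene) of what such a re-implementation with generators looks like:

```javascript
// Hand-rolled lazy filter and map: they work on any iterable, not just
// Arrays, and never build intermediate arrays.
function* lazyFilter (iterable, predicate) {
  for (const item of iterable) {
    if (predicate(item)) yield item;
  }
}

function* lazyMap (iterable, mapper) {
  for (const item of iterable) {
    yield mapper(item);
  }
}

// Usage: a Set works just as well as an Array.
const evensDoubled = [
  ...lazyMap(lazyFilter(new Set([1, 2, 3, 4]), n => n % 2 === 0), n => n * 2)
];
// evensDoubled is [4, 8]
```

It works, but writing (and naming, and testing) one of these for every situation gets old fast.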

Here comes Polyethylene to the rescue. Polyethylene can solve all of the above problems, and a few more you didn't know you had. Let's see how, one by one, and I'll expand on each later.

But first, a disclaimer: I am the author of Polyethylene, so take everything I say here with that in mind.

Also, all code you'll see here will assume you are importing Polyethylene as follows:

const Poly = require('polyethylene');

That's all you need to know, let's get to business!

Saving on Array copies

As Polyethylene objects are pure generators, no time or space will be spent in storing intermediate results of a chain of functional calls. This can make long arrays much faster to process.

Let's use an example. Let's say we have a list with people's names, country codes, and ages, and we want to find the average age of those living in Spain:

const people = [{name: 'Dani', country: 'ES', age: 27}, /* more people */];

const {age, num} = people
  .filter(person => person.country === 'ES') // filter by country
  .map(person => person.age) // we're only interested in their age
  .reduce( // find total age and number of people
    (acc, age) => ({age: acc.age + age, num: acc.num + 1}),
    {age: 0, num: 0}
  );

const avgAge = age / num; // we have the average now!

Note: I know the map stage isn't necessary, but it's there for illustration purposes.

If we run that code, we'll find the average age of all Spanish people in the dataset. Simple, right? The problem arises if our dataset holds not a few hundred people but thousands or millions of them. Because we're creating a new array at each step, we have to spend time and memory storing and filling all of those arrays. We can adapt this code to Polyethylene in one easy step: wrap our array in a Polyethylene object:

const Poly = require('polyethylene');
const people = [{name: 'Dani', country: 'ES', age: 27}, /* more people */];

const {age, num} = Poly.from(people)
  .filter(person => person.country === 'ES') // filter by country
  .map(person => person.age) // we're only interested in their age
  .reduce( // find total age and number of people
    (acc, age) => ({age: acc.age + age, num: acc.num + 1}),
    {age: 0, num: 0}
  );

const avgAge = age / num; // we have the average now!

The only change is that, when starting our functional chain, we've wrapped our array as Poly.from(people). This creates a Polyethylene Iterable object that can be used for functional chains like the one above. The difference, however, is that no intermediate array will ever be created.

In a toy example like this, measured with about a million people, I saw around a 10% time reduction. However, I had built the dataset by repeating the same 1000 people 1000 times and storing the result in an array before handing it to Polyethylene. It turns out we can do that part with Polyethylene too!

/* Array-only version */
const repeatedPeople = Array(1000).fill().flatMap(() => somePeople)

/* Polyethylene version */
const repeatedPeople = Poly.range(1000).flatMap(() => somePeople)

In both cases, we will end up with an iterable of a million people, but in the second case no array with a million entries is ever created. I then repeated my experiment and increased the amount of repetitions:

Amount   1000    5000     10000    50000     100000
Array    212ms   1123ms   2190ms   10350ms   CRASH
Poly     84ms    380ms    749ms    3671ms    7446ms

As you see, Polyethylene is way faster when it comes to very big datasets. This is especially true in this case, as with arrays we need to first build the dataset and then process it. As you can also see, with 100 million entries the array version simply crashed: it ran out of memory. The Polyethylene version might take a long time, but it will never crash for that reason.

Note that this isn't always true: for small arrays Polyethylene can actually be slower due to the overhead of generators, and possibly because arrays make better use of CPU caches. Performance is not a goal of Polyethylene, though, just a nice side effect.

Using functional utilities in iterables other than Arrays

Now we enter the realm of what you can't do without Polyethylene. In this case, it's doing functional stuff on non-Array iterables.

To exemplify this we're going to use maths. Let's say we want to find the first 100 happy numbers:

const first100HappyNums = Poly.range(1, Infinity)
  .filter(isHappy) // assume we already have an `isHappy` function
  .take(100)
  .toArray();

Let's go step by step:

  • Poly.range(1, Infinity) iterates over all numbers between 1 and Infinity. This is, as you can imagine, an infinite iteration, which we can handle thanks to the later, limiting steps.
  • .filter(isHappy) will only leave those numbers that are happy, assuming the isHappy function works correctly. This will still be infinite, but much less dense.
  • .take(100) will result in a finite iteration with only the first 100 elements. Because we already have only happy numbers, these will be the first 100 happy numbers.
  • .toArray() will finally gather all elements and return an array.

As you see, doing this with functional utilities would be impossible with arrays. Polyethylene has, therefore, filled a gap in functionality.
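
As an aside, the post assumes an isHappy function exists; here's one possible sketch of it (my illustration, not from the original):

```javascript
// A number is "happy" if repeatedly replacing it with the sum of the squares
// of its digits eventually reaches 1; unhappy numbers fall into a cycle.
function isHappy (n) {
  const seen = new Set();
  while (n !== 1 && !seen.has(n)) {
    seen.add(n);
    n = String(n).split('').reduce((sum, d) => sum + Number(d) ** 2, 0);
  }
  return n === 1;
}

// The first few happy numbers are 1, 7, 10, 13, 19, ...
```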

You don't need infinite iterations to make this work, though. Poly.from works with any iterable, so you could use a Set, a Buffer, or any other object that implements the iterator interface.
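
For instance, a generator function produces exactly such an iterable. Plain for...of can consume it, and by the same token Poly.from could wrap it (a sketch using only methods already shown):

```javascript
// An infinite generator is a perfectly valid iterable: it implements the
// iterator interface, just like Arrays, Sets and Buffers do.
function* fibonacci () {
  let [a, b] = [0, 1];
  while (true) {
    yield a;
    [a, b] = [b, a + b];
  }
}

// Consuming it manually with for...of:
const firstFibs = [];
for (const n of fibonacci()) {
  if (firstFibs.length >= 5) break;
  firstFibs.push(n);
}
// firstFibs is [0, 1, 1, 2, 3]

// With Polyethylene, the manual break would become something like:
//   Poly.from(fibonacci()).take(5).toArray()
```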

But again, we're just scratching the surface of what Polyethylene can do...

Using async callbacks and async iterables

We have only used synchronous functions so far, but Polyethylene can also handle async functions as callbacks. To do that, though, we first need to convert the Iterable to an AsyncIterable by calling .async() in our chain. From that point on, everything is async.

Let's use an example. Let's say we have a list of cities and we want to know their weather forecast. I'm going to use request-promise to make calls to MetaWeather, so you can also try this without having to sign up anywhere.

First, let's define functions to query our API:

const reqProm = require('request-promise');

async function searchLocation (query) {
  return reqProm({
    uri: 'https://www.metaweather.com/api/location/search',
    qs: {query},
    json: true,
  });
}

async function getWeather (id) {
  const response = await reqProm({
    uri: `https://www.metaweather.com/api/location/${id}`,
    json: true,
  });

  return response.consolidated_weather;
}

Let's say that we want to print, for each city in our list, the min and max temperatures for today; if our city query matches multiple locations, we'll print it multiple times. If we had to do it without Polyethylene, this is how I'd approach it:

const today = new Date().toISOString().split('T')[0];
const cities = ['madrid', 'san']; // 'san' will yield 11 results

for (const city of cities) {
  const searchResult = await searchLocation(city);

  for (const location of searchResult) {
    const weatherList = await getWeather(location.woeid);
    const todaysWeather = weatherList.find(w => w.applicable_date === today);
    console.log('%s: %s, %s', location.title, todaysWeather.min_temp, todaysWeather.max_temp);
  }
}

Not so bad, although it will get complicated if we ever need more steps. Polyethylene lets us do it in a more streamlined way, although with one caveat that we'll get to:

const today = new Date().toISOString().split('T')[0];
const cities = ['madrid', 'san'];

Poly.from(cities)
  .async()
  .flatMap(searchLocation)
  .flatMap(async (loc) => (await getWeather(loc.woeid))
    .map(w => ({city: loc.title, ...w}))
  )
  .filter(res => res.applicable_date === today)
  .forEach(res => console.log('%s: %s, %s', res.city, res.min_temp, res.max_temp));

The only weird thing is on the second .flatMap, where we need to inject the city name with a nested map in order to have it later. We didn't need that in the previous example because of the natural nesting of the code. This is to show that Polyethylene isn't perfect, and sometimes we need to adapt the code for it to work.

As you see, we've been able to use async functions for the flatMap calls. We could have used them for filter or forEach too. All of that is possible thanks to the .async() call; if we hadn't used it, our iterator would have been synchronous and nothing would have worked.
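
Conceptually, what .async() enables can be pictured as an async generator awaiting each callback result, roughly like this vanilla sketch (my illustration, not the library's actual implementation):

```javascript
// Rough equivalent of an async map stage: each callback result is awaited
// before being yielded, so async callbacks work naturally.
async function* asyncMap (iterable, mapper) {
  for await (const item of iterable) {
    yield await mapper(item);
  }
}

// Usage with an async callback:
async function demo () {
  const results = [];
  for await (const x of asyncMap([1, 2, 3], async n => n * 10)) {
    results.push(x);
  }
  return results; // [10, 20, 30]
}
```

Note that for await...of happily consumes plain sync iterables too, which is why the same stage can sit after either kind of source.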

But that's not all: one of the best things about Polyethylene is its ability to work directly with async iterables. An example I like a lot is loading data from Reddit in pages. Let's say we want to list the top 100 posts from a given subreddit that aren't stickied and are text posts (type self). An approach might be:

const reqProm = require('request-promise');

async function getRedditPage (subreddit, {limit = 50, before, after} = {}) {
  return reqProm({
    uri: `https://reddit.com/r/${subreddit}.json`,
    qs: {limit, before, after},
    json: true,
  });
}

const WANTED = 100;
const posts = [];
let after = null;

while (posts.length < WANTED) {
  const page = await getRedditPage('factorio', {limit: 100, after});

  posts.push(...page.data.children.filter(
    post => !post.data.stickied && post.data.post_hint === 'self'
  ));
  after = page.data.after;
}

posts.slice(0, WANTED)
  .forEach(post => console.log('[%s] %s', post.data.name, post.data.title));

It's a bit cumbersome, as we need the loop and all that appending to an array for it to work. But the main problem is that it's really hard to make reusable: the total number of items we need to load is unknown because of the filter, so we have to go page by page.

With Polyethylene we can create a function that lists all posts from a subreddit, and then filter and print them separately. We can use Poly.iterate for this:

function listSubreddit (subreddit) {
  return Poly.iterate(async ({done, after}) => {
    if (done) {
      return {done, posts: []};
    }

    const result = await getRedditPage(subreddit, {after});
    return {
      after: result.data.after,
      posts: result.data.children,
      done: result.data.after == null, // no next page after this one
    };
  }, {done: false})
    .flatMap(({posts}) => posts)
    .map(post => post.data);
}

listSubreddit('factorio')
  .filter(post => !post.stickied && post.post_hint === 'self')
  .take(100)
  .forEach(post => console.log('[%s] %s', post.name, post.title));

That needs some explanation. The Poly.iterate method creates an iterable by calling the passed function over and over, each time passing it the last element produced (with the second argument to iterate serving as the initial value). We use this to pass back the after field and a done flag indicating whether the pages are exhausted, as well as to pass the posts forward. Then we flatten the posts and extract their data property.
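
Based on that description (my reading of the behaviour, not the library's source), Poly.iterate can be pictured as an async generator along these lines:

```javascript
// Sketch: the function receives the previously produced element (the seed on
// the first call), and every value it returns is yielded in order.
async function* iterateSketch (fn, seed) {
  let last = seed;
  while (true) {
    last = await fn(last);
    yield last;
  }
}

// Usage: a simple counter driven by the previous element.
async function firstFive () {
  const out = [];
  for await (const n of iterateSketch(async prev => prev + 1, 0)) {
    out.push(n);
    if (out.length === 5) break;
  }
  return out; // [1, 2, 3, 4, 5]
}
```

The iteration itself is infinite; like the happy-numbers example, it's the downstream take that keeps things finite.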

That function can then be called for whatever subreddit and you'll get a list with all the posts, plain and simple. We call it, filter with our condition, take only the first 100 and print them. Easy peasy.

Beyond functional utilities: prefetch / preload

But wait, there's more!

One last trick up our sleeve is preload and prefetch. These are two options you can pass to any stage of an async iteration, and magic will ensue:

  • If preload is on, the first element of that stage will be produced as soon as possible. This makes sure it's available right away if the iterable takes a while to start being iterated. This is rarely useful, though, as you'll usually start iterating immediately.
  • If prefetch is on, the next element of the iteration will be requested before processing the current one. This means that, if a stage is followed by lengthy processing, the next element will already be available, as it's produced in parallel.

These two options can speed up the aggregate processing time on a chain, as they allow for parallelization, but aren't active by default as they will request more elements than necessary if you use limiting stages.
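
The prefetch idea can be sketched in plain JavaScript (a simplified illustration, not the library's implementation): request the next element from the source before yielding the current one, so production and processing overlap:

```javascript
// Simplified prefetch: always keep one pending element "in flight" ahead of
// the consumer.
async function* prefetch (sourceIterable) {
  const it = sourceIterable[Symbol.asyncIterator]();
  let pending = it.next(); // request the first element immediately
  while (true) {
    const current = await pending;
    if (current.done) return;
    pending = it.next(); // request the next element before yielding this one
    yield current.value;
  }
}

// Usage: the order of elements is unchanged, only the timing improves.
async function* source () {
  yield 1; yield 2; yield 3;
}

async function collect () {
  const out = [];
  for await (const v of prefetch(source())) out.push(v);
  return out; // [1, 2, 3]
}
```

Notice how the sketch eagerly calls it.next() even if the consumer ends up not needing that element; that's exactly why these options aren't on by default when limiting stages are involved.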


That was a long post.

So, that's Polyethylene. It's a bit of a toy project I started a while ago, but I think it can be really useful, especially the async bits. I'm still thinking of improvements and everybody is welcome to contribute with ideas, suggestions, bug reports, criticism and, of course, code.

Find Polyethylene in npm and GitHub.

Discussion (7)

Alex Lohr

Cool project, though I would argue that you could also improve the performance of the native methods by merging filter, map, etc. to a single reduce method call.

It should even be possible to automate that kind of optimization using Babel or a similar transpiler.

Avalander

To be honest, I'm baffled that Javascript runtimes don't do any kind of optimisation already.

Alex Lohr

I think they first aimed to optimize the single methods instead of the combinations, for the simple reason that improving the performance of a single method directly improves every combination it's used in, whereas there are many more combinations to optimize before you see an effect.

Avalander

That's an interesting take, thanks for sharing! Out of curiosity, why did you choose chaining methods instead of piping functions? (approach taken by Rambda and RxJS 6, for instance)

Here's an example of what I mean.

const averageSpaniards = Poly.pipe(
  Poly.filter(({ country }) => country === 'ES'),
  Poly.map(({ age }) => age),
  Poly.reduce(({ age, num }, x) => ({ age: age + x, num: num + 1 }))
)

averageSpaniards(worldPopulation)
Daniel Escoz (Author)

Hm... I hadn't thought about this, and it's interesting to me. The main reason to use chainable methods was to mimic arrays, as extending array functions to generators is how this idea started.

For what you propose,

  • how would you discriminate between sync and async iterators?
  • what would the pipeline stages receive and return?
Avalander

The main reason to use chainable methods was to mimic arrays, as extending array functions to generators is how this idea started.

Fair enough, familiarity is always a bonus :)

how would you discriminate between sync and async iterators?

Either take MojiScript's idea and make all pipelines async anyway, or provide a pipeAsync function that would be equivalent to your .async().

what would the pipeline stages receive and return?

I'd say there wouldn't be any intermediate stages. The pipe would be created with a list of functions, and then it would be given an array and apply all operations to each element at once, effectively iterating over the array only once.

But I think that's more a matter of API ergonomics than actual differences in behaviour, you can probably achieve similar results either way.