ES2018. Real-life simple usage of async iteration: get paginated data from REST APIs in 20 lines of code

Carlos Saito ・4 min read

The next JavaScript standard, ES2018, is here and it comes with a big new feature: asynchronous iteration. It is an enormously useful feature, and I want to share one super simple example of how we can use it in real life.

In this post I am NOT going to explain what iterators or async iterators are. You can find those explanations here or here.

The problem. We want to fetch data from an API that returns it paginated, and do stuff with every page. For example, we want to fetch all the commits of a GitHub repo and do some stuff with that data.

We want to separate the logic of "fetching commits" and "do stuff", so we are going to use two functions. In a Real Life™ scenario, fetchCommits would probably be in a different module, and the "do stuff" part would call fetchCommits somehow:

// Imagine that this function is in a different module...
function fetchCommits(repo) {}


function doStuff() {
  const commits = fetchCommits('facebook/react')
  // do something with `commits`
}

Now, the GitHub API returns commits paginated (like most REST APIs), so we will fetch the commits "in batches". We want to implement this pagination logic somehow in fetchCommits.

However, we don't want fetchCommits to return all the commits together; we want to run some logic for each page as it arrives, and implement that logic in the "do stuff" part.

Solution without async iteration

Before async iteration, we were more or less forced to use callbacks:

// Here we "do stuff"
fetchCommits('facebook/react', commits => {
  // do something with `commits`
})
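For reference, here is a minimal sketch of how a callback-based fetchCommits could be written. It is only an illustration: `getPage` and `FAKE_PAGES` are hypothetical stand-ins for the real HTTP request (so the example runs without network access), and the optional `onDone` callback is an assumption that is not part of the call shown above.

```javascript
// Hypothetical stand-in for the real HTTP request: three fake pages of commits.
const FAKE_PAGES = {
  1: { commits: ['a1', 'a2'], nextPage: 2 },
  2: { commits: ['b1', 'b2'], nextPage: 3 },
  3: { commits: ['c1'], nextPage: null }
}

function getPage (repo, page) {
  return Promise.resolve(FAKE_PAGES[page])
}

// Callback style: `onPage` is called once per page,
// `onDone` (optional, an assumption) once there are no more pages.
function fetchCommits (repo, onPage, onDone) {
  function loop (page) {
    getPage(repo, page).then(({ commits, nextPage }) => {
      onPage(commits)
      if (nextPage) {
        loop(nextPage)
      } else if (onDone) {
        onDone()
      }
    })
  }
  loop(1)
}

// Usage: the "do stuff" part lives in the callback
fetchCommits('facebook/react', commits => {
  console.log('Got a page with', commits.length, 'commits')
})
```

It works, but the caller cannot easily pause, stop early, or use plain loops and try/catch around the pages.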

Can we use Promises? Well, not in this way, because a Promise resolves only once: we would get either a single page or the whole thing:

function doStuff() {
  fetchCommits('facebook/react').then(commits => {
    // do something
  })
}

Can we use sync generators? Well... we could return a Promise in the generator and resolve that promise outside it.

// fetchCommits is a generator
for (let commitsPromise of fetchCommits('facebook/react')) {
  const commits = await commitsPromise
  // do something
}

This is actually a clean solution, but what does the implementation of the fetchCommits generator look like?

function* fetchCommits(repo) {
  const lastPage = 30 // Must be a known value
  const url = `https://api.github.com/repos/${repo}/commits?per_page=10`

  let currentPage = 1
  while (currentPage <= lastPage) {
    // `fetch` returns a Promise. The generator is just yielding that one.
    yield fetch(url + '&page=' + currentPage)
    currentPage++
  }
}

Not a bad solution, but we have one big issue here: the lastPage value must be known in advance. This is often not possible, since that value comes in the headers of the first response.
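To make that concrete, here is a small sketch of reading the last page from a `Link` header. The header value is made up (it only imitates the shape GitHub uses), and `parseLinks` is a simplified hand-written helper for illustration, not the parse-link-header library used later in this post.

```javascript
// Simplified Link header parser (an illustration only).
// It splits on commas, so it would break on URLs that contain commas.
function parseLinks (header) {
  const links = {}
  for (const part of header.split(',')) {
    const match = part.match(/<([^>]+)>;\s*rel="([^"]+)"/)
    if (match) links[match[2]] = match[1]
  }
  return links
}

// Made-up header value in the shape of a paginated GitHub response
const header =
  '<https://api.github.com/repos/facebook/react/commits?per_page=10&page=2>; rel="next", ' +
  '<https://api.github.com/repos/facebook/react/commits?per_page=10&page=30>; rel="last"'

const links = parseLinks(header)
const lastPage = Number(new URL(links.last).searchParams.get('page'))

console.log(links.next) // the URL of the next page
console.log(lastPage) // 30
```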

If we still want to use generators, then we can use an async function to get that value and return a sync generator...

async function fetchCommits (repo) {
  const url = `https://api.github.com/repos/${repo}/commits?per_page=10`
  const response = await fetch(url)

  // Here we are calculating the last page...
  const last = parseLinkHeader(response.headers.get('link')).last.url
  const lastPage = parseInt(
    last.split('?')[1].split('&').filter(q => q.indexOf('page') === 0)[0].split('=')[1],
    10
  )

  // And this is the actual generator
  return function* () {
    let currentPage = 1
    while (currentPage <= lastPage) {
      // This looks harmless, but we are hard-coding URLs!
      yield fetch(url + '&page=' + currentPage)
      currentPage++
    }
  }
}

This is not a good solution since we are literally hard-coding the "next" URL.

Also the usage of this could be a bit confusing...

async function doStuff() {
  // Calling a function to get...
  const getIterator = await fetchCommits('facebook/react')

  // ... a function that returns an iterator???
  for (const commitsPromise of getIterator()) {
    const value = await commitsPromise
    // Do stuff...
  }
}

Ideally, we want to obtain the "next" URL after every request, and that means putting asynchronous logic inside the generator but outside of the yielded value.

Async generators (async function*) and for await loops

Now, async generators and asynchronous iteration allow us to iterate through structures where the logic outside of the yielded value is also asynchronous. This means that, after every API call, we can work out the "next" URL from the headers and also check whether we have reached the end.
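Before touching a real API, the idea can be tried without any network access. The following is a minimal sketch, assuming a made-up in-memory API: `FAKE_API`, `fakeRequest` and `fetchPages` are hypothetical names used only for this illustration.

```javascript
// Hypothetical in-memory "API": each page knows the URL of the next one
const FAKE_API = {
  '/commits?page=1': { body: ['a1', 'a2'], next: '/commits?page=2' },
  '/commits?page=2': { body: ['b1', 'b2'], next: '/commits?page=3' },
  '/commits?page=3': { body: ['c1'], next: undefined }
}

// Stand-in for the HTTP request; it resolves asynchronously like a real one
function fakeRequest (url) {
  return new Promise(resolve => setTimeout(() => resolve(FAKE_API[url]), 10))
}

async function* fetchPages (firstUrl) {
  let url = firstUrl
  while (url) {
    // Asynchronous logic *outside* of the yielded value
    const response = await fakeRequest(url)
    // The "next" URL is only known once the response has arrived
    url = response.next
    yield response.body
  }
}

async function main () {
  // Each iteration waits for the next page to arrive
  for await (const commits of fetchPages('/commits?page=1')) {
    console.log(commits)
  }
}

main()
```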

In fact, this could be a real implementation:

(The example works in Node.js >= 10)

const request = require('request-promise')
const parseLinkHeader = require('parse-link-header')

async function* fetchCommits (repo) {
  let url = `https://api.github.com/repos/${repo}/commits?per_page=10`

  while (url) {
    const response = await request(url, {
      headers: {'User-Agent': 'example.com'},
      json: true,
      resolveWithFullResponse: true
    })

    // We obtain the "next" url looking at the "link" header
    // And we need an async generator because the header is part of the response.
    const linkHeader = parseLinkHeader(response.headers.link)

    // if the "link header" is not present or doesn't have the "next" value,
    // "url" will be undefined and the loop will finish
    url = linkHeader && linkHeader.next && linkHeader.next.url
    yield response.body
  }
}

And the logic of the caller function also gets really simple:

async function start () {
  let total = 0
  const iterator = fetchCommits('facebook/react')

  // Here is the "for-await-of"
  for await (const commits of iterator) {
    // Do stuff with "commits" like printing the "total"
    total += commits.length
    console.log(total)

    // Or maybe throwing errors
    if (total > 100) {
      throw new Error('Manual Stop!')
    }
  }
  console.log('End')
}
start()

Do you have any other examples of how to use async generators?

Carlos Saito

@exacs

Developer. I like JavaScript (especially since ES2015) and Elixir. I don't like Python. I don't know why, maybe because I haven't tried it enough.

Discussion

Why aren't you using the Link header that the GitHub API returns to find the next page to load instead?

 

It is exactly what I'm doing :)

 

you are right, I actually somehow skipped the last url assignment. One possible improvement then, would be to fetch headers once, use URLSearchParams to get/set pages, and load all the pages at once in parallel, returning results as Promise.all(...)

That would be N pages at once, instead of N pages one after the other ;-)

Edit: my suggestion is based on the fact GitHub returns the last page too, but I guess for your article what you are doing is already good enough as example.
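A rough sketch of that parallel approach, under stated assumptions: `fakeRequest` is a hypothetical stand-in for the HTTP call, and its `lastPage` field stands for the value that would really be read from the rel="last" entry of the Link header.

```javascript
// Hypothetical stand-in for an HTTP request.
// `lastPage` represents what would come from the Link header.
function fakeRequest (url) {
  const page = Number(new URL(url).searchParams.get('page') || 1)
  return Promise.resolve({ lastPage: 3, body: [`commit-from-page-${page}`] })
}

async function fetchAllCommits (baseUrl, request = fakeRequest) {
  // First request: gives us page 1 and tells us how many pages there are
  const first = await request(baseUrl)

  // Build the URL of every remaining page via URLSearchParams
  const requests = []
  for (let page = 2; page <= first.lastPage; page++) {
    const url = new URL(baseUrl)
    url.searchParams.set('page', page)
    requests.push(request(url.toString()))
  }

  // All remaining pages are requested at once
  const rest = await Promise.all(requests)
  return [first, ...rest].map(response => response.body)
}

fetchAllCommits('https://api.github.com/repos/facebook/react/commits?per_page=10')
  .then(pages => console.log(pages.length)) // 3 with the fake data above
```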

Thanks for the suggestion! Your solution would work perfectly :)

I don't think that my solution is valid for all scenarios but might be good sometimes. For example:

  • We want to fetch pages until we meet some condition (for example, search for the last 10 commits that say "refactor" in the text), so we don't want to fetch all pages
  • The API doesn't return the "last" page in the header
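The first case can be sketched like this. It is only an illustration with fake data: the five in-memory pages and the `stats` counter are assumptions added to show that pages after the `break` are never requested.

```javascript
// Hypothetical paginated source with 5 pages.
// `stats.requestedPages` counts how many pages were actually requested.
async function* fetchCommits (stats) {
  for (let page = 1; page <= 5; page++) {
    stats.requestedPages++
    // Stand-in for `await request(url)`
    const commits = await Promise.resolve([
      { message: `refactor: step ${page}` },
      { message: `docs: page ${page}` }
    ])
    yield commits
  }
}

async function findCommits (word, limit, stats = { requestedPages: 0 }) {
  const found = []
  for await (const commits of fetchCommits(stats)) {
    found.push(...commits.filter(c => c.message.includes(word)))
    // `break` finishes the generator, so no further pages are requested
    if (found.length >= limit) break
  }
  return found.slice(0, limit)
}

const stats = { requestedPages: 0 }
findCommits('refactor', 2, stats).then(found => {
  console.log(found.map(c => c.message))
  console.log('pages requested:', stats.requestedPages) // 2 of the 5 pages
})
```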