Coletiv Studio • Originally published at coletiv.com

Retrieving results from cursor-paginated endpoints in Elixir

If you’re familiar with using external API calls for your coding needs, there’s probably been a time when you wanted to retrieve all the results from a certain endpoint, but you only managed to receive part of them 😧


Paginating the results of API endpoints has become common practice, since without it the response could be pretty big, increasing both the load time and the size of the payload, and rarely are all the results needed right away. Most of the time, an application just wants to retrieve a small part of them to show to the users.

But what about when you REALLY need the whole list of results, but the request is paginated and has a maximum limit of items per page? And what if the pagination is cursor-based?

This has been a recurring issue — retrieving all the results from limited, cursor-paginated endpoints — and we’ve come across it on many projects. And since cursor-based pagination is becoming so frequent in APIs, we thought we’d share our approach to calling these endpoints in Elixir, our favorite coding language 😃

1. Components of a Paginated API call

Just as a quick reminder, if you are not familiar with this — a paginated REST API call will be a GET request, usually with the following parameters:


Limit — How many results the page will return. Most of the time, it has an upper limit. This article focuses on endpoints that have this upper limit, but if you come across a request that does not, feel free to pull out your 🔨 and make the limit as high as you need!


Page/Cursor — This parameter states the offset of the results. For page-based pagination, it represents the page number. For cursor-based pagination, it is an ID. Cursor-based pagination usually works by ordering the results and retrieving the ones after or before a given item, using its ID as the parameter.

2. Iterating through the pages

Let’s assume you want a list of all the results, starting from the first result of the first page to the last result from the last page.


With page-based pagination… you normally start on page 1 and keep incrementing the page number until you reach an empty page, meaning you’ve gotten all the results. Example link:

www.somewebsite.com/api/somerequest/someitems?limit=50&page=1
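As a quick sketch of that loop in Elixir (fetch_page/1 is a hypothetical helper that returns the list of items on a given page):

# Keep incrementing the page number until a page comes back empty
all_results =
  1
  |> Stream.iterate(&(&1 + 1))
  |> Stream.map(&fetch_page/1)
  |> Stream.take_while(&(&1 != []))
  |> Enum.to_list()
  |> List.flatten()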

However, with cursor-based pagination… you do not have a page number to increment. Each response gives you the link or the parameters for the subsequent requests you need to make in order to list additional items. An example of a request link would be:

www.somewebsite.com/api/somerequest/someitems?limit=50&page_after=2AFA8A72R

Since for cursor-based pagination you need the response of a request to make the next requests, you might be wondering: what’s a good way to do that cleanly? 🤔


We have two easy-to-understand solutions that iterate through all of the pages to get you a complete list of queried items. Whichever you want to use is up to you.

2.A Streams

We want to be able to iterate through all the result pages, and we know each page gives us the parameters for the next.


We solved this by using the Stream.unfold function, which is really useful and efficient in iterations where you do not have an initial Enumerable to iterate upon (otherwise, we could have used Enum.reduce_while). You should probably read the documentation on this function before proceeding.


In each iteration, the unfold function saves the result into the stream and passes an accumulator on to the next iteration, which lets us make one API call per iteration and use the accumulator to pass the URL for the next call.
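As a minimal illustration, unrelated to any API: Stream.unfold emits the first element of each returned tuple and threads the second element into the next iteration, stopping when the function returns nil.

Stream.unfold(5, fn
  0 -> nil            # returning nil ends the stream
  n -> {n, n - 1}     # emit n, pass n - 1 as the next accumulator
end)
|> Enum.to_list()
# => [5, 4, 3, 2, 1]

Applied to our pagination problem: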

defp get_all_results() do
  initial_url = "www.somewebsite.com/api/somerequest/someitems?limit=50"
  get_results(initial_url)
end

def get_results(initial_url) do
  results =
    Stream.unfold(initial_url, fn
      nil ->
        nil

      url ->
        case some_api_call(url) do
          {:ok, response} ->
            next_url = extract_next_url_from_response(response)
            results_from_page = response["results"]

            {results_from_page, next_url}

          {:error, _reason} = error ->
            # Emit the error into the stream and stop on the next iteration
            {error, nil}
        end
    end)
    |> Enum.to_list()
    |> List.flatten()

  # Surface the first error if there is one; otherwise wrap the full list
  Enum.find(results, {:ok, results}, fn
    {:error, _error} -> true
    _ -> false
  end)
end

At the end, when there are no more URLs to call, the Stream.unfold ends, and we transform the returned stream into a list, flattening all its partial results.


If any of the results is an error, it can be easily found and you can then decide what to do with the remaining results, which is the part I like the most about this solution.
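By the way, some_api_call/1 and extract_next_url_from_response/1 are left out above, since they depend on your HTTP client and on the API you are calling. A minimal sketch using HTTPoison and Jason, assuming (hypothetically) that the API returns the next page’s URL in a "next" field of the JSON body (section 3 below covers the case where it comes in a header instead):

defp some_api_call(url) do
  with {:ok, %HTTPoison.Response{status_code: 200, body: body}} <- HTTPoison.get(url),
       {:ok, decoded} <- Jason.decode(body) do
    {:ok, decoded}
  else
    {:ok, %HTTPoison.Response{status_code: status}} -> {:error, {:unexpected_status, status}}
    {:error, reason} -> {:error, reason}
  end
end

# Hypothetical: the next URL lives in a "next" field; nil ends the iteration
defp extract_next_url_from_response(response), do: response["next"]

The same helpers are assumed in the recursive version below.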

2.B Recursive

It’s a functional approach and pretty easy to understand; you’ve probably dealt with recursive calls like these in the past. However, not everyone likes them, which is why we also wrote the alternative using Streams.

defp get_all_results() do
  initial_url = "www.somewebsite.com/api/somerequest/someitems?limit=50"
  get_results(initial_url)
end

# Multiple clauses with a default argument need a single function head
def get_results(url, accumulated_results \\ [])

def get_results(nil, accumulated_results) do
  # No next URL, return the accumulated results so far
  {:ok, accumulated_results}
end

def get_results(url, accumulated_results) do
  case some_api_call(url) do
    {:ok, response} ->
      next_url = extract_next_url_from_response(response)
      results_from_page = response["results"]
      get_results(next_url, accumulated_results ++ results_from_page)

    {:error, reason} ->
      # *** You decide how to handle it here and what to return ***
      {:error, reason}
  end
end

In this solution, we simply pass the initial URL to the function, and then let it retrieve both the results and next URL from the API call.


From here on, it’s as simple as the function calling itself again with the new URL and accumulating the results by appending them to the accumulator argument.


The function terminates when there are no further URLs to make calls from. If an error occurs along the way, you can decide whether to return the error, include the partial list of results, or do whatever most benefits you.
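For illustration, a caller inside the same module could then pattern match on the outcome:

case get_all_results() do
  {:ok, results} ->
    IO.puts("Fetched #{length(results)} items")

  {:error, reason} ->
    IO.inspect(reason, label: "pagination failed")
end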

Either of these two options works really well and is pretty easy to implement, so feel free to choose the one you’re most comfortable with.

3. Cursor-based pagination — how to retrieve the next or previous page

Most of the time, the response for a page will have the links to the next or previous page in its response body. If so, you’re in luck 🍀, as it should be relatively easy to retrieve! If it is JSON, just parse it and access the correct field.


However, sometimes, these links are embedded into the header and you might have to parse them 😒. Here’s an example I encountered:

Header key: Link
Header value: <https://somepage.com/api/somerequest.json?page_cursor=FGHIJ&limit=50>; rel="next", <https://somepage.com/api/somerequest.json?page_info=ABCDE&limit=50>; rel="previous"

Extracting these links is a two-step operation: extracting the header value, and parsing it.

3.A Extracting the header value

I usually use HTTPoison, in which the response headers are a list of tuples. The first element of each tuple is the header name and the second is the value. Knowing this, finding the Link header field is easy:

{header_name, header_value} = List.keyfind(headers, "Link", 0, {nil, nil})

3.B Parsing the string to obtain the links

The fastest and most accurate way that I found to do this is with a regular expression. More specifically, in Elixir, using a named capture. Imagining that you only want the link for the next page, you could use this regex:

next_link_regex = ~r/<(?<next>[^>]*)>; rel="next"/

Now just run the capture against the header value:

Regex.named_captures(next_link_regex, header_value)
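Regex.named_captures/2 returns a map with the named groups when the regex matches, and nil otherwise, so you can extract the link safely:

next_url =
  case Regex.named_captures(next_link_regex, header_value) do
    %{"next" => url} -> url
    nil -> nil          # no "next" link, probably the last page
  end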

In any case, regardless of where these links are placed, do not forget to always verify that the link for the next page exists and can be correctly parsed. If the header does not exist or the body does not contain the next page, it may be because you are already on the last page.


You should always check the API reference for whichever API you are accessing, since it usually has better guidelines on how to follow its pagination rules.

Final notes 📓

Cursor-based pagination might look annoying, but it is a better and more accurate alternative to normal page-based pagination, so it is probably here to stay. As such, if you use external APIs in Elixir, you will probably come into contact with it a lot.


Also, feel free to use any of the code shared here and adjust it to your own needs. We hope these simple and intuitive solutions are of help in dealing with this.

Thank you for reading!

Thank you so much for reading, it means a lot to us! Also don’t forget to follow Coletiv on Twitter and LinkedIn as we keep posting more and more interesting articles on multiple technologies.

In case you don’t know, Coletiv is a software development studio from Porto specialized in Elixir, Web, and App (iOS & Android) development. But we do all kinds of stuff. We take care of UX/UI design, software development, and even security for you.

So, let’s craft something together?

Top comments (2)

Alexey Novoselov

get_results(next_url, accumulated_results ++ results_from_page) would be too expensive for large datasets. [results_from_page | accumulated_results] will not iterate over all the results at each step.

Pedro Costa • Edited

Thanks for the suggestion, Alexey!

Yes, this should be an efficiency improvement, although, if the ordering of the results matters, they may need some extra handling.

In case this were needed, I'd either order all the results at the end if there's a key for it, or reverse the results_from_page each time they are fetched, so that we have an 'end to beginning' ordered list.
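A sketch of one such variant, prepending each page and reversing the accumulated list of pages once at the end to restore the original order:

def get_results(url, accumulated_pages \\ [])

def get_results(nil, accumulated_pages) do
  # Pages were prepended, so reverse once and flatten to restore the order
  {:ok, accumulated_pages |> Enum.reverse() |> List.flatten()}
end

def get_results(url, accumulated_pages) do
  case some_api_call(url) do
    {:ok, response} ->
      next_url = extract_next_url_from_response(response)
      # O(1) prepend instead of an O(n) append on every page
      get_results(next_url, [response["results"] | accumulated_pages])

    {:error, reason} ->
      {:error, reason}
  end
end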