When working with APIs that handle large datasets, it's crucial to manage data flow efficiently and to address challenges such as pagination, rate limits, and memory usage. In this article, we'll walk through how to consume APIs using JavaScript's native `fetch` function. We'll cover important topics such as:
- Handling huge amounts of data: retrieving large datasets incrementally to avoid overwhelming your system.
- Pagination: most APIs, including Storyblok Content Delivery API, return data in pages. We'll explore how to manage pagination for efficient data retrieval.
- Rate Limits: APIs often impose rate limits to prevent abuse. We'll see how to detect and handle these limits.
- Retry-After Mechanism: if the API responds with a 429 status code (Too Many Requests), we’ll implement the "Retry-After" mechanism, which indicates how long to wait before retrying to ensure smooth data fetching.
- Concurrent Requests: fetching multiple pages in parallel can speed up the process. We'll use JavaScript's `Promise.all()` to send concurrent requests and boost performance.
- Avoiding Memory Leaks: handling large datasets requires careful memory management. We'll process data in chunks and ensure memory-efficient operations, thanks to generators.
We will explore these techniques using the Storyblok Content Delivery API and explain how to handle all these factors in JavaScript with `fetch`. Let's dive into the code.
Things to keep in mind when using the Storyblok Content Delivery API
Before diving into the code, here are a few key features of the Storyblok API to consider:
- CV parameter: the `cv` (Content Version) parameter retrieves cached content. The `cv` value is returned in the first request and should be passed in subsequent requests to ensure the same cached version of the content is fetched.
- Pagination with `page` and `per_page`: use the `page` and `per_page` parameters to control the number of items returned in each request and to iterate through the result pages.
- Total header: the first response's `total` header indicates the total number of items available. This is essential for calculating how many pages of data need to be fetched.
- Handling 429 (Rate Limit): Storyblok enforces rate limits; when you hit them, the API returns a 429 status. Use the `Retry-After` header (or a default value) to know how long to wait before retrying the request.
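Putting the first three points together, the very first request can capture both the `total` header and the `cv` value that drive all subsequent requests. Here is a minimal sketch (the base URL and token are placeholders, and `computeTotalPages`/`firstRequest` are helper names made up for illustration):

```javascript
// Sketch: the first request captures `total` (HTTP header) and `cv`
// (JSON body attribute), which every following page request reuses.
const computeTotalPages = (total, perPage) => Math.ceil(total / perPage);

async function firstRequest(baseUrl, perPage = 25) {
  const res = await fetch(`${baseUrl}&page=1&per_page=${perPage}`);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const body = await res.json();
  return {
    // Pass `cv` on every subsequent request for a consistent cached version.
    cv: body.cv,
    totalPages: computeTotalPages(
      parseInt(res.headers.get("total"), 10) || 0,
      perPage,
    ),
    stories: body.stories || [],
  };
}
```

With 120 total items and 25 items per page, for example, `computeTotalPages` yields 5 pages.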
JavaScript example code using `fetch()` for handling large datasets
Here’s how I implemented these concepts using the native fetch function in JavaScript.
Consider that:
- This snippet creates a new file named `stories.json` as an example. If the file already exists, it will be overwritten, so if you already have a file with that name in the working directory, change the name in the code snippet.
- Because the requests are executed in parallel, the order of the stories is not guaranteed. For example, if the response for the third page arrives before the response for the second page, the generator will deliver the stories of the third page before the stories of the second page.
- I tested the snippet with Bun :)
import { writeFile, appendFile } from "fs/promises";

// Read access token from Environment
const STORYBLOK_ACCESS_TOKEN = process.env.STORYBLOK_ACCESS_TOKEN;
// Read content version from Environment
const STORYBLOK_VERSION = process.env.STORYBLOK_VERSION;

/**
 * Fetch a single page of data from the API,
 * with retry logic for rate limits (HTTP 429).
 */
async function fetchPage(url, page, perPage, cv) {
  let retryCount = 0;
  // Max retry attempts
  const maxRetries = 5;
  while (retryCount <= maxRetries) {
    try {
      const response = await fetch(
        `${url}&page=${page}&per_page=${perPage}&cv=${cv}`,
      );
      // Handle 429 Too Many Requests (Rate Limit)
      if (response.status === 429) {
        // Some APIs provide the Retry-After response header,
        // which indicates how long to wait before retrying.
        // Storyblok uses a fixed window counter (1-second window).
        const retryAfter = parseInt(response.headers.get("Retry-After"), 10) || 1;
        console.log(
          `Rate limited on page ${page}. Retrying after ${retryAfter} seconds...`,
        );
        retryCount++;
        // In the case of a rate limit, waiting 1 second is usually enough.
        // If not, we wait 2 seconds on the second attempt, and so on,
        // to progressively slow down the retry requests.
        // setTimeout accepts milliseconds, so we multiply by 1000.
        await new Promise((resolve) =>
          setTimeout(resolve, retryAfter * 1000 * retryCount),
        );
        continue;
      }
      if (!response.ok) {
        throw new Error(
          `Failed to fetch page ${page}: HTTP ${response.status}`,
        );
      }
      const data = await response.json();
      // Return the stories data of the current page
      return data.stories || [];
    } catch (error) {
      console.error(`Error fetching page ${page}: ${error.message}`);
      return []; // Return an empty array so one failed page doesn't break the flow
    }
  }
  console.error(`Failed to fetch page ${page} after ${maxRetries} attempts`);
  return []; // If we hit the max retry limit, return an empty array
}

/**
 * Fetch all data in parallel, processing pages in batches,
 * as a generator (the reason why we use the `*`).
 */
async function* fetchAllDataInParallel(
  url,
  perPage = 25,
  numOfParallelRequests = 5,
) {
  let currentPage = 1;
  let totalPages = null;
  // Fetch the first page to get:
  // - the total entries (the `total` HTTP header)
  // - the CV for caching (the `cv` attribute in the JSON response payload)
  const firstResponse = await fetch(
    `${url}&page=${currentPage}&per_page=${perPage}`,
  );
  if (!firstResponse.ok) {
    throw new Error(`Failed to fetch data: HTTP ${firstResponse.status}`);
  }
  console.timeLog("API", "After first response");
  const firstData = await firstResponse.json();
  const total = parseInt(firstResponse.headers.get("total"), 10) || 0;
  totalPages = Math.ceil(total / perPage);
  // Yield the stories from the first page
  for (const story of firstData.stories) {
    yield story;
  }
  const cv = firstData.cv;
  console.log(`Total pages: ${totalPages}`);
  console.log(`CV parameter for caching: ${cv}`);
  currentPage++; // Start from the second page now
  while (currentPage <= totalPages) {
    // Get the list of pages to fetch in the current batch
    const pagesToFetch = [];
    for (
      let i = 0;
      i < numOfParallelRequests && currentPage <= totalPages;
      i++
    ) {
      pagesToFetch.push(currentPage);
      currentPage++;
    }
    // Fetch the pages in parallel
    const batchRequests = pagesToFetch.map((page) =>
      fetchPage(url, page, perPage, cv),
    );
    // Wait for all requests in the batch to complete
    const batchResults = await Promise.all(batchRequests);
    console.timeLog("API", `Got ${batchResults.length} responses`);
    // Yield the stories from each batch of requests
    for (const result of batchResults) {
      for (const story of result) {
        yield story;
      }
    }
    console.log(`Fetched pages: ${pagesToFetch.join(", ")}`);
  }
}

console.time("API");
const apiUrl = `https://api.storyblok.com/v2/cdn/stories?token=${STORYBLOK_ACCESS_TOKEN}&version=${STORYBLOK_VERSION}`;
const stories = fetchAllDataInParallel(apiUrl, 25, 7);
// Create an empty file (or overwrite if it exists) before appending
await writeFile('stories.json', '[', 'utf8'); // Start the JSON array
let i = 0;
for await (const story of stories) {
  i++;
  console.log(story.name);
  // If it's not the first story, add a comma to separate JSON objects
  if (i > 1) {
    await appendFile('stories.json', ',', 'utf8');
  }
  // Append the current story to the file
  await appendFile('stories.json', JSON.stringify(story, null, 2), 'utf8');
}
// Close the JSON array in the file
await appendFile('stories.json', ']', 'utf8'); // End the JSON array
console.log(`Total Stories: ${i}`);
console.log(`Total Stories: ${i}`);
Key Steps Explained
Here’s a breakdown of the crucial steps in the code that ensure efficient and reliable API consumption using the Storyblok Content Delivery API:
1) Fetching pages with a retry mechanism (`fetchPage`)
This function fetches a single page of data from the API. It includes logic for retrying when the API responds with a 429 (Too Many Requests) status, which signals that the rate limit has been exceeded.
The `retryAfter` value specifies how long to wait before retrying. I use `setTimeout` to pause before making the subsequent request, and retries are limited to a maximum of 5 attempts.
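The retry logic can also be sketched in isolation as a small reusable helper. This is a sketch, not the article's exact code: `doFetch` and `sleep` are injectable parameters I introduced so the loop can be exercised without a real network.

```javascript
// Sketch of the retry loop: linear backoff of Retry-After * attempt number.
// `doFetch` performs one request; `sleep` is injectable for testing.
async function fetchWithRetry(
  doFetch,
  maxRetries = 5,
  sleep = (ms) => new Promise((r) => setTimeout(r, ms)),
) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const response = await doFetch();
    if (response.status === 429) {
      // Retry-After is expressed in seconds; setTimeout wants milliseconds.
      const retryAfter = parseInt(response.headers.get("Retry-After"), 10) || 1;
      await sleep(retryAfter * 1000 * attempt); // progressively slower retries
      continue;
    }
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  }
  throw new Error(`Still rate limited after ${maxRetries} attempts`);
}
```

Injecting the fetch function also makes it trivial to simulate a 429 followed by a successful response.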
2) Initial page request and the CV parameter
The first API request is crucial because it retrieves the `total` header (which indicates the total number of stories) and the `cv` parameter (used for caching).
You can use the `total` header to calculate the total number of pages required, and the `cv` parameter ensures the same cached version of the content is used.
3) Handling pagination
Pagination is managed using the `page` and `per_page` query string parameters. The code requests 25 stories per page (you can adjust this), and the `total` header helps calculate how many pages need to be fetched.
The code fetches stories in batches of up to 7 (you can adjust this) parallel requests at a time to improve performance without overwhelming the API.
4) Concurrent requests with `Promise.all()`
To speed up the process, multiple pages are fetched in parallel using JavaScript's `Promise.all()`. This method sends several requests simultaneously and waits for all of them to complete.
After each batch of parallel requests is completed, the results are processed to yield the stories. This avoids loading all the data into memory at once, reducing memory consumption.
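Stripped of the API details, the batching pattern looks like this (a sketch; `fetchOne` stands in for any per-page request, and the helper names are mine):

```javascript
// Sketch: split page numbers into batches; requests within a batch run
// concurrently via Promise.all, while batches run one after another.
function makeBatches(firstPage, lastPage, batchSize) {
  const batches = [];
  for (let p = firstPage; p <= lastPage; p += batchSize) {
    const batch = [];
    for (let q = p; q < p + batchSize && q <= lastPage; q++) batch.push(q);
    batches.push(batch);
  }
  return batches;
}

async function fetchInBatches(fetchOne, firstPage, lastPage, batchSize) {
  const results = [];
  for (const batch of makeBatches(firstPage, lastPage, batchSize)) {
    // Promise.all resolves with results in the same order as the input pages.
    results.push(...(await Promise.all(batch.map(fetchOne))));
  }
  return results;
}
```

Running batches sequentially (rather than firing every page at once) is what keeps the request rate bounded.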
5) Memory management with asynchronous iteration (`for await...of`)
Instead of collecting all data into an array, we use JavaScript generators (`function*` and `for await...of`) to process each story as it is fetched. This prevents memory overload when handling large datasets.
By yielding the stories one by one, the code remains efficient and avoids memory leaks.
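Reduced to its core, the idea looks like this (a sketch with an injectable page source instead of a real API, so it stays self-contained):

```javascript
// Sketch: an async generator yields one item at a time, so the consumer
// only ever holds a single page of results in memory.
async function* streamItems(getPage, totalPages) {
  for (let page = 1; page <= totalPages; page++) {
    const items = await getPage(page); // one page in memory at a time
    for (const item of items) {
      yield item;
    }
  }
}
```

The consumer iterates with `for await (const item of streamItems(...))` and can write each item out (to a file, a database, etc.) before the next one arrives.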
6) Rate limit handling:
If the API responds with a `429` status code (rate limited), the script uses the `retryAfter` value and pauses for the specified time before retrying the request. This ensures compliance with API rate limits and avoids sending too many requests too quickly.
Conclusion
In this article, we covered the key considerations when consuming APIs in JavaScript using the native `fetch` function. I tried to handle:
- Large datasets: fetching large datasets using pagination.
- Pagination: managing pagination with the `page` and `per_page` parameters.
- Rate limits and retry mechanism: handling rate limits and retrying requests after the appropriate delay.
- Concurrent requests: fetching pages in parallel using JavaScript's `Promise.all()` to speed up data retrieval.
- Memory management: using JavaScript generators (`function*` and `for await...of`) to process data without consuming excessive memory.
By applying these techniques, you can handle API consumption in a scalable, efficient, and memory-safe way.
Feel free to drop your comments/feedback.
Top comments (4)
This is an ongoing problem I have with one client. The solution was so creative and successful, I've got to share.
So, this client has a 20mb dataset that a crew of 30 people use all day to manage orders and schedules and equipment. It's a requirement that they be able to filter, pivot, sort, etc.... typing in cells like a gigantic google-sheet, the load was insane.
So, we compile all of the active data into a fat json file, upload it to Google storage at 5am. When logging in, the browser asks for a signed link to the file and downloads it directly into IndexedDB. Then, it calls an api requesting changes since the latest updated record in a second API call. While doing that, it populates AG-Grid (an amazing tool). Then, it has the entire database of active projects locally.
But it gets better... We have a websocket that announces when a record has been updated by someone. So, the browser sees the update and says, "OK, I'd better ask for the updates since I last updated". So the request is never more than one or two records, but... everyone has a bunch of tabs open, so all the browser windows were calling for every single update... BUT you can actually use the BroadcastChannel API to do something like a websocket between Chrome or Edge tabs from the same domain. IndexedDB is shared between all tabs of the same domain. So we have just one tab do the requests and update IndexedDB, then broadcast to the others that they should sync up with the IndexedDB data instead of calling the server.
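The tab-coordination idea described here can be sketched with the standard `BroadcastChannel` API. This is a minimal illustration, not the commenter's actual code; the channel name and message shape are made up:

```javascript
// Sketch: one "leader" tab fetches updates and writes them to IndexedDB,
// then tells the other same-origin tabs to re-read from IndexedDB
// instead of each tab hitting the server.
function createSyncChannel(name, onRemoteUpdate) {
  const channel = new BroadcastChannel(name);
  channel.onmessage = (event) => {
    if (event.data?.type === "updated") onRemoteUpdate(event.data.lastUpdatedAt);
  };
  return {
    // Called by the tab that fetched new records and updated IndexedDB.
    announceUpdate(lastUpdatedAt) {
      channel.postMessage({ type: "updated", lastUpdatedAt });
    },
    close: () => channel.close(),
  };
}
```

A sender's own channel does not receive its messages, so the leader tab announces and every other tab reacts.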
So we
(And, that last one is still in beta, but the rest has been running like that for a couple years. It's been a hell of a ride getting this client happy!)
Thanks for sharing 🙏 I'm working on an open source project and I'll try to include it there.
Hi, thank you for your message. My suggestion is to handle the errors better. For example, here I return empty content in case of an error (an error other than 429, for example a 500 or something else). In a real case you should probably raise an exception or something like that. It depends on how you want to handle errors/exceptions.
And another thing, already mentioned in the article: because of the concurrent requests, this logic doesn't guarantee the ordering of the contents (you should add a simple logic to track the positions).
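One simple way to track positions, sketched under the assumption that results are consumed in arrival order (note that `Promise.all` itself already returns results in input order, so this only matters if you process responses as they settle): tag each result with its page number and re-sort before flattening.

```javascript
// Sketch: collect results in arrival order, tagged with their page number,
// then restore the page order with a sort before flattening.
async function collectInArrivalOrder(fetchOne, pages) {
  const settled = [];
  await Promise.all(
    pages.map(async (page) => {
      settled.push({ page, items: await fetchOne(page) }); // arrival order
    }),
  );
  return settled.sort((a, b) => a.page - b.page).flatMap((e) => e.items);
}
```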