DEV Community

Custodia-Admin

Originally published at pagebolt.dev

How to generate website thumbnails at scale (for directories, CMSs, and link previews)


Link preview services, directory sites, bookmark managers, and CMS tools all need website thumbnails. The naive implementation — one Puppeteer instance, one screenshot at a time — doesn't survive contact with a queue of 500 URLs.

Here's the pattern that does.

Single thumbnail

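import fs from 'fs';

// Top-level await requires an ES module (.mjs, or "type": "module" in package.json)
const response = await fetch('https://api.pagebolt.dev/v1/screenshot', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.PAGEBOLT_API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com',
    viewport: { width: 1280, height: 800 },
    clip: { x: 0, y: 0, width: 1280, height: 800 }, // capture only the top 1280x800 region
    blockBanners: true,
    blockAds: true
  })
});

// Without this check, an error response would be written to disk as a broken .png
if (!response.ok) throw new Error(`Screenshot failed: HTTP ${response.status}`);

fs.writeFileSync('thumbnail.png', Buffer.from(await response.arrayBuffer()));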

blockBanners and blockAds are worth enabling for directory thumbnails — they produce cleaner captures that represent the actual site rather than a cookie wall.

Batch thumbnails in parallel

For a list of URLs, send requests concurrently in fixed-size batches so you stay within your plan's rate limits:

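import fs from 'fs/promises';

const urls = [
  'https://linear.app',
  'https://vercel.com',
  'https://railway.app',
  // ... hundreds more
];

async function thumbnail(url) {
  const res = await fetch('https://api.pagebolt.dev/v1/screenshot', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.PAGEBOLT_API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, blockBanners: true, blockAds: true })
  });
  // Fail this URL instead of saving an error body as a .png
  if (!res.ok) throw new Error(`HTTP ${res.status}`);

  // Include the path so two pages on the same host don't overwrite each other
  const { hostname, pathname } = new URL(url);
  const slug = (hostname + pathname).replace(/[^a-z0-9]+/gi, '-').replace(/^-|-$/g, '');
  await fs.writeFile(`thumbnails/${slug}.png`, Buffer.from(await res.arrayBuffer()));
  console.log(`✓ ${url}`);
}

// Process in batches of 10 concurrent requests; each batch waits for its slowest request
async function batch(urls, concurrency = 10) {
  for (let i = 0; i < urls.length; i += concurrency) {
    const chunk = urls.slice(i, i + concurrency);
    // allSettled: one failed URL shouldn't abort the whole run
    const results = await Promise.allSettled(chunk.map(thumbnail));
    results.forEach((r, j) => {
      if (r.status === 'rejected') console.warn(`⚠ ${chunk[j]}: ${r.reason.message}`);
    });
  }
}

await fs.mkdir('thumbnails', { recursive: true });
await batch(urls);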

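With 10 requests in flight, a 500-URL list finishes in about 50 rounds instead of 500 sequential captures: roughly a tenfold cut in wall-clock time compared with taking one screenshot at a time, assuming per-request latency holds steady at that concurrency.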
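The batch() loop above waits for the slowest request in each group before starting the next, so one slow site idles the other nine slots. A sliding-window pool keeps the limit saturated instead. Below is a minimal sketch; mapWithConcurrency is a hypothetical helper, not part of PageBolt or any library, and it returns results shaped like Promise.allSettled so failures don't stop the run.

```javascript
// Hypothetical helper (not part of PageBolt): run `worker` over `items` with at most
// `limit` calls in flight. Results keep input order, shaped like Promise.allSettled.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each runner claims one index at a time until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two runners share an index
      try {
        results[i] = { status: 'fulfilled', value: await worker(items[i], i) };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }

  // Start min(limit, items.length) runners (at least one) and wait for all to drain.
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results;
}
```

In the script above, `await mapWithConcurrency(urls, 10, thumbnail)` could stand in for `batch(urls)`: a new request starts as soon as any of the 10 finishes.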

Mobile thumbnails

For directories that want mobile previews alongside desktop:

// In thumbnail(), swap the request body for a device preset
body: JSON.stringify({
  url,
  viewportDevice: 'iphone_14_pro',
  blockBanners: true
})

Same call, different preset. Run both in parallel per URL to generate a desktop+mobile pair.
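Running both presets for one URL could look like the sketch below. captureDesktopAndMobile is a hypothetical helper, and `capture` stands in for whatever function sends a request body to the screenshot endpoint and resolves to the image bytes; injecting it keeps the sketch network-free. The desktop body copies the single-thumbnail example and the mobile body uses the iphone_14_pro preset from the snippet above.

```javascript
// Hypothetical helper: capture desktop and mobile variants of one URL concurrently.
// `capture(body)` is injected (assumed to POST `body` to the screenshot endpoint
// and resolve to the image bytes).
async function captureDesktopAndMobile(url, capture) {
  const variants = {
    // Desktop body mirrors the single-thumbnail example
    desktop: { url, viewport: { width: 1280, height: 800 }, blockBanners: true, blockAds: true },
    // Mobile body uses the device preset
    mobile: { url, viewportDevice: 'iphone_14_pro', blockBanners: true }
  };
  const entries = Object.entries(variants);
  // Both requests go out at once; Promise.all rejects if either capture fails
  const images = await Promise.all(entries.map(([, body]) => capture(body)));
  return Object.fromEntries(entries.map(([name], i) => [name, images[i]]));
}
```

Because both requests start together, a pair costs about one request's latency rather than two, at the price of two API calls per URL.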

Handling failures gracefully

Some URLs will time out, return 4xx, or trigger bot detection. Wrap each call:

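async function thumbnail(url) {
  try {
    const res = await fetch('https://api.pagebolt.dev/v1/screenshot', {
      method: 'POST',
      headers: { 'x-api-key': process.env.PAGEBOLT_API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ url, blockBanners: true, stealth: true }),
      // Client-side cap so one hung request can't stall a batch (60 s is an arbitrary choice)
      signal: AbortSignal.timeout(60_000)
    });

    // Non-2xx: log and skip rather than writing an error body to disk
    if (!res.ok) {
      console.warn(`⚠ ${url}: HTTP ${res.status}`);
      return;
    }

    // Include the path so two pages on the same host don't overwrite each other
    const { hostname, pathname } = new URL(url);
    const slug = (hostname + pathname).replace(/[^a-z0-9]+/gi, '-').replace(/^-|-$/g, '');
    await fs.writeFile(`thumbnails/${slug}.png`, Buffer.from(await res.arrayBuffer()));
  } catch (err) {
    // Network errors, timeouts, and invalid URLs land here
    console.warn(`⚠ ${url}: ${err.message}`);
  }
}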

stealth: true masks the headless browser fingerprint — useful when crawling a diverse list that includes sites with aggressive bot detection.
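Logging and skipping works for a one-off crawl, but transient failures (rate limiting, upstream 5xx, timeouts) often succeed on a second try. A minimal retry-with-exponential-backoff sketch follows. withRetry and HttpError are hypothetical names, and treating 429 and 5xx as retryable while giving up on other 4xx is an assumption, not documented PageBolt behavior.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff and jitter.
// Assumption: 429 and 5xx are transient, other 4xx are permanent.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class HttpError extends Error {
  constructor(status) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

function isRetryable(err) {
  // Errors without an HTTP status (network failures, timeouts) are treated as transient
  return !(err instanceof HttpError) || err.status === 429 || err.status >= 500;
}

async function withRetry(operation, { retries = 3, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      // Delay doubles each attempt (500, 1000, 2000 ms, ...) plus up to 50% random jitter
      const delay = baseMs * 2 ** attempt;
      await sleep(delay + Math.random() * delay * 0.5);
    }
  }
}
```

Inside thumbnail(), the fetch and status check would move into the operation, throwing `new HttpError(res.status)` when `!res.ok` so withRetry can decide whether another attempt is worthwhile. The jitter keeps a batch of failed requests from retrying in lockstep.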

On-demand via API route

For a link preview service that generates thumbnails on request and caches the results:

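import express from 'express';
import { Readable } from 'node:stream';

const app = express();

// GET /thumbnail?url=https://example.com
app.get('/thumbnail', async (req, res) => {
  const { url } = req.query;
  // Only forward absolute http(s) URLs; anything else would waste an API call
  if (typeof url !== 'string' || !/^https?:\/\//i.test(url)) {
    return res.status(400).send('Expected ?url=http(s)://...');
  }

  const upstream = await fetch('https://api.pagebolt.dev/v1/screenshot', {
    method: 'POST',
    headers: { 'x-api-key': process.env.PAGEBOLT_API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, blockBanners: true })
  });
  // Never let an error body be cached for a day under an image/png header
  if (!upstream.ok) return res.status(502).send(`Screenshot failed: HTTP ${upstream.status}`);

  res.setHeader('Content-Type', 'image/png');
  res.setHeader('Cache-Control', 'public, max-age=86400');
  // fetch() returns a web ReadableStream, which has no .pipe(); convert it first
  Readable.fromWeb(upstream.body).pipe(res);
});

app.listen(3000); // port is arbitrary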

Cache at the CDN layer: with Cache-Control set to public, max-age=86400, each unique URL costs roughly one API call per day, and repeat requests within that window are served from the cache.
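Until the CDN holds a copy, a burst of simultaneous requests for the same new URL can each reach the origin and each spend an API call, depending on whether your CDN collapses concurrent requests. One guard is an in-process cache that stores the pending promise, so concurrent callers share a single upstream request. The sketch below assumes a fetchImage(url) function that resolves to the image bytes; createThumbnailCache is a hypothetical name.

```javascript
// Hypothetical helper: an in-process cache that also coalesces concurrent requests,
// so a burst for the same uncached URL triggers only one fetchImage() call.
// Assumption: it sits behind the CDN as an origin-side guard; entries expire after ttlMs.
function createThumbnailCache(fetchImage, { ttlMs = 86_400_000 } = {}) {
  const entries = new Map(); // url -> { promise, expires }

  return function get(url) {
    const now = Date.now();
    const hit = entries.get(url);
    if (hit && hit.expires > now) return hit.promise;

    // Store the promise itself, not the result, so callers arriving mid-flight share it.
    // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
    const promise = Promise.resolve()
      .then(() => fetchImage(url))
      .catch((err) => {
        // Drop failed entries so the next request retries instead of reusing the error
        if (entries.get(url)?.promise === promise) entries.delete(url);
        throw err;
      });
    entries.set(url, { promise, expires: now + ttlMs });
    return promise;
  };
}
```

The Map grows without bound, so a long-running service would want a size cap or LRU eviction. Failures are deliberately not cached, so a transient upstream error doesn't stick for a full day.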


No browser pool. No Chromium binary. No queue management. Just a fetch call per URL, parallelized as aggressively as your API plan allows.

Free tier: 100 requests/month. → pagebolt.dev
