How to Take Screenshots in AWS Lambda (Without Puppeteer Crashing)

#aws #lambda #serverless #node

You're building a serverless app. A user triggers a Lambda to generate a screenshot. Puppeteer launches... and crashes.

Why?

Lambda's constraints:

No native Chromium — You need custom layers. One layer is 50–100MB. Uncompressed, Chromium is 200MB+.
Memory limits — Each Puppeteer instance eats 150–300MB. Launch two browsers in one invocation and a small Lambda runs out of memory.
Timeout hell — Puppeteer cold starts take 5–10 seconds. Lambda's default timeout is 3 seconds, so an untuned function times out before the browser is even ready.
Layer management — Keeping Chromium binaries up-to-date across deployments is painful.
Cost — You pay for every millisecond the browser spends spinning up and running, multiplied by the memory you allocated.

Self-hosted Puppeteer on Lambda is expensive and unreliable.

There's a simpler pattern: replace Puppeteer with an API call.

One HTTP request. Screenshot back in 1–2 seconds. No Chromium. No layers. No crashes.

Here's how to take screenshots in AWS Lambda without managing a browser.

The Problem: Puppeteer in Lambda Is Fragile

A typical Lambda screenshot function looks like this:

// Puppeteer on Lambda: fragile and slow
const puppeteer = require('puppeteer');

exports.handler = async (event) => {
  let browser;
  try {
    browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });

    const page = await browser.newPage();
    await page.goto(event.url, { waitUntil: 'networkidle2' });
    const screenshot = await page.screenshot();

    return {
      statusCode: 200,
      body: JSON.stringify({ message: 'Screenshot taken' })
    };
  } catch (error) {
    console.error(error);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message })
    };
  } finally {
    if (browser) await browser.close();
  }
};

What goes wrong:

Cold start — First invocation takes 10+ seconds (Chromium launch + layer extraction)
Memory spike — Puppeteer + Chromium consumes 200–400MB, limiting concurrency
Timeout — If navigation is slow, the function times out before returning
Layer management — You're responsible for keeping Chromium binaries up-to-date
Crashes — Puppeteer sometimes hangs or crashes on Lambda due to signal handling
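
One way to soften the timeout failure mode, whichever screenshot approach you use, is to race the slow work against a deadline derived from the time the invocation has left. The following is a minimal sketch, not a definitive implementation: `context.getRemainingTimeInMillis()` is part of the Lambda Node.js context object, while the `withDeadline` and `budgetFor` helpers and the 1-second safety margin are assumptions made for illustration.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withDeadline(promise, ms, label = 'operation') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} exceeded ${ms}ms deadline`)),
      ms
    );
  });
  // Whichever settles first wins; always clear the timer afterwards
  // so it cannot keep the event loop alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Leave a safety margin (assumed: 1 second) so the handler can still
// return a clean error response before Lambda kills the invocation.
function budgetFor(context, marginMs = 1000) {
  return Math.max(0, context.getRemainingTimeInMillis() - marginMs);
}
```

Inside a handler this would look like `await withDeadline(page.goto(url), budgetFor(context), 'navigation')`. Note that the race only stops the handler from waiting; it does not cancel the underlying work.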

The Solution: Screenshot API Instead

Replace Puppeteer with a simple HTTP POST:

// AWS Lambda with PageBolt API: simple and reliable
const https = require('https');

exports.handler = async (event) => {
  const { url } = event;

  if (!url) {
    return {
      statusCode: 400,
      body: JSON.stringify({ error: 'url is required' })
    };
  }

  try {
    const result = await takeScreenshot(url);

    return {
      statusCode: 200,
      body: JSON.stringify({
        message: 'Screenshot taken',
        url: result.url
      })
    };
  } catch (error) {
    console.error(error);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message })
    };
  }
};

function takeScreenshot(url) {
  return new Promise((resolve, reject) => {
    const payload = JSON.stringify({
      url: url,
      format: 'png',
      width: 1280,
      height: 720,
      fullPage: false
    });

    const options = {
      hostname: 'api.pagebolt.dev',
      path: '/v1/screenshot',
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.PAGEBOLT_API_KEY}`,
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload)
      }
    };

    const req = https.request(options, (res) => {
      const chunks = [];

      // Collect the response body (the image bytes) as it streams in
      res.on('data', (chunk) => chunks.push(chunk));

      res.on('end', () => {
        if (res.statusCode === 200) {
          // Return the image alongside the page URL instead of discarding it
          resolve({ url: url, statusCode: 200, image: Buffer.concat(chunks) });
        } else {
          reject(new Error(`API error ${res.statusCode}`));
        }
      });
    });

    req.on('error', reject);
    req.write(payload);
    req.end();
  });
}

That's it. No Puppeteer. No Chromium layer. No crashes.
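
The handler above only checks that `url` is present. Since the value is forwarded to a third-party service, it may be worth rejecting anything that is not an absolute http(s) URL first. A minimal sketch, assuming a hypothetical `validateTargetUrl` helper (not part of any library) built on Node's global `URL` class:

```javascript
// Hypothetical helper: accept only absolute http(s) URLs before calling the API.
function validateTargetUrl(input) {
  if (typeof input !== 'string' || input.trim() === '') {
    return { ok: false, error: 'url is required' };
  }
  let parsed;
  try {
    parsed = new URL(input.trim()); // WHATWG URL parser, global in Node.js
  } catch {
    return { ok: false, error: 'url is not a valid absolute URL' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { ok: false, error: 'url must use http or https' };
  }
  // href is the normalized form, e.g. with a trailing slash on bare origins
  return { ok: true, url: parsed.href };
}
```

In the handler, a failed check would map to the same 400 response that the missing-`url` branch already returns.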

Complete Serverless Example: Save Screenshots to S3

Here's a production-ready Lambda that takes a screenshot and uploads it to S3:

// lambda-screenshot-s3.js
const https = require('https');
// The nodejs18.x and later runtimes bundle AWS SDK v3, not the v2 `aws-sdk` package
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({});

exports.handler = async (event) => {
  const { url, bucket, key } = event;

  // Validate input
  if (!url || !bucket || !key) {
    return {
      statusCode: 400,
      body: JSON.stringify({
        error: 'url, bucket, and key are required'
      })
    };
  }

  try {
    // 1. Take screenshot via API
    const screenshotBuffer = await takeScreenshot(url);

    // 2. Upload to S3
    const s3Params = {
      Bucket: bucket,
      Key: key,
      Body: screenshotBuffer,
      ContentType: 'image/png'
    };

    await s3.send(new PutObjectCommand(s3Params));

    // 3. Generate S3 URL
    const s3Url = `https://${bucket}.s3.amazonaws.com/${key}`;

    return {
      statusCode: 200,
      body: JSON.stringify({
        message: 'Screenshot saved to S3',
        url: s3Url,
        s3Key: key
      })
    };
  } catch (error) {
    console.error('Error:', error);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: error.message })
    };
  }
};

function takeScreenshot(url) {
  return new Promise((resolve, reject) => {
    const payload = JSON.stringify({
      url: url,
      format: 'png',
      width: 1280,
      height: 720,
      fullPage: true,
      blockAds: true,
      blockBanners: true
    });

    const options = {
      hostname: 'api.pagebolt.dev',
      path: '/v1/screenshot',
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.PAGEBOLT_API_KEY}`,
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(payload),
        'User-Agent': 'AWS-Lambda/Node.js'
      },
      timeout: 25000 // socket idle timeout, kept below the 30-second Lambda timeout
    };

    const req = https.request(options, (res) => {
      let data = Buffer.alloc(0);

      res.on('data', (chunk) => {
        data = Buffer.concat([data, chunk]);
      });

      res.on('end', () => {
        if (res.statusCode === 200) {
          resolve(data);
        } else {
          reject(new Error(`API error ${res.statusCode}`));
        }
      });
    });

    req.on('error', reject);
    req.on('timeout', () => {
      req.destroy();
      reject(new Error('Request timeout'));
    });

    req.write(payload);
    req.end();
  });
}
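
The handler builds `s3Url` with a plain template string, which works for simple keys but breaks on keys containing spaces or other reserved characters. A small sketch of a safer builder follows. The `s3ObjectUrl` name is hypothetical; the regional virtual-hosted URL form and the `AWS_REGION` environment variable (set by the Lambda runtime) are standard.

```javascript
// Hypothetical helper: build a virtual-hosted-style S3 URL with an encoded key.
function s3ObjectUrl(bucket, key, region = process.env.AWS_REGION || 'us-east-1') {
  // Encode each path segment separately so "/" separators in the key survive.
  const encodedKey = key.split('/').map(encodeURIComponent).join('/');
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}
```

Keep in mind that the resulting URL only loads in a browser if the object is publicly readable; for private buckets a presigned URL is the usual alternative.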

Deployment: Lambda Configuration

Create the Lambda with these settings:

# 1. Create the function (memory and timeout match the settings listed below)
aws lambda create-function \
  --function-name screenshot-to-s3 \
  --runtime nodejs22.x \
  --role arn:aws:iam::ACCOUNT_ID:role/lambda-basic-execution \
  --handler lambda-screenshot-s3.handler \
  --memory-size 256 \
  --timeout 30 \
  --zip-file fileb://function.zip

# 2. Allow the function's execution role to write to the bucket
#    (replace my-bucket with your bucket name)
aws iam put-role-policy \
  --role-name lambda-basic-execution \
  --policy-name AllowScreenshotUpload \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"s3:PutObject","Resource":"arn:aws:s3:::my-bucket/*"}]}'

# 3. Set environment variable for API key
aws lambda update-function-configuration \
  --function-name screenshot-to-s3 \
  --environment "Variables={PAGEBOLT_API_KEY=YOUR_API_KEY}"

Lambda configuration:

Memory: 256MB (plenty for API calls; Puppeteer would need 512MB+)
Timeout: 30 seconds (API calls complete in 1–5 seconds)
Ephemeral storage: Default 512MB

Test It

# Invoke the function
aws lambda invoke \
  --function-name screenshot-to-s3 \
  --payload '{"url":"https://example.com","bucket":"my-bucket","key":"example-screenshot.png"}' \
  --cli-binary-format raw-in-base64-out \
  response.json

cat response.json

Output:

{
  "statusCode": 200,
  "body": "{\"message\":\"Screenshot saved to S3\",\"url\":\"https://my-bucket.s3.amazonaws.com/example-screenshot.png\",\"s3Key\":\"example-screenshot.png\"}"
}
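
Note that `body` in this output is a JSON string nested inside the JSON response, so reading the S3 URL takes two parsing steps. A minimal sketch, using a hypothetical `parseLambdaResponse` helper and the sample output above as input:

```javascript
// The Lambda result is JSON whose `body` field is itself a JSON-encoded string.
function parseLambdaResponse(raw) {
  const outer = JSON.parse(raw);
  const body = typeof outer.body === 'string' ? JSON.parse(outer.body) : outer.body;
  return { statusCode: outer.statusCode, body };
}

// Sample reconstructed from the output shown above.
const sample = JSON.stringify({
  statusCode: 200,
  body: JSON.stringify({
    message: 'Screenshot saved to S3',
    url: 'https://my-bucket.s3.amazonaws.com/example-screenshot.png',
    s3Key: 'example-screenshot.png'
  })
});

const parsed = parseLambdaResponse(sample);
console.log(parsed.body.url);
```

In practice the `raw` argument would come from reading `response.json`, for example with `fs.readFileSync('response.json', 'utf8')`.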

Cost Breakdown: API vs Self-Hosted

Metric	Puppeteer on Lambda	PageBolt API
Cold start	10–15 seconds	<1 second
Memory per invocation	200–400MB	50MB
Parallel screenshots per invocation (512MB Lambda)	1–2	10+
Monthly cost (1,000 screenshots)	$15–25 (compute)	$3–5 (API)
Maintenance	Update layers, manage Chromium	None
Reliability	Crashes, timeouts	No browser process to crash

For 10,000 screenshots/month: API is 5–10x cheaper than Lambda compute + layer management.
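
The figures above depend heavily on duration and memory settings, so it can help to run the compute side of the comparison for your own workload. The sketch below is a rough estimator, not a billing tool. The two rates are assumptions based on published x86 on-demand pricing for us-east-1 and should be checked against the current AWS pricing page; the free tier, data transfer, and the API's own fees are not included.

```javascript
// Assumed rates (x86, us-east-1, on-demand). Verify before relying on them.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_MILLION_REQUESTS = 0.20;

// Lambda compute is billed as duration multiplied by allocated memory (GB-seconds),
// plus a small per-request charge.
function estimateLambdaCost({ invocations, avgDurationSeconds, memoryMb }) {
  const gbSeconds = invocations * avgDurationSeconds * (memoryMb / 1024);
  const compute = gbSeconds * PRICE_PER_GB_SECOND;
  const requests = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS;
  return { gbSeconds, compute, requests, total: compute + requests };
}
```

Calling it once with your Puppeteer numbers (for example 15 seconds at 512MB) and once with your API-call numbers (for example 2 seconds at 256MB) gives the compute side of each column; the API subscription cost is then added to the second.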

Error Handling & Retries

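async function takeScreenshotWithRetry(url, maxRetries = 3) {
  let lastError;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await takeScreenshot(url);
    } catch (error) {
      lastError = error;

      // takeScreenshot rejects with "API error <status>" or "Request timeout".
      // Retry only rate limits (assumed to be HTTP 429), server errors (5xx),
      // and timeouts; anything else is a caller error and fails immediately.
      const retryable =
        /API error (429|5\d\d)/.test(error.message) ||
        error.message === 'Request timeout';
      if (!retryable) throw error;

      // Linear backoff: 1s, 2s, ... (skipped after the final attempt)
      if (attempt < maxRetries - 1) {
        const delay = (attempt + 1) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }

  throw new Error(`Failed after ${maxRetries} attempts: ${lastError.message}`);
}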
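
The retry loop above waits a fixed, linearly growing delay. When many Lambda instances hit a rate limit at once, a common refinement is exponential backoff with random jitter so the retries spread out. A minimal sketch follows; the `backoffDelayMs` helper, the 500ms base, and the 8-second cap are illustrative assumptions, not values recommended by the API.

```javascript
// Hypothetical helper: exponential backoff with "full jitter".
// `attempt` is zero-based; `random` is injectable so the function is testable.
function backoffDelayMs(attempt, { baseMs = 500, capMs = 8000, random = Math.random } = {}) {
  // The ceiling doubles each attempt (500, 1000, 2000, ...) up to the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick a uniformly random delay below the ceiling.
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

In the retry loop, `await sleep(backoffDelayMs(attempt))` would take the place of the fixed delay. Keep the total of all delays well under the Lambda timeout.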

Best Practices for Lambda + Screenshot API

Keep payloads small — Only send required fields (url, format, width, height)
Set realistic timeouts — API calls should complete in <5 seconds; set Lambda timeout to 30 seconds
Use async/await — Cleaner than callbacks, better error handling
Monitor with CloudWatch — Log API response times and errors
Scale horizontally — Lambda automatically handles concurrent requests; no need to manage connection pools
Cache screenshots — Store in S3 or ElastiCache to avoid repeated API calls
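
For the caching practice, one possible approach is to derive the S3 key from the URL and render options, so identical requests always map to the same object. The sketch below assumes a hypothetical `cacheKeyFor` helper and a `screenshots/` prefix; neither comes from the API.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a stable S3 key from the URL and render options,
// so identical requests map to the same object and can be served from cache.
function cacheKeyFor(url, options = {}) {
  // Sort option names so { width, height } and { height, width } hash identically.
  const normalized = Object.keys(options)
    .sort()
    .map((name) => `${name}=${JSON.stringify(options[name])}`)
    .join('&');
  const digest = crypto
    .createHash('sha256')
    .update(`${url}\n${normalized}`)
    .digest('hex');
  return `screenshots/${digest}.png`;
}
```

Before calling the API, the handler could issue an S3 HeadObject request for this key and return the existing object's URL when it is found. An S3 lifecycle rule on the prefix is one way to expire stale screenshots.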