DEV Community

Dennis

Python Website Screenshots: Selenium, Pyppeteer, and API Approaches

I meant to keep this short, but it grew a bit along the way; sorry about that.

You can take a screenshot of any URL in Python using Selenium, Pyppeteer, Playwright, or a screenshot API. Selenium uses the WebDriver protocol and supports Chrome, Firefox, and Safari. Pyppeteer is a Python port of Puppeteer that drives Chromium through Chrome's DevTools Protocol. Playwright offers first-class Python bindings for Chromium, Firefox, and WebKit. Screenshot APIs handle everything server-side, so you get an image back from a single HTTP request with no browser to manage.

Here's each method with full working code, followed by a comparison to help you pick the right one.

Method 1: Selenium

Selenium is the oldest and most established browser automation tool for Python. If you've done any web testing, you've probably used it already.

Install

pip install selenium

You also need a browser driver. For Chrome, the easiest path is webdriver-manager:

pip install webdriver-manager

Basic Screenshot

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument('--headless=new')
options.add_argument('--window-size=1280,720')

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

driver.get('https://example.com')
driver.save_screenshot('screenshot.png')
driver.quit()

webdriver-manager automatically downloads the right ChromeDriver version for your installed Chrome. Without it, you have to manually match ChromeDriver to your Chrome version, which breaks every time Chrome updates.

Headless Mode

The --headless=new flag uses Chrome's new headless mode (available since Chrome 112). The old --headless flag uses a separate headless implementation that renders some pages differently. Always use --headless=new for accurate screenshots.

options = Options()
options.add_argument('--headless=new')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')

The --no-sandbox flag is needed in Docker and some CI environments. On your local machine, you can skip it.

Full-Page Screenshot Workaround

Here's the frustrating part: Selenium's Chrome WebDriver doesn't natively support full-page screenshots. save_screenshot() captures only the viewport. Firefox does support it via save_full_page_screenshot(), but Chrome doesn't.

The workaround is to resize the browser window to match the page height:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

def full_page_screenshot(url, output_path):
    options = Options()
    options.add_argument('--headless=new')
    options.add_argument('--window-size=1280,720')

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)
    driver.get(url)

    # Get the full page dimensions (document.body alone can under-report
    # on some pages, so check documentElement too)
    total_height = driver.execute_script(
        'return Math.max(document.body.scrollHeight,'
        ' document.documentElement.scrollHeight)')
    total_width = driver.execute_script(
        'return Math.max(document.body.scrollWidth,'
        ' document.documentElement.scrollWidth)')

    # Resize the window to fit the entire page
    driver.set_window_size(total_width, total_height)

    # Give the page a moment to reflow after resize
    import time
    time.sleep(0.5)

    driver.save_screenshot(output_path)
    driver.quit()

full_page_screenshot('https://example.com', 'full-page.png')

This works, but it's a hack. The page reflows at the new dimensions, so what you capture might look different from what a user sees at normal viewport size. Fixed headers stack at the top, sticky elements pile up, and responsive breakpoints can shift.

Alternatively, you can use the Chrome DevTools Protocol directly through Selenium:

def full_page_screenshot_cdp(url, output_path):
    options = Options()
    options.add_argument('--headless=new')
    options.add_argument('--window-size=1280,720')

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)
    driver.get(url)

    # Use CDP command for full-page capture
    metrics = driver.execute_cdp_cmd('Page.getLayoutMetrics', {})
    width = metrics['contentSize']['width']
    height = metrics['contentSize']['height']

    driver.execute_cdp_cmd('Emulation.setDeviceMetricsOverride', {
        'width': int(width),
        'height': int(height),
        'deviceScaleFactor': 1,
        'mobile': False
    })

    result = driver.execute_cdp_cmd('Page.captureScreenshot', {
        'format': 'png',
        'captureBeyondViewport': True
    })

    import base64
    with open(output_path, 'wb') as f:
        f.write(base64.b64decode(result['data']))

    driver.quit()

More reliable, but now you're mixing WebDriver and CDP, which can get messy.

Waiting for Content

Selenium's default behavior waits for the page load event. For SPAs and dynamic content, you need explicit waits:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver.get('https://spa-app.com')
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'main-content'))
)
driver.save_screenshot('spa.png')

Firefox Full-Page (Native Support)

Firefox actually handles full-page screenshots natively:

from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager

options = webdriver.FirefoxOptions()
options.add_argument('--headless')

service = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(service=service, options=options)
driver.get('https://example.com')
driver.save_full_page_screenshot('full-page.png')
driver.quit()

One method call, no hacks. If full-page capture is important and you don't specifically need Chrome, Firefox with Selenium is the simpler path.

Method 2: Pyppeteer

Pyppeteer is a Python port of Node.js Puppeteer. It uses async/await and talks directly to Chrome via the DevTools Protocol. The API mirrors Puppeteer closely, so if you've used Puppeteer in Node.js, Pyppeteer will feel familiar.

Install

pip install pyppeteer

On first run, Pyppeteer downloads Chromium automatically (similar to how Puppeteer works in Node.js).
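If you'd rather not pay that download cost on first launch (for example, during a Docker image build), Pyppeteer ships a small CLI that fetches its pinned Chromium build ahead of time:

```shell
# download Pyppeteer's pinned Chromium now instead of on first launch
pyppeteer-install
```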

Basic Screenshot

import asyncio
from pyppeteer import launch

async def take_screenshot(url, output_path):
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.setViewport({'width': 1280, 'height': 720})
    await page.goto(url, {'waitUntil': 'networkidle2'})
    await page.screenshot({'path': output_path})
    await browser.close()

asyncio.run(take_screenshot('https://example.com', 'screenshot.png'))

Full-Page Capture

Unlike Selenium, Pyppeteer supports full-page screenshots natively:

await page.screenshot({
    'path': 'full-page.png',
    'fullPage': True
})

Viewport and Mobile Emulation

async def mobile_screenshot(url, output_path):
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.setViewport({
        'width': 390,
        'height': 844,
        'deviceScaleFactor': 3,
        'isMobile': True,
        'hasTouch': True
    })
    await page.goto(url, {'waitUntil': 'networkidle2'})
    await page.screenshot({'path': output_path})
    await browser.close()

PDF Generation

async def save_as_pdf(url, output_path):
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto(url, {'waitUntil': 'networkidle2'})
    await page.pdf({
        'path': output_path,
        'format': 'A4',
        'printBackground': True,
        'margin': {'top': '20mm', 'bottom': '20mm'}
    })
    await browser.close()

A Note on Pyppeteer's Maintenance Status

Pyppeteer's development has slowed significantly. The last major release was in 2022, and it pins to an older Chromium version. For new projects, Playwright's Python bindings are a better choice (covered below). If you're already using Pyppeteer in production and it works, no reason to migrate. But I wouldn't start a new project with it today.

Playwright for Python

Playwright has first-class Python bindings that deserve mention. It's the most actively maintained option and supports Chromium, Firefox, and WebKit.

Install

pip install playwright
playwright install

Basic Screenshot

from playwright.sync_api import sync_playwright

def take_screenshot(url, output_path):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={'width': 1280, 'height': 720})
        page.goto(url, wait_until='networkidle')
        page.screenshot(path=output_path, full_page=True)
        browser.close()

take_screenshot('https://example.com', 'screenshot.png')

Playwright also has an async API:

from playwright.async_api import async_playwright
import asyncio

async def take_screenshot(url, output_path):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page(viewport={'width': 1280, 'height': 720})
        await page.goto(url, wait_until='networkidle')
        await page.screenshot(path=output_path, full_page=True)
        await browser.close()

asyncio.run(take_screenshot('https://example.com', 'screenshot.png'))

Playwright's auto-waiting, locator API, and cross-browser support make it the strongest option for new Python projects that need browser automation.

Method 3: Screenshot APIs

All the methods above require running a headless browser. That means managing Chromium binaries, handling process lifecycles, and dealing with memory issues on your server. Screenshot APIs move that complexity off your infrastructure entirely.

Using SnapRender's Python SDK

pip install snaprender

from snaprender import SnapRender

client = SnapRender(api_key='sk_live_...')
image_bytes = client.screenshot(
    url='https://example.com',
    format='png',
    width=1280,
    full_page=True
)

with open('screenshot.png', 'wb') as f:
    f.write(image_bytes)

Ad blocking, cookie banner removal, dark mode, device emulation: all just parameters. No browser to configure, no selectors to maintain.

Using Raw requests

import requests

response = requests.get(
    'https://app.snap-render.com/v1/screenshot',
    headers={'X-API-Key': 'sk_live_...'},
    params={
        'url': 'https://example.com',
        'format': 'png',
        'width': 1280,
        'full_page': True
    }
)

with open('screenshot.png', 'wb') as f:
    f.write(response.content)

No dependencies beyond requests. No Chromium download. The screenshot comes back over HTTP.
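One thing worth adding for production use: a failed API call returns an error body, not image bytes, and writing that straight to disk leaves you with a corrupt "screenshot". A minimal wrapper (the endpoint and parameters are the ones from the snippet above; the error handling and timeout are my additions):

```python
import requests

def fetch_screenshot(url, api_key, out_path,
                     endpoint='https://app.snap-render.com/v1/screenshot'):
    """Request a screenshot and save it, failing loudly on API errors."""
    resp = requests.get(
        endpoint,
        headers={'X-API-Key': api_key},
        params={'url': url, 'format': 'png', 'width': 1280, 'full_page': True},
        timeout=30,  # rendering a page can take several seconds
    )
    # raise on 4xx/5xx instead of silently saving an error body as a PNG
    resp.raise_for_status()
    with open(out_path, 'wb') as f:
        f.write(resp.content)
    return len(resp.content)
```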

Comparison Table

| Factor | Selenium | Pyppeteer | Playwright | Screenshot API |
|---|---|---|---|---|
| Install complexity | Moderate (driver matching) | Low (auto-downloads) | Low (auto-downloads) | Minimal (pip install) |
| Full-page capture | Workaround needed (Chrome) | Native | Native | Native |
| Async support | No (sync only) | Yes (async only) | Both sync and async | Both (SDK or requests) |
| Browser support | Chrome, Firefox, Safari, Edge | Chromium only | Chromium, Firefox, WebKit | Managed remotely |
| Community/ecosystem | Largest (20+ years) | Small, declining | Growing fast | Varies by provider |
| Maintenance status | Active | Stalled | Very active | Managed service |
| Memory per instance | 150-300 MB | 150-300 MB | 150-300 MB | None (server-side) |
| Ad/cookie removal | Manual scripting | Manual scripting | Manual scripting | Single parameter |
| Cost | Free (compute costs) | Free (compute costs) | Free (compute costs) | Free tier + paid plans |

Common Pitfalls

Font rendering differs across environments. A screenshot taken on macOS looks different from one taken on Ubuntu because system fonts differ. If visual consistency matters, you need to install the same fonts on your server, or use a screenshot API that handles font rendering in a consistent environment.
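On Debian/Ubuntu servers, the usual fix is installing the common fallback font packages headless Chrome expects (the package names below are the standard Debian ones; adjust for your distro):

```shell
# install widely used web fallback fonts for consistent text rendering
apt-get update
apt-get install -y fonts-liberation fonts-noto fonts-noto-cjk fonts-noto-color-emoji
```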

Chrome updates break driver compatibility. Selenium and Pyppeteer are both vulnerable to Chrome auto-updates breaking the driver pairing. webdriver-manager helps for Selenium; Pyppeteer bundles its own Chromium but it's an old version.

Memory usage in production. Each headless browser process uses 150-300 MB. If you're taking screenshots of 10 URLs concurrently with Selenium, that's 1.5-3 GB of RAM just for browsers. In production, you need worker pools, process recycling, and memory monitoring.
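A minimal sketch of that capping idea using a thread pool; `capture` here is a stand-in for whichever screenshot function you actually use (Selenium, Playwright, etc.):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real capture function. Each real call would launch a
# headless browser using roughly 150-300 MB of RAM.
def capture(url):
    return f"{url} -> screenshot"

urls = [f"https://example.com/page/{i}" for i in range(10)]

# Cap concurrency: max_workers=3 means at most ~3 browsers alive at once
# (roughly 0.5-1 GB) instead of 10 at once (1.5-3 GB).
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(capture, urls))
```

A real version would also recycle browser processes after N captures rather than reusing one forever, since long-lived Chrome instances tend to leak memory.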

Docker environments need extra flags. Running Chrome in Docker requires --no-sandbox, --disable-dev-shm-usage, and sometimes --disable-gpu. The /dev/shm size needs to be increased from the default 64 MB or Chrome crashes silently.
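Put together, a Docker-friendly invocation looks something like this (the image name and output path are illustrative; `--shm-size` raises /dev/shm at the container level, and `--screenshot` is headless Chrome's built-in capture flag):

```shell
# give the container a bigger /dev/shm instead of the 64 MB default
docker run --shm-size=1g my-screenshot-image \
  google-chrome --headless=new --no-sandbox --disable-dev-shm-usage \
    --disable-gpu --screenshot=/tmp/out.png https://example.com
```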

Recommendations

Stick with Selenium if you already have it in your test suite and screenshots are a secondary need. Adding screenshot capture to existing Selenium tests is trivial, and rewriting everything for Playwright isn't worth the migration cost.

Use Playwright for new projects. If I were starting from scratch today, I'd pick Playwright over Pyppeteer or Selenium for any Python project that needs to take a screenshot of a URL. It has the best API design, the most active development, and both sync and async support.

Use a screenshot API for production screenshot features. If screenshots are a product feature (OG images, link previews, visual monitoring, PDF generation for customers), an API is the pragmatic choice. You skip the browser infrastructure entirely, and features like cookie banner removal and ad blocking are already built in.

The right tool depends on what you're building. For testing workflows, a local browser makes sense. For production features where you need to take a screenshot of a URL in Python reliably at scale, the less infrastructure you manage, the better.
