Custodia-Admin

Posted on Mar 15 • Edited on Mar 25 • Originally published at pagebolt.dev

Screenshot API for Python Developers: Requests vs Hosted API

#python #api #webdev #tutorial

You're building a Python app and need to take screenshots. Maybe you're:

Building a social media link preview service
Auto-generating OG images for Flask/Django apps
Testing web interfaces programmatically
Monitoring website changes
Archiving web content for compliance

You search "Python screenshot library" and find Selenium. It's been around for years. It's on PyPI. Thousands of projects use it.

Three days later, you're debugging WebDriver timeouts, wrestling with Firefox vs Chrome, and wondering why your screenshots look different on different machines.

There's a simpler way. Let me show you both approaches — Selenium and a hosted API — so you can decide which fits your project.

The Selenium Approach

Selenium is a browser automation framework. Here's the minimal example:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://example.com')
driver.save_screenshot('screenshot.png')
driver.quit()

That's 7 lines. Simple, right?

But in production, this becomes a nightmare. Here's what you'll actually need:

1. Browser Driver Management

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service

service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

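Now you need webdriver-manager as a dependency. It works, but adds complexity. (Selenium 4.6 and later bundle Selenium Manager, which fetches a matching driver automatically, but older or pinned setups still need this step.)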

2. Headless Rendering & Options

from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument('--window-size=1280,720')
options.add_argument('--user-agent=Mozilla/5.0...')

driver = webdriver.Chrome(options=options)

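Each flag is a gotcha. When Chrome runs as root, which is common in Linux containers, you need --no-sandbox. On a macOS dev machine, you don't. In Docker, where /dev/shm defaults to 64MB, you need --disable-dev-shm-usage. Get it wrong and Chrome crashes on startup or partway through rendering a page.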
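One way to keep these flags from drifting between machines is to derive them from the environment in a single place. The sketch below is illustrative only: build_chrome_args and detect_environment are hypothetical helper names, and the detection heuristics (effective UID 0, the /.dockerenv marker file) are assumptions that won't cover every setup.

```python
import os


def build_chrome_args(running_as_root, in_container, width=1280, height=720):
    """Return headless Chrome flags for the given environment."""
    args = ["--headless", f"--window-size={width},{height}"]
    if running_as_root:
        # Chrome will not start as root unless the sandbox is disabled.
        args.append("--no-sandbox")
    if in_container:
        # Docker's default /dev/shm is small; this flag makes Chrome use /tmp.
        args.append("--disable-dev-shm-usage")
    return args


def detect_environment():
    """Best-effort guess at the two conditions above (heuristics, not guarantees)."""
    running_as_root = hasattr(os, "geteuid") and os.geteuid() == 0
    in_container = os.path.exists("/.dockerenv")  # marker file Docker creates
    return running_as_root, in_container
```

Each returned string can then be passed to options.add_argument() in a loop, so the local and Docker configurations come from the same function.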

3. Waits and Timing

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get(url)

# Wait for the <body> element to exist
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, 'body'))
)

# Wait for JavaScript to finish (how long? nobody knows)
time.sleep(2)

# Now take screenshot
driver.save_screenshot('screenshot.png')

How long should you wait? 1 second? 2? 5? Too short and your screenshot shows a blank page. Too long and your service times out.
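A fixed sleep can be swapped for polling: check a condition repeatedly and stop as soon as it holds or a deadline passes. The helper below is a minimal sketch of that idea. wait_until is a hypothetical name, and which condition to poll is still up to you.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With Selenium, the predicate could be something like lambda: driver.execute_script("return document.readyState") == "complete". WebDriverWait(driver, 10).until(...) also accepts a custom callable and polls in much the same way. Neither guarantees that late-running JavaScript has finished, so the guessing is reduced rather than eliminated.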

4. Error Handling & Cleanup

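from selenium import webdriver
from selenium.common.exceptions import TimeoutException, WebDriverException

driver = None
try:
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    driver.save_screenshot(filename)
except TimeoutException:
    print('Page took too long to load')
except WebDriverException as e:
    print(f'Selenium error: {e}')
except Exception as e:
    print(f'Unknown error: {e}')
finally:
    # webdriver.Chrome() itself can fail, leaving nothing to quit
    if driver is not None:
        driver.quit()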

Now multiply this by every function that takes a screenshot. You're writing the same error handling 10+ times.
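Some of that repetition can be factored out with a context manager that owns the driver's lifetime. This is a sketch rather than a full solution. managed_driver is a hypothetical name, and the driver factory is passed in, so the helper itself doesn't depend on Selenium.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver, hand it to the caller, and always quit it afterwards."""
    driver = create_driver()  # if this raises, there is nothing to clean up
    try:
        yield driver
    finally:
        driver.quit()
```

A caller would write with managed_driver(lambda: webdriver.Chrome(options=options)) as driver: and then call driver.get(url) and driver.save_screenshot(filename) inside the block. Cleanup is centralized, but you still have to decide what to do with each exception type.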

5. Infrastructure & Scaling

Selenium runs Chrome locally. Chrome takes 100–200MB of RAM per instance. If you have 10 concurrent requests, you need 1–2GB of RAM just for browsers.

In production:

Deploy to a Docker container with Chrome installed (adds 400MB to your image)
Monitor memory usage
Handle browser crashes
Implement a queue for concurrent requests
Scale horizontally (add more servers = add more complexity)
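The queue item in that list can be approximated with a bounded thread pool, which caps how many browsers run at once. The sketch below assumes a capture(url) function that you supply (for example, one that launches Chrome and saves a file). capture_all and max_browsers are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def capture_all(urls, capture, max_browsers=4):
    """Run capture(url) for every URL with at most max_browsers running at once.

    Returns a dict mapping each URL to its result, or to the exception it raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_browsers) as pool:
        futures = {url: pool.submit(capture, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed page should not sink the batch
                results[url] = exc
    return results
```

At 100–200MB per Chrome instance, max_browsers=4 keeps browser memory to roughly 400–800MB. Extra requests wait in the pool's internal queue instead of spawning more browsers.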

Real-world cost at 10,000 screenshots/month:

Server: $400/month (2GB RAM, 2 CPU)
DevOps/monitoring: $2,000/month (on-call, incident response)
Total: $2,400/month

And that's before you hit the limits of a single server.

The API Approach

Here's the same task with a hosted screenshot API:

import requests

response = requests.post(
    'https://api.pagebolt.io/api/v1/screenshot',
    headers={'x-api-key': 'YOUR_API_KEY'},  # replace with your real key
    json={'url': 'https://example.com'}
)

with open('screenshot.png', 'wb') as f:
    f.write(response.content)

That's it. 8 lines, including the file write. No browser management. No memory leaks. No infrastructure.

Here's a more realistic production-ready example:

import requests
import logging
import os

logger = logging.getLogger(__name__)

def take_screenshot(url, filename, timeout=10):
    """Take a screenshot of a URL and save to file."""
    try:
        response = requests.post(
            'https://api.pagebolt.io/api/v1/screenshot',
            headers={'x-api-key': os.environ['PAGEBOLT_API_KEY']},
            json={
                'url': url,
                'width': 1280,
                'height': 720,
                'blockAds': True,
                'blockBanners': True
            },
            timeout=timeout
        )

        if response.status_code != 200:
            logger.error(f'Screenshot failed: {response.status_code} {response.text}')
            raise Exception(f'API returned {response.status_code}')

        with open(filename, 'wb') as f:
            f.write(response.content)

        logger.info(f'Screenshot saved: {filename}')
        return filename

    except requests.Timeout:
        logger.error(f'Screenshot request timed out: {url}')
        raise
    except requests.RequestException as e:
        logger.error(f'Screenshot request failed: {e}')
        raise

That's under 40 lines of real, production-ready code. Compare it to managing Selenium.
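The status-code check can be backed up by confirming that the body is actually an image before writing it to disk. The sketch below assumes the API returns raw PNG bytes, as the examples above imply by saving straight to a .png file. looks_like_png and save_png are hypothetical helper names.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def looks_like_png(data):
    """Return True if data starts with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def save_png(data, filename):
    """Write data to filename, refusing bodies that are not PNG images."""
    if not looks_like_png(data):
        raise ValueError("response body is not a PNG image")
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

Calling save_png(response.content, filename) in place of the plain file write would surface a JSON error body as an exception instead of leaving behind a corrupt image file.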

Feature Comparison

Feature	Selenium	Hosted API
Setup time	1 hour	5 minutes
Lines of code	150+ (with pools, error handling)	30–50
Infrastructure	You manage Chrome, memory, scaling	Handled for you
Cost (10k screenshots/month)	$2,400/month	$29/month
Device presets	Manual setup	Built-in (25+ presets)
PDF generation	Extra library, extra complexity	One parameter
Reliable waits	Guessing (sleep)	Built-in (networkidle)
Retry logic	You implement it	Included
Uptime	Depends on your infrastructure	99.9% SLA
Monitoring	You do it	Included
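To make the cost row concrete, here is the per-screenshot arithmetic using only the figures quoted in this article ($2,400 and $29 per month at 10,000 screenshots). The function name is illustrative, and the inputs are the article's estimates, not measurements.

```python
def cost_per_screenshot(monthly_cost_usd, screenshots_per_month):
    """Return the average cost of one screenshot in US dollars."""
    return monthly_cost_usd / screenshots_per_month


selenium_cost = cost_per_screenshot(2400, 10_000)  # self-hosted estimate from the article
hosted_cost = cost_per_screenshot(29, 10_000)      # hosted plan price from the article

print(f"Selenium: ${selenium_cost:.4f} per screenshot")
print(f"Hosted:   ${hosted_cost:.4f} per screenshot")
print(f"Ratio:    {selenium_cost / hosted_cost:.0f}x")
```

That works out to about 24 cents per screenshot self-hosted versus about 0.3 cents hosted, a ratio of roughly 83 to 1 at this volume.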

When to Use Selenium

Use Selenium if:

You're testing internal web applications (API can't reach them)
You need to interact with JavaScript heavily (click buttons, fill forms, verify state changes)
You're learning web automation (educational context)
You have existing Selenium test infrastructure and want to integrate screenshots
You can afford the infrastructure and maintenance cost

When to Use an API

Use an API if:

You want screenshots in production without infrastructure headaches
You need reliability and uptime guarantees
You're taking static screenshots (no complex interactions needed)
You want to scale without managing more servers
You want to focus on your app, not on browser management

Real-World Example: Django Link Preview Service

You're building a service that generates preview cards for shared links (like Discord does).

With Selenium:

Install Chrome in Docker
Set up WebDriver
Handle timeouts and retries
Monitor memory usage
Deploy to a server with enough RAM
Scale horizontally as traffic grows
Cost: $2,400+/month in infrastructure

With an API:

pip install requests
Call the API endpoint
Save the image
Return to user
Cost: $29/month

The API approach takes one day. The Selenium approach takes two weeks.

Code Example: Flask Link Preview

from flask import Flask, request, jsonify
import base64
import os
import requests

app = Flask(__name__)

@app.route('/api/preview', methods=['POST'])
def create_preview():
    """Generate a link preview card."""
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True)
    url = data.get('url') if isinstance(data, dict) else None

    if not url:
        return jsonify({'error': 'URL required'}), 400

    headers = {'x-api-key': os.environ['PAGEBOLT_API_KEY']}

    try:
        # Step 1: Take screenshot
        response = requests.post(
            'https://api.pagebolt.io/api/v1/screenshot',
            headers=headers,
            json={
                'url': url,
                'width': 1200,
                'height': 630,
                'blockAds': True,
                'blockBanners': True
            },
            timeout=10
        )

        if response.status_code != 200:
            return jsonify({'error': 'Screenshot failed'}), 500

        # Step 2: Get page metadata (optional)
        meta_response = requests.post(
            'https://api.pagebolt.io/api/v1/inspect',
            headers=headers,
            json={'url': url},
            timeout=10
        )

        metadata = meta_response.json() if meta_response.ok else {}

        # Step 3: Return preview card.
        # response.url is only the API endpoint, not a hosted image, so the
        # image bytes from the body (assumed to be PNG, as in the earlier
        # examples) are inlined as a data URI instead.
        image_b64 = base64.b64encode(response.content).decode('ascii')
        return jsonify({
            'title': metadata.get('title', 'Untitled'),
            'description': metadata.get('description', ''),
            'image_url': f'data:image/png;base64,{image_b64}',
            'url': url
        })

    except requests.Timeout:
        return jsonify({'error': 'Request timed out'}), 504
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(debug=False)

That's everything. No Selenium. No browser management. No infrastructure.
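One thing the endpoint above doesn't do is validate the URL beyond checking that it's present. Since the value comes straight from users, a basic filter is worth considering. The sketch below is deliberately simple: is_fetchable_url is a hypothetical name, and the checks (http/https scheme, a hostname, no obvious loopback address) are assumptions about what a preview service would want, not a complete defense.

```python
from urllib.parse import urlsplit


def is_fetchable_url(url):
    """Return True for http(s) URLs with a hostname that is not obviously local."""
    if not isinstance(url, str):
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:  # raised for malformed URLs such as a bad IPv6 literal
        return False
    if parts.scheme not in ("http", "https"):
        return False
    if not host:
        return False
    # Simple string checks only: no DNS resolution, no private-range detection.
    return host != "localhost" and not host.startswith("127.")
```

In the Flask handler, this check would sit right after the 'URL required' test and return a 400 when it fails.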

Hybrid Approach

Some teams use both:

Selenium for QA automation (testing interactions, verifying UI state)
API for customer-facing features (link previews, screenshot galleries)

Selenium shines when you need to interact with the page. APIs shine when you just need static screenshots.

The Bottom Line

Selenium is powerful if you need it. But most teams don't. They need screenshots, and they need them to work without becoming infrastructure engineers.

An API costs $29/month and takes 5 minutes to integrate. Selenium costs $2,400+/month and two weeks to get right.

The choice depends on your use case. But for most Python projects, the API wins on simplicity, cost, and reliability.

Try PageBolt Free

100 requests/month. No credit card. No infrastructure required.

Start your free trial and see how simple screenshots can be in Python.
