I built a Screenshot & Metadata API that extracts 50+ fields from any URL

#python #api #webdev #javascript

I built a Screenshot/PDF API that does more than take screenshots. Its metadata endpoint extracts 50+ fields from any URL, including Open Graph tags, Twitter Card data, JSON-LD, and content analysis.

What it does

URL to Screenshot — capture any webpage as PNG, JPEG, or WebP
URL to PDF — generate PDFs with custom format, margins, orientation
Metadata Extraction — 50+ fields from any URL (see below)
HTML to Image — render custom HTML/CSS to PNG/JPEG/WebP

The Metadata Endpoint

This is what makes it different. A single GET request extracts:

Basic SEO:
title, description, keywords, author, language, charset, viewport, robots, canonical URL, generator

Open Graph:
og:title, og:description, og:image (+ dimensions), og:url, og:type, og:site_name, og:locale

Twitter Card:
card type, title, description, image, @site, @creator

Icons & Theme:
favicon, apple-touch-icon, manifest, theme-color, color-scheme

Content Analysis:
first h1 text, h2 count, internal links count, external links count, images count, images without alt text, forms count, scripts count, stylesheets count, word count

Structured Data:
JSON-LD (Schema.org) parsed and returned

Feeds:
RSS/Atom feeds auto-detected

Raw dump:
All meta tags as key-value pairs
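As a rough illustration of what these fields are good for, here is a minimal sketch of a page audit built on a metadata response. It assumes the response nests fields under a top-level data key and uses only the three keys that appear in the JavaScript example (title, og_image, word_count). The audit_metadata helper, the min_words threshold, and the sample payload are all made up for illustration and are not part of the API.

```python
def audit_metadata(payload, min_words=300):
    """Return a list of human-readable warnings for one metadata response.

    Assumes the response shape {"data": {...}}; min_words is an arbitrary
    threshold chosen for this sketch, not something the API defines.
    """
    fields = payload.get("data") or {}
    warnings = []
    if not fields.get("title"):
        warnings.append("missing title")
    if not fields.get("og_image"):
        warnings.append("missing og:image (no social preview)")
    word_count = fields.get("word_count") or 0
    if word_count < min_words:
        warnings.append(f"thin content: {word_count} words")
    return warnings


# Hand-written sample payload, not real API output
sample = {"data": {"title": "Example page", "og_image": None, "word_count": 120}}
for warning in audit_metadata(sample):
    print(warning)
```

The same pattern extends to the other field groups once you have checked their exact key names in a real response.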

Quick Start (Python)

import requests

headers = {
    "X-RapidAPI-Key": "YOUR_KEY",
    "X-RapidAPI-Host": "screenshot-pdf-api.p.rapidapi.com"
}

# Screenshot a website
response = requests.get(
    "https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot",
    headers=headers,
    params={"url": "https://github.com", "width": 1280, "format": "png"},
    timeout=60,
)
response.raise_for_status()  # fail loudly instead of saving an error body as a PNG

with open("screenshot.png", "wb") as f:
    f.write(response.content)

print(f"Saved {len(response.content)} bytes")
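The metadata endpoint can be called from Python in the same way. The sketch below uses only the standard library so it runs without extra packages. It splits building the request from sending it, which makes the URL encoding easy to inspect. The helper names build_metadata_request and fetch_metadata are my own, and the response is assumed to be JSON with fields nested under data, as in the JavaScript example.

```python
import json
import urllib.parse
import urllib.request

API_HOST = "screenshot-pdf-api.p.rapidapi.com"


def build_metadata_request(target_url, api_key):
    """Build (but do not send) a GET request for /v1/metadata."""
    # urlencode percent-encodes the target URL so its own "?" and "&" survive
    query = urllib.parse.urlencode({"url": target_url})
    return urllib.request.Request(
        f"https://{API_HOST}/v1/metadata?{query}",
        headers={"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": API_HOST},
    )


def fetch_metadata(target_url, api_key, timeout=30):
    """Send the request and return the parsed JSON body as a dict."""
    request = build_metadata_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (needs a real key and network access):
# meta = fetch_metadata("https://github.com", "YOUR_KEY")
# print(meta["data"]["title"])
```

If you already depend on requests, passing the same headers and params={"url": ...} to requests.get is equivalent.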

Quick Start (JavaScript)

// Shared headers, plus the target page URL-encoded so its own query string survives
const headers = {
  "X-RapidAPI-Key": "YOUR_KEY",
  "X-RapidAPI-Host": "screenshot-pdf-api.p.rapidapi.com"
};
const target = encodeURIComponent("https://github.com");

// Screenshot
const response = await fetch(
  `https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot?url=${target}&format=png`,
  { headers }
);
if (!response.ok) throw new Error(`Screenshot failed: ${response.status}`);
const blob = await response.blob();

// Metadata
const meta = await fetch(
  `https://screenshot-pdf-api.p.rapidapi.com/v1/metadata?url=${target}`,
  { headers }
);
if (!meta.ok) throw new Error(`Metadata failed: ${meta.status}`);
const data = await meta.json();
console.log(data.data.title); // "GitHub · Build and ship software..."
console.log(data.data.og_image); // "https://..."
console.log(data.data.word_count); // 834

cURL

# Screenshot
curl -o screenshot.png \
  -H "X-RapidAPI-Key: YOUR_KEY" \
  -H "X-RapidAPI-Host: screenshot-pdf-api.p.rapidapi.com" \
  "https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot?url=https://github.com"

# Full page capture
curl -o fullpage.png \
  -H "X-RapidAPI-Key: YOUR_KEY" \
  -H "X-RapidAPI-Host: screenshot-pdf-api.p.rapidapi.com" \
  "https://screenshot-pdf-api.p.rapidapi.com/v1/screenshot?url=https://en.wikipedia.org&full_page=true"

Endpoints

Endpoint	Description	Tier
GET /v1/screenshot	Screenshot URL to PNG/JPEG/WebP	Free
GET /v1/health	API status & queue depth	Free
GET /v1/pdf	Generate PDF from URL	Basic
GET /v1/metadata	Extract 50+ metadata fields	Basic
POST /v1/screenshot/html	Render HTML/CSS to image	Pro
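If a client supports several plans, it can help to encode this table so a call that needs a higher tier fails before any request goes out. The sketch below copies the endpoint/tier pairs from the table. The ordering Free < Basic < Pro, and the idea that higher tiers include everything below them, are my assumptions. The can_call helper is hypothetical.

```python
# Assumed ordering: each tier includes the endpoints of the tiers below it
TIER_RANK = {"Free": 0, "Basic": 1, "Pro": 2}

# (method, path) -> minimum tier, taken from the endpoint table
ENDPOINT_TIER = {
    ("GET", "/v1/screenshot"): "Free",
    ("GET", "/v1/health"): "Free",
    ("GET", "/v1/pdf"): "Basic",
    ("GET", "/v1/metadata"): "Basic",
    ("POST", "/v1/screenshot/html"): "Pro",
}


def can_call(plan, method, path):
    """Return True if the given plan is at or above the endpoint's tier."""
    required = ENDPOINT_TIER[(method.upper(), path)]
    return TIER_RANK[plan] >= TIER_RANK[required]


print(can_call("Free", "GET", "/v1/screenshot"))
print(can_call("Free", "GET", "/v1/metadata"))
```

An unknown endpoint raises KeyError, which is usually what you want for a typo in a path.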

Screenshot Parameters

Param	Default	Description
url	required	URL to capture
width	1280	Viewport width
height	800	Viewport height
format	png	png, jpeg, webp
quality	85	JPEG/WebP quality (1-100)
full_page	false	Capture entire scrollable page
delay	0	Wait N seconds before capture (0-5)
selector	null	CSS selector to capture specific element
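The ranges in this table are easy to check client-side before spending a request. Here is a minimal sketch that applies the documented defaults and limits and returns a params dict you could pass to requests.get. The screenshot_params helper is my own, image_format stands in for the format parameter to avoid shadowing the Python built-in, and sending quality only for JPEG/WebP is a choice made here, not documented API behavior.

```python
ALLOWED_FORMATS = {"png", "jpeg", "webp"}


def screenshot_params(url, width=1280, height=800, image_format="png",
                      quality=85, full_page=False, delay=0, selector=None):
    """Validate screenshot options and return them as query parameters."""
    if not url:
        raise ValueError("url is required")
    if image_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    if not 0 <= delay <= 5:
        raise ValueError("delay must be between 0 and 5 seconds")

    params = {
        "url": url,
        "width": width,
        "height": height,
        "format": image_format,
        # Lowercase string, matching full_page=true in the cURL example
        "full_page": "true" if full_page else "false",
        "delay": delay,
    }
    if image_format != "png":
        params["quality"] = quality  # quality applies to JPEG/WebP only
    if selector:
        params["selector"] = selector
    return params


print(screenshot_params("https://en.wikipedia.org", full_page=True))
```

Raising ValueError locally gives a clearer message than whatever the API returns for an out-of-range value.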