Pinterest is a goldmine of visual content data — from trending design ideas to product inspiration. Whether you're building a visual content aggregator, doing market research, or analyzing design trends, extracting Pinterest data programmatically can save you hundreds of hours.
In this guide, I'll show you how to scrape Pinterest boards and pins using Python.
Why Scrape Pinterest?
Pinterest has over 450 million monthly active users pinning images across millions of boards. This data is valuable for:
- Market research: See what products and designs are trending
- Content strategy: Analyze what visual content gets the most engagement
- Competitive analysis: Track competitor boards and pin strategies
- Image dataset building: Collect visual data for ML projects
Setting Up Your Environment
```python
import requests
from bs4 import BeautifulSoup
import json
import time

# Use a residential proxy to avoid blocks
PROXY = {
    "http": "http://your-proxy:port",
    "https": "http://your-proxy:port",
}

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
```
For reliable proxy rotation, I recommend ThorData residential proxies — they have excellent success rates for image-heavy sites like Pinterest.
Extracting Board Data
Pinterest loads content dynamically, so we need to target their internal API endpoints:
```python
def scrape_pinterest_board(board_url):
    """Extract pins from a Pinterest board."""
    response = requests.get(board_url, headers=HEADERS, proxies=PROXY)
    soup = BeautifulSoup(response.text, "html.parser")

    # Pinterest embeds data in script tags
    scripts = soup.find_all("script", {"type": "application/json"})
    pins = []
    for script in scripts:
        try:
            data = json.loads(script.string)
            # Navigate the nested structure to find pin data
            if "props" in data:
                pin_data = extract_pins_from_json(data)
                pins.extend(pin_data)
        except (json.JSONDecodeError, TypeError):
            continue
    return pins
```
```python
def extract_pins_from_json(data):
    """Parse pin details from Pinterest's JSON data."""
    pins = []

    # Recursively search for pin objects
    def find_pins(obj):
        if isinstance(obj, dict):
            if "images" in obj and "description" in obj:
                pins.append({
                    "description": obj.get("description", ""),
                    "image_url": obj.get("images", {}).get("orig", {}).get("url", ""),
                    "link": obj.get("link", ""),
                    "repin_count": obj.get("repin_count", 0),
                    "comment_count": obj.get("comment_count", 0),
                })
            for value in obj.values():
                find_pins(value)
        elif isinstance(obj, list):
            for item in obj:
                find_pins(item)

    find_pins(data)
    return pins
```
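Because Pinterest's embedded JSON is deeply nested and its exact shape shifts often, it helps to sanity-check the recursive search against a mock payload before pointing it at live pages. The nested structure below is invented for illustration; only the "has both `images` and `description`" heuristic mirrors the extractor above.

```python
# Mock payload: one object that looks like a pin, one that doesn't.
# Pinterest's real JSON layout differs and changes frequently.
sample = {
    "props": {
        "initialReduxState": {
            "pins": [
                {
                    "description": "Mid-century living room",
                    "images": {"orig": {"url": "https://i.pinimg.com/originals/abc.jpg"}},
                    "link": "https://example.com/product",
                    "repin_count": 42,
                },
                {"unrelated": "no pin fields here"},
            ]
        }
    }
}

def extract_pins(data):
    """Collect dicts that look like pins (have both 'images' and 'description')."""
    pins = []

    def walk(obj):
        if isinstance(obj, dict):
            if "images" in obj and "description" in obj:
                pins.append({
                    "description": obj.get("description", ""),
                    "image_url": obj.get("images", {}).get("orig", {}).get("url", ""),
                })
            for value in obj.values():
                walk(value)
        elif isinstance(obj, list):
            for item in obj:
                walk(item)

    walk(data)
    return pins

pins = extract_pins(sample)
print(len(pins))  # 1
```

Only the first object is collected: the second lacks the two marker keys, so the walk passes over it without appending anything.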
Downloading Pin Images
```python
import os

def download_pin_images(pins, output_dir="pinterest_images"):
    os.makedirs(output_dir, exist_ok=True)
    for i, pin in enumerate(pins):
        if pin["image_url"]:
            try:
                img_response = requests.get(pin["image_url"], timeout=10)
                filepath = os.path.join(output_dir, f"pin_{i}.jpg")
                with open(filepath, "wb") as f:
                    f.write(img_response.content)
                print(f"Downloaded pin {i}: {pin['description'][:50]}")
                time.sleep(1)  # Be respectful
            except Exception as e:
                print(f"Failed to download pin {i}: {e}")
```
Handling Pagination
Pinterest uses infinite scroll, so pins load in batches as you scroll. The function below assumes you are calling one of Pinterest's internal JSON feed endpoints rather than the HTML board page (note the `response.json()` call), and it follows the bookmark cursor returned in each response:
```python
def scrape_full_board(board_url, max_pins=500):
    """Scrape all pins from a board with pagination."""
    all_pins = []
    bookmark = None

    while len(all_pins) < max_pins:
        params = {"bookmark": bookmark} if bookmark else {}
        response = requests.get(
            board_url,
            headers=HEADERS,
            params=params,
            proxies=PROXY,
        )
        if response.status_code != 200:
            print(f"Request failed: {response.status_code}")
            break

        data = response.json()
        pins = data.get("resource_response", {}).get("data", [])
        if not pins:
            break

        all_pins.extend(pins)
        bookmark = data.get("resource", {}).get("options", {}).get("bookmark")
        if not bookmark or bookmark == "-end-":
            break

        time.sleep(2)  # Rate limiting

    return all_pins[:max_pins]
```
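Bookmark logic is easy to get subtly wrong, so it is worth exercising the loop against a stubbed fetcher before touching the network. `fake_fetch` and its two pages are invented for this sketch, but the termination conditions (empty page, missing bookmark, the `-end-` sentinel) mirror the function above.

```python
# Two fake "API pages" keyed by bookmark; None is the first request.
PAGES = {
    None: {"data": [{"id": 1}, {"id": 2}], "bookmark": "page2"},
    "page2": {"data": [{"id": 3}], "bookmark": "-end-"},
}

def fake_fetch(bookmark):
    """Stand-in for the HTTP request: returns a pre-baked page dict."""
    return PAGES[bookmark]

def collect_pins(fetch, max_pins=500):
    """Same bookmark-following loop, with the transport injected."""
    all_pins, bookmark = [], None
    while len(all_pins) < max_pins:
        page = fetch(bookmark)
        pins = page.get("data", [])
        if not pins:
            break
        all_pins.extend(pins)
        bookmark = page.get("bookmark")
        if not bookmark or bookmark == "-end-":
            break
    return all_pins[:max_pins]

print(collect_pins(fake_fetch))  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Injecting the fetch function also makes the real scraper easier to unit-test later: swap `fake_fetch` for a closure around `requests.get`.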
The Easy Way: Use a Pre-Built Scraper
Building and maintaining a Pinterest scraper is complex — Pinterest frequently changes its frontend structure and internal API endpoints, and each change can silently break your parsing code.
For production use, I recommend the Pinterest Scraper on Apify. It handles all the complexity — dynamic rendering, pagination, proxy rotation, and anti-bot bypasses — so you can focus on analyzing the data instead of fighting with selectors.
Storing Results
```python
import csv

def save_to_csv(pins, filename="pinterest_data.csv"):
    if not pins:
        return
    keys = pins[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(pins)
    print(f"Saved {len(pins)} pins to {filename}")
```
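A quick round-trip check of this CSV approach, using invented sample rows. One thing to keep in mind: `csv.DictReader` hands every field back as a string, so numeric fields like `repin_count` need an `int()` when you load the data again.

```python
import csv
import os
import tempfile

# Sample rows invented for illustration.
sample_pins = [
    {"description": "Boho bedroom ideas", "image_url": "https://i.pinimg.com/1.jpg", "repin_count": 12},
    {"description": "Minimalist desk setup", "image_url": "https://i.pinimg.com/2.jpg", "repin_count": 7},
]

def save_to_csv(pins, filename):
    """Write a list of flat dicts to CSV, using the first row's keys as columns."""
    if not pins:
        return
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=pins[0].keys())
        writer.writeheader()
        writer.writerows(pins)

path = os.path.join(tempfile.gettempdir(), "pinterest_data.csv")
save_to_csv(sample_pins, path)

# Read it back: every value comes back as a string.
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows))  # 2
```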
Best Practices
- Respect rate limits: Add delays between requests (2-5 seconds minimum)
- Use proxies: Pinterest actively blocks scrapers. ThorData provides reliable residential proxies perfect for this
- Cache results: Don't re-scrape data you already have
- Check robots.txt: Always review the site's robots.txt before scraping
- Handle errors gracefully: Network issues are common — implement retries with exponential backoff
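The last point can be sketched as a small wrapper. The helper name and delay schedule here are my own, not part of any library; the injectable `sleep` makes the backoff schedule checkable without actually waiting.

```python
import time

def with_retries(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with doubling delays."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo: a flaky function that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network hiccup")
    return "ok"

delays = []
result = with_retries(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

In real use you would pass a closure around your request, e.g. `with_retries(lambda: requests.get(url, timeout=10))`, and probably catch only network-related exceptions rather than bare `Exception`.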
Conclusion
Pinterest scraping opens up powerful possibilities for visual content analysis, market research, and data collection. Whether you build your own scraper or use a managed solution like the Pinterest Scraper on Apify, always scrape responsibly and respect the platform's terms of service.
Happy scraping!