DEV Community

agenthustler

How to Scrape Google Search Results in 2026: SERP Data, Rankings, and Snippets

Google processes over 8.5 billion searches per day. That SERP data — rankings, featured snippets, People Also Ask boxes, local packs — is gold for SEO professionals, market researchers, and competitive analysts.

But Google is arguably the hardest website to scrape. Here's how to actually get SERP data in 2026 without getting blocked.

Understanding Google SERP Structure

A modern Google results page is far more than 10 blue links. In 2026, a typical SERP includes:

  • AI Overview — Google's AI-generated summary at the top
  • Featured Snippets — direct answer boxes
  • People Also Ask (PAA) — expandable question boxes
  • Local Pack — map results for local queries
  • Knowledge Panel — entity information on the right
  • Organic Results — the traditional blue links
  • Shopping Results — product carousels
  • Video Results — YouTube and other video carousels
  • Related Searches — query suggestions at the bottom

Each of these elements requires different parsing logic, and Google frequently changes its HTML structure.
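Whichever features you extract, it helps to normalize them into one container before storage or analysis. A minimal sketch (the class and field names are illustrative, not any vendor's schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SerpPage:
    """One parsed results page, one field per SERP feature."""
    query: str
    organic: list = field(default_factory=list)        # dicts: position, title, url, snippet
    featured_snippet: Optional[str] = None
    people_also_ask: list = field(default_factory=list)
    related_searches: list = field(default_factory=list)

page = SerpPage(query="best crm software 2026")
page.organic.append({
    "position": 1,
    "title": "Example",
    "url": "https://example.com",
    "snippet": "...",
})
```

Parsers for each method below can then fill the same structure, whether the data came from a JSON API or raw HTML.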

Method 1: Google Custom Search JSON API (Official)

Google offers a legitimate way to get search results programmatically through their Custom Search JSON API.

import requests

def google_search(query, api_key, cx, num=10):
    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": api_key,
        "cx": cx,       # Custom Search Engine ID
        "q": query,
        "num": num
    }
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface quota/auth errors instead of silent empty results
    results = response.json()

    for item in results.get("items", []):
        print(f"Title: {item['title']}")
        print(f"URL: {item['link']}")
        print(f"Snippet: {item['snippet']}")
        print("---")

    return results

# Usage
results = google_search(
    "best python web frameworks 2026",
    api_key="YOUR_API_KEY",
    cx="YOUR_SEARCH_ENGINE_ID"
)

Limitations

  • 100 free queries per day (then $5 per 1,000 queries)
  • No featured snippets, PAA, or AI Overview data
  • No ranking position for specific domains (you get results, but no rank tracking)
  • Max 10 results per query (no deep pagination)
  • Requires creating a Programmable Search Engine first

The official API is fine for basic search results but misses most of what makes SERP data valuable.
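One of the limits above is softer than it looks: the `start` parameter lets you page past the first 10 results, although Google documents a hard cap of 100 results per query, so `start` never usefully exceeds 91. A sketch (the `page_starts` helper is mine, not part of the API):

```python
import requests

def page_starts(pages):
    """Valid `start` offsets: 1, 11, 21, ... capped at 91 (100-result limit)."""
    return [1 + p * 10 for p in range(pages) if 1 + p * 10 <= 91]

def google_search_paged(query, api_key, cx, pages=3):
    """Fetch up to pages * 10 results from the Custom Search JSON API."""
    url = "https://www.googleapis.com/customsearch/v1"
    items = []
    for start in page_starts(pages):
        params = {"key": api_key, "cx": cx, "q": query, "num": 10, "start": start}
        resp = requests.get(url, params=params, timeout=10)
        resp.raise_for_status()
        batch = resp.json().get("items", [])
        items.extend(batch)
        if len(batch) < 10:  # ran out of results early
            break
    return items
```

Note that each page still counts as one query against your daily quota.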

Method 2: SerpAPI and Similar Services

Dedicated SERP APIs handle the scraping infrastructure and return structured JSON with all SERP features.

SerpAPI

import requests

params = {
    "engine": "google",
    "q": "best crm software 2026",
    "api_key": "YOUR_SERPAPI_KEY",
    "location": "Austin, Texas",
    "hl": "en",
    "gl": "us"
}

response = requests.get(
    "https://serpapi.com/search",
    params=params
)
data = response.json()

# Organic results
for result in data.get("organic_results", []):
    print(f"#{result['position']}: {result['title']}")
    print(f"  URL: {result['link']}")
    print(f"  Snippet: {result.get('snippet', 'N/A')}")

# People Also Ask
for paa in data.get("related_questions", []):
    print(f"PAA: {paa['question']}")

# Featured Snippet
snippet = data.get("answer_box", {})
if snippet:
    print(f"Featured: {snippet.get('snippet', snippet.get('answer', ''))}")

SerpAPI gives you everything — organic results, featured snippets, PAA boxes, local results, knowledge panels, shopping results, and more. It's the most reliable option for serious SERP data collection.
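Because every organic result carries a `position` field, rank tracking is a small helper on top of this response (the helper is mine, not part of SerpAPI's client):

```python
from urllib.parse import urlparse

def domain_rank(organic_results, domain):
    """First organic position whose URL belongs to `domain`, or None."""
    for result in organic_results:
        host = urlparse(result.get("link", "")).netloc.lower()
        if host == domain or host.endswith("." + domain):
            return result.get("position")
    return None

sample = [
    {"position": 1, "link": "https://www.hubspot.com/products/crm"},
    {"position": 2, "link": "https://www.salesforce.com/crm/"},
]
print(domain_rank(sample, "salesforce.com"))  # 2
```

Feed it `data.get("organic_results", [])` from the response above to track where a domain ranks for each query over time.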

ScraperAPI for Google

ScraperAPI also handles Google SERPs well. Instead of returning structured JSON, it gives you the raw HTML to parse yourself, which is more work but more flexible:

import requests
from bs4 import BeautifulSoup

params = {
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.google.com/search?q=best+crm+software+2026",
    "render": "true"
}

response = requests.get(
    "https://api.scraperapi.com",
    params=params
)

soup = BeautifulSoup(response.text, "html.parser")

# Parse organic results
for g in soup.select("div.g"):
    title = g.select_one("h3")
    link = g.select_one("a")
    snippet = g.select_one(".VwiC3b")
    if title and link:
        print(f"Title: {title.text}")
        print(f"URL: {link['href']}")
        print(f"Snippet: {snippet.text if snippet else 'N/A'}")
        print("---")

ScraperAPI is more cost-effective if you're already using it for other scraping tasks since one subscription covers all websites.

Method 3: DIY Scraping With Proxies

Building your own Google scraper is the hardest approach, but gives you complete control.

Basic Approach

import requests
from bs4 import BeautifulSoup
import time
import random

def scrape_google(query, proxy=None):
    url = "https://www.google.com/search"
    params = {"q": query, "num": 10, "hl": "en"}

    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9"
    }

    proxies = {"http": proxy, "https": proxy} if proxy else None

    response = requests.get(
        url,
        params=params,
        headers=headers,
        proxies=proxies,
        timeout=10
    )

    if response.status_code == 429:
        print("Rate limited! Back off.")
        return None

    soup = BeautifulSoup(response.text, "html.parser")
    results = []

    for g in soup.select("div.g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a[href]")
        snippet_el = g.select_one(".VwiC3b, .IsZvec")

        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else ""
            })

    return results

Why This Is Hard

Google's anti-bot detection is best-in-class:

  1. reCAPTCHA v3 runs silently and scores your behavior
  2. IP reputation tracking — datacenter IPs are blocked almost immediately
  3. TLS fingerprinting detects Python's requests library
  4. Behavioral analysis — uniform request timing is a dead giveaway
  5. Cookie and session tracking — missing cookies trigger blocks
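The timing point is the easiest one to fix yourself: randomize delays between queries and back off exponentially on 429 responses. A sketch:

```python
import random
import time

def polite_delay(base=5.0, jitter=5.0):
    """Sleep a random interval so request timing is never uniform."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

def backoff_delay(attempt, cap=300.0):
    """Exponential backoff with jitter for 429 responses (attempt 0, 1, 2, ...)."""
    return min(cap, 2 ** attempt + random.uniform(0, 1))
```

Call `polite_delay()` between queries, and sleep for `backoff_delay(attempt)` after each 429 before retrying. This only addresses timing; the other four signals still require good proxies and fingerprints.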

You absolutely need residential proxies for this. ThorData provides residential proxy pools that rotate IPs automatically:

# Using ThorData residential proxy
proxy = "http://user:pass@proxy.thordata.com:9090"
results = scrape_google("python web frameworks", proxy=proxy)

For better fingerprinting, consider ScrapeOps, which provides both proxy rotation and fake browser header management: it generates realistic browser fingerprints that help avoid detection.
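If you manage headers yourself instead, rotate coherent header sets rather than just the User-Agent string: a Chrome UA paired with non-Chrome Accept headers is itself a fingerprint. A hand-maintained pool, as a sketch (keep the entries current with real browser releases):

```python
import random

# Hand-maintained pool; each entry is internally consistent (UA matches platform hint)
HEADER_POOL = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Sec-Ch-Ua-Platform": '"Windows"',
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Sec-Ch-Ua-Platform": '"macOS"',
    },
]

def random_headers():
    """Pick one coherent header set at random for the next request."""
    headers = dict(random.choice(HEADER_POOL))
    headers["Accept"] = "text/html,application/xhtml+xml"
    return headers
```

Pass `headers=random_headers()` into the `scrape_google` function above in place of its fixed header dict.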

Realistic Success Rate

Success rates vary sharply by setup:

  • Without proxies: Blocked after 5-10 queries
  • Datacenter proxies: ~30-50% success rate
  • Residential proxies: ~85-95% success rate
  • SERP API services: ~99%+ success rate

Parsing SERP Features

If you're scraping raw HTML (via ScraperAPI or DIY), here's how to extract key SERP features:

def parse_serp_features(soup):
    features = {}

    # Featured Snippet
    featured = soup.select_one(".xpdopen, .ifM9O")
    if featured:
        features["featured_snippet"] = featured.get_text(strip=True)

    # People Also Ask
    paa = soup.select(".related-question-pair")
    features["people_also_ask"] = [
        q.get_text(strip=True) for q in paa
    ]

    # Related Searches
    related = soup.select(".k8XOCe")
    features["related_searches"] = [
        r.get_text(strip=True) for r in related
    ]

    # Knowledge Panel
    kp = soup.select_one(".kp-wholepage")
    if kp:
        title = kp.select_one(".qrShPb")
        features["knowledge_panel"] = {
            "title": title.text if title else None
        }

    return features

Warning: Google changes these CSS selectors frequently. Plan to update your parser every few weeks.
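A cheap guard against silent selector drift is a sanity check on every parse: a real SERP should yield several organic results, so an empty parse almost always means changed markup or a CAPTCHA page, not an empty SERP. A sketch (the thresholds are arbitrary):

```python
def check_parser_health(results, features, min_organic=3):
    """Return warnings suggesting the parser, not the SERP, is the problem."""
    problems = []
    if len(results) < min_organic:
        problems.append(f"only {len(results)} organic results parsed")
    if not features.get("people_also_ask") and not features.get("related_searches"):
        problems.append("no PAA or related searches found")
    return problems
```

Log these warnings per query; a sudden spike across all queries is your signal to update selectors rather than a genuine change in the results.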

Which Method Should You Choose?

| Factor | Custom Search API | SerpAPI | ScraperAPI | DIY |
| --- | --- | --- | --- | --- |
| Setup difficulty | Easy | Easy | Easy | Hard |
| Data completeness | Basic | Full SERP | Full (raw HTML) | Full (raw HTML) |
| Reliability | 99.9% | 99%+ | 98%+ | 60-90% |
| Cost (1K queries) | $5 | ~$50 | ~$30 | Proxy costs |
| Maintenance | None | None | Parser updates | Constant |
| Legal risk | None | Low | Low | Medium |

For most use cases, a dedicated SERP API (SerpAPI for structured data, ScraperAPI for flexibility) is the right choice. The cost savings of DIY scraping rarely justify the maintenance burden.

For low-volume needs (under 100 queries/day), the official Custom Search API is free and perfectly adequate.

For enterprise-scale rank tracking, you'll want SerpAPI or a similar dedicated service with built-in scheduling and historical data storage.

Ethical Considerations

Before scraping Google at scale, consider:

  • Google's ToS prohibit automated queries
  • Excessive scraping wastes Google's resources
  • Your IP or proxy provider could get banned
  • SERP data collected through scraping may not be redistributable

Use official APIs when they meet your needs. When they don't, use scraping services that handle compliance. Reserve DIY scraping for research and prototyping.


Need proxy infrastructure for web scraping? Check out ScraperAPI for managed scraping, ScrapeOps for proxy management, or ThorData for residential proxies.
