Google processes over 8.5 billion searches per day. That SERP data — rankings, featured snippets, People Also Ask boxes, local packs — is gold for SEO professionals, market researchers, and competitive analysts.
But Google is arguably the hardest website to scrape. Here's how to actually get SERP data in 2026 without getting blocked.
## Understanding Google SERP Structure
A modern Google results page is far more than 10 blue links. In 2026, a typical SERP includes:
- AI Overview — Google's AI-generated summary at the top
- Featured Snippets — direct answer boxes
- People Also Ask (PAA) — expandable question boxes
- Local Pack — map results for local queries
- Knowledge Panel — entity information on the right
- Organic Results — the traditional blue links
- Shopping Results — product carousels
- Video Results — YouTube and other video carousels
- Related Searches — query suggestions at the bottom
Each of these elements requires different parsing logic, and Google frequently changes its HTML structure.
## Method 1: Google Custom Search JSON API (Official)
Google offers a legitimate way to get search results programmatically through their Custom Search JSON API.
```python
import requests

def google_search(query, api_key, cx, num=10):
    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": api_key,
        "cx": cx,  # Custom Search Engine ID
        "q": query,
        "num": num
    }
    response = requests.get(url, params=params)
    results = response.json()

    for item in results.get("items", []):
        print(f"Title: {item['title']}")
        print(f"URL: {item['link']}")
        print(f"Snippet: {item['snippet']}")
        print("---")

    return results

# Usage
results = google_search(
    "best python web frameworks 2026",
    api_key="YOUR_API_KEY",
    cx="YOUR_SEARCH_ENGINE_ID"
)
```
### Limitations
- 100 free queries per day (then $5 per 1,000 queries)
- No featured snippets, PAA, or AI Overview data
- No ranking position for specific domains (you get results, but no rank tracking)
- Max 10 results per query (no deep pagination)
- Requires creating a Programmable Search Engine first
The official API is fine for basic search results but misses most of what makes SERP data valuable.
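One thing the API does support is shallow pagination: a `start` parameter (the 1-based index of the first result) lets you page through up to roughly 100 results in batches of 10. A sketch that just builds the per-page parameter dicts, without sending any requests:

```python
def paged_params(query, api_key, cx, pages=3):
    """Build one Custom Search parameter dict per page of 10 results.

    The API caps total results at about 100, so `start`
    should stay at or below 91.
    """
    batches = []
    for page in range(pages):
        start = 1 + page * 10  # 1, 11, 21, ...
        if start > 91:
            break
        batches.append({
            "key": api_key,
            "cx": cx,
            "q": query,
            "num": 10,
            "start": start,
        })
    return batches

batches = paged_params("python web frameworks", "KEY", "CX", pages=3)
print([b["start"] for b in batches])  # [1, 11, 21]
```

Each dict can be passed to `requests.get` exactly as in the `google_search` example above. This doesn't lift the 100-result ceiling, but it does get you past the first page.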
## Method 2: SerpAPI and Similar Services
Dedicated SERP APIs handle the scraping infrastructure and return structured JSON with all SERP features.
### SerpAPI
```python
import requests

params = {
    "engine": "google",
    "q": "best crm software 2026",
    "api_key": "YOUR_SERPAPI_KEY",
    "location": "Austin, Texas",
    "hl": "en",
    "gl": "us"
}

response = requests.get(
    "https://serpapi.com/search",
    params=params
)
data = response.json()

# Organic results
for result in data.get("organic_results", []):
    print(f"#{result['position']}: {result['title']}")
    print(f"  URL: {result['link']}")
    print(f"  Snippet: {result.get('snippet', 'N/A')}")

# People Also Ask
for paa in data.get("related_questions", []):
    print(f"PAA: {paa['question']}")

# Featured Snippet
snippet = data.get("answer_box", {})
if snippet:
    print(f"Featured: {snippet.get('snippet', snippet.get('answer', ''))}")
```
SerpAPI gives you everything — organic results, featured snippets, PAA boxes, local results, knowledge panels, shopping results, and more. It's the most reliable option for serious SERP data collection.
### ScraperAPI for Google
ScraperAPI also handles Google SERPs well. Instead of returning structured JSON, it gives you the raw HTML, which you parse yourself — more work, but more flexibility:
```python
import requests
from bs4 import BeautifulSoup

params = {
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.google.com/search?q=best+crm+software+2026",
    "render": "true"
}

response = requests.get(
    "https://api.scraperapi.com",
    params=params
)
soup = BeautifulSoup(response.text, "html.parser")

# Parse organic results
for g in soup.select("div.g"):
    title = g.select_one("h3")
    link = g.select_one("a")
    snippet = g.select_one(".VwiC3b")
    if title and link:
        print(f"Title: {title.text}")
        print(f"URL: {link['href']}")
        print(f"Snippet: {snippet.text if snippet else 'N/A'}")
        print("---")
```
ScraperAPI is more cost-effective if you're already using it for other scraping tasks since one subscription covers all websites.
## Method 3: DIY Scraping With Proxies
Building your own Google scraper is the hardest approach, but gives you complete control.
### Basic Approach
```python
import requests
from bs4 import BeautifulSoup
import time
import random

def scrape_google(query, proxy=None):
    url = "https://www.google.com/search"
    params = {"q": query, "num": 10, "hl": "en"}
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9"
    }
    proxies = {"http": proxy, "https": proxy} if proxy else None

    response = requests.get(
        url,
        params=params,
        headers=headers,
        proxies=proxies,
        timeout=10
    )

    if response.status_code == 429:
        print("Rate limited! Back off.")
        return None

    soup = BeautifulSoup(response.text, "html.parser")
    results = []

    for g in soup.select("div.g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a[href]")
        snippet_el = g.select_one(".VwiC3b, .IsZvec")
        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else ""
            })

    return results
```
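When `scrape_google` returns `None` on a 429, don't retry immediately. A simple exponential-backoff wrapper with jitter (a sketch; `scrape_fn` is any callable with the same return contract as `scrape_google`):

```python
import random
import time

def scrape_with_backoff(query, scrape_fn, max_retries=4, base_delay=2.0):
    """Retry a scrape with exponential backoff plus random jitter.

    Assumes `scrape_fn` returns None when rate limited,
    as scrape_google above does on HTTP 429.
    """
    for attempt in range(max_retries):
        results = scrape_fn(query)
        if results is not None:
            return results
        # 2s, 4s, 8s, ... plus jitter so retries don't align
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        print(f"Rate limited, sleeping {delay:.1f}s (attempt {attempt + 1})")
        time.sleep(delay)
    return None

# Usage:
# results = scrape_with_backoff("python web frameworks", scrape_google)
```

The jitter matters as much as the backoff: retries that land at exact power-of-two intervals are themselves a bot signal.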
### Why This Is Hard
Google's anti-bot detection is best-in-class:
- reCAPTCHA v3 runs silently and scores your behavior
- IP reputation tracking — datacenter IPs are blocked almost immediately
- TLS fingerprinting — detects Python's `requests` library by its handshake
- Behavioral analysis — uniform request timing is a dead giveaway
- Cookie and session tracking — missing cookies trigger blocks
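One cheap mitigation for the timing giveaway: never sleep a fixed interval between queries. A minimal helper:

```python
import random
import time

def human_pause(min_s=5.0, max_s=20.0):
    """Sleep a random, human-ish interval between queries."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# Between consecutive queries:
# for query in queries:
#     scrape_google(query)
#     human_pause()
```

Randomized delays won't defeat TLS or IP-level detection on their own, but uniform timing will get you flagged even through good proxies.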
You absolutely need residential proxies for this. ThorData provides residential proxy pools that rotate IPs automatically:
```python
# Using a ThorData residential proxy
proxy = "http://user:pass@proxy.thordata.com:9090"
results = scrape_google("python web frameworks", proxy=proxy)
```
For better fingerprinting, consider ScrapeOps, which provides both proxy rotation and fake browser header management — it generates realistic browser fingerprints that help avoid detection.
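As a sketch of how header rotation fits in: ScrapeOps exposes a browser-headers endpoint that returns a pool of realistic header sets (the endpoint URL and response shape below are taken from their docs at the time of writing, so verify them before relying on this):

```python
import random
import requests

def get_fake_headers(api_key):
    """Fetch a pool of realistic browser header sets from ScrapeOps.

    Endpoint and response shape per ScrapeOps docs; confirm before use.
    """
    resp = requests.get(
        "https://headers.scrapeops.io/v1/browser-headers",
        params={"api_key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("result", [])

def pick_headers(header_pool):
    """Choose one header set at random for the next request."""
    return random.choice(header_pool) if header_pool else {}

# Usage sketch:
# pool = get_fake_headers("YOUR_SCRAPEOPS_KEY")
# response = requests.get(url, headers=pick_headers(pool))
```

Fetching the pool once and rotating locally keeps you from calling the headers endpoint on every request.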
### Realistic Success Rate
Even with good proxies, expect:
- Without proxies: Blocked after 5-10 queries
- Datacenter proxies: ~30-50% success rate
- Residential proxies: ~85-95% success rate
- SERP API services: ~99%+ success rate
## Parsing SERP Features
If you're scraping raw HTML (via ScraperAPI or DIY), here's how to extract key SERP features:
```python
def parse_serp_features(soup):
    features = {}

    # Featured Snippet
    featured = soup.select_one(".xpdopen, .ifM9O")
    if featured:
        features["featured_snippet"] = featured.get_text(strip=True)

    # People Also Ask
    paa = soup.select(".related-question-pair")
    features["people_also_ask"] = [
        q.get_text(strip=True) for q in paa
    ]

    # Related Searches
    related = soup.select(".k8XOCe")
    features["related_searches"] = [
        r.get_text(strip=True) for r in related
    ]

    # Knowledge Panel
    kp = soup.select_one(".kp-wholepage")
    if kp:
        title = kp.select_one(".qrShPb")
        features["knowledge_panel"] = {
            "title": title.text if title else None
        }

    return features
```
Warning: Google changes these CSS selectors frequently. Plan to update your parser every few weeks.
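One way to soften that maintenance burden is to try a ranked list of candidate selectors and record which one matched, so you notice breakage before your data silently goes empty. A sketch (the class names here are just the examples used above):

```python
from bs4 import BeautifulSoup

def select_first(soup, selectors):
    """Return (selector, element) for the first selector that matches."""
    for sel in selectors:
        el = soup.select_one(sel)
        if el is not None:
            return sel, el
    return None, None

html = '<div class="ifM9O">Answer text</div>'
soup = BeautifulSoup(html, "html.parser")
sel, el = select_first(soup, [".xpdopen", ".ifM9O"])
print(sel)  # .ifM9O
```

Logging the matched selector per run gives you an early-warning signal: when the newest selector stops winning, Google has probably shuffled the markup again.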
## Which Method Should You Choose?
| Factor | Custom Search API | SerpAPI | ScraperAPI | DIY |
|---|---|---|---|---|
| Setup difficulty | Easy | Easy | Easy | Hard |
| Data completeness | Basic | Full SERP | Full (raw HTML) | Full (raw HTML) |
| Reliability | 99.9% | 99%+ | 98%+ | 60-90% |
| Cost (1K queries) | $5 | ~$50 | ~$30 | Proxy costs |
| Maintenance | None | None | Parser updates | Constant |
| Legal risk | None | Low | Low | Medium |
For most use cases, a dedicated SERP API (SerpAPI for structured data, ScraperAPI for flexibility) is the right choice. The cost savings of DIY scraping rarely justify the maintenance burden.
For low-volume needs (under 100 queries/day), the official Custom Search API is free and perfectly adequate.
For enterprise-scale rank tracking, you'll want SerpAPI or a similar dedicated service with built-in scheduling and historical data storage.
## Ethical Considerations
Before scraping Google at scale, consider:
- Google's ToS prohibit automated queries
- Excessive scraping wastes Google's resources
- Your IP or proxy provider could get banned
- SERP data collected through scraping may not be redistributable
Use official APIs when they meet your needs. When they don't, use scraping services that handle compliance. Reserve DIY scraping for research and prototyping.
Need proxy infrastructure for web scraping? Check out ScraperAPI for managed scraping, ScrapeOps for proxy management, or ThorData for residential proxies.