agenthustler
How to Scrape Google Search Results in 2026 (SERP Scraping Guide)

Google search results are some of the most valuable data on the internet. SEO tracking, competitor analysis, market research, lead generation — they all start with SERP data.

They're also some of the hardest pages to scrape. Google has world-class anti-bot systems, and they get better every year. In 2026, the game has shifted significantly from even two years ago.

Here's the honest breakdown: what works, what doesn't, and why most people should use an API.

The DIY Approach (And Why It Breaks)

Let's start with the simple approach so you understand why it fails:

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

def scrape_google_naive(query):
    """This will work maybe 2-3 times before you get blocked."""
    # quote_plus handles spaces and special characters in the query
    url = f"https://www.google.com/search?q={quote_plus(query)}&num=10"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/131.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }

    response = requests.get(url, headers=headers, timeout=10)

    if response.status_code == 429:
        print("Blocked: Too many requests")
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    results = []

    for g in soup.select("div.g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a")
        snippet_el = g.select_one("div.VwiC3b")

        if title_el and link_el:
            results.append({
                "title": title_el.get_text(),
                "url": link_el["href"],
                "snippet": snippet_el.get_text() if snippet_el else "",
            })

    return results

What happens when you run this:

  • First 2-3 requests: works fine
  • Requests 4-10: you start getting CAPTCHAs
  • After that: your IP gets temporarily banned
  • If you persist: longer bans, sometimes requiring you to solve CAPTCHAs manually in a real browser

Why Google is harder than other sites:

  1. Aggressive rate limiting — even a few requests per minute triggers detection
  2. Browser fingerprinting — they check dozens of browser signals beyond User-Agent
  3. CAPTCHA walls — reCAPTCHA v3 runs invisibly and scores your "humanness"
  4. Dynamic HTML structure — Google changes class names and DOM structure regularly, breaking CSS selectors
  5. Personalization — results vary by IP location, language, search history
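Point 5 matters more than people realize: if your locale parameters drift between runs, ranking comparisons are meaningless. Here's a small sketch (the helper name is mine) that pins them explicitly with the standard library:

```python
from urllib.parse import urlencode

def build_serp_url(query, num=10, gl="us", hl="en"):
    """Build a Google search URL with locale pinned, so results are
    comparable across runs. gl controls country, hl controls language,
    num caps the result count. urlencode also escapes the query."""
    params = {"q": query, "num": num, "gl": gl, "hl": hl}
    return "https://www.google.com/search?" + urlencode(params)
```

Hardcoding `gl` and `hl` won't defeat IP-based personalization, but it removes two of the variables.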

Making DIY Work (Partially)

You can extend the DIY approach with proxies and delays. It won't be bulletproof, but it works for low-volume scraping:

import requests
from bs4 import BeautifulSoup
import time
import random
from urllib.parse import quote_plus

def scrape_google_with_proxies(query, proxies_list):
    """Better approach with proxy rotation and delays."""
    url = f"https://www.google.com/search?q={quote_plus(query)}&num=10&hl=en"
    headers = {
        "User-Agent": random.choice([
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/131.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/131.0.0.0 Safari/537.36",
            "Mozilla/5.0 (X11; Linux x86_64) Chrome/131.0.0.0 Safari/537.36",
        ]),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    }

    proxy = random.choice(proxies_list)
    proxy_dict = {"http": proxy, "https": proxy}

    # Random delay to appear more human
    time.sleep(random.uniform(3, 8))

    try:
        response = requests.get(
            url, headers=headers, proxies=proxy_dict, timeout=15
        )

        if response.status_code == 429:
            print(f"Rate limited on proxy {proxy}")
            return None

        if "detected unusual traffic" in response.text.lower():
            print(f"CAPTCHA triggered on proxy {proxy}")
            return None

        return parse_serp(response.text)

    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

def parse_serp(html):
    """Parse Google SERP HTML into structured results."""
    soup = BeautifulSoup(html, "html.parser")
    results = []

    for g in soup.select("div.g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a[href]")
        snippet_el = g.select_one("div[data-sncf], div.VwiC3b")

        if title_el and link_el:
            href = link_el.get("href", "")
            if href.startswith("/url?q="):
                href = href.split("/url?q=")[1].split("&")[0]

            results.append({
                "position": len(results) + 1,
                "title": title_el.get_text(strip=True),
                "url": href,
                "snippet": snippet_el.get_text(strip=True) if snippet_el else "",
            })

    return results
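One fragile spot in parse_serp is the manual /url?q= string splitting: it leaves percent-encoding in place. A sketch using the standard library instead (unwrap_redirect is my name, not Google's):

```python
from urllib.parse import urlparse, parse_qs

def unwrap_redirect(href):
    """Unwrap Google's /url?q=... redirect links into the real target
    URL. parse_qs decodes percent-encoding for us; plain absolute
    URLs pass through unchanged."""
    if href.startswith("/url?"):
        qs = parse_qs(urlparse(href).query)
        if "q" in qs:
            return qs["q"][0]
    return href
```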

The problems with this approach:

  • You need a pool of residential proxies (datacenter proxies get blocked instantly on Google)
  • Residential proxies cost $5-15/GB, and Google pages are ~500KB each
  • You still get blocked frequently — maybe 60-70% success rate on a good day
  • CSS selectors break every few weeks when Google updates their HTML
  • You spend more time maintaining the scraper than using the data

The Realistic Approach: Use an API

I'm not saying this to sell you something. I'm saying it because I've burned dozens of hours trying to maintain DIY Google scrapers and the math doesn't work out.

A scraping API costs $0.001-0.005 per request. It handles proxies, CAPTCHAs, and HTML parsing for you. If you're scraping more than 50 queries, the API pays for itself in saved debugging time.
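To sanity-check that claim, here's the back-of-envelope math as code. All the inputs are the rough figures from this article (proxy price, page size, success rate), not measured values:

```python
def proxy_cost_per_success(gb_price=10.0, page_kb=500, success_rate=0.65):
    """Effective bandwidth cost per *successful* SERP through
    residential proxies, using this article's ballpark numbers:
    $5-15/GB, ~500KB pages, 60-70% success. Failed requests still
    burn bandwidth, so divide by the success rate."""
    cost_per_attempt = gb_price * page_kb / 1_000_000  # KB -> GB
    return cost_per_attempt / success_rate
```

At the midpoint figures that comes out around $0.008 per successful page, before you count proxy-pool management and your own debugging time, which is why the $0.001-0.005 API price usually wins.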

Using ScraperAPI for Google SERP

ScraperAPI also offers a dedicated Google search endpoint that returns structured JSON. The example below uses the general endpoint with rendering enabled and reuses the parse_serp() function from earlier:

import requests

SCRAPERAPI_KEY = "your_api_key"

def scrape_google_serp(query, num_results=10, country="us"):
    """Scrape Google SERPs using ScraperAPI's structured endpoint."""
    params = {
        "api_key": SCRAPERAPI_KEY,
        "url": f"https://www.google.com/search?q={query}&num={num_results}&gl={country}",
        "render": "true",
    }

    response = requests.get("https://api.scraperapi.com/", params=params, timeout=60)
    response.raise_for_status()

    # Parse the returned HTML
    return parse_serp(response.text)

# Example: track rankings for a keyword
results = scrape_google_serp("best python web scraping library 2026")
for r in results:
    print(f"#{r['position']}: {r['title']}")
    print(f"  URL: {r['url']}")
    print(f"  Snippet: {r['snippet'][:100]}...")
    print()

ScraperAPI offers 5,000 free API calls, which is enough to test whether SERP scraping is useful for your project before committing money.
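If you need more than the top 10, Google paginates with the `start` offset parameter; you can pass each page URL through the API in turn. A small sketch (the helper name is mine):

```python
from urllib.parse import quote_plus

def serp_page_urls(query, pages=3, per_page=10):
    """Build the search URLs for the first `pages` result pages.
    Google paginates with the `start` offset: 0, 10, 20, ..."""
    base = f"https://www.google.com/search?q={quote_plus(query)}&num={per_page}"
    return [f"{base}&start={page * per_page}" for page in range(pages)]
```

Each page is a separate API call, so scraping the top 30 costs three credits instead of one — worth remembering when estimating volume.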

Using Scrape.do

Scrape.do offers a clean API with similar capabilities:

import requests
from urllib.parse import quote

SCRAPEDO_TOKEN = "your_token"

def scrape_google_scrapedo(query, country="us"):
    """Scrape Google using Scrape.do API."""
    target_url = quote(f"https://www.google.com/search?q={query}&gl={country}&num=10")
    api_url = f"https://api.scrape.do?token={SCRAPEDO_TOKEN}&url={target_url}&render=true"

    response = requests.get(api_url, timeout=60)
    response.raise_for_status()
    return parse_serp(response.text)

Monitoring Your SERP Scraper

If you're running SERP scraping as an ongoing operation, ScrapeOps helps you monitor success rates, response times, and costs across different proxy providers. Useful when you want to optimize which API you use for which queries.
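If a hosted dashboard is overkill, even a minimal in-process tally (a rough stand-in I sketched, not the ScrapeOps API) tells you which provider is earning its keep:

```python
from collections import defaultdict

class ScrapeStats:
    """Minimal per-provider success-rate tracker. Record each
    request's outcome, then compare providers before deciding
    where to send the next batch of queries."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"ok": 0, "fail": 0})

    def record(self, provider, success):
        self.counts[provider]["ok" if success else "fail"] += 1

    def success_rate(self, provider):
        c = self.counts[provider]
        total = c["ok"] + c["fail"]
        return c["ok"] / total if total else None
```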

Building a Rank Tracker

The most common use case for SERP scraping is rank tracking — monitoring where your site (or a competitor) appears for specific keywords over time.

import sqlite3
from datetime import datetime

def init_rank_db(db_path="rankings.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS rankings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            keyword TEXT NOT NULL,
            target_domain TEXT NOT NULL,
            position INTEGER,
            url TEXT,
            timestamp TEXT NOT NULL
        )
    """)
    conn.commit()
    return conn

def track_ranking(keyword, target_domain):
    """Track where target_domain ranks for a given keyword."""
    results = scrape_google_serp(keyword)

    position = None
    matched_url = None

    for result in results:
        if target_domain in result["url"]:
            position = result["position"]
            matched_url = result["url"]
            break

    conn = init_rank_db()
    conn.execute(
        "INSERT INTO rankings (keyword, target_domain, position, url, timestamp) VALUES (?, ?, ?, ?, ?)",
        (keyword, target_domain, position, matched_url, datetime.now().isoformat())
    )
    conn.commit()
    conn.close()

    if position:
        print(f"'{keyword}': {target_domain} ranks #{position}")
    else:
        print(f"'{keyword}': {target_domain} not found in top results")

    return position

# Track multiple keywords
KEYWORDS = [
    "web scraping python tutorial",
    "best scraping API 2026",
    "python price monitoring",
]

TARGET = "yourdomain.com"

for kw in KEYWORDS:
    track_ranking(kw, TARGET)

Run this daily via cron, and you have a rank tracker that would cost $30-50/month from commercial tools.
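Once the table has a few days of data, pulling a position-over-time series back out is a single query. A sketch against the schema above:

```python
import sqlite3

def ranking_history(keyword, target_domain, db_path="rankings.db"):
    """Pull the position history for one keyword/domain pair from
    the rankings table, oldest first -- the raw data for a
    rank-over-time chart. position is NULL for days the domain
    wasn't found."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT timestamp, position FROM rankings "
        "WHERE keyword = ? AND target_domain = ? ORDER BY timestamp",
        (keyword, target_domain),
    ).fetchall()
    conn.close()
    return rows
```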

What About Google's Official APIs?

Google offers the Custom Search JSON API. It's legitimate and won't get you blocked. The catch:

  • Limited to 100 queries/day for free, then $5 per 1,000 queries
  • Only searches a custom search engine, not the real Google index
  • Results don't match what actual users see
  • No featured snippets, "People Also Ask", or other SERP features

For SEO tracking, the official API isn't useful because the results don't reflect real rankings. For general data gathering, it can work if 100 queries/day is enough.
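For completeness, here's roughly what a Custom Search call looks like. You need an API key and a Programmable Search Engine ID (cx) from the Google Cloud console; I've split the response parsing into its own function so it's easy to test:

```python
import requests

def parse_custom_search(data):
    """Flatten a Custom Search JSON response into the same
    title/url/snippet shape parse_serp() returns."""
    return [
        {"title": item.get("title"), "url": item.get("link"),
         "snippet": item.get("snippet", "")}
        for item in data.get("items", [])
    ]

def custom_search(query, api_key, cx, num=10):
    """Query Google's official Custom Search JSON API.
    Free tier: 100 queries/day, then $5 per 1,000."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": api_key, "cx": cx, "q": query, "num": num},
        timeout=30,
    )
    resp.raise_for_status()
    return parse_custom_search(resp.json())
```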

Legal Considerations

Scraping Google search results exists in a legal gray area. Google's Terms of Service prohibit automated access, but courts have generally ruled that scraping publicly available data is legal (see hiQ v. LinkedIn).

That said:

  • Don't overload their servers (rate limit your requests)
  • Don't scrape personal data from results
  • Use the data for analysis, not to build a competing search engine
  • Consider using the official API if it fits your needs

The Bottom Line

For 1-10 queries: DIY with proxies works, but expect failures.

For 10-1000 queries/day: Use ScraperAPI or Scrape.do. The cost is negligible compared to the reliability gain.

For 1000+ queries/day: Evaluate dedicated SERP APIs (some offer bulk pricing) and use ScrapeOps to monitor performance across providers.

Don't waste weeks building robust Google scraping infrastructure from scratch. I made that mistake. Use the free tiers to validate your idea, then pay for reliability when it matters.

Go Deeper

My ebook covers SERP scraping in detail — including extracting featured snippets, People Also Ask boxes, local pack results, and building automated SEO dashboards.

Get the Web Scraping Playbook — $9 on Gumroad


Questions about SERP scraping? Email me at hustler@curlship.com — I respond to everything.
