Google processes over 8.5 billion searches per day. That SERP data — rankings, featured snippets, People Also Ask boxes, local packs — is gold for SEO professionals, market researchers, and competitive analysts.
But Google is arguably the hardest website to scrape. Here's how to actually get SERP data in 2026 without getting blocked.
## Understanding Google SERP Structure
A modern Google results page is far more than 10 blue links. In 2026, a typical SERP includes:
- AI Overview — Google's AI-generated summary at the top
- Featured Snippets — direct answer boxes
- People Also Ask (PAA) — expandable question boxes
- Local Pack — map results for local queries
- Knowledge Panel — entity information on the right
- Organic Results — the traditional blue links
- Shopping Results — product carousels
- Video Results — YouTube and other video carousels
- Related Searches — query suggestions at the bottom
Each of these elements requires different parsing logic, and Google frequently changes its HTML structure.
## Method 1: Google Custom Search JSON API (Official)
Google offers a legitimate way to get search results programmatically through their Custom Search JSON API.
```python
import requests

def google_search(query, api_key, cx, num=10):
    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": api_key,
        "cx": cx,  # Custom Search Engine ID
        "q": query,
        "num": num
    }
    response = requests.get(url, params=params)
    results = response.json()

    for item in results.get("items", []):
        print(f"Title: {item['title']}")
        print(f"URL: {item['link']}")
        print(f"Snippet: {item['snippet']}")
        print("---")

    return results

# Usage
results = google_search(
    "best python web frameworks 2026",
    api_key="YOUR_API_KEY",
    cx="YOUR_SEARCH_ENGINE_ID"
)
```
### Limitations
- 100 free queries per day (then $5 per 1,000 queries)
- No featured snippets, PAA, or AI Overview data
- No ranking position for specific domains (you get results, but no rank tracking)
- Max 10 results per query (no deep pagination)
- Requires creating a Programmable Search Engine first
The official API is fine for basic search results but misses most of what makes SERP data valuable.
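One thing the API does support is shallow pagination: a `start` parameter (the 1-based index of the first result) lets you page through up to roughly 100 results in batches of 10. A sketch that just builds the per-page parameter dicts, without sending any requests:

```python
def paged_params(query, api_key, cx, pages=3):
    """Build one Custom Search parameter dict per page of 10 results.

    The API caps total results at about 100, so `start`
    should stay at or below 91.
    """
    batches = []
    for page in range(pages):
        start = 1 + page * 10  # 1, 11, 21, ...
        if start > 91:
            break
        batches.append({
            "key": api_key,
            "cx": cx,
            "q": query,
            "num": 10,
            "start": start,
        })
    return batches

batches = paged_params("python web frameworks", "KEY", "CX", pages=3)
print([b["start"] for b in batches])  # [1, 11, 21]
```

Each dict can be passed to `requests.get` exactly as in the `google_search` example above. This doesn't lift the 100-result ceiling, but it does get you past the first page.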
## Method 2: SerpAPI and Similar Services
Dedicated SERP APIs handle the scraping infrastructure and return structured JSON with all SERP features.
### SerpAPI
```python
import requests

params = {
    "engine": "google",
    "q": "best crm software 2026",
    "api_key": "YOUR_SERPAPI_KEY",
    "location": "Austin, Texas",
    "hl": "en",
    "gl": "us"
}

response = requests.get(
    "https://serpapi.com/search",
    params=params
)
data = response.json()

# Organic results
for result in data.get("organic_results", []):
    print(f"#{result['position']}: {result['title']}")
    print(f"  URL: {result['link']}")
    print(f"  Snippet: {result.get('snippet', 'N/A')}")

# People Also Ask
for paa in data.get("related_questions", []):
    print(f"PAA: {paa['question']}")

# Featured Snippet
snippet = data.get("answer_box", {})
if snippet:
    print(f"Featured: {snippet.get('snippet', snippet.get('answer', ''))}")
```
SerpAPI gives you everything — organic results, featured snippets, PAA boxes, local results, knowledge panels, shopping results, and more. It's the most reliable option for serious SERP data collection.
### ScraperAPI for Google
ScraperAPI also handles Google SERPs well. Instead of returning structured JSON, it gives you the raw HTML, which you parse yourself — more work, but more flexibility:
```python
import requests
from bs4 import BeautifulSoup

params = {
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.google.com/search?q=best+crm+software+2026",
    "render": "true"
}

response = requests.get(
    "https://api.scraperapi.com",
    params=params
)
soup = BeautifulSoup(response.text, "html.parser")

# Parse organic results
for g in soup.select("div.g"):
    title = g.select_one("h3")
    link = g.select_one("a")
    snippet = g.select_one(".VwiC3b")
    if title and link:
        print(f"Title: {title.text}")
        print(f"URL: {link['href']}")
        print(f"Snippet: {snippet.text if snippet else 'N/A'}")
        print("---")
```
ScraperAPI is more cost-effective if you're already using it for other scraping tasks since one subscription covers all websites.
## Method 3: DIY Scraping With Proxies
Building your own Google scraper is the hardest approach, but gives you complete control.
### Basic Approach
```python
import requests
from bs4 import BeautifulSoup
import time
import random

def scrape_google(query, proxy=None):
    url = "https://www.google.com/search"
    params = {"q": query, "num": 10, "hl": "en"}
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9"
    }
    proxies = {"http": proxy, "https": proxy} if proxy else None

    response = requests.get(
        url,
        params=params,
        headers=headers,
        proxies=proxies,
        timeout=10
    )

    if response.status_code == 429:
        print("Rate limited! Back off.")
        return None

    soup = BeautifulSoup(response.text, "html.parser")
    results = []

    for g in soup.select("div.g"):
        title_el = g.select_one("h3")
        link_el = g.select_one("a[href]")
        snippet_el = g.select_one(".VwiC3b, .IsZvec")
        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else ""
            })

    return results
```
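When `scrape_google` returns `None` on a 429, don't retry immediately. A simple exponential-backoff wrapper with jitter (a sketch; `scrape_fn` is any callable with the same return contract as `scrape_google`):

```python
import random
import time

def scrape_with_backoff(query, scrape_fn, max_retries=4, base_delay=2.0):
    """Retry a scrape with exponential backoff plus random jitter.

    Assumes `scrape_fn` returns None when rate limited,
    as scrape_google above does on HTTP 429.
    """
    for attempt in range(max_retries):
        results = scrape_fn(query)
        if results is not None:
            return results
        # 2s, 4s, 8s, ... plus jitter so retries don't align
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        print(f"Rate limited, sleeping {delay:.1f}s (attempt {attempt + 1})")
        time.sleep(delay)
    return None

# Usage:
# results = scrape_with_backoff("python web frameworks", scrape_google)
```

The jitter matters as much as the backoff: retries that land at exact power-of-two intervals are themselves a bot signal.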
### Why This Is Hard
Google's anti-bot detection is best-in-class:
- reCAPTCHA v3 runs silently and scores your behavior
- IP reputation tracking — datacenter IPs are blocked almost immediately
- TLS fingerprinting — detects Python's `requests` library by its handshake
- Behavioral analysis — uniform request timing is a dead giveaway
- Cookie and session tracking — missing cookies trigger blocks
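One cheap mitigation for the timing giveaway: never sleep a fixed interval between queries. A minimal helper:

```python
import random
import time

def human_pause(min_s=5.0, max_s=20.0):
    """Sleep a random, human-ish interval between queries."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# Between consecutive queries:
# for query in queries:
#     scrape_google(query)
#     human_pause()
```

Randomized delays won't defeat TLS or IP-level detection on their own, but uniform timing will get you flagged even through good proxies.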
You absolutely need residential proxies for this. ThorData provides residential proxy pools that rotate IPs automatically:
```python
# Using a ThorData residential proxy
proxy = "http://user:pass@proxy.thordata.com:9090"
results = scrape_google("python web frameworks", proxy=proxy)
```
For better fingerprinting, consider ScrapeOps, which provides both proxy rotation and fake browser header management — it generates realistic browser fingerprints that help avoid detection.
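As a sketch of how header rotation fits in: ScrapeOps exposes a browser-headers endpoint that returns a pool of realistic header sets (the endpoint URL and response shape below are taken from their docs at the time of writing, so verify them before relying on this):

```python
import random
import requests

def get_fake_headers(api_key):
    """Fetch a pool of realistic browser header sets from ScrapeOps.

    Endpoint and response shape per ScrapeOps docs; confirm before use.
    """
    resp = requests.get(
        "https://headers.scrapeops.io/v1/browser-headers",
        params={"api_key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("result", [])

def pick_headers(header_pool):
    """Choose one header set at random for the next request."""
    return random.choice(header_pool) if header_pool else {}

# Usage sketch:
# pool = get_fake_headers("YOUR_SCRAPEOPS_KEY")
# response = requests.get(url, headers=pick_headers(pool))
```

Fetching the pool once and rotating locally keeps you from calling the headers endpoint on every request.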
### Realistic Success Rate
Even with good proxies, expect:
- Without proxies: Blocked after 5-10 queries
- Datacenter proxies: ~30-50% success rate
- Residential proxies: ~85-95% success rate
- SERP API services: ~99%+ success rate
## Parsing SERP Features
If you're scraping raw HTML (via ScraperAPI or DIY), here's how to extract key SERP features:
```python
def parse_serp_features(soup):
    features = {}

    # Featured Snippet
    featured = soup.select_one(".xpdopen, .ifM9O")
    if featured:
        features["featured_snippet"] = featured.get_text(strip=True)

    # People Also Ask
    paa = soup.select(".related-question-pair")
    features["people_also_ask"] = [
        q.get_text(strip=True) for q in paa
    ]

    # Related Searches
    related = soup.select(".k8XOCe")
    features["related_searches"] = [
        r.get_text(strip=True) for r in related
    ]

    # Knowledge Panel
    kp = soup.select_one(".kp-wholepage")
    if kp:
        title = kp.select_one(".qrShPb")
        features["knowledge_panel"] = {
            "title": title.text if title else None
        }

    return features
```
Warning: Google changes these CSS selectors frequently. Plan to update your parser every few weeks.
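One way to soften that maintenance burden is to try a ranked list of candidate selectors and record which one matched, so you notice breakage before your data silently goes empty. A sketch (the class names here are just the examples used above):

```python
from bs4 import BeautifulSoup

def select_first(soup, selectors):
    """Return (selector, element) for the first selector that matches."""
    for sel in selectors:
        el = soup.select_one(sel)
        if el is not None:
            return sel, el
    return None, None

html = '<div class="ifM9O">Answer text</div>'
soup = BeautifulSoup(html, "html.parser")
sel, el = select_first(soup, [".xpdopen", ".ifM9O"])
print(sel)  # .ifM9O
```

Logging the matched selector per run gives you an early-warning signal: when the newest selector stops winning, Google has probably shuffled the markup again.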
## Which Method Should You Choose?
| Factor | Custom Search API | SerpAPI | ScraperAPI | DIY |
|---|---|---|---|---|
| Setup difficulty | Easy | Easy | Easy | Hard |
| Data completeness | Basic | Full SERP | Full (raw HTML) | Full (raw HTML) |
| Reliability | 99.9% | 99%+ | 98%+ | 60-90% |
| Cost (1K queries) | $5 | ~$50 | ~$30 | Proxy costs |
| Maintenance | None | None | Parser updates | Constant |
| Legal risk | None | Low | Low | Medium |
For most use cases, a dedicated SERP API (SerpAPI for structured data, ScraperAPI for flexibility) is the right choice. The cost savings of DIY scraping rarely justify the maintenance burden.
For low-volume needs (under 100 queries/day), the official Custom Search API is free and perfectly adequate.
For enterprise-scale rank tracking, you'll want SerpAPI or a similar dedicated service with built-in scheduling and historical data storage.
## Ethical Considerations
Before scraping Google at scale, consider:
- Google's ToS prohibit automated queries
- Excessive scraping wastes Google's resources
- Your IP or proxy provider could get banned
- SERP data collected through scraping may not be redistributable
Use official APIs when they meet your needs. When they don't, use scraping services that handle compliance. Reserve DIY scraping for research and prototyping.
Need proxy infrastructure for web scraping? Check out ScraperAPI for managed scraping, ScrapeOps for proxy management, or ThorData for residential proxies.