This blog was originally posted on the Crawlbase Blog.
Google is the most used search engine in the world, handling over 8.5 billion searches a day. From businesses checking out competitors to researchers studying online trends, Google Search results are a treasure trove of data. By scraping them, you can extract titles, URLs, descriptions, and more, and turn that data into actionable insights for better decisions.
Scraping Google, however, is not easy. Its advanced anti-bot measures, frequent updates, JavaScript requirements, and legal considerations make it tough. But the Crawlbase Crawling API has you covered with its built-in Google SERP scraper, which handles all these complexities for you.
In this post we will walk you through scraping Google Search results using Python and Crawlbase. Here’s what you will learn:
- Why you need to extract Google Search data.
- What data to extract: titles, links, and snippets.
- Challenges of scraping Google and how Crawlbase makes it easy.
- Using Crawlbase Crawling API to scrape Google SERPs.
Let’s get started.
Key Data Points to Extract from Google Search Results
When scraping Google Search results, focus on the data points that are actually useful. These will help you analyze trends, improve strategies, or feed AI models. Here’s what to look for:
- Titles: the headline of each organic result.
- URLs: the result link and the displayed destination domain.
- Descriptions (snippets): the short summary shown under each result.
- Position: where each result ranks on the page.
- Related searches and “People also ask” questions: useful for keyword and content research.
- Ads and local (snack pack) results: paid and map listings shown alongside organic results.
Understanding the Challenges of Scraping Google
Scraping Google Search results is more complicated than most websites because of Google’s anti-bot measures and technical requirements. Here’s a breakdown of the main challenges and how to tackle them responsibly:
Google’s Anti-Bot Measures
Google has systems in place to block automated bots. Here are some of the challenges:
- CAPTCHAs: Google displays CAPTCHAs to suspicious traffic, which halts scraping until they are solved.
- IP Blocking: Sending too many requests from the same IP address will get you temporarily or permanently blocked.
- Rate Limiting: Sending too many requests too quickly will trigger Google’s systems and flag your activity as suspicious.
Solution: To overcome these challenges, use the Crawlbase Crawling API with its built-in “google-serp” scraper. This scraper automatically rotates proxies, bypasses CAPTCHAs, and mimics human browsing behavior so you can get the data seamlessly.
Google SERP’s Latest JavaScript Requirement (2025)
As of 2025, Google Search result pages (SERPs) require JavaScript to be enabled for the search results to load. Without JavaScript, the page does not render, and users (and scrapers) get an empty page.
Solution: Modern scraping tools like Crawlbase’s “google-serp” scraper handle JavaScript rendering so you can easily get fully rendered Google search results.
Crawlbase Crawling API for Google Search Scraping
Crawlbase Crawling API is the best tool for scraping Google Search results. It handles JavaScript and anti-bot measures. With the built-in Google SERP scraper, you don’t need to configure anything.
Crawlbase Built-in Google SERP Scraper
Crawlbase has a built-in scraper for Google Search results called "google-serp" scraper. This scraper handles JavaScript and bot protections automatically, so scraping is easy.
Benefits of Using Crawlbase Scrapers
- JavaScript Rendering: Handles JavaScript pages.
- Anti-Bot Bypass: Avoids CAPTCHAs and blocks.
- Pre-configured Google SERP Scraper: A ready-to-go scraper for Google results, no setup required.
- IP Rotation & Error Handling: Reduces the risk of being blocked and ensures data collection.
With Crawlbase, scraping Google Search results is a breeze.
Setting Up Your Python Environment
Before you start scraping Google Search results, you’ll need to set up your Python environment. This section will walk you through installing Python, downloading the Crawlbase Python library, and choosing the best IDE for web scraping.
Getting Started with Crawlbase
- Sign Up for Crawlbase: To use the Crawlbase Crawling API, sign up on the Crawlbase website. After signing up, you’ll get your API tokens from the dashboard.
- Obtain Your API Token: Once you've signed up, you will receive two types of API tokens: a Normal Token for static websites and a JS Token for JavaScript-heavy websites. For scraping Google Search results with the 'google-serp' scraper, you can use the Normal Token (see the quick sketch below).
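As a quick check before writing any scraping logic, you can initialize the API client with your Normal Token. This is a minimal sketch; the token string below is a placeholder you replace with the value from your dashboard:

from crawlbase import CrawlingAPI

# Use the Normal Token for the 'google-serp' scraper (placeholder value)
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_NORMAL_TOKEN'})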
Installing Python and Required Libraries
If you don’t have Python installed, go to python.org and download the latest version for your operating system. Follow the installation instructions.
After installing Python, you need to install the Crawlbase Python library. Use the following command to install it:
pip install crawlbase
Choosing the Right IDE for Scraping
For web scraping, choosing the right Integrated Development Environment (IDE) is important for your workflow. Here are some options:
- VS Code: Lightweight with many Python extensions.
- PyCharm: Feature-rich IDE with good support for Python and web scraping.
- Jupyter Notebook: Great for prototyping and data analysis in an interactive environment.
Pick the one that fits your workflow, and you’re ready to start scraping Google Search results!
Scraping Google Search Results
In this section, we'll show you how to scrape Google Search results using Python, leveraging the Crawlbase Crawling API to handle JavaScript rendering and bypass anti-bot measures. We'll also cover pagination and storing the scraped data in a JSON file.
Writing the Google SERP Scraper
To scrape Google Search results, we will use the “google-serp” scraper provided by the Crawlbase Crawling API. This scraper handles all the heavy lifting, including rendering JavaScript and bypassing CAPTCHA challenges.
Here's how to write a simple Google SERP scraper using Python:
from crawlbase import CrawlingAPI
import json
from urllib.parse import quote_plus

# Initialize the Crawlbase Crawling API client
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_google_results(query, page):
    # Google paginates organic results in steps of 10 via the "start" parameter
    url = f"https://www.google.com/search?q={quote_plus(query)}&start={page * 10}"
    options = {'scraper': 'google-serp'}
    response = crawling_api.get(url, options)

    if response['headers']['pc_status'] == '200':
        # Decode as UTF-8 so non-ASCII text in the results is preserved
        response_data = json.loads(response['body'].decode('utf-8'))
        return response_data.get('body', {})
    else:
        print("Failed to fetch data.")
        return {}
The scrape_google_results function takes a search query and a page number, builds the Google search URL, and sends a request to the Crawlbase API using the built-in “google-serp” scraper. If the response is successful (status code 200), it parses the JSON body and returns the search results; otherwise, it prints an error message and returns an empty dictionary.
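Since individual requests can occasionally fail or be throttled, it can help to wrap the call in a simple retry loop. The sketch below is an optional addition (not part of the scraper itself) that retries a few times with a short pause, assuming a failed attempt returns an empty dictionary as in the function above; the name scrape_with_retries and its parameters are our own:

import time

def scrape_with_retries(query, page, max_retries=3, delay=5):
    # Retry the request a few times before giving up
    for attempt in range(1, max_retries + 1):
        results = scrape_google_results(query, page)
        if results:
            return results
        print(f"Attempt {attempt} failed, retrying in {delay} seconds...")
        time.sleep(delay)
    return {}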
Handling Pagination
Pagination is essential when scraping multiple pages of search results. Google paginates its results in sets of 10, so we iterate through pages by adjusting the start parameter in the URL.
Here’s how you can handle pagination while scraping Google:
def scrape_all_pages(query, max_pages):
    all_results = []
    for page in range(max_pages):
        print(f"Scraping page {page + 1}...")
        page_results = scrape_google_results(query, page)
        if not page_results:  # Stop if no more results are found
            print("No more results, stopping.")
            break
        all_results.append(page_results)
    return all_results
This function loops through the pages, from the first page up to the max_pages limit. If a page returns no results, it stops the scraping process.
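To reduce the chance of tripping the rate limits discussed earlier, you can also pause briefly between pages. The variant below is an optional sketch, assuming a fixed delay is acceptable for your use case:

import time

def scrape_all_pages_politely(query, max_pages, delay=2):
    all_results = []
    for page in range(max_pages):
        print(f"Scraping page {page + 1}...")
        page_results = scrape_google_results(query, page)
        if not page_results:
            print("No more results, stopping.")
            break
        all_results.append(page_results)
        time.sleep(delay)  # Pause between pages to stay well under rate limits
    return all_results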
Storing Scraped Data in a JSON File
Once you’ve gathered the data, you can store it in a structured JSON format for easy access and analysis. Below is a function that saves the scraped results to a .json file.
import json

def save_to_json(data, filename):
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    print(f"Data saved to {filename}")
This function saves the scraped data to a file with the specified filename, ensuring the data is properly formatted.
Complete Code Example
Here’s the complete code that puts everything together:
from crawlbase import CrawlingAPI
import json
from urllib.parse import quote_plus

# Initialize the Crawlbase Crawling API client
crawling_api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

def scrape_google_results(query, page):
    # Google paginates organic results in steps of 10 via the "start" parameter
    url = f"https://www.google.com/search?q={quote_plus(query)}&start={page * 10}"
    options = {'scraper': 'google-serp'}
    response = crawling_api.get(url, options)

    if response['headers']['pc_status'] == '200':
        # Decode as UTF-8 so non-ASCII text in the results is preserved
        response_data = json.loads(response['body'].decode('utf-8'))
        return response_data.get('body', {})
    else:
        print("Failed to fetch data.")
        return {}

def scrape_all_pages(query, max_pages):
    all_results = []
    for page in range(max_pages):
        print(f"Scraping page {page + 1}...")
        page_results = scrape_google_results(query, page)
        if not page_results:  # Stop if no more results are found
            print("No more results, stopping.")
            break
        all_results.append(page_results)
    return all_results

def save_to_json(data, filename):
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    print(f"Data saved to {filename}")

# Example usage
if __name__ == "__main__":
    query = "web scraping tools"
    max_pages = 2
    results = scrape_all_pages(query, max_pages)
    save_to_json(results, "google_search_results.json")
Example Output:
[
    {
        "ads": [],
        "peopleAlsoAsk": [],
        "snackPack": {
            "mapLink": "",
            "moreLocationsLink": "",
            "results": ""
        },
        "searchResults": [
            {
                "position": 1,
                "title": "Web Scraper - The #1 web scraping extension",
                "postDate": "",
                "url": "https://webscraper.io/",
                "destination": "webscraper.io",
                "description": "The most popular web scraping extension. Start scraping in minutes. Automate your tasks with our Cloud Scraper. No software to download, no coding needed."
            },
            {
                "position": 2,
                "title": "ParseHub | Free web scraping - The most powerful web scraper",
                "postDate": "",
                "url": "https://www.parsehub.com/",
                "destination": "www.parsehub.com",
                "description": "ParseHub is a free web scraping tool. Turn any site into a spreadsheet or API. As easy as clicking on the data you want to extract."
            },
            .... more
        ],
        "relatedSearches": [
            {
                "title": "web scraping tools python",
                "url": "https://google.com/search?sca_esv=12f4ef73a9b4d288&q=web+scraping+tools+python&sa=X&ved=2ahUKEwis1fmuvJmLAxUiXmwGHW42N3kQ1QJ6BAgIEAE"
            },
            {
                "title": "web scraper",
                "url": "https://google.com/search?sca_esv=12f4ef73a9b4d288&q=web+scraper&sa=X&ved=2ahUKEwis1fmuvJmLAxUiXmwGHW42N3kQ1QJ6BAgIEAI"
            },
            .... more
        ],
        "numberOfResults": null
    },
    {
        "ads": [],
        "peopleAlsoAsk": [],
        "snackPack": {
            "mapLink": "",
            "moreLocationsLink": "",
            "results": ""
        },
        "searchResults": [
            {
                "position": 1,
                "title": "What is the best, free, web scraping tool? : r/webscraping - Reddit",
                "postDate": "",
                "url": "https://www.reddit.com/r/webscraping/comments/zg93ht/what_is_the_best_free_web_scraping_tool/",
                "destination": "www.reddit.com › webscraping › comments › what_is_the_best_free_web...",
                "description": "8 груд. 2022 р. · I'm looking for a free web scraping tool that can scrape from multiple sources and pair data sets. Any recommendations?"
            },
            {
                "position": 2,
                "title": "15 Web Scraping Tools (Plus Applications and Purpose) | Indeed.com",
                "postDate": "",
                "url": "https://www.indeed.com/career-advice/career-development/web-scraping-tools",
                "destination": "www.indeed.com › ... › Career development",
                "description": "15 серп. 2024 р. · In this article, we explore what web scraping tools are, their purpose, their applications and a list of some web scraping tools you can consider."
            },
            .... more
        ],
        "relatedSearches": [
            {
                "title": "Web scraping",
                "url": "https://google.com/search?sca_esv=12f4ef73a9b4d288&q=Web+scraping&sa=X&ved=2ahUKEwjA0oaxvJmLAxW2HhAIHXghBcc4ChDVAnoECAQQAQ"
            },
            {
                "title": "Octoparse",
                "url": "https://google.com/search?sca_esv=12f4ef73a9b4d288&q=Octoparse&sa=X&ved=2ahUKEwjA0oaxvJmLAxW2HhAIHXghBcc4ChDVAnoECAQQAg"
            },
            .... more
        ],
        "numberOfResults": null
    }
]
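Once the results are saved, you will often want a flat list of organic results rather than the nested per-page structure shown above. The snippet below is a small post-processing sketch (not part of the scraper itself); the helper name flatten_results is our own, and it assumes the google_search_results.json file produced by the complete example:

import json

def flatten_results(filename):
    # Load the per-page results saved by save_to_json
    with open(filename, 'r', encoding='utf-8') as f:
        pages = json.load(f)

    flat = []
    for page in pages:
        for result in page.get('searchResults', []):
            # Keep only the key data points for each organic result
            flat.append({
                'position': result.get('position'),
                'title': result.get('title'),
                'url': result.get('url'),
                'description': result.get('description'),
            })
    return flat

# Example usage
if __name__ == "__main__":
    for item in flatten_results("google_search_results.json"):
        print(item['position'], item['title'], item['url'])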
Final Thoughts
Scraping Google Search results is valuable for SEO, market research, competitor analysis, and AI projects. With the Crawlbase Crawling API, you can handle JavaScript rendering and bypass anti-bot measures, making Google scraping simple and fast.
Using the built-in Crawlbase “google-serp” scraper, you get parsed search results without any configuration. Combined with IP rotation and error handling, it makes data extraction smooth.