
How to Scrape Google Search with Python


In this web scraping tutorial, we'll be taking a look at how to scrape Google search results using Python.

SERP (Search Engine Results Page) is the common industry term for these search result pages, and scraping them is mostly used in the SEO and brand awareness areas of data acquisition. By scraping Google we can keep track of how our products are performing and even optimize our pages to rank higher on Google.

Scraping Google can be difficult as Google uses a lot of obfuscation and anti-scraping technologies, so we'll dive into several technical points like URL formatting, dynamic HTML parsing and how to avoid being blocked.

In this article, we'll approach scraping using Python with traditional tools (an HTTP client and an HTML parser) as well as with the ScrapFly SDK.

Why Scrape Google Search?

Google search is probably the biggest public database on the internet and a great source of data for many use cases. Since Google indexes most of the public web, we have access to summaries of many popular data fields which can be used in many ways. From market research to SEO, there are many use cases for scraping Google search results.

One of the most popular use cases is SEO (Search Engine Optimization), where Google search is scraped to see what keywords your competitors are ranking for and how well they rank. In other words, knowing the search results can be a great way to get insights into your competitors and your own market performance.

Google search also features a snippet system that summarizes data from popular sources like IMDb, Wikipedia, etc. This can be used to collect data about popular subjects without having to scrape the sources directly.

Setup

In this tutorial we'll be using Python with a few popular community packages:

  • httpx as our HTTP client which we'll use to retrieve search results HTMLs.
  • parsel as our HTML parser. Since Google uses a lot of dynamic HTML we'll be using some clever XPath selectors to find the result data.

There are many popular alternatives to these two packages; for example, beautifulsoup is a common alternative to parsel. However, since Google pages can be difficult to parse, we'll be using parsel's XPath selectors, which are much more powerful than the CSS selectors used by beautifulsoup.

For the HTTP client we chose httpx as it's capable of HTTP/2, which helps to avoid blocking, though other clients like requests or aiohttp would work as well.

🤖 Google is notorious for blocking web scraping, so to follow along make sure to space out your requests to a few per minute to avoid being blocked. See the blocking section for more.
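One simple way to space requests out when following along is to sleep between SERP fetches. Here's a minimal sketch of that idea (it uses the scrape_search function we'll define later in this article; the exact delays are just a suggestion):

import random
import time

for page in [1, 2, 3]:
    results = scrape_search("scrapfly blog", page=page)  # defined in the next section
    # pause 15-30 seconds between requests to stay at a few requests per minute
    time.sleep(random.uniform(15, 30))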

Alternatively, this blog also provides code using ScrapFly SDK which solves many of the problems we'll be discussing in this tutorial automatically.

Scraping Search Results

Let's start by scraping the first page of Google search results for the query "scrapfly blog". To start, let's take a look at what happens when we input this query into Google search.

We can see that once we input the query we are taken to the search results URL that looks like google.com/search?hl=en&q=scrapfly%20blog.

So, the search is using the /search endpoint and the query is being passed as the q parameter. The hl parameter is the interface language code and we can see that it's set to en, which means English.
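Since these are just URL query parameters, we can build such URLs with Python's standard library. The make_search_url helper below is a hypothetical convenience function, not part of the scraper we build later:

from urllib.parse import urlencode

def make_search_url(query: str, language: str = "en") -> str:
    """build a Google search URL for a given query and interface language"""
    return "https://www.google.com/search?" + urlencode({"hl": language, "q": query})

print(make_search_url("scrapfly blog"))
# https://www.google.com/search?hl=en&q=scrapfly+blog
# note: urlencode encodes spaces as "+", which Google treats the same as "%20"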

This URL will get us the page but how do we parse it for the search results? For that, we'll be using XPath selectors and since Google uses dynamic HTML we'll follow the heading elements:

Simplified Google search page structure

While Google uses dynamic HTML we can still rely on relative structure for scraping. We can select <h3> elements and treat them as containers for each search result. Let's try it out:

# Python
from collections import defaultdict
from urllib.parse import quote
from httpx import Client
from parsel import Selector

# 1. Create HTTP client with headers that look like a real web browser
client = Client(
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True, # use HTTP/2 
)

def parse_search_results(selector: Selector):
    """parse search results from google search page"""
    results = []
    for box in selector.xpath("//h1[contains(text(),'Search Results')]/following-sibling::div[1]/div"):
        title = box.xpath(".//h3/text()").get()
        url = box.xpath(".//h3/../@href").get()
        text = "".join(box.xpath(".//div[@data-content-feature=1]//text()").getall())
        if not title or not url:
            continue
        url = url.split("://")[1].replace("www.", "")
        results.append((title, url, text))
    return results

def scrape_search(query: str, page=1):
    """scrape search results for a given keyword"""
    # retrieve the SERP
    url = f"https://www.google.com/search?hl=en&q={quote(query)}" + (f"&start={10*(page-1)}" if page > 1 else "")
    print(f"scraping {query=} {max_pages=}")
    results = defaultdict(list)
    response = client.get(url)
    assert response.status_code == 200, f"failed status_code={response.status_code}"
    # parse SERP for search result data
    selector = Selector(response.text)
    results["search"].extend(parse_search_results(selector))
    return dict(results)

# example use: scrape 3 pages: 1,2,3
for page in [1, 2, 3]:
    results = scrape_search("scrapfly blog", page=page)
    for result in results["search"]:
        print(result)


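# ScrapFly SDK version of the same scraper: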
from collections import defaultdict
from urllib.parse import quote
from parsel import Selector
from scrapfly import ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient("YOUR SCRAPFLY KEY")

def parse_search_results(selector: Selector):
    """parse search results from google search page"""
    results = []
    for box in selector.xpath("//h1[contains(text(),'Search Results')]/following-sibling::div[1]/div"):
        title = box.xpath(".//h3/text()").get()
        url = box.xpath(".//h3/../@href").get()
        text = "".join(box.xpath(".//div[@data-content-feature=1]//text()").getall())
        if not title or not url:
            continue
        url = url.split("://")[1].replace("www.", "")
        results.append((title, url, text))
    return results

def scrape_search(query: str, page=1, country="US"):
    """scrape search results for a given keyword"""
    # retrieve the SERP
    url = f"https://www.google.com/search?hl=en&q={quote(query)}" + (f"&start={10*(page-1)}" if page > 1 else "")
    print(f"scraping {query=} {page=}")
    results = defaultdict(list)
    result = scrapfly.scrape(ScrapeConfig(url, country=country, asp=True))
    # parse SERP for search result data
    results["search"].extend(parse_search_results(result.selector()))
    return dict(results)

# Example use: scrape 3 pages: 1,2,3
for page in [1, 2, 3]:
    results = scrape_search("scrapfly blog", page=page)
    for result in results["search"]:
        print(result)


Example Output:

('Blog - ScrapFly', 'scrapfly.io/blog/', 'Scrapfly - Web Scraping API - Headless browser. Blog on everything web scraping: tutorials, guides, highlights and industry observations.Complete web scraping tutorials for specific web scraping targets like yelp.com,\xa0...')
('Scrapecrow', 'scrapecrow.com/', 'Educational blog about web-scraping, crawling and related data extraction ... a year of professional web scraping blogging at ScrapFly and my key takeaways.')
('Scrapfly Web Scraping API free alternatives service', 'freestuff.dev/alternative/scrapfly-web-scraping-api/', 'Scrapfly is a Web Scraping API providing residential proxies, headless browser to extract data and bypass captcha / anti bot vendors. Tag: scraping, crawling.')
('Scrapfly | Software Reviews & Alternatives - Crozdesk', 'crozdesk.com/software/scrapfly', "Scrapfly Review: 'Simple but powerful Web Scraping API - We provide fully managed web scraping through a simple REST API.'")
('Issues · scrapfly/python-scrapfly - GitHub', 'github.com/scrapfly/python-scrapfly/issues', 'Scrapfly Python SDK for headless browsers and proxy rotation - Issues · scrapfly/python-scrapfly.install: python -m pip install --user --upgrade setuptools wheel. python -m pip install --user --upgrade twine pdoc3 colorama. bump: sed -i "1s/.')
('Web scraping with Python open knowledge | Hacker News', 'news.ycombinator.com/item?id=31531694', 'I recently joined a brilliant web scraping API company called ScrapFly, who provided me with the ... 1 - https://scrapfly.io/blog/parsing-html-with-xpath/.')
('Scrapfly API and Tiktok Unofficial API integrations - Meta API', 'dashboard.meta-api.io/apis/scrapfly/integrations/tiktok-unofficial', 'Connect Scrapfly & Tiktok Unofficial to sync data between apps and create easy to maintain APIs integrations without losing control.')
('Scrapfly - Crunchbase Company Profile & Funding', 'crunchbase.com/organization/scrapfly', 'Contact Email contact@scrapfly.io. Simple but powerful Web Scraping API - We provide fully managed web scraping through a simple REST API.')
...


In the example above we wrote a short Google Search Python scraper. We first created an httpx client with headers that imitate a web browser to prevent being blocked by Google.

Then, we defined our parse_search_results function that parses search results from a given SERP. Note that for parsing we use XPath selectors that match on heading text rather than the usual class or id attributes. This is because Google uses dynamic HTML and we can't rely on static class names.

Finally, we defined our scrape_search function that takes a query and page number and returns a list of search results. We can use this function to scrape Google results for a given query. So, let's take it for a spin and do some SEO analytics next!

SEO Rankings

Now that we can scrape SERPs, let's take a look at how we can use this data in SEO practice.

To start, we can use this data to determine our position in search results for given queries or keywords. For example, let's say we want to see how well our blog post about web scraping Instagram is ranking for the keyword/query "scrape instagram":


Python

ScrapFly SDK

import re

def check_ranking(keyword: str, url_match: str, max_pages=3):
    """check ranking of a given url (partial) for a given keyword"""
    rank = 1
    for page in range(1, max_pages + 1):
        results = scrape_search(keyword, page=page)
        for (title, result_url, text) in results["search"]:
            if url_match in result_url:
                print(f"rank found:\n {title}\n {text}\n {result_url}")
                return rank
            rank += 1
    return None

check_ranking(
    keyword="scraping instagram", 
    url_match="scrapfly.com/blog/",
)


import re

def check_ranking(keyword: str, url_match: str, max_pages=3, country="US"):
    """check ranking of a given url (partial) for a given keyword"""
    rank = 1
    for page in range(1, max_pages + 1):
        results = scrape_search(keyword, page=page, country=country)
        for (title, result_url, text) in results["search"]:
            if url_match in result_url:
                print(f"rank found:\n {title}\n {text}\n {result_url}")
                return rank
            rank += 1
    return None

check_ranking(
    keyword="scraping instagram", 
    url_match="scrapfly.com/blog/",
    country="US",
)



Above, we use our previously defined Google SERP scraper (the scrape_search function) to collect search result data from SERPs until we encounter our blog post. In real life, we would run this Google scraper periodically to track our search engine performance. We can also use this data to confirm that search result titles and descriptions appear as we want them to and adjust our content accordingly.
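For example, a minimal sketch of such periodic tracking could append today's rank for each tracked keyword to a CSV file. The TRACKED_KEYWORDS dictionary and the rankings.csv file below are illustrative choices of ours, not part of the tutorial's scraper:

import csv
from datetime import date

# keywords we want to track, paired with the URL fragment that identifies our page
TRACKED_KEYWORDS = {
    "scraping instagram": "scrapfly.io/blog/",
}

def record_rankings(filename="rankings.csv"):
    """check today's rank for each tracked keyword and append it to a CSV file"""
    with open(filename, "a", newline="") as f:
        writer = csv.writer(f)
        for keyword, url_match in TRACKED_KEYWORDS.items():
            rank = check_ranking(keyword=keyword, url_match=url_match)
            writer.writerow([date.today().isoformat(), keyword, rank])

record_rankings()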

Scraping Keyword Data

A big part of SEO is keyword research: understanding what people are searching for and how to optimize our content for those queries.

When it comes to Google scraping, the "People Also Ask" and "Related Searches" sections can be used in keyword research:

Google results page showing the "People also ask" and "Related searches" sections

Above, judging by the "Related Searches" section, we can see that people who are interested in scraping Instagram are also interested in doing it with Python or R. In the "People also ask" section we can also see that people are interested in how to scrape Instagram profiles and in the legality of scraping them. It's easy to imagine how these two data fields can be useful in keyword research!

Let's take a look at how we can scrape them by continuing our research for the web scraping Instagram article:

Python

ScrapFly SDK

from collections import defaultdict
import json
from urllib.parse import quote
from httpx import Client
from parsel import Selector

# 1. Create HTTP client with headers that look like a real web browser
client = Client(
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True,
)

def parse_related_search(selector: Selector):
    """get related search keywords of current SERP"""
    results = []
    for suggestion in selector.xpath(
        "//div[re:test(div/div/span/text(),'related searches','i')]/following-sibling::div//a"
    ):
        results.append("".join(suggestion.xpath(".//text()").getall()))
    return results

def parse_people_also_ask(selector: Selector):
    """get people also ask questions of current SERP"""
    return selector.css(".related-question-pair span::text").getall()

def scrape_search(query: str, page=1):
    """scrape search results for a given keyword"""
    # retrieve the SERP
    url = f"https://www.google.com/search?hl=en&q={quote(query)}" + (f"&start={10*(page-1)}" if page > 1 else "")
    print(f"scraping {query=} {page=}")
    results = defaultdict(list)
    response = client.get(url)
    assert response.status_code == 200, f"failed status_code={response.status_code}"
    # parse SERP for search result data
    selector = Selector(response.text)
    results["related_search"].extend(parse_related_search(selector))
    results["people_also_ask"].extend(parse_people_also_ask(selector))
    return dict(results)

# Example use: 
results = scrape_search("scraping instagram")
print(json.dumps(results, indent=2))


from collections import defaultdict
import json
from urllib.parse import quote
from parsel import Selector
from scrapfly import ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient("YOUR SCRAPFLY KEY")

def parse_related_search(selector: Selector):
    """get related search keywords of current SERP"""
    results = []
    for suggestion in selector.xpath(
        "//div[re:test(div/div/span/text(),'related searches','i')]/following-sibling::div//a"
    ):
        results.append("".join(suggestion.xpath(".//text()").getall()))
    return results

def parse_people_also_ask(selector: Selector):
    """get people also ask questions of current SERP"""
    return selector.css(".related-question-pair span::text").getall()

def scrape_search(query: str, page=1, country="US"):
    """scrape search results for a given keyword"""
    # retrieve the SERP
    url = f"https://www.google.com/search?hl=en&q={quote(query)}" + (f"&start={10*(page-1)}" if page > 1 else "")
    print(f"scraping {query=} {page=}")
    results = defaultdict(list)
    result = scrapfly.scrape(ScrapeConfig(url, country=country, asp=True))
    # parse SERP for search result data
    results["related_search"].extend(parse_related_search(result.selector))
    results["people_also_ask"].extend(parse_people_also_ask(result.selector))
    return dict(results)

# Example use: 
results = scrape_search("scraping instagram", country="US")
print(json.dumps(results, indent=2))


Example Output

{
  "related_search": [
    "scraping instagram with python",
    "is scraping instagram legal",
    "instagram scraping api",
    "scraping instagram data with r",
    "instagram-scraper python github",
    "instagram comment scraper python",
    "instagram scraper free",
    "instagram-scraper github"
  ],
  "people_also_ask": [
    "Does Instagram allow scraping?",
    "What does scraping Instagram mean?",
    "How do you scrape an Instagram account?",
    "Can you scrape Instagram with Python?"
  ]
}


Above, we defined two functions to parse related searches and related questions, which we can use in SEO keyword research.
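To turn this into a simple keyword research workflow, we could feed several seed queries into the scraper and count how often each suggestion comes back. This is just a sketch on top of the scrape_search function above; the seed list and expand_keywords helper are our own illustration:

from collections import Counter

def expand_keywords(seed_queries):
    """scrape related searches and questions for several seed queries and count repeats"""
    counter = Counter()
    for query in seed_queries:
        results = scrape_search(query)
        for phrase in results.get("related_search", []) + results.get("people_also_ask", []):
            counter[phrase.lower()] += 1
    return counter

# suggestions that appear for several seed queries are likely strong keyword candidates
print(expand_keywords(["scraping instagram", "instagram api", "instagram data"]).most_common(10))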

Scraping Rich Results

Google also offers rich results in the form of snippets. These are summaries of popular data sources like Wikipedia, IMDb, etc. For example, here's Google's own snippet about itself:

Rich company overview results are available on the right

Snippets can aggregate data from multiple sources in a concise, predictable format, so they're a popular web scraping target: we can easily gather information about popular subjects like companies, public figures and bodies of work. Let's take a look at how to scrape them:

Python

ScrapFly SDK

from collections import defaultdict
from urllib.parse import quote
from parsel import Selector
from httpx import Client

client = Client(
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
)

def parse_search_snippet(selector: Selector):
    snippet = selector.xpath("//h2[re:test(.,'complementary results','i')]/following-sibling::div[1]")
    data = {
        "title": snippet.xpath(".//*[@data-attrid='title']//text()").get(),
        "subtitle": snippet.xpath(".//*[@data-attrid='subtitle']//text()").get(),
        "website": snippet.xpath(".//a[@data-attrid='visit_official_site']/@href").get(),
        "description": snippet.xpath(".//div[@data-attrid='description']//span//text()").get(),
        "description_more_link": snippet.xpath(".//div[@data-attrid='description']//@href").get(),
    }
    # get summary info rows
    data["info"] = {}
    for row in snippet.xpath(".//div[contains(@class,'__wholepage-card')]//div[re:test(@data-attrid, 'kc:|hw:')]"):
        text = "".join(row.xpath(".//text()").getall()).strip()
        label, value = text.split(": ")
        data["info"][label.lower()] = value.strip()
    # get social media links
    data["socials"] = {}
    for profile in snippet.xpath(".//div[@data-attrid='kc:/common/topic:social media presence']//g-link/a"):
        label = profile.xpath(".//text()").get()
        url = profile.xpath(".//@href").get()
        data["socials"][label] = url
    return data

def scrape_search(query: str, page=1):
    """scrape search results for a given keyword"""
    url = f"https://www.google.com/search?hl=en&q={quote(query)}" + (f"&start={10*(page-1)}" if page > 1 else "")
    print(f"scraping {query=} {max_pages=}")
    results = defaultdict(list)
    response = client.get(url)
    assert response.status_code == 200, f"failed status_code={response.status_code}"
    selector = Selector(response.text)
    # parse_search_results() from the first example could be reused here for organic results
    results["rich_snippets"] = parse_search_snippet(selector)
    return dict(results)

# example:
print(scrape_search("google")["rich_snippets"])


from collections import defaultdict
from urllib.parse import quote
from parsel import Selector
from scrapfly import ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY API KEY")

def parse_search_snippet(selector: Selector):
    """parse rich snippet data from google SERP"""
    snippet = selector.xpath("//h2[re:test(.,'complementary results','i')]/following-sibling::div[1]")
    data = {
        "title": snippet.xpath(".//*[@data-attrid='title']//text()").get(),
        "subtitle": snippet.xpath(".//*[@data-attrid='subtitle']//text()").get(),
        "website": snippet.xpath(".//a[@data-attrid='visit_official_site']/@href").get(),
        "description": snippet.xpath(".//div[@data-attrid='description']//span//text()").get(),
        "description_more_link": snippet.xpath(".//div[@data-attrid='description']//@href").get(),
    }
    # get summary info rows
    data["info"] = {}
    for row in snippet.xpath(".//div[contains(@class,'__wholepage-card')]//div[re:test(@data-attrid, 'kc:|hw:')]"):
        text = "".join(row.xpath(".//text()").getall()).strip()
        label, value = text.split(": ")
        data["info"][label.lower()] = value.strip()
    # get social media links
    data["socials"] = {}
    for profile in snippet.xpath(".//div[@data-attrid='kc:/common/topic:social media presence']//g-link/a"):
        label = profile.xpath(".//text()").get()
        url = profile.xpath(".//@href").get()
        data["socials"][label] = url
    return data

def scrape_search(query: str, page=1, country="US"):
    """scrape search results for a given keyword and country"""
    url = f"https://www.google.com/search?hl=en&q={quote(query)}" + (f"&start={10*(page-1)}" if page > 1 else "")
    print(f"scraping {query=} {page=}")
    results = defaultdict(list)
    result = scrapfly.scrape(ScrapeConfig(url=url, country=country, asp=True))
    results["rich_snippets"] = parse_search_snippet(result.selector)
    return dict(results)

# example:
print(scrape_search("google", country="US")["rich_snippets"])


Example Output

{
  "title": "Google",
  "subtitle": "Technology company",
  "website": "http://www.google.com/",
  "description": "Google LLC is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics.",
  "description_more_link": "https://en.wikipedia.org/wiki/Google",
  "info": {
    "founders": "Larry Page, Sergey Brin",
    "ceo": "Sundar Pichai (Oct 2, 2015\u2013)",
    "parent organization": "Alphabet Inc.",
    "founded": "September 4, 1998, Menlo Park, CA",
    "headquarters": "Mountain View, CA",
    "subsidiaries": "YouTube, Kaggle, Mandiant, Firebase, MORE"
  },
  "socials": {
    "Twitter": "https://twitter.com/Google",
    "Facebook": "https://www.facebook.com/Google",
    "LinkedIn": "https://www.linkedin.com/company/google",
    "YouTube": "https://www.youtube.com/c/google",
    "Instagram": "https://www.instagram.com/google"
  }
}


In this example, we collected details from the rich company overview snippet. For parsing it we relied on headings and data- attributes, which can be used reliably to parse dynamic HTML documents. Note that rich snippets vary greatly depending on the subject, and in our example we only covered one kind of rich snippet; however, most of the scraping logic can be reused for other kinds.

Google offers many different kinds of rich snippets and they can be scraped in a similar way.
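One way to discover what a given snippet type offers is to dump every data-attrid element and its text from the snippet area. The list_snippet_fields helper below is purely an exploratory sketch of ours (it reuses the same "complementary results" XPath as the scraper above):

from parsel import Selector

def list_snippet_fields(html: str):
    """print every data-attrid in the complementary results area to discover parsable fields"""
    selector = Selector(html)
    snippet = selector.xpath("//h2[re:test(.,'complementary results','i')]/following-sibling::div[1]")
    for element in snippet.xpath(".//*[@data-attrid]"):
        attrid = element.xpath("@data-attrid").get()
        text = " ".join(element.xpath(".//text()").getall()).strip()
        print(f"{attrid!r}: {text[:80]!r}")

# e.g. list_snippet_fields(response.text) after retrieving a SERP that contains a rich snippet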

Blocking and Geo-Targeting

Our demo Google search scraper covered in this article works great, with two exceptions:

  • We have no way to specify search results for a specific country.
  • If we scale it up, Google will start blocking us.

Unfortunately, the only way to see results for a specific country is to use a proxy IP address from that country or a web scraping API like ScrapFly.

ScrapFly acts as middleware between your scraper and your target, automatically retrieving hard-to-reach content for you. It does this by employing millions of different proxies and smart request routing, so we can solve both problems by using ScrapFly.

ScrapFly service does the heavy lifting for you!

To use ScrapFly in web scraping we can use the ScrapFly SDK, which replaces our HTTP client (in our case httpx):

from collections import defaultdict
from urllib.parse import quote
from parsel import Selector
from scrapfly import ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient("YOUR SCRAPFLY KEY")

def parse_search_results(selector: Selector):
    """parse search results from google search page"""
    results = []
    for box in selector.xpath("//h1[contains(text(),'Search Results')]/following-sibling::div[1]/div"):
        title = box.xpath(".//h3/text()").get()
        url = box.xpath(".//h3/../@href").get()
        text = "".join(box.xpath(".//div[@data-content-feature=1]//text()").getall())
        if not title or not url:
            continue
        url = url.split("://")[1].replace("www.", "")
        results.append((title, url, text))
    return results

def scrape_search(query: str, page=1, country="US"):
    """scrape search results for a given keyword"""
    # retrieve the SERP
    url = f"https://www.google.com/search?hl=en&q={quote(query)}" + (f"&start={10*(page-1)}" if page > 1 else "")
    print(f"scraping {query=} {page=}")
    results = defaultdict(list)
    result = scrapfly.scrape(ScrapeConfig(url, country=country, asp=True))
    # parse SERP for search result data
    results["search"].extend(parse_search_results(result.selector()))
    return dict(results)

# Example use: scrape 3 pages: 1,2,3
for page in [1, 2, 3]:
    results = scrape_search("scrapfly blog", page=page)
    for result in results["search"]:
        print(result)


By replacing the httpx client with the ScrapFly SDK, we can scrape Google search results for a specific country without worrying about being blocked.
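For instance, passing a different country code returns localized results. A quick usage sketch of the scrape_search function above (the country codes are just examples):

# compare how the same query is ranked in different markets
for country in ["US", "GB", "DE"]:
    results = scrape_search("scrapfly blog", page=1, country=country)
    print(country, [url for _, url, _ in results["search"]][:3])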

FAQ

To wrap our Google search results scraper tutorial, let's take a look at some frequently asked questions:

Is it legal to scrape Google search results?

Yes, it is perfectly legal to scrape Google search results as it's public, non-copyrighted data. However, attention should be paid to copyrighted images and videos which might be included in search results.

Is there a Google search API?

No, there is no official Google search API. However, you can create your own Google scraper with Python and use it as an API by using the methods covered in this tutorial.
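As a rough illustration of the "use it as an API" idea, the scrape_search function could be exposed through a small web endpoint. FastAPI is our choice here, not part of the tutorial, and any web framework would do:

from fastapi import FastAPI

app = FastAPI()

@app.get("/search")
def search(q: str, page: int = 1):
    """expose the scrape_search function from this tutorial as a small JSON API"""
    return scrape_search(q, page=page)

# run with e.g.: uvicorn search_api:app  (assuming this file is saved as search_api.py)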

How to scrape Google Maps?

Google Maps is a separate service and can be scraped in a similar way as Google search results. For that see our tutorial on How to scrape google maps.

Summary

In this tutorial, we wrote a Google search scraper using Python with a few community packages. For retrieving the SERP content we used httpx, which supports HTTP/2 and asynchronous connections, and to parse the data we used parsel with XPath selectors to extract data from the dynamic HTML pages.

The biggest Google scraping challenges can be split into two categories:

  • Parsing complex HTML pages. For that, we used XPath selectors and focused on HTML structures like heading elements and CSS data- attributes which are less likely to change.
  • Blocking and Geo-Targeting. For that, we used ScrapFly which acts as a proxy and middleware between your scraper and the target website.

This article illustrates how easy it is to scrape Google results for free using nothing but Python and common web scraping practices.

