Introduction
Google AI Mode has emerged as one of the fastest and most comprehensive AI search experiences available. Unlike standalone chatbots such as ChatGPT and Claude, which rely on their training data, AI Mode uses live Google Search results and a "query fan-out" technique to search multiple data sources simultaneously in real time. Because both the Gemini AI model and the search infrastructure are developed by Google, the system seamlessly integrates capabilities from Google Search, Lens, and Image search for exceptionally fast performance.
For SEO professionals and businesses, AI Mode represents a critical shift in how users discover content. Optimizing for it falls under GEO (Generative Engine Optimization), an emerging field focused on appearing in AI-generated responses rather than traditional search results. Unlike the classic top 10 rankings, AI Mode draws from a much broader pool of sources, creating opportunities for brands to get featured even if they don't rank on page one. When your brand appears in these AI responses, it can drive traffic, generate qualified leads, and influence purchase decisions at the exact moment users are researching solutions. Tracking AI Mode visibility is quickly becoming as important as monitoring traditional search rankings.
In this article, we'll explore methods for scraping Google AI Mode results. We'll start with building a custom scraper that uses Playwright and proxy servers, then look at a more scalable, production-ready solution that works reliably at scale without constant maintenance.
What Google AI Mode Contains
Let's begin by understanding the information that Google AI Mode provides. It contains the following data points:
- Prompt
- Answer to your query
- Links
- Citations and links to the source pages
Most importantly, AI Mode responses vary by region: the same query will return different results depending on whether you're searching from the United States or France. All of these data points, along with the ability to localize responses, are essential for GEO and AI search tracking.
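To make localization concrete, AI Mode is reachable through a regular Google Search URL: the udm=50 parameter switches AI Mode on, while hl and gl set the language and country. These are the same parameters used by the scraper later in this article, and they may change on Google's side:

https://www.google.com/search?q=best+running+sneakers&udm=50&hl=en&gl=US
https://www.google.com/search?q=best+running+sneakers&udm=50&hl=fr&gl=FR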
In this article, we'll use Python as our primary coding language. The techniques shown can be adapted to other languages as needed. With this background in mind, let's start with the first method: writing custom code.
Challenges of web scraping Google AI Mode
A simple implementation won't work for scraping AI Mode. There are several reasons for this:
Challenge 1: Google's anti-scraping detection
Your code won't work without proxies. Google will almost immediately block requests by presenting a CAPTCHA, which is difficult to bypass. Using a premium proxy service, such as Residential Proxies, will solve most blocking issues.
However, even with proxies, expect challenges. Google's anti-scraping system is particularly sophisticated for AI Mode. Common issues include:
- Google may still occasionally serve a CAPTCHA
- Page loads can be slow
Challenge 2: Layout changes break everything
Google frequently updates its page layouts and HTML selectors. Your selectors will inevitably break, causing scraping failures.
For occasional scraping, this might be manageable. However, for production use cases where you're processing hundreds of queries daily, constantly updating and maintaining selectors becomes a significant maintenance burden that wastes developers’ time and resources.
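One way to soften this blow, which the full scraper below also uses, is to try several candidate selectors in order, so a single layout change degrades gracefully instead of failing outright. Here's a minimal sketch, with purely illustrative selectors:

CANDIDATE_SELECTORS = ["#search div", "#rso > div", "div[role='main'] div"]

def find_response_container(page):
    # Try each selector until one matches a visible element.
    for selector in CANDIDATE_SELECTORS:
        locator = page.locator(selector).first
        if locator.is_visible():  # Returns False when nothing matches.
            return locator
    return None  # Caller should fall back to page-level text extraction.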
Challenge 3: Geo and language mismatches
AI Mode responses are heavily region-dependent, so selecting proxies with the correct geolocation is critical for accurate results.
Some proxy providers allow you to specify the geolocation of the proxy server, making them ideal for this use case. Additionally, you'll need to set the Accept-Language header in your requests to match your target locale.
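Here's a minimal sketch of pairing a French proxy with a matching locale in Playwright. The country-in-username syntax is an assumption based on how some providers (including Oxylabs Residential Proxies) encode targeting, so check your provider's documentation:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            "server": "http://pr.oxylabs.io:7777",
            "username": "customer-USERNAME-cc-FR",  # Country flag syntax varies by provider.
            "password": "PASSWORD"
        }
    )
    # The locale option also sets the Accept-Language header to match.
    context = browser.new_context(locale="fr-FR")
    page = context.new_page()
    page.goto("https://www.google.com/search?q=meilleures+baskets&udm=50&hl=fr&gl=FR")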
Challenge 4: Longer, high-maintenance code
These challenges result in complex code that requires constant maintenance. You'll need to use high-quality proxies, update broken selectors, and monitor performance. Both Playwright and Selenium are resource-intensive, consuming significant CPU and memory. The maintenance overhead quickly exceeds initial expectations, making custom scrapers impractical for production environments.
Custom AI Mode web scraper
To create a Google AI Mode scraper, there are three popular headless browser tools available: Selenium, Playwright, and Puppeteer. We'll focus on Playwright as it’s popular, easy to use, and offers several advantages for modern web scraping.
You'll need to install Playwright together with its stealth plugin, then download a browser binary. Run the following commands:

pip install playwright playwright-stealth
playwright install chromium
As outlined above, these challenges make Google AI Mode scraping considerably more complex. The code below works at the time of writing, but expect it to break over time due to selector changes, blocking issues, and other factors discussed earlier.
import json
from playwright.sync_api import sync_playwright
from playwright_stealth import Stealth

query = "most comfortable sneakers for running"

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-sandbox"
        ],
        # Uncomment this to use proxies.
        # proxy={
        #     "server": "http://pr.oxylabs.io:7777",
        #     "username": "customer-USERNAME",
        #     "password": "PASSWORD"
        # }
    )
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36"
    )
    page = context.new_page()
    # Apply stealth patches to mask common automation fingerprints.
    Stealth().apply_stealth_sync(page)

    # udm=50 opens AI Mode; hl and gl control language and country.
    page.goto(f"https://www.google.com/search?q={query.replace(' ', '+')}&udm=50&hl=en&gl=US")
    page.wait_for_load_state("networkidle")

    # Look for the AI response: the first visible block of substantial
    # text that doesn't begin with a URL.
    container = None
    text_content = ""
    candidates = page.locator("#search div, #rso > div, div[role='main'] div").all()
    for candidate in candidates[:30]:
        if not candidate.is_visible():
            continue
        text = candidate.inner_text()
        if len(text) > 200 and "http" not in text[:100]:
            container = candidate
            text_content = text
            break

    # Fallback 1: find the echoed query and walk up the DOM tree.
    if not container:
        match = page.get_by_text(query).first
        if match.is_visible():
            container = match.locator("xpath=./ancestor::div[3]")
            text_content = container.inner_text()

    # Fallback 2: grab the entire page body.
    if not container or len(text_content) < 100:
        container = page.locator("body")
        text_content = page.inner_text("body")

    # Collect citation links from the response container.
    links = []
    if container:
        for link in container.locator("a").all():
            href = link.get_attribute("href")
            title = link.inner_text()
            if href and href.startswith("http"):
                links.append({"title": title.strip(), "url": href})

    # Deduplicate links by URL before saving.
    output_data = {"content": text_content.strip(), "links": list({l["url"]: l for l in links}.values())}
    print(json.dumps(output_data, indent=2))

    with open("ai_mode_data.json", "w") as f:
        json.dump(output_data, f, indent=2)

    browser.close()

print("Done!")
Running the code should save a JSON file containing the scraped AI Mode response and citations. Keep in mind that a CAPTCHA or other blocks may still interrupt execution.
The best solution: AI Mode Scraper API
As you can see, the custom code is complex, lengthy, and unreliable, and building and maintaining such scrapers takes considerable effort and resources. A far better approach is to use a dedicated service like Oxylabs Web Scraper API.
The API includes built-in support for Google AI Mode scraping. This dramatically simplifies your code by eliminating the need to manage proxies, handle browser rendering, bypass CAPTCHAs, or deal with selector changes. All these challenges are handled by the API.
To use the API, first install the requests library:
pip install requests
The API returns results in a structured JSON format, making integration straightforward. Here's a minimal code example:
import json
import requests

# API parameters.
payload = {
    "source": "google_ai_mode",
    "query": "most comfortable sneakers for running",
    "render": "html",
    "parse": True,
    "geo_location": "United States"
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    # Free trial available at dashboard.oxylabs.io
    auth=("USERNAME", "PASSWORD"),
    json=payload
)
response.raise_for_status()
print(response.text)

with open("AI_Mode_scraper_data.json", "w") as f:
    json.dump(response.json(), f, indent=2)

print("Done!")
After executing the code, the saved JSON file should contain the structured AI Mode response: the complete answer text together with citations and links. Moreover, you can scale to hundreds or thousands of requests without worrying about blocks, interruptions, or maintenance.
The key part of using the API is the payload. Let's examine it a little more carefully.
payload = {
    "source": "google_ai_mode",
    "query": "most comfortable sneakers for running",
    "render": "html",
    "parse": True,
    "geo_location": "United States"
}
The source parameter sets which scraper to use, in this case google_ai_mode. What's neat is that with a single subscription, you get access to every other pre-built source of the API, such as Google Search, Amazon, ChatGPT, and many others.
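For example, switching the same request to classic Google Search results is a one-line change (google_search is another documented source name):

payload["source"] = "google_search"  # Same endpoint and credentials, different pre-built scraper.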
The render parameter ensures that instead of receiving plain HTML, the page is first rendered in a browser and the final rendered HTML is returned. This parameter guarantees that every piece of data, both static and dynamic, is loaded before scraping.
Moreover, the parse parameter enables automatic data parsing, so you don't have to build your own parsing logic.
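Continuing the example above, reading the parsed payload looks something like this. The results/content envelope is the API's standard response wrapper, though the exact fields inside content may differ from what your query returns:

data = response.json()
ai_mode = data["results"][0]["content"]  # The parsed AI Mode payload.
print(json.dumps(ai_mode, indent=2))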
If you want to localize results for a specific region, use the geo_location parameter. You can target any country, state, city, or even precise coordinates. For example:
"geo_location": "New York,New York,United States"
For more details, see the AI Mode scraper documentation.
Advantages of using a web scraping API
The Google AI Mode scraper API makes AI response extraction effortless, with no custom code required. Here's why:
- No infrastructure to maintain: No browsers to manage, no retry logic to write, no IP rotation to code yourself. Just send an API request and get your results.
- Premium proxies under the hood: The API's built-in proxy servers are managed by a smart ML-driven engine that handles proxy rotation and CAPTCHAs for you.
- Resilience to Google layout changes: When Google updates its UI, Oxylabs updates its backend. Your code stays untouched.
Final Thoughts
Scraping Google AI Mode can be straightforward or challenging, depending on the approach you choose. Writing your own code gives you full control, but maintenance becomes a burden over time. A custom solution requires smart browser environment management, logic to bypass strict anti-scraping systems, integration of premium proxy servers, custom data parsing, and continuous maintenance, among many other considerations.
The Oxylabs Web Scraper API handles all of these hurdles for you. Just send a request and receive parsed data in seconds. The API also includes pre-built scrapers and parsers for popular sites like Google Search, Amazon, and ChatGPT, so you don't have to build and maintain separate solutions for each website.