Devil Scrapes

Posted on Jun 1

Google Autocomplete Scraper: bulk keyword research for $1.05/1K

#webscraping #python #apify #seo

Quick answer: Google Autocomplete suggestions come from the public Suggest endpoint at suggestqueries.google.com/complete/search, the same URL your browser hits when you type. There is no official bulk API. A Google autocomplete scraper fans your seed list across every language and country combination, optionally appends A–Z suffixes for long-tail discovery, and returns one typed JSON row per suggestion. The Apify Actor below does all of that for $0.001 per suggestion plus a $0.05 start fee per run (about $1.05 for a 1,000-row run), with fingerprint rotation, retry logic, and Pydantic-validated rows handled for you.

Every keyword-research SaaS you have paid for (KeywordTool.io, AnswerThePublic, Ubersuggest, SE Ranking) wraps the same Google Suggest endpoint, adds a dashboard, and charges $29–$249 per month. The underlying call is a single unauthenticated GET request that any browser makes hundreds of times a day. Yet building a scraper that survives Google's fingerprint inspection, multi-locale fan-out, A–Z expansion, rate-limit backoff, and response-shape variance takes a solid afternoon. Here is how we did it.

What is Google Suggest? 🔎

Google Suggest (the Google Autocomplete endpoint) populates the dropdown while you type in Google Search. Each character fires a GET request to suggestqueries.google.com/complete/search with your partial query, a language hint (hl), a country hint (gl), and a client type. Google responds in under 100ms with up to 15 completions — the most common full queries matching your prefix.
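To make the request shape concrete, here is a small helper that assembles that URL from the parameters named above. `q` is the endpoint's query parameter, and `hl`, `gl`, and `client` are the hints described in the paragraph; the function name is hypothetical. Sending this URL from a plain HTTP client is exactly the naive approach the next sections warn against.

```python
from urllib.parse import urlencode

SUGGEST_URL = "https://suggestqueries.google.com/complete/search"


def build_suggest_url(query: str, language: str = "en",
                      country: str = "us", client: str = "chrome") -> str:
    """Build the Suggest GET URL for one (query, language, country) tuple."""
    # q = partial query, hl = language hint, gl = country hint,
    # client = which payload variant Google should return
    params = {"client": client, "q": query, "hl": language, "gl": country}
    return f"{SUGGEST_URL}?{urlencode(params)}"
```

For example, `build_suggest_url("best laptop", "de", "de")` yields a URL ending in `?client=chrome&q=best+laptop&hl=de&gl=de`.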

The data reflects real, aggregated search behaviour rather than editorial curation (Google does filter predictions that violate its autocomplete policies): what an audience actually types, including language-specific phrasing differences that neither Ahrefs nor Semrush makes easy to pull at scale.

Does Google Autocomplete have an official bulk API?

No. As of 2026, Google publishes no documented bulk API for autocomplete suggestions. The Suggest endpoint is semi-documented — referenced in browser source and older developer guides — but there are no API keys, no OAuth scope, no rate-limit docs, and no bulk-export mode. If you want bulk data across multiple seeds, languages, and countries, you build a scraper — or you use one.

What the data looks like

One Suggest call returns up to 15 suggestions, and the Actor writes each one as a flat, denormalised row. A real output row from the chrome client:

{
  "seed": "best laptop",
  "suggestion": "best laptop 2026",
  "position": 1,
  "relevance": 900,
  "subtype": "QUERY",
  "language": "en",
  "country": "us",
  "client": "chrome",
  "is_expansion": false,
  "parent_seed": null,
  "scraped_at": "2026-05-16T12:00:00.000Z"
}

Eleven fields on every row. The relevance score is Google's own google:suggestrelevance integer — only the chrome client returns it. The is_expansion and parent_seed fields track whether a row came from a direct seed or an A–Z expansion variant (parent_seed echoes the original seed, e.g. "best laptop" for an expanded "best laptop a" row). The structure drops straight into Pandas, BigQuery, or a vector store.
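The Actor validates rows with Pydantic. As a dependency-free illustration (not the Actor's own model), the dataclass below mirrors the eleven fields of the sample row. `SuggestionRow` and `from_dict` are hypothetical names, and the only checks are for missing fields and for expansion rows that lack a `parent_seed`.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class SuggestionRow:
    """Stdlib stand-in for the Actor's 11-field output row (hypothetical)."""

    seed: str
    suggestion: str
    position: int
    relevance: Optional[int]     # google:suggestrelevance; None for firefox
    subtype: Optional[str]       # firefox returns no subtype tags
    language: str                # ISO-639-1, e.g. "en"
    country: str                 # ISO-3166-1 alpha-2, e.g. "us"
    client: str
    is_expansion: bool
    parent_seed: Optional[str]   # original seed for A–Z expansion rows
    scraped_at: str              # ISO-8601 timestamp string

    @classmethod
    def from_dict(cls, raw: dict) -> "SuggestionRow":
        """Build a row, rejecting missing fields and inconsistent lineage."""
        names = [f.name for f in fields(cls)]
        missing = [name for name in names if name not in raw]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        row = cls(**{name: raw[name] for name in names})
        # Cross-field rule: an expansion row must point back to its seed.
        if row.is_expansion and not row.parent_seed:
            raise ValueError("expansion rows must carry parent_seed")
        return row
```

Extra keys in the incoming dict are ignored, so the class keeps working if the Actor later adds fields.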

The naive approach (and why it falls apart) ⚠️

The intuitive first attempt: open Chrome DevTools, copy the Suggest XHR, replay it with requests.get(), loop. It works the first time and then breaks in interesting ways.

1. TLS fingerprinting. Google's Suggest endpoint inspects the JA3/JA4 signature of your TLS handshake. Python's requests and httpx produce fingerprints that match no real browser, so the server answers with 429s or empty responses regardless of the query. We impersonate a real Chrome 131 TLS + HTTP/2 fingerprint via curl-cffi's AsyncSession(impersonate="chrome131"), which reproduces Chrome's ClientHello (cipher suites, extension order, ALPN values) and the HTTP/2 SETTINGS frame Chrome sends. The handshake looks like Chrome because, at the protocol level, it is one.

2. Response-shape variance. The chrome client returns a 5-element array [query, suggestions[], descriptions[], urls[], extras{}]; firefox returns 3. But the extras dictionary's position drifts with the headers you send — we verified it moved from index 3 to index 4 between two request configurations. A parser that trusts a fixed index silently drops all relevance scores. We scan the trailing elements for the extras dict instead, so the parser survives future layout changes without emitting broken rows.

3. Multi-locale fan-out at scale. 100 seeds × 5 languages × 3 countries × 27 query variants (the seed plus a–z) = 40,500 API calls. Send too many at once and Google rate-limits the session; send too few and large runs take hours. We keep 2 requests in flight and honour Retry-After headers. On 408 / 429 / 5xx we retry with exponential backoff (start at 2 s, double each time, cap at 30 s, up to 5 attempts) and rotate to a fresh session if the block persists. We also rotate browser fingerprints across Chrome and Firefox profiles and route through Apify residential proxies when useProxy is on. No data, no charge: failed tuples never hit your bill.
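Points 1 and 3 together describe the HTTP layer: a Chrome-impersonating session, 2 requests in flight, and exponential backoff that defers to Retry-After. Here is a sketch under stated assumptions, not the Actor's source. The `fetch` contract returning `(status, retry_after, payload)` is hypothetical, "up to 5 attempts" is read as 5 total tries, only numeric Retry-After values are parsed, and session and fingerprint rotation are left out. `main` needs a curl_cffi release that ships the `chrome131` target.

```python
import asyncio
from typing import Awaitable, Callable, Optional

RETRYABLE = {408, 429} | set(range(500, 600))

# Hypothetical fetch contract: url -> (status, retry_after_seconds, payload).
FetchFn = Callable[[str], Awaitable[tuple]]


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 2.0, cap: float = 30.0) -> float:
    """Seconds to wait after failed attempt `attempt` (0-based)."""
    if retry_after is not None:
        return max(0.0, retry_after)        # honour the server's Retry-After
    return min(base * 2 ** attempt, cap)    # 2, 4, 8, 16, 30, 30, ...


async def fetch_with_retry(fetch: FetchFn, url: str, max_attempts: int = 5,
                           sleep=asyncio.sleep):
    """Return the payload, or None once retries are exhausted or not allowed."""
    for attempt in range(max_attempts):
        status, retry_after, payload = await fetch(url)
        if status == 200:
            return payload
        if status not in RETRYABLE or attempt == max_attempts - 1:
            break
        await sleep(backoff_delay(attempt, retry_after))
    return None  # failed tuple: no rows written, so nothing is billed


async def run_all(fetch: FetchFn, urls: list, concurrency: int = 2,
                  sleep=asyncio.sleep) -> list:
    """Fetch every URL with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url):
        async with semaphore:
            return await fetch_with_retry(fetch, url, sleep=sleep)

    return await asyncio.gather(*(worker(u) for u in urls))


def make_curl_fetch(session) -> FetchFn:
    """Adapt a curl_cffi AsyncSession to the fetch contract above."""
    async def fetch(url: str):
        resp = await session.get(url, timeout=10)
        header = resp.headers.get("Retry-After")
        retry_after = float(header) if header and header.isdigit() else None
        payload = resp.json() if resp.status_code == 200 else None
        return resp.status_code, retry_after, payload
    return fetch


async def main(urls: list) -> list:
    # Imported lazily; assumes curl_cffi with the "chrome131" target.
    from curl_cffi.requests import AsyncSession
    async with AsyncSession(impersonate="chrome131") as session:
        return await run_all(make_curl_fetch(session), urls)
```

Injecting `sleep` keeps the backoff testable without real waits. A task holds its semaphore slot while it backs off, which deliberately slows the whole run during a block.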

None of that is exotic. All of it is the gap between a script that works on Tuesday and a feed that still works on Friday.

The Actor ⚙️

The result is packaged as an Apify Actor: Google Autocomplete Bulk Scraper. Paste seeds into the Apify Console form, or drive it with the Python client:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# .call() starts the Actor and blocks until the run finishes.
run = client.actor("DevilScrapes/google-autocomplete-bulk").call(
    run_input={
        "seeds": ["best laptop", "best espresso machine", "learn python"],
        "languages": ["en", "de"],
        "countries": ["us", "de"],
        "client": "chrome",                # chrome returns relevance scores
        "maxSuggestionsPerSeed": 15,
        "enableAlphabetExpansion": True,   # also query "<seed> a" through "<seed> z"
        "useProxy": False,
    }
)

# Stream every row from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["seed"], item["suggestion"], item["relevance"])

This run produces up to 3 seeds × 2 languages × 2 countries × 27 query variants (the seed plus a–z) × 15 suggestions = 4,860 rows. Key input parameters:

Parameter	Default	Description
`seeds`	required	1–500 seed queries
`languages` / `countries`	`["en"]` / `["us"]`	ISO-639-1 / ISO-3166-1 alpha-2 codes
`client`	`chrome`	`chrome` (up to 15 suggestions, with relevance scores) or `firefox` (up to 10)
`maxSuggestionsPerSeed`	`10`	Cap per (seed × locale) tuple, 1–15
`enableAlphabetExpansion`	`false`	Also query `<seed> a` through `<seed> z`
`useProxy`	`false`	Apify residential proxy rotation
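The 4,860 figure above is the size of a full cross product. Here is a hedged sketch of that fan-out. It assumes every language is paired with every country and that expansion adds the 26 `<seed> a` through `<seed> z` queries on top of the seed itself (27 per seed, matching the pricing table below). `build_tasks`, `query_variants`, and `max_rows` are hypothetical names, and the defaults follow the parameter table.

```python
from itertools import product
from string import ascii_lowercase


def query_variants(seed: str, alphabet: bool) -> list[tuple[str, bool]]:
    """The seed itself plus, when enabled, '<seed> a' through '<seed> z'."""
    variants = [(seed, False)]
    if alphabet:
        variants += [(f"{seed} {letter}", True) for letter in ascii_lowercase]
    return variants


def build_tasks(run_input: dict) -> list[tuple[str, str, str, str, bool]]:
    """One (query, seed, language, country, is_expansion) tuple per Suggest call."""
    alphabet = run_input.get("enableAlphabetExpansion", False)
    tasks = []
    for seed, language, country in product(run_input["seeds"],
                                           run_input.get("languages", ["en"]),
                                           run_input.get("countries", ["us"])):
        for query, is_expansion in query_variants(seed, alphabet):
            tasks.append((query, seed, language, country, is_expansion))
    return tasks


def max_rows(run_input: dict) -> int:
    """Upper bound on rows: tuples × per-tuple cap (Google may return fewer)."""
    return len(build_tasks(run_input)) * run_input.get("maxSuggestionsPerSeed", 10)
```

Feeding it the `run_input` from the Python example gives 324 tuples and an upper bound of 4,860 rows.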

Use cases 💡

Long-tail SEO content briefs. Enable alphabet expansion on a keyword cluster: content marketing × a–z × en-us returns up to 390 long-tail variants (26 suffixes × 15 suggestions). Sort by relevance descending, then pass to a clustering tool like Keyword Insights or Surfer. Cost for those 390 suggestions: $0.44, including the $0.05 start fee.
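Different suffixes can surface the same completion, so a hypothetical `rank_long_tail` helper might deduplicate case-insensitively, keep the highest relevance score, and then sort descending. This sketch assumes rows shaped like the sample output above and keeps only expansion rows.

```python
def rank_long_tail(rows: list[dict]) -> list[dict]:
    """Keep expansion rows, dedupe by suggestion text, sort by relevance desc."""
    best: dict[str, dict] = {}
    for row in rows:
        if not row.get("is_expansion"):
            continue
        key = row["suggestion"].strip().lower()
        current = best.get(key)
        # Keep whichever duplicate carries the higher relevance score.
        if current is None or (row.get("relevance") or 0) > (current.get("relevance") or 0):
            best[key] = row
    # Rows without a relevance score (e.g. firefox client) sort last.
    return sorted(best.values(),
                  key=lambda r: (r.get("relevance") is None, -(r.get("relevance") or 0)))
```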

PPC negative-keyword mining. Expand a broad-match seed across the full alphabet and flag suggestions covering competitor brands or adjacent verticals. Pull quarterly, diff against the last run, add the new entries to your exclusion list.
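Here is a sketch of the quarterly diff. It assumes you keep last quarter's rows and maintain your own list of brand or vertical terms to flag. `negative_candidates` is a hypothetical helper that uses plain substring matching, which also matches inside longer words; word-boundary regexes would be stricter.

```python
def negative_candidates(current: list[dict], previous: list[dict],
                        flag_terms: list[str]) -> list[str]:
    """New suggestions since the last run that mention any flagged term."""
    seen = {row["suggestion"].lower() for row in previous}
    terms = [term.lower() for term in flag_terms]
    hits = []
    for row in current:
        text = row["suggestion"].lower()
        if text in seen:
            continue  # already reviewed last quarter
        if any(term in text for term in terms):
            hits.append(row["suggestion"])
            seen.add(text)  # avoid duplicates within this run
    return hits
```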

Multi-market localisation research. Query the same product seed (best laptop, meilleur ordinateur, bestes laptop) across en-us, fr-fr, and de-de in one run. The side-by-side comparison surfaces how markets frame the same intent — useful for landing-page copy, PDP titles, and marketplace listings.
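Because the Actor crosses every seed with every language and country, one run also returns results such as `meilleur ordinateur` in en-us. The hypothetical `side_by_side` helper below keeps only each locale's intended seed and lists its top suggestions in position order. You supply the locale-to-seed mapping yourself.

```python
def side_by_side(rows: list[dict], seed_for_locale: dict[str, str],
                 top_n: int = 5) -> dict[str, list[str]]:
    """For each 'language-country' locale, the top-N suggestions of its seed."""
    table: dict[str, list[str]] = {locale: [] for locale in seed_for_locale}
    # Sorting once by position keeps each locale's list in rank order.
    for row in sorted(rows, key=lambda r: r["position"]):
        locale = f"{row['language']}-{row['country']}"
        if seed_for_locale.get(locale) == row["seed"] and len(table[locale]) < top_n:
            table[locale].append(row["suggestion"])
    return table
```

For example, `side_by_side(rows, {"en-us": "best laptop", "fr-fr": "meilleur ordinateur", "de-de": "bestes laptop"})`.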

Trend monitoring. Schedule a recurring run on a fixed seed list and diff suggestion lists week-over-week. A suggestion that appears at the top of the list (the lowest position value) and wasn't there last week signals a spike in that query prefix, often a leading indicator before it reaches the SERP or Google Trends.
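Here is a minimal week-over-week diff, assuming both runs used the same seeds and locales. `new_top_entries` is a hypothetical helper that flags any suggestion in this week's top N for a (seed, language, country) key that last week's list for the same key did not contain. A key absent from last week flags all of its top entries.

```python
from collections import defaultdict


def group_by_key(rows: list[dict]) -> dict[tuple, list[dict]]:
    """Group rows by (seed, language, country)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["seed"], row["language"], row["country"])].append(row)
    return grouped


def new_top_entries(this_week: list[dict], last_week: list[dict],
                    top_n: int = 3) -> list[tuple]:
    """(key, suggestion, position) for top-N entries missing last week."""
    previous = {key: {r["suggestion"] for r in rows}
                for key, rows in group_by_key(last_week).items()}
    alerts = []
    for key, rows in group_by_key(this_week).items():
        for row in sorted(rows, key=lambda r: r["position"])[:top_n]:
            if row["suggestion"] not in previous.get(key, set()):
                alerts.append((key, row["suggestion"], row["position"]))
    return alerts
```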

Pricing — exact numbers 💰

Pay-Per-Event. You pay for suggestion rows that land in your dataset — no data, no charge beyond the $0.05 start fee.

Event	Price (USD)
Actor start (once per run)	$0.05
Per suggestion row written	$0.001

Concrete run costs:

Run	Suggestions	Cost (USD)
10 seeds × 1 locale × 10 suggestions	100	$0.15
50 seeds × 2 locales × 10 suggestions	1,000	$1.05
10 seeds × 1 locale × A–Z expansion (27 queries per seed) × 10 suggestions	2,700	$2.75
100 seeds × 1 locale × A–Z expansion (27 queries per seed) × 15 suggestions	40,500	$40.55
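The table is straight arithmetic: rows are capped at seeds × locales × query variants × suggestions per tuple, and cost is the start fee plus $0.001 per row. Here is a small estimator under those assumptions (the function names are hypothetical). It uses `Decimal` so cent amounts stay exact instead of picking up binary floating-point error.

```python
from decimal import Decimal

START_FEE = Decimal("0.05")  # USD, charged once per run
PER_ROW = Decimal("0.001")   # USD per suggestion row written


def rows_upper_bound(seeds: int, locales: int, alphabet: bool, per_tuple: int) -> int:
    """Maximum rows; A–Z expansion means 27 queries per seed (seed + a..z)."""
    variants = 27 if alphabet else 1
    return seeds * locales * variants * per_tuple


def run_cost(rows: int) -> Decimal:
    """Pay-per-event cost for a run that writes `rows` suggestion rows."""
    return START_FEE + PER_ROW * rows
```

Treat the result as a ceiling: tuples that return fewer suggestions than the cap cost less.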

For comparison, AnswerThePublic's Individual plan runs $99/month with a 100 queries/day cap; KeywordTool.io Pro starts at $89/month. Both wrap the same endpoint. The Actor gives you the raw data, and a first 100-row experiment costs $0.15.

The technically interesting part

The citable detail: Google Suggest's response shape is not a stable contract. Between curl-cffi with default headers and the same request with an explicit Accept-Encoding: gzip header, the extras dictionary moved from index 3 to index 4 in the response array. A parser hard-coded to response[3]["google:suggestrelevance"] then either crashes or, if the lookup is guarded, silently returns None for every relevance score as soon as Google's CDN serves the other layout. The fix, scanning from the tail for the first dict, takes thirty seconds and prevents a data-quality regression that no integration test would catch. We also validate every parsed response against four captured response shapes (standard, lean, minimal, malformed) to catch drift before it reaches your dataset.
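Here is a minimal sketch of the tail-scan fix. It assumes `payload[1]` holds the suggestion strings and that the extras dict uses the `google:suggestrelevance` and `google:suggesttype` keys. `find_extras` and `parse_suggest` are hypothetical names, and positions are 0-based here, which may not match the Actor's own numbering.

```python
from typing import Any, Optional

RELEVANCE = "google:suggestrelevance"
SUBTYPE = "google:suggesttype"


def find_extras(payload: list) -> Optional[dict]:
    """Scan from the tail for the extras dict instead of trusting an index."""
    for element in reversed(payload[2:]):
        if isinstance(element, dict):
            return element
    return None


def parse_suggest(payload: Any, max_suggestions: int = 15) -> list[dict]:
    """Turn one raw Suggest array into per-suggestion dicts."""
    if (not isinstance(payload, list) or len(payload) < 2
            or not isinstance(payload[1], list)):
        return []  # malformed: emit nothing rather than broken rows
    extras = find_extras(payload) or {}
    relevance = extras.get(RELEVANCE) or []
    subtypes = extras.get(SUBTYPE) or []
    rows = []
    for i, text in enumerate(payload[1][:max_suggestions]):
        rows.append({
            "suggestion": text,
            "position": i,  # 0-based in this sketch
            "relevance": relevance[i] if i < len(relevance) else None,
            "subtype": subtypes[i] if i < len(subtypes) else None,
        })
    return rows
```

When extras sits at index 4, `payload[3]` is a plain list, which is exactly the case a hard-coded index gets wrong. The tail scan returns the same rows for both layouts.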

Limitations 🚧

Suggestions only, not SERP results. Returns the autocomplete dropdown, not the search results page. For SERP scraping, use a dedicated SERP Actor.
No search-volume data. The Suggest endpoint returns no monthly volume estimates. The relevance score is a Google-internal signal, not a volume proxy — pair it with a keyword-volume tool or Google Keyword Planner.
Locale is a hint, not a guarantee. Google may return general suggestions where locale-specific signal is thin; expect shorter lists for niche seeds in non-English locales.
A–Z expansion only (no question modifiers). Expands seeds with a–z suffixes. Modifiers like why, how, vs, or 0–9 are not built-in — pass them as explicit seeds.
Concurrency capped at 2. Very large jobs (10k+ tuples) take time; budget accordingly.
7-day default storage on Apify FREE tier. Export immediately after the run or name the dataset to bypass the TTL.
No personalised suggestions. Reflects the unauthenticated public endpoint; signed-in personalised suggestions are out of scope.
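Because question and comparison modifiers are not expanded for you, one option is to generate them as extra seeds before the run. The default prefix and suffix lists below only cover the examples named above (why, how, vs, 0–9), and `modifier_seeds` is a hypothetical helper.

```python
def modifier_seeds(seed: str, prefixes=("why", "how"), suffixes=("vs",),
                   digits: bool = True) -> list[str]:
    """Explicit seeds covering modifiers the Actor's A–Z expansion skips."""
    seeds = [f"{prefix} {seed}" for prefix in prefixes]
    seeds += [f"{seed} {suffix}" for suffix in suffixes]
    if digits:
        seeds += [f"{seed} {digit}" for digit in range(10)]
    return seeds
```

Add the output to `seeds`, keeping the total within the 1–500 range from the parameter table. With A–Z expansion on, each extra seed also fans out into 27 queries.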

FAQ ❓

Is scraping the Google Suggest endpoint legal?
The Suggest endpoint is a public, unauthenticated HTTP API designed for browsers and developer tools to embed. This Actor sends the same request your Chrome browser sends hundreds of times a day, at conservative concurrency, collecting no personal data. Always verify your jurisdiction's data-protection rules and Google's Terms of Service before reselling raw suggestion data.

Is there an official Google Suggest API with docs and rate limits?
No. Google publishes no official documentation, quota, or API key requirement for the endpoint. It is public and undocumented — which is exactly why every keyword-research SaaS re-sells the same data: nobody holds a licensed monopoly on it.

Can I export to Google Sheets, BigQuery, or a data warehouse?
Yes. Export CSV, JSON, Excel, or XML from the Apify Console, or fire a webhook on ACTOR.RUN.SUCCEEDED into Make, Zapier, or n8n. You can also pull the dataset directly from the Apify API: GET https://api.apify.com/v2/datasets/{id}/items?format=csv&clean=true.
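Here is a stdlib sketch of that direct pull. The endpoint path and the `format`/`clean` parameters come from the answer above. Sending the token as an `Authorization: Bearer` header rather than in the query string is an assumption about your setup, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_export_url(dataset_id: str, fmt: str = "csv", clean: bool = True) -> str:
    """URL for exporting a dataset's items in the given format."""
    params = {"format": fmt, "clean": "true" if clean else "false"}
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def download_dataset(dataset_id: str, token: str, fmt: str = "csv") -> bytes:
    """Fetch the export; the token travels in a header, not the URL."""
    req = Request(dataset_export_url(dataset_id, fmt),
                  headers={"Authorization": f"Bearer {token}"})
    with urlopen(req, timeout=60) as resp:
        return resp.read()
```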

Why does firefox return fewer suggestions than chrome?
The firefox client asks for a leaner payload — up to 10 suggestions, no relevance scores, no subtype tags. Google serves different payloads per client. Use chrome unless you specifically want the smaller response.

Try it

The Actor is live on the Apify Store: apify.com/DevilScrapes/google-autocomplete-bulk. Apify's free trial gives you $5 in credit, no credit card required — roughly 4,950 suggestion rows. Run it on your top 10 seed keywords with A–Z expansion and you will have up to 3,900 long-tail variants in the dataset in under a minute. Find a field you wish it returned? Drop it in the comments.


Built by Devil Scrapes — Apify Actors with attitude. Pay-per-event, transparent pricing, no junk fields. 😈

DEV Community