Devil Scrapes

Posted on May 31

Google Ads Transparency Scraper: pull any competitor's ads for $1.20/1K

#apify #webscraping #python #marketing

Quick answer: The Google Ads Transparency Center is a public registry of every ad Google runs — but it ships no API and no bulk export. To get the data programmatically you scrape it. A Google Ads Transparency scraper sends the same RPC call the website uses and returns every ad creative for an advertiser as structured JSON. The Apify Actor below does it for $0.0012 per ad (~$1.20 per 1,000), with the TLS fingerprinting, proxy rotation, and pagination handled for you.

Google's Ads Transparency Center is one of the most underused datasets in marketing. Launched in 2023 under the EU Digital Services Act and parallel US pressure, it indexes every ad campaign currently running on Search, YouTube, Display, Shopping, Maps, and Play — keyed by advertiser. Google's own counter lists 300,000+ active creatives for a brand like Nike. For your nearest competitor, it's usually 50–500.

The catch: there's no download button. Just an interactive UI that paginates 40 creatives at a time. If you want this as a CSV — for a competitor sweep, a trademark audit, or a RAG corpus — you have to extract it yourself. Here's what that actually takes, and how I shortened it to one API call.

What is the Google Ads Transparency Center? 🔎

The Google Ads Transparency Center is a public, Google-operated registry that shows the ad creatives any verified advertiser is running, the date range each ad was shown, and roughly where. Google built it to comply with ad-disclosure regulation, so the data is public by design — you're reading the same registry a regulator would.

What it gives you per advertiser:

Every ad creative currently or recently live (text, image, video)
The landing domain each ad clicks through to
First-shown / last-shown timestamps and a rough impression count
A deep link to each creative inside the Transparency Center

What it does not give you: a search-by-keyword mode, region-filtered results from the server, or — crucially — an API.

Does the Google Ads Transparency Center have an API?

No. As of 2026 Google publishes no official API or bulk export for the Ads Transparency Center. The only programmatic surface is the internal SearchService/SearchCreatives RPC that the website itself calls. That endpoint is undocumented, returns a positional protobuf-style array (not labeled JSON), and inspects your TLS fingerprint before it answers. Scraping it reliably is the whole job — which is why a hosted Actor exists instead of a three-line snippet.

What the data looks like

Each ad creative comes back as one flat, typed row. Concrete beats abstract, so here's a real one:

{
  "advertiser_id": "AR18378488041124659201",
  "advertiser_name": "Nike Retail BV",
  "creative_id": "CR15771942603307614209",
  "creative_url": "https://adstransparency.google.com/advertiser/AR18378488041124659201/creative/CR15771942603307614209?region=anywhere",
  "landing_domain": "nike.com",
  "format_type": 1,
  "first_shown_ts": 1761145807,
  "last_shown_ts": 1778871417,
  "impressions": 205,
  "preview_image_url": "https://tpc.googlesyndication.com/archive/simgad/12774179880874022668",
  "preview_content_js_url": null,
  "region": "anywhere",
  "scraped_at": "2026-05-15T19:17:59+00:00"
}

Thirteen fields, the same shape every time, validated with Pydantic before it's written. It drops straight into Pandas, BigQuery, or a vector store — no positional-array wrangling on your side.
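Because first_shown_ts and last_shown_ts look like Unix epoch seconds, a few lines of standard-library Python turn a row into something easier to read. This is a minimal sketch rather than the Actor's own code: summarize_creative is a hypothetical helper name, and it assumes both timestamps are UTC seconds.

```python
from datetime import datetime, timezone

# Sample row copied from the example above (trimmed to the fields used here).
row = {
    "creative_id": "CR15771942603307614209",
    "first_shown_ts": 1761145807,
    "last_shown_ts": 1778871417,
    "preview_image_url": "https://tpc.googlesyndication.com/archive/simgad/12774179880874022668",
    "preview_content_js_url": None,
}


def summarize_creative(item: dict) -> dict:
    """Hypothetical helper: derive readable dates and a run length from one row.

    Assumes both *_ts fields are Unix epoch seconds in UTC.
    """
    first = datetime.fromtimestamp(item["first_shown_ts"], tz=timezone.utc)
    last = datetime.fromtimestamp(item["last_shown_ts"], tz=timezone.utc)
    return {
        "creative_id": item["creative_id"],
        "first_shown": first.date().isoformat(),
        "last_shown": last.date().isoformat(),
        "days_live": (last - first).days,
        # Static creatives carry an image URL; rich ones carry the JS URL instead.
        "preview": item["preview_image_url"] or item["preview_content_js_url"],
    }


summary = summarize_creative(row)
print(summary)
```

The same function can be mapped over every item in a dataset before loading it into Pandas or a warehouse.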

The naive approach (and why it falls apart)

The first thing every scraper-aware person tries:

Open Chrome DevTools, find the XHR call to SearchCreatives
Replay it with requests.post()
Parse the JSON, paginate, done

It breaks on the first request. Three reasons, and they're the reasons a hosted Actor earns its keep:

1. TLS fingerprinting. Google's endpoint inspects the JA3/JA4 signature of your TLS handshake. Python's stdlib SSL doesn't match any real browser, so the server returns 403 before it even reads your payload. We get around it by impersonating a real Firefox 147 TLS + HTTP/2 fingerprint via curl-cffi — so the handshake looks like a browser, because functionally it is one.

2. Cookie continuity across pagination. The pagination cursor is bound to a session cookie. Rotate IPs naively between pages and the server invalidates your cursor mid-scrape. We thread Apify residential proxies with sticky sessions so each advertiser's pagination keeps one stable exit IP and cookie jar, and we pace requests at ~1/sec to stay polite.

3. A positional, protobuf-flavored response. The reply isn't keyed JSON — it's nested arrays where meaning depends on position. One Google A/B rotation and a naive parser silently emits garbage. We pin the parser against four captured creative shapes (still image, rich video, minimal, malformed) and run live wire-validation to catch contract drift before it reaches your dataset. On 408/429/5xx we retry with exponential backoff and fail loud on partial success rather than handing you a half-empty file.
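To make point 3 concrete: a positional parser is safest when it refuses to guess. The sketch below is illustrative only. The index layout in FIELD_POSITIONS is invented for the example and is not Google's actual wire format, and parse_creative and ContractDrift are hypothetical names. The idea is to check shape and type at each position and raise instead of emitting a misaligned row.

```python
from typing import Any

# Hypothetical layout: field name -> (position in the array, expected type).
# These positions are invented for illustration, not Google's real response shape.
FIELD_POSITIONS: dict[str, tuple[int, type]] = {
    "advertiser_id": (0, str),
    "creative_id": (1, str),
    "format_type": (2, int),
    "first_shown_ts": (3, int),
}


class ContractDrift(ValueError):
    """Raised when a response no longer matches the pinned layout."""


def parse_creative(raw: list[Any]) -> dict[str, Any]:
    """Map a positional array to a labeled dict, failing loudly on any mismatch."""
    row: dict[str, Any] = {}
    for name, (index, expected) in FIELD_POSITIONS.items():
        if index >= len(raw):
            raise ContractDrift(f"{name}: position {index} missing")
        value = raw[index]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ContractDrift(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
        row[name] = value
    return row


good = ["AR18378488041124659201", "CR15771942603307614209", 1, 1761145807]
# Same values with two positions swapped, as a layout change might produce.
shifted = ["AR18378488041124659201", 1, "CR15771942603307614209", 1761145807]

print(parse_creative(good))
try:
    parse_creative(shifted)
except ContractDrift as exc:
    print("rejected:", exc)
```

A naive parser would happily write the shifted row with an integer where the creative ID belongs; the typed check turns that silent corruption into an error you can alert on.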

None of that is glamorous. All of it is the difference between a script that worked once on your laptop and a feed that survives Google's quarterly cipher rotation.
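The retry behaviour mentioned in point 3 (back off on 408/429/5xx, fail loudly otherwise) can be sketched with the standard library alone. Everything here is an assumption for illustration: fetch_with_backoff is a hypothetical name, send stands in for whatever actually performs the request, and the delays and attempt count are arbitrary defaults, not the Actor's real settings.

```python
import random
import time
from typing import Callable

# Statuses worth retrying: request timeout, rate limit, and any server error.
RETRYABLE = {408, 429} | set(range(500, 600))


def fetch_with_backoff(
    send: Callable[[], tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it returns 200, a non-retryable status, or attempts run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE:
            # e.g. a 403 from a fingerprint mismatch: retrying will not help.
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter: about 1s, 2s, 4s, ...
            sleep(base_delay * 2**attempt + random.uniform(0, 0.1))
    # Fail loudly instead of returning a partial result.
    raise RuntimeError(f"gave up after {max_attempts} attempts")


# Demo with a fake endpoint that fails twice, then succeeds.
responses = iter([(429, ""), (503, ""), (200, "ok")])
delays: list[float] = []
result = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, [round(d) for d in delays])
```

Passing sleep in as a parameter keeps the demo instant and makes the backoff schedule easy to test.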

The Actor

I packaged the result as an Apify Actor: Google Ads Transparency Scraper.

Paste a domain in the Apify Console and click Start, or run it programmatically:

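import os

from apify_client import ApifyClient

# Read the API token from the environment instead of hard-coding it.
client = ApifyClient(os.environ["APIFY_TOKEN"])

# Start the Actor and block until the run finishes.
run = client.actor("DevilScrapes/google-ads-transparency").call(
    run_input={
        "searchDomains": ["nike.com", "adidas.com"],
        "maxResults": 5000,
    }
)

# Stream every ad creative from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)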

You search by landing domain (returns every ad pointing at that domain — including ones bought by resellers and affiliates) or by advertiser ID when you already know the exact advertiser. Multiple targets per run, deduplicated automatically.

What you'd actually use this for

Four concrete patterns, not generic "competitive intelligence":

Weekly competitor sweep. Schedule a run on your top 5 competitors, diff this week's creative IDs against last week's, and alert when a new product line launches. Five competitors × ~200 ads each = roughly $1.20/week of data.
Trademark enforcement. Sweep your own domain and you'll see ads other people bought against your brand keyword — resellers, affiliates, competitors. Cross-reference advertiser IDs against your trademark portfolio and flag the unlicensed ones.
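Affiliate-fraud detection. Pull every creative whose landing_domain doesn't match the advertiser_name. Mismatches are common in crypto, nutra, and supplement verticals. As a rough first pass, compare the domain's first label rather than the full domain, since "nike.com" never appears inside "Nike Retail BV": [c for c in creatives if c["landing_domain"].split(".")[0] not in c["advertiser_name"].lower()].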
AI / RAG ingestion. Feed creative metadata plus image URLs into a vector store for image-grounded competitive analysis.
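The weekly-sweep diff from the first pattern comes down to a set difference on creative_id. A minimal sketch, assuming you have last week's and this week's dataset items loaded as lists of dicts; diff_creatives is a hypothetical helper and the IDs are shortened, made-up values.

```python
def diff_creatives(last_week: list[dict], this_week: list[dict]) -> dict[str, list[str]]:
    """Compare two runs by creative_id and report what appeared and what vanished."""
    before = {c["creative_id"] for c in last_week}
    after = {c["creative_id"] for c in this_week}
    return {
        "new": sorted(after - before),
        "dropped": sorted(before - after),
    }


# Toy data with shortened, made-up IDs.
last_week = [{"creative_id": "CR1"}, {"creative_id": "CR2"}]
this_week = [{"creative_id": "CR2"}, {"creative_id": "CR3"}]

changes = diff_creatives(last_week, this_week)
print(changes)
```

A non-empty "new" list is the signal to alert on; "dropped" tells you which campaigns were retired.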

Pricing — exact numbers 💰

Pay-per-event. You pay for ads you get, nothing for ads you ask for. No data, no charge.

$0.005 per run (covers warm-up + cookie handshake)
$0.0012 per ad written to the dataset

Pull	Cost
100 ads	$0.13
1,000 ads	$1.21
10,000 ads	$12.01
100,000 ads (monthly sweep)	$120.05 across 10 runs ($120.01 in a single run)

Apify's $5 free trial credit covers your first ~4,000 ads with no credit card. For comparison, the nearest SaaS substitutes (Adbeat, SpyFu) start around $249/month for a slice of the same Google data.
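The pricing rows follow directly from the two fees, so you can estimate any pull yourself. This is a small sketch using the fees quoted above; estimate_cost is a hypothetical helper, and the runs parameter is there because the per-run fee scales with how many runs a sweep is split into.

```python
from decimal import Decimal, ROUND_HALF_UP

PER_RUN = Decimal("0.005")   # per-run fee quoted above
PER_AD = Decimal("0.0012")   # per-ad fee quoted above


def estimate_cost(ads: int, runs: int = 1) -> Decimal:
    """Total in dollars, rounded half-up to the cent."""
    total = PER_RUN * runs + PER_AD * ads
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for ads in (100, 1_000, 10_000):
    print(f"{ads:>7,} ads: ${estimate_cost(ads)}")
```

Decimal is used instead of float so that half-cent totals such as $0.125 round the same way every time.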

The part other scrapers won't tell you

Region filtering doesn't work — and we say so. The region parameter on Google's SearchCreatives RPC is server-ignored. We tested every plausible request-body shape and none of them returned a region-narrowed result set; the browser UI shows a region selector, but the server hands back the same creative set regardless of what you pass.

So we expose region only as a metadata tag — useful for labeling exports by intended market when you run parallel campaigns, useless as a filter. No public Actor offers real region-narrowed scraping, because Google's endpoint doesn't support it. We'd rather under-promise than ship a filter that silently does nothing.

Limitations (the honest list)

No keyword search. You search by advertiser/domain, not by ad copy — Google's RPC exposes no keyword mode.
Video creatives return a JS bundle, not an MP4. You get a preview_content_js_url; rendering the actual frame needs a headless browser and is out of scope for v1.
~12 months of history. Google purges older creatives, so a wider date range just clips to what they retain.
Big brands hit a cap. Google stops paginating past ~1,000 ads per query (about 25 pages at 40 creatives each), so full-history pulls on a Nike-sized advertiser need maxPages raised deliberately.

FAQ

Is scraping the Google Ads Transparency Center legal?
The Center is a public registry Google operates under regulatory pressure. This Actor reads only what the public UI exposes, at ~1 request/second per session, collects only advertiser-level metadata (no personal data), and bypasses no authentication. As always, check your own jurisdiction and use case.

How is this different from the Facebook Ad Library?
Different platform, different endpoint. This covers Google's network (Search, YouTube, Display, Shopping, Maps, Play). For Meta, use a dedicated Facebook Ad Library scraper.

Can I export to Google Sheets or a warehouse?
Yes — export CSV/Excel/JSON from the Apify Console, webhook the dataset on ACTOR.RUN.SUCCEEDED into Make/Zapier/n8n, or pull it via the Apify API.
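If you pull the dataset via the API and want a CSV without going through the Console, the standard csv module is enough. A minimal sketch, assuming every item shares the same keys (as the flat row shape suggests); write_csv is a hypothetical helper and the sample rows are stand-ins for real dataset items.

```python
import csv
import io
from typing import Iterable


def write_csv(items: Iterable[dict], out) -> int:
    """Write dataset items to a text file-like object and return the row count.

    The header comes from the first item, which relies on every row
    sharing the same keys.
    """
    writer = None
    count = 0
    for item in items:
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(item.keys()))
            writer.writeheader()
        writer.writerow(item)
        count += 1
    return count


# Stand-in rows; in practice this would be the iterator returned by
# client.dataset(run["defaultDatasetId"]).iterate_items().
sample = [
    {"creative_id": "CR1", "landing_domain": "nike.com", "preview_image_url": None},
    {"creative_id": "CR2", "landing_domain": "nike.com", "preview_image_url": None},
]
buffer = io.StringIO()
rows_written = write_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk, pass a file opened with open("ads.csv", "w", newline="") instead of the StringIO buffer. Null fields such as a missing preview_image_url come out as empty cells.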

Why are some preview_image_url values null?
Those are rich/video/animated creatives — Google renders them with JavaScript, so you get a content.js URL instead of a static image.

Try it

The Actor is on the Apify Store: apify.com/DevilScrapes/google-ads-transparency.

Free $5 trial credit, no credit card. Run it on nike.com and you'll have ~1,000 creatives in your dataset in under a minute. Find a use case I missed, or a field you wish it returned? Drop it in the comments — I ship based on what people actually need.

Built by Devil Scrapes — Apify Actors with attitude. Pay-per-event, transparent pricing, no junk fields. 😈

DEV Community