How to Scrape the Facebook Ad Library for Competitor Ad Intelligence (No Login)

#webscraping #marketing #facebook #ai

The Facebook (Meta) Ad Library is one of the most underrated datasets in marketing. Driven by ad-transparency regulation, Meta publishes the ads running across Facebook, Instagram, Messenger, and Audience Network in a public archive that anyone can search by advertiser, keyword, and country, with no account.

That means every competitor's live creative strategy is sitting in a public endpoint. The problem is getting it out cleanly. Let me walk through how the Ad Library actually serves its data and how to scrape it without a Facebook login.

The Ad Library is public — but the data is in XHR, not HTML

Open https://www.facebook.com/ads/library/ and search for a brand. The visible page is a React app; the ad cards you see are not in the initial HTML. They arrive via background GraphQL calls (/api/graphql/) that the page fires after load. So a naive fetch followed by an HTML parse gets you almost nothing.

The robust approach is browser-intercept: drive a headless browser to the search URL, let the page make its own signed GraphQL requests, and capture the JSON responses as they come back. The page signs its own requests (tokens, doc IDs, session params), so you ride along instead of trying to forge them.
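
The search URL the browser is driven to is just the Ad Library page plus a handful of query parameters. As a minimal sketch, the helper below assembles it; the parameter names (active_status, ad_type, country, q) are the ones visible in the search URL used in this post, while the function name buildSearchUrl and its defaults are my own assumptions.

```javascript
// Hypothetical helper: build an Ad Library search URL from a few options.
// Parameter names come from the search URL used in this post;
// the default values are assumptions, not documented behavior.
function buildSearchUrl({ query, country = 'US', adType = 'all', activeStatus = 'all' }) {
  const params = new URLSearchParams({
    active_status: activeStatus,
    ad_type: adType,
    country,
    q: query,
  });
  return `https://www.facebook.com/ads/library/?${params.toString()}`;
}

console.log(buildSearchUrl({ query: 'Nike' }));
console.log(buildSearchUrl({ query: 'New Balance', country: 'DE' }));
```

URLSearchParams encodes a space as + rather than %20; both decode to the same query value, so either form should work in a query string.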

Intercepting the GraphQL responses

With Playwright, you hook the network layer and grab the responses whose payload contains ad nodes:

import { chromium } from 'playwright';

const ads = [];
const browser = await chromium.launch();
const page = await browser.newPage();

page.on('response', async (res) => {
  const url = res.url();
  if (!url.includes('/api/graphql/')) return;
  const ct = res.headers()['content-type'] || '';
  if (!ct.includes('application/json')) return;

  try {
    const json = await res.json();
    // ad nodes live under search results edges in the GraphQL payload
    const results =
      json?.data?.ad_library_main?.search_results_connection?.edges ?? [];
    for (const edge of results) {
      const node = edge?.node?.collated_results?.[0] ?? edge?.node;
      if (!node) continue;
      const snap = node.snapshot ?? {};
      ads.push({
        adArchiveId: node.ad_archive_id,
        pageName: snap.page_name,
        title: snap.title,
        body: snap.body?.text,
        ctaText: snap.cta_text,
        ctaType: snap.cta_type,
        linkUrl: snap.link_url,
        startDate: node.start_date,
        isActive: node.is_active,
        platforms: node.publisher_platform,
      });
    }
  } catch (_) { /* not the payload we want */ }
});

const q = encodeURIComponent('Nike');
await page.goto(
  `https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=US&q=${q}`,
  { waitUntil: 'networkidle' }
);
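// Scroll a few times so the page requests more result pages.
// The round count and delay are arbitrary; tune them for your query.
for (let i = 0; i < 5; i++) {
  await page.mouse.wheel(0, 4000);
  await page.waitForTimeout(1500);
}
await browser.close();
console.log(`${ads.length} ads captured (before deduplication)`);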

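Scroll the page to trigger more GraphQL pages, dedupe by adArchiveId, and you've got a structured feed of a competitor's ads. Because the URL uses active_status=all, that feed includes inactive ads too; filter on isActive to keep only what is running now.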
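
Deduplication matters because scrolling can deliver the same ad in more than one response. One way to do it, as a rough sketch: keep a Map keyed by adArchiveId and merge each new batch into it. The function name mergeAds and the sample records are made up for illustration; the record shape matches the objects pushed into ads above.

```javascript
// Merge a batch of captured ads into a Map keyed by adArchiveId.
// A later capture of the same ad only overwrites a field when it carries
// a defined value, so a sparse duplicate cannot erase earlier data.
function mergeAds(seen, batch) {
  for (const ad of batch) {
    if (!ad.adArchiveId) continue; // skip records without an ID
    const merged = { ...(seen.get(ad.adArchiveId) ?? {}) };
    for (const [key, value] of Object.entries(ad)) {
      if (value !== undefined && value !== null) merged[key] = value;
    }
    seen.set(ad.adArchiveId, merged);
  }
  return seen;
}

// Sample records are invented for illustration.
const seen = new Map();
mergeAds(seen, [
  { adArchiveId: '111', title: 'Run faster', body: undefined },
  { adArchiveId: '222', title: 'New drop' },
]);
mergeAds(seen, [{ adArchiveId: '111', title: undefined, body: 'Free shipping' }]);

const uniqueAds = [...seen.values()];
console.log(uniqueAds);
```

In the scraper itself you would call mergeAds(seen, ads) once after scrolling, or inside the response handler in place of ads.push.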

What you actually get out

The interesting fields for ad spying / competitor research (the snippet above extracts only a subset of them):

Creative: title, body, caption, plus imageUrls / videoUrls for the actual assets
Carousel cards parsed out individually — each card's image, headline, and link
Call-to-action: ctaText ("Shop Now", "Sign Up") and ctaType
Delivery signals: platforms (which of Facebook, Instagram, Messenger, or Audience Network it runs on), startDate / endDate
Advertiser context: pageName, pageLikeCount, pageCategories
For political/issue ads only (extra transparency): spend, impressions, currency

The gold here is duration + variant count. If a competitor has been running the same creative for three months across five variants, that ad is very likely working, since advertisers rarely keep paying for losers. You have just reverse-engineered their winning hook for free.
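
The duration + variant count idea can be sketched as a small grouping pass over the deduplicated records. Two assumptions are baked in and may not hold for your data: startDate is treated as a Unix timestamp in seconds, and ads that share the same normalized body text are counted as variants of one creative. The function name summarizeCreatives and the sample ads are hypothetical.

```javascript
const DAY_SECONDS = 86400;

// Group ads by normalized body text and report, per group, the number of
// variants and how many days the oldest variant has been running.
// Assumptions: startDate is Unix seconds; same body text = same creative.
function summarizeCreatives(ads, nowSeconds) {
  const groups = new Map();
  for (const ad of ads) {
    const key = (ad.body ?? '').trim().toLowerCase().replace(/\s+/g, ' ');
    if (!key || !ad.startDate) continue; // need both text and a start date
    const group = groups.get(key) ?? { body: key, variants: 0, earliestStart: Infinity };
    group.variants += 1;
    group.earliestStart = Math.min(group.earliestStart, ad.startDate);
    groups.set(key, group);
  }
  return [...groups.values()]
    .map((g) => ({
      body: g.body,
      variants: g.variants,
      daysRunning: Math.floor((nowSeconds - g.earliestStart) / DAY_SECONDS),
    }))
    // longest-running first, ties broken by variant count
    .sort((a, b) => b.daysRunning - a.daysRunning || b.variants - a.variants);
}

// Invented sample data with a fixed "now" so the result is reproducible.
const now = 1700000000;
const summary = summarizeCreatives(
  [
    { adArchiveId: '1', body: 'Free shipping today', startDate: now - 90 * DAY_SECONDS },
    { adArchiveId: '2', body: 'free  shipping today ', startDate: now - 30 * DAY_SECONDS },
    { adArchiveId: '3', body: 'New drop', startDate: now - 5 * DAY_SECONDS },
  ],
  now
);
console.log(summary);
```

Grouping on body text is a crude proxy; grouping on the landing URL or on the image asset would be equally defensible, depending on how the advertiser varies its creatives.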

Gotchas

Use residential proxies keyed to the country you're querying — the Ad Library is geo-partitioned and datacenter IPs get throttled fast.
GraphQL doc IDs change; that's why intercepting the page's own requests beats hardcoding the query. Let Meta sign it.
Respect the obvious: this is public transparency data, not private user data. Scrape creatives and CTAs, not people.
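
For the proxy point above, Playwright accepts a proxy object (server, username, password) in its launch options. A minimal sketch of picking a proxy per country follows; the hostnames are placeholders rather than a real provider, and the names PROXY_BY_COUNTRY and proxyLaunchOptions are hypothetical.

```javascript
// Country code -> proxy endpoint. Placeholder hostnames, not a real
// provider; substitute the endpoints your proxy service gives you.
const PROXY_BY_COUNTRY = {
  US: 'http://us.proxy.example.com:8000',
  DE: 'http://de.proxy.example.com:8000',
};

// Return Playwright launch options that route traffic through the proxy
// configured for the given country. Throws if none is configured, so a
// query never silently goes out from the wrong region.
function proxyLaunchOptions(country, { username, password } = {}) {
  const server = PROXY_BY_COUNTRY[country];
  if (!server) throw new Error(`No proxy configured for country ${country}`);
  const proxy = { server };
  if (username) proxy.username = username;
  if (password) proxy.password = password;
  return { proxy };
}

console.log(proxyLaunchOptions('US', { username: 'user', password: 'secret' }));
```

You would then pass the result straight to the launch call, for example chromium.launch(proxyLaunchOptions('US', { username, password })), using the same country code as in the search URL.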

If you'd rather not maintain the browser-intercept plumbing yourself, I packaged this exact approach into a Facebook Ad Library Scraper — you pass a searchQuery, country, and adType, and it returns the 20+ fields above (including separated carousel cards and spend/impressions for political ads) without any login. Either way, the lesson is the same: the best competitor-ad dataset on the internet is public by law, and you reach it by riding the page's own GraphQL calls.