DEV Community

agenthustler

Facebook Ads Library Scraping: Extract Political Ads and Advertiser Data

Skip the Setup — Use a Ready-Made Facebook Ads Scraper

Building a Facebook Ads Library scraper from scratch means handling React rendering, anti-bot detection, and constant UI changes. Our Facebook Ads Scraper on Apify is production-ready: political ads, commercial ads, spend data, creatives, and advertiser profiles — all extracted to structured JSON.

Try it free on Apify →

Free plan included. No credit card required.

What Is the Facebook Ads Library?

The Facebook Ads Library is Meta's publicly accessible archive of advertisements. Launched in 2019, it was designed to increase transparency around advertising on Facebook's platforms, particularly political and issue-based ads.

Key features of the library:

  • Political ads are archived for years: Unlike commercial ads, political and issue ads remain in the library for 7 years after last being shown
  • Spend and impression ranges: Political ads include declared spending ranges and estimated impression counts
  • Advertiser information: Including page name, disclaimers, and "Paid for by" disclosures
  • Ad creative access: The actual text, images, and videos used in ads
  • Targeting information: For political ads, some demographic targeting data is disclosed
  • Global coverage: Ads from all countries where Meta operates

Why Scrape the Facebook Ads Library?

While Meta provides a web interface and a basic API, there are compelling reasons to build automated scraping solutions:

  • The official API is limited: The Ad Library API enforces rate limits, offers limited search capabilities, and can change between versions
  • Bulk data export isn't available: You can't simply download all ads for a given country or time period
  • Research at scale: Academic researchers studying political advertising need thousands or millions of records
  • Competitive intelligence: Marketers want to analyze competitor ad strategies, creatives, and messaging patterns
  • Journalism: Investigative journalists tracking dark money, misleading claims, or coordinated campaigns need comprehensive data
  • Campaign monitoring: Political watchdogs monitoring election integrity need real-time ad tracking
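
For comparison with the scraping approach, here is a minimal sketch of what a request to the official Ad Library API might look like. It assumes the Graph API `ads_archive` endpoint; the API version (`v19.0`), the chosen field list, and the `buildAdLibraryApiUrl` helper name are illustrative assumptions, so check Meta's current documentation before relying on them.

```javascript
// Sketch: build a request URL for Meta's Ad Library API (ads_archive).
// The version string and field list are assumptions for illustration.
function buildAdLibraryApiUrl({
    accessToken,
    searchTerms,
    countries = ['US'],
    adType = 'POLITICAL_AND_ISSUE_ADS',
    limit = 100,
    apiVersion = 'v19.0'
}) {
    const url = new URL(
        `https://graph.facebook.com/${apiVersion}/ads_archive`
    );
    url.searchParams.set('search_terms', searchTerms);
    // Country codes are passed as a JSON-style array
    url.searchParams.set('ad_reached_countries', JSON.stringify(countries));
    url.searchParams.set('ad_type', adType);
    url.searchParams.set('ad_active_status', 'ALL');
    url.searchParams.set('fields', [
        'id', 'page_id', 'page_name', 'ad_creative_bodies',
        'ad_delivery_start_time', 'ad_delivery_stop_time',
        'spend', 'impressions'
    ].join(','));
    url.searchParams.set('limit', String(limit));
    url.searchParams.set('access_token', accessToken);
    return url.toString();
}
```

Calling `buildAdLibraryApiUrl({ accessToken, searchTerms: 'climate change' })` yields a URL you can pass to `fetch`. Responses are paginated, which is one reason bulk collection through the API is slow.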

Understanding the Data Structure

Before scraping, let's understand what data is available for different ad types:

Political and Issue Ads

These are ads about social issues, elections, or politics. They contain the richest data:

// Typical political ad data structure
const politicalAdSchema = {
    adId: "string",              // Unique ad identifier
    pageId: "string",            // Advertiser's Facebook page ID
    pageName: "string",          // Page name
    disclaimer: "string",        // "Paid for by" disclosure
    adCreativeBody: "string",    // Ad text content
    adCreativeLinkCaption: "string",
    adCreativeLinkTitle: "string",
    adCreativeLinkDescription: "string",
    adDeliveryStartTime: "date", // When the ad started running
    adDeliveryStopTime: "date",  // When it stopped (null if active)
    currency: "string",          // Currency code
    spendLower: "number",        // Minimum spend range
    spendUpper: "number",        // Maximum spend range
    impressionsLower: "number",  // Minimum impression range
    impressionsUpper: "number",  // Maximum impression range
    demographicDistribution: [   // Age/gender breakdown
        { percentage: "1.5%", age: "18-24", gender: "female" }
    ],
    deliveryByRegion: [          // Geographic distribution
        { percentage: "25%", region: "California" }
    ],
    publisherPlatforms: ["facebook", "instagram"],
    adSnapshot: {                // Visual creative data
        images: ["url"],
        videos: ["url"],
        cards: []                // Carousel card data
    }
};

Commercial Ads

Regular commercial ads contain less data (no spend/impressions):

const commercialAdSchema = {
    adId: "string",
    pageId: "string",
    pageName: "string",
    adCreativeBody: "string",
    adDeliveryStartTime: "date",
    adDeliveryStopTime: "date",
    publisherPlatforms: ["facebook", "instagram"],
    adSnapshot: {
        images: ["url"],
        videos: ["url"]
    }
    // No spend, impressions, or demographic data
};
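
Because the two record types differ only in the political-only fields, it can help to normalize them into one shape before storing them. The following is a minimal sketch under that assumption; `normalizeAdRecord` and the `adKind` field are hypothetical names introduced here, not part of the Ad Library.

```javascript
// Sketch: give political and commercial records one shape.
// POLITICAL_ONLY_FIELDS mirrors the two schemas shown above.
const POLITICAL_ONLY_FIELDS = [
    'disclaimer', 'currency',
    'spendLower', 'spendUpper',
    'impressionsLower', 'impressionsUpper'
];

function normalizeAdRecord(ad) {
    // Treat a record as political if any political-only signal is present
    const isPolitical = ad.spendLower != null ||
        ad.impressionsLower != null ||
        Boolean(ad.disclaimer);

    const normalized = {
        ...ad,
        adKind: isPolitical ? 'political' : 'commercial' // hypothetical field
    };

    // Fill missing political-only fields with null so every row
    // has the same columns
    for (const field of POLITICAL_ONLY_FIELDS) {
        if (normalized[field] === undefined) normalized[field] = null;
    }
    return normalized;
}
```

With a uniform shape, downstream code can check `ad.spendLower === null` instead of testing whether the property exists.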

Setting Up Your Scraping Environment

The Facebook Ads Library is a dynamic web application built with React. Scraping it effectively requires a browser automation approach. Let's set up a robust scraping pipeline:

// facebook-ads-scraper.js
const { chromium } = require('playwright');

class FacebookAdsLibraryScraper {
    constructor(options = {}) {
        this.browser = null;
        this.context = null;
        this.page = null;
        this.baseUrl = 'https://www.facebook.com/ads/library/';
        this.delay = options.delay || 2000;
        this.results = [];
    }

    async init() {
        this.browser = await chromium.launch({
            headless: true,
            args: [
                '--no-sandbox',
                '--disable-setuid-sandbox',
                '--disable-dev-shm-usage'
            ]
        });

        this.context = await this.browser.newContext({
            viewport: { width: 1440, height: 900 },
            userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
                'AppleWebKit/537.36 (KHTML, like Gecko) ' +
                'Chrome/120.0.0.0 Safari/537.36',
            locale: 'en-US'
        });

        this.page = await this.context.newPage();
    }

    async sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }

    async close() {
        if (this.browser) await this.browser.close();
    }
}
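
A fixed `delay` produces very regular request timing. One common refinement is to add random jitter around the base delay. The sketch below shows one way to do that; `jitteredDelay` and its parameters are hypothetical helpers, not part of Playwright.

```javascript
// Sketch: randomize the pause between requests around a base delay.
// `rng` is injectable so the function can be tested deterministically.
function jitteredDelay(baseMs, jitterRatio = 0.5, rng = Math.random) {
    const spread = baseMs * jitterRatio;
    // rng() in [0, 1) maps to an offset in [-spread, +spread)
    const offset = (rng() * 2 - 1) * spread;
    return Math.max(0, Math.round(baseMs + offset));
}
```

Inside the scraper this could replace a fixed pause, for example `await scraper.sleep(jitteredDelay(scraper.delay))`, which waits between 1 and 3 seconds for the default 2000 ms delay.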

Searching for Political Ads

The first step is navigating the search interface to find relevant ads:

async function searchPoliticalAds(scraper, params) {
    const {
        query = '',
        country = 'US',
        adType = 'political_and_issue_ads',
        dateFrom = null,
        dateTo = null
    } = params;

    // Build the search URL with query parameters
    const searchUrl = new URL(scraper.baseUrl);
    searchUrl.searchParams.set('active_status', 'all');
    searchUrl.searchParams.set('ad_type', adType);
    searchUrl.searchParams.set('country', country);
    searchUrl.searchParams.set('media_type', 'all');

    if (query) {
        searchUrl.searchParams.set('q', query);
    }

    await scraper.page.goto(searchUrl.toString(), {
        waitUntil: 'networkidle',
        timeout: 30000
    });

    // Wait for results to load
    await scraper.page.waitForSelector(
        '[class*="adLibrary"]',
        { timeout: 15000 }
    ).catch(() => {
        console.log('Initial selector not found, checking for results...');
    });

    // Apply date filters if specified
    if (dateFrom || dateTo) {
        await applyDateFilter(scraper, dateFrom, dateTo);
    }

    return scraper;
}

async function applyDateFilter(scraper, dateFrom, dateTo) {
    // Click the date filter button
    const filterButtons = await scraper.page.$$('button');
    for (const btn of filterButtons) {
        const text = await btn.textContent();
        if (text.includes('Date') || text.includes('Start date')) {
            await btn.click();
            await scraper.sleep(1000);
            break;
        }
    }

    // Fill in date inputs
    if (dateFrom) {
        const fromInput = await scraper.page.$(
            'input[placeholder*="From"], input[aria-label*="start"]'
        );
        if (fromInput) {
            await fromInput.fill(dateFrom);
        }
    }

    if (dateTo) {
        const toInput = await scraper.page.$(
            'input[placeholder*="To"], input[aria-label*="end"]'
        );
        if (toInput) {
            await toInput.fill(dateTo);
        }
    }

    // Apply the filter
    const applyBtn = await scraper.page.$('button:has-text("Apply")');
    if (applyBtn) await applyBtn.click();

    await scraper.sleep(2000);
}
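
Since `applyDateFilter` types whatever strings it receives into the page, it may be worth validating the dates first. This is a minimal sketch that assumes dates are passed as `YYYY-MM-DD` strings; `parseIsoDate` and `normalizeDateRange` are hypothetical helper names.

```javascript
// Sketch: validate YYYY-MM-DD inputs before touching the date filter UI.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function parseIsoDate(value, label) {
    if (!ISO_DATE.test(value)) {
        throw new Error(`${label} must be in YYYY-MM-DD format: ${value}`);
    }
    const date = new Date(`${value}T00:00:00Z`);
    // Reject impossible dates such as 2024-02-31, which Date may
    // either refuse or silently roll over into the next month
    if (Number.isNaN(date.getTime()) ||
        date.toISOString().slice(0, 10) !== value) {
        throw new Error(`${label} is not a real calendar date: ${value}`);
    }
    return date;
}

function normalizeDateRange(dateFrom, dateTo) {
    const from = dateFrom ? parseIsoDate(dateFrom, 'dateFrom') : null;
    const to = dateTo ? parseIsoDate(dateTo, 'dateTo') : null;
    if (from && to && from > to) {
        throw new Error('dateFrom must not be later than dateTo');
    }
    return { dateFrom: dateFrom || null, dateTo: dateTo || null };
}
```

Calling `normalizeDateRange(dateFrom, dateTo)` at the top of `searchPoliticalAds` would fail fast on bad input instead of leaving a half-applied filter in the UI.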

Extracting Ad Data

Once we have search results, we need to extract structured data from each ad card:

async function extractAdCards(scraper, maxAds = 100) {
    const ads = [];
    let loadMoreAttempts = 0;
    const maxLoadMoreAttempts = 50;

    while (ads.length < maxAds && loadMoreAttempts < maxLoadMoreAttempts) {
        // Extract visible ad cards
        const newAds = await scraper.page.evaluate(() => {
            const adCards = document.querySelectorAll(
                '[class*="adCard"], [data-testid="ad_card"],' +
                ' [class*="SearchResultCard"]'
            );
            const extracted = [];

            adCards.forEach(card => {
                const bodyEl = card.querySelector(
                    '[class*="adText"], [class*="body"]'
                );
                const pageNameEl = card.querySelector(
                    '[class*="pageName"], a[href*="/ads/library/?view_all_page_id"]'
                );
                const disclaimerEl = card.querySelector(
                    '[class*="disclaimer"]'
                );
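                // :has-text() is a Playwright-only selector and throws inside
                // querySelector, so fall back to a manual text match
                const findSpan = text =>
                    Array.from(card.querySelectorAll('span'))
                        .find(span => span.textContent.includes(text));
                const dateEl = card.querySelector('[class*="startDate"]') ||
                    findSpan('Started running');
                const platformEls = card.querySelectorAll(
                    '[class*="platform"] img, [aria-label*="Facebook"],' +
                    ' [aria-label*="Instagram"]'
                );
                const statusEl = card.querySelector(
                    '[class*="status"]'
                );

                // Extract spend data (political ads only)
                const spendEl = card.querySelector('[class*="spend"]') ||
                    findSpan('Spent');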
                const impressionEl = card.querySelector(
                    '[class*="impression"]'
                );

                // Extract image URLs
                const images = [];
                card.querySelectorAll('img[src*="scontent"]').forEach(img => {
                    images.push(img.src);
                });

                extracted.push({
                    pageName: pageNameEl
                        ? pageNameEl.textContent.trim() : '',
                    disclaimer: disclaimerEl
                        ? disclaimerEl.textContent.trim() : '',
                    adBody: bodyEl
                        ? bodyEl.textContent.trim() : '',
                    startDate: dateEl
                        ? dateEl.textContent.trim() : '',
                    status: statusEl
                        ? statusEl.textContent.trim() : 'Unknown',
                    spend: spendEl
                        ? spendEl.textContent.trim() : '',
                    impressions: impressionEl
                        ? impressionEl.textContent.trim() : '',
                    platforms: Array.from(platformEls).map(
                        el => el.getAttribute('aria-label') || ''
                    ),
                    imageUrls: images,
                    scrapedAt: new Date().toISOString()
                });
            });

            return extracted;
        });

        // Deduplicate and add new ads
        for (const ad of newAds) {
            const isDuplicate = ads.some(
                a => a.adBody === ad.adBody && a.pageName === ad.pageName
            );
            if (!isDuplicate) ads.push(ad);
        }

        // Scroll to trigger loading more results
        await scraper.page.evaluate(
            () => window.scrollTo(0, document.documentElement.scrollHeight)
        );
        await scraper.sleep(scraper.delay);

        // Check for "See more" or "Load more" buttons
        const loadMoreBtn = await scraper.page.$(
            'button:has-text("See more results"),' +
            ' button:has-text("Load more")'
        );
        if (loadMoreBtn) {
            await loadMoreBtn.click();
            await scraper.sleep(2000);
        }

        loadMoreAttempts++;
        console.log(
            `Collected ${ads.length}/${maxAds} ads ` +
            `(attempt ${loadMoreAttempts})`
        );
    }

    return ads.slice(0, maxAds);
}
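
Deduplicating on body text and page name can merge distinct ads that share the same copy, and the `ads.some(...)` scan grows quadratically with the result count. A possible alternative is to key each ad by its Library ID when one is available. The sketch below assumes the card text contains a label such as "Library ID: 1234567890" and that the extractor stores it in a `libraryId` field; both are assumptions, and the helper names are hypothetical.

```javascript
// Sketch: key ads by Library ID when the card text exposes one, and fall
// back to page name plus body text otherwise.
// Assumes card text contains a label like "Library ID: 1234567890".
function extractLibraryId(cardText) {
    const match = /Library ID:\s*(\d+)/i.exec(cardText || '');
    return match ? match[1] : null;
}

function adKey(ad) {
    return ad.libraryId
        ? `id:${ad.libraryId}`
        : `text:${ad.pageName}|${ad.adBody}`;
}

// Set-based dedupe: one pass, first occurrence wins
function dedupeAds(ads) {
    const seen = new Set();
    const unique = [];
    for (const ad of ads) {
        const key = adKey(ad);
        if (seen.has(key)) continue;
        seen.add(key);
        unique.push(ad);
    }
    return unique;
}
```

To use this, the in-page extraction would add something like `libraryId: extractLibraryId(card.textContent)` to each record, with the helper defined inside the `evaluate` callback so it exists in the browser context.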

Extracting Advertiser Profiles

To build a complete picture of political advertising, we need to analyze advertiser pages:

async function scrapeAdvertiserProfile(scraper, pageId) {
    const profileUrl =
        `${scraper.baseUrl}?active_status=all&ad_type=political_and_issue_ads` +
        `&country=US&view_all_page_id=${pageId}&media_type=all`;

    await scraper.page.goto(profileUrl, {
        waitUntil: 'networkidle',
        timeout: 30000
    });

    await scraper.sleep(3000);

    const profile = await scraper.page.evaluate(() => {
        // Extract page-level summary data
        const pageNameEl = document.querySelector('h2, [class*="pageTitle"]');
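        // :has-text() is Playwright-only and throws in querySelector,
        // so scan spans for the label text instead
        const totalAdsEl = Array.from(document.querySelectorAll('span'))
            .find(span => span.textContent.includes('total ads'));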

        // Extract spending summary if available
        const spendingSection = document.querySelector(
            '[class*="spendingSummary"]'
        );
        let totalSpend = '';
        if (spendingSection) {
            totalSpend = spendingSection.textContent.trim();
        }

        // Get all visible ad previews
        const adPreviews = [];
        document.querySelectorAll('[class*="adCard"]').forEach(card => {
            const body = card.querySelector('[class*="adText"]');
            const date = card.querySelector('[class*="startDate"]');
            adPreviews.push({
                text: body ? body.textContent.trim().slice(0, 200) : '',
                date: date ? date.textContent.trim() : ''
            });
        });

        return {
            pageId: new URLSearchParams(window.location.search)
                .get('view_all_page_id'),
            pageName: pageNameEl
                ? pageNameEl.textContent.trim() : '',
            totalAds: totalAdsEl
                ? totalAdsEl.textContent.trim() : '',
            totalSpend: totalSpend,
            recentAds: adPreviews.slice(0, 10)
        };
    });

    return profile;
}
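
The profile URL above hardcodes `country=US` and interpolates `pageId` directly into the query string. A small builder makes both configurable and escapes the values. This is a sketch that reuses the query parameters from the function above; `buildAdvertiserUrl` is a hypothetical name, and the numeric check reflects an assumption that page IDs are digit strings.

```javascript
// Sketch: build the advertiser view URL with escaped, configurable params.
function buildAdvertiserUrl(pageId, options = {}) {
    const {
        country = 'US',
        adType = 'political_and_issue_ads'
    } = options;

    // Assumption: Facebook page IDs are purely numeric strings
    if (!/^\d+$/.test(String(pageId))) {
        throw new Error(`Unexpected page ID: ${pageId}`);
    }

    const url = new URL('https://www.facebook.com/ads/library/');
    url.searchParams.set('active_status', 'all');
    url.searchParams.set('ad_type', adType);
    url.searchParams.set('country', country);
    url.searchParams.set('view_all_page_id', String(pageId));
    url.searchParams.set('media_type', 'all');
    return url.toString();
}
```

For example, `buildAdvertiserUrl('123456789', { country: 'GB' })` returns the same page view scoped to the United Kingdom.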

Extracting Spend and Targeting Data

Political ad spending data is crucial for election transparency research:

async function extractSpendData(scraper, adDetailUrl) {
    await scraper.page.goto(adDetailUrl, {
        waitUntil: 'networkidle',
        timeout: 30000
    });

    await scraper.sleep(3000);

    const spendData = await scraper.page.evaluate(() => {
        const data = {
            spend: { lower: null, upper: null, currency: 'USD' },
            impressions: { lower: null, upper: null },
            demographics: [],
            regions: []
        };

        // Parse spend range
        const spendText = document.querySelector(
            '[class*="spend"]'
        )?.textContent || '';
        const spendMatch = spendText.match(
            /\$?([\d,]+)\s*-\s*\$?([\d,]+)/
        );
        if (spendMatch) {
            data.spend.lower = parseInt(
                spendMatch[1].replace(/,/g, '')
            );
            data.spend.upper = parseInt(
                spendMatch[2].replace(/,/g, '')
            );
        }

        // Parse impression range
        const impText = document.querySelector(
            '[class*="impression"]'
        )?.textContent || '';
        const impMatch = impText.match(
            /([\d,]+)\s*-\s*([\d,]+)/
        );
        if (impMatch) {
            data.impressions.lower = parseInt(
                impMatch[1].replace(/,/g, '')
            );
            data.impressions.upper = parseInt(
                impMatch[2].replace(/,/g, '')
            );
        }

        // Parse demographic distribution
        const demoRows = document.querySelectorAll(
            '[class*="demographic"] tr,' +
            ' [class*="ageGender"] [class*="row"]'
        );
        demoRows.forEach(row => {
            const cells = row.querySelectorAll('td, span');
            if (cells.length >= 3) {
                data.demographics.push({
                    age: cells[0]?.textContent?.trim(),
                    gender: cells[1]?.textContent?.trim(),
                    percentage: cells[2]?.textContent?.trim()
                });
            }
        });

        // Parse regional distribution
        const regionRows = document.querySelectorAll(
            '[class*="region"] tr,' +
            ' [class*="deliveryRegion"] [class*="row"]'
        );
        regionRows.forEach(row => {
            const cells = row.querySelectorAll('td, span');
            if (cells.length >= 2) {
                data.regions.push({
                    region: cells[0]?.textContent?.trim(),
                    percentage: cells[1]?.textContent?.trim()
                });
            }
        });

        return data;
    });

    return spendData;
}
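
The regular expressions above only match plain `lower - upper` ranges. Spend and impression labels can also appear in abbreviated or open-ended forms. The sketch below handles a few such shapes, assuming formats like "$100 - $199", "$1K - $1.5K", "<$100", and ">$1M"; these formats and the `parseRange` helper are assumptions, so verify them against the pages you actually scrape.

```javascript
// Sketch: parse ranges such as "$100 - $199", "$1K - $1.5K", "<$100".
const MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9 };
// A number with optional thousands separators, decimals, and K/M/B suffix
const NUMBER_TOKEN = /(\d[\d,]*(?:\.\d+)?)\s*([KMB])?(?![A-Za-z])/gi;

function parseRange(text) {
    const source = text || '';
    const values = [];
    for (const match of source.matchAll(NUMBER_TOKEN)) {
        const base = parseFloat(match[1].replace(/,/g, ''));
        const multiplier = match[2]
            ? MULTIPLIERS[match[2].toUpperCase()]
            : 1;
        values.push(Math.round(base * multiplier));
    }

    if (values.length === 0) return { lower: null, upper: null };
    if (values.length === 1) {
        // Open-ended labels: "<$100" has no lower bound, ">$1M" no upper
        if (/[<≤]/.test(source)) return { lower: 0, upper: values[0] };
        if (/[>≥]/.test(source)) return { lower: values[0], upper: null };
        return { lower: values[0], upper: values[0] };
    }
    return { lower: values[0], upper: values[1] };
}
```

Because it runs on plain strings, this helper can be applied in Node after extraction, so the in-page code only needs to return the raw label text.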

Scaling with Apify

For production-scale Facebook Ads Library scraping, Apify provides the infrastructure you need. Here's how to set up and run a comprehensive scraping operation:

const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN'
});

async function runFacebookAdsActor() {
    const run = await client.actor('facebook-ads-library-scraper').call({
        searchTerms: [
            'climate change',
            'immigration',
            'election 2024',
            'healthcare reform'
        ],
        country: 'US',
        adType: 'political_and_issue_ads',
        maxAdsPerSearch: 500,
        dateRange: {
            from: '2024-01-01',
            to: '2024-12-31'
        },
        extractCreatives: true,
        extractDemographics: true,
        proxy: {
            useApifyProxy: true,
            apifyProxyGroups: ['RESIDENTIAL']
        }
    });

    const { items } = await client
        .dataset(run.defaultDatasetId)
        .listItems();

    console.log(`Collected ${items.length} political ads`);
    return items;
}
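
For large runs, loading the whole dataset in a single `listItems()` call can be heavy on memory and bandwidth. One option is to page through it with `offset` and `limit`. In this sketch, `fetchAllItems` is a hypothetical helper, and `datasetClient` stands for whatever `client.dataset(run.defaultDatasetId)` returns; the code only assumes it exposes `listItems({ offset, limit })` resolving to `{ items, total }`.

```javascript
// Sketch: page through a dataset instead of one large listItems() call.
// `datasetClient` is any object exposing listItems({ offset, limit })
// that resolves to { items, total }.
async function fetchAllItems(datasetClient, pageSize = 1000) {
    const all = [];
    let offset = 0;

    while (true) {
        const { items, total } = await datasetClient.listItems({
            offset,
            limit: pageSize
        });
        all.push(...items);
        offset += items.length;

        // Stop on an empty page or once every item has been read
        if (items.length === 0 || offset >= total) break;
    }
    return all;
}
```

In the function above, this would replace the destructured `listItems()` call with `const items = await fetchAllItems(client.dataset(run.defaultDatasetId));`.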

Building a Custom Apify Actor

For fine-tuned control over the scraping process:

// src/main.js - Custom Facebook Ads Library Actor
const { Actor } = require('apify');
const { PlaywrightCrawler } = require('crawlee');

// Apify SDK v3 (the version that pairs with crawlee) exposes
// these helpers on the Actor class
Actor.main(async () => {
    // getInput() resolves to null when no input is provided
    const input = (await Actor.getInput()) ?? {};
    const {
        searchTerms = [],
        country = 'US',
        adType = 'political_and_issue_ads',
        maxAdsPerSearch = 100,
        extractCreatives = false
    } = input;

    const proxyConfiguration = await Actor.createProxyConfiguration({
        useApifyProxy: true,
        apifyProxyGroups: ['RESIDENTIAL']
    });

    // Build initial request list from search terms
    const requests = searchTerms.map(term => ({
        url: `https://www.facebook.com/ads/library/?active_status=all` +
            `&ad_type=${adType}&country=${country}` +
            `&q=${encodeURIComponent(term)}&media_type=all`,
        userData: { searchTerm: term, adsCollected: 0 }
    }));

    const crawler = new PlaywrightCrawler({
        proxyConfiguration,
        maxConcurrency: 3,
        navigationTimeoutSecs: 60,
        requestHandlerTimeoutSecs: 300,

        async requestHandler({ page, request, log }) {
            const { searchTerm, adsCollected } = request.userData;
            log.info(
                `Scraping "${searchTerm}" - ${adsCollected} ads so far`
            );

            // Wait for ad cards to load
            await page.waitForSelector(
                '[class*="adCard"], [class*="SearchResult"]',
                { timeout: 20000 }
            ).catch(() => log.warning('No ad cards found'));

            // Scroll and collect
            let collected = adsCollected;
            let scrollAttempts = 0;
            // Cards stay in the DOM after scrolling, so track what
            // has already been saved to avoid pushing duplicates
            const seen = new Set();

            while (
                collected < maxAdsPerSearch &&
                scrollAttempts < 100
            ) {
                const ads = await page.evaluate(() => {
                    // Extraction logic
                    const cards = document.querySelectorAll(
                        '[class*="adCard"]'
                    );
                    return Array.from(cards).map(card => ({
                        body: card.querySelector(
                            '[class*="adText"]'
                        )?.textContent?.trim() || '',
                        pageName: card.querySelector(
                            '[class*="pageName"]'
                        )?.textContent?.trim() || ''
                    }));
                });

                for (const ad of ads) {
                    const key = `${ad.pageName}|${ad.body}`;
                    if (seen.has(key) || collected >= maxAdsPerSearch) {
                        continue;
                    }
                    seen.add(key);
                    ad.searchTerm = searchTerm;
                    ad.country = country;
                    await Actor.pushData(ad);
                    collected++;
                }

                await page.evaluate(() =>
                    window.scrollTo(
                        0, document.documentElement.scrollHeight
                    )
                );
                await page.waitForTimeout(2000);
                scrollAttempts++;
            }

            log.info(
                `Finished "${searchTerm}": ${collected} ads collected`
            );
        }
    });

    await crawler.run(requests);
});

Analyzing the Collected Data

Once you have a substantial dataset, you can aggregate it by advertiser and look for recurring messaging themes:

function analyzeSpendingPatterns(ads) {
    // Group by advertiser
    const byAdvertiser = {};
    ads.forEach(ad => {
        const name = ad.pageName || 'Unknown';
        if (!byAdvertiser[name]) {
            byAdvertiser[name] = {
                totalAds: 0,
                totalSpendLower: 0,
                totalSpendUpper: 0,
                platforms: new Set(),
                dateRange: { earliest: null, latest: null }
            };
        }

        const entry = byAdvertiser[name];
        entry.totalAds++;

        if (ad.spendLower) entry.totalSpendLower += ad.spendLower;
        if (ad.spendUpper) entry.totalSpendUpper += ad.spendUpper;

        if (ad.platforms) {
            ad.platforms.forEach(p => entry.platforms.add(p));
        }

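        // Skip missing or unparseable dates so one bad record
        // cannot poison the range with an Invalid Date
        const startDate = ad.adDeliveryStartTime
            ? new Date(ad.adDeliveryStartTime) : null;
        if (startDate && !Number.isNaN(startDate.getTime())) {
            if (!entry.dateRange.earliest ||
                startDate < entry.dateRange.earliest) {
                entry.dateRange.earliest = startDate;
            }
            if (!entry.dateRange.latest ||
                startDate > entry.dateRange.latest) {
                entry.dateRange.latest = startDate;
            }
        }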
    });

    // Sort by estimated total spend
    const sorted = Object.entries(byAdvertiser)
        .map(([name, data]) => ({
            advertiser: name,
            ...data,
            platforms: Array.from(data.platforms),
            estimatedAvgSpend: (
                data.totalSpendLower + data.totalSpendUpper
            ) / 2
        }))
        .sort((a, b) => b.estimatedAvgSpend - a.estimatedAvgSpend);

    return sorted;
}

function analyzeAdCreatives(ads) {
    // Common messaging themes
    const wordFrequency = {};
    ads.forEach(ad => {
        if (!ad.adBody) return;
        const words = ad.adBody.toLowerCase()
            .replace(/[^\w\s]/g, '')
            .split(/\s+/)
            .filter(w => w.length > 4);

        words.forEach(word => {
            wordFrequency[word] = (wordFrequency[word] || 0) + 1;
        });
    });

    const topWords = Object.entries(wordFrequency)
        .sort((a, b) => b[1] - a[1])
        .slice(0, 50);

    return { topWords };
}
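
Spend over time is another view that is often useful for election research. The following sketch groups estimated spend by calendar month, assuming records follow the political ad schema shown earlier (`adDeliveryStartTime`, `spendLower`, `spendUpper`). `spendByMonth` is a hypothetical helper, and using the range midpoint is a rough estimate, not a figure Meta reports.

```javascript
// Sketch: estimated spend per month, using the midpoint of each range.
function spendByMonth(ads) {
    const totals = {};

    for (const ad of ads) {
        const start = new Date(ad.adDeliveryStartTime);
        // Ignore records without a usable date or spend range
        if (Number.isNaN(start.getTime())) continue;
        if (ad.spendLower == null || ad.spendUpper == null) continue;

        const month = start.toISOString().slice(0, 7); // "YYYY-MM" in UTC
        const midpoint = (ad.spendLower + ad.spendUpper) / 2;
        totals[month] = (totals[month] || 0) + midpoint;
    }

    // Return rows in chronological order
    return Object.entries(totals)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([month, estimatedSpend]) => ({ month, estimatedSpend }));
}
```

The result is an array of `{ month, estimatedSpend }` rows that can be written to CSV or fed to a charting library. Each ad is attributed to the month it started, which understates long-running campaigns.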

Legal and Ethical Considerations

Facebook Ads Library scraping occupies a unique legal position:

  1. The data is intentionally public: Meta created the Ad Library specifically for transparency. Political ads are required by law to be publicly disclosed in many jurisdictions.

  2. Respect rate limits: Even though the data is public, aggressive scraping can affect service availability. Use reasonable delays (2-5 seconds between requests).

  3. Academic and journalistic use: Some courts have looked favorably on scraping publicly accessible data for research and journalism, especially when the data was published for transparency, but outcomes vary by jurisdiction and by how the data is collected and used.

  4. Commercial use considerations: Using scraped ad data for commercial purposes may face additional scrutiny. Consult legal counsel if you plan to commercialize insights derived from the data.

  5. Don't scrape user data: The Ads Library shows advertiser information, not user data. Never attempt to correlate ad targeting data with individual users.

  6. Credit your sources: When publishing research based on Ad Library data, cite Meta's Ad Library as your source.

  7. GDPR compliance: If the data you collect includes personal data about people in the EU, such as individuals named as advertisers or in disclaimers, ensure your data handling practices comply with GDPR requirements.

Automating Ongoing Monitoring

For continuous political ad monitoring, set up scheduled scraping runs:

const { ApifyClient } = require('apify-client');

async function setupScheduledMonitoring() {
    const client = new ApifyClient({ token: 'YOUR_TOKEN' });

    // Create a scheduled task that runs daily
    const schedule = await client.schedules().create({
        name: 'Daily Political Ad Monitor',
        cronExpression: '0 6 * * *', // Every day at 6 AM UTC
        actions: [{
            type: 'RUN_ACTOR',
            actorId: 'your-fb-ads-actor-id',
            // Schedule actions take the Actor input as a serialized
            // body plus a content type, not as a plain object
            runInput: {
                contentType: 'application/json; charset=utf-8',
                body: JSON.stringify({
                    searchTerms: [
                        'election',
                        'vote',
                        'candidate names...'
                    ],
                    country: 'US',
                    adType: 'political_and_issue_ads',
                    maxAdsPerSearch: 200,
                    dateRange: {
                        from: 'yesterday',
                        to: 'today'
                    }
                })
            }
        }]
    });

    console.log(`Schedule created: ${schedule.id}`);
}
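
Whether an Actor understands relative values such as `'yesterday'` depends on that Actor's input handling. If it expects concrete dates, a small helper can compute the window at run time. This is a sketch with a hypothetical `dailyDateRange` helper; it works in UTC to match the schedule's cron expression.

```javascript
// Sketch: compute a one-day window ending today, as YYYY-MM-DD in UTC.
function dailyDateRange(now = new Date()) {
    const DAY_MS = 24 * 60 * 60 * 1000;
    const toIsoDay = date => date.toISOString().slice(0, 10);

    return {
        from: toIsoDay(new Date(now.getTime() - DAY_MS)),
        to: toIsoDay(now)
    };
}
```

Because a schedule's input is static, this would run inside the Actor at start-up, for example to replace `dateRange` when the input contains relative values.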

Extract Facebook Ads Data Without Building Infrastructure

Skip the Playwright setup, proxy management, and maintenance. The Facebook Ads Scraper by cryptosignals handles political ads, commercial ads, spend data, and creatives with residential proxies built in.

Try it free on Apify: no credit card, free plan included.

Conclusion

The Facebook Ads Library is a goldmine of transparency data that enables powerful analysis of political advertising, competitive intelligence, and market research. While Meta provides a web interface and basic API, automated scraping with tools like Playwright and Apify unlocks the ability to collect and analyze data at scale.

By combining the techniques in this guide — from basic ad card extraction to sophisticated spend analysis and ongoing monitoring — you can build comprehensive datasets that reveal how organizations use political advertising to influence public opinion.

Whether you're a journalist investigating campaign spending, a researcher studying political communication, or a marketer analyzing competitive strategies, the Facebook Ads Library contains insights waiting to be extracted. Start with small, focused scraping runs, validate your data quality, and scale up with Apify when you're ready to go big.

Remember: this data was made public for a reason. Use it responsibly, respect rate limits, and contribute to the transparency that makes democratic discourse healthier.
