DEV Community

agenthustler

Zillow Agent Scraping: Extract Realtor Profiles and Listings Data

Web scraping Zillow agent data opens doors to powerful real estate market intelligence. Whether you're building a CRM for mortgage brokers, analyzing market coverage by agents in specific zip codes, or comparing agent performance across regions, extracting realtor profiles and their associated listings gives you a competitive edge that manual research simply can't match.

In this guide, we'll look at how Zillow structures its agent data, walk through practical approaches to extracting profiles, listing portfolios, reviews, and geographic coverage, and show how to scale the whole process on Apify's cloud infrastructure.

Why Scrape Zillow Agent Data?

Zillow is the largest real estate marketplace in the United States, with over 200 million monthly visitors. Beyond property listings, Zillow hosts detailed profiles for more than 3 million real estate agents and brokers. Each profile contains a wealth of structured data:

  • Agent contact information — name, brokerage, phone number, website
  • Active and sold listings — the agent's current portfolio and historical transactions
  • Client reviews and ratings — star ratings, review text, reviewer names
  • Service areas — zip codes and neighborhoods the agent covers
  • Specializations — buyer's agent, seller's agent, relocation, foreclosures
  • Sales history — total homes sold, price ranges, average sale price
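
Taken together, those fields map naturally onto one flat record per agent. Here is an illustrative shape for the output — the field names are our own choice for this guide and the values are fabricated, not Zillow's actual markup:

```javascript
// Illustrative output record for one scraped agent profile.
// Field names are this guide's convention; values are made-up examples.
const exampleAgentRecord = {
  name: 'Jane Doe',
  brokerage: 'Example Realty Group',
  phone: '(555) 010-0000',
  website: 'https://example.com/janedoe',
  rating: 4.9,
  reviewCount: 132,
  serviceAreas: ['94110', '94131', 'Noe Valley'],
  specializations: ["Buyer's agent", 'Relocation'],
  salesStats: {
    totalSales: 87,
    listingsActive: 6,
    avgSalePrice: '$1.4M',
    priceRange: '$650K - $3.2M',
  },
};
```

Deciding on this schema up front makes it much easier to merge data from the directory page, the profile page, and the reviews section later.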

This data powers use cases like:

  1. Lead generation platforms that match buyers/sellers with top agents
  2. Market analysis — which agents dominate which zip codes?
  3. Competitive intelligence for brokerages entering new markets
  4. Mortgage and title companies building referral networks
  5. PropTech startups that need agent data for their products

Understanding Zillow's Agent Page Structure

Before writing any code, let's understand how Zillow organizes agent data. There are several entry points:

Agent Directory Pages

Zillow's agent finder is accessible at URLs like:

https://www.zillow.com/professionals/real-estate-agent-reviews/[city]-[state]/

These directory pages list agents with basic information — name, photo, brokerage, rating, and number of reviews. Pagination is handled through query parameters.
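
If you want to walk the directory, you can generate the page URLs up front. The `?page=N` parameter below is an assumption based on typical pagination on the site; confirm the exact parameter in your browser before relying on it:

```javascript
// Build directory page URLs for a city/state slug.
// The `?page=N` query parameter is an assumption — verify it on the live site.
function buildDirectoryUrls(citySlug, stateSlug, pageCount) {
  const base = `https://www.zillow.com/professionals/real-estate-agent-reviews/${citySlug}-${stateSlug}/`;
  return Array.from({ length: pageCount }, (_, i) =>
    i === 0 ? base : `${base}?page=${i + 1}`
  );
}

const urls = buildDirectoryUrls('san-francisco', 'ca', 3);
// urls[0] is the bare directory URL; urls[1] ends with ?page=2
```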

Individual Agent Profiles

Each agent has a dedicated profile page:

https://www.zillow.com/profile/[agent-username]/

The profile page contains the full dataset: bio, all contact details, active listings, past sales, reviews, and service areas.

Agent Search API

Zillow uses internal APIs to populate its search results. The most useful endpoint returns JSON data when you search for agents by location. The request structure typically includes:

// Zillow's internal agent search parameters
const searchParams = {
  location: "San Francisco, CA",
  page: 1,
  specialties: "buyer_agent",
  sort: "recommended",
  language: "english"
};

Setting Up Your Scraping Environment

Let's build a scraper step by step. First, we need the right tools:

// package.json dependencies
{
  "dependencies": {
    "crawlee": "^3.8.0",
    "playwright": "^1.42.0"
  }
}

Crawlee is Apify's open-source web scraping library that handles retries, proxy rotation, and request queuing out of the box.

const { PlaywrightCrawler, Dataset } = require('crawlee');

const crawler = new PlaywrightCrawler({
    maxConcurrency: 5,
    requestHandlerTimeoutSecs: 120,

    async requestHandler({ page, request, log }) {
        const url = request.url;

        if (url.includes('/professionals/')) {
            await handleDirectoryPage(page, log);
        } else if (url.includes('/profile/')) {
            await handleProfilePage(page, request, log);
        }
    },
});

Extracting Agent Directory Listings

The directory page is our starting point. Here's how to extract the agent cards:

async function handleDirectoryPage(page, log) {
    // Wait for agent cards to render
    await page.waitForSelector('[data-test="professional-card"]', {
        timeout: 30000
    });

    // Extract basic info from each card
    const agents = await page.$$eval(
        '[data-test="professional-card"]',
        (cards) => cards.map(card => {
            const name = card.querySelector(
                '[data-test="professional-card-name"]'
            )?.textContent?.trim();

            const brokerage = card.querySelector(
                '[data-test="professional-card-brokerage"]'
            )?.textContent?.trim();

            const rating = card.querySelector(
                '.professional-card-rating'
            )?.textContent?.trim();

            const reviewCount = card.querySelector(
                '.review-count'
            )?.textContent?.match(/\d+/)?.[0];

            const profileLink = card.querySelector(
                'a[href*="/profile/"]'
            )?.getAttribute('href');

            return {
                name,
                brokerage,
                rating: parseFloat(rating) || null,
                reviewCount: parseInt(reviewCount) || 0,
                profileUrl: profileLink 
                    ? `https://www.zillow.com${profileLink}` 
                    : null,
            };
        })
    );

    log.info(`Found ${agents.length} agents on directory page`);

    // Queue individual profile pages for detailed scraping
    for (const agent of agents) {
        if (agent.profileUrl) {
            await crawler.addRequests([{
                url: agent.profileUrl,
                userData: { basicInfo: agent }
            }]);
        }
    }

    // Handle pagination
    const nextButton = await page.$('a[rel="next"]');
    if (nextButton) {
        const nextUrl = await nextButton.getAttribute('href');
        await crawler.addRequests([{
            url: `https://www.zillow.com${nextUrl}`
        }]);
    }
}

Deep-Diving into Agent Profiles

The individual profile page is where the richest data lives. Let's extract everything:

async function handleProfilePage(page, request, log) {
    const basicInfo = request.userData.basicInfo || {};

    // Wait for the profile content to fully load
    await page.waitForSelector('.profile-info', { timeout: 30000 });

    // Extract contact information
    const contactInfo = await page.evaluate(() => {
        const phoneEl = document.querySelector(
            '[data-test="profile-phone"]'
        );
        const emailEl = document.querySelector(
            '[data-test="profile-email"]'
        );
        const websiteEl = document.querySelector(
            'a[data-test="profile-website"]'
        );
        const licenseEl = document.querySelector(
            '.license-number'
        );

        return {
            phone: phoneEl?.textContent?.trim() || null,
            email: emailEl?.textContent?.trim() || null,
            website: websiteEl?.getAttribute('href') || null,
            licenseNumber: licenseEl?.textContent?.trim() || null,
        };
    });

    // Extract service areas and specializations
    const serviceAreas = await page.evaluate(() => {
        const areas = document.querySelectorAll(
            '.service-areas-list li'
        );
        return Array.from(areas).map(a => a.textContent.trim());
    });

    const specializations = await page.evaluate(() => {
        const specs = document.querySelectorAll(
            '.specializations-list li'
        );
        return Array.from(specs).map(s => s.textContent.trim());
    });

    // Extract sales statistics
    const salesStats = await page.evaluate(() => {
        const statsContainer = document.querySelector(
            '.sales-statistics'
        );
        if (!statsContainer) return {};

        return {
            totalSales: parseInt(
                statsContainer.querySelector(
                    '.total-sales'
                )?.textContent?.match(/\d+/)?.[0]
            ) || 0,
            listingsActive: parseInt(
                statsContainer.querySelector(
                    '.active-listings'
                )?.textContent?.match(/\d+/)?.[0]
            ) || 0,
            avgSalePrice: statsContainer.querySelector(
                '.avg-price'
            )?.textContent?.trim() || null,
            priceRange: statsContainer.querySelector(
                '.price-range'
            )?.textContent?.trim() || null,
        };
    });

    // Combine all data
    const agentData = {
        ...basicInfo,
        ...contactInfo,
        serviceAreas,
        specializations,
        salesStats,
        scrapedAt: new Date().toISOString(),
        sourceUrl: request.url,
    };

    await Dataset.pushData(agentData);
    log.info(`Scraped profile: ${agentData.name}`);
}

Extracting Agent Reviews

Reviews are particularly valuable for sentiment analysis and agent ranking. Zillow often loads reviews dynamically, so we need to handle scroll-based loading:

async function extractReviews(page, log) {
    let reviews = [];
    let previousCount = 0;

    // Re-extract after each "Show More" click until the count stops growing
    while (true) {
        reviews = await page.$$eval(
            '.review-card',
            (cards) => cards.map(card => ({
                reviewer: card.querySelector(
                    '.reviewer-name'
                )?.textContent?.trim(),
                rating: parseFloat(
                    card.querySelector(
                        '.review-rating'
                    )?.getAttribute('data-rating')
                ) || null,
                date: card.querySelector(
                    '.review-date'
                )?.textContent?.trim(),
                text: card.querySelector(
                    '.review-text'
                )?.textContent?.trim(),
                transactionType: card.querySelector(
                    '.transaction-type'
                )?.textContent?.trim(),
                priceRange: card.querySelector(
                    '.transaction-price'
                )?.textContent?.trim(),
            }))
        );

        if (reviews.length === previousCount) break;
        previousCount = reviews.length;

        // Click "Show More" if available
        const showMore = await page.$(
            'button[data-test="show-more-reviews"]'
        );
        if (showMore) {
            await showMore.click();
            await page.waitForTimeout(2000);
        } else {
            break;
        }
    }

    log.info(`Extracted ${reviews.length} reviews`);
    return reviews;
}

Geographic Search: Finding Agents by Market

One of the most powerful features is searching for agents by geographic area. This lets you map agent coverage across entire markets:

async function scrapeAgentsByZipCode(zipCodes) {
    // Enqueue one directory search per zip code. The results are
    // collected by the crawler's requestHandler and pushed to the
    // Dataset, so there is nothing to return here.
    for (const zip of zipCodes) {
        const searchUrl = `https://www.zillow.com/professionals/`
            + `real-estate-agent-reviews/?locationText=${zip}`;

        await crawler.addRequests([{
            url: searchUrl,
            userData: {
                searchType: 'geographic',
                zipCode: zip
            }
        }]);
    }
}

// Example: scrape agents across the San Francisco Bay Area
const bayAreaZips = [
    '94102', '94103', '94104', '94105', '94107',
    '94108', '94109', '94110', '94111', '94112',
    '94114', '94115', '94116', '94117', '94118',
    '94121', '94122', '94123', '94124', '94127',
    '94129', '94130', '94131', '94132', '94133',
    '94134', '94158'
];

scrapeAgentsByZipCode(bayAreaZips);

This approach lets you build a comprehensive map of agent density, average ratings, and market specializations across any metro area.
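
One practical wrinkle: the same agent serves many zip codes, so geographic sweeps produce duplicate records. A simple dedupe keyed on the profile URL (any stable identifier works) keeps the combined dataset clean:

```javascript
// Deduplicate agents collected across overlapping zip-code searches,
// keeping the first record seen for each profile URL.
function dedupeAgents(agents) {
  const seen = new Set();
  const unique = [];
  for (const agent of agents) {
    const key = agent.profileUrl || agent.name; // fall back to name if URL is missing
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(agent);
    }
  }
  return unique;
}

const sample = [
  { name: 'A', profileUrl: 'https://www.zillow.com/profile/a/' },
  { name: 'A', profileUrl: 'https://www.zillow.com/profile/a/' },
  { name: 'B', profileUrl: 'https://www.zillow.com/profile/b/' },
];
// dedupeAgents(sample) keeps two of the three records
```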

Handling Zillow's Anti-Scraping Measures

Zillow employs several anti-scraping protections. Here's how to handle them responsibly:

Proxy Rotation

const { PlaywrightCrawler, ProxyConfiguration } = require('crawlee');

const crawler = new PlaywrightCrawler({
    proxyConfiguration: new ProxyConfiguration({
        proxyUrls: [
            'http://proxy1.example.com:8080',
            'http://proxy2.example.com:8080',
        ],
    }),

    // Reduce obvious automation fingerprints
    launchContext: {
        launchOptions: {
            args: ['--disable-blink-features=AutomationControlled'],
        },
    },

    preNavigationHooks: [
        async ({ page }) => {
            await page.setExtraHTTPHeaders({
                'Accept-Language': 'en-US,en;q=0.9',
            });
        },
    ],
});

Rate Limiting

Always implement respectful rate limiting:

const crawler = new PlaywrightCrawler({
    maxConcurrency: 3,
    maxRequestsPerMinute: 20,

    // Add random delays between requests
    async requestHandler({ page, request }) {
        const delay = Math.random() * 3000 + 2000;
        await page.waitForTimeout(delay);

        // Your scraping logic here
    },
});

Session Management

const { PlaywrightCrawler } = require('crawlee');

// In Crawlee, the session pool is configured on the crawler itself
const crawler = new PlaywrightCrawler({
    useSessionPool: true,
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        maxPoolSize: 10,
        sessionOptions: {
            maxAgeSecs: 3600,
            maxUsageCount: 50,
        },
    },
    // ...requestHandler and proxyConfiguration as shown earlier
});

Scaling with Apify

While local scraping works for small datasets, real estate data at scale requires cloud infrastructure. Apify provides everything you need:

Deploying as an Apify Actor

const { Actor } = require('apify');
const { PlaywrightCrawler, Dataset } = require('crawlee');

Actor.main(async () => {
    const input = await Actor.getInput();
    const {
        locations = ['San Francisco, CA'],
        maxAgents = 100,
        includeReviews = true,
        includeListings = true,
    } = input;

    const proxyConfiguration = await Actor.createProxyConfiguration({
        groups: ['RESIDENTIAL'],
        countryCode: 'US',
    });

    const crawler = new PlaywrightCrawler({
        proxyConfiguration,
        maxConcurrency: 5,
        maxRequestsPerMinute: 30,

        async requestHandler({ page, request, log }) {
            // Full scraping logic here
            const agentData = await extractAgentData(page);

            if (includeReviews) {
                agentData.reviews = await extractReviews(page, log);
            }

            if (includeListings) {
                agentData.listings = await extractListings(page, log);
            }

            await Dataset.pushData(agentData);
        },

        failedRequestHandler({ request, log }) {
            log.error(`Failed: ${request.url}`);
        },
    });

    // Build start URLs from locations
    const startUrls = locations.map(loc => ({
        url: `https://www.zillow.com/professionals/`
            + `real-estate-agent-reviews/`
            + `?locationText=${encodeURIComponent(loc)}`,
    }));

    await crawler.run(startUrls);
});

Why Use Apify for Zillow Agent Scraping?

  1. Residential proxies — Apify provides US residential proxies that are essential for accessing Zillow reliably
  2. Automatic scaling — scrape thousands of agent profiles concurrently without managing infrastructure
  3. Built-in storage — results are automatically stored in datasets you can export as JSON, CSV, or Excel
  4. Scheduling — set up recurring scrapes to track agent activity over time
  5. Ready-made actors — Apify Store has pre-built Zillow scrapers you can use immediately

Extracting Agent Listing Portfolios

An agent's active and sold listings tell you about their market focus:

async function extractListings(page, log) {
    const listings = [];

    // Navigate to listings tab
    const listingsTab = await page.$(
        '[data-test="listings-tab"]'
    );
    if (listingsTab) {
        await listingsTab.click();
        await page.waitForTimeout(2000);
    }

    const listingCards = await page.$$eval(
        '.listing-card',
        (cards) => cards.map(card => ({
            address: card.querySelector(
                '.listing-address'
            )?.textContent?.trim(),
            price: card.querySelector(
                '.listing-price'
            )?.textContent?.trim(),
            status: card.querySelector(
                '.listing-status'
            )?.textContent?.trim(),
            beds: parseInt(
                card.querySelector('.beds')?.textContent
            ) || null,
            baths: parseFloat(
                card.querySelector('.baths')?.textContent
            ) || null,
            sqft: parseInt(
                card.querySelector('.sqft')
                    ?.textContent?.replace(/,/g, '')
            ) || null,
            listingUrl: card.querySelector('a')
                ?.getAttribute('href'),
            photoUrl: card.querySelector('img')
                ?.getAttribute('src'),
        }))
    );

    log.info(`Found ${listingCards.length} listings`);
    return listingCards;
}
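Prices in both listings and sales stats come back as display strings ("$1.2M", "$450K", "$725,000"). For the analysis that follows, a small normalizer — the formats here are assumptions based on typical listing pages — converts them to plain numbers:

```javascript
// Parse display-price strings such as "$450K", "$1.2M", or "$725,000"
// into plain numbers. Returns null for anything unrecognized.
function parsePrice(text) {
  if (!text) return null;
  const match = text.replace(/,/g, '').match(/\$?([\d.]+)\s*([KkMm])?/);
  if (!match) return null;
  const value = parseFloat(match[1]);
  if (Number.isNaN(value)) return null;
  const suffix = (match[2] || '').toUpperCase();
  if (suffix === 'K') return value * 1_000;   // "$450K" -> 450000
  if (suffix === 'M') return value * 1_000_000; // "$1.2M" -> 1200000
  return value;
}
```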

Data Processing and Analysis

Once you've collected agent data, here's how to derive insights:

function analyzeMarketCoverage(agents) {
    // Group agents by service area (zip code or neighborhood)
    const byZip = {};
    for (const agent of agents) {
        for (const area of agent.serviceAreas || []) {
            if (!byZip[area]) byZip[area] = [];
            byZip[area].push(agent);
        }
    }

    // Calculate market metrics per zip
    const marketMetrics = Object.entries(byZip).map(
        ([zip, zipAgents]) => ({
            zipCode: zip,
            agentCount: zipAgents.length,
            avgRating: (
                zipAgents.reduce((s, a) => s + (a.rating || 0), 0)
                / zipAgents.length
            ).toFixed(2),
            // Copy before sorting so the grouped arrays stay untouched
            topAgent: [...zipAgents].sort(
                (a, b) => (b.salesStats?.totalSales || 0)
                    - (a.salesStats?.totalSales || 0)
            )[0]?.name,
            avgHomePrice: calculateAvgPrice(zipAgents), // your own helper
        })
    );

    return marketMetrics.sort(
        (a, b) => b.agentCount - a.agentCount
    );
}

Legal and Ethical Considerations

When scraping Zillow agent data, keep these guidelines in mind:

  • Respect robots.txt — check Zillow's robots.txt for restricted paths
  • Rate limit your requests — don't overwhelm their servers
  • Use data responsibly — agent contact info is for legitimate business purposes
  • Don't misrepresent yourself — don't create fake accounts to access data
  • Review Zillow's Terms of Service — understand what they permit
  • Comply with data protection laws — CCPA, state real estate regulations
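
The robots.txt check can even be automated as a pre-flight step. The matcher below is deliberately simplified — prefix rules only, no wildcard (`*`, `$`) or `Allow`-rule handling — so treat it as a quick sanity check, not a full implementation of the robots exclusion standard:

```javascript
// Minimal robots.txt check: collect the Disallow prefixes for a given
// user agent and test a path against them. Simplified on purpose:
// no wildcards, no Allow-rule precedence, no grouped user-agent lines.
function disallowedPrefixes(robotsTxt, userAgent = '*') {
  const prefixes = [];
  let applies = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.split('#')[0].trim();          // strip comments
    const [field, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(field)) applies = value === userAgent;
    else if (applies && /^disallow$/i.test(field) && value) prefixes.push(value);
  }
  return prefixes;
}

function isAllowed(robotsTxt, path) {
  return !disallowedPrefixes(robotsTxt).some(p => path.startsWith(p));
}

const robots = 'User-agent: *\nDisallow: /private/\nDisallow: /search/';
// isAllowed(robots, '/profile/jane/') is true; '/private/x' is false
```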

Conclusion

Zillow agent data scraping is a powerful tool for real estate market intelligence. By extracting realtor profiles, listing portfolios, reviews, and geographic coverage data, you can build applications that provide genuine value to the real estate industry.

The combination of Crawlee's robust scraping framework and Apify's cloud infrastructure makes it possible to collect and maintain this data at scale. Whether you're building a lead generation platform, conducting market research, or developing PropTech applications, the techniques covered in this guide give you a solid foundation to work from.

Start small with a specific geographic area, validate your data quality, and scale up as you refine your approach. The real estate data landscape is rich with opportunity for those who know how to extract and analyze it effectively.


Looking for ready-to-use Zillow scraping solutions? Check out the Apify Store for pre-built actors that handle all the complexity for you.
