Web scraping Zillow agent data opens doors to powerful real estate market intelligence. Whether you're building a CRM for mortgage brokers, analyzing market coverage by agents in specific zip codes, or comparing agent performance across regions, extracting realtor profiles and their associated listings gives you a competitive edge that manual research simply can't match.
In this guide, we'll cover how Zillow structures its agent data, walk through practical approaches to extracting profiles, listing portfolios, reviews, and geographic data, and show how to scale the entire process using Apify's cloud infrastructure.
Why Scrape Zillow Agent Data?
Zillow is the largest real estate marketplace in the United States, with over 200 million monthly visitors. Beyond property listings, Zillow hosts detailed profiles for more than 3 million real estate agents and brokers. Each profile contains a wealth of structured data:
- Agent contact information — name, brokerage, phone number, website
- Active and sold listings — the agent's current portfolio and historical transactions
- Client reviews and ratings — star ratings, review text, reviewer names
- Service areas — zip codes and neighborhoods the agent covers
- Specializations — buyer's agent, seller's agent, relocation, foreclosures
- Sales history — total homes sold, price ranges, average sale price
This data powers use cases like:
- Lead generation platforms that match buyers/sellers with top agents
- Market analysis — which agents dominate which zip codes?
- Competitive intelligence for brokerages entering new markets
- Mortgage and title companies building referral networks
- PropTech startups that need agent data for their products
Understanding Zillow's Agent Page Structure
Before writing any code, let's understand how Zillow organizes agent data. There are several entry points:
Agent Directory Pages
Zillow's agent finder is accessible at URLs like:
https://www.zillow.com/professionals/real-estate-agent-reviews/[city]-[state]/
These directory pages list agents with basic information — name, photo, brokerage, rating, and number of reviews. Pagination is handled through query parameters.
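Because pagination runs through query parameters, you can pre-build a batch of directory URLs before the crawler starts. A small sketch — note that the `page` parameter name is an assumption for illustration; confirm the real parameter in your browser's network tab before relying on it:

```javascript
// Build paginated directory URLs for one city. The `?page=` parameter
// is assumed for illustration -- verify it against live traffic.
function buildDirectoryUrls(city, state, pageCount) {
  const base = `https://www.zillow.com/professionals/real-estate-agent-reviews/${city}-${state}/`;
  return Array.from({ length: pageCount }, (_, i) =>
    i === 0 ? base : `${base}?page=${i + 1}`
  );
}

const urls = buildDirectoryUrls('san-francisco', 'ca', 3);
```

Feeding these URLs to the crawler up front lets Crawlee's request queue deduplicate and parallelize them for you.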
Individual Agent Profiles
Each agent has a dedicated profile page:
https://www.zillow.com/profile/[agent-username]/
The profile page contains the full dataset: bio, all contact details, active listings, past sales, reviews, and service areas.
Agent Search API
Zillow uses internal APIs to populate its search results. The most useful endpoint returns JSON data when you search for agents by location. The request structure typically includes:
```javascript
// Zillow's internal agent search parameters
const searchParams = {
  location: "San Francisco, CA",
  page: 1,
  specialties: "buyer_agent",
  sort: "recommended",
  language: "english"
};
```
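Since the endpoint itself is internal and undocumented, the stable part you control is encoding the parameters. A minimal sketch of turning a parameter object into a query string with `URLSearchParams` — the endpoint you append it to must be discovered from live traffic:

```javascript
// Turn a parameter object into a URL query string.
// This only shows the encoding step; Zillow's internal endpoint is
// undocumented and subject to change, so none of this is a stable API.
function toQueryString(params) {
  return new URLSearchParams(
    Object.entries(params).map(([key, value]) => [key, String(value)])
  ).toString();
}

const qs = toQueryString({
  location: 'San Francisco, CA',
  page: 1,
  specialties: 'buyer_agent',
  sort: 'recommended',
  language: 'english',
});
// Append to whatever endpoint you observed: `${endpoint}?${qs}`
```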
Setting Up Your Scraping Environment
Let's build a scraper step by step. First, we need the right tools:
In `package.json`:

```json
{
  "dependencies": {
    "crawlee": "^3.8.0",
    "playwright": "^1.42.0"
  }
}
```
Crawlee is Apify's open-source web scraping library that handles retries, proxy rotation, and request queuing out of the box.
```javascript
const { PlaywrightCrawler, Dataset } = require('crawlee');

const crawler = new PlaywrightCrawler({
  maxConcurrency: 5,
  requestHandlerTimeoutSecs: 120,
  async requestHandler({ page, request, log }) {
    const url = request.url;
    // Route each request to the right handler based on its URL
    if (url.includes('/professionals/')) {
      await handleDirectoryPage(page, log);
    } else if (url.includes('/profile/')) {
      await handleProfilePage(page, request, log);
    }
  },
});
```
Extracting Agent Directory Listings
The directory page is our starting point. Here's how to extract the agent cards:
```javascript
async function handleDirectoryPage(page, log) {
  // Wait for agent cards to render
  await page.waitForSelector('[data-test="professional-card"]', {
    timeout: 30000,
  });

  // Extract basic info from each card
  const agents = await page.$$eval(
    '[data-test="professional-card"]',
    (cards) => cards.map((card) => {
      const name = card.querySelector('[data-test="professional-card-name"]')
        ?.textContent?.trim();
      const brokerage = card.querySelector('[data-test="professional-card-brokerage"]')
        ?.textContent?.trim();
      const rating = card.querySelector('.professional-card-rating')
        ?.textContent?.trim();
      const reviewCount = card.querySelector('.review-count')
        ?.textContent?.match(/\d+/)?.[0];
      const profileLink = card.querySelector('a[href*="/profile/"]')
        ?.getAttribute('href');
      return {
        name,
        brokerage,
        rating: parseFloat(rating) || null,
        reviewCount: parseInt(reviewCount, 10) || 0,
        profileUrl: profileLink
          ? `https://www.zillow.com${profileLink}`
          : null,
      };
    })
  );

  log.info(`Found ${agents.length} agents on directory page`);

  // Queue individual profile pages for detailed scraping
  for (const agent of agents) {
    if (agent.profileUrl) {
      await crawler.addRequests([{
        url: agent.profileUrl,
        userData: { basicInfo: agent },
      }]);
    }
  }

  // Handle pagination
  const nextButton = await page.$('a[rel="next"]');
  if (nextButton) {
    const nextUrl = await nextButton.getAttribute('href');
    await crawler.addRequests([{ url: `https://www.zillow.com${nextUrl}` }]);
  }
}
```
Deep-Diving into Agent Profiles
The individual profile page is where the richest data lives. Let's extract everything:
```javascript
async function handleProfilePage(page, request, log) {
  const basicInfo = request.userData.basicInfo || {};

  // Wait for the profile content to fully load
  await page.waitForSelector('.profile-info', { timeout: 30000 });

  // Extract contact information
  const contactInfo = await page.evaluate(() => {
    const phoneEl = document.querySelector('[data-test="profile-phone"]');
    const emailEl = document.querySelector('[data-test="profile-email"]');
    const websiteEl = document.querySelector('a[data-test="profile-website"]');
    const licenseEl = document.querySelector('.license-number');
    return {
      phone: phoneEl?.textContent?.trim() || null,
      email: emailEl?.textContent?.trim() || null,
      website: websiteEl?.getAttribute('href') || null,
      licenseNumber: licenseEl?.textContent?.trim() || null,
    };
  });

  // Extract service areas and specializations
  const serviceAreas = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.service-areas-list li'))
      .map((a) => a.textContent.trim())
  );
  const specializations = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.specializations-list li'))
      .map((s) => s.textContent.trim())
  );

  // Extract sales statistics
  const salesStats = await page.evaluate(() => {
    const statsContainer = document.querySelector('.sales-statistics');
    if (!statsContainer) return {};
    return {
      totalSales: parseInt(
        statsContainer.querySelector('.total-sales')
          ?.textContent?.match(/\d+/)?.[0],
        10
      ) || 0,
      listingsActive: parseInt(
        statsContainer.querySelector('.active-listings')
          ?.textContent?.match(/\d+/)?.[0],
        10
      ) || 0,
      avgSalePrice: statsContainer.querySelector('.avg-price')
        ?.textContent?.trim() || null,
      priceRange: statsContainer.querySelector('.price-range')
        ?.textContent?.trim() || null,
    };
  });

  // Combine all data
  const agentData = {
    ...basicInfo,
    ...contactInfo,
    serviceAreas,
    specializations,
    salesStats,
    scrapedAt: new Date().toISOString(),
    sourceUrl: request.url,
  };

  await Dataset.pushData(agentData);
  log.info(`Scraped profile: ${agentData.name}`);
}
```
Extracting Agent Reviews
Reviews are particularly valuable for sentiment analysis and agent ranking. Zillow loads reviews incrementally, so we need to keep clicking "Show More" until no new reviews appear:
```javascript
async function extractReviews(page, log) {
  let reviews = [];
  let previousCount = 0;

  // Keep loading until the review count stops growing
  while (true) {
    const currentReviews = await page.$$eval(
      '.review-card',
      (cards) => cards.map((card) => ({
        reviewer: card.querySelector('.reviewer-name')?.textContent?.trim(),
        rating: parseFloat(
          card.querySelector('.review-rating')?.getAttribute('data-rating')
        ) || null,
        date: card.querySelector('.review-date')?.textContent?.trim(),
        text: card.querySelector('.review-text')?.textContent?.trim(),
        transactionType: card.querySelector('.transaction-type')
          ?.textContent?.trim(),
        priceRange: card.querySelector('.transaction-price')
          ?.textContent?.trim(),
      }))
    );

    // Keep the latest snapshot so we actually return what we scraped
    reviews = currentReviews;
    if (currentReviews.length === previousCount) break;
    previousCount = currentReviews.length;

    // Click "Show More" if available
    const showMore = await page.$('button[data-test="show-more-reviews"]');
    if (!showMore) break;
    await showMore.click();
    await page.waitForTimeout(2000);
  }

  log.info(`Extracted ${reviews.length} reviews`);
  return reviews;
}
```
Geographic Search: Finding Agents by Market
One of the most powerful features is searching for agents by geographic area. This lets you map agent coverage across entire markets:
```javascript
async function scrapeAgentsByZipCode(zipCodes) {
  // Queue one directory search per zip code. Scraped agents
  // accumulate in the crawler's Dataset, not in a return value.
  for (const zip of zipCodes) {
    const searchUrl = 'https://www.zillow.com/professionals/'
      + `real-estate-agent-reviews/?locationText=${zip}`;
    await crawler.addRequests([{
      url: searchUrl,
      userData: {
        searchType: 'geographic',
        zipCode: zip,
      },
    }]);
  }
}

// Example: scrape agents across the San Francisco Bay Area
const bayAreaZips = [
  '94102', '94103', '94104', '94105', '94107',
  '94108', '94109', '94110', '94111', '94112',
  '94114', '94115', '94116', '94117', '94118',
  '94121', '94122', '94123', '94124', '94127',
  '94129', '94130', '94131', '94132', '94133',
  '94134', '94158',
];

scrapeAgentsByZipCode(bayAreaZips).catch(console.error);
```
This approach lets you build a comprehensive map of agent density, average ratings, and market specializations across any metro area.
Handling Zillow's Anti-Scraping Measures
Zillow employs several anti-scraping protections. Here's how to handle them responsibly:
Proxy Rotation
```javascript
const { PlaywrightCrawler, ProxyConfiguration } = require('crawlee');

const crawler = new PlaywrightCrawler({
  proxyConfiguration: new ProxyConfiguration({
    proxyUrls: [
      'http://proxy1.example.com:8080',
      'http://proxy2.example.com:8080',
    ],
  }),
  // Hide the automation fingerprint from basic bot checks
  launchContext: {
    launchOptions: {
      args: ['--disable-blink-features=AutomationControlled'],
    },
  },
  preNavigationHooks: [
    async ({ page }) => {
      await page.setExtraHTTPHeaders({
        'Accept-Language': 'en-US,en;q=0.9',
      });
    },
  ],
});
```
Rate Limiting
Always implement respectful rate limiting:
```javascript
const crawler = new PlaywrightCrawler({
  maxConcurrency: 3,
  maxRequestsPerMinute: 20,
  async requestHandler({ page, request }) {
    // Add a random 2-5 second delay before each request
    const delay = Math.random() * 3000 + 2000;
    await page.waitForTimeout(delay);
    // Your scraping logic here
  },
});
```
Session Management
```javascript
const { SessionPool } = require('crawlee');

// SessionPool.open() initializes the pool and restores persisted state
const sessionPool = await SessionPool.open({
  maxPoolSize: 10,
  sessionOptions: {
    maxAgeSecs: 3600,
    maxUsageCount: 50,
  },
});
```
Scaling with Apify
While local scraping works for small datasets, real estate data at scale requires cloud infrastructure. Apify provides everything you need:
Deploying as an Apify Actor
```javascript
const { Actor } = require('apify');
const { PlaywrightCrawler, Dataset } = require('crawlee');

Actor.main(async () => {
  const input = await Actor.getInput();
  const {
    locations = ['San Francisco, CA'],
    maxAgents = 100,
    includeReviews = true,
    includeListings = true,
  } = input;

  const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'US',
  });

  const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxConcurrency: 5,
    maxRequestsPerMinute: 30,
    async requestHandler({ page, request, log }) {
      // Full scraping logic here
      const agentData = await extractAgentData(page);
      if (includeReviews) {
        agentData.reviews = await extractReviews(page, log);
      }
      if (includeListings) {
        agentData.listings = await extractListings(page, log);
      }
      await Dataset.pushData(agentData);
    },
    failedRequestHandler({ request, log }) {
      log.error(`Failed: ${request.url}`);
    },
  });

  // Build start URLs from locations
  const startUrls = locations.map((loc) => ({
    url: 'https://www.zillow.com/professionals/'
      + 'real-estate-agent-reviews/'
      + `?locationText=${encodeURIComponent(loc)}`,
  }));

  await crawler.run(startUrls);
});
```
Why Use Apify for Zillow Agent Scraping?
- Residential proxies — Apify provides US residential proxies that are essential for accessing Zillow reliably
- Automatic scaling — scrape thousands of agent profiles concurrently without managing infrastructure
- Built-in storage — results are automatically stored in datasets you can export as JSON, CSV, or Excel
- Scheduling — set up recurring scrapes to track agent activity over time
- Ready-made actors — Apify Store has pre-built Zillow scrapers you can use immediately
Extracting Agent Listing Portfolios
An agent's active and sold listings tell you about their market focus:
```javascript
async function extractListings(page, log) {
  // Navigate to the listings tab if present
  const listingsTab = await page.$('[data-test="listings-tab"]');
  if (listingsTab) {
    await listingsTab.click();
    await page.waitForTimeout(2000);
  }

  const listingCards = await page.$$eval(
    '.listing-card',
    (cards) => cards.map((card) => ({
      address: card.querySelector('.listing-address')?.textContent?.trim(),
      price: card.querySelector('.listing-price')?.textContent?.trim(),
      status: card.querySelector('.listing-status')?.textContent?.trim(),
      beds: parseInt(card.querySelector('.beds')?.textContent, 10) || null,
      baths: parseFloat(card.querySelector('.baths')?.textContent) || null,
      sqft: parseInt(
        card.querySelector('.sqft')?.textContent?.replace(/,/g, ''),
        10
      ) || null,
      listingUrl: card.querySelector('a')?.getAttribute('href'),
      photoUrl: card.querySelector('img')?.getAttribute('src'),
    }))
  );

  log.info(`Found ${listingCards.length} listings`);
  return listingCards;
}
```
Data Processing and Analysis
Once you've collected agent data, here's how to derive insights:
```javascript
function analyzeMarketCoverage(agents) {
  // Group agents by the zip codes they serve
  const byZip = {};
  for (const agent of agents) {
    for (const area of agent.serviceAreas) {
      if (!byZip[area]) byZip[area] = [];
      byZip[area].push(agent);
    }
  }

  // Calculate market metrics per zip
  const marketMetrics = Object.entries(byZip).map(
    ([zip, zipAgents]) => ({
      zipCode: zip,
      agentCount: zipAgents.length,
      avgRating: (
        zipAgents.reduce((s, a) => s + (a.rating || 0), 0)
          / zipAgents.length
      ).toFixed(2),
      topAgent: [...zipAgents].sort(
        (a, b) => (b.salesStats?.totalSales || 0)
          - (a.salesStats?.totalSales || 0)
      )[0]?.name,
      avgHomePrice: calculateAvgPrice(zipAgents),
    })
  );

  return marketMetrics.sort((a, b) => b.agentCount - a.agentCount);
}
```
Legal and Ethical Considerations
When scraping Zillow agent data, keep these guidelines in mind:
- Respect robots.txt — check Zillow's robots.txt for restricted paths
- Rate limit your requests — don't overwhelm their servers
- Use data responsibly — agent contact info is for legitimate business purposes
- Don't misrepresent yourself — don't create fake accounts to access data
- Review Zillow's Terms of Service — understand what they permit
- Comply with data protection laws — CCPA, state real estate regulations
Conclusion
Zillow agent data scraping is a powerful tool for real estate market intelligence. By extracting realtor profiles, listing portfolios, reviews, and geographic coverage data, you can build applications that provide genuine value to the real estate industry.
The combination of Crawlee's robust scraping framework and Apify's cloud infrastructure makes it possible to collect and maintain this data at scale. Whether you're building a lead generation platform, conducting market research, or developing PropTech applications, the techniques covered in this guide give you a solid foundation to work from.
Start small with a specific geographic area, validate your data quality, and scale up as you refine your approach. The real estate data landscape is rich with opportunity for those who know how to extract and analyze it effectively.
Looking for ready-to-use Zillow scraping solutions? Check out the Apify Store for pre-built actors that handle all the complexity for you.