Real estate data drives billion-dollar decisions every day. Whether you're a data analyst tracking housing market trends, a real estate investor looking for undervalued properties, or a proptech startup building the next big thing — access to accurate, up-to-date property data is essential.
Zillow is the largest real estate marketplace in the United States, with over 110 million properties in its database. It's an incredibly rich source of property listings, price history, neighborhood statistics, and market trends. In this guide, we'll explore how to extract this data programmatically, understand Zillow's data structure, and build a reliable scraping pipeline.
Understanding Zillow's Data Structure
Before writing any code, it's crucial to understand how Zillow organizes its data. The site is structured around several key entities:
Property Listings
Each property on Zillow has a detailed listing page containing:
- Basic info: Address, city, state, ZIP code
- Price data: List price, Zestimate (Zillow's proprietary valuation), rent Zestimate, price history
- Property details: Bedrooms, bathrooms, square footage, lot size, year built
- Features: Heating/cooling type, parking, appliances, interior features
- Tax information: Annual property tax, tax assessment history
- Photos and virtual tours: High-resolution images, 3D tours
Search Results Pages
Zillow's search is map-based and returns results in a grid/list format. Each search result contains a subset of the full listing data:
{
"zpid": "18429834",
"address": "123 Main St, Austin, TX 78701",
"price": 450000,
"bedrooms": 3,
"bathrooms": 2,
"livingArea": 1850,
"lotSize": 6500,
"zestimate": 465000,
"rentZestimate": 2400,
"daysOnZillow": 14,
"listingStatus": "FOR_SALE",
"latitude": 30.2672,
"longitude": -97.7431
}
Neighborhood Data
Zillow aggregates data at the neighborhood level, providing:
- Median home value and trends
- Median rent prices
- School ratings and proximity
- Walk score, transit score, bike score
- Crime statistics
- Demographics
The Technical Challenges of Scraping Zillow
Zillow is a modern React-based single-page application. This presents several challenges for web scraping:
1. Dynamic Content Rendering
Most property data is loaded via JavaScript after the initial page load. A simple HTTP request with fetch or axios won't return the full page content. You need a headless browser or access to the underlying API endpoints.
// This WON'T work - you'll get an empty shell
const response = await fetch('https://www.zillow.com/homedetails/123-Main-St/18429834_zpid/');
const html = await response.text();
// html contains React app shell, not property data
2. Search Pagination and Map Boundaries
Zillow's search is map-based, meaning results are bounded by geographic coordinates rather than traditional page numbers. As users pan and zoom the map, new API requests are made with updated bounding box coordinates.
The search API uses a query structure like:
const searchQuery = {
searchQueryState: {
mapBounds: {
north: 30.35,
south: 30.20,
east: -97.65,
west: -97.85
},
filterState: {
price: { min: 200000, max: 500000 },
beds: { min: 2 },
baths: { min: 1 },
sqft: { min: 1000 }
},
pagination: { currentPage: 1 }
}
};
3. Rate Limiting and Anti-Bot Measures
Zillow employs several anti-scraping measures:
- Rate limiting based on IP address
- CAPTCHA challenges for suspicious traffic patterns
- Browser fingerprinting
- Request header validation
4. Data Freshness
Property listings change frequently — new listings appear, prices get updated, properties go under contract. Any scraping solution needs to handle incremental updates efficiently.
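A simple way to handle incremental updates is to key every record by its zpid and track when it was last seen; listings that stop appearing in fresh crawls can then be flagged as stale. Here's a minimal in-memory sketch (a production pipeline would back this with a database):

```javascript
// Merge a fresh batch of scraped listings into an existing store,
// keyed by zpid. Returns a summary of new, updated, and unchanged records.
function upsertListings(store, batch, now = Date.now()) {
  const summary = { added: 0, updated: 0, unchanged: 0 };
  for (const listing of batch) {
    const existing = store.get(listing.zpid);
    if (!existing) {
      store.set(listing.zpid, { ...listing, lastSeen: now });
      summary.added++;
    } else if (existing.price !== listing.price ||
               existing.listingStatus !== listing.listingStatus) {
      // Price or status changed: merge in the new fields
      store.set(listing.zpid, { ...existing, ...listing, lastSeen: now });
      summary.updated++;
    } else {
      // Nothing changed; just refresh the timestamp
      existing.lastSeen = now;
      summary.unchanged++;
    }
  }
  return summary;
}
```

Listings whose lastSeen falls behind the latest crawl timestamp have likely been delisted or sold, so they can be marked inactive rather than deleted.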
Building a Zillow Scraper: Step by Step
Let's walk through building a scraper that handles these challenges. We'll use a headless browser approach with Playwright to render JavaScript content.
Step 1: Setting Up the Environment
First, launch a headless Chromium instance with a realistic user agent and viewport:
const { chromium } = require('playwright');
async function initBrowser() {
const browser = await chromium.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage'
]
});
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', // a truncated UA string looks bot-like
viewport: { width: 1920, height: 1080 }
});
return { browser, context };
}
Step 2: Extracting Search Results
The key insight is intercepting Zillow's internal API calls rather than parsing the rendered HTML:
async function scrapeSearchResults(context, searchUrl) {
const page = await context.newPage();
const results = [];
// Intercept API responses
page.on('response', async (response) => {
const url = response.url();
if (url.includes('GetSearchPageState') ||
url.includes('search/GetSearchResults')) {
try {
const data = await response.json();
const listings = extractListings(data);
results.push(...listings);
} catch (e) {
// Not all responses are JSON
}
}
});
await page.goto(searchUrl, { waitUntil: 'networkidle' });
await page.waitForTimeout(3000);
await page.close();
return results;
}
function extractListings(apiResponse) {
const listings = [];
// Navigate the nested response structure
const searchResults = apiResponse?.cat1?.searchResults?.listResults || [];
for (const result of searchResults) {
listings.push({
zpid: result.zpid,
address: result.address,
price: result.unformattedPrice || result.price,
bedrooms: result.beds,
bathrooms: result.baths,
sqft: result.area,
zestimate: result.zestimate,
listingStatus: result.statusType,
daysOnZillow: result.variableData?.text,
latitude: result.latLong?.latitude,
longitude: result.latLong?.longitude,
detailUrl: result.detailUrl,
imgSrc: result.imgSrc
});
}
return listings;
}
Step 3: Scraping Individual Property Details
For detailed property data, we need to visit each listing page:
async function scrapePropertyDetails(context, propertyUrl) {
const page = await context.newPage();
await page.goto(propertyUrl, { waitUntil: 'networkidle' });
await page.waitForTimeout(2000);
// Extract data from the __NEXT_DATA__ script tag
const propertyData = await page.evaluate(() => {
const scriptTag = document.querySelector('#__NEXT_DATA__');
if (scriptTag) {
const data = JSON.parse(scriptTag.textContent);
return data?.props?.pageProps?.componentProps?.gdpClientCache;
}
return null;
});
// Close the page first so it isn't leaked when details are found
await page.close();
if (propertyData) {
// The data is nested under the ZPID key
const zpid = Object.keys(propertyData)[0];
const details = JSON.parse(propertyData[zpid]);
return {
zpid: details.zpid,
address: details.address,
price: details.price,
zestimate: details.zestimate,
rentZestimate: details.rentZestimate,
bedrooms: details.bedrooms,
bathrooms: details.bathrooms,
livingArea: details.livingArea,
lotSize: details.lotAreaValue,
yearBuilt: details.yearBuilt,
propertyType: details.homeType,
description: details.description,
priceHistory: details.priceHistory,
taxHistory: details.taxHistory,
schools: details.schools,
walkScore: details.walkScore,
transitScore: details.transitScore,
bikeScore: details.bikeScore,
photos: details.photos?.map(p => p.url),
lastSoldPrice: details.lastSoldPrice,
lastSoldDate: details.dateSold
};
}
return null;
}
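The unwrapping logic for that payload can be exercised outside the browser against a minimal mock of the __NEXT_DATA__ structure. Note the shape below is a simplified assumption based on what the page serves at the time of writing; Zillow's real structure is undocumented and changes without notice:

```javascript
// Unwrap property details from a __NEXT_DATA__-style payload.
// The nesting mirrors the path used in the browser extraction above,
// but treat it as unofficial and unstable.
function unwrapGdpCache(nextData) {
  const cache = nextData?.props?.pageProps?.componentProps?.gdpClientCache;
  if (!cache) return null;
  const zpid = Object.keys(cache)[0]; // data is keyed by a cache key containing the ZPID
  return JSON.parse(cache[zpid]);     // each value is itself a JSON string
}

// Mock payload mirroring that nesting, for illustration only
const mock = {
  props: { pageProps: { componentProps: { gdpClientCache: {
    '18429834': JSON.stringify({ zpid: 18429834, price: 450000 })
  } } } }
};
const details = unwrapGdpCache(mock);
```

Keeping this unwrapping in a small pure function makes it easy to re-test whenever Zillow ships a frontend change that moves the data.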
Step 4: Handling Pagination Across Map Regions
Since Zillow limits results to about 40 pages (800 listings) per search, you need to split large areas into smaller geographic tiles:
function generateMapTiles(bounds, gridSize = 4) {
const tiles = [];
const latStep = (bounds.north - bounds.south) / gridSize;
const lngStep = (bounds.east - bounds.west) / gridSize;
for (let i = 0; i < gridSize; i++) {
for (let j = 0; j < gridSize; j++) {
tiles.push({
north: bounds.south + latStep * (i + 1),
south: bounds.south + latStep * i,
east: bounds.west + lngStep * (j + 1),
west: bounds.west + lngStep * j
});
}
}
return tiles;
}
// Example: Split Austin, TX into 16 tiles
const austinBounds = {
north: 30.50, south: 30.15,
east: -97.55, west: -97.95
};
const tiles = generateMapTiles(austinBounds);
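Each tile can then be turned into a search request. Zillow's web app encodes the searchQueryState object as URL-encoded JSON in a query parameter; the sketch below reproduces that encoding, but the parameter name and layout are assumptions based on observed traffic and may change:

```javascript
// Build a Zillow search URL for one map tile by embedding searchQueryState
// as a URL-encoded JSON query parameter. Unofficial and unstable:
// verify against the requests your own browser session makes.
function buildSearchUrl(tile, filterState = {}, page = 1) {
  const searchQueryState = {
    mapBounds: tile,
    filterState,
    pagination: { currentPage: page }
  };
  const encoded = encodeURIComponent(JSON.stringify(searchQueryState));
  return `https://www.zillow.com/homes/?searchQueryState=${encoded}`;
}

// One URL per tile, ready to feed into scrapeSearchResults()
const url = buildSearchUrl(
  { north: 30.35, south: 30.20, east: -97.65, west: -97.85 },
  { price: { min: 200000, max: 500000 } }
);
```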
Step 5: Extracting the Zestimate and Price History
The Zestimate is one of Zillow's most valuable data points. Here's how to extract price history trends:
function analyzePriceHistory(priceHistory) {
if (!priceHistory || priceHistory.length === 0) return null;
const salesHistory = priceHistory
.filter(event => event.event === 'Sold')
.sort((a, b) => new Date(b.date) - new Date(a.date));
const priceChanges = priceHistory
.filter(event => event.event === 'Price change')
.map(event => ({
date: event.date,
price: event.price,
priceChange: event.priceChangeRate
}));
return {
lastSoldPrice: salesHistory[0]?.price,
lastSoldDate: salesHistory[0]?.date,
numberOfSales: salesHistory.length,
priceChanges: priceChanges,
appreciation: salesHistory.length >= 2
? ((salesHistory[0].price - salesHistory[1].price) / salesHistory[1].price * 100).toFixed(2) + '%'
: 'N/A'
};
}
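The appreciation figure above is the total change between the two most recent sales. When those sales are years apart, an annualized rate (compound annual growth rate) is usually more meaningful:

```javascript
// Annualized appreciation (CAGR) between two sales, as a percentage string.
function annualizedAppreciation(earlierPrice, laterPrice, years) {
  if (!earlierPrice || !laterPrice || years <= 0) return null;
  const cagr = Math.pow(laterPrice / earlierPrice, 1 / years) - 1;
  return (cagr * 100).toFixed(2) + '%';
}

// A home bought for $200,000 and sold for $400,000 ten years later
// appreciated about 7.18% per year, not 100% / 10 = 10%.
annualizedAppreciation(200000, 400000, 10); // → "7.18%"
```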
Scaling with Apify: The Production Approach
Building and maintaining a custom Zillow scraper is complex. You need to handle browser management, proxy rotation, CAPTCHA solving, and infrastructure scaling. This is where the Apify platform shines.
The Apify Store offers ready-made Zillow scrapers that handle all the technical complexity. These actors run in the cloud, manage headless browsers automatically, rotate proxies, and output clean, structured data.
Using an Apify Zillow Actor
Here's how to run a Zillow scraper on Apify:
const { ApifyClient } = require('apify-client');
const client = new ApifyClient({
token: 'YOUR_APIFY_TOKEN'
});
async function scrapeZillowWithApify() {
// Actor IDs on Apify take the form 'username/actor-name';
// 'zillow-scraper' here is a placeholder
const run = await client.actor('zillow-scraper').call({
searchUrls: [
'https://www.zillow.com/austin-tx/',
'https://www.zillow.com/denver-co/'
],
maxItems: 500,
includeDetails: true,
proxy: {
useApifyProxy: true,
apifyProxyGroups: ['RESIDENTIAL']
}
});
// Fetch results from the dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} properties`);
// Export to CSV, quoting the address field since it contains commas
const header = 'address,price,bedrooms,bathrooms,sqft,zestimate';
const csv = [header, ...items.map(item =>
`"${item.address}",${item.price},${item.bedrooms},${item.bathrooms},${item.sqft},${item.zestimate}`
)].join('\n');
require('fs').writeFileSync('zillow-results.csv', csv);
return items;
}
Benefits of Using Apify for Zillow Scraping
- Built-in proxy rotation: Residential proxies that avoid IP bans
- Auto-scaling: Run multiple browser instances in parallel
- Scheduled runs: Set up daily/weekly data collection
- Data storage: Results stored in datasets with export to CSV, JSON, Excel
- Monitoring: Get alerts if scraping fails or data quality drops
- API access: Integrate scraped data into your applications
Use Cases for Zillow Data
Real Estate Investment Analysis
With list prices and rent Zestimates in hand, you can compute standard investment metrics for every scraped listing:
function calculateInvestmentMetrics(property) {
const monthlyRent = property.rentZestimate;
const purchasePrice = property.price;
const annualRent = monthlyRent * 12;
// Gross Rent Multiplier
const grm = purchasePrice / annualRent;
// Cap Rate (simplified)
const operatingExpenses = annualRent * 0.4; // 40% expense ratio
const noi = annualRent - operatingExpenses;
const capRate = (noi / purchasePrice * 100).toFixed(2);
// Cash-on-Cash Return (assuming 25% down, 7% rate, 30yr)
const downPayment = purchasePrice * 0.25;
const loanAmount = purchasePrice * 0.75;
const monthlyPayment = loanAmount * (0.07/12) / (1 - Math.pow(1 + 0.07/12, -360));
const annualCashFlow = noi - (monthlyPayment * 12);
const cashOnCash = (annualCashFlow / downPayment * 100).toFixed(2);
return {
grossRentMultiplier: grm.toFixed(1),
capRate: capRate + '%',
cashOnCashReturn: cashOnCash + '%',
monthlyNOI: (noi / 12).toFixed(0),
monthlyMortgage: monthlyPayment.toFixed(0)
};
}
Market Trend Analysis
Aggregating listings by ZIP code reveals which submarkets are heating up or cooling down:
// Simple statistics helpers used by the analysis below
const average = arr => arr.reduce((sum, v) => sum + v, 0) / (arr.length || 1);
const median = arr => {
const sorted = [...arr].sort((a, b) => a - b);
const mid = Math.floor(sorted.length / 2);
return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
};
function analyzeMarketTrends(properties) {
const byZip = {};
for (const prop of properties) {
const zip = prop.zipCode;
if (!byZip[zip]) byZip[zip] = [];
byZip[zip].push(prop);
}
const trends = Object.entries(byZip).map(([zip, props]) => {
const prices = props.map(p => p.price).filter(Boolean);
const zestimates = props.map(p => p.zestimate).filter(Boolean);
const daysOnMarket = props.map(p => p.daysOnZillow).filter(Boolean);
return {
zipCode: zip,
listingCount: props.length,
medianPrice: median(prices),
avgPricePerSqft: average(props.map(p => p.price / p.sqft).filter(Boolean)),
medianDaysOnMarket: median(daysOnMarket),
avgZestimateVsPrice: average(
props.map(p => p.zestimate && p.price
? ((p.zestimate - p.price) / p.price * 100) : null
).filter(Boolean)
).toFixed(1) + '%'
};
});
return trends.sort((a, b) => b.listingCount - a.listingCount);
}
Comparative Market Analysis (CMA)
To estimate what a property is really worth, find recently listed homes that are nearby and physically similar:
// Assumes two helpers: haversine() returns the distance in miles between
// two coordinates, and similarity() returns a numeric likeness score
function findComparables(target, allProperties, radius = 0.5) {
return allProperties
.filter(p => p.zpid !== target.zpid)
.filter(p => {
const distance = haversine(
target.latitude, target.longitude,
p.latitude, p.longitude
);
return distance <= radius; // Within 0.5 miles
})
.filter(p => {
return Math.abs(p.bedrooms - target.bedrooms) <= 1
&& Math.abs(p.sqft - target.sqft) / target.sqft <= 0.2
&& Math.abs(p.yearBuilt - target.yearBuilt) <= 10;
})
.sort((a, b) => {
// Score by similarity
const scoreA = similarity(target, a);
const scoreB = similarity(target, b);
return scoreB - scoreA;
})
.slice(0, 5);
}
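The haversine() helper used above computes the great-circle distance between two coordinates. A standard implementation, in miles:

```javascript
// Great-circle distance between two lat/lng points, in miles.
function haversine(lat1, lng1, lat2, lng2) {
  const R = 3958.8; // Earth's mean radius in miles
  const toRad = deg => deg * Math.PI / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```

The similarity() scoring function is left to you; a weighted sum of differences in square footage, bedroom count, and year built is a reasonable starting point.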
Best Practices and Legal Considerations
- Respect robots.txt: Always check the site's robots.txt file
- Rate limiting: Add delays between requests (2-5 seconds minimum)
- Caching: Store results locally to avoid redundant requests
- Terms of Service: Review Zillow's ToS regarding automated access
- Data usage: Be mindful of how you use and redistribute the data
- Proxy rotation: Use residential proxies to distribute requests
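The rate-limiting advice above translates to just a few lines of code: wait a randomized 2–5 seconds between requests so the traffic pattern doesn't look machine-generated. A minimal sketch:

```javascript
// Pick a randomized wait time between minMs and maxMs.
function jitteredMs(minMs = 2000, maxMs = 5000) {
  return minMs + Math.random() * (maxMs - minMs);
}

// Awaitable delay to place between requests:
//   for (const url of urls) {
//     await scrapePropertyDetails(context, url);
//     await politeDelay();
//   }
const politeDelay = (minMs, maxMs) =>
  new Promise(resolve => setTimeout(resolve, jitteredMs(minMs, maxMs)));
```

The jitter matters as much as the delay itself: a fixed interval between requests is one of the easiest bot signatures to detect.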
Conclusion
Scraping Zillow provides access to one of the richest real estate datasets available. Whether you're building investment analysis tools, market research dashboards, or proptech applications, the property data available through Zillow is invaluable.
For production use cases, leveraging a platform like Apify with its ready-made actors, proxy management, and scheduling capabilities will save you significant development and maintenance time. Check the Apify Store for Zillow-specific actors that handle all the complexity of browser automation, anti-bot bypassing, and data extraction.
The real estate market moves fast — having automated, reliable data collection gives you the edge to make informed decisions before opportunities disappear.
This article is for educational purposes. Always ensure your scraping activities comply with applicable terms of service and local regulations.