Real estate data drives investment decisions, market analysis, and price comparison tools. Zillow holds the largest database of US property listings — over 100 million homes — including Zestimates, price histories, and listing details.
This guide covers practical methods for extracting Zillow property data in 2026: what works, what does not, and the tradeoffs of each approach.
## Why Zillow Data Is Valuable
Zillow has data that matters for real estate analysis:
- Active listings: price, beds, baths, sqft, listing agent, days on market
- Zestimates: proprietary home value estimates (updated regularly)
- Price history: past sales, tax assessments, listing price changes
- Neighborhood data: school ratings, walkability scores, crime stats
- Rental Zestimates: estimated monthly rent for any property
Investors, proptech startups, and data analysts scrape this data for portfolio analysis, comp research, and automated market monitoring.
## Method 1: Hidden API Endpoints
The Zillow frontend makes API calls to internal endpoints that return structured JSON. This is much cleaner than parsing HTML.
The main endpoint for search results:
```python
import requests
import json

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://www.zillow.com/"
}

# Zillow search API - returns listings for a region
search_url = "https://www.zillow.com/search/GetSearchPageState.htm"

params = {
    "searchQueryState": json.dumps({
        "pagination": {},
        "mapBounds": {
            "north": 37.8199,
            "south": 37.7034,
            "east": -122.3482,
            "west": -122.5277
        },
        "filterState": {
            "sort": {"value": "days"},
            "ah": {"value": True},  # include all homes
            "price": {"min": 500000, "max": 2000000},
            "beds": {"min": 2}
        },
        "isMapVisible": True,
        "isListVisible": True
    }),
    "wants": json.dumps({
        "cat1": ["listResults", "mapResults"]
    }),
    "requestId": 3
}

response = requests.get(search_url, params=params, headers=headers)

if response.status_code == 200:
    data = response.json()
    results = data.get("cat1", {}).get("searchResults", {}).get("listResults", [])
    for listing in results[:5]:
        detail = listing.get("hdpData", {}).get("homeInfo", {})
        price = detail.get("price")
        zestimate = detail.get("zestimate")
        print(f"Address: {detail.get('streetAddress')}, {detail.get('city')}")
        # Guard the thousands-separator format: ":," raises on a missing value
        print(f"Price: ${price:,}" if price else "Price: N/A")
        print(f"Beds: {detail.get('bedrooms')} | Baths: {detail.get('bathrooms')}")
        print(f"Sqft: {detail.get('livingArea', 'N/A')}")
        print(f"Zestimate: ${zestimate:,}" if zestimate else "Zestimate: N/A")
        print(f"Days on Zillow: {detail.get('daysOnZillow', 'N/A')}")
        print("---")
else:
    print(f"Request failed: {response.status_code}")
```
What the API returns: Each result includes zpid (Zillow Property ID), price, address, coordinates, home type, listing status, and basic property details.
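Since the zpid is the key you need for follow-up detail requests, it is worth pulling the IDs out in one pass. A small sketch, assuming the listResults shape shown above (the `zpid` and `hdpData` field names should be verified against a live response):

```python
def extract_zpids(list_results: list) -> list:
    """Collect Zillow Property IDs from search listResults entries.

    Checks the top-level "zpid" field first, then falls back to the
    nested hdpData.homeInfo.zpid; entries with neither are skipped.
    """
    zpids = []
    for listing in list_results:
        zpid = listing.get("zpid") or listing.get("hdpData", {}).get(
            "homeInfo", {}
        ).get("zpid")
        if zpid is not None:
            zpids.append(int(zpid))
    return zpids
```

Feed the resulting IDs into the property detail endpoint described next.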
### Getting Detailed Property Data
For individual property details (including price history and tax records), use the property detail endpoint:
```python
import requests

def get_property_details(zpid: int) -> dict:
    """Fetch full property details by Zillow Property ID."""
    url = "https://www.zillow.com/graphql/"
    payload = {
        "query": """query GetHomeDetails($zpid: ID!) {
          property(zpid: $zpid) {
            address { streetAddress city state zipcode }
            price
            zestimate
            rentZestimate
            bedrooms
            bathrooms
            livingArea
            yearBuilt
            homeType
            priceHistory { date price event }
            taxHistory { year taxPaid value }
          }
        }""",
        "variables": {"zpid": str(zpid)}
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }
    resp = requests.post(url, json=payload, headers=headers)
    return resp.json()

# Example: get details for a specific property
details = get_property_details(2077546867)
```
Important caveat: These internal APIs are undocumented and change without notice. Zillow frequently modifies endpoint URLs, query parameters, and response schemas. Code that works today may break next month.
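One way to soften schema drift is to replace chained indexing with a tolerant getter that returns a default at the first missing level instead of raising. A minimal sketch (the `"data" -> "property" -> "price"` path in the usage comment is a hypothetical example, not a guaranteed response shape):

```python
from typing import Any

def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning `default` at the first missing
    level instead of raising KeyError/TypeError. Handy when a
    response renames or removes a layer of nesting."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Instead of details["data"]["property"]["price"] (which raises if the
# schema shifted), write:
# price = dig(details, "data", "property", "price", default=0)
```

When `dig` starts returning defaults for fields that used to exist, that is your signal the endpoint changed.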
## Method 2: Playwright for JavaScript-Rendered Pages
Zillow search pages rely heavily on JavaScript rendering. If the API approach stops working, browser automation is the fallback:
```python
import asyncio
from playwright.async_api import async_playwright

async def scrape_zillow_listings(location: str, max_pages: int = 3):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"
        )
        page = await context.new_page()

        search_url = f"https://www.zillow.com/{location.replace(' ', '-')}"
        await page.goto(search_url, wait_until="networkidle")

        all_listings = []
        for page_num in range(max_pages):
            await page.wait_for_selector(
                '[data-test="property-card"]', timeout=15000
            )
            cards = await page.query_selector_all(
                '[data-test="property-card"]'
            )
            for card in cards:
                try:
                    price_el = await card.query_selector(
                        '[data-test="property-card-price"]'
                    )
                    addr_el = await card.query_selector('address')
                    details_el = await card.query_selector(
                        '[data-test="property-card-details"]'
                    )
                    listing = {
                        "price": await price_el.inner_text() if price_el else None,
                        "address": await addr_el.inner_text() if addr_el else None,
                        "details": await details_el.inner_text() if details_el else None,
                    }
                    all_listings.append(listing)
                except Exception:
                    continue

            next_btn = await page.query_selector('a[rel="next"]')
            if next_btn and page_num < max_pages - 1:
                await next_btn.click()
                await page.wait_for_load_state("networkidle")
                await asyncio.sleep(2)
            else:
                break

        await browser.close()
        return all_listings

listings = asyncio.run(scrape_zillow_listings("san-francisco-ca"))
for l in listings[:5]:
    print(l)
```
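Unlike the API path, the card scraper returns the details line as raw text. A parser sketch for turning it into numeric fields, assuming the "3 bds | 2 ba | 1,450 sqft" layout currently used on search cards (verify the separator and unit labels against live markup):

```python
import re
from typing import Optional

def parse_card_details(details: Optional[str]) -> dict:
    """Split a property-card details string into numeric fields.

    Segments are split on "|"; each is matched for a leading number
    and classified by its unit label. Segments that do not match are
    skipped rather than raising.
    """
    result = {"beds": None, "baths": None, "sqft": None}
    if not details:
        return result
    for segment in details.split("|"):
        segment = segment.strip().lower()
        match = re.match(r"([\d,.]+)", segment)
        if not match:
            continue
        value = float(match.group(1).replace(",", ""))
        if "bd" in segment:
            result["beds"] = value
        elif "ba" in segment:
            result["baths"] = value
        elif "sqft" in segment:
            result["sqft"] = int(value)
    return result
```

This keeps the Playwright loop simple: store the raw string during the crawl, normalize afterwards.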
## The Anti-Bot Problem
Zillow uses aggressive bot detection in 2026. You will encounter:
- Akamai Bot Manager: Fingerprints browser behavior, TLS signatures, and JavaScript execution
- Rate limiting: Too many requests from one IP triggers CAPTCHAs or blocks
- JavaScript challenges: Pages require JS execution to render content
- Session validation: Cookies and tokens are checked across requests
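It pays to detect a block the moment it happens rather than silently storing a CAPTCHA page as data. A heuristic sketch; the marker strings are illustrative guesses at typical block-page text, not Zillow's exact wording, so tune them against responses you actually receive:

```python
BLOCK_MARKERS = (
    "captcha",
    "access denied",
    "please verify you are a human",
)

def looks_blocked(html: str, status_code: int = 200) -> bool:
    """Heuristic check for an anti-bot block page.

    HTTP 403/429 are treated as blocks outright; otherwise the body
    is scanned (case-insensitively) for known block-page phrases.
    """
    if status_code in (403, 429):
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

Call this on every response and back off (or switch IPs) as soon as it returns True, before the detector escalates.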
### Dealing with Blocks
For small-scale scraping (under 1,000 pages), rotating your IP and using realistic headers is usually enough. For anything larger, you need a proxy service with residential IPs.
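"Realistic headers" means more than a single hardcoded User-Agent. A sketch that assembles browser-like headers with a randomly chosen UA per request (the UA strings are illustrative and go stale; refresh them periodically):

```python
import random

# Illustrative desktop UA strings; keep these current in practice.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]

def build_headers() -> dict:
    """Assemble browser-like request headers with a randomly chosen
    User-Agent, so consecutive requests do not share one fingerprint."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/json;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.zillow.com/",
    }
```

Pass the result as `headers=build_headers()` on each request instead of reusing one static dict.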
ScraperAPI handles the anti-bot bypass for you: it rotates IPs, manages headers, and renders JavaScript automatically. For Zillow specifically, enable the `render=true` parameter:
```python
import requests
from urllib.parse import quote

SCRAPERAPI_KEY = "YOUR_KEY"

def scrape_with_proxy(url: str) -> str:
    """Use ScraperAPI to bypass anti-bot protection."""
    # URL-encode the target so its own query string survives
    # as a single parameter value
    proxy_url = (
        f"http://api.scraperapi.com"
        f"?api_key={SCRAPERAPI_KEY}&url={quote(url, safe='')}&render=true"
    )
    resp = requests.get(proxy_url, timeout=60)
    return resp.text

html = scrape_with_proxy("https://www.zillow.com/san-francisco-ca/")
```
If you are doing high-volume residential property monitoring, ThorData residential proxies give you a pool of real residential IPs that look like normal home internet traffic — which matters for Zillow since they flag datacenter IP ranges aggressively.
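With requests, a residential proxy plugs in through the standard `proxies` mapping. A sketch of the wiring; the endpoint host, port, and credentials below are placeholders for whatever your provider issues:

```python
def build_proxy_map(host: str, port: int, username: str, password: str) -> dict:
    """Build the `proxies` mapping that requests accepts, pointing
    both schemes at one authenticated HTTP proxy endpoint."""
    proxy = f"http://{username}:{password}@{host}:{port}"
    return {"http": proxy, "https": proxy}

# Usage with requests (endpoint and credentials are placeholders):
# resp = requests.get(
#     "https://www.zillow.com/san-francisco-ca/",
#     proxies=build_proxy_map("resi.example-provider.com", 8000, "user", "pass"),
#     timeout=60,
# )
```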
## Structuring the Output
Once you have raw listings, normalize them into a clean format:
```python
import csv
from datetime import datetime

def save_listings_to_csv(listings: list, filename: str = "zillow_data.csv"):
    fieldnames = [
        "address", "city", "state", "zip", "price", "zestimate",
        "beds", "baths", "sqft", "year_built", "home_type",
        "days_on_zillow", "scraped_at"
    ]
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for listing in listings:
            info = listing.get("hdpData", {}).get("homeInfo", {})
            writer.writerow({
                "address": info.get("streetAddress", ""),
                "city": info.get("city", ""),
                "state": info.get("state", ""),
                "zip": info.get("zipcode", ""),
                "price": info.get("price", ""),
                "zestimate": info.get("zestimate", ""),
                "beds": info.get("bedrooms", ""),
                "baths": info.get("bathrooms", ""),
                "sqft": info.get("livingArea", ""),
                "year_built": info.get("yearBuilt", ""),
                "home_type": info.get("homeType", ""),
                "days_on_zillow": info.get("daysOnZillow", ""),
                "scraped_at": datetime.now().isoformat()
            })
    print(f"Saved {len(listings)} listings to {filename}")
```
## Legal Considerations
Zillow Terms of Service prohibit automated scraping. The robots.txt blocks most crawlers. Legally, the landscape is nuanced:
- hiQ v. LinkedIn (2022): The Ninth Circuit held that scraping publicly accessible data does not violate the CFAA. That precedent generally covers public listing pages.
- But: A site's ToS is a contract you implicitly accept by using the site, and hiQ itself was later found to have breached LinkedIn's user agreement before the case settled. Violating the ToS can expose you to civil (not criminal) liability.
- Practical advice: Do not scrape at aggressive rates, do not bypass authentication, and do not redistribute raw data as-is at scale. Use the data for analysis, not to clone Zillow.
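"Do not scrape at aggressive rates" is easy to enforce mechanically: pace requests with randomized delays so your traffic is neither heavy nor machine-regular. A minimal sketch:

```python
import random
import time

def jittered_delay(base: float, spread: float = 0.5) -> float:
    """Return a randomized pause around `base` seconds; randomized
    intervals avoid the clockwork timing that detectors flag."""
    return base + random.uniform(0, base * spread)

def throttled_map(fetch, urls, base_delay: float = 3.0) -> list:
    """Apply `fetch` (any callable, e.g. requests.get) to each URL,
    sleeping a jittered interval between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause needed before the first request
            time.sleep(jittered_delay(base_delay))
        results.append(fetch(url))
    return results
```

A few seconds of base delay per request is a reasonable starting point for a single-IP crawl; tune it down only if you are rotating IPs.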
## What Actually Works at Scale
For production use cases (monitoring hundreds of markets, tracking thousands of properties), building and maintaining your own Zillow scraper is expensive. The anti-bot systems change frequently, and one bad deployment can get your entire IP range blocked.
If you are scraping real estate data regularly, check out the scrapers on the Apify Store — they handle anti-bot, proxy rotation, and output formatting so you can focus on analysis rather than infrastructure.
## Summary
| Method | Best For | Difficulty | Reliability |
|---|---|---|---|
| Hidden API | Quick data pulls, search results | Medium | Breaks periodically |
| Playwright | Full page data, price history | High | More resilient to API changes |
| Proxy service | Bypassing blocks at scale | Low | Depends on provider |
| Pre-built scraper | Production monitoring | Low | Maintained by developer |
Start with the API approach for exploration, graduate to Playwright when you need detail, and use a proxy service or managed scraper when you need reliability at scale.