# How to Scrape Zillow Zestimate and Property History
Zillow is the largest real estate marketplace in the US, and its Zestimate algorithm provides property valuations for over 100 million homes. Extracting this data programmatically opens up powerful research and investment analysis opportunities.
## What Data Is Available?
- Zestimate: Zillow's automated property valuation
- Price history: Past sales, price changes, listings
- Property details: beds, baths, square footage, lot size
- Tax assessment: County tax valuations
- Rental Zestimate: Estimated monthly rent
## Setup

```bash
pip install requests beautifulsoup4 pandas lxml
```
## Scraping Property Data

Zillow renders most of its UI with JavaScript, but the key property data is embedded as JSON in the initial page source:
```python
import requests
from bs4 import BeautifulSoup
import json

def scrape_zillow_property(zpid):
    url = f"https://www.zillow.com/homedetails/{zpid}_zpid/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Zillow embeds page data in <script type="application/json"> tags
    script_tags = soup.find_all("script", {"type": "application/json"})
    for script in script_tags:
        try:
            data = json.loads(script.string)
            # Heuristic: the main Next.js payload starts with a "props" key
            if "props" in str(data)[:100]:
                return data
        except (json.JSONDecodeError, TypeError):
            continue

    # Fall back to meta tags if no JSON payload was found
    return parse_meta_tags(soup)

def parse_meta_tags(soup):
    info = {}
    og_title = soup.find("meta", {"property": "og:title"})
    if og_title:
        info["title"] = og_title.get("content", "")
    description = soup.find("meta", {"name": "description"})
    if description:
        info["description"] = description.get("content", "")
    return info
```
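The `zpid` is encoded in every property URL, so a small helper (an illustrative addition, not part of any Zillow API) can recover it from a listing link you already have:

```python
import re

def extract_zpid(url):
    # The numeric ID immediately before "_zpid" in a homedetails URL is the zpid
    match = re.search(r"/(\d+)_zpid", url)
    return match.group(1) if match else None

print(extract_zpid(
    "https://www.zillow.com/homedetails/123-Main-St-Austin-TX-78701/29473281_zpid/"
))
# → 29473281
```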
## Searching Properties by Location
```python
def search_zillow(location, page=1):
    # Zillow's internal search endpoint returns JSON for a serialized query state
    url = "https://www.zillow.com/search/GetSearchPageState.htm"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "application/json",
    }
    params = {
        "searchQueryState": json.dumps({
            "usersSearchTerm": location,
            "isMapVisible": False,
            "filterState": {
                "sort": {"value": "days"},  # newest listings first
                "ah": {"value": True},
            },
            "pagination": {"currentPage": page},
        }),
        "wants": json.dumps({"cat1": ["listResults"]}),
        "requestId": 1,
    }
    response = requests.get(url, headers=headers, params=params, timeout=30)
    if response.status_code == 200:
        return response.json()
    return None

results = search_zillow("Austin, TX")
```
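The response nests listings under keys like `cat1.searchResults.listResults`. These field names are based on observed payloads and can change without notice, so it pays to navigate them defensively:

```python
def extract_listings(search_response):
    """Defensively pull listing summaries out of a search response.

    The key names below are assumptions from observed payloads, not a stable API.
    """
    if not search_response:
        return []
    results = (
        search_response.get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )
    return [
        {
            "zpid": item.get("zpid"),
            "address": item.get("address"),
            "price": item.get("unformattedPrice"),
            "beds": item.get("beds"),
            "baths": item.get("baths"),
            "sqft": item.get("area"),
        }
        for item in results
    ]
```

Usage: `listings = extract_listings(results)` — an empty list comes back if the request failed or the shape changed, instead of a `KeyError`.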
## Building a Comparable Sales Analyzer
```python
import pandas as pd

def analyze_comps(properties):
    df = pd.DataFrame(properties)
    print("=== Comparable Sales Analysis ===")
    print(f"Properties analyzed: {len(df)}")
    print("\nPrice Statistics:")
    print(f"  Median: ${df['price'].median():,.0f}")
    print(f"  Average: ${df['price'].mean():,.0f}")
    print(f"  Price/sqft: ${(df['price'] / df['sqft']).median():,.0f}")
    print("\nProperty Details:")
    print(f"  Avg beds: {df['beds'].mean():.1f}")
    print(f"  Avg baths: {df['baths'].mean():.1f}")
    print(f"  Avg sqft: {df['sqft'].mean():,.0f}")
    return df
```
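Comp statistics are only meaningful when the properties actually resemble the subject. A simple pre-filter helps (the ±20% square-footage tolerance is an illustrative choice, not an industry standard):

```python
import pandas as pd

def filter_comps(df, subject_sqft, subject_beds, sqft_tolerance=0.20):
    # Keep properties within a square-footage band around the subject
    # and with the same bedroom count
    low = subject_sqft * (1 - sqft_tolerance)
    high = subject_sqft * (1 + sqft_tolerance)
    mask = df["sqft"].between(low, high) & (df["beds"] == subject_beds)
    return df[mask]
```

Run it before the analyzer: `analyze_comps(filter_comps(df, subject_sqft=1900, subject_beds=3))`.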
## Handling Zillow's Anti-Bot Protection

Zillow's bot detection is aggressive, so reliable scraping requires proxy rotation and JavaScript rendering. A service like ScraperAPI handles both, along with CAPTCHA challenges:
```python
def zillow_via_proxy(url):
    params = {
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": url,
        "render": "true",      # execute JavaScript before returning the HTML
        "country_code": "us",  # route the request through US IPs
    }
    return requests.get("https://api.scraperapi.com", params=params, timeout=60)
```
Residential proxies such as ThorData's are essential here, because Zillow readily detects and blocks datacenter IPs. A monitoring tool like ScrapeOps can track which proxy configurations have the best success rates against Zillow.
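Whichever proxy service you use, some requests will still fail. A retry wrapper with exponential backoff (a generic sketch, not tied to any provider) smooths this out:

```python
import random
import time

def fetch_with_retries(fetch_fn, max_attempts=3, base_delay=2.0):
    # fetch_fn is any zero-argument callable returning a requests-style response
    for attempt in range(max_attempts):
        response = fetch_fn()
        if response is not None and response.status_code == 200:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter so retries don't hammer the target
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

Usage: `response = fetch_with_retries(lambda: zillow_via_proxy(url))`.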
## Exporting Data

```python
def export_results(properties, filename="zillow_data"):
    df = pd.DataFrame(properties)
    df.to_csv(f"{filename}.csv", index=False)
    df.to_json(f"{filename}.json", orient="records", indent=2)
    print(f"Exported {len(df)} properties to {filename}.csv and {filename}.json")
```
## Use Cases
- Investment analysis: Compare Zestimates vs actual sale prices
- Market trends: Track neighborhood appreciation over time
- Rental yield: Calculate cap rates using Zestimate and rental Zestimate
- Flipping analysis: Find undervalued properties vs their Zestimate
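The rental-yield idea reduces to a simple formula: cap rate = net operating income / property value. Here is a rough calculator, where the 35% expense ratio is an illustrative assumption (taxes, insurance, maintenance, vacancy), not a Zillow figure:

```python
def estimated_cap_rate(zestimate, rent_zestimate, annual_expense_ratio=0.35):
    # Annualize the monthly Rental Zestimate
    annual_rent = rent_zestimate * 12
    # Net operating income after an assumed expense ratio
    noi = annual_rent * (1 - annual_expense_ratio)
    return noi / zestimate

print(f"{estimated_cap_rate(390_000, 2_500):.2%}")  # → 5.00%
```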
## Legal Considerations
Zillow's Terms of Service prohibit automated scraping. This tutorial is for educational purposes. For commercial real estate data needs, consider Zillow's official API, ATTOM Data, or Redfin's data products.
Follow for more Python data collection tutorials!