DEV Community

agenthustler

How to Scrape Zillow Zestimate and Property History

Zillow is the largest real estate marketplace in the US, and its Zestimate algorithm provides property valuations for over 100 million homes. Extracting this data programmatically opens up powerful research and investment analysis opportunities.

What Data Is Available?

  • Zestimate: Zillow's automated property valuation
  • Price history: past sales, price changes, and prior listings
  • Property details: beds, baths, square footage, lot size
  • Tax assessment: county tax valuations
  • Rental Zestimate: estimated monthly rent
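
These fields map naturally onto a small record type. A hypothetical sketch (the field names here are my own, not Zillow's schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PropertyRecord:
    """Container for the fields listed above; names are illustrative."""
    zpid: str
    zestimate: Optional[int] = None          # Zillow's automated valuation
    rent_zestimate: Optional[int] = None     # estimated monthly rent
    tax_assessment: Optional[int] = None     # county tax valuation
    beds: Optional[float] = None
    baths: Optional[float] = None
    sqft: Optional[int] = None
    lot_size: Optional[int] = None
    price_history: list = field(default_factory=list)  # past sales, price changes
```

Keeping every field optional except `zpid` means partially-scraped pages still produce usable records.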

Setup

pip install requests beautifulsoup4 pandas lxml

Scraping Property Data

Zillow renders most of its UI with JavaScript, but the property data itself is embedded as JSON in the initial page source (Zillow is a Next.js app, so look for `application/json` script tags):

import requests
from bs4 import BeautifulSoup
import json

def scrape_zillow_property(zpid):
    url = f"https://www.zillow.com/homedetails/{zpid}_zpid/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    }

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Property data is embedded in <script type="application/json"> tags
    script_tags = soup.find_all("script", {"type": "application/json"})
    for script in script_tags:
        try:
            data = json.loads(script.string)
            # The Next.js payload is a dict with a top-level "props" key
            if isinstance(data, dict) and "props" in data:
                return data
        except (json.JSONDecodeError, TypeError):
            continue

    # Fall back to meta tags if no JSON blob was found
    return parse_meta_tags(soup)

def parse_meta_tags(soup):
    info = {}
    og_title = soup.find("meta", {"property": "og:title"})
    if og_title:
        info["title"] = og_title.get("content", "")
    description = soup.find("meta", {"name": "description"})
    if description:
        info["description"] = description.get("content", "")
    return info
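
The Zestimate sits several levels deep inside that JSON payload, and the exact path shifts between page versions, so a recursive key search is more robust than hard-coding a path. A minimal sketch (the sample structure in the usage below is illustrative, not Zillow's actual schema):

```python
def find_key(obj, target):
    """Depth-first search a nested dict/list for the first value stored under `target`."""
    if isinstance(obj, dict):
        if target in obj:
            return obj[target]
        for value in obj.values():
            found = find_key(value, target)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_key(item, target)
            if found is not None:
                return found
    return None
```

Usage: `find_key(scrape_zillow_property(zpid), "zestimate")` returns the first `zestimate` value found anywhere in the payload, or `None` if the page had none.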

Searching Properties by Location

def search_zillow(location, page=1):
    url = "https://www.zillow.com/search/GetSearchPageState.htm"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "application/json"
    }

    params = {
        "searchQueryState": json.dumps({
            "usersSearchTerm": location,
            "isMapVisible": False,
            "filterState": {
                "sort": {"value": "days"},
                "ah": {"value": True}
            },
            "pagination": {"currentPage": page}
        }),
        "wants": json.dumps({"cat1": ["listResults"]}),
        "requestId": 1
    }

    response = requests.get(url, headers=headers, params=params, timeout=30)
    if response.status_code == 200:
        return response.json()
    return None

results = search_zillow("Austin, TX")
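
The JSON that comes back nests listings under `cat1` → `searchResults` → `listResults`. The field names below (`unformattedPrice`, `area`, etc.) are based on observed responses and may change, so treat this flattener as a sketch rather than a stable contract:

```python
def extract_listings(search_json):
    """Flatten a GetSearchPageState-style response into a list of plain dicts."""
    results = (
        (search_json or {})
        .get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )
    listings = []
    for r in results:
        listings.append({
            "zpid": r.get("zpid"),
            "address": r.get("address"),
            "price": r.get("unformattedPrice"),  # numeric price, not the display string
            "beds": r.get("beds"),
            "baths": r.get("baths"),
            "sqft": r.get("area"),
        })
    return listings
```

Using `.get()` everywhere means missing fields become `None` instead of raising, which matters because land listings and new builds often omit beds or square footage.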

Building a Comparable Sales Analyzer

import pandas as pd

def analyze_comps(properties):
    df = pd.DataFrame(properties)

    print("=== Comparable Sales Analysis ===")
    print(f"Properties analyzed: {len(df)}")
    print("\nPrice Statistics:")
    print(f"  Median: ${df['price'].median():,.0f}")
    print(f"  Average: ${df['price'].mean():,.0f}")
    print(f"  Price/sqft: ${(df['price'] / df['sqft']).median():,.0f}")

    print("\nProperty Details:")
    print(f"  Avg beds: {df['beds'].mean():.1f}")
    print(f"  Avg baths: {df['baths'].mean():.1f}")
    print(f"  Avg sqft: {df['sqft'].mean():,.0f}")

    return df

Handling Zillow's Anti-Bot Protection

Zillow has aggressive bot detection. For reliable scraping, you need proxy rotation. ScraperAPI handles Zillow's JavaScript rendering and CAPTCHA challenges:

def zillow_via_proxy(url):
    params = {
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": url,
        "render": "true",       # render JavaScript before returning HTML
        "country_code": "us"
    }
    # Rendered requests are slow, so allow a generous timeout
    return requests.get("https://api.scraperapi.com", params=params, timeout=70)

ThorData residential proxies are essential for Zillow since it detects and blocks datacenter IPs. ScrapeOps helps you track which proxy configurations have the best success rates against Zillow.
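
Whatever proxy layer you use, wrap requests in retries with exponential backoff so a single 403 or 429 doesn't kill a crawl. A sketch, where the set of retryable status codes and the delay schedule are my own choices:

```python
import random
import time

import requests

RETRYABLE = {403, 429, 500, 502, 503}

def backoff_delay(attempt, base=2.0):
    """Exponential backoff: 2s, 4s, 8s, ... (before jitter is added)."""
    return base * (2 ** attempt)

def fetch_with_retry(url, headers=None, max_retries=4):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code == 200:
            return resp
        if resp.status_code in RETRYABLE:
            # Add jitter so parallel workers don't retry in lockstep
            time.sleep(backoff_delay(attempt) + random.uniform(0, 1))
        else:
            resp.raise_for_status()
    return None
```

Returning `None` after exhausting retries lets the caller log and skip a stubborn URL instead of crashing the whole run.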

Exporting Data

def export_results(properties, filename="zillow_data"):
    df = pd.DataFrame(properties)
    df.to_csv(f"{filename}.csv", index=False)
    df.to_json(f"{filename}.json", orient="records", indent=2)
    print(f"Exported {len(df)} properties to {filename}.csv and {filename}.json")
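
For example, writing two hypothetical records to a temporary directory and reading the CSV back to verify the round trip:

```python
import os
import tempfile

import pandas as pd

# Made-up example rows, not real listings
records = [
    {"zpid": "111", "price": 450000},
    {"zpid": "222", "price": 520000},
]

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "zillow_data.csv")

pd.DataFrame(records).to_csv(csv_path, index=False)
round_trip = pd.read_csv(csv_path)
```

A quick read-back like this is cheap insurance against silent schema drift before you build analysis on top of the exports.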

Use Cases

  1. Investment analysis: Compare Zestimates vs actual sale prices
  2. Market trends: Track neighborhood appreciation over time
  3. Rental yield: Calculate cap rates using Zestimate and rental Zestimate
  4. Flipping analysis: Find undervalued properties vs their Zestimate
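
Use case 3 is a one-liner once you have both estimates. A sketch with made-up numbers (cap rate = annual net operating income / property value):

```python
def cap_rate(zestimate, monthly_rent_zestimate, annual_expenses=0.0):
    """Net operating income as a fraction of the property's estimated value."""
    noi = monthly_rent_zestimate * 12 - annual_expenses
    return noi / zestimate

# e.g. a $300k Zestimate renting at $2,000/mo with $6k/yr expenses
rate = cap_rate(300_000, 2_000, annual_expenses=6_000)  # -> 0.06, i.e. a 6% cap rate
```

Note the cap rate is only as good as its inputs: Zestimates and rental Zestimates are model estimates, so treat the result as a screening metric, not an appraisal.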

Legal Considerations

Zillow's Terms of Service prohibit automated scraping. This tutorial is for educational purposes. For commercial real estate data needs, consider Zillow's official API, ATTOM Data, or Redfin's data products.


Follow for more Python data collection tutorials!
