DEV Community

agenthustler

How to Scrape Zillow Zestimate and Property History

Zillow is the largest real estate marketplace in the US, and its Zestimate algorithm provides property valuations for over 100 million homes. Extracting this data programmatically opens up powerful research and investment analysis opportunities.

What Data Is Available?

  • Zestimate: Zillow's automated property valuation
  • Price history: past sales, price changes, and prior listings
  • Property details: beds, baths, square footage, lot size
  • Tax assessment: county tax valuations
  • Rental Zestimate: estimated monthly rent
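
These fields map naturally onto a small record type. A hypothetical sketch (the field names here are my own, not Zillow's schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PropertyRecord:
    """Container for the fields listed above; names are illustrative."""
    zpid: str
    zestimate: Optional[int] = None          # Zillow's automated valuation
    rent_zestimate: Optional[int] = None     # estimated monthly rent
    tax_assessment: Optional[int] = None     # county tax valuation
    beds: Optional[float] = None
    baths: Optional[float] = None
    sqft: Optional[int] = None
    lot_size: Optional[int] = None
    price_history: list = field(default_factory=list)  # past sales, price changes
```

Keeping every field optional except `zpid` means partially-scraped pages still produce usable records.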

Setup

pip install requests beautifulsoup4 pandas lxml

Scraping Property Data

Zillow renders most of its UI with JavaScript, but the property data itself is embedded as JSON in the initial page source (Zillow is a Next.js app, so look for `application/json` script tags):

import requests
from bs4 import BeautifulSoup
import json

def scrape_zillow_property(zpid):
    url = f"https://www.zillow.com/homedetails/{zpid}_zpid/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    }

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Property data is embedded in <script type="application/json"> tags
    script_tags = soup.find_all("script", {"type": "application/json"})
    for script in script_tags:
        try:
            data = json.loads(script.string)
            # The Next.js payload is a dict with a top-level "props" key
            if isinstance(data, dict) and "props" in data:
                return data
        except (json.JSONDecodeError, TypeError):
            continue

    # Fall back to meta tags if no JSON blob was found
    return parse_meta_tags(soup)

def parse_meta_tags(soup):
    info = {}
    og_title = soup.find("meta", {"property": "og:title"})
    if og_title:
        info["title"] = og_title.get("content", "")
    description = soup.find("meta", {"name": "description"})
    if description:
        info["description"] = description.get("content", "")
    return info
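
The Zestimate sits several levels deep inside that JSON payload, and the exact path shifts between page versions, so a recursive key search is more robust than hard-coding a path. A minimal sketch (the sample structure in the usage below is illustrative, not Zillow's actual schema):

```python
def find_key(obj, target):
    """Depth-first search a nested dict/list for the first value stored under `target`."""
    if isinstance(obj, dict):
        if target in obj:
            return obj[target]
        for value in obj.values():
            found = find_key(value, target)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_key(item, target)
            if found is not None:
                return found
    return None
```

Usage: `find_key(scrape_zillow_property(zpid), "zestimate")` returns the first `zestimate` value found anywhere in the payload, or `None` if the page had none.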

Searching Properties by Location

def search_zillow(location, page=1):
    url = "https://www.zillow.com/search/GetSearchPageState.htm"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "application/json"
    }

    params = {
        "searchQueryState": json.dumps({
            "usersSearchTerm": location,
            "isMapVisible": False,
            "filterState": {
                "sort": {"value": "days"},
                "ah": {"value": True}
            },
            "pagination": {"currentPage": page}
        }),
        "wants": json.dumps({"cat1": ["listResults"]}),
        "requestId": 1
    }

    response = requests.get(url, headers=headers, params=params, timeout=30)
    if response.status_code == 200:
        return response.json()
    return None

results = search_zillow("Austin, TX")
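
The JSON that comes back nests listings under `cat1` → `searchResults` → `listResults`. The field names below (`unformattedPrice`, `area`, etc.) are based on observed responses and may change, so treat this flattener as a sketch rather than a stable contract:

```python
def extract_listings(search_json):
    """Flatten a GetSearchPageState-style response into a list of plain dicts."""
    results = (
        (search_json or {})
        .get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )
    listings = []
    for r in results:
        listings.append({
            "zpid": r.get("zpid"),
            "address": r.get("address"),
            "price": r.get("unformattedPrice"),  # numeric price, not the display string
            "beds": r.get("beds"),
            "baths": r.get("baths"),
            "sqft": r.get("area"),
        })
    return listings
```

Using `.get()` everywhere means missing fields become `None` instead of raising, which matters because land listings and new builds often omit beds or square footage.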

Building a Comparable Sales Analyzer

import pandas as pd

def analyze_comps(properties):
    df = pd.DataFrame(properties)

    print("=== Comparable Sales Analysis ===")
    print(f"Properties analyzed: {len(df)}")
    print("\nPrice Statistics:")
    print(f"  Median: ${df['price'].median():,.0f}")
    print(f"  Average: ${df['price'].mean():,.0f}")
    print(f"  Price/sqft: ${(df['price'] / df['sqft']).median():,.0f}")

    print("\nProperty Details:")
    print(f"  Avg beds: {df['beds'].mean():.1f}")
    print(f"  Avg baths: {df['baths'].mean():.1f}")
    print(f"  Avg sqft: {df['sqft'].mean():,.0f}")

    return df

Handling Zillow's Anti-Bot Protection

Zillow has aggressive bot detection. For reliable scraping, you need proxy rotation. ScraperAPI handles Zillow's JavaScript rendering and CAPTCHA challenges:

def zillow_via_proxy(url):
    params = {
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": url,
        "render": "true",       # render JavaScript before returning HTML
        "country_code": "us"
    }
    # Rendered requests are slow, so allow a generous timeout
    return requests.get("https://api.scraperapi.com", params=params, timeout=70)

ThorData residential proxies are essential for Zillow since it detects and blocks datacenter IPs. ScrapeOps helps you track which proxy configurations have the best success rates against Zillow.
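
Whatever proxy layer you use, wrap requests in retries with exponential backoff so a single 403 or 429 doesn't kill a crawl. A sketch, where the set of retryable status codes and the delay schedule are my own choices:

```python
import random
import time

import requests

RETRYABLE = {403, 429, 500, 502, 503}

def backoff_delay(attempt, base=2.0):
    """Exponential backoff: 2s, 4s, 8s, ... (before jitter is added)."""
    return base * (2 ** attempt)

def fetch_with_retry(url, headers=None, max_retries=4):
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code == 200:
            return resp
        if resp.status_code in RETRYABLE:
            # Add jitter so parallel workers don't retry in lockstep
            time.sleep(backoff_delay(attempt) + random.uniform(0, 1))
        else:
            resp.raise_for_status()
    return None
```

Returning `None` after exhausting retries lets the caller log and skip a stubborn URL instead of crashing the whole run.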

Exporting Data

def export_results(properties, filename="zillow_data"):
    df = pd.DataFrame(properties)
    df.to_csv(f"{filename}.csv", index=False)
    df.to_json(f"{filename}.json", orient="records", indent=2)
    print(f"Exported {len(df)} properties to {filename}.csv and {filename}.json")
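
For example, writing two hypothetical records to a temporary directory and reading the CSV back to verify the round trip:

```python
import os
import tempfile

import pandas as pd

# Made-up example rows, not real listings
records = [
    {"zpid": "111", "price": 450000},
    {"zpid": "222", "price": 520000},
]

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "zillow_data.csv")

pd.DataFrame(records).to_csv(csv_path, index=False)
round_trip = pd.read_csv(csv_path)
```

A quick read-back like this is cheap insurance against silent schema drift before you build analysis on top of the exports.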

Use Cases

  1. Investment analysis: Compare Zestimates vs actual sale prices
  2. Market trends: Track neighborhood appreciation over time
  3. Rental yield: Calculate cap rates using Zestimate and rental Zestimate
  4. Flipping analysis: Find undervalued properties vs their Zestimate
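
Use case 3 is a one-liner once you have both estimates. A sketch with made-up numbers (cap rate = annual net operating income / property value):

```python
def cap_rate(zestimate, monthly_rent_zestimate, annual_expenses=0.0):
    """Net operating income as a fraction of the property's estimated value."""
    noi = monthly_rent_zestimate * 12 - annual_expenses
    return noi / zestimate

# e.g. a $300k Zestimate renting at $2,000/mo with $6k/yr expenses
rate = cap_rate(300_000, 2_000, annual_expenses=6_000)  # -> 0.06, i.e. a 6% cap rate
```

Note the cap rate is only as good as its inputs: Zestimates and rental Zestimates are model estimates, so treat the result as a screening metric, not an appraisal.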

Legal Considerations

Zillow's Terms of Service prohibit automated scraping. This tutorial is for educational purposes. For commercial real estate data needs, consider Zillow's official API, ATTOM Data, or Redfin's data products.


Follow for more Python data collection tutorials!
