agenthustler

How to Scrape Zillow in 2026: Property Data, Listings, and Home Prices

Real estate data drives investment decisions, market analysis, and price comparison tools. Zillow holds the largest database of US property listings — over 100 million homes — including Zestimates, price histories, and listing details.

This guide covers practical methods to extract Zillow property data in 2026, what works, what does not, and the tradeoffs between each approach.

Why Zillow Data Is Valuable

Zillow has data that matters for real estate analysis:

  • Active listings: price, beds, baths, sqft, listing agent, days on market
  • Zestimates: proprietary home value estimates (updated regularly)
  • Price history: past sales, tax assessments, listing price changes
  • Neighborhood data: school ratings, walkability scores, crime stats
  • Rental Zestimates: estimated monthly rent for any property

Investors, proptech startups, and data analysts scrape this data for portfolio analysis, comp research, and automated market monitoring.

Method 1: Hidden API Endpoints

The Zillow frontend makes API calls to internal endpoints that return structured JSON. This is much cleaner than parsing HTML.

The main endpoint for search results:

import requests
import json

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://www.zillow.com/"
}

# Zillow search API - returns listings for a region
search_url = "https://www.zillow.com/search/GetSearchPageState.htm"

params = {
    "searchQueryState": json.dumps({
        "pagination": {},
        "mapBounds": {
            "north": 37.8199,
            "south": 37.7034,
            "east": -122.3482,
            "west": -122.5277
        },
        "filterState": {
            "sort": {"value": "days"},
            "ah": {"value": True},  # Include all homes
            "price": {"min": 500000, "max": 2000000},
            "beds": {"min": 2}
        },
        "isMapVisible": True,
        "isListVisible": True
    }),
    "wants": json.dumps({
        "cat1": ["listResults", "mapResults"]
    }),
    "requestId": 3
}

response = requests.get(search_url, params=params, headers=headers)

if response.status_code == 200:
    data = response.json()
    results = data.get("cat1", {}).get("searchResults", {}).get("listResults", [])

    for listing in results[:5]:
        detail = listing.get("hdpData", {}).get("homeInfo", {})
        # Format prices only when present - applying ":," to the "N/A"
        # fallback string would raise a ValueError
        price = detail.get("price")
        zestimate = detail.get("zestimate")
        print(f"Address: {detail.get('streetAddress')}, {detail.get('city')}")
        print(f"Price: {f'${price:,}' if price else 'N/A'}")
        print(f"Beds: {detail.get('bedrooms')} | Baths: {detail.get('bathrooms')}")
        print(f"Sqft: {detail.get('livingArea', 'N/A')}")
        print(f"Zestimate: {f'${zestimate:,}' if zestimate else 'N/A'}")
        print(f"Days on Zillow: {detail.get('daysOnZillow', 'N/A')}")
        print("---")
else:
    print(f"Request failed: {response.status_code}")

What the API returns: Each result includes zpid (Zillow Property ID), price, address, coordinates, home type, listing status, and basic property details.
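Once parsed, it helps to flatten those nested results into simple records. A minimal sketch, using the field names observed in the response at the time of writing (the sample payload below is hand-built for illustration, not real Zillow data, and the actual schema may differ):

```python
def extract_listings(data: dict) -> list[dict]:
    """Flatten search results into simple dicts keyed by zpid."""
    results = data.get("cat1", {}).get("searchResults", {}).get("listResults", [])
    listings = []
    for item in results:
        info = item.get("hdpData", {}).get("homeInfo", {})
        listings.append({
            "zpid": item.get("zpid"),
            "price": info.get("price"),
            "address": info.get("streetAddress"),
            "status": item.get("statusType"),
        })
    return listings

# Hand-built response fragment mimicking the shape described above
sample = {
    "cat1": {"searchResults": {"listResults": [
        {"zpid": "12345", "statusType": "FOR_SALE",
         "hdpData": {"homeInfo": {"price": 925000,
                                  "streetAddress": "1 Main St"}}}
    ]}}
}
print(extract_listings(sample))
```

Keeping the zpid around matters: it is the stable identifier you need for the property detail lookups in the next section.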

Getting Detailed Property Data

For individual property details (including price history and tax records), use the property detail endpoint:

def get_property_details(zpid: int) -> dict:
    """Fetch full property details by Zillow Property ID."""
    url = "https://www.zillow.com/graphql/"

    payload = {
        "query": """query GetHomeDetails($zpid: ID!) {
            property(zpid: $zpid) {
                address { streetAddress city state zipcode }
                price
                zestimate
                rentZestimate
                bedrooms
                bathrooms
                livingArea
                yearBuilt
                homeType
                priceHistory { date price event }
                taxHistory { year taxPaid value }
            }
        }""",
        "variables": {"zpid": str(zpid)}
    }

    headers = {
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }

    resp = requests.post(url, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Example: get details for a specific property
details = get_property_details(2077546867)

Important caveat: These internal APIs are undocumented and change without notice. Zillow frequently modifies endpoint URLs, query parameters, and response schemas. Code that works today may break next month.
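Because the schema shifts under you, it is worth validating the response shape before parsing rather than letting a `KeyError` surface deep in a pipeline. A small defensive sketch (the key path is the one observed today; adjust it when the endpoint changes):

```python
def safe_get(data, path: list, default=None):
    """Walk a nested dict, returning `default` if any key is missing."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

def parse_search_response(data: dict) -> list:
    """Return the list of search results, or [] if the shape is unexpected."""
    results = safe_get(data, ["cat1", "searchResults", "listResults"], default=[])
    if not isinstance(results, list):
        # Schema changed - log and bail rather than crash mid-run
        print("Unexpected response shape; endpoint may have changed")
        return []
    return results
```

Failing soft like this turns a silent schema change into a logged, recoverable event instead of a crashed scraper.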

Method 2: Playwright for JavaScript-Rendered Pages

Zillow search pages rely heavily on JavaScript rendering. If the API approach stops working, browser automation is the fallback:

import asyncio
from playwright.async_api import async_playwright

async def scrape_zillow_listings(location: str, max_pages: int = 3):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/120.0.0.0 Safari/537.36"
        )
        page = await context.new_page()

        search_url = f"https://www.zillow.com/{location.replace(' ', '-')}"
        await page.goto(search_url, wait_until="networkidle")

        all_listings = []

        for page_num in range(max_pages):
            await page.wait_for_selector(
                '[data-test="property-card"]', timeout=15000
            )

            cards = await page.query_selector_all(
                '[data-test="property-card"]'
            )

            for card in cards:
                try:
                    price_el = await card.query_selector(
                        '[data-test="property-card-price"]'
                    )
                    addr_el = await card.query_selector('address')
                    details_el = await card.query_selector(
                        '[data-test="property-card-details"]'
                    )

                    listing = {
                        "price": await price_el.inner_text() if price_el else None,
                        "address": await addr_el.inner_text() if addr_el else None,
                        "details": await details_el.inner_text() if details_el else None,
                    }
                    all_listings.append(listing)
                except Exception:
                    continue

            next_btn = await page.query_selector('a[rel="next"]')
            if next_btn and page_num < max_pages - 1:
                await next_btn.click()
                await page.wait_for_load_state("networkidle")
                await asyncio.sleep(2)
            else:
                break

        await browser.close()
        return all_listings

listings = asyncio.run(scrape_zillow_listings("san-francisco-ca"))
for listing in listings[:5]:
    print(listing)
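Note that the Playwright path gives you display strings ("$1,250,000", "$849K"), not numbers. A small normalization helper, assuming the common price formats Zillow cards show (treat the suffix handling as a sketch):

```python
import re

def parse_price(text):
    """Convert a display price like '$1,250,000' or '$849K' to an int.

    Returns None when no number can be recovered from the text.
    """
    if not text:
        return None
    match = re.search(r"\$?([\d,.]+)\s*([KkMm]?)", text)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    suffix = match.group(2).lower()
    if suffix == "k":
        number *= 1_000
    elif suffix == "m":
        number *= 1_000_000
    return int(number)

print(parse_price("$1,250,000"))  # 1250000
print(parse_price("$849K"))       # 849000
```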

The Anti-Bot Problem

Zillow uses aggressive bot detection in 2026. You will encounter:

  1. Akamai Bot Manager: Fingerprints browser behavior, TLS signatures, and JavaScript execution
  2. Rate limiting: Too many requests from one IP triggers CAPTCHAs or blocks
  3. JavaScript challenges: Pages require JS execution to render content
  4. Session validation: Cookies and tokens are checked across requests

Dealing with Blocks

For small-scale scraping (under 1,000 pages), rotating your IP and using realistic headers is usually enough. For anything larger, you need a proxy service with residential IPs.
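For the small-scale case, "realistic headers" plus pacing can be sketched like this. The user-agent strings are examples that should be kept current with real browser releases, and the delay bounds are a starting point, not a guarantee against blocks:

```python
import random
import time

# A small pool of realistic desktop user agents; rotating per request
# avoids presenting a single static fingerprint.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]

def build_headers() -> dict:
    """Headers for one request, with a randomly chosen user agent."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "application/json",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.zillow.com/",
    }

def polite_delay(base: float = 2.0, jitter: float = 3.0) -> float:
    """Sleep a randomized interval between requests and return it."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

Randomized delays matter more than long delays: a perfectly regular request interval is itself a bot signal.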

ScraperAPI handles the anti-bot bypass for you — it rotates IPs, manages headers, and renders JavaScript automatically. For Zillow specifically, enable the render=true parameter:

import requests

SCRAPERAPI_KEY = "YOUR_KEY"

def scrape_with_proxy(url: str) -> str:
    """Use ScraperAPI to bypass anti-bot protection."""
    # Pass the target URL via params so requests URL-encodes it -
    # interpolating it into the query string by hand breaks on URLs
    # that contain their own ? or & characters.
    resp = requests.get(
        "http://api.scraperapi.com",
        params={"api_key": SCRAPERAPI_KEY, "url": url, "render": "true"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.text

html = scrape_with_proxy("https://www.zillow.com/san-francisco-ca/")

If you are doing high-volume residential property monitoring, ThorData residential proxies give you a pool of real residential IPs that look like normal home internet traffic — which matters for Zillow since they flag datacenter IP ranges aggressively.
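Wiring an authenticated residential proxy into requests looks like the sketch below. The host, port, and credential format are placeholders; substitute the values from your provider's dashboard, since each provider uses its own gateway hostname and credential scheme:

```python
def build_proxies(user: str, password: str, host: str, port: int) -> dict:
    """Return a requests-style proxies mapping for an authenticated proxy."""
    proxy = f"http://{user}:{password}@{host}:{port}"
    # requests uses the same proxy entry for both schemes here
    return {"http": proxy, "https": proxy}

# Placeholder credentials - replace with your provider's values
proxies = build_proxies("USERNAME", "PASSWORD", "proxy.example.com", 8000)
# requests.get("https://www.zillow.com/...", proxies=proxies, timeout=60)
```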

Structuring the Output

Once you have raw listings, normalize them into a clean format:

import csv
from datetime import datetime

def save_listings_to_csv(listings: list, filename: str = "zillow_data.csv"):
    fieldnames = [
        "address", "city", "state", "zip", "price", "zestimate",
        "beds", "baths", "sqft", "year_built", "home_type",
        "days_on_zillow", "scraped_at"
    ]

    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()

        for listing in listings:
            info = listing.get("hdpData", {}).get("homeInfo", {})
            writer.writerow({
                "address": info.get("streetAddress", ""),
                "city": info.get("city", ""),
                "state": info.get("state", ""),
                "zip": info.get("zipcode", ""),
                "price": info.get("price", ""),
                "zestimate": info.get("zestimate", ""),
                "beds": info.get("bedrooms", ""),
                "baths": info.get("bathrooms", ""),
                "sqft": info.get("livingArea", ""),
                "year_built": info.get("yearBuilt", ""),
                "home_type": info.get("homeType", ""),
                "days_on_zillow": info.get("daysOnZillow", ""),
                "scraped_at": datetime.now().isoformat()
            })

    print(f"Saved {len(listings)} listings to {filename}")
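One wrinkle worth handling around the CSV write: the same property can appear more than once in scraped output, for example across overlapping map bounds or repeated pages. Deduplicating on zpid before saving keeps the file clean. A sketch, assuming each listing dict carries a zpid key as in the earlier examples:

```python
def dedupe_by_zpid(listings: list) -> list:
    """Drop duplicate listings, keeping the first occurrence of each zpid.

    Listings without a zpid are kept as-is, since we cannot tell
    whether they are duplicates.
    """
    seen = set()
    unique = []
    for listing in listings:
        zpid = listing.get("zpid")
        if zpid is None:
            unique.append(listing)
        elif zpid not in seen:
            seen.add(zpid)
            unique.append(listing)
    return unique
```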

Legal Considerations

Zillow's Terms of Service prohibit automated scraping, and its robots.txt blocks most crawlers. Legally, though, the landscape is nuanced:

  • hiQ v. LinkedIn (2022): The Ninth Circuit ruled that scraping publicly accessible data does not violate the CFAA. This precedent generally applies to public listing pages.
  • But: The ToS is a contract you implicitly agree to by using the site. Violating it could expose you to civil (not criminal) liability.
  • Practical advice: Do not scrape at aggressive rates, do not bypass authentication, and do not redistribute raw data as-is at scale. Use the data for analysis, not to clone Zillow.

What Actually Works at Scale

For production use cases (monitoring hundreds of markets, tracking thousands of properties), building and maintaining your own Zillow scraper is expensive. The anti-bot systems change frequently, and one bad deployment can get your entire IP range blocked.

If you are scraping real estate data regularly, check out the scrapers on the Apify Store — they handle anti-bot, proxy rotation, and output formatting so you can focus on analysis rather than infrastructure.

Summary

| Method | Best For | Difficulty | Reliability |
| --- | --- | --- | --- |
| Hidden API | Quick data pulls, search results | Medium | Breaks periodically |
| Playwright | Full page data, price history | High | More resilient to API changes |
| Proxy service | Bypassing blocks at scale | Low | Depends on provider |
| Pre-built scraper | Production monitoring | Low | Maintained by developer |

Start with the API approach for exploration, graduate to Playwright when you need detail, and use a proxy service or managed scraper when you need reliability at scale.
