The Ultimate Guide to Scraping Zillow Data with Python

Real estate market insights are invaluable, and Zillow, one of the largest real estate databases, is an excellent source of this data. Whether you’re analyzing market trends or exploring investment opportunities, scraping Zillow data using Python gives you an edge. To go from raw HTML to usable data, follow these steps.

What You’ll Need

Before we get started, ensure you have Python up and running on your machine. Then, grab the following libraries with just a couple of commands:

pip install requests
pip install lxml
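To confirm both installs worked, a quick version check from Python does the trick (just a sanity check, nothing Zillow-specific):

import requests
from lxml import etree

# Print library versions to confirm both installs succeeded
print(requests.__version__)
print(etree.__version__)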

Step 1: Get to Know Zillow’s HTML Structure

To scrape data from Zillow, you first need to understand how its pages are structured. Open a property listing on Zillow and inspect the page. You're looking for key elements like:
Property title
Rent estimate price
Assessment price
By identifying these elements in the page’s HTML, you can tell your script exactly what to look for.

Step 2: Make Your First HTTP Request

Now that we know what we’re after, it's time to send a request to Zillow and grab the page. We'll use requests for this task. Here’s how you do it:

import requests

url = "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/"

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # Raises an HTTPError if the request fails
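Zillow is quick to throttle or block automated traffic, so a plain one-shot GET can fail intermittently. Here's a minimal sketch of a more resilient request, using a requests Session with automatic retries (the retry settings are illustrative; tune them to your needs):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()

# Retry up to 3 times with exponential backoff on common transient errors
retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503])
session.mount('https://', HTTPAdapter(max_retries=retries))

response = session.get(url, headers=headers, timeout=10)
response.raise_for_status()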

Step 3: Parse the HTML with lxml

Once you have the page, you need to extract the data. To do this, we’ll use lxml, a powerful HTML/XML parser. We’ll turn the raw HTML into something Python can easily read and search through.

from lxml.html import fromstring

parser = fromstring(response.text)

Now, the HTML content is in the parser variable, and we can start pulling out specific details.
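A quick way to sanity-check the parse before writing any real XPath is to pull the page's title tag; if it comes back empty or looks like a CAPTCHA page, the request was probably blocked:

# Should print something like the property address; an empty list
# or a verification-style title suggests the request was blocked
print(parser.xpath('//title/text()'))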

Step 4: Extract Key Data Points Using XPath

XPath is a powerful query language that lets you search the HTML like a pro. Here’s how you can grab the property title, rent estimate, and assessment price:

# Extract the title. Note: Zillow's class names are auto-generated by its
# styling framework and change frequently, so verify them in your browser first.
title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))

# The rent estimate and assessment prices share the same span class; the
# query returns several text nodes, and the prices sit in the last two.
rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]

# Extract the assessment price (the last text node of the same query)
assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]

These XPath expressions match the HTML tags and CSS classes associated with each piece of data. Keep in mind that class names like these are auto-generated and change often, so confirm them in your browser's developer tools before running the script.
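Because any of these lookups can come back empty when Zillow tweaks its markup, a small defensive helper keeps the script from crashing with an IndexError. This is a sketch of my own, not part of the original flow; the function name first_or_none is hypothetical:

def first_or_none(parser, xpath, index=0):
    """Return one result of an XPath query, or None if nothing matched."""
    results = parser.xpath(xpath)
    try:
        return results[index]
    except IndexError:
        return None

# Example: returns None instead of raising if the span is missing
rent_estimate_price = first_or_none(
    parser, '//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()', -2
)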

Step 5: Save the Data to JSON

Once you’ve got your data, it’s time to store it. The easiest way is by saving it as a JSON file, which is perfect for further analysis or storage.

import json

property_data = {
    'title': title,
    'Rent estimate price': rent_estimate_price,
    'Assessment price': assessment_price
}

# Save data to a JSON file
with open('zillow_property_data.json', 'w') as f:
    json.dump(property_data, f, indent=4)

print("Data saved to zillow_property_data.json")
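To verify the file was written correctly, you can load it straight back:

# Read the JSON file back and print its contents as a quick check
with open('zillow_property_data.json') as f:
    print(json.load(f))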

Step 6: Deal with Multiple Listings

Want to scrape data from multiple Zillow pages? It’s easy. You just need to loop through a list of URLs. Here's how:

urls = [
    "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/",
    "https://www.zillow.com/homedetails/5678-Another-St-Some-City-CA-90210/87654321_zpid/"
]

all_properties = []

for url in urls:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    parser = fromstring(response.text)

    # Extract data for each property
    title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))
    rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]
    assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]

    property_data = {
        'title': title,
        'Rent estimate price': rent_estimate_price,
        'Assessment price': assessment_price
    }

    all_properties.append(property_data)

# Save all property data to a JSON file
with open('all_zillow_properties.json', 'w') as f:
    json.dump(all_properties, f, indent=4)

print("All property data saved to all_zillow_properties.json")
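One caveat before scaling up: firing requests back-to-back is the fastest way to get your IP blocked. Here's a defensive variation of the loop that pauses between requests and skips failed pages rather than aborting the whole run (the 2-second delay is just a polite default, not a magic number):

import time

for url in urls:
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Skipping {url}: {e}")
        continue

    # ... parse and extract fields exactly as in the loop above ...

    time.sleep(2)  # Pause between requests to avoid hammering the server

The full script below keeps the simpler flow for readability.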

Full Script for Scraping Multiple Listings

Here's the complete script, from sending requests to saving data:

import requests
from lxml.html import fromstring
import json

# Define URLs to scrape
urls = [
    "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/",
    "https://www.zillow.com/homedetails/5678-Another-St-Some-City-CA-90210/87654321_zpid/"
]

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

all_properties = []

for url in urls:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    parser = fromstring(response.text)

    # Extract data for each property
    title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))
    rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]
    assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]

    property_data = {
        'title': title,
        'Rent estimate price': rent_estimate_price,
        'Assessment price': assessment_price
    }

    all_properties.append(property_data)

# Save the data
with open('all_zillow_properties.json', 'w') as f:
    json.dump(all_properties, f, indent=4)

print("All property data saved successfully!")

Final Thoughts

By following these steps, you can easily extract property data from Zillow to fuel your real estate analysis. Be sure to respect the website's terms of use, and if you’re scraping at scale, consider using proxies and rotating user agents to avoid being blocked.
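As a starting point for that, here's a minimal sketch of rotating user agents and routing traffic through a proxy with requests. The user-agent strings are real browser signatures, but the proxy address is a placeholder you would replace with your own provider's endpoint:

import random
import requests

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15',
]

# Placeholder proxy endpoint; substitute your provider's host and port
proxies = {
    'http': 'http://your-proxy-host:8080',
    'https': 'http://your-proxy-host:8080',
}

headers = {'user-agent': random.choice(user_agents)}
response = requests.get(url, headers=headers, proxies=proxies, timeout=10)

Combine that with modest request rates and per-URL error handling, and your scraper will stay healthy much longer.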
