Real estate market insights are invaluable, and Zillow, one of the largest real estate databases, is an excellent source of this data. Whether you’re analyzing market trends or exploring investment opportunities, scraping Zillow data using Python gives you an edge. To go from raw HTML to usable data, follow these steps.
Preparing What You’ll Need
Before we get started, ensure you have Python up and running on your machine. Then, grab the following libraries with just a couple of commands:
pip install requests
pip install lxml
Step 1: Get to Know Zillow’s HTML Structure
To scrape data from Zillow, understanding how the page is structured is key. Open up a property listing on Zillow and inspect the page. You're looking for key elements like:
Property title
Rent estimate price
Assessment price
By identifying these elements in the page’s HTML, you can tell your script exactly what to look for.
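To get a feel for what you're targeting, it helps to practice on a simplified snippet first. The markup and class names below are illustrative, not Zillow's real ones:

```python
from lxml.html import fromstring

# A simplified stand-in for a listing page; the class names here
# are made up for illustration, not Zillow's actual ones.
sample_html = """
<html><body>
  <h1 class="listing-title">1234 Main St, Some City, CA 90210</h1>
  <span class="price-value">Rent Zestimate: $3,200/mo</span>
  <span class="price-value">Assessment: $850,000</span>
</body></html>
"""

parser = fromstring(sample_html)

# Grab the title text and every price-like span
title = parser.xpath('//h1[@class="listing-title"]/text()')[0]
prices = parser.xpath('//span[@class="price-value"]/text()')

print(title)   # 1234 Main St, Some City, CA 90210
print(prices)
```

Once you can pull the right elements out of a toy page, the same approach applies to the real one — only the selectors change.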
Step 2: Make Your First HTTP Request
Now that we know what we're after, it's time to send a request to Zillow and grab the page. We'll use the requests library for this task. Here's how you do it:
import requests
url = "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # Raises an HTTPError if the request failed
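In practice, requests to a busy site can time out or get throttled, so a simple retry wrapper helps. This is a sketch with illustrative defaults; the function name, retry count, and backoff values are my own choices, not part of any library API:

```python
import time
import requests

def fetch(url, headers, retries=3, backoff=2.0, timeout=10):
    """Fetch a page with a timeout and simple exponential backoff.

    Retries on any requests-level failure (connection error, HTTP
    error status, timeout); re-raises after the last attempt.
    """
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            # Wait 2s, 4s, 8s, ... before the next attempt
            time.sleep(backoff * (2 ** attempt))
```

You would then call `fetch(url, headers)` in place of the bare `requests.get` call above.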
Step 3: Parse the HTML with lxml
Once you have the page, you need to extract the data. For this, we'll use lxml, a powerful HTML/XML parser, to turn the raw HTML into a tree Python can easily read and search through.
from lxml.html import fromstring
parser = fromstring(response.text)
Now the HTML content is in the parser variable, and we can start pulling out specific details.
Step 4: Extract Key Data Points Using XPath
XPath is a powerful query language that lets you search the HTML like a pro. Here’s how you can grab the property title, rent estimate, and assessment price:
# Extract the title. Note: the class names below are auto-generated
# by Zillow's styling framework and change frequently — always verify
# them against the current page before running the script.
title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))

# Extract the rent estimate price
rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]

# Extract the assessment price
assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]
Each XPath expression matches the tags and classes associated with a piece of data. The [-2] and [-1] indexes pick the rent estimate and assessment out of the matched text nodes, so verify that ordering against the live page.
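Because those hard-coded indexes raise an IndexError the moment the markup shifts, a small helper (my own addition, not part of lxml) lets the extraction fail soft instead of crashing:

```python
from lxml.html import fromstring

def first_or_none(nodes, index=0):
    """Return a stripped item from an XPath result list, or None when
    the list is too short — avoids IndexError if the layout changes."""
    try:
        return nodes[index].strip()
    except IndexError:
        return None

# Works on any parsed document; a tiny sample page for illustration
page = fromstring('<html><body><h1 class="t">123 Main St</h1></body></html>')
print(first_or_none(page.xpath('//h1[@class="t"]/text()')))       # 123 Main St
print(first_or_none(page.xpath('//span[@class="gone"]/text()')))  # None
```

Wrapping each `parser.xpath(...)` call this way turns a missing element into a `None` you can filter out later, rather than a mid-run crash.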
Step 5: Save the Data to JSON
Once you’ve got your data, it’s time to store it. The easiest way is by saving it as a JSON file, which is perfect for further analysis or storage.
import json
property_data = {
    'title': title,
    'Rent estimate price': rent_estimate_price,
    'Assessment price': assessment_price
}

# Save data to a JSON file
with open('zillow_property_data.json', 'w') as f:
    json.dump(property_data, f, indent=4)

print("Data saved to zillow_property_data.json")
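The scraped prices arrive as strings like "$3,200/mo", which are awkward to analyze later. If you want numeric values in your JSON, a small normalizer helps; this helper and its name are my own sketch, assuming US-style dollar formatting:

```python
import re

def parse_price(text):
    """Convert a price string like '$3,200/mo' or '$850,000' to a float.
    Returns None when the string contains no digits."""
    digits = re.sub(r'[^\d.]', '', text)
    return float(digits) if digits else None

print(parse_price('$3,200/mo'))  # 3200.0
print(parse_price('$850,000'))   # 850000.0
print(parse_price('N/A'))        # None
```

Applying this before `json.dump` means the saved file holds numbers you can sum and average directly.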
Step 6: Deal with Multiple Listings
Want to scrape data from multiple Zillow pages? It’s easy. You just need to loop through a list of URLs. Here's how:
urls = [
    "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/",
    "https://www.zillow.com/homedetails/5678-Another-St-Some-City-CA-90210/87654321_zpid/"
]

all_properties = []

for url in urls:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    parser = fromstring(response.text)

    # Extract data for each property
    title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))
    rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]
    assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]

    property_data = {
        'title': title,
        'Rent estimate price': rent_estimate_price,
        'Assessment price': assessment_price
    }
    all_properties.append(property_data)

# Save all property data to a JSON file
with open('all_zillow_properties.json', 'w') as f:
    json.dump(all_properties, f, indent=4)

print("All property data saved to all_zillow_properties.json")
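When looping over many URLs, pause between requests so you don't hammer the server. A jittered delay is a common pattern; the function below and its default values are my own illustrative choices:

```python
import time
import random

def polite_delay(base=1.0, jitter=2.0):
    """Sleep for base plus a random jitter (in seconds) between
    requests, and return the delay used for optional logging."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

Calling `polite_delay()` at the end of each loop iteration spaces your requests 1–3 seconds apart, and the randomness makes the traffic look less machine-like than a fixed interval.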
Full Script for Scraping Multiple Listings
Here's the complete script, from sending requests to saving data:
import requests
from lxml.html import fromstring
import json

# Define URLs to scrape
urls = [
    "https://www.zillow.com/homedetails/1234-Main-St-Some-City-CA-90210/12345678_zpid/",
    "https://www.zillow.com/homedetails/5678-Another-St-Some-City-CA-90210/87654321_zpid/"
]

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

all_properties = []

for url in urls:
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    parser = fromstring(response.text)

    # Extract data for each property
    title = ' '.join(parser.xpath('//h1[@class="Text-c11n-8-99-3__sc-aiai24-0 dFxMdJ"]/text()'))
    rent_estimate_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-2]
    assessment_price = parser.xpath('//span[@class="Text-c11n-8-99-3__sc-aiai24-0 dFhjAe"]//text()')[-1]

    property_data = {
        'title': title,
        'Rent estimate price': rent_estimate_price,
        'Assessment price': assessment_price
    }
    all_properties.append(property_data)

# Save the data
with open('all_zillow_properties.json', 'w') as f:
    json.dump(all_properties, f, indent=4)

print("All property data saved successfully!")
Final Thoughts
By following these steps, you can extract property data from Zillow to fuel your real estate analysis. Be sure to respect the website's terms of use and robots.txt, and if you're scraping at scale, consider using proxies and rotating user agents to reduce the chance of being blocked.
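Rotating user agents can be as simple as picking a header at random for each request. The strings below are real-format examples that should be refreshed periodically; the helper name is my own:

```python
import random

# A small pool of example desktop user-agent strings
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0',
]

def random_headers():
    """Build request headers with a randomly chosen user agent."""
    return {'user-agent': random.choice(USER_AGENTS)}
```

Pass `random_headers()` in place of the fixed headers dict on each request so successive requests don't all present the identical fingerprint.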