Real estate data drives billions of dollars in decisions every year. From individual homebuyers comparing neighborhoods to hedge funds modeling housing market trends, access to accurate property data is a competitive advantage. Redfin, with its comprehensive MLS-sourced listings and proprietary market analytics, is one of the richest sources of real estate data on the web.
In this guide, we'll explore Redfin's data architecture, the types of information you can extract, and practical techniques for building reliable scrapers. Whether you're a real estate investor tracking price trends, a proptech startup building data products, or a researcher studying housing markets, this article covers the full technical landscape.
Understanding Redfin's Data Architecture
Redfin stands out from other real estate platforms because it operates as an actual brokerage. This means its data comes directly from MLS (Multiple Listing Service) feeds, making it more accurate and timely than aggregator sites.
URL Structure
Redfin uses a clean, predictable URL structure:
- City search: redfin.com/city/30749/WA/Seattle
- Zip code search: redfin.com/zipcode/98101
- Individual listing: redfin.com/WA/Seattle/123-Main-St-98101/home/12345678
- Neighborhood: redfin.com/neighborhood/529/WA/Seattle/Capitol-Hill
The numeric IDs at the end of listing URLs (/home/12345678) are Redfin's internal property IDs, which remain stable even if the address format changes.
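Because these IDs are stable, they make good primary keys when storing listings. A minimal sketch for pulling the ID out of a listing URL (the sample URL is illustrative):

```python
import re

def extract_property_id(listing_url):
    """Pull Redfin's stable numeric property ID from a listing URL."""
    match = re.search(r'/home/(\d+)', listing_url)
    return match.group(1) if match else None

url = "https://www.redfin.com/WA/Seattle/123-Main-St-98101/home/12345678"
print(extract_property_id(url))  # → 12345678
```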
Data Layers
Redfin organizes data in several interconnected layers:
- Search/Listing Layer: Property cards with summary data (price, beds, baths, sqft)
- Property Detail Layer: Full listing information, photos, description
- History Layer: Price changes, listing history, tax records
- Market Layer: Aggregated statistics for areas (median price, days on market, etc.)
- Agent Layer: Listing agent information, brokerage details
Redfin's Stingray API
Redfin uses an internal API (often called "Stingray") that powers its frontend. Many data requests go through endpoints like:
https://www.redfin.com/stingray/api/gis?al=1&region_id=16163&region_type=6
https://www.redfin.com/stingray/api/home/details/belowTheFold?propertyId=12345678
These endpoints return JSON, usually wrapped in a guard prefix ({}&&{...}) that must be stripped before parsing. Understanding these endpoints is key to efficient scraping.
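Stripping that prefix is a one-liner; a minimal sketch (the sample payload is made up):

```python
import json

def parse_stingray(text):
    """Drop the '{}&&' guard prefix Redfin prepends to JSON responses."""
    prefix = "{}&&"
    if text.startswith(prefix):
        text = text[len(prefix):]
    return json.loads(text)

raw = '{}&&{"payload": {"homes": []}}'
print(parse_stingray(raw))  # → {'payload': {'homes': []}}
```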
What Data Can You Extract?
Property Listing Data
| Data Point | Source | Notes |
|---|---|---|
| Address | Listing header | Full street address with unit |
| List price | Price section | Current asking price |
| Beds/Baths | Property stats | Bedroom and bathroom count |
| Square footage | Property stats | Living area in sqft |
| Lot size | Property details | Land area |
| Year built | Property details | Construction year |
| Property type | Listing type | Single family, condo, townhouse, etc. |
| MLS number | Listing details | Unique MLS identifier |
| Days on market | Listing stats | Time since listing went active |
| HOA dues | Fee section | Monthly HOA if applicable |
| Parking | Property details | Garage type and spaces |
| Status | Listing badge | Active, pending, sold, etc. |
Price History Data
Each property has a price history tab containing:
- Listing events: Date listed, price changes, taken off market, relisted
- Sale records: Past sale dates and prices
- Tax assessment history: Annual assessed values
- Price per square foot over time
This data is invaluable for investment analysis and market modeling.
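For instance, two sale records are enough to estimate a property's annualized appreciation. A simplified sketch (real analysis should use more events and control for renovations):

```python
def annualized_appreciation(old_price, new_price, years):
    """Compound annual growth rate between two sale prices."""
    return (new_price / old_price) ** (1 / years) - 1

# Hypothetical: sold for $500k, resold 5 years later for $700k
rate = annualized_appreciation(500_000, 700_000, 5)
print(f"{rate:.1%}")  # → 7.0%
```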
Agent and Brokerage Data
Each listing includes:
- Listing agent name and contact information
- Buyer's agent (for sold properties)
- Brokerage name
- Agent's active listings count
- Agent's past sales
Market Statistics
Redfin publishes rich market data at various geographic levels:
- Median sale price and year-over-year change
- Median days on market
- Number of homes sold
- Sale-to-list price ratio
- Inventory levels
- Price drops percentage
- Competition score (Redfin's proprietary metric)
Building a Redfin Scraper with Node.js
Let's build a comprehensive Redfin scraper using Crawlee.
Project Setup and Configuration
const { CheerioCrawler, Dataset, log } = require('crawlee');
const BASE_URL = 'https://www.redfin.com';
// Redfin-specific headers to mimic browser requests
const CUSTOM_HEADERS = {
'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
+ 'AppleWebKit/537.36 (KHTML, like Gecko) '
+ 'Chrome/120.0.0.0 Safari/537.36',
'Accept':
'text/html,application/xhtml+xml,'
+ 'application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://www.redfin.com/',
};
const crawler = new CheerioCrawler({
maxConcurrency: 1, // Redfin is strict about rate limiting
maxRequestRetries: 3,
requestHandlerTimeoutSecs: 90,
additionalMimeTypes: ['application/json'],
preNavigationHooks: [
(crawlingContext) => {
crawlingContext.request.headers = {
...CUSTOM_HEADERS,
};
},
],
async requestHandler({ request, $, body, log }) {
const { label } = request.userData;
switch (label) {
case 'SEARCH':
await handleSearchPage($, request, log);
break;
case 'LISTING':
await handleListingPage($, body, request, log);
break;
case 'API':
await handleApiResponse(body, request, log);
break;
default:
log.warning(`Unknown label: ${label}`);
}
},
});
Extracting Search Results
async function handleSearchPage($, request, log) {
const listings = [];
// Redfin renders property cards in the search results
$('div.HomeCardContainer').each((i, el) => {
const card = $(el);
const listing = {
address: card.find('div.homeAddressV2')
.text().trim(),
price: card.find('span.homecardV2Price')
.text().trim()
.replace(/[^0-9]/g, ''),
beds: card.find('div.HomeStatsV2 .beds')
.text().trim(),
baths: card.find('div.HomeStatsV2 .baths')
.text().trim(),
sqft: card.find('div.HomeStatsV2 .sqft')
.text().trim()
.replace(/[^0-9]/g, ''),
url: BASE_URL + card.find('a.link-and-anchor')
.attr('href'),
status: card.find('span.listingType')
.text().trim(),
};
if (listing.address) {
listings.push(listing);
}
});
log.info(
`Found ${listings.length} listings on search page`
);
// Enqueue individual listing pages
for (const listing of listings) {
await crawler.addRequests([{
url: listing.url,
userData: { label: 'LISTING', searchData: listing },
}]);
}
// Handle pagination
const nextButton = $('button.PageArrow[data-rf-test-id="react-data-paginate-next"]');
if (nextButton.length) {
const currentPage = parseInt(
$('span.pageText').text().match(/\d+/)?.[0] || '1'
);
const nextUrl = request.url.includes('/page-')
? request.url.replace(
/\/page-\d+/,
`/page-${currentPage + 1}`
)
: `${request.url}/page-2`;
await crawler.addRequests([{
url: nextUrl,
userData: { label: 'SEARCH' },
}]);
}
}
Extracting Detailed Property Information
async function handleListingPage($, body, request, log) {
const property = {
url: request.url,
scrapedAt: new Date().toISOString(),
};
// Basic property information
property.address = $('h1[data-rf-test-id="abp-homeinfo-homeaddress"]')
.text().trim();
property.price = $('div[data-rf-test-id="abp-price"] span')
.text().trim().replace(/[^0-9]/g, '');
property.status = $('span[data-rf-test-id="abp-status"]')
.text().trim();
// Property stats (beds, baths, sqft)
property.beds = $('div[data-rf-test-id="abp-beds"] .statsValue')
.text().trim();
property.baths = $('div[data-rf-test-id="abp-baths"] .statsValue')
.text().trim();
property.sqft = $('div[data-rf-test-id="abp-sqFt"] .statsValue')
.text().trim().replace(/[^0-9]/g, '');
// Description
property.description = $('div[data-rf-test-id="listing-remarks"]')
.text().trim();
// Property details from the key details section
property.details = {};
$('div.keyDetail').each((i, el) => {
const label = $(el).find('span.header').text().trim();
const value = $(el).find('span.content').text().trim();
if (label && value) {
property.details[label] = value;
}
});
// Extract price history
property.priceHistory = extractPriceHistory($);
// Extract agent information
property.listingAgent = {
name: $('div.agent-basic-details span.agent-name')
.text().trim(),
brokerage: $('div.agent-basic-details span.office-name')
.text().trim(),
phone: $('div.agent-basic-details a[href^="tel:"]')
.text().trim(),
};
// School information
property.schools = [];
$('div.school-card').each((i, el) => {
property.schools.push({
name: $(el).find('span.school-name').text().trim(),
rating: $(el).find('span.school-rating')
.text().trim(),
distance: $(el).find('span.school-distance')
.text().trim(),
type: $(el).find('span.school-type').text().trim(),
});
});
log.info(`Extracted details for: ${property.address}`);
await Dataset.pushData(property);
}
function extractPriceHistory($) {
const history = [];
$('table.property-history-table tbody tr').each((i, el) => {
const cells = $(el).find('td');
if (cells.length >= 4) {
history.push({
date: $(cells[0]).text().trim(),
event: $(cells[1]).text().trim(),
price: $(cells[2]).text().trim()
.replace(/[^0-9]/g, ''),
pricePerSqft: $(cells[3]).text().trim(),
});
}
});
return history;
}
Python Approach for Redfin Scraping
Here's a Python implementation focused on Redfin's internal API:
import requests
import json
import time
from urllib.parse import quote

class RedfinScraper:
    BASE_URL = "https://www.redfin.com"
    STINGRAY_URL = f"{BASE_URL}/stingray/api"

    HEADERS = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "application/json",
        "Referer": "https://www.redfin.com/",
    }

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update(self.HEADERS)

    def _parse_stingray_response(self, text):
        """Strip Redfin's '{}&&' guard prefix before parsing JSON."""
        cleaned = text.split("&&", 1)[-1]
        return json.loads(cleaned)
    def search_properties(self, query, num_homes=20):
        # Step 1: Resolve location via autocomplete
        auto_url = (
            f"{self.STINGRAY_URL}/v1/search/"
            f"typeahead?input={quote(query)}"
            f"&num_homes={num_homes}"
        )
        resp = self.session.get(auto_url)
        location_data = self._parse_stingray_response(resp.text)

        if not location_data.get("payload", {}).get("sections"):
            print("No location found for query.")
            return []

        first_result = location_data["payload"]["sections"][0]["rows"][0]
        region_id = first_result.get("id")
        region_type = first_result.get("type")
        print(
            f"Found region: {first_result.get('name')} "
            f"(ID: {region_id}, Type: {region_type})"
        )

        time.sleep(2)

        # Step 2: Fetch listings via the GIS API
        gis_url = (
            f"{self.STINGRAY_URL}/gis?"
            f"al=1&region_id={region_id}"
            f"&region_type={region_type}"
            f"&num_homes={num_homes}"
        )
        resp = self.session.get(gis_url)
        gis_data = self._parse_stingray_response(resp.text)
        properties = []
        homes = gis_data.get("payload", {}).get("homes", [])
        for home in homes:
            prop = {
                "property_id": home.get("propertyId"),
                "listing_id": home.get("listingId"),
                "address": home.get("streetLine", {}).get("value", ""),
                "city": home.get("city"),
                "state": home.get("state"),
                "zip": home.get("zip"),
                "price": home.get("price", {}).get("value"),
                "beds": home.get("beds"),
                "baths": home.get("baths"),
                "sqft": home.get("sqFt", {}).get("value"),
                "lot_size": home.get("lotSize", {}).get("value"),
                "year_built": home.get("yearBuilt", {}).get("value"),
                "property_type": home.get("propertyType"),
                "listing_status": home.get("status"),
                "days_on_market": home.get("dom", {}).get("value"),
                "price_per_sqft": home.get("pricePerSqFt", {}).get("value"),
                "hoa_dues": home.get("hoa", {}).get("value"),
                "url": f"{self.BASE_URL}{home.get('url', '')}",
            }
            properties.append(prop)
        return properties
    def get_property_details(self, property_id):
        url = (
            f"{self.STINGRAY_URL}/home/details/"
            f"belowTheFold?propertyId={property_id}"
            f"&accessLevel=1"
        )
        resp = self.session.get(url)
        data = self._parse_stingray_response(resp.text)
        return data.get("payload", {})
    def get_price_history(self, property_id):
        details = self.get_property_details(property_id)
        history = details.get("propertyHistoryInfo", {}).get("events", [])
        return [
            {
                "date": event.get("eventDate"),
                "event_type": event.get("eventDescription"),
                "price": event.get("price"),
                "price_per_sqft": event.get("pricePerSqFt"),
                "source": event.get("source"),
            }
            for event in history
        ]
    def get_market_stats(self, region_id, region_type=6):
        url = (
            f"{self.STINGRAY_URL}/market-tracker/"
            f"overview?regionId={region_id}"
            f"&regionType={region_type}"
        )
        resp = self.session.get(url)
        data = self._parse_stingray_response(resp.text)
        payload = data.get("payload", {})
        return {
            "median_sale_price": payload.get("medianSalePrice"),
            "median_dom": payload.get("medianDom"),
            "homes_sold": payload.get("homesSold"),
            "inventory": payload.get("inventory"),
            "sale_to_list_ratio": payload.get("saleToListRatio"),
            "price_drops_pct": payload.get("priceDropsPct"),
            "yoy_change": payload.get("medianSalePriceYoyChange"),
        }
# Usage example
scraper = RedfinScraper()

# Search for properties in Seattle
properties = scraper.search_properties("Seattle, WA", num_homes=10)
print(f"Found {len(properties)} properties")

for prop in properties[:3]:
    print(f"\n{prop['address']}, {prop['city']}: ${prop['price']:,}")

    # Get price history for each property
    time.sleep(3)  # Respectful delay
    history = scraper.get_price_history(prop['property_id'])
    for event in history[:5]:
        print(
            f"  {event['date']}: "
            f"{event['event_type']} - "
            f"${event.get('price', 'N/A')}"
        )
Handling Redfin's Anti-Scraping Defenses
Redfin deploys some of the most sophisticated anti-scraping measures among real estate sites.
1. Request Fingerprinting
Redfin tracks browser fingerprints. Your requests need consistent headers:
// Maintain session consistency
const sessionHeaders = {
'Cookie': 'RF_BROWSER_ID=abc123; RF_BID_UPDATED=1;',
'X-Requested-With': 'XMLHttpRequest',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Dest': 'empty',
};
2. Rate Limiting and CAPTCHAs
Redfin aggressively rate-limits automated requests:
import time
import random
class RateLimiter:
    def __init__(self, base_delay=3.0):
        self.base_delay = base_delay
        self.consecutive_errors = 0

    def wait(self):
        delay = self.base_delay * (2 ** self.consecutive_errors)
        jitter = random.uniform(0.5, 1.5)
        actual_delay = delay * jitter
        time.sleep(min(actual_delay, 60))

    def success(self):
        self.consecutive_errors = max(0, self.consecutive_errors - 1)

    def failure(self):
        self.consecutive_errors += 1
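To see how this backoff behaves, you can tabulate the base delay (before jitter) for a run of consecutive errors; the 60-second cap keeps the wait bounded:

```python
def backoff_delay(base_delay, consecutive_errors, cap=60.0):
    """Exponential-backoff delay before jitter, capped at `cap` seconds."""
    return min(base_delay * (2 ** consecutive_errors), cap)

for errors in range(6):
    print(errors, backoff_delay(3.0, errors))
# delays grow 3.0, 6.0, 12.0, 24.0, 48.0, then hit the 60.0 cap
```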
3. JavaScript-Rendered Content
Some property details only appear after JavaScript execution:
const { PlaywrightCrawler } = require('crawlee');
const crawler = new PlaywrightCrawler({
headless: true,
maxConcurrency: 1,
async requestHandler({ page, request, log }) {
// Wait for the main content to load
await page.waitForSelector(
'div[data-rf-test-id="abp-price"]',
{ timeout: 20000 }
);
// Scroll to trigger lazy-loaded sections
await page.evaluate(async () => {
const delay = ms =>
new Promise(r => setTimeout(r, ms));
for (let i = 0; i < 5; i++) {
window.scrollBy(0, 800);
await delay(1000);
}
});
// Wait for price history table
await page.waitForSelector(
'table.property-history-table',
{ timeout: 10000 }
).catch(() => {
log.warning('Price history table not found');
});
// Now extract the fully rendered content
const data = await page.evaluate(() => {
// ... extraction logic
});
log.info(
`Extracted: ${data.address} - $${data.price}`
);
},
});
Using Apify for Redfin Scraping
For production-grade Redfin scraping, Apify handles the infrastructure complexity:
Running a Redfin Actor on Apify
const Apify = require('apify');
const run = await Apify.call('redfin/property-scraper', {
searchUrls: [
{
url: 'https://www.redfin.com/city/30749/WA/Seattle'
},
{
url: 'https://www.redfin.com/city/11203/CA/Los-Angeles'
},
],
maxItems: 200,
includeDetails: true,
includePriceHistory: true,
includeSchools: true,
proxy: {
useApifyProxy: true,
apifyProxyGroups: ['RESIDENTIAL'],
},
});
const dataset = await Apify.openDataset(
run.defaultDatasetId
);
const { items } = await dataset.getData();
console.log(`Scraped ${items.length} properties`);
// Filter for investment opportunities
const deals = items.filter(item => {
const pricePerSqft = parseInt(item.price)
/ parseInt(item.sqft);
const avgForArea = 450; // Example area average
return pricePerSqft < avgForArea * 0.85;
});
console.log(
`Found ${deals.length} potential below-market deals`
);
Automated Market Monitoring
Set up regular scraping to track market trends:
// Apify scheduled task for weekly market monitoring
const task = {
actorId: 'redfin/market-tracker',
name: 'seattle-market-weekly',
options: {
build: 'latest',
memoryMbytes: 4096,
timeoutSecs: 3600,
},
input: {
regions: [
{ name: 'Seattle', regionId: 16163 },
{ name: 'Bellevue', regionId: 1528 },
{ name: 'Redmond', regionId: 14470 },
],
metrics: [
'medianSalePrice',
'medianDom',
'inventory',
'saleToListRatio',
],
outputFormat: 'csv',
},
scheduleExpression: '0 8 * * 1', // Mondays 8 AM
};
Practical Use Cases for Redfin Data
1. Investment Property Analysis
import pandas as pd
import numpy as np
def analyze_investment_potential(properties):
    df = pd.DataFrame(properties)

    # Clean numeric columns
    for col in ['price', 'sqft', 'year_built']:
        df[col] = pd.to_numeric(df[col], errors='coerce')

    # Calculate metrics
    df['price_per_sqft'] = df['price'] / df['sqft']
    df['age'] = pd.Timestamp.now().year - df['year_built']

    # Score: lower price/sqft and higher DOM = better deal
    df['price_score'] = 1 - df['price_per_sqft'].rank(pct=True)
    df['dom_score'] = df['days_on_market'].rank(pct=True)

    # High DOM + low price = motivated seller
    df['deal_score'] = df['price_score'] * 0.6 + df['dom_score'] * 0.4

    return df.nlargest(10, 'deal_score')[[
        'address', 'price', 'sqft',
        'price_per_sqft', 'days_on_market',
        'deal_score',
    ]]
2. Neighborhood Comparison Dashboard
def compare_neighborhoods(market_stats):
    comparison = pd.DataFrame(market_stats)

    # Calculate relative value
    avg_price = comparison['median_sale_price'].mean()
    comparison['price_index'] = (
        comparison['median_sale_price'] / avg_price * 100
    ).round(1)

    # Buyer's vs seller's market indicator
    comparison['market_type'] = comparison['sale_to_list_ratio'].apply(
        lambda x: "Seller's" if x > 1.0 else "Buyer's"
    )

    return comparison.sort_values('price_index')
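The same logic works without pandas; a plain-Python sketch of the two indicators, using made-up neighborhood medians:

```python
def market_type(sale_to_list_ratio):
    """Above 1.0, homes typically sell over asking, i.e. a seller's market."""
    return "Seller's" if sale_to_list_ratio > 1.0 else "Buyer's"

def price_index(median_prices):
    """Each area's median price relative to the group average (100 = average)."""
    avg = sum(median_prices.values()) / len(median_prices)
    return {
        area: round(price / avg * 100, 1)
        for area, price in median_prices.items()
    }

# Hypothetical neighborhood medians
stats = {"Capitol Hill": 900_000, "Ballard": 850_000, "Beacon Hill": 650_000}
print(price_index(stats))
print(market_type(1.04))  # → Seller's
```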
3. Price Trend Forecasting Data Prep
def prepare_trend_data(price_histories):
    all_events = []
    for prop_id, history in price_histories.items():
        for event in history:
            if event['event_type'] == 'Sold':
                all_events.append({
                    'property_id': prop_id,
                    'date': pd.to_datetime(event['date']),
                    'price': event['price'],
                    'price_per_sqft': event['price_per_sqft'],
                })

    df = pd.DataFrame(all_events)
    df = df.set_index('date').sort_index()

    # Monthly median price trends
    monthly = df.resample('M')['price'].agg(['median', 'count', 'std'])
    monthly['yoy_change'] = monthly['median'].pct_change(periods=12)
    return monthly
Data Export and Integration
Exporting to Common Formats
const { Dataset } = require('crawlee');
// After scraping is complete
const dataset = await Dataset.open('redfin-properties');
// Export to CSV (written to the key-value store) for spreadsheet analysis
await dataset.exportToCSV('redfin-properties-csv');
// Export to JSON for API consumption
await dataset.exportToJSON('redfin-properties-json');
// Direct integration with Google Sheets
const { google } = require('googleapis');
async function exportToSheets(data, spreadsheetId) {
const sheets = google.sheets({ version: 'v4' });
const rows = data.map(item => [
item.address,
item.price,
item.beds,
item.baths,
item.sqft,
item.pricePerSqft,
item.daysOnMarket,
item.url,
]);
await sheets.spreadsheets.values.append({
spreadsheetId,
range: 'Properties!A:H',
valueInputOption: 'USER_ENTERED',
resource: { values: rows },
});
}
Legal and Ethical Considerations
Real estate data scraping has specific legal nuances:
- MLS data is copyrighted by local MLS organizations — be aware of licensing terms
- Redfin's Terms of Service prohibit automated access — assess your risk tolerance
- Fair Housing Act implications: ensure scraped data isn't used for discriminatory purposes
- Personal data (agent info) may be subject to privacy regulations
- Rate limiting is not just ethical — aggressive scraping can impact the platform for other users
- Commercial use of scraped data may have additional legal requirements
Always consult with a legal professional before using scraped real estate data commercially.
Conclusion
Redfin scraping opens powerful possibilities for real estate analysis, investment research, and market intelligence. The platform's rich data — from granular property details and price histories to aggregated market statistics — provides the foundation for sophisticated real estate analytics.
The key challenges are Redfin's anti-scraping defenses and the need for consistent, reliable data collection. Whether you build a custom scraper tailored to your specific needs or leverage Apify's managed infrastructure, the techniques in this guide give you the technical foundation to extract real estate data effectively.
Start with a small geographic area, validate your data against what you see on the site, and scale gradually. Real estate data has real commercial value — the investment in building a solid scraping pipeline pays dividends through better-informed property decisions and market insights.
Remember: the goal isn't just to collect data, but to transform it into actionable intelligence. Combine scraped Redfin data with other sources (census data, economic indicators, permit records) to build a comprehensive view of any real estate market.