
Domonique Luchin

How I Went From Spreadsheets to a 14-County Real Estate Scraper

Six months ago my lead list was a Google Sheet with 300 rows I updated manually.

Now five edge functions scrape 14 Texas counties every night and score 125+ leads automatically.

Here is the full build.

What I Scrape

Four data sources per county:

  1. Foreclosure filings — Properties where the owner is behind on their mortgage
  2. Delinquent tax rolls — Properties with unpaid property taxes
  3. Code violations — Properties with open city violations
  4. 311 complaints — Nuisance complaints filed by neighbors

Each one is a signal of seller motivation. Stack all four on one address and you have a highly distressed property whose owner may want to sell fast and cheap.

The Database

Four raw tables:

raw_foreclosures
raw_delinquent_taxes
raw_code_violations
raw_311_complaints

One normalized table:

properties  -- 54 columns, one row per address
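To make the normalized table concrete, here is a TypeScript shape for a handful of those 54 columns. This is an illustrative subset only — the flag and score column names come from the scoring queries later in this post, not the full schema:

```typescript
// Illustrative subset of the properties table (not the full 54 columns).
interface PropertyRow {
  address: string
  county: string
  has_foreclosure_notice: boolean
  has_delinquent_taxes: boolean
  has_code_violations: boolean
  has_311_complaints: boolean
  est_equity: number | null     // estimated equity in dollars
  distress_score: number        // 0-100, sum of signal weights
  distress_tier: "A" | "B" | "C" | null
  owner_phone: string | null    // filled in by skip tracing
}
```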

The normalization function matches addresses across all four sources and merges signals. It calculates a distress_score from 0-100 based on how many signals overlap.
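Address matching is the fiddly part, because the same property shows up as "123 N. Main Street" in one source and "123 NORTH MAIN ST" in another. A minimal sketch of the kind of canonicalization involved — the function name and suffix map are illustrative, not my production code:

```typescript
// Canonicalize an address so variants of the same street merge to one key.
// Real matching needs many more rules (unit numbers, directionals, typos).
const SUFFIXES: Record<string, string> = {
  STREET: "ST", AVENUE: "AVE", DRIVE: "DR", ROAD: "RD",
  LANE: "LN", COURT: "CT", BOULEVARD: "BLVD",
  NORTH: "N", SOUTH: "S", EAST: "E", WEST: "W",
}

function normalizeAddress(raw: string): string {
  return raw
    .toUpperCase()
    .replace(/[.,#]/g, " ")                     // drop punctuation
    .split(/\s+/)
    .filter(Boolean)                            // remove empty tokens
    .map((token) => SUFFIXES[token] ?? token)   // standardize suffixes
    .join(" ")
}
```

Matching on the normalized key instead of the raw string is what lets one address accumulate signals from all four tables.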

The Scraper Architecture

Five Supabase Edge Functions. Each one targets a specific county data source.

// lbc-hcad-scraper/index.ts
import { createClient } from "@supabase/supabase-js"

const sb = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_KEY")!,
)

Deno.serve(async () => {
  const response = await fetch("https://county-data-api.com/delinquent")
  if (!response.ok) {
    return new Response(JSON.stringify({ error: `fetch failed: ${response.status}` }), { status: 502 })
  }
  const records = await response.json()

  // Upsert keyed on the county account number so nightly reruns
  // update existing rows instead of inserting duplicates
  const { error } = await sb.from("raw_delinquent_taxes").upsert(
    records.map((r: any) => ({
      account_number: r.acct,
      owner_name: r.owner,
      address: r.situs,
      balance: r.balance,
      county: "harris",
    })),
    { onConflict: "account_number" },
  )

  if (error) {
    return new Response(JSON.stringify({ error: error.message }), { status: 500 })
  }
  return new Response(JSON.stringify({ inserted: records.length }))
})

pg_cron runs each scraper nightly:

SELECT cron.schedule(
  'scrape-harris-delinquent',
  '0 2 * * *',  -- 2 AM daily
  $$
  SELECT net.http_post(
    url := 'https://your-project.supabase.co/functions/v1/lbc-hcad-scraper',
    headers := jsonb_build_object(
      'Authorization', 'Bearer ' || current_setting('app.service_key')
    )
  )
  $$
);

Scoring and Prioritization

After normalization, a scoring function runs:

UPDATE properties SET distress_score = (
  (CASE WHEN has_foreclosure_notice THEN 30 ELSE 0 END) +
  (CASE WHEN has_delinquent_taxes THEN 25 ELSE 0 END) +
  (CASE WHEN has_code_violations THEN 20 ELSE 0 END) +
  (CASE WHEN has_311_complaints THEN 10 ELSE 0 END) +
  (CASE WHEN est_equity < 20000 THEN 15 ELSE 0 END)
);

UPDATE properties SET distress_tier = CASE
  WHEN distress_score >= 60 THEN 'A'
  WHEN distress_score >= 40 THEN 'B'
  ELSE 'C'
END;

Tier A leads go into the dial queue immediately. Tier B gets a 24-hour delay. Tier C gets a weekly batch.
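That routing rule is simple enough to express in a few lines. The function below is an illustrative sketch of the tier-to-delay mapping, with the delays taken straight from the schedule above:

```typescript
// Map a distress tier to a dial-queue delay in hours.
// A = dial now, B = 24-hour delay, C = weekly batch.
function dialDelayHours(tier: "A" | "B" | "C"): number {
  switch (tier) {
    case "A": return 0
    case "B": return 24
    case "C": return 7 * 24
  }
}
```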

The Skip Trace Step

Raw records have addresses. They do not have phone numbers.

Skip tracing finds the owner's phone number from the name and address. I use a batch API that costs about $0.10 per record, and I only run it on Tier A and B leads to control cost.

The result lands in properties.owner_phone, owner_phone_2, owner_phone_3.

Then the dial queue picks it up and the AI starts calling.
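A sketch of the cost-gating step: filter to Tier A/B before hitting the batch endpoint. The vendor URL and payload shape are placeholders — every skip trace vendor's API differs:

```typescript
// Only Tier A and B leads are worth the ~$0.10/record trace fee.
const COST_PER_RECORD = 0.10

function eligibleForTrace(tier: string): boolean {
  return tier === "A" || tier === "B"
}

// Placeholder endpoint and payload -- adapt to your vendor's batch API.
async function skipTraceBatch(
  leads: { address: string; owner_name: string; distress_tier: string }[],
) {
  const eligible = leads.filter((l) => eligibleForTrace(l.distress_tier))
  console.log(
    `Tracing ${eligible.length} leads, est. cost $${(eligible.length * COST_PER_RECORD).toFixed(2)}`,
  )

  const res = await fetch("https://skip-trace-vendor.example/batch", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ records: eligible }),
  })
  return res.json() // vendor response carries phone numbers per record
}
```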

Results

14 counties. 125+ new scored leads per night. Zero manual work after the initial build.

The build took about three weekends. The scraper has been running for two months without my touching it.
