Six months ago my lead list was a Google Sheet with 300 rows I updated manually.
Now five edge functions scrape 14 Texas counties every night and score 125+ leads automatically.
Here is the full build.
What I Scrape
Four data sources per county:
- Foreclosure filings — Properties where the owner is behind on their mortgage
- Delinquent tax rolls — Properties with unpaid property taxes
- Code violations — Properties with open city violations
- 311 complaints — Nuisance complaints filed by neighbors
Each one is a signal of seller motivation. Stack all four on one address and you have a highly distressed property whose owner may want to sell fast and cheap.
The Database
Four raw tables:
raw_foreclosures
raw_delinquent_taxes
raw_code_violations
raw_311_complaints
One normalized table:
properties -- 54 columns, one row per address
The normalization function matches addresses across all four sources and merges signals. It calculates a distress_score from 0-100 based on how many signals overlap.
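The matching step can be sketched like this. This is a minimal illustration, not the actual normalization function: the `normalizeAddress` rules and the `RawSignal` shape are assumptions; only the idea of keying rows from all four sources by a canonicalized address comes from the article.

```typescript
type RawSignal = { address: string; source: string }

// Canonicalize an address so the same property matches across sources.
function normalizeAddress(addr: string): string {
  return addr
    .toUpperCase()
    .replace(/[.,#]/g, "")         // drop punctuation
    .replace(/\bSTREET\b/g, "ST")  // collapse common suffixes
    .replace(/\bAVENUE\b/g, "AVE")
    .replace(/\s+/g, " ")          // squeeze whitespace
    .trim()
}

// Group raw rows from all four sources by normalized address,
// collecting the set of distress signals seen at each property.
function mergeSignals(rows: RawSignal[]): Map<string, Set<string>> {
  const merged = new Map<string, Set<string>>()
  for (const row of rows) {
    const key = normalizeAddress(row.address)
    if (!merged.has(key)) merged.set(key, new Set())
    merged.get(key)!.add(row.source)
  }
  return merged
}
```

The more signals in a property's set, the higher its score ends up after the SQL scoring pass below.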
The Scraper Architecture
Five Supabase Edge Functions. Each one targets a specific county data source.
// lbc-hcad-scraper/index.ts
import { createClient } from "@supabase/supabase-js"

const sb = createClient(Deno.env.get("SUPABASE_URL")!, Deno.env.get("SUPABASE_SERVICE_KEY")!)

Deno.serve(async () => {
  // Pull the county's delinquent-tax feed.
  const response = await fetch("https://county-data-api.com/delinquent")
  if (!response.ok) {
    return new Response(JSON.stringify({ error: response.statusText }), { status: 502 })
  }
  const records = await response.json()

  // Upsert so nightly re-runs update balances instead of duplicating rows.
  const { error } = await sb.from("raw_delinquent_taxes").upsert(
    records.map((r: any) => ({
      account_number: r.acct,
      owner_name: r.owner,
      address: r.situs,
      balance: r.balance,
      county: "harris"
    })),
    { onConflict: "account_number" }
  )
  if (error) {
    return new Response(JSON.stringify({ error: error.message }), { status: 500 })
  }
  return new Response(JSON.stringify({ inserted: records.length }))
})
pg_cron runs each scraper nightly:
SELECT cron.schedule(
  'scrape-harris-delinquent',
  '0 2 * * *',  -- 2:00 AM daily
  $$
  SELECT net.http_post(
    url := 'https://your-project.supabase.co/functions/v1/lbc-hcad-scraper',
    headers := jsonb_build_object(
      'Authorization', 'Bearer ' || current_setting('app.service_key')
    )
  )
  $$
);
Scoring and Prioritization
After normalization, a scoring function runs:
UPDATE properties SET distress_score = (
(CASE WHEN has_foreclosure_notice THEN 30 ELSE 0 END) +
(CASE WHEN has_delinquent_taxes THEN 25 ELSE 0 END) +
(CASE WHEN has_code_violations THEN 20 ELSE 0 END) +
(CASE WHEN has_311_complaints THEN 10 ELSE 0 END) +
(CASE WHEN est_equity < 20000 THEN 15 ELSE 0 END)
);
UPDATE properties SET distress_tier = CASE
WHEN distress_score >= 60 THEN 'A'
WHEN distress_score >= 40 THEN 'B'
ELSE 'C'
END;
Tier A leads go into the dial queue immediately. Tier B gets a 24-hour delay. Tier C gets a weekly batch.
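The tier routing above can be expressed as a small delay table. A minimal sketch: the `dialAfter` function name is hypothetical; the delays (immediate, 24 hours, weekly) come from the article.

```typescript
// Hours to wait before a lead enters the dial queue, by tier.
const TIER_DELAY_HOURS: Record<"A" | "B" | "C", number> = {
  A: 0,       // dial immediately
  B: 24,      // 24-hour delay
  C: 24 * 7,  // weekly batch
}

// When a lead scored at `scoredAt` becomes eligible to dial.
function dialAfter(tier: "A" | "B" | "C", scoredAt: Date): Date {
  return new Date(scoredAt.getTime() + TIER_DELAY_HOURS[tier] * 3_600_000)
}
```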
The Skip Trace Step
Raw records have addresses. They do not have phone numbers.
Skip tracing finds the owner's phone number from the name and address. I use a batch API that costs about $0.10 per record, and I only run it on Tier A and B leads to control cost.
The result lands in properties.owner_phone, owner_phone_2, owner_phone_3.
Then the dial queue picks it up and the AI starts calling.
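The cost-control filter before skip tracing can be sketched like this. The `Lead` shape and the vendor payload fields are assumptions; the Tier A/B rule and the roughly $0.10-per-record cost come from the article.

```typescript
type Lead = { owner_name: string; address: string; distress_tier: "A" | "B" | "C" }

// Select only Tier A and B leads for the skip-trace batch
// and estimate what the batch will cost at ~$0.10/record.
function buildSkipTraceBatch(leads: Lead[], costPerRecord = 0.10) {
  const batch = leads
    .filter(l => l.distress_tier !== "C")
    .map(l => ({ name: l.owner_name, address: l.address }))
  return { batch, estimatedCost: batch.length * costPerRecord }
}
```

The returned `batch` is what gets POSTed to the vendor; the phone numbers that come back are written to `owner_phone`, `owner_phone_2`, and `owner_phone_3` on `properties`.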
Results
14 counties. 125+ new scored leads per night. Zero manual work after the initial build.
The build took about three weekends. The scrapers have been running for two months without me touching them.