
Brad


How I Scraped 200+ Startup Contacts from Hacker News (and the Scripts I Built)

I've been working as a freelance Python engineer, and I spent the last week doing something obsessive: I scraped every HN "Who's Hiring" thread from the last 12 months and cold-emailed over 200 startups.

Did it work? Mixed results. But I built a pretty solid system in the process — and I want to share it.

The Problem

Every month, Hacker News publishes a "Who's Hiring" thread. These are gold for freelancers. Early-stage startups post their openings directly, often with founder emails, and many explicitly say "remote OK" or "contract considered."

The problem: manually reading through 300-400 comments per thread to find the remote/contract-friendly ones takes hours. Doing this across 12 months of threads would take days.

So I automated it.

What I Built

Three Python scripts:

1. hn_scraper.py — The Lead Extractor

import re

# Scores each HN comment on remote + contract signals
REMOTE_KEYWORDS = ["remote", "distributed", "anywhere", "eu remote", "utc+"]
CONTRACT_KEYWORDS = ["contract", "freelance", "1099", "part-time", "fractional"]
EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def score_comment(text):
    lowered = text.lower()  # lowercase once instead of once per keyword
    # Substring matches: "eu remote" also contains "remote", so it counts twice
    remote_score = sum(1 for kw in REMOTE_KEYWORDS if kw in lowered)
    contract_score = sum(1 for kw in CONTRACT_KEYWORDS if kw in lowered)
    emails = EMAIL_RE.findall(text)
    return {
        # Weights: contract signals count most, then remote, then each email found
        "score": (remote_score * 3) + (contract_score * 4) + (len(emails) * 2),
        "remote": remote_score > 0,
        "contract": contract_score > 0,
        "emails": emails[:3]  # keep at most three addresses per comment
    }

Usage: python hn_scraper.py --thread-id 44159528 --output leads.json

Typical results from a monthly thread: 300-400 total comments → 30-60 with score ≥3 → 10-25 with direct email addresses.
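The funnel above (all comments, then score ≥3, then comments with an email) can be sketched as a small filter over the scorer's output. This is a minimal sketch: `build_funnel` is a hypothetical helper name, and the sample records are hand-written placeholders in the shape `score_comment` returns, not real leads.

```python
from typing import Dict, List


def build_funnel(scored: List[Dict], min_score: int = 3) -> Dict:
    """Apply the two filters described above and count what survives each one."""
    qualified = [s for s in scored if s["score"] >= min_score]
    contactable = [s for s in qualified if s["emails"]]
    # Highest-scoring leads first, so the best matches get read first.
    contactable.sort(key=lambda s: s["score"], reverse=True)
    return {
        "total": len(scored),
        "qualified": len(qualified),
        "with_email": len(contactable),
        "leads": contactable,
    }


# Hand-written sample records (placeholders, not real data).
sample = [
    {"score": 9, "remote": True, "contract": True, "emails": ["cto@example.com"]},
    {"score": 3, "remote": True, "contract": False, "emails": []},
    {"score": 2, "remote": False, "contract": False, "emails": ["jobs@example.org"]},
    {"score": 0, "remote": False, "contract": False, "emails": []},
]
result = build_funnel(sample)
print(result["total"], result["qualified"], result["with_email"])
```

Note that with the weights shown earlier, a single remote keyword is already enough to reach a score of 3, so raising `min_score` is the simplest way to tighten the filter.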

2. email_writer.py — The Draft Generator

Takes a lead from the scraper and generates a personalized email draft. Detects whether the company needs data pipelines or general automation work, pulls a specific signal from their post, and fills a template.

python email_writer.py --lead-file leads.json --batch --output drafts.json
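For anyone building their own version, the draft step could look roughly like this. It is a sketch under assumptions, not the packaged script: the keyword list, the template wording, the function names, and the `company`/`text` keys on the lead are all hypothetical.

```python
import re
from string import Template

# Hypothetical keywords that suggest data-pipeline work.
DATA_KEYWORDS = ["etl", "data pipeline", "airflow", "warehouse"]

# Hypothetical template; $company, $signal and $focus are filled per lead.
TEMPLATE = Template(
    "Hi $company team,\n\n"
    "I saw your Who's Hiring post mentioning \"$signal\" "
    "I'm a freelance Python engineer focused on $focus.\n\n"
    "Would a short contract engagement be useful?\n"
)


def detect_focus(text: str) -> str:
    """Classify the post as data-pipeline work or general automation."""
    lowered = text.lower()
    if any(kw in lowered for kw in DATA_KEYWORDS):
        return "data pipelines"
    return "general automation"


def pick_signal(text: str, max_len: int = 80) -> str:
    """Return the first sentence mentioning a keyword, else the first sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    keywords = DATA_KEYWORDS + ["remote", "contract"]
    for sentence in sentences:
        if any(kw in sentence.lower() for kw in keywords):
            return sentence[:max_len]
    return sentences[0][:max_len]


def draft_email(lead: dict) -> str:
    """Fill the template from one lead record."""
    return TEMPLATE.substitute(
        company=lead["company"],
        signal=pick_signal(lead["text"]),
        focus=detect_focus(lead["text"]),
    )


lead = {
    "company": "Acme AI",
    "text": "Acme AI is hiring. We need help with our Airflow ETL jobs. Remote OK.",
}
draft = draft_email(lead)
print(draft)
```

Quoting a sentence from the company's own post is a cheap way to make each draft specific; anything beyond that still needs a human pass before sending.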

3. lead_tracker.py — The Pipeline Tracker

Simple CLI kanban: add → emailed → replied → closed. Track reply rates across campaigns.

python lead_tracker.py add --company "Acme AI" --email "cto@acme.ai" --source "HN Jun 2026"
python lead_tracker.py stats
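The stage progression and the reply-rate stat can be modelled in a few lines. This is an in-memory sketch only: the stage names, the `Lead` fields, and the `advance`/`stats` helpers are assumptions for illustration, and persistence and the CLI layer are left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Assumed stage names, in pipeline order.
STAGES = ["added", "emailed", "replied", "closed"]


@dataclass
class Lead:
    company: str
    email: str
    source: str
    stage: str = "added"


def advance(lead: Lead) -> Lead:
    """Move a lead one stage forward; a lead at the last stage stays there."""
    idx = STAGES.index(lead.stage)
    lead.stage = STAGES[min(idx + 1, len(STAGES) - 1)]
    return lead


def stats(leads: List[Lead]) -> Dict:
    """Count contacted and replied leads and compute the reply rate."""
    counts = Counter(lead.stage for lead in leads)
    # A lead at or past "emailed" was contacted; at or past "replied" answered.
    emailed = sum(counts[s] for s in STAGES[1:])
    replied = sum(counts[s] for s in STAGES[2:])
    rate = replied / emailed if emailed else 0.0
    return {"emailed": emailed, "replied": replied, "reply_rate": rate}


# Placeholder leads for the demo.
leads = [
    Lead("Acme AI", "cto@acme.ai", "HN Jun 2026"),
    Lead("Example Co", "jobs@example.com", "HN Jun 2026"),
    Lead("Sample Labs", "hi@example.org", "HN Jun 2026"),
]
advance(advance(leads[0]))  # added -> emailed -> replied
advance(leads[1])           # added -> emailed
summary = stats(leads)
print(summary)
```

Counting a replied lead as also emailed keeps the denominator honest: the rate is replies over everyone contacted, not over leads still waiting in the "emailed" column.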

What the Numbers Look Like

From ~211 emails across 12 monthly threads:

  • Reply rate: ~1.5% (3 genuine warm replies)
  • Conversion to paid: 0% (so far — leads still in pipeline)
  • Time to scrape + filter 1 thread: ~15 minutes
  • Time to scrape + filter 10 threads: ~2 hours

Honest assessment: 1.5% is low. The system works for finding leads; the challenge is converting them. Among the few replies I did get, remote-first EU companies answered more often, and companies explicitly saying "contract OK" were the most likely to turn into a real conversation.

What I'd Do Differently

  1. Include portfolio links in the first email. I sent the early batches without them, then added GitHub links, and the reply rate improved noticeably.
  2. Older threads are actually better. Companies that posted 3-6 months ago and still haven't found someone are more likely to respond quickly.
  3. Quality over volume. Sending 20 genuinely personalized emails beats sending 200 generic ones.
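Targeting older threads means finding their IDs first. One way to do that, sketched here under assumptions, is to read the submissions of the `whoishiring` account through the public HN API and keep the stories whose title starts with "Ask HN: Who is hiring?". The function names and the `max_items` cutoff are hypothetical, and the sketch assumes the `submitted` list comes back newest first.

```python
import json
from typing import Callable, Dict, List
from urllib.request import urlopen

API_ROOT = "https://hacker-news.firebaseio.com/v0"
TITLE_PREFIX = "Ask HN: Who is hiring?"


def fetch_json(path: str):
    """GET one resource from the HN API, e.g. 'user/whoishiring' or 'item/123'."""
    with urlopen(f"{API_ROOT}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def find_hiring_threads(
    fetch: Callable[[str], Dict] = fetch_json, max_items: int = 40
) -> List[Dict]:
    """Return id, title and Unix time for each hiring thread found."""
    user = fetch("user/whoishiring") or {}
    threads = []
    # The account also posts other monthly threads, so filter by title.
    for item_id in user.get("submitted", [])[:max_items]:
        item = fetch(f"item/{item_id}")
        if item and item.get("title", "").startswith(TITLE_PREFIX):
            threads.append(
                {"id": item["id"], "title": item["title"], "time": item.get("time")}
            )
    return threads
```

Calling `find_hiring_threads()` with no arguments hits the live API, one request per submission checked. The `fetch` parameter exists so a stub can be passed in for testing, and the `time` field makes it easy to pick threads from 3 to 6 months back.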

Get the Scripts

I packaged the three scripts + the full workflow guide into a download: HN Startup Hunter (€19).

Or if you want to build your own version, the core logic above is enough to get started. The HN API is free, and its rate limits are generous for casual use.
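As a starting point, here is a minimal sketch of pulling a thread's top-level comments from the HN API. In that API the thread item lists its direct replies under `kids`, and comment `text` arrives as HTML, so it is converted to plain text before any keyword matching. The function names and the `delay` parameter are assumptions for illustration.

```python
import html
import json
import re
import time
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"


def fetch_item(item_id: int) -> Optional[Dict]:
    """GET a single story or comment by id."""
    with urlopen(ITEM_URL.format(item_id), timeout=10) as resp:
        return json.load(resp)


def clean_text(raw_html: str) -> str:
    """Turn HN comment HTML into plain text."""
    text = raw_html.replace("<p>", "\n")   # paragraph breaks become newlines
    text = re.sub(r"<[^>]+>", "", text)    # drop remaining tags, keep link text
    return html.unescape(text).strip()     # decode entities such as &amp;


def top_level_comments(
    thread_id: int,
    fetch: Callable[[int], Optional[Dict]] = fetch_item,
    delay: float = 0.1,
) -> List[Dict]:
    """Return the direct replies to a thread, skipping deleted or dead ones."""
    thread = fetch(thread_id) or {}
    comments = []
    for kid_id in thread.get("kids", []):
        item = fetch(kid_id)
        if not item or item.get("deleted") or item.get("dead"):
            continue
        comments.append(
            {
                "id": item["id"],
                "by": item.get("by"),
                "text": clean_text(item.get("text", "")),
            }
        )
        if delay > 0:
            time.sleep(delay)  # small pause between requests to stay polite
    return comments
```

Only top-level comments are job posts in these threads, so there is no need to walk the reply tree. Each returned `text` can be passed straight to a scorer like the one shown earlier.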


What's your experience with HN cold outreach? I'm curious whether others have found better conversion tactics for this channel.


🔧 Found this useful? I build custom HN lead reports (20–50 companies with verified emails, tech stacks, 24h delivery) → Order done-for-you lead report ($75) | Got a workflow to automate? → 1-Hour Python Automation Audit ($39)
