I Built a Pay-Per-Result Hacker News Scraper on Apify


The Problem

I needed structured Hacker News data for a side project: trending stories, scores, comment counts. The official HN API exists, but its story listings return only item IDs, so you have to write the batch fetching, filtering, and pagination logic yourself.

So I built an Apify Actor that handles all of this and published it for free.

What It Does

HN Top Stories Scraper lets you:

Scrape top, new, best, ask, and show stories
Filter by minimum score, comment count, or keyword
Get up to 500 stories per run
Export as JSON or CSV, or connect to Google Sheets, Slack, or Zapier

It uses the official HN Firebase API — no scraping, no proxies needed.
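For a sense of what the API work involves, here is a minimal Python sketch of the fetch step. It is not the Actor's source code. The two endpoints are the public HN Firebase ones; the function names, the worker count, and the injectable `fetch` parameter are assumptions made for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_stories(story_type="top", count=30, fetch=fetch_json):
    """Return up to `count` raw items for a listing: top, new, best, ask, or show."""
    # The listing endpoint returns only a JSON array of item IDs.
    ids = fetch(f"{HN_API}/{story_type}stories.json")[:count]
    # Every item needs its own request, so fetch them concurrently.
    # pool.map keeps the results in the same order as the IDs.
    with ThreadPoolExecutor(max_workers=10) as pool:
        items = list(pool.map(lambda i: fetch(f"{HN_API}/item/{i}.json"), ids))
    # Deleted items can come back as null (None); drop them.
    return [item for item in items if item]
```

Calling `fetch_stories("show", 50)` would hit the live API. Passing a stub as `fetch` lets you exercise the logic offline.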

Example

Get the top 50 AI stories with 100+ upvotes:

{
  "count": 50,
  "type": "top",
  "minScore": 100,
  "keyword": "AI"
}

Each matching story comes back as an object like this:

{
  "id": 12345678,
  "title": "Show HN: AI tool that does X",
  "url": "https://example.com",
  "score": 342,
  "comments": 89,
  "author": "username",
  "hn_url": "https://news.ycombinator.com/item?id=12345678"
}
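If you are curious how the filters and the output shape could work, here is a rough Python sketch. It is an illustration under assumptions, not the Actor's actual code. It takes raw HN API items, which store the comment count as `descendants` and the author as `by`, and maps them to the fields shown above. The function names, the `min_comments` parameter, and the whole-word, case-insensitive keyword match are my assumptions.

```python
import re


def to_record(item):
    """Map a raw HN API item to the output shape shown above."""
    return {
        "id": item["id"],
        "title": item.get("title", ""),
        "url": item.get("url"),  # text posts such as Ask HN have no external URL
        "score": item.get("score", 0),
        "comments": item.get("descendants", 0),
        "author": item.get("by"),
        "hn_url": f"https://news.ycombinator.com/item?id={item['id']}",
    }


def filter_stories(items, min_score=0, min_comments=0, keyword=None, count=None):
    """Keep stories that pass every filter, then cap the result at `count`."""
    pattern = None
    if keyword:
        # Whole-word match, so "AI" does not also match "maintain".
        pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    records = []
    for rec in map(to_record, items):
        if rec["score"] < min_score or rec["comments"] < min_comments:
            continue
        if pattern and not pattern.search(rec["title"]):
            continue
        records.append(rec)
    return records[:count]  # count=None keeps everything
```

A plain substring check would be simpler, but for a short keyword like "AI" it produces a lot of false positives, which is why the sketch uses word boundaries.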

Use Cases

RSS replacement: Schedule runs to get stories as structured data
Competitor monitoring: Filter by your company name
Content curation: Feed into newsletters or Slack
Trend analysis: Track what gets high scores over time
Job monitoring: Scrape "Who is hiring?" threads

Pricing

Pay-per-result: ~$0.01 per 1,000 stories. Free tier available — no credit card needed.

Compare that to the $5-19/month flat rates that competitors charge.
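To put the per-result price in perspective, here is the arithmetic as a small Python sketch. It takes the approximate $0.01 per 1,000 stories figure at face value and ignores the free tier and any other platform costs, so treat the numbers as rough.

```python
PRICE_PER_1000 = 0.01  # USD per 1,000 stories, the approximate rate quoted above


def run_cost(stories):
    """Cost in USD of fetching `stories` results."""
    return stories / 1000 * PRICE_PER_1000


def break_even_stories(flat_monthly_fee):
    """Stories per month at which pay-per-result costs as much as a flat fee."""
    return flat_monthly_fee / PRICE_PER_1000 * 1000
```

A maximum-size run of 500 stories works out to about half a cent. Matching a $5/month plan would take roughly 500,000 stories a month, or 1,000 full runs.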

Try It

https://apify.com/cryptosignals/hn-top-stories
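If you would rather call it from code than from the Apify Console, here is a minimal sketch using Apify's REST API and the Python standard library. The Actor ID is an assumption derived from the Store URL above, with the `/` written as `~` the way Apify API paths expect. The function names are hypothetical, and you need your own Apify API token.

```python
import json
from urllib.request import Request, urlopen

# Assumed from the Store URL: username/actor-name becomes username~actor-name.
ACTOR_ID = "cryptosignals~hn-top-stories"


def build_run_request(token, run_input):
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token, run_input):
    """Send the request and return the stories as a list of dicts."""
    with urlopen(build_run_request(token, run_input), timeout=300) as resp:
        return json.load(resp)
```

With a token in hand, `run_actor(token, {"count": 50, "type": "top", "minScore": 100, "keyword": "AI"})` sends the same input as the example earlier. The synchronous endpoint waits for the run to finish, which suits short runs like this one.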

Feedback welcome — this is my first published Actor.

Recommended Tools for Web Scraping

If you're building scrapers at scale, these tools can save you hours of dealing with proxies, CAPTCHAs, and rate limits:

ScraperAPI — Handles proxy rotation, browser rendering, and CAPTCHAs automatically. Great if you don't want to manage your own proxy infrastructure. Comes with 5,000 free API credits to get started.
ScrapeOps — A proxy aggregator that routes your requests through 20+ proxy providers and picks the best one for each target site. Useful when you need reliability across different domains.
