Hacker News is arguably the most influential tech community on the internet. With 10+ million monthly visitors, it's where startup founders, VCs, and senior engineers surface the links that shape the industry. If you're doing market research, competitive intelligence, or trend analysis — scraping HN data is a smart move.
But how should you do it? In this guide, I'll compare the main approaches: the official Algolia API and dedicated Apify actors, and explain when each makes sense. I built one of these actors myself, so I'll be upfront about that throughout.
## Option 1: The Algolia HN Search API (Free, No Auth)
Before reaching for any scraping tool, know this: Hacker News has a free, public search API powered by Algolia. No API key required.
### Searching stories
```python
import requests

# Search for stories mentioning "LLM"
resp = requests.get("https://hn.algolia.com/api/v1/search", params={
    "query": "LLM",
    "tags": "story",
    "hitsPerPage": 10,
    "numericFilters": "created_at_i>1704067200",  # after Jan 1, 2024 (UTC)
})
for hit in resp.json()["hits"]:
    print(f'{hit["points"]} pts | {hit["title"]}')
    print(f'  {hit.get("url", "N/A")}')
```
### Fetching comments on a story
```python
import requests

# Get all comments on a specific story
resp = requests.get("https://hn.algolia.com/api/v1/items/38877423")
story = resp.json()

def walk_comments(children, depth=0):
    # Recursively print the comment tree, indenting by depth
    for c in children:
        if c.get("text"):
            print(f'{"  " * depth}{c["author"]}: {c["text"][:80]}...')
        walk_comments(c.get("children", []), depth + 1)

walk_comments(story.get("children", []))
```
The Algolia API covers full-text search, date filtering via unix timestamps, and individual item lookup. Rate limits are generous (10,000 requests/hour). For many use cases, this is all you need.
When Algolia falls short: no bulk export, no scheduled monitoring, no domain extraction from story URLs, limited to 1,000 results per query, and no way to track front page rankings over time.
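Within that 1,000-result ceiling, you can still collect more than one page per query by walking the `page` parameter. A minimal sketch of that loop (the `fetch_all` helper and its defaults are my own, not part of the API):

```python
import requests

ALGOLIA_URL = "https://hn.algolia.com/api/v1/search"

def fetch_all(query, tags="story", per_page=100, max_results=1000):
    """Page through Algolia results up to its ~1,000-result-per-query cap."""
    hits, page = [], 0
    while len(hits) < max_results:
        resp = requests.get(ALGOLIA_URL, params={
            "query": query,
            "tags": tags,
            "hitsPerPage": per_page,
            "page": page,
        })
        data = resp.json()
        hits.extend(data["hits"])
        page += 1
        if page >= data["nbPages"]:  # no more pages to fetch
            break
    return hits[:max_results]

# e.g. stories = fetch_all("rust", max_results=300)
```

`nbPages` in the response tells you when to stop; past the cap, the API simply returns no further pages.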
## Option 2: Apify Actors — The Scraper Marketplace
When you need more than what the API offers — bulk data, scheduled runs, webhooks, or enriched output — Apify actors fill the gap. Here's the HN scraper landscape as of March 2026:
| Actor | Users | 30-Day Runs | Pricing | Standout Feature |
|---|---|---|---|---|
| epctex/hackernews-scraper | 160 | 474 | $10/mo | Most established, 5★ (8 reviews) |
| lucen_data/hacker-news-data-scraper | 59 | 780 | $0.001/result | Activity monitoring + Slack alerts |
| red.cars/hackernews-scraper-pro | 18 | 30 | $19/mo | No proxy required |
| fearless_sharpener/hacker-news-top-sites-scraper | 17 | 29 | $5/mo | Top sites focused |
| shahidirfan/hacker-news-data-scraper | 16 | 80 | Free | Basic scraping, zero cost |
| cryptosignals/hackernews-scraper | 5 | — | Free | Date filtering, sort modes, domain extraction |
The market leader, epctex, has been live since 2022 and has the most users and reviews by far. Lucen Data stands out for activity monitoring and Slack integration — useful if you want alerts when a specific topic starts trending. shahidirfan offers a solid free option for basic needs.
### Spotlight: cryptosignals/hackernews-scraper
Full disclosure: I built this one. It's free and focuses on features the Algolia API doesn't provide natively:
- Date range filtering — fetch stories from a specific time window with simple date strings, not unix timestamps
- Multiple sort modes — by date, score, or number of comments
- Domain extraction — automatically parses the source domain from each story URL
- Story ID input — pass specific HN story IDs for targeted data retrieval
- 5,000 result limit — go beyond Algolia's 1,000-per-query cap
- Enhanced output — clean JSON with all metadata including descendants count and extracted domains
### Python example
```python
from apify_client import ApifyClient

client = ApifyClient("your_apify_token")

run = client.actor("cryptosignals/hackernews-scraper").call(run_input={
    "search": "AI agents",
    "sort": "byPopularity",
    "dateFrom": "2026-01-01",
    "dateTo": "2026-03-20",
    "maxResults": 100,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f'{item["points"]} pts | {item["title"]}')
    print(f'  Domain: {item.get("domain", "N/A")}')
    print(f'  Comments: {item.get("num_comments", 0)}')
```
Is it the most popular? Not by a long shot — epctex has 30x more users. But if you need date filtering with human-readable dates and automatic domain extraction without a monthly subscription, it's worth a look.
## When to Use What: Decision Guide
Use the Algolia API when:
- You need real-time search with sub-second response times
- Your queries fit within 1,000 results
- You're building an app that queries HN on-demand
- You don't need scheduled data collection
- You're comfortable working with unix timestamps for dates
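That last point is less painful than it sounds: turning readable dates into the unix timestamps that Algolia's `numericFilters` expects takes a few lines. A small sketch (`hn_date_filter` is a hypothetical helper name, not an API function):

```python
from datetime import datetime, timezone

def hn_date_filter(date_from, date_to):
    """Build an Algolia numericFilters string from ISO date strings (UTC)."""
    start = int(datetime.fromisoformat(date_from).replace(tzinfo=timezone.utc).timestamp())
    end = int(datetime.fromisoformat(date_to).replace(tzinfo=timezone.utc).timestamp())
    return f"created_at_i>={start},created_at_i<{end}"

print(hn_date_filter("2024-01-01", "2024-02-01"))
# created_at_i>=1704067200,created_at_i<1706745600
```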
Use an Apify actor when:
- You need bulk data export (thousands of stories at once)
- You want scheduled scraping with webhooks or Slack alerts
- You need enriched output (domain extraction, activity scores)
- You're feeding data into a pipeline (Apify integrates with Google Sheets, Slack, Zapier, n8n, and more)
- You want a managed solution without maintaining infrastructure
Build your own scraper when:
- You need to track actual front page rankings over time (not available via any API)
- You have very specific data transformation requirements
- You're comfortable maintaining your own infrastructure and handling rate limits
## The HN Official API (Firebase)
Worth mentioning: HN also has an official Firebase API that returns individual items by ID. It's real-time but has no search — you fetch ranked ID lists (`topstories`, `newstories`) and then pull items one by one. It's best suited for building live HN clients, not bulk data extraction.
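For the front-page-ranking use case from the decision guide, the Firebase `topstories` endpoint returns story IDs in current rank order, so polling it on a schedule lets you build your own rank history. A rough sketch under that assumption (`snapshot_ranks` is my name for the helper):

```python
import time
import requests

TOP = "https://hacker-news.firebaseio.com/v0/topstories.json"

def snapshot_ranks(limit=30):
    """One snapshot of the current front page: (unix_time, rank, story_id) rows."""
    ids = requests.get(TOP).json()
    now = int(time.time())
    return [(now, rank, sid) for rank, sid in enumerate(ids[:limit], start=1)]

# Poll on a schedule (cron, Apify, etc.) and append snapshots to a table
# rows = snapshot_ranks()
```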
## Conclusion
The HN data ecosystem is surprisingly well-served. The Algolia API handles 80% of use cases for free, with zero setup. For the remaining 20% — bulk export, scheduling, monitoring, enriched output — Apify actors provide turnkey solutions at various price points.
My recommendation: start with the Algolia API. If you hit its limits (1,000 results, no scheduling, no domain extraction), then look at the actors. The epctex actor is the most battle-tested choice. For a free alternative with date filtering and domain extraction, give mine a try.
The best scraper is the one that fits your workflow. Don't over-engineer it.