Why Scrape Hacker News?
Hacker News gets 10M+ visits/month from developers, founders, and investors. Whether you're tracking trends, monitoring mentions of your product, or building a dataset for analysis — HN data is gold.
The good news: HN has an official search API powered by Algolia. Most devs don't know about it. Let me show you how to use it properly.
The Algolia Search API
Base URL: https://hn.algolia.com/api/v1/
Fetching Stories by Type
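A minimal sketch in Python, using only the standard library. The endpoint and parameters (`query`, `tags`, `hitsPerPage`) are from the public Algolia HN API; the helper names are mine:

```python
import json
import urllib.parse
import urllib.request

API = "https://hn.algolia.com/api/v1"

def build_search_url(query="", tags="story", hits_per_page=20):
    """Assemble a /search URL; `tags` selects the content type."""
    params = urllib.parse.urlencode({
        "query": query,
        "tags": tags,
        "hitsPerPage": hits_per_page,
    })
    return f"{API}/search?{params}"

def search(query="", tags="story"):
    """Fetch one page of hits as a list of dicts."""
    with urllib.request.urlopen(build_search_url(query, tags)) as resp:
        return json.load(resp)["hits"]

# search(tags="show_hn")  -> most relevant Show HN posts
```

Each hit comes back as JSON with fields like `title`, `url`, `points`, `author`, and `created_at_i` (the Unix timestamp used for filtering below).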
The `tags` parameter controls what you get:

| Tag | What it returns |
|---|---|
| `story` | All stories |
| `ask_hn` | Ask HN posts |
| `show_hn` | Show HN posts |
| `comment` | Comments only |
| `front_page` | Currently on front page |
Date Filtering with numericFilters
This is the killer feature most people miss. You can filter by Unix timestamp:
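A sketch of building such a query in Python (standard library only; the `numericFilters` parameter and `created_at_i`/`points` fields are from the public Algolia HN API, the helper name is mine):

```python
import datetime
import urllib.parse

API = "https://hn.algolia.com/api/v1"

def window_url(start, end, min_points=None, tags="story"):
    """Build a /search_by_date URL constrained to a Unix-timestamp window."""
    filters = [
        f"created_at_i>{int(start.timestamp())}",
        f"created_at_i<{int(end.timestamp())}",
    ]
    if min_points is not None:
        filters.append(f"points>{min_points}")  # stack extra numeric filters
    params = urllib.parse.urlencode({
        "tags": tags,
        "numericFilters": ",".join(filters),
    })
    return f"{API}/search_by_date?{params}"

utc = datetime.timezone.utc
jan = window_url(datetime.datetime(2024, 1, 1, tzinfo=utc),
                 datetime.datetime(2024, 2, 1, tzinfo=utc),
                 min_points=100)
```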
You can combine filters: `created_at_i>X,created_at_i<Y,points>100` gives you highly-voted stories in a date range.
Sorting: Relevance vs Date
Two endpoints handle this:
- `/search` — sorted by relevance (default)
- `/search_by_date` — sorted by date (newest first)
For monitoring use cases (tracking mentions, watching trends), search_by_date is what you want.
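A monitoring loop reduces to: remember the newest timestamp you've seen, and only ask for items after it. A sketch, assuming the documented Algolia tag syntax where parentheses mean OR (the function name is mine):

```python
import urllib.parse

API = "https://hn.algolia.com/api/v1"

def mention_url(term, since_ts):
    """Stories OR comments mentioning `term`, created after `since_ts`.
    In the `tags` parameter, commas mean AND and parentheses mean OR."""
    params = urllib.parse.urlencode({
        "query": term,
        "tags": "(story,comment)",
        "numericFilters": f"created_at_i>{since_ts}",
    })
    return f"{API}/search_by_date?{params}"

# Poll: fetch mention_url(term, last_seen), process hits (newest first),
# then advance last_seen to the max created_at_i you received.
```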
Fetching Comment Trees
Each story has a nested comment tree. You can fetch it by item ID:
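The `/items/{id}` endpoint returns the story with its comments nested in `children` arrays, so a recursive walk recovers the whole thread. A sketch (standard library only; helper names are mine):

```python
import json
import urllib.request

API = "https://hn.algolia.com/api/v1"

def fetch_item(item_id):
    """GET /items/{id}: the item plus its full nested comment tree."""
    with urllib.request.urlopen(f"{API}/items/{item_id}") as resp:
        return json.load(resp)

def flatten_comments(item, depth=0):
    """Depth-first walk over the nested `children` arrays."""
    for child in item.get("children", []):
        yield depth, child.get("author"), child.get("text")
        yield from flatten_comments(child, depth + 1)

# story = fetch_item(38245124)  # hypothetical story ID
# for depth, author, text in flatten_comments(story):
#     print("  " * depth, author)
```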
User Profiles
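The API also exposes profiles at `/users/{username}`, returning fields like `karma` and `about`. A minimal sketch (helper names are mine):

```python
import json
import urllib.parse
import urllib.request

API = "https://hn.algolia.com/api/v1"

def user_url(username):
    """Build the /users/{username} profile URL."""
    return f"{API}/users/{urllib.parse.quote(username)}"

def fetch_user(username):
    with urllib.request.urlopen(user_url(username)) as resp:
        return json.load(resp)

# fetch_user("pg")  -> profile dict with karma, about, etc.
```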
Domain Extraction
Want to find all HN submissions from a specific domain? The API supports this natively:
`/search?query=techcrunch.com&tags=story&restrictSearchableAttributes=url`
This is incredibly useful for competitive analysis — see how your competitors' content performs on HN.
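As a sketch, building that query in Python (the `restrictSearchableAttributes` parameter is standard Algolia; the helper name is mine):

```python
import urllib.parse

API = "https://hn.algolia.com/api/v1"

def domain_url(domain):
    """Search stories, matching `domain` against the URL field only."""
    params = urllib.parse.urlencode({
        "query": domain,
        "tags": "story",
        "restrictSearchableAttributes": "url",
    })
    return f"{API}/search?{params}"
```

Restricting the searchable attributes to `url` keeps titles and text out of the match, so `techcrunch.com` won't pull in stories that merely mention the name.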
Rate Limits and Gotchas
The Algolia API is generous but has limits:
- 10,000 requests/hour (IP-based)
- Max 1,000 results per query (pagination tops out at page 50 × 20 hits)
- No bulk export endpoint — you need to paginate through date windows
For anything beyond light usage, you'll hit pagination limits fast. If you need full historical data or continuous monitoring, that's where purpose-built tools come in.
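One way around the 1,000-result cap is to paginate by timestamp instead of page number: query `/search_by_date` with an upper bound, then move the bound below the oldest hit returned. A sketch with an injectable page fetcher (names are mine; `hitsPerPage` value is an assumption):

```python
import json
import urllib.parse
import urllib.request

API = "https://hn.algolia.com/api/v1"

def fetch_page(upper_ts, tags="story"):
    """One newest-first page of hits created strictly before `upper_ts`."""
    params = urllib.parse.urlencode({
        "tags": tags,
        "numericFilters": f"created_at_i<{upper_ts}",
        "hitsPerPage": 100,
    })
    with urllib.request.urlopen(f"{API}/search_by_date?{params}") as resp:
        return json.load(resp)["hits"]

def backfill(page_fn, upper_ts):
    """Slide the time window backwards until a page comes back empty."""
    while True:
        hits = page_fn(upper_ts)
        if not hits:
            return
        yield from hits
        upper_ts = hits[-1]["created_at_i"]  # newest-first: last hit is oldest

# for hit in backfill(fetch_page, some_unix_timestamp): ...
```

Note the strict `<` bound: items sharing the exact boundary timestamp can be skipped, so production code should deduplicate by object ID across windows.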
Scaling Up: Automated HN Scraping
For production use cases — daily trend reports, mention monitoring, or building datasets — I built an Apify actor for Hacker News scraping that handles:
- All story types (top, new, best, Ask HN, Show HN)
- Date range filtering with automatic pagination
- sortBy parameter (relevance, date, points)
- Domain extraction from submitted URLs
- Full comment trees with nested replies
- Built-in proxy rotation and retry logic
It runs on Apify's infrastructure so you don't need to manage rate limits or pagination yourself.
Quick Reference
| Use Case | Endpoint | Key Params |
|---|---|---|
| Search stories | `/search` | `query`, `tags=story` |
| Recent stories | `/search_by_date` | `tags=story`, `numericFilters` |
| Front page | `/search` | `tags=front_page` |
| Comments on story | `/items/{id}` | — |
| User profile | `/users/{username}` | — |
| Domain filter | `/search` | `restrictSearchableAttributes=url` |
HN's API is one of the best-kept secrets in the scraping world. For quick scripts, it's all you need. For production pipelines, pair it with proper infrastructure and you're set.