Why Scrape Hacker News?
Hacker News gets 10M+ visits/month from developers, founders, and investors. Whether you're tracking trends, monitoring mentions of your product, or building a dataset for analysis — HN data is gold.
The good news: HN has an official search API powered by Algolia. Most devs don't know about it. Let me show you how to use it properly.
The Algolia Search API
Base URL: https://hn.algolia.com/api/v1/
Fetching Stories by Type
```python
import requests

# Search stories matching a query
resp = requests.get("https://hn.algolia.com/api/v1/search", params={
    "query": "LLM",
    "tags": "story",
    "hitsPerPage": 20,
})
stories = resp.json()["hits"]
for s in stories:
    print(f"{s['points']} pts - {s['title']} ({s['url']})")
```
The `tags` parameter controls what you get:
| Tag | What it returns |
|---|---|
| `story` | All stories |
| `ask_hn` | Ask HN posts |
| `show_hn` | Show HN posts |
| `comment` | Comments only |
| `front_page` | Currently on front page |
Date Filtering with numericFilters
This is the killer feature most people miss. You can filter by Unix timestamp:
```python
import time

import requests

# Stories from the last 24 hours
yesterday = int(time.time()) - 86400
resp = requests.get("https://hn.algolia.com/api/v1/search_by_date", params={
    "tags": "story",
    "numericFilters": f"created_at_i>{yesterday}",
    "hitsPerPage": 50,
})
```
You can combine filters: `created_at_i>X,created_at_i<Y,points>100` gives you highly-voted stories in a date range.
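A tiny builder makes these filter strings less error-prone. This is a sketch — the function name and signature are mine; only the `created_at_i` and `points` field names come from the API:

```python
def range_filter(start_ts, end_ts=None, min_points=None):
    """Build a numericFilters string for a created_at_i window.

    Helper name is mine; created_at_i and points are real Algolia
    HN fields. Conditions are ANDed by joining with commas.
    """
    parts = [f"created_at_i>{start_ts}"]
    if end_ts is not None:
        parts.append(f"created_at_i<{end_ts}")
    if min_points is not None:
        parts.append(f"points>{min_points}")
    return ",".join(parts)
```

Pass the result as the `numericFilters` query parameter.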
Sorting: Relevance vs Date
Two endpoints handle this:
- `/search` — sorted by relevance (default)
- `/search_by_date` — sorted by date (newest first)

For monitoring use cases (tracking mentions, watching trends), `/search_by_date` is what you want.
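For a polling monitor, the usual pattern is to keep a high-water mark of the newest `created_at_i` you've seen and only process hits above it. A minimal sketch of that bookkeeping (the helper is my own; `created_at_i` is the real timestamp field on each hit):

```python
def fresh_hits(hits, last_seen_ts):
    """Filter hits from /search_by_date down to unseen ones.

    Helper is mine. Returns (new_hits, new_high_water_mark) so the
    caller can persist the mark between polling runs.
    """
    new = [h for h in hits if h["created_at_i"] > last_seen_ts]
    high = max((h["created_at_i"] for h in new), default=last_seen_ts)
    return new, high
```

On each run, fetch the first page of `/search_by_date`, call `fresh_hits` with the saved mark, process the new hits, and store the returned mark.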
Fetching Comment Trees
Each story has a nested comment tree. You can fetch it by item ID:
```python
# Get a full item with its comment tree
# (story_id is an HN item ID, e.g. the objectID of a search hit)
item = requests.get(
    f"https://hn.algolia.com/api/v1/items/{story_id}"
).json()

def walk_comments(children, depth=0):
    for c in children:
        # "text" can be null for deleted comments, so fall back to ""
        print("  " * depth + (c.get("text") or "")[:80])
        walk_comments(c.get("children", []), depth + 1)

walk_comments(item.get("children", []))
```
User Profiles
```python
user = requests.get(
    "https://hn.algolia.com/api/v1/users/pg"
).json()
print(f"Karma: {user['karma']}, Account created: {user['created_at']}")
```
Domain Extraction
Want to find all HN submissions from a specific domain? The API supports this natively:
`/search?tags=story&restrictSearchableAttributes=url&query=techcrunch.com`
This is incredibly useful for competitive analysis — see how your competitors' content performs on HN.
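The same request is cleaner with a params dict. A sketch of the query construction (the helper name is mine; the parameter names are the real ones from the URL above):

```python
def domain_search_params(domain, page=0):
    """Params for finding all story submissions from one domain.

    Helper name is mine. restrictSearchableAttributes=url makes the
    query match only the submission URL, not titles or text.
    """
    return {
        "query": domain,
        "tags": "story",
        "restrictSearchableAttributes": "url",
        "page": page,
    }
```

Pass the dict as `params=` to `requests.get("https://hn.algolia.com/api/v1/search", ...)`.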
Rate Limits and Gotchas
The Algolia API is generous but has limits:
- 10,000 requests/hour (IP-based)
- Max 1,000 results per query (pagination tops out at page 50 × 20 hits)
- No bulk export endpoint — you need to paginate through date windows
For anything beyond light usage, you'll hit pagination limits fast. If you need full historical data or continuous monitoring, that's where purpose-built tools come in.
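The standard workaround for the 1,000-result cap is to slice your date range into windows small enough that each window fits under the cap, then query each window separately with `numericFilters`. A minimal windowing sketch (the helper and default step are mine):

```python
def date_windows(start_ts, end_ts, step=86400):
    """Yield (lo, hi) Unix-timestamp windows covering [start_ts, end_ts).

    Helper is mine. Query each window with
    numericFilters=f"created_at_i>{lo},created_at_i<{hi}"; if a window
    still returns ~1,000 hits, shrink `step` and re-split it.
    """
    lo = start_ts
    while lo < end_ts:
        hi = min(lo + step, end_ts)
        yield lo, hi
        lo = hi
```

One day per window is usually safe for a single query term; busy queries (or no query at all) may need hourly windows.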
Scaling Up: Automated HN Scraping
For production use cases — daily trend reports, mention monitoring, or building datasets — I built an Apify actor for Hacker News scraping that handles:
- All story types (top, new, best, Ask HN, Show HN)
- Date range filtering with automatic pagination
- A `sortBy` parameter (relevance, date, points)
- Domain extraction from submitted URLs
- Full comment trees with nested replies
- Built-in proxy rotation and retry logic
It runs on Apify's infrastructure so you don't need to manage rate limits or pagination yourself.
Quick Reference
| Use Case | Endpoint | Key Params |
|---|---|---|
| Search stories | `/search` | `query`, `tags=story` |
| Recent stories | `/search_by_date` | `tags=story`, `numericFilters` |
| Front page | `/search` | `tags=front_page` |
| Comments on story | `/items/{id}` | — |
| User profile | `/users/{username}` | — |
| Domain filter | `/search` | `query=<domain>`, `restrictSearchableAttributes=url` |
HN's API is one of the best-kept secrets in the scraping world. For quick scripts, it's all you need. For production pipelines, pair it with proper infrastructure and you're set.