DEV Community

agenthustler
Best Hacker News Scrapers in 2026: Algolia API vs Apify Actors Compared

Hacker News is arguably the most influential tech community on the internet. With 10+ million monthly visitors, it's where startup founders, VCs, and senior engineers surface the links that shape the industry. If you're doing market research, competitive intelligence, or trend analysis — scraping HN data is a smart move.

But how should you do it? In this guide, I'll compare the main approaches: the official Algolia API, dedicated Apify actors, and when each makes sense. I built one of these actors myself, so I'll be upfront about that — full disclosure throughout.

Option 1: The Algolia HN Search API (Free, No Auth)

Before reaching for any scraping tool, know this: Hacker News has a free, public search API powered by Algolia. No API key required.

Searching stories

import requests

# Search for stories mentioning "LLM"
resp = requests.get("https://hn.algolia.com/api/v1/search", params={
    "query": "LLM",
    "tags": "story",
    "hitsPerPage": 10,
    "numericFilters": "created_at_i>1704067200"  # After Jan 2024
})

for hit in resp.json()["hits"]:
    print(f'{hit["points"]} pts | {hit["title"]}')
    print(f'  {hit.get("url", "N/A")}')

Fetching comments on a story

import requests

# Get all comments on a specific story by its HN item ID
resp = requests.get("https://hn.algolia.com/api/v1/items/38877423")
story = resp.json()

def walk_comments(children, depth=0):
    for c in children:
        if c.get("text"):
            print(f'{"  " * depth}{c["author"]}: {c["text"][:80]}...')
        walk_comments(c.get("children", []), depth + 1)

walk_comments(story.get("children", []))

The Algolia API covers full-text search, date filtering via unix timestamps, and individual item lookup. Rate limits are generous (10,000 requests/hour). For many use cases, this is all you need.
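Those unix timestamps are easy to generate. The `created_at_i>1704067200` filter in the first example, for instance, comes from converting a plain date to epoch seconds:

```python
from datetime import datetime, timezone

# numericFilters compares against created_at_i, a unix timestamp,
# so build the cutoff from a readable UTC date
cutoff = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())
print(f"created_at_i>{cutoff}")  # created_at_i>1704067200
```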

When Algolia falls short: no bulk export, no scheduled monitoring, no domain extraction from story URLs, limited to 1,000 results per query, and no way to track front page rankings over time.
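The 1,000-result cap can be worked around by hand, though it takes effort. A common technique (my own sketch, not an official recipe) is to page backwards through the `search_by_date` endpoint, using the oldest `created_at_i` seen so far as a moving upper bound:

```python
import requests

API = "https://hn.algolia.com/api/v1/search_by_date"

def fetch_all(query, limit=5000):
    """Collect up to `limit` matching stories, newest first, by sliding
    the created_at_i upper bound below Algolia's per-query cap."""
    results, upper = [], None
    while len(results) < limit:
        params = {"query": query, "tags": "story", "hitsPerPage": 1000}
        if upper is not None:
            params["numericFilters"] = f"created_at_i<{upper}"
        hits = requests.get(API, params=params).json()["hits"]
        if not hits:
            break
        results.extend(hits)
        upper = hits[-1]["created_at_i"]  # oldest item in this batch
    return results[:limit]
```

Note the caveat: stories sharing the exact same timestamp at a window boundary can be skipped or duplicated, so this is fine for trend analysis but not for exhaustive archival.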

Option 2: Apify Actors — The Scraper Marketplace

When you need more than what the API offers — bulk data, scheduled runs, webhooks, or enriched output — Apify actors fill the gap. Here's the HN scraper landscape as of March 2026:

| Actor | Users | 30-Day Runs | Pricing | Standout Feature |
| --- | --- | --- | --- | --- |
| epctex/hackernews-scraper | 160 | 474 | $10/mo | Most established, 5★ (8 reviews) |
| lucen_data/hacker-news-data-scraper | 59 | 780 | $0.001/result | Activity monitoring + Slack alerts |
| red.cars/hackernews-scraper-pro | 18 | 30 | $19/mo | No proxy required |
| fearless_sharpener/hacker-news-top-sites-scraper | 17 | 29 | $5/mo | Top sites focused |
| shahidirfan/hacker-news-data-scraper | 16 | 80 | Free | Basic scraping, zero cost |
| cryptosignals/hackernews-scraper | 5 | | Free | Date filtering, sort modes, domain extraction |

The market leader, epctex, has been live since 2022 and has the most users and reviews by far. Lucen Data stands out for activity monitoring and Slack integration — useful if you want alerts when a specific topic starts trending. shahidirfan offers a solid free option for basic needs.

Spotlight: cryptosignals/hackernews-scraper

Full disclosure: I built this one. It's free and focuses on features the Algolia API doesn't provide natively:

  • Date range filtering — fetch stories from a specific time window with simple date strings, not unix timestamps
  • Multiple sort modes — by date, score, or number of comments
  • Domain extraction — automatically parses the source domain from each story URL
  • Story ID input — pass specific HN story IDs for targeted data retrieval
  • 5,000 result limit — go beyond Algolia's 1,000-per-query cap
  • Enhanced output — clean JSON with all metadata including descendants count and extracted domains
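For context, the domain-extraction step amounts to roughly this (my own illustration of the idea, not the actor's actual code):

```python
from urllib.parse import urlparse

def extract_domain(url):
    """Pull the bare source domain from a story URL, dropping a
    leading "www." so hits from the same site group together."""
    if not url:
        return None  # Ask HN / text posts have no external URL
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc

print(extract_domain("https://www.example.com/post/1"))  # example.com
```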

Python example

from apify_client import ApifyClient

client = ApifyClient("your_apify_token")

run = client.actor("cryptosignals/hackernews-scraper").call(run_input={
    "search": "AI agents",
    "sort": "byPopularity",
    "dateFrom": "2026-01-01",
    "dateTo": "2026-03-20",
    "maxResults": 100
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f'{item["points"]} pts | {item["title"]}')
    print(f'  Domain: {item.get("domain", "N/A")}')
    print(f'  Comments: {item.get("num_comments", 0)}')

Is it the most popular? Not by a long shot — epctex has 30x more users. But if you need date filtering with human-readable dates and automatic domain extraction without a monthly subscription, it's worth a look.

When to Use What: Decision Guide

Use the Algolia API when:

  • You need real-time search with sub-second response times
  • Your queries fit within 1,000 results
  • You're building an app that queries HN on-demand
  • You don't need scheduled data collection
  • You're comfortable working with unix timestamps for dates

Use an Apify actor when:

  • You need bulk data export (thousands of stories at once)
  • You want scheduled scraping with webhooks or Slack alerts
  • You need enriched output (domain extraction, activity scores)
  • You're feeding data into a pipeline (Apify integrates with Google Sheets, Slack, Zapier, n8n, and more)
  • You want a managed solution without maintaining infrastructure

Build your own scraper when:

  • You need to track actual front page rankings over time (not available via any API)
  • You have very specific data transformation requirements
  • You're comfortable maintaining your own infrastructure and handling rate limits

The HN Official API (Firebase)

Worth mentioning: HN also has an official Firebase API that returns individual items by ID. It's real-time but has no search — you'd need to poll item IDs sequentially. It's best suited for building live HN clients, not for data extraction.
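In practice that polling looks like this: one request for the ranked ID list, then one request per item (endpoints per the official HackerNews/API documentation):

```python
import requests

BASE = "https://hacker-news.firebaseio.com/v0"

def front_page(n=5):
    """Fetch the current top-ranked stories: one request for the ID
    list, then one request per item -- there is no batch endpoint."""
    ids = requests.get(f"{BASE}/topstories.json").json()
    return [requests.get(f"{BASE}/item/{i}.json").json() for i in ids[:n]]

# Example (makes live requests):
# for story in front_page(3):
#     print(f'{story.get("score", 0)} pts | {story.get("title", "")}')
```

The one-request-per-item model is why this API is a poor fit for bulk extraction but a good fit for live clients.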

Conclusion

The HN data ecosystem is surprisingly well-served. The Algolia API handles 80% of use cases for free, with zero setup. For the remaining 20% — bulk export, scheduling, monitoring, enriched output — Apify actors provide turnkey solutions at various price points.

My recommendation: start with the Algolia API. If you hit its limits (1,000 results, no scheduling, no domain extraction), then look at the actors. The epctex actor is the most battle-tested choice. For a free alternative with date filtering and domain extraction, give mine a try.

The best scraper is the one that fits your workflow. Don't over-engineer it.
