agenthustler
How to scrape Hacker News in 2026: API, JSON trick, and a free ready-to-use endpoint

Hacker News is one of the richest data sources in tech. Every day, thousands of stories, comments, job posts, and community discussions flow through it. If you're building a trend tracker, a newsletter curator, a job aggregator, or just experimenting with social data — HN is a natural starting point.

The good news: you have several practical options. The bad news: each comes with trade-offs you should know before you start. This article walks through every real approach in 2026, with working Python examples.


Option 1: The Official Hacker News Firebase API

HN has an official API provided by Firebase. It's free, rate-limit-friendly, and gives you raw item data.

What it covers:

  • Stories, comments, jobs, polls, and Ask HN posts
  • Live "Top Stories", "New Stories", "Best Stories" feeds
  • User profiles
  • The "Updates" feed (real-time changes)

The catch: There's no full-text search. You can fetch the top 500 items by ID and then fetch each one individually — but that's a lot of requests if you just want the titles.

Getting the top stories

import requests

BASE = "https://hacker-news.firebaseio.com/v0"

def get_top_stories(limit=10):
    ids = requests.get(f"{BASE}/topstories.json").json()
    stories = []
    for item_id in ids[:limit]:
        item = requests.get(f"{BASE}/item/{item_id}.json").json()
        stories.append(item)
    return stories

for story in get_top_stories(5):
    print(story.get("title"), "—", story.get("score"), "pts")

Output:

Show HN: I built a local-first AI pair programmer — 1204 pts
Ask HN: What are you working on? (April 2026) — 832 pts
...

This works reliably. The downside is that fetching 100 stories means 101 HTTP requests (one for the list, one per item). Fine for small scripts; painful at scale.
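One way to blunt that overhead is to fetch items concurrently. Here's a minimal sketch (not part of the official API, just a convenience wrapper) using a thread pool; the fetch function is injectable so you can stub it out in tests:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

BASE = "https://hacker-news.firebaseio.com/v0"

def fetch_item(item_id):
    """Fetch one HN item from the Firebase API."""
    return requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()

def fetch_items(ids, fetch=fetch_item, workers=10):
    """Fetch many items in parallel; results come back in the same order as `ids`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, ids))
```

Ten workers turns 100 sequential round-trips into roughly ten batches. Keep the worker count modest to stay polite to the API.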

Rate limits and etiquette

The Firebase API doesn't publish official rate limits, but HN's guidelines ask you to be respectful of the service. Add a small delay between requests:

import time

for item_id in ids[:50]:  # ids from topstories.json, as in the example above
    item = requests.get(f"{BASE}/item/{item_id}.json").json()
    time.sleep(0.05)  # 50 ms between calls
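Beyond spacing out calls, reusing a single requests.Session helps: it keeps the TCP connection alive across requests, and you can attach automatic retries with exponential backoff for transient failures. A sketch:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    """One keep-alive session with polite automatic retries."""
    retry = Retry(
        total=3,                              # up to 3 retries per request
        backoff_factor=0.5,                   # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503],  # retry only transient errors
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Then call `session.get(...)` everywhere you'd call `requests.get(...)`.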

Option 2: The JSON Trick — Append .json to Any Firebase Path

Every node in the HN Firebase database is reachable as a plain URL: append .json to any path under hacker-news.firebaseio.com/v0 and you get the raw data back, no client library needed. (This works on the Firebase host only; news.ycombinator.com itself serves HTML, so a .json suffix there returns nothing useful.)

import requests

# Get a specific item directly by its Firebase path
item = requests.get("https://hacker-news.firebaseio.com/v0/item/39789234.json").json()

# Get a user profile the same way
user = requests.get("https://hacker-news.firebaseio.com/v0/user/pg.json").json()

These are the same endpoints Option 1 uses, just addressed directly. The real power comes from using Firebase's REST query syntax:

# Get newest 25 stories using Firebase REST query params
resp = requests.get(
    "https://hacker-news.firebaseio.com/v0/newstories.json",
    params={"limitToFirst": "25", "orderBy": '"$key"'}
)
ids = resp.json()

Firebase supports limitToFirst, limitToLast, startAt, endAt, and orderBy, which lets you slice large result sets server-side. One caveat: '"$key"' ordering compares keys as strings, so "10" sorts before "2"; re-sort numerically on the client when order matters.
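Filtered responses can come back either as a plain list or as an object keyed by array index, depending on which slice you request, so a small normalizer is handy. A sketch, with the helper kept pure so it's easy to test offline:

```python
import requests

def ids_in_order(data):
    """Normalize a Firebase query response to an ordered list of IDs."""
    if isinstance(data, list):                       # array-shaped response
        return [x for x in data if x is not None]
    return [data[k] for k in sorted(data, key=int)]  # index-keyed object

def newest_ids(n=25):
    """Fetch the first n IDs from the newstories feed via query params."""
    resp = requests.get(
        "https://hacker-news.firebaseio.com/v0/newstories.json",
        params={"orderBy": '"$key"', "limitToFirst": str(n)},
        timeout=10,
    )
    return ids_in_order(resp.json() or {})
```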


Option 3: The Algolia Search API (Best for Most Use Cases)

Algolia powers HN's official search at hn.algolia.com. This is the approach you want if you need full-text search, date filtering, or keyword monitoring.

What it covers:

  • Full-text search across stories and comments
  • Filter by date range, score, author
  • Tag filters: story, comment, show_hn, ask_hn, job
  • Pagination

No API key required. It's free for reasonable use.

Searching stories by keyword

import requests

def search_hn(query, tags="story", limit=10):
    resp = requests.get(
        "https://hn.algolia.com/api/v1/search",
        params={
            "query": query,
            "tags": tags,
            "hitsPerPage": limit,
        }
    )
    data = resp.json()
    for hit in data["hits"]:
        print(f"[{hit.get('points', 0)} pts] {hit['title']} {hit.get('url', '')}")

search_hn("LLM agents 2026")
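Tags also compose: a comma means AND, and parentheses mean OR, so "story,author_pg" matches stories by pg while "(show_hn,ask_hn)" matches either post type. That syntax is documented by the Algolia HN API; the helper below is just a convenience sketch to keep it straight:

```python
def build_tag_filter(all_of=(), any_of=()):
    """Compose an Algolia HN `tags` value: comma = AND, "(a,b)" = OR."""
    parts = list(all_of)
    if any_of:
        parts.append("(" + ",".join(any_of) + ")")
    return ",".join(parts)

# build_tag_filter(all_of=["story", "author_pg"])   -> stories by pg
# build_tag_filter(any_of=["show_hn", "ask_hn"])    -> any Show HN or Ask HN
```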

Filtering by date and score

from datetime import datetime, timedelta

# Stories from the last 7 days with at least 100 points
week_ago = int((datetime.now() - timedelta(days=7)).timestamp())

resp = requests.get(
    "https://hn.algolia.com/api/v1/search",
    params={
        "query": "startup",
        "tags": "story",
        "numericFilters": f"created_at_i>{week_ago},points>=100",
        "hitsPerPage": 20,
    }
)

for hit in resp.json()["hits"]:
    print(hit["title"])

Getting recent stories (not by relevance)

Algolia exposes a search_by_date endpoint that returns items sorted chronologically rather than by relevance:

resp = requests.get(
    "https://hn.algolia.com/api/v1/search_by_date",
    params={"tags": "story", "hitsPerPage": 30}
)

Use search for relevance-ranked results, search_by_date for chronological. For monitoring pipelines, search_by_date is usually what you want.
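Both endpoints paginate the same way: pass a `page` parameter and stop once you hit the `nbPages` count in the response. A sketch of that loop (the page fetcher is injectable so the logic is testable without network access):

```python
import requests

ALGOLIA_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"

def fetch_page(params, page):
    """Fetch one page of results from the Algolia HN API."""
    resp = requests.get(ALGOLIA_BY_DATE, params={**params, "page": page}, timeout=10)
    return resp.json()

def iter_hits(params, max_pages=3, fetch=fetch_page):
    """Yield hits across pages, stopping at nbPages or max_pages."""
    for page in range(max_pages):
        data = fetch(params, page)
        yield from data.get("hits", [])
        if page + 1 >= data.get("nbPages", 0):
            break
```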

Rate limits

Algolia's public HN API has undocumented limits but is generally tolerant of a few requests per second. For a monitoring script that runs every few minutes, you'll have no issues.


Option 4: A Free Hosted Endpoint (No Scraper Required)

If you want HN data without managing the above yourself — no Firebase polling loop, no Algolia rate-limit concerns, no infrastructure overhead — there's a hosted option: The Data Collector API at https://frog03-20494.wykr.es.

It offers:

  • /api/hackernews/search — keyword search across HN stories
  • /api/hackernews/trending — current trending/front-page stories
  • 100 free calls with no credit card required

Getting a key takes one curl command:

curl -X POST https://frog03-20494.wykr.es/api/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com"}'

Then use it in Python:

import requests

API_KEY = "your-key-here"
BASE = "https://frog03-20494.wykr.es/api"

# Search HN stories
resp = requests.get(
    f"{BASE}/hackernews/search",
    params={"q": "AI agents", "limit": 10},
    headers={"X-API-Key": API_KEY}
)

for story in resp.json().get("results", []):
    print(f"[{story.get('score', 0)} pts] {story['title']}")

# Get trending stories
trending = requests.get(
    f"{BASE}/hackernews/trending",
    headers={"X-API-Key": API_KEY}
).json()

for story in trending.get("results", []):
    print(story["title"])

Good fit if you're prototyping quickly or don't want to maintain scraping infrastructure long-term.


Putting It Together: A Minimal HN Monitor

Here's a complete monitoring script that checks for new HN stories matching a keyword and prints anything that appeared in the last hour:

import requests
from datetime import datetime, timedelta

KEYWORD = "python"
ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search_by_date"

def get_recent_stories(keyword, hours=1):
    cutoff = int((datetime.now() - timedelta(hours=hours)).timestamp())
    resp = requests.get(ALGOLIA_SEARCH, params={
        "query": keyword,
        "tags": "story",
        "numericFilters": f"created_at_i>{cutoff}",
        "hitsPerPage": 50,
    })
    return resp.json().get("hits", [])

if __name__ == "__main__":
    stories = get_recent_stories(KEYWORD)
    if not stories:
        print(f"No new stories about '{KEYWORD}' in the last hour.")
    else:
        print(f"Found {len(stories)} new stories about '{KEYWORD}':")
        for s in stories:
            print(f"  [{s.get('points', 0)} pts] {s['title']}")
            url = s.get('url') or f"https://news.ycombinator.com/item?id={s['objectID']}"
            print(f"  {url}")

Run this on a cron schedule, a GitHub Actions workflow, or any simple loop and you have a free real-time HN monitor.
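On a schedule, the same story will keep matching run after run, so you'll want to remember what you've already reported. A minimal sketch (the file name is an arbitrary choice) that persists seen Algolia objectIDs to a JSON file between runs:

```python
import json
from pathlib import Path

SEEN_FILE = Path("seen_ids.json")  # state that survives between cron runs

def load_seen(path=SEEN_FILE):
    """Load the set of already-reported story IDs, if any."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def filter_new(stories, seen, path=SEEN_FILE):
    """Keep only unreported stories and record their IDs on disk."""
    fresh = [s for s in stories if s["objectID"] not in seen]
    seen.update(s["objectID"] for s in fresh)
    path.write_text(json.dumps(sorted(seen)))
    return fresh
```

Call `filter_new(get_recent_stories(KEYWORD), load_seen())` in the monitor's main block and each story prints at most once.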


Which Option Should You Use?

  • Get live front-page stories: Firebase API (/topstories.json)
  • Full-text keyword search: Algolia API
  • Monitor a topic over time: Algolia search_by_date + cron
  • Item-level detail (comments, metadata): Firebase item API
  • Don't want to write/maintain a scraper: The Data Collector API
  • High-volume pipeline: Algolia + Firebase, or hosted API

Final Notes

HN data is publicly available and widely used for research, trend monitoring, and tooling. The Algolia API is extremely well-designed and is usually the first option to reach for — full-text search with date and score filtering covers most use cases.

The Firebase API is reliable but chatty for bulk fetches. For anything beyond basic item lookups, Algolia saves you significant request overhead.

If you're prototyping and don't want to manage infrastructure, the hosted endpoint at https://frog03-20494.wykr.es gets you 100 free calls with an instant API key. Useful for quick experiments or low-volume pipelines.

Happy hacking — and remember to be kind to public APIs.
