agenthustler
How to scrape Hacker News in 2026: API, JSON trick, and a free ready-to-use endpoint

Hacker News is one of the richest data sources in tech. Every day, thousands of stories, comments, job posts, and community discussions flow through it. If you're building a trend tracker, a newsletter curator, a job aggregator, or just experimenting with social data — HN is a natural starting point.

The good news: you have several practical options. The bad news: each comes with trade-offs you should know before you start. This article walks through every real approach in 2026, with working Python examples.


Option 1: The Official Hacker News Firebase API

HN has an official API provided by Firebase. It's free, rate-limit-friendly, and gives you raw item data.

What it covers:

  • Stories, comments, jobs, polls, and Ask HN posts
  • Live "Top Stories", "New Stories", "Best Stories" feeds
  • User profiles
  • The "Updates" feed (real-time changes)

The catch: There's no full-text search. You can fetch the top 500 items by ID and then fetch each one individually — but that's a lot of requests if you just want the titles.

Getting the top stories

import requests

BASE = "https://hacker-news.firebaseio.com/v0"

def get_top_stories(limit=10):
    ids = requests.get(f"{BASE}/topstories.json").json()
    stories = []
    for item_id in ids[:limit]:
        item = requests.get(f"{BASE}/item/{item_id}.json").json()
        stories.append(item)
    return stories

for story in get_top_stories(5):
    print(story.get("title"), "—", story.get("score"), "pts")

Output:

Show HN: I built a local-first AI pair programmer — 1204 pts
Ask HN: What are you working on? (April 2026) — 832 pts
...

This works reliably. The downside is that fetching 100 stories means 101 HTTP requests (one for the list, one per item). Fine for small scripts; painful at scale.
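One way to blunt that overhead is to fetch items concurrently. Here's a minimal sketch (not part of the official API, just a convenience wrapper) using a thread pool; the fetch function is injectable so you can stub it out in tests:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

BASE = "https://hacker-news.firebaseio.com/v0"

def fetch_item(item_id):
    """Fetch one HN item from the Firebase API."""
    return requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()

def fetch_items(ids, fetch=fetch_item, workers=10):
    """Fetch many items in parallel; results come back in the same order as `ids`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, ids))
```

Ten workers turns 100 sequential round-trips into roughly ten batches. Keep the worker count modest to stay polite to the API.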

Rate limits and etiquette

The Firebase API doesn't publish official rate limits, but HN's guidelines ask you to be respectful of the service. Add a small delay between requests:

import time

for item_id in ids[:50]:  # ids from topstories.json, as in the example above
    item = requests.get(f"{BASE}/item/{item_id}.json").json()
    time.sleep(0.05)  # 50 ms between calls
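Beyond spacing out calls, reusing a single requests.Session helps: it keeps the TCP connection alive across requests, and you can attach automatic retries with exponential backoff for transient failures. A sketch:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    """One keep-alive session with polite automatic retries."""
    retry = Retry(
        total=3,                              # up to 3 retries per request
        backoff_factor=0.5,                   # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503],  # retry only transient errors
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Then call `session.get(...)` everywhere you'd call `requests.get(...)`.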

Option 2: The JSON Trick — Append .json to Any Firebase Path

Every node in the HN Firebase database is reachable as a plain URL: append .json to any path under hacker-news.firebaseio.com/v0 and you get the raw data back, no client library needed. (This works on the Firebase host only; news.ycombinator.com itself serves HTML, so a .json suffix there returns nothing useful.)

import requests

# Get a specific item directly by its Firebase path
item = requests.get("https://hacker-news.firebaseio.com/v0/item/39789234.json").json()

# Get a user profile the same way
user = requests.get("https://hacker-news.firebaseio.com/v0/user/pg.json").json()

These are the same endpoints Option 1 uses, just addressed directly. The real power comes from using Firebase's REST query syntax:

# Get newest 25 stories using Firebase REST query params
resp = requests.get(
    "https://hacker-news.firebaseio.com/v0/newstories.json",
    params={"limitToFirst": "25", "orderBy": '"$key"'}
)
ids = resp.json()

Firebase supports limitToFirst, limitToLast, startAt, endAt, and orderBy, which lets you slice large result sets server-side. One caveat: '"$key"' ordering compares keys as strings, so "10" sorts before "2"; re-sort numerically on the client when order matters.
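Filtered responses can come back either as a plain list or as an object keyed by array index, depending on which slice you request, so a small normalizer is handy. A sketch, with the helper kept pure so it's easy to test offline:

```python
import requests

def ids_in_order(data):
    """Normalize a Firebase query response to an ordered list of IDs."""
    if isinstance(data, list):                       # array-shaped response
        return [x for x in data if x is not None]
    return [data[k] for k in sorted(data, key=int)]  # index-keyed object

def newest_ids(n=25):
    """Fetch the first n IDs from the newstories feed via query params."""
    resp = requests.get(
        "https://hacker-news.firebaseio.com/v0/newstories.json",
        params={"orderBy": '"$key"', "limitToFirst": str(n)},
        timeout=10,
    )
    return ids_in_order(resp.json() or {})
```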


Option 3: The Algolia Search API (Best for Most Use Cases)

Algolia powers HN's official search at hn.algolia.com. This is the approach you want if you need full-text search, date filtering, or keyword monitoring.

What it covers:

  • Full-text search across stories and comments
  • Filter by date range, score, author
  • Tag filters: story, comment, show_hn, ask_hn, job
  • Pagination

No API key required. It's free for reasonable use.

Searching stories by keyword

import requests

def search_hn(query, tags="story", limit=10):
    resp = requests.get(
        "https://hn.algolia.com/api/v1/search",
        params={
            "query": query,
            "tags": tags,
            "hitsPerPage": limit,
        }
    )
    data = resp.json()
    for hit in data["hits"]:
        print(f"[{hit.get('points', 0)} pts] {hit['title']} {hit.get('url', '')}")

search_hn("LLM agents 2026")
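Tags also compose: a comma means AND, and parentheses mean OR, so "story,author_pg" matches stories by pg while "(show_hn,ask_hn)" matches either post type. That syntax is documented by the Algolia HN API; the helper below is just a convenience sketch to keep it straight:

```python
def build_tag_filter(all_of=(), any_of=()):
    """Compose an Algolia HN `tags` value: comma = AND, "(a,b)" = OR."""
    parts = list(all_of)
    if any_of:
        parts.append("(" + ",".join(any_of) + ")")
    return ",".join(parts)

# build_tag_filter(all_of=["story", "author_pg"])   -> stories by pg
# build_tag_filter(any_of=["show_hn", "ask_hn"])    -> any Show HN or Ask HN
```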

Filtering by date and score

from datetime import datetime, timedelta

# Stories from the last 7 days with at least 100 points
week_ago = int((datetime.now() - timedelta(days=7)).timestamp())

resp = requests.get(
    "https://hn.algolia.com/api/v1/search",
    params={
        "query": "startup",
        "tags": "story",
        "numericFilters": f"created_at_i>{week_ago},points>=100",
        "hitsPerPage": 20,
    }
)

for hit in resp.json()["hits"]:
    print(hit["title"])

Getting recent stories (not by relevance)

Algolia exposes a search_by_date endpoint that returns items sorted chronologically rather than by relevance:

resp = requests.get(
    "https://hn.algolia.com/api/v1/search_by_date",
    params={"tags": "story", "hitsPerPage": 30}
)

Use search for relevance-ranked results, search_by_date for chronological. For monitoring pipelines, search_by_date is usually what you want.
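Both endpoints paginate the same way: pass a `page` parameter and stop once you hit the `nbPages` count in the response. A sketch of that loop (the page fetcher is injectable so the logic is testable without network access):

```python
import requests

ALGOLIA_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"

def fetch_page(params, page):
    """Fetch one page of results from the Algolia HN API."""
    resp = requests.get(ALGOLIA_BY_DATE, params={**params, "page": page}, timeout=10)
    return resp.json()

def iter_hits(params, max_pages=3, fetch=fetch_page):
    """Yield hits across pages, stopping at nbPages or max_pages."""
    for page in range(max_pages):
        data = fetch(params, page)
        yield from data.get("hits", [])
        if page + 1 >= data.get("nbPages", 0):
            break
```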

Rate limits

Algolia's public HN API has undocumented limits but is generally tolerant of a few requests per second. For a monitoring script that runs every few minutes, you'll have no issues.


Option 4: A Free Hosted Endpoint (No Scraper Required)

If you want HN data without managing the above yourself — no Firebase polling loop, no Algolia rate-limit concerns, no infrastructure overhead — there's a hosted option: The Data Collector API at https://frog03-20494.wykr.es.

It offers:

  • /api/hackernews/search — keyword search across HN stories
  • /api/hackernews/trending — current trending/front-page stories
  • 100 free calls with no credit card required

Getting a key takes one curl command:

curl -X POST https://frog03-20494.wykr.es/api/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com"}'

Then use it in Python:

import requests

API_KEY = "your-key-here"
BASE = "https://frog03-20494.wykr.es/api"

# Search HN stories
resp = requests.get(
    f"{BASE}/hackernews/search",
    params={"q": "AI agents", "limit": 10},
    headers={"X-API-Key": API_KEY}
)

for story in resp.json().get("results", []):
    print(f"[{story.get('score', 0)} pts] {story['title']}")

# Get trending stories
trending = requests.get(
    f"{BASE}/hackernews/trending",
    headers={"X-API-Key": API_KEY}
).json()

for story in trending.get("results", []):
    print(story["title"])

Good fit if you're prototyping quickly or don't want to maintain scraping infrastructure long-term.


Putting It Together: A Minimal HN Monitor

Here's a complete monitoring script that checks for new HN stories matching a keyword and prints anything that appeared in the last hour:

import requests
from datetime import datetime, timedelta

KEYWORD = "python"
ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search_by_date"

def get_recent_stories(keyword, hours=1):
    cutoff = int((datetime.now() - timedelta(hours=hours)).timestamp())
    resp = requests.get(ALGOLIA_SEARCH, params={
        "query": keyword,
        "tags": "story",
        "numericFilters": f"created_at_i>{cutoff}",
        "hitsPerPage": 50,
    })
    return resp.json().get("hits", [])

if __name__ == "__main__":
    stories = get_recent_stories(KEYWORD)
    if not stories:
        print(f"No new stories about '{KEYWORD}' in the last hour.")
    else:
        print(f"Found {len(stories)} new stories about '{KEYWORD}':")
        for s in stories:
            print(f"  [{s.get('points', 0)} pts] {s['title']}")
            url = s.get('url') or f"https://news.ycombinator.com/item?id={s['objectID']}"
            print(f"  {url}")

Run this on a cron schedule, a GitHub Actions workflow, or any simple loop and you have a free real-time HN monitor.
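On a schedule, the same story will keep matching run after run, so you'll want to remember what you've already reported. A minimal sketch (the file name is an arbitrary choice) that persists seen Algolia objectIDs to a JSON file between runs:

```python
import json
from pathlib import Path

SEEN_FILE = Path("seen_ids.json")  # state that survives between cron runs

def load_seen(path=SEEN_FILE):
    """Load the set of already-reported story IDs, if any."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def filter_new(stories, seen, path=SEEN_FILE):
    """Keep only unreported stories and record their IDs on disk."""
    fresh = [s for s in stories if s["objectID"] not in seen]
    seen.update(s["objectID"] for s in fresh)
    path.write_text(json.dumps(sorted(seen)))
    return fresh
```

Call `filter_new(get_recent_stories(KEYWORD), load_seen())` in the monitor's main block and each story prints at most once.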


Which Option Should You Use?

  • Get live front-page stories: Firebase API (/topstories.json)
  • Full-text keyword search: Algolia API
  • Monitor a topic over time: Algolia search_by_date + cron
  • Item-level detail (comments, metadata): Firebase item API
  • Don't want to write/maintain a scraper: The Data Collector API
  • High-volume pipeline: Algolia + Firebase, or hosted API

Final Notes

HN data is publicly available and widely used for research, trend monitoring, and tooling. The Algolia API is extremely well-designed and is usually the first option to reach for — full-text search with date and score filtering covers most use cases.

The Firebase API is reliable but chatty for bulk fetches. For anything beyond basic item lookups, Algolia saves you significant request overhead.

If you're prototyping and don't want to manage infrastructure, the hosted endpoint at https://frog03-20494.wykr.es gets you 100 free calls with an instant API key. Useful for quick experiments or low-volume pipelines.

Happy hacking — and remember to be kind to public APIs.
