How the Hacker News API Works
Hacker News exposes a public REST API built on Google's Firebase. It requires no authentication, no API keys, and no developer application. You can query it right now with a single curl command.
The base URL is https://hacker-news.firebaseio.com/v0/. All responses are JSON. Because it is built on Firebase's real-time database, every item — stories, comments, job posts, polls — lives at a predictable URL based on its integer ID.
Here are the core endpoints:
Top, New, and Best Story Lists
These endpoints return an array of item IDs sorted by their respective ranking algorithm:
# Top stories (front page ranking)
curl "https://hacker-news.firebaseio.com/v0/topstories.json"
# Newest submissions
curl "https://hacker-news.firebaseio.com/v0/newstories.json"
# Best stories (long-term quality ranking)
curl "https://hacker-news.firebaseio.com/v0/beststories.json"
# Ask HN posts
curl "https://hacker-news.firebaseio.com/v0/askstories.json"
# Show HN posts
curl "https://hacker-news.firebaseio.com/v0/showstories.json"
# Job postings
curl "https://hacker-news.firebaseio.com/v0/jobstories.json"
Each of these returns an array of up to 500 item IDs. The top stories endpoint returns the current front page ordering, which changes every few minutes as votes and time decay interact.
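The "IDs first, details later" pattern above can be sketched in a few lines of Python. The `fetch` parameter is injectable purely so the logic is easy to test offline; by default it performs a real GET against the Firebase endpoint:

```python
import json
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"

def fetch_json(url):
    """GET a URL and parse the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)

def top_story_ids(limit=10, fetch=fetch_json):
    # The endpoint returns up to 500 integer IDs in front-page order;
    # keep only the first `limit` of them.
    return fetch(f"{BASE}/topstories.json")[:limit]
```

Each ID returned here still requires a separate call to the item endpoint to get a title or score, which is the fan-out problem discussed later.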
Item Details
Once you have an item ID, you can fetch full details for any story, comment, or job post:
curl "https://hacker-news.firebaseio.com/v0/item/43211234.json"
A story response looks like this:
{
"id": 43211234,
"type": "story",
"title": "SQLite is not a toy database",
"url": "https://antonz.org/sqlite-is-not-a-toy-database/",
"by": "thunderbong",
"score": 847,
"descendants": 142,
"kids": [43211501, 43211398, 43211287],
"time": 1741234567
}
The kids field contains IDs of top-level comments. Each comment is itself an item, and may have its own kids — so fetching a full comment thread means recursively walking a tree of IDs and making an API call for each node.
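That recursive walk can be written as a generator. Again the `fetch` parameter exists so the traversal is testable without network access; a real crawler would use the default, which hits the item endpoint once per node:

```python
import json
from urllib.request import urlopen

def fetch_item(item_id):
    """Fetch one item (story, comment, job...) by ID from the Firebase API."""
    url = f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
    with urlopen(url) as resp:
        return json.load(resp)

def walk_comments(item_id, fetch=fetch_item, depth=0):
    """Yield (depth, item) for every comment in the subtree under item_id."""
    item = fetch(item_id)
    if item is None:  # deleted or missing items come back as null
        return
    if item.get("type") == "comment":
        yield depth, item
    for kid in item.get("kids", []):
        yield from walk_comments(kid, fetch=fetch, depth=depth + 1)
```

Note that this makes one HTTP request per node, which is exactly the N+1 cost described in the next section.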
User Profiles
Fetch a user's profile by username:
curl "https://hacker-news.firebaseio.com/v0/user/pg.json"
Response:
{
"id": "pg",
"created": 1160418092,
"karma": 155111,
"about": "Co-founder of Viaweb and Y Combinator...",
"submitted": [43198765, 43187654, 43156789]
}
The submitted array contains IDs of every item the user has ever posted or commented on, most recent first. For prolific users, this array can have tens of thousands of entries.
Search via Algolia
HN's own search is powered by Algolia and exposes a separate API endpoint that supports full-text search, date filtering, and result ranking:
# Search stories by keyword
curl "https://hn.algolia.com/api/v1/search?query=rust+programming&tags=story"
# Search with date filtering (Unix timestamps)
curl "https://hn.algolia.com/api/v1/search?query=openai&numericFilters=created_at_i>1700000000"
# Sort matches newest-first instead of by relevance
curl "https://hn.algolia.com/api/v1/search_by_date?query=webassembly&tags=story"
The Algolia endpoint returns richer metadata than the Firebase API — including highlighted text matches, story URLs, and author information — all in a single response per page.
| Endpoint | Purpose | Pagination |
|---|---|---|
| `search` | Best-match ranking | `page` parameter |
| `search_by_date` | Chronological | `page` parameter |

Tags available: `story`, `comment`, `poll`, `job`, `ask_hn`, `show_hn`.
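A minimal Python wrapper for the Algolia endpoint might look like this. The URL construction is split out as its own function so it can be checked without making a request:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ALGOLIA = "https://hn.algolia.com/api/v1"

def build_search_url(query, tags="story", page=0):
    """Assemble the query string for Algolia's /search endpoint."""
    params = urlencode({"query": query, "tags": tags, "page": page})
    return f"{ALGOLIA}/search?{params}"

def search(query, tags="story", page=0):
    """Return one page of hits; each hit carries title, url, author, points."""
    with urlopen(build_search_url(query, tags, page)) as resp:
        data = json.load(resp)
    return data.get("hits", [])
```

`urlencode` also handles spaces and special characters in queries, which the raw curl examples above sidestep by using `+` manually.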
The Pagination and Rate Limit Problem
The direct API is excellent for simple lookups. The problems start when you try to collect data at any meaningful scale.
The fan-out problem. The story list endpoints give you IDs, not content. To get 500 top stories with their metadata, you need 500 individual HTTP requests to the item endpoint — one per story. To get the comments on those 500 stories, you need additional requests for every comment ID in every kids array. A single thread with 200 comments might require 200+ additional requests to fully traverse. For the full front page with comments, you are easily looking at 5,000–10,000 API calls.
No bulk endpoint. There is no way to say "give me the top 100 stories with their metadata in one request." The Firebase API is intentionally simple: one ID lookup per request. You build the fan-out logic yourself.
Rate limits. Firebase does not publish explicit rate limits for the HN API, but aggressive concurrent requests will result in connection refusals or throttling. Production scrapers need exponential backoff, connection pooling, and retry logic to work reliably.
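A backoff loop of the kind production scrapers need can be sketched as follows; the delay calculation is factored out so the schedule is explicit, and the `sleep` parameter is injectable to keep it testable:

```python
import random
import time
from urllib.error import URLError
from urllib.request import urlopen

def backoff_delay(attempt, base=0.5):
    # 0.5s, 1s, 2s, 4s, ... doubling per failed attempt.
    return base * (2 ** attempt)

def fetch_with_backoff(url, retries=5, sleep=time.sleep):
    """GET a URL, retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            with urlopen(url) as resp:
                return resp.read()
        except URLError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            # Jitter spreads out retries from concurrent workers.
            sleep(backoff_delay(attempt) + random.uniform(0, 0.1))
```

The specific base delay and retry count here are illustrative defaults, not documented limits; tune them against observed throttling.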
Comment tree traversal. Comments are nested arbitrarily deep. Fetching a complete thread means walking the tree recursively, and you can't know the depth in advance. A top-level comment might have no replies or it might have a 15-level-deep argument about tabs versus spaces.
No date filtering on the Firebase API. If you want stories from a specific date range using the Firebase API, you have to fetch IDs, fetch each item, check the timestamp, and discard anything out of range. There is no server-side filter.
The Algolia search endpoint solves some of these problems (date filtering, full-text search), but introduces its own: it is an index, not the live database, so there is a sync delay. And you still get paginated results that require page-by-page iteration for large collections.
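The page-by-page iteration looks like this. The loop is written against an injectable `fetch_page(page)` callable so it can be exercised without network access; a real fetcher would call the Algolia `search` endpoint with the `page` parameter and return its parsed JSON, which includes `hits` and an `nbPages` page count:

```python
def collect_all_pages(fetch_page, max_pages=50):
    """Accumulate Algolia hits across pages until nbPages is exhausted.

    fetch_page(page) must return one page of parsed JSON with
    `hits` (list) and `nbPages` (int) fields, as Algolia responses do.
    """
    hits = []
    page = 0
    while page < max_pages:
        data = fetch_page(page)
        hits.extend(data.get("hits", []))
        page += 1
        if page >= data.get("nbPages", 0):
            break  # no more pages in the result set
    return hits
```

The `max_pages` cap guards against runaway loops on very broad queries.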
This is exactly the kind of infrastructure problem that an Apify Actor is built to solve.
Using the Hacker News Scraper on Apify
The Hacker News Scraper is a ready-to-use Actor that handles the fan-out, pagination, rate limiting, and tree traversal for you. You configure what you want, click run, and get structured data back.
Step 1: Open the Actor
Go to apify.com/cryptosignals/hackernews-scraper. You can run it from the web console with no code required.
Step 2: Configure Your Search
In the Actor's input configuration, set your parameters:
- Search terms — keywords or phrases to search for across HN stories. You can pass multiple terms and they will be processed in parallel.
- Max results — cap the total number of items returned. Useful for controlling cost and runtime during testing.
- Result type — choose to collect stories, comments, or both. For recruiting use cases, you typically want stories only. For sentiment analysis, you want comments.
Step 3: Run and Export
Click Start. Depending on the volume requested, runs typically complete in seconds to a few minutes. When finished, export results in any of:
- JSON — for programmatic consumption and data pipelines
- CSV — for spreadsheets, Excel, and BI tools like Tableau or Looker
- Excel — for business stakeholders who want to open it directly
- XML — for legacy system integration
Each story record includes the title, URL, author, score, comment count, submission timestamp, and story type. Comment records include the text (as HTML), author, timestamp, parent item ID, and nesting depth.
Running the Actor Programmatically
The web console is the right starting point, but automation requires calling the Actor from code. The Apify client libraries wrap the REST API and handle polling for run completion.
JavaScript (Node.js)
Install the client:
npm install apify-client
Run the Actor and process results:
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({
token: 'YOUR_APIFY_API_TOKEN',
});
// Run the HN Scraper actor
const run = await client.actor('cryptosignals/hackernews-scraper').call({
searchTerms: ['rust programming', 'webassembly', 'llm inference'],
maxResults: 300,
});
// Fetch results from the default dataset
const { items } = await client
.dataset(run.defaultDatasetId)
.listItems();
console.log(`Collected ${items.length} items`);
// Process each story
for (const story of items) {
console.log(`[${story.score} pts] ${story.title}`);
console.log(` URL: ${story.url}`);
console.log(` By: ${story.by} | Comments: ${story.descendants}`);
console.log(` Posted: ${new Date(story.time * 1000).toISOString()}`);
console.log('---');
}
Python
Install the client:
pip install apify-client
Run and fetch results:
from apify_client import ApifyClient
from datetime import datetime
client = ApifyClient('YOUR_APIFY_API_TOKEN')
# Configure and run the actor
run = client.actor('cryptosignals/hackernews-scraper').call(run_input={
'searchTerms': ['rust programming', 'webassembly', 'llm inference'],
'maxResults': 300,
})
# Fetch results
dataset_items = client.dataset(run['defaultDatasetId']).list_items().items
print(f'Collected {len(dataset_items)} items')
for story in dataset_items:
posted = datetime.fromtimestamp(story['time']).strftime('%Y-%m-%d')
print(f"[{story.get('score', 0)} pts] {story['title']}")
print(f" URL: {story.get('url', 'N/A')}")
print(f" By: {story['by']} | Comments: {story.get('descendants', 0)}")
print(f" Posted: {posted}")
print('---')
Both examples follow the same pattern: initialize the client with your API token, call the Actor with your desired input, wait for completion (the client handles polling automatically), then iterate over the dataset. You get your data back as a Python list of dictionaries or a JavaScript array of objects — no JSON parsing, no cursor management, no retry logic to write.
Scheduling Recurring Scrapes
Most HN monitoring use cases require ongoing collection, not one-off runs. Apify's scheduling system supports any cron expression, and each scheduled run creates a fresh dataset — giving you a clean time-series record of results.
Setting Up a Schedule via the Console
- Open the Schedules section in Apify Console
- Click Create new schedule
- Select the Hacker News Scraper actor
- Set your cron expression — for example, `0 7 * * *` runs daily at 7:00 AM UTC
- Paste your Actor input (search terms, max results)
- Save the schedule
For "Who's Hiring" threads — which Y Combinator posts on the first weekday of every month — you might set a monthly schedule to capture and archive those threads systematically.
Setting Up a Schedule via the API
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({
token: 'YOUR_APIFY_API_TOKEN',
});
// Create a schedule that runs every weekday morning at 7:00 AM UTC
const schedule = await client.schedules().create({
name: 'hn-daily-tech-monitor',
cronExpression: '0 7 * * 1-5',
timezone: 'UTC',
actions: [{
type: 'RUN_ACTOR',
// The schedules API may expect the Actor's ID from the console
// rather than the 'username/actor-name' handle shown here
actorId: 'cryptosignals/hackernews-scraper',
runInput: {
body: JSON.stringify({
searchTerms: ['your competitor name', 'your product category'],
maxResults: 200,
}),
contentType: 'application/json',
},
}],
});
console.log(`Schedule created: ${schedule.id}`);
Connecting to Downstream Systems
Configure a webhook on the Actor run to trigger the moment a scrape finishes. A typical pipeline looks like:
- Scheduled Actor run completes
- Webhook fires to your endpoint (or an Apify webhook integration)
- Your system reads the dataset via the Apify API
- New records land in your database, Slack channel, or Google Sheet
This gives you a fully automated HN monitoring pipeline with no servers to manage.
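On the receiving side, the webhook handler mostly needs to pull the dataset ID out of the payload and fetch the items. Assuming the default Apify webhook payload, which nests the run object under `resource` with a `defaultDatasetId` field (verify against your own webhook's payload template), the extraction is:

```python
import json

def dataset_id_from_webhook(payload):
    """Pull the default dataset ID out of an Apify run webhook payload.

    Assumes the default payload shape: the run object lives under
    `resource` and carries `defaultDatasetId`.
    """
    run = json.loads(payload) if isinstance(payload, str) else payload
    return run["resource"]["defaultDatasetId"]

def dataset_items_url(dataset_id, fmt="json"):
    """Build the Apify API URL for downloading a dataset's items."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format={fmt}"
```

Your endpoint then GETs that URL (with your API token) and writes the records wherever they need to land.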
Practical Use Cases
Recruiting — "Who's Hiring" Threads
Y Combinator posts a "Who's Hiring" thread on the first business day of every month. These threads contain hundreds of job listings with direct contact information for hiring managers — no recruiter intermediary, no applicant tracking system. Scraping these threads monthly gives you a structured database of active technical hiring, segmented by company, role, tech stack, and location. Search for "Who is Hiring" or "Who's Hiring" to retrieve the thread, then collect all comments.
Trend Detection
HN surfaces early-stage technology discussions before they hit mainstream media. Run a weekly scrape across a set of emerging technology keywords and track score trajectories over time. A story about a new database, programming language, or infrastructure tool that scores 500+ points in its first hour is a meaningful signal that the technical community has noticed something.
For trend analysis, sort by score descending and look at the ratio of points to comments — high points with low comments often indicates strong signal (people upvote and move on), while high comments with lower points may indicate controversy.
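The points-to-comments heuristic described above is a one-liner over the story fields the scraper returns (`score` and `descendants`, following the HN item format):

```python
def signal_ratio(story):
    """Points per comment: high values suggest broad approval,
    low values suggest the thread turned into a debate."""
    comments = story.get("descendants", 0)
    return story.get("score", 0) / max(comments, 1)  # avoid division by zero

def rank_by_signal(stories):
    """Order stories from strongest to weakest signal."""
    return sorted(stories, key=signal_ratio, reverse=True)
```

The exact threshold that counts as "strong signal" is a judgment call; the ratio is only meaningful relative to other stories in the same scrape.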
Competitive Intelligence
Search for your competitors by name. Collect every HN thread where they are mentioned, along with the comments. Comments on HN tend to be technically sophisticated and candid — you will find honest assessments of product tradeoffs, complaints about pricing, and comparisons to alternatives that you will never see in a vendor's marketing materials.
Track these threads over time to spot when sentiment shifts, when a new competitor enters a discussion, or when a feature gap starts appearing repeatedly in comments.
Sentiment Analysis and NLP
HN comments are particularly good training data and evaluation sets for technical NLP tasks because the writing is dense, opinionated, and domain-specific. For sentiment analysis on developer tooling, HN comments are a better signal than Twitter/X or Reddit because the audience is more homogeneous and the discussions are more focused.
Collect comments for a specific story or keyword, strip the HTML tags from the text field, and you have clean input for any standard NLP pipeline.
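Stripping the HTML can be done with the standard library alone; `html.parser` also decodes entities like `&gt;` and `&#x27;` that appear throughout HN comment text:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text nodes, discarding tags like <p>, <a>, <i>."""
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities to characters
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(text):
    """Reduce an HN comment's HTML to plain text."""
    extractor = _TextExtractor()
    extractor.feed(text)
    return "".join(extractor.parts)
```

For paragraph-aware pipelines you might instead emit a newline when a `<p>` tag opens, but for bag-of-words or embedding models the flat text above is sufficient.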
Academic Research
Researchers studying technical communities, online discourse, innovation diffusion, or information cascades have used HN as a primary dataset. The public API and the Algolia search index make it one of the more accessible large social datasets, and the long history (posts dating back to 2007) supports longitudinal studies.
Why Use the Actor vs. Direct API Calls
The Hacker News Firebase API is free and open — so why layer an Apify Actor on top of it?
| Concern | Direct HN API | Apify Actor |
|---|---|---|
| Story metadata | 500 separate HTTP requests for 500 stories | Single configured run |
| Full-text search | Not supported (Algolia endpoint required separately) | Built-in across all runs |
| Comment retrieval | Recursive tree traversal, N+1 requests per thread | Handled automatically |
| Rate limiting | Must implement backoff yourself | Built-in retry and backoff |
| Date filtering | Fetch-and-discard; no server-side filter on Firebase | Supported via Algolia integration |
| Export formats | JSON only | JSON, CSV, Excel, XML |
| Scheduling | You manage cron, hosting, error alerts | Built-in cron with monitoring |
| Error recovery | Build it yourself | Automatic retries and failure alerts |
| Storage | You provision and manage | Managed datasets with retention |
| Multiple search terms | Sequential loops | Parallel execution |
For a single quick lookup — checking the current score on a specific item, fetching one user profile — the direct API is the right tool. It is a curl command. But for production data collection across multiple search terms, daily runs, comment tree traversal, and integration with downstream systems, writing and maintaining that infrastructure yourself is a significant engineering investment. The Actor packages all of it.
Getting Started
The shortest path from zero to a working HN dataset:
- Create a free Apify account
- Open the Hacker News Scraper
- Enter your search terms and a result limit
- Click Start and wait for the run to complete
- Download your results in JSON or CSV
For recurring collection, add a schedule. For programmatic access, copy your API token from the Apify Console and use the code examples above directly in your project.
Hacker News is 17 years of high-quality technical discourse, freely accessible through a public API. Whether you are building a recruiting pipeline, a technology trend tracker, or a competitive intelligence dashboard, the dataset is there — you just need a reliable way to collect it at scale.
The Hacker News Scraper Actor is available at apify.com/cryptosignals/hackernews-scraper. For reference on the underlying API, see the official HN API documentation and the Algolia HN Search API.
Disclosure: This post contains affiliate links. I may earn a commission if you sign up through my links, at no extra cost to you.
Compare web scraping APIs:
- ScraperAPI — 5,000 free credits, 50+ countries, structured data parsing
- Scrape.do — From $29/mo, strong Cloudflare bypass
- ScrapeOps — Proxy comparison + monitoring dashboard
Need custom web scraping? Email hustler@curlship.com — fast turnaround, fair pricing.