How to Scrape Hacker News Comments in 2026 — Free API
Level: Beginner to Intermediate
Stack: Python + Playwright + Apify Actor
Time: ~20 minutes
Cost: Free to develop, $0.0005 per comment on Apify
Why Scrape Hacker News Comments?
Hacker News is a goldmine of developer discussions, startup ideas, and technical debates. Building an app that taps into this conversation requires clean, structured data — not screen-scraped HTML chaos.
In this tutorial, I'll show you how to build a production-ready HN Comment Scraper using Apify and Playwright, then publish it to the Apify Store so anyone can use it.
What We're Building
A reusable Apify Actor that:
- Accepts a HN story URL → scrapes all comments
- OR accepts a keyword → searches HN Algolia → scrapes top stories' comments
- Returns structured JSON with author, text, timestamp, depth, replies
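For concreteness, a single record in the output might look like this (a sketch only — the values are invented, not real HN data):

```python
import json

# Illustrative sample of one output record; field names follow the list above,
# values are made up for this sketch.
sample_comment = {
    "id": 0,
    "author": "pg",
    "text": "Interesting approach, though I'd benchmark it first.",
    "timestamp": "2026-01-15T12:34:56",
    "depth": 0,
    "replyCount": 1,
    "replies": [{"author": "dang", "text": "Agreed."}],
}

print(json.dumps(sample_comment, indent=2))
```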
Step 1 — Project Setup
```bash
npm install -g apify-cli
apify create hackernews-comment-scraper   # pick the Python + Playwright template
cd hackernews-comment-scraper
```

The Python template already scaffolds a `src/` directory for you.
Step 2 — Write the Scraper (src/main.py)
```python
import asyncio
import json

try:
    from playwright.async_api import async_playwright
except ImportError:
    import subprocess
    subprocess.check_call(["pip", "install", "playwright", "--quiet"])
    subprocess.check_call(["playwright", "install", "chromium", "--with-deps"])
    from playwright.async_api import async_playwright


async def parse_row(row):
    """Extract one comment from an HN tr.comtr table row."""
    author_el = await row.query_selector(".hnuser")
    author = await author_el.inner_text() if author_el else "unknown"

    text_el = await row.query_selector(".commtext")
    text = (await text_el.inner_text()).strip() if text_el else ""

    # The full timestamp lives in the title attribute of the .age span.
    time_el = await row.query_selector(".age")
    timestamp = await time_el.get_attribute("title") if time_el else ""

    # HN renders a thread as a flat table; nesting depth is encoded in the
    # indent attribute of each row's td.ind spacer cell.
    ind_el = await row.query_selector("td.ind")
    indent = await ind_el.get_attribute("indent") if ind_el else "0"

    return {"author": author, "text": text, "timestamp": timestamp,
            "depth": int(indent or 0)}


async def scrape_comments(page, url, max_comments=50, max_replies=5):
    await page.goto(url, wait_until="domcontentloaded", timeout=30000)
    await page.wait_for_selector("tr.comtr", timeout=15000)

    rows = await page.query_selector_all("tr.comtr")
    parsed = []
    for row in rows[:max_comments]:
        try:
            parsed.append(await parse_row(row))
        except Exception:
            continue  # skip malformed rows (e.g. deleted or flagged comments)

    comments = []
    for i, c in enumerate(parsed):
        # Direct replies are the following rows exactly one level deeper,
        # up to the next row at or above this comment's depth.
        replies = []
        for nxt in parsed[i + 1:]:
            if nxt["depth"] <= c["depth"]:
                break
            if nxt["depth"] == c["depth"] + 1 and len(replies) < max_replies:
                replies.append({"author": nxt["author"], "text": nxt["text"]})
        comments.append({"id": i, **c,
                         "replyCount": len(replies), "replies": replies})
    return comments


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        story_url = "https://news.ycombinator.com/item?id=12345678"
        comments = await scrape_comments(page, story_url,
                                         max_comments=50, max_replies=3)
        print(json.dumps(comments, indent=2))
        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```
Step 3 — Deploy to Apify
```bash
apify login
apify actors push
```
Set pricing to Pay-per-result at $0.0005 per comment returned.
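A quick sanity check on what that rate pays out (the rate is from above; the volumes are hypothetical, and any platform revenue share would come off the top):

```python
RATE_PER_COMMENT = 0.0005  # USD per comment returned, as set above

for n_comments in (1_000, 100_000, 1_000_000):
    payout = n_comments * RATE_PER_COMMENT
    print(f"{n_comments:>9,} comments -> ${payout:,.2f}")
# 1,000 comments pay $0.50; a million pay $500.00.
```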
Step 4 — Test the Actor
```bash
apify actors call ebS02wB1m9aZkUWL5 \
  --input '{"mode":"url","storyUrl":"https://news.ycombinator.com/item?id=12345678","maxComments":10}'
```
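The call prints run metadata; the scraped items land in the run's default dataset, which you can also pull over HTTP via Apify's public `GET /v2/datasets/{datasetId}/items` endpoint. A sketch of building that URL (`DATASET_ID` is a placeholder — substitute the ID from your run):

```python
from urllib.parse import urlencode

def dataset_items_url(dataset_id, fmt="json", limit=100):
    """Build the Apify API URL for a run's dataset items."""
    query = urlencode({"format": fmt, "limit": limit})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"

# e.g. requests.get(dataset_items_url("DATASET_ID"),
#                   headers={"Authorization": "Bearer <APIFY_TOKEN>"})
print(dataset_items_url("DATASET_ID"))
```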
Live Actor
The Actor supports two modes:
- URL mode: Scrape comments from a specific HN story
- Search mode: Search HN by keyword, scrape top story comments
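Search mode doesn't need a browser at all: the public HN Algolia Search API returns matching stories as JSON. A minimal sketch (the endpoint and its `query`/`tags`/`hitsPerPage` parameters come from that public API; `top_story_ids` is a name chosen here):

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

def search_url(keyword, limit=5):
    """Algolia HN Search API query for top matching stories."""
    return ("https://hn.algolia.com/api/v1/search?"
            f"query={quote(keyword)}&tags=story&hitsPerPage={limit}")

def top_story_ids(keyword, limit=5):
    """Return HN item IDs whose comment pages the scraper can then visit."""
    with urlopen(search_url(keyword, limit), timeout=10) as resp:
        data = json.load(resp)
    return [hit["objectID"] for hit in data.get("hits", [])]

# Each ID maps to https://news.ycombinator.com/item?id=<objectID>
```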
Conclusion
With ~50 lines of Python + Playwright, you can build a production scraper on Apify and start earning passive income. The platform handles hosting, scaling, and billing — you just write the code.