muskert

How to Scrape Hacker News Comments in 2026 — Free API

Level: Beginner to Intermediate

Stack: Python + Playwright + Apify Actor

Time: ~20 minutes

Cost: Free to develop, $0.0005 per comment on Apify


Why Scrape Hacker News Comments?

Hacker News is a goldmine of developer discussions, startup ideas, and technical debates. Building an app that taps into those conversations requires clean, structured data — not screen-scraped HTML chaos.

In this tutorial, I'll show you how to build a production-ready HN Comment Scraper using Apify and Playwright, then publish it to the Apify Store so anyone can use it.


What We're Building

A reusable Apify Actor that:

  1. Accepts a HN story URL → scrapes all comments
  2. OR accepts a keyword → searches HN Algolia → scrapes top stories' comments
  3. Returns structured JSON with author, text, timestamp, depth, replies
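Search mode (item 2 above) can lean on the public HN Algolia API at hn.algolia.com, which returns story hits as JSON. A minimal sketch of that lookup — the endpoint and query parameters are Algolia's documented ones, but the helper names are my own:

```python
from urllib.parse import quote

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"

def build_search_url(query, hits=5):
    """Build an Algolia search URL restricted to HN stories."""
    return f"{ALGOLIA_SEARCH}?query={quote(query)}&tags=story&hitsPerPage={hits}"

def story_urls(hits):
    """Turn Algolia hit objects into HN item URLs for the comment scraper."""
    return [f"https://news.ycombinator.com/item?id={h['objectID']}" for h in hits]
```

Fetch `build_search_url(...)` with any HTTP client, then feed each URL from `story_urls(...)` into the scraper below.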

Step 1 — Project Setup

npm install -g apify-cli
apify create hackernews-comment-scraper
cd hackernews-comment-scraper
mkdir -p src
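The CLI scaffold includes an .actor/ folder where the Actor's input schema lives. A plausible input_schema.json for the two modes described above — the field names here are assumptions; match them to whatever your code actually reads:

```json
{
    "title": "HN Comment Scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "mode": {
            "title": "Mode",
            "type": "string",
            "enum": ["url", "search"],
            "default": "url",
            "editor": "select"
        },
        "storyUrl": {
            "title": "Story URL",
            "type": "string",
            "editor": "textfield"
        },
        "maxComments": {
            "title": "Max comments",
            "type": "integer",
            "default": 50,
            "editor": "number"
        }
    },
    "required": ["mode"]
}
```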

Step 2 — Write the Scraper (src/main.py)

import asyncio
import json

try:
    from playwright.async_api import async_playwright
except ImportError:
    # Bootstrap Playwright on a first local run; install into the same
    # interpreter that is running this script.
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "playwright", "--quiet"])
    subprocess.check_call([sys.executable, "-m", "playwright", "install", "chromium", "--with-deps"])
    from playwright.async_api import async_playwright


async def scrape_comments(page, url, max_comments=50, max_replies=5):
    await page.goto(url, wait_until="domcontentloaded", timeout=30000)
    await page.wait_for_selector("tr.comtr", timeout=15000)

    comments = []
    # HN renders a thread as a flat table: one tr.comtr row per comment,
    # with nesting encoded as the indent of a spacer cell, not nested DOM.
    rows = await page.query_selector_all("tr.comtr")

    for i, row in enumerate(rows[:max_comments]):
        try:
            author_el = await row.query_selector(".hnuser")
            author = await author_el.inner_text() if author_el else "unknown"

            text_el = await row.query_selector(".commtext")
            text = (await text_el.inner_text()).strip() if text_el else ""

            time_el = await row.query_selector(".age")
            timestamp = await time_el.get_attribute("title") if time_el else ""

            ind_el = await row.query_selector("td.ind")
            indent = await ind_el.get_attribute("indent") if ind_el else "0"

            comments.append({
                "id": i,
                "author": author,
                "text": text,
                "timestamp": timestamp,
                "depth": int(indent or 0),
            })
        except Exception:
            continue

    # Rebuild direct replies from the flat list: a reply is any following
    # comment exactly one level deeper, until the thread pops back up.
    for idx, comment in enumerate(comments):
        replies = []
        for later in comments[idx + 1:]:
            if later["depth"] <= comment["depth"]:
                break
            if later["depth"] == comment["depth"] + 1:
                replies.append({"author": later["author"], "text": later["text"]})
            if len(replies) >= max_replies:
                break
        comment["replies"] = replies
        comment["replyCount"] = len(replies)

    return comments


async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        story_url = "https://news.ycombinator.com/item?id=12345678"
        comments = await scrape_comments(page, story_url, max_comments=50, max_replies=3)

        print(json.dumps(comments, indent=2))

        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())

Step 3 — Deploy to Apify

apify login
apify actors push

Set pricing to Pay-per-result at $0.0005 per comment returned.
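Pay-per-result billing counts dataset items, so when the script runs as a deployed Actor, the `print` in `main()` should become `Actor.push_data` calls via the Apify Python SDK (`pip install apify`). A sketch under that assumption — `resolve_input` and its defaults are my own, and the field names mirror the Step 4 input:

```python
# Default input values; field names follow the example input in Step 4.
DEFAULTS = {"mode": "url", "maxComments": 50, "maxReplies": 3}

def resolve_input(raw):
    """Merge user input with defaults so the scraper receives every key."""
    merged = dict(DEFAULTS)
    merged.update(raw or {})
    return merged

async def run_actor():
    from apify import Actor  # provided by the Apify platform's Python image

    async with Actor:
        cfg = resolve_input(await Actor.get_input())
        # ... launch Playwright and call scrape_comments(...) as in Step 2 ...
        # Each pushed item is one billable result under pay-per-result:
        # await Actor.push_data(comment)
```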


Step 4 — Test the Actor

apify actors call ebS02wB1m9aZkUWL5 \
  --input '{"mode":"url","storyUrl":"https://news.ycombinator.com/item?id=12345678","maxComments":10}'

Live Actor

Try it on Apify Store →

The Actor supports two modes:

  • URL mode: Scrape comments from a specific HN story
  • Search mode: Search HN by keyword, scrape top story comments
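Consumers don't need the CLI at all: the official apify-client Python package (`pip install apify-client`) can call the Actor and read its dataset. A sketch — the input field names follow Step 4, and `query` for search mode is a hypothetical name:

```python
def build_run_input(mode, value, max_comments=10):
    """Build the Actor input for either mode (field names per Step 4)."""
    if mode == "url":
        return {"mode": "url", "storyUrl": value, "maxComments": max_comments}
    # "query" is a hypothetical field name for search mode.
    return {"mode": "search", "query": value, "maxComments": max_comments}

def fetch_comments(token, actor_id, run_input):
    """Run the Actor and return its dataset items as a list of dicts."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```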

Conclusion

With ~50 lines of Python + Playwright, you can build a production scraper on Apify and start earning passive income. The platform handles hosting, scaling, and billing — you just write the code.
