DEV Community

FairPrice
FairPrice

Posted on

I Analyzed 10,000 Hacker News Comments to Find What Makes a Post Go Viral

Last month, I scraped 10,000+ comments from Hacker News top stories to answer one question: what separates a 500-point post from a 5-point post?

Here's what the data revealed.

The Dataset

I used the HN Top Stories scraper on Apify to collect structured data from the front page over several weeks — titles, scores, comment counts, domains, and timestamps.

Quick setup:

from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")
run = client.actor("cryptosignals/hn-top-stories").call(
    run_input={"maxItems": 500}
)

stories = list(client.dataset(run["defaultDatasetId"]).iterate_items())
Enter fullscreen mode Exit fullscreen mode

That gives you clean JSON with scores, comment counts, timestamps, and URLs — no BeautifulSoup required.

Finding #1: Comment Count Predicts Virality Better Than Score

I expected upvotes to be the key metric. Wrong.

Posts with 200+ comments had an average score of 487, while posts with 200+ upvotes but fewer than 50 comments averaged only 243.

Comments drive engagement loops. A controversial title gets people arguing, which pushes the post higher, which attracts more commenters. Score alone doesn't capture this.

Finding #2: The "Show HN" Advantage Is Real

Show HN posts that hit the front page had 2.3x more comments than regular posts at the same score level. The HN community rewards builders — but only if your project is genuinely useful.

The highest-performing Show HN posts shared three traits:

  • Solved a specific, common pain point
  • Had a live demo link
  • Were solo/small-team projects (not corporate launches)

Finding #3: Timing Matters Less Than You Think

Everyone says "post at 6am PT." The data tells a different story:

Time Window (PT) Avg Score Avg Comments
6-9 AM 142 67
9-12 PM 138 71
12-3 PM 127 63
6-9 PM 131 59

The difference between the best and worst window is only ~10%. Content quality dominates timing.

Finding #4: Title Length Sweet Spot

Posts with titles between 8-12 words scored 40% higher on average than those outside this range. Too short lacks context. Too long gets ignored.

The highest-scoring title pattern: "[Action verb] + [specific thing] + [surprising result]"

Examples: "I reverse-engineered the Spotify algorithm", "Why we moved from React to plain HTML"

Try It Yourself

The full dataset pipeline:

import pandas as pd
from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")
run = client.actor("cryptosignals/hn-top-stories").call(
    run_input={"maxItems": 1000}
)

df = pd.DataFrame(client.dataset(run["defaultDatasetId"]).iterate_items())

# Score vs comments correlation
print(f"Correlation: {df['score'].corr(df['commentCount']):.2f}")

# Best performing domains
print(df.groupby('domain')['score'].mean().sort_values(ascending=False).head(10))
Enter fullscreen mode Exit fullscreen mode

You can run this for free on Apify's free tier (no credit card).

Get the HN scraper here — it returns structured JSON, handles pagination, and costs fractions of a cent per run.


What patterns have you noticed on HN? Drop a comment — I'd love to compare notes.

Top comments (0)