DEV Community

agenthustler

Scraping Reddit in 2026: Subreddits, Posts & Comments via Apify

Reddit is a goldmine for market research, trend analysis, and product feedback — but scraping it in 2026 is harder than it used to be. In this guide, I'll show you why the public JSON API breaks on most servers, the residential proxy workaround, and a ready-made Apify actor that handles it all.

Why Scrape Reddit?

Reddit hosts 50M+ daily active users discussing every topic imaginable. Common use cases:

  • Trend analysis — spot emerging topics before they go mainstream
  • Market research — understand what your target audience cares about
  • Product feedback — find unfiltered opinions about your product (or competitors')
  • Sentiment analysis — gauge community mood around brands, events, or releases
  • Competitor monitoring — track mentions and comparisons in relevant subreddits

Reddit's Public JSON API — And Why It Breaks

Reddit exposes a free, no-auth JSON API. Just append .json to any URL:

https://www.reddit.com/r/python/hot.json
https://www.reddit.com/r/webdev/search.json?q=scraping&sort=new
https://www.reddit.com/r/datascience/comments/abc123.json

This returns structured JSON with posts, scores, comments, flairs — everything you need.
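To make that concrete, here's a minimal sketch of pulling the fields you care about out of a listing payload. Reddit's listing format wraps posts in `{"data": {"children": [{"data": {...}}]}}`; the sample below is trimmed — real responses carry many more fields.

```python
def parse_listing(payload):
    """Flatten a Reddit listing payload into a list of post dicts."""
    posts = []
    for child in payload["data"]["children"]:
        post = child["data"]  # the actual post fields live one level down
        posts.append({
            "title": post.get("title"),
            "author": post.get("author"),
            "score": post.get("score"),
            "num_comments": post.get("num_comments"),
        })
    return posts

# Trimmed example payload in Reddit's listing shape:
sample = {
    "data": {
        "children": [
            {"data": {"title": "Hello", "author": "alice",
                      "score": 42, "num_comments": 7}}
        ]
    }
}
print(parse_listing(sample))
```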

The catch: Reddit blocks datacenter IPs. If you run this from AWS, GCP, Azure, or any cloud VPS, you'll get a 403 Forbidden response. Reddit fingerprints datacenter IP ranges and rejects automated requests from them.

This means your local machine works fine, but the moment you deploy a scraper to production — it breaks.

The Residential Proxy Solution

The fix is residential proxies — IP addresses assigned to real ISP customers. Reddit can't easily distinguish these from normal user traffic.

But managing residential proxies yourself is expensive and complex. You need:

  • A proxy provider subscription ($50–200+/month)
  • Rotation logic to avoid rate limits
  • Error handling for dead proxies
  • Session management for paginated requests

Or you can use a tool that bundles all of this.
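For a sense of what "rotation logic" means in practice, here's a hypothetical sketch of the retry loop a DIY setup needs. The proxy URLs and the `DEAD` set are stand-ins, not a real provider API — a real version would make an HTTP request through each proxy instead of calling the stub.

```python
import itertools

PROXIES = ["http://proxy-a:8080", "http://proxy-b:8080", "http://proxy-c:8080"]
DEAD = {"http://proxy-a:8080"}   # pretend this one stopped responding
_pool = itertools.cycle(PROXIES)  # round-robin over the pool

def fetch(url, proxy):
    """Stub for a real HTTP call routed through `proxy`."""
    if proxy in DEAD:
        raise ConnectionError(f"{proxy} is dead")
    return f"200 OK via {proxy}"

def fetch_with_retry(url, attempts=3):
    """Rotate proxies until one succeeds or attempts run out."""
    last_err = None
    for _ in range(attempts):
        proxy = next(_pool)
        try:
            return fetch(url, proxy=proxy)
        except ConnectionError as err:
            last_err = err  # dead proxy -- move on to the next one
    raise RuntimeError(f"all {attempts} attempts failed: {last_err}")

result = fetch_with_retry("https://www.reddit.com/r/python/hot.json")
print(result)  # proxy-a is dead, so this succeeds via proxy-b
```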

Reddit Scraper on Apify — 4 Modes

I built a Reddit Scraper actor on Apify that handles the proxy layer, pagination, and data extraction. It has 4 modes:

1. Subreddit Mode

Scrape hot, new, or top posts from any subreddit.

Input:

{
  "mode": "subreddit",
  "subreddit": "python",
  "sort": "hot",
  "limit": 50
}

Returns: Post title, author, score, comment count, flair, URL, timestamp, selftext, and more.
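Once you have those items, post-processing is plain Python. An illustrative example — filtering scraped posts down to high-traction ones — with dicts that mimic the fields listed above (exact key names depend on the actor's output schema):

```python
# Stand-in for the actor's output items:
posts = [
    {"title": "Async tips", "score": 980, "numComments": 120},
    {"title": "Small question", "score": 4, "numComments": 2},
    {"title": "New release", "score": 310, "numComments": 45},
]

# Keep only posts with real traction, highest score first.
hot = [p for p in posts if p["score"] >= 100]
hot.sort(key=lambda p: p["score"], reverse=True)

for p in hot:
    print(f"{p['score']:>4}  {p['title']}")
```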

2. Search Mode

Find all posts matching a keyword across Reddit or within a specific subreddit.

{
  "mode": "search",
  "query": "apify scraper",
  "subreddit": "webdev",
  "sort": "relevance",
  "limit": 25
}

Great for monitoring brand mentions or tracking discussions about a specific topic.

3. Comments Mode

Get an entire comment thread with nested replies.

{
  "mode": "comments",
  "postUrl": "https://www.reddit.com/r/python/comments/abc123/my_post/"
}

Returns the full comment tree — author, score, body, depth level, and reply chains.
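A nested tree like that is easy to walk recursively. A sketch, where the dict shape (`body`, `replies`) mirrors the fields described above — the actor's actual key names may differ:

```python
def walk(comments, depth=0):
    """Yield (depth, body) pairs for every comment, depth-first."""
    for c in comments:
        yield depth, c["body"]
        yield from walk(c.get("replies", []), depth + 1)

# Illustrative two-level thread:
thread = [
    {"body": "Great post!", "replies": [
        {"body": "Agreed.", "replies": []},
    ]},
    {"body": "Has anyone benchmarked this?", "replies": []},
]

for depth, body in walk(thread):
    print("  " * depth + body)  # indent replies by their depth
```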

4. User Profile Mode

Scrape a user's recent posts and comment history.

{
  "mode": "user-profile",
  "username": "spez",
  "limit": 100
}

Returns posts, comments, karma breakdown, and activity timeline.
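One quick thing you can do with that history is tally where a user is most active. A sketch with an illustrative item shape (`type`, `subreddit`) — check the actor's actual output schema for the real key names:

```python
from collections import Counter

# Stand-in for a user's scraped posts and comments:
activity = [
    {"type": "comment", "subreddit": "python"},
    {"type": "post", "subreddit": "webdev"},
    {"type": "comment", "subreddit": "python"},
]

# Count items per subreddit, most active first.
by_subreddit = Counter(item["subreddit"] for item in activity)
print(by_subreddit.most_common())
# → [('python', 2), ('webdev', 1)]
```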

Python Code Example

Here's how to run the actor programmatically with the apify-client package:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("cryptosignals/reddit-scraper").call(
    run_input={
        "mode": "subreddit",
        "subreddit": "python",
        "sort": "hot",
        "limit": 25,
    }
)

for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{post['score']:>5}  {post['title'][:80]}")

That's about a dozen lines to get structured Reddit data in Python. No proxies to configure, no 403 errors.
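From there, exporting is straightforward. A sketch that dumps items to CSV — in practice `items` would come from `client.dataset(...).iterate_items()`; here it's a small stand-in list:

```python
import csv
import io

# Stand-in for dataset items returned by the actor:
items = [
    {"title": "Async tips", "author": "alice", "score": 980},
    {"title": "New release", "author": "bob", "score": 310},
]

# Write to an in-memory buffer; swap in open("posts.csv", "w") for a file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "author", "score"])
writer.writeheader()
writer.writerows(items)

csv_text = buf.getvalue()
print(csv_text)
```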

Install the client:

pip install apify-client

Use Case: Reddit Sentiment Dashboard

Here's a practical example — building a sentiment tracker for your product:

  1. Schedule the actor to run daily with search mode, querying your product name
  2. Export results to a dataset or webhook
  3. Run sentiment analysis on post titles and comments (TextBlob, VADER, or an LLM)
  4. Track over time — plot sentiment score by day to catch PR crises early

from textblob import TextBlob

for post in posts:
    sentiment = TextBlob(post["title"]).sentiment.polarity
    print(f"{sentiment:+.2f}  {post['title'][:60]}")

You can also:

  • Compare sentiment across competing products
  • Alert on sudden negative spikes
  • Track which subreddits mention you most
  • Find your most vocal advocates (and critics)

Pricing

The actor is $4.99/month (starting April 3, 2026) — that includes residential proxy usage. Compare that to $50–200/month for a standalone residential proxy subscription, plus the time spent building and maintaining your own scraper.

Try the Reddit Scraper on Apify →


Have questions or feature requests? Drop a comment below or open an issue on the actor's Apify page.
