DEV Community

agenthustler

Scraping Reddit in 2026: Subreddits, Posts & Comments via Apify

Reddit is a goldmine for market research, trend analysis, and product feedback — but scraping it in 2026 is harder than it used to be. In this guide, I'll show you why the public JSON API breaks on most servers, the residential proxy workaround, and a ready-made Apify actor that handles it all.

Why Scrape Reddit?

Reddit hosts 50M+ daily active users discussing every topic imaginable. Common use cases:

  • Trend analysis — spot emerging topics before they go mainstream
  • Market research — understand what your target audience cares about
  • Product feedback — find unfiltered opinions about your product (or competitors')
  • Sentiment analysis — gauge community mood around brands, events, or releases
  • Competitor monitoring — track mentions and comparisons in relevant subreddits

Reddit's Public JSON API — And Why It Breaks

Reddit exposes a free, no-auth JSON API. Just append .json to any URL:

https://www.reddit.com/r/python/hot.json
https://www.reddit.com/r/webdev/search.json?q=scraping&sort=new
https://www.reddit.com/r/datascience/comments/abc123.json

This returns structured JSON with posts, scores, comments, flairs — everything you need.
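To make that concrete, here's a minimal sketch of pulling the fields you care about out of a listing payload. Reddit's listing format wraps posts in `{"data": {"children": [{"data": {...}}]}}`; the sample below is trimmed — real responses carry many more fields.

```python
def parse_listing(payload):
    """Flatten a Reddit listing payload into a list of post dicts."""
    posts = []
    for child in payload["data"]["children"]:
        post = child["data"]  # the actual post fields live one level down
        posts.append({
            "title": post.get("title"),
            "author": post.get("author"),
            "score": post.get("score"),
            "num_comments": post.get("num_comments"),
        })
    return posts

# Trimmed example payload in Reddit's listing shape:
sample = {
    "data": {
        "children": [
            {"data": {"title": "Hello", "author": "alice",
                      "score": 42, "num_comments": 7}}
        ]
    }
}
print(parse_listing(sample))
```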

The catch: Reddit blocks datacenter IPs. If you run this from AWS, GCP, Azure, or any cloud VPS, you'll get a 403 Forbidden response. Reddit fingerprints datacenter IP ranges and rejects automated requests from them.

This means your local machine works fine, but the moment you deploy a scraper to production — it breaks.

The Residential Proxy Solution

The fix is residential proxies — IP addresses assigned to real ISP customers. Reddit can't easily distinguish these from normal user traffic.

But managing residential proxies yourself is expensive and complex. You need:

  • A proxy provider subscription ($50–200+/month)
  • Rotation logic to avoid rate limits
  • Error handling for dead proxies
  • Session management for paginated requests

Or you can use a tool that bundles all of this.
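For a sense of what "rotation logic" means in practice, here's a hypothetical sketch of the retry loop a DIY setup needs. The proxy URLs and the `DEAD` set are stand-ins, not a real provider API — a real version would make an HTTP request through each proxy instead of calling the stub.

```python
import itertools

PROXIES = ["http://proxy-a:8080", "http://proxy-b:8080", "http://proxy-c:8080"]
DEAD = {"http://proxy-a:8080"}   # pretend this one stopped responding
_pool = itertools.cycle(PROXIES)  # round-robin over the pool

def fetch(url, proxy):
    """Stub for a real HTTP call routed through `proxy`."""
    if proxy in DEAD:
        raise ConnectionError(f"{proxy} is dead")
    return f"200 OK via {proxy}"

def fetch_with_retry(url, attempts=3):
    """Rotate proxies until one succeeds or attempts run out."""
    last_err = None
    for _ in range(attempts):
        proxy = next(_pool)
        try:
            return fetch(url, proxy=proxy)
        except ConnectionError as err:
            last_err = err  # dead proxy -- move on to the next one
    raise RuntimeError(f"all {attempts} attempts failed: {last_err}")

result = fetch_with_retry("https://www.reddit.com/r/python/hot.json")
print(result)  # proxy-a is dead, so this succeeds via proxy-b
```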

Reddit Scraper on Apify — 4 Modes

I built a Reddit Scraper actor on Apify that handles the proxy layer, pagination, and data extraction. It has 4 modes:

1. Subreddit Mode

Scrape hot, new, or top posts from any subreddit.

Input:

{
  "mode": "subreddit",
  "subreddit": "python",
  "sort": "hot",
  "limit": 50
}

Returns: Post title, author, score, comment count, flair, URL, timestamp, selftext, and more.
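Once you have those items, post-processing is plain Python. An illustrative example — filtering scraped posts down to high-traction ones — with dicts that mimic the fields listed above (exact key names depend on the actor's output schema):

```python
# Stand-in for the actor's output items:
posts = [
    {"title": "Async tips", "score": 980, "numComments": 120},
    {"title": "Small question", "score": 4, "numComments": 2},
    {"title": "New release", "score": 310, "numComments": 45},
]

# Keep only posts with real traction, highest score first.
hot = [p for p in posts if p["score"] >= 100]
hot.sort(key=lambda p: p["score"], reverse=True)

for p in hot:
    print(f"{p['score']:>4}  {p['title']}")
```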

2. Search Mode

Find all posts matching a keyword across Reddit or within a specific subreddit.

{
  "mode": "search",
  "query": "apify scraper",
  "subreddit": "webdev",
  "sort": "relevance",
  "limit": 25
}

Great for monitoring brand mentions or tracking discussions about a specific topic.

3. Comments Mode

Get an entire comment thread with nested replies.

{
  "mode": "comments",
  "postUrl": "https://www.reddit.com/r/python/comments/abc123/my_post/"
}

Returns the full comment tree — author, score, body, depth level, and reply chains.
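A nested tree like that is easy to walk recursively. A sketch, where the dict shape (`body`, `replies`) mirrors the fields described above — the actor's actual key names may differ:

```python
def walk(comments, depth=0):
    """Yield (depth, body) pairs for every comment, depth-first."""
    for c in comments:
        yield depth, c["body"]
        yield from walk(c.get("replies", []), depth + 1)

# Illustrative two-level thread:
thread = [
    {"body": "Great post!", "replies": [
        {"body": "Agreed.", "replies": []},
    ]},
    {"body": "Has anyone benchmarked this?", "replies": []},
]

for depth, body in walk(thread):
    print("  " * depth + body)  # indent replies by their depth
```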

4. User Profile Mode

Scrape a user's recent posts and comment history.

{
  "mode": "user-profile",
  "username": "spez",
  "limit": 100
}

Returns posts, comments, karma breakdown, and activity timeline.
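One quick thing you can do with that history is tally where a user is most active. A sketch with an illustrative item shape (`type`, `subreddit`) — check the actor's actual output schema for the real key names:

```python
from collections import Counter

# Stand-in for a user's scraped posts and comments:
activity = [
    {"type": "comment", "subreddit": "python"},
    {"type": "post", "subreddit": "webdev"},
    {"type": "comment", "subreddit": "python"},
]

# Count items per subreddit, most active first.
by_subreddit = Counter(item["subreddit"] for item in activity)
print(by_subreddit.most_common())
# → [('python', 2), ('webdev', 1)]
```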

Python Code Example

Here's how to run the actor programmatically with the apify-client package:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("cryptosignals/reddit-scraper").call(
    run_input={
        "mode": "subreddit",
        "subreddit": "python",
        "sort": "hot",
        "limit": 25,
    }
)

for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{post['score']:>5}  {post['title'][:80]}")

That's about a dozen lines to get structured Reddit data in Python. No proxies to configure, no 403 errors.
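From there, exporting is straightforward. A sketch that dumps items to CSV — in practice `items` would come from `client.dataset(...).iterate_items()`; here it's a small stand-in list:

```python
import csv
import io

# Stand-in for dataset items returned by the actor:
items = [
    {"title": "Async tips", "author": "alice", "score": 980},
    {"title": "New release", "author": "bob", "score": 310},
]

# Write to an in-memory buffer; swap in open("posts.csv", "w") for a file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "author", "score"])
writer.writeheader()
writer.writerows(items)

csv_text = buf.getvalue()
print(csv_text)
```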

Install the client:

pip install apify-client

Use Case: Reddit Sentiment Dashboard

Here's a practical example — building a sentiment tracker for your product:

  1. Schedule the actor to run daily with search mode, querying your product name
  2. Export results to a dataset or webhook
  3. Run sentiment analysis on post titles and comments (TextBlob, VADER, or an LLM)
  4. Track over time — plot sentiment score by day to catch PR crises early

from textblob import TextBlob

for post in posts:
    sentiment = TextBlob(post["title"]).sentiment.polarity
    print(f"{sentiment:+.2f}  {post['title'][:60]}")

You can also:

  • Compare sentiment across competing products
  • Alert on sudden negative spikes
  • Track which subreddits mention you most
  • Find your most vocal advocates (and critics)

Pricing

The actor is $4.99/month (starting April 3, 2026) — that includes residential proxy usage. Compare that to $50–200/month for a standalone residential proxy subscription, plus the time spent building and maintaining your own scraper.

Try the Reddit Scraper on Apify →


Have questions or feature requests? Drop a comment below or open an issue on the actor's Apify page.
