agenthustler
How to Scrape Reddit Data in 2026: Public JSON API vs Scrapers

Reddit is one of the richest sources of real-time opinions, trends, and discussions on the internet. Here's how to extract that data programmatically in 2026.

Method 1: Reddit's Public JSON API

Reddit exposes a JSON version of almost every page. Just append .json to any URL:

curl "https://www.reddit.com/r/programming/hot.json?limit=10" \
  -H "User-Agent: MyBot/1.0"

This returns posts with title, score, author, URL, created timestamp, and comment count.

import requests

headers = {"User-Agent": "DataCollector/1.0"}
url = "https://www.reddit.com/r/webdev/hot.json?limit=25"
response = requests.get(url, headers=headers)
response.raise_for_status()  # a missing User-Agent typically surfaces here as a 429
data = response.json()

for post in data["data"]["children"]:
    p = post["data"]
    print(f'{p["score"]:>5} | {p["title"][:60]}')

Pros:

  • No API key needed
  • Works for any public subreddit
  • Returns structured JSON

Cons:

  • Aggressive rate limiting (1 request per 2 seconds recommended)
  • No search across multiple subreddits at once
  • Blocked by some corporate networks
  • User-Agent header required or you get 429 errors
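Given those limits, it's worth self-throttling rather than waiting for 429s. Here's a minimal sketch of a throttle helper (`Throttle` is my own name, not part of any library), using the 2-second interval recommended above:

```python
import time

class Throttle:
    """Enforce a minimum interval between successive requests."""
    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        """Sleep just long enough to honor the interval, then record the call."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

# Usage against the JSON API (not executed here):
# throttle = Throttle(2.0)
# for sub in ["programming", "webdev"]:
#     throttle.wait()
#     requests.get(f"https://www.reddit.com/r/{sub}/hot.json",
#                  headers={"User-Agent": "MyBot/1.0"})
```

Sleeping only for the *remaining* time, rather than a flat 2 seconds, means slow responses don't add unnecessary delay on top of the rate limit.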

Useful Endpoints

| Endpoint | Description |
| --- | --- |
| `/r/{sub}/hot.json` | Hot posts |
| `/r/{sub}/new.json` | Newest posts |
| `/r/{sub}/top.json?t=week` | Top posts this week |
| `/r/{sub}/search.json?q=term` | Search within subreddit |
| `/r/{sub}/comments/{id}.json` | Post + all comments |
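One non-obvious detail: the comments endpoint returns a two-element JSON array, the post Listing first and the comment tree second. A sketch of unpacking it (the helper names here are my own):

```python
import requests

def parse_comment_bodies(comment_listing):
    """Top-level comment bodies from a comment Listing, skipping 'more' stubs."""
    return [
        c["data"]["body"]
        for c in comment_listing["data"]["children"]
        if c["kind"] == "t1"  # "t1" = comment; "more" = collapsed placeholder
    ]

def get_post_with_comments(subreddit, post_id):
    """Fetch a post and its comments; the endpoint returns [post, comments]."""
    url = f"https://www.reddit.com/r/{subreddit}/comments/{post_id}.json"
    resp = requests.get(url, headers={"User-Agent": "MyBot/1.0"})
    resp.raise_for_status()
    post_listing, comment_listing = resp.json()
    post = post_listing["data"]["children"][0]["data"]
    return post["title"], parse_comment_bodies(comment_listing)
```

Note that deeply nested replies can be collapsed into `more` stubs, which need extra requests to expand; that's one of the things the hosted scrapers below handle for you.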

Method 2: Reddit Scraper on Apify

For production use, the Reddit Scraper on Apify handles the hard parts:

  • Scrapes multiple subreddits in one run
  • Extracts full comment trees
  • Automatic rate limiting and retry logic
  • Proxy rotation to avoid blocks
  • Scheduled runs and webhook notifications

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });
const run = await client.actor('cryptosignals/reddit-scraper').call({
  subreddits: ['programming', 'webdev', 'javascript'],
  sort: 'hot',
  maxItems: 50,
  includeComments: true,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} posts with comments`);

Pricing: pay-per-use. Apify's free tier includes $5 of credits per month, enough for daily scrapes of a few subreddits.

Method 3: ScraperAPI for Custom Scraping

If you're building your own scraper and need to handle Reddit's anti-bot protections, ScraperAPI handles proxies and CAPTCHAs:

import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
params = {
    "api_key": API_KEY,
    "url": "https://www.reddit.com/r/python/hot.json",  # target URL gets encoded for us
}

response = requests.get("https://api.scraperapi.com/", params=params)
response.raise_for_status()
data = response.json()

for post in data["data"]["children"]:
    print(post["data"]["title"])

ScraperAPI rotates IPs automatically and handles JavaScript rendering when needed. Works well when Reddit blocks direct requests from cloud servers.
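For pages that do need JavaScript rendering, ScraperAPI's documented `render` parameter can be added to the query string. The sketch below only builds the URL without sending anything (Reddit's `.json` endpoints return plain JSON, so rendering isn't needed there):

```python
import requests

# Build (but don't send) a ScraperAPI request with JS rendering enabled.
req = requests.Request(
    "GET",
    "https://api.scraperapi.com/",
    params={
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": "https://www.reddit.com/r/python/",
        "render": "true",
    },
).prepare()

print(req.url)  # the fully encoded URL that would be sent
```

Using `requests.Request(...).prepare()` is a handy way to inspect the exact encoded URL before spending API credits on it.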

Comparison

| Feature | JSON API | Apify Actor | ScraperAPI |
| --- | --- | --- | --- |
| Cost | Free | Pay-per-use | From $49/mo |
| Setup | None | Apify account | API key |
| Rate limits | 1 req/2s | Managed | Managed |
| Multi-subreddit | Manual | Built-in | Manual |
| Comments | Yes (extra req) | Built-in | Manual |
| Proxies | None | Included | Included |
| Scheduling | DIY | Built-in | DIY |

Practical Example: Subreddit Engagement Tracker

Here's a Python script that compares engagement (average score and comment count) across subreddits using the free JSON API:

import time

import requests

def get_hot_posts(subreddit, limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}"
    headers = {"User-Agent": "SentimentTracker/1.0"}
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    return [
        {
            "title": p["data"]["title"],
            "score": p["data"]["score"],
            "comments": p["data"]["num_comments"],
            "url": f'https://reddit.com{p["data"]["permalink"]}'
        }
        for p in r.json()["data"]["children"]
    ]

subs = ["javascript", "python", "webdev"]
for sub in subs:
    posts = get_hot_posts(sub)
    avg_score = sum(p["score"] for p in posts) / len(posts)
    avg_comments = sum(p["comments"] for p in posts) / len(posts)
    print(f"r/{sub}: avg score={avg_score:.0f}, avg comments={avg_comments:.0f}")
    time.sleep(2)  # stay under Reddit's rate limit between subreddits

Which Approach to Use

Start with the JSON API for quick prototypes and one-off data pulls. It's free and requires zero setup.

Use the Apify Reddit Scraper when you need reliable scheduled scraping across multiple subreddits with comment extraction.

Add ScraperAPI when you're building custom scrapers that hit rate limits or get blocked by Reddit's anti-bot measures.

The free methods work great for development. When you hit scale, the paid tools pay for themselves in time saved.
