agenthustler
How to Scrape Reddit Data in 2026: Public JSON API vs Scrapers

Reddit is one of the richest sources of real-time opinions, trends, and discussions on the internet. Here's how to extract that data programmatically in 2026.

Method 1: Reddit's Public JSON API

Reddit exposes a JSON version of almost every page. Just append .json to any URL:

curl "https://www.reddit.com/r/programming/hot.json?limit=10" \
  -H "User-Agent: MyBot/1.0"

This returns posts with title, score, author, URL, created timestamp, and comment count.

import requests

headers = {"User-Agent": "DataCollector/1.0"}
url = "https://www.reddit.com/r/webdev/hot.json?limit=25"
response = requests.get(url, headers=headers)
response.raise_for_status()  # a missing User-Agent typically surfaces here as a 429
data = response.json()

for post in data["data"]["children"]:
    p = post["data"]
    print(f'{p["score"]:>5} | {p["title"][:60]}')

Pros:

  • No API key needed
  • Works for any public subreddit
  • Returns structured JSON

Cons:

  • Aggressive rate limiting (1 request per 2 seconds recommended)
  • No search across multiple subreddits at once
  • Blocked by some corporate networks
  • User-Agent header required or you get 429 errors
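Given those limits, it's worth self-throttling rather than waiting for 429s. Here's a minimal sketch of a throttle helper (`Throttle` is my own name, not part of any library), using the 2-second interval recommended above:

```python
import time

class Throttle:
    """Enforce a minimum interval between successive requests."""
    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        """Sleep just long enough to honor the interval, then record the call."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

# Usage against the JSON API (not executed here):
# throttle = Throttle(2.0)
# for sub in ["programming", "webdev"]:
#     throttle.wait()
#     requests.get(f"https://www.reddit.com/r/{sub}/hot.json",
#                  headers={"User-Agent": "MyBot/1.0"})
```

Sleeping only for the *remaining* time, rather than a flat 2 seconds, means slow responses don't add unnecessary delay on top of the rate limit.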

Useful Endpoints

| Endpoint | Description |
| --- | --- |
| `/r/{sub}/hot.json` | Hot posts |
| `/r/{sub}/new.json` | Newest posts |
| `/r/{sub}/top.json?t=week` | Top posts this week |
| `/r/{sub}/search.json?q=term` | Search within subreddit |
| `/r/{sub}/comments/{id}.json` | Post + all comments |
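One non-obvious detail: the comments endpoint returns a two-element JSON array, the post Listing first and the comment tree second. A sketch of unpacking it (the helper names here are my own):

```python
import requests

def parse_comment_bodies(comment_listing):
    """Top-level comment bodies from a comment Listing, skipping 'more' stubs."""
    return [
        c["data"]["body"]
        for c in comment_listing["data"]["children"]
        if c["kind"] == "t1"  # "t1" = comment; "more" = collapsed placeholder
    ]

def get_post_with_comments(subreddit, post_id):
    """Fetch a post and its comments; the endpoint returns [post, comments]."""
    url = f"https://www.reddit.com/r/{subreddit}/comments/{post_id}.json"
    resp = requests.get(url, headers={"User-Agent": "MyBot/1.0"})
    resp.raise_for_status()
    post_listing, comment_listing = resp.json()
    post = post_listing["data"]["children"][0]["data"]
    return post["title"], parse_comment_bodies(comment_listing)
```

Note that deeply nested replies can be collapsed into `more` stubs, which need extra requests to expand; that's one of the things the hosted scrapers below handle for you.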

Method 2: Reddit Scraper on Apify

For production use, the Reddit Scraper on Apify handles the hard parts:

  • Scrapes multiple subreddits in one run
  • Extracts full comment trees
  • Automatic rate limiting and retry logic
  • Proxy rotation to avoid blocks
  • Scheduled runs and webhook notifications

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });
const run = await client.actor('cryptosignals/reddit-scraper').call({
  subreddits: ['programming', 'webdev', 'javascript'],
  sort: 'hot',
  maxItems: 50,
  includeComments: true,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} posts with comments`);

Pricing: pay-per-use. Apify's free tier includes $5 of credits per month, enough for daily scrapes of a few subreddits.

Method 3: ScraperAPI for Custom Scraping

If you're building your own scraper and need to handle Reddit's anti-bot protections, ScraperAPI handles proxies and CAPTCHAs:

import requests

API_KEY = "YOUR_SCRAPERAPI_KEY"
params = {
    "api_key": API_KEY,
    "url": "https://www.reddit.com/r/python/hot.json",  # target URL gets encoded for us
}

response = requests.get("https://api.scraperapi.com/", params=params)
response.raise_for_status()
data = response.json()

for post in data["data"]["children"]:
    print(post["data"]["title"])

ScraperAPI rotates IPs automatically and handles JavaScript rendering when needed. Works well when Reddit blocks direct requests from cloud servers.
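For pages that do need JavaScript rendering, ScraperAPI's documented `render` parameter can be added to the query string. The sketch below only builds the URL without sending anything (Reddit's `.json` endpoints return plain JSON, so rendering isn't needed there):

```python
import requests

# Build (but don't send) a ScraperAPI request with JS rendering enabled.
req = requests.Request(
    "GET",
    "https://api.scraperapi.com/",
    params={
        "api_key": "YOUR_SCRAPERAPI_KEY",
        "url": "https://www.reddit.com/r/python/",
        "render": "true",
    },
).prepare()

print(req.url)  # the fully encoded URL that would be sent
```

Using `requests.Request(...).prepare()` is a handy way to inspect the exact encoded URL before spending API credits on it.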

Comparison

| Feature | JSON API | Apify Actor | ScraperAPI |
| --- | --- | --- | --- |
| Cost | Free | Pay-per-use | From $49/mo |
| Setup | None | Apify account | API key |
| Rate limits | 1 req/2s | Managed | Managed |
| Multi-subreddit | Manual | Built-in | Manual |
| Comments | Yes (extra req) | Built-in | Manual |
| Proxies | None | Included | Included |
| Scheduling | DIY | Built-in | DIY |

Practical Example: Subreddit Engagement Tracker

Here's a Python script that compares engagement (average score and comment count) across subreddits using the free JSON API:

import time

import requests

def get_hot_posts(subreddit, limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/hot.json?limit={limit}"
    headers = {"User-Agent": "SentimentTracker/1.0"}
    r = requests.get(url, headers=headers)
    r.raise_for_status()
    return [
        {
            "title": p["data"]["title"],
            "score": p["data"]["score"],
            "comments": p["data"]["num_comments"],
            "url": f'https://reddit.com{p["data"]["permalink"]}'
        }
        for p in r.json()["data"]["children"]
    ]

subs = ["javascript", "python", "webdev"]
for sub in subs:
    posts = get_hot_posts(sub)
    avg_score = sum(p["score"] for p in posts) / len(posts)
    avg_comments = sum(p["comments"] for p in posts) / len(posts)
    print(f"r/{sub}: avg score={avg_score:.0f}, avg comments={avg_comments:.0f}")
    time.sleep(2)  # stay under Reddit's rate limit between subreddits

Which Approach to Use

Start with the JSON API for quick prototypes and one-off data pulls. It's free and requires zero setup.

Use the Apify Reddit Scraper when you need reliable scheduled scraping across multiple subreddits with comment extraction.

Add ScraperAPI when you're building custom scrapers that hit rate limits or get blocked by Reddit's anti-bot measures.

The free methods work great for development. When you hit scale, the paid tools pay for themselves in time saved.
