Reddit hosts 2.8B+ monthly visits across 100,000+ active communities. Whether you're tracking brand mentions, monitoring industry trends, or building datasets, Reddit's data is invaluable.
Here's the secret most developers miss: every public Reddit URL has a .json version. No API keys. No OAuth. No scraper libraries. Just append .json to any URL.
## The .json Trick
Take any Reddit URL:

```
https://www.reddit.com/r/Python/new/
```

Append `.json`:

```
https://www.reddit.com/r/Python/new.json
```

That's it. You get structured JSON with posts, scores, timestamps, comment counts, and more.
## Basic Usage with curl

```bash
curl 'https://www.reddit.com/r/Python/new.json?limit=25' \
  -H 'User-Agent: my-script/1.0'
```

**Important:** Reddit blocks requests that omit the `User-Agent` header. Always set one, or you'll get a `429 Too Many Requests` response.
The response structure:

```json
{
  "kind": "Listing",
  "data": {
    "children": [
      {
        "kind": "t3",
        "data": {
          "title": "Post title here",
          "author": "username",
          "score": 42,
          "num_comments": 7,
          "url": "https://...",
          "created_utc": 1741500000,
          "selftext": "Post body..."
        }
      }
    ],
    "after": "t3_abc123"
  }
}
```
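Pulling the post objects out of that listing takes only a couple of lines. A minimal sketch, using a sample dict shaped like the response above (the helper name `extract_posts` is my own):

```python
def extract_posts(listing):
    """Pull the post dicts out of a Reddit Listing response."""
    return [child["data"] for child in listing["data"]["children"]]

# Sample shaped like the Listing response shown above
sample = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "Post title here", "score": 42}}
        ],
        "after": "t3_abc123",
    },
}

posts = extract_posts(sample)
print(posts[0]["title"])  # Post title here
```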
## Python: Fetching Subreddit Posts

```python
import requests

def get_posts(subreddit, sort="new", limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json"
    headers = {"User-Agent": "my-script/1.0"}
    params = {"limit": limit}
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    posts = resp.json()["data"]["children"]
    return [p["data"] for p in posts]

for post in get_posts("Python"):
    print(f"{post['score']:>5} {post['title'][:80]}")
```

This works for any public subreddit. Change the `sort` parameter to `hot`, `top`, `rising`, or `new`.
## Searching Reddit

Reddit's search also supports `.json`:

```python
def search_subreddit(subreddit, query, sort="new", limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/search.json"
    headers = {"User-Agent": "my-script/1.0"}
    params = {
        "q": query,
        "sort": sort,
        "restrict_sr": "on",
        "limit": limit,
    }
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    posts = resp.json()["data"]["children"]
    return [p["data"] for p in posts]

results = search_subreddit("webdev", "scraping")
for r in results:
    print(f"{r['score']:>5} {r['title'][:80]}")
```

The `restrict_sr=on` parameter limits the search to the specified subreddit. Remove it to search all of Reddit.
## Fetching Comments

Every post's comment thread is available as JSON:

```python
def get_comments(subreddit, post_id):
    url = f"https://www.reddit.com/r/{subreddit}/comments/{post_id}.json"
    headers = {"User-Agent": "my-script/1.0"}
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    data = resp.json()
    # data[0] = post info, data[1] = comment tree
    return data[1]["data"]["children"]

# Example: get comments from a post
comments = get_comments("Python", "1abc2de")
for c in comments:
    if c["kind"] == "t1":  # t1 = comment
        d = c["data"]
        print(f" {d['author']}: {d['body'][:100]}")
```

Comments are nested: each comment's `replies` field contains its child comments, forming a tree structure.
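Walking that tree is a natural fit for recursion. A sketch (the helper name `flatten_comments` is my own); note that `replies` is an empty string when a comment has no children and a nested Listing dict otherwise:

```python
def flatten_comments(children, depth=0):
    """Recursively walk a Reddit comment tree, yielding (depth, comment) pairs."""
    for child in children:
        if child["kind"] != "t1":  # skip "more" stubs and non-comments
            continue
        data = child["data"]
        yield depth, data
        replies = data.get("replies")
        # replies is "" for leaf comments, a Listing dict otherwise
        if isinstance(replies, dict):
            yield from flatten_comments(replies["data"]["children"], depth + 1)

# Example on a synthetic tree shaped like Reddit's response
tree = [
    {"kind": "t1", "data": {"author": "alice", "body": "parent", "replies": {
        "kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {"author": "bob", "body": "child", "replies": ""}}
        ]}
    }}}
]
for depth, c in flatten_comments(tree):
    print("  " * depth + f"{c['author']}: {c['body']}")
# alice: parent
#   bob: child
```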
## Pagination with the after Cursor

Reddit returns a maximum of 100 posts per request. To get more, use the `after` parameter:

```python
def get_all_posts(subreddit, sort="new", total=200):
    all_posts = []
    after = None
    headers = {"User-Agent": "my-script/1.0"}
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json"
    while len(all_posts) < total:
        params = {"limit": 100}
        if after:
            params["after"] = after
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        data = resp.json()["data"]
        posts = [p["data"] for p in data["children"]]
        if not posts:
            break
        all_posts.extend(posts)
        after = data.get("after")
        if not after:
            break
    return all_posts[:total]

posts = get_all_posts("Python", total=200)
print(f"Fetched {len(posts)} posts")
```

The `after` value is a fullname identifier (like `t3_abc123`) pointing to the last item in the current page. Reddit uses this cursor-based approach instead of page numbers.
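A fullname is just the kind prefix (`t3` for posts, `t1` for comments) joined to the item's `id`, so you can also construct a cursor yourself, for example to resume a crawl from a post you already have. A tiny sketch (the helper name is my own):

```python
def fullname(post, kind="t3"):
    """Build a fullname cursor (kind prefix + id) from a post or comment dict."""
    return f"{kind}_{post['id']}"

print(fullname({"id": "abc123"}))  # t3_abc123
```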
## Rate Limits

Reddit's public JSON endpoints enforce rate limits:

- ~60 requests per minute for unauthenticated access
- Add 1-2 second delays between requests in loops
- Always set a descriptive `User-Agent` header

```python
import time

for subreddit in ["Python", "webdev", "datascience"]:
    posts = get_posts(subreddit)
    print(f"r/{subreddit}: {len(posts)} posts")
    time.sleep(2)  # respect rate limits
```

Exceeding the limit returns `429 Too Many Requests`, and Reddit may temporarily block your IP if you consistently ignore rate limits.
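For longer-running scripts, it's worth handling the 429 instead of crashing on it. A sketch of a fetch wrapper with backoff (the function name and retry policy are my own; the `session` parameter exists so the logic can be exercised without hitting Reddit):

```python
import time
import requests

def get_json(url, params=None, max_retries=3, session=None):
    """GET a Reddit .json URL, backing off and retrying on 429 responses."""
    s = session or requests.Session()
    headers = {"User-Agent": "my-script/1.0"}
    for attempt in range(max_retries + 1):
        resp = s.get(url, headers=headers, params=params)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # honor Retry-After if present, else back off exponentially: 2, 4, 8s
        wait = int(resp.headers.get("Retry-After", 2 ** (attempt + 1)))
        time.sleep(wait)
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```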
## What You Can't Do with curl
The .json endpoints handle simple use cases well. But they hit walls fast:
- Pagination depth: Reddit caps browsable history at ~1,000 posts per listing. You can't cursor through an entire subreddit's history.
- Date filtering: No native parameter to fetch "posts from March 2026." You'd need to paginate and filter client-side—and you'll hit the 1,000-post ceiling before getting far.
- Cross-subreddit monitoring: Watching 50 subreddits for keyword mentions requires 50+ requests per cycle, burning through rate limits quickly.
- Scheduled runs: Building cron jobs with retry logic, proxy rotation, and data storage around curl scripts is reinventing the wheel.
- Deleted/removed content: The JSON endpoints only return currently visible content.
## Scaling Up: Automated Reddit Scraping
For production use cases—continuous brand monitoring, building research datasets, tracking trends across multiple subreddits—purpose-built tools handle the infrastructure:
- Automatic pagination beyond Reddit's cursor limits
- Keyword monitoring across multiple subreddits
- Scheduled runs with configurable intervals
- Structured output (JSON, CSV) ready for analysis
- Proxy rotation and rate limit management
Check out our Reddit Scraper on Apify Store for when curl isn't enough.
## Quick Reference

| Use Case | URL Pattern |
|---|---|
| Subreddit posts | `/r/{sub}/{sort}.json?limit=100` |
| Search subreddit | `/r/{sub}/search.json?q={query}&restrict_sr=on` |
| Search all Reddit | `/search.json?q={query}` |
| Post comments | `/r/{sub}/comments/{id}.json` |
| User's posts | `/user/{username}/submitted.json` |
| User's comments | `/user/{username}/comments.json` |
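The user endpoints follow the same pattern as the subreddit ones. A small URL builder sketched from the table above (the helper name `user_url` is my own):

```python
BASE = "https://www.reddit.com"

def user_url(username, kind="submitted"):
    """Build the .json URL for a user's posts ('submitted') or 'comments'."""
    if kind not in {"submitted", "comments"}:
        raise ValueError("kind must be 'submitted' or 'comments'")
    return f"{BASE}/user/{username}/{kind}.json"

print(user_url("spez"))  # https://www.reddit.com/user/spez/submitted.json
```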
Reddit's .json endpoints are the simplest way to get structured data from the platform. No libraries, no API registration, no OAuth flow. For quick scripts and one-off data pulls, they're all you need. For anything larger, pair them with proper infrastructure and respect Reddit's rate limits.