DEV Community

agenthustler
Scraping Reddit in 2026: How to Use the Public JSON API (No Scraper Needed)

Reddit hosts 2.8B+ monthly visits across 100,000+ active communities. Whether you're tracking brand mentions, monitoring industry trends, or building datasets, Reddit's data is invaluable.

Here's the secret most developers miss: every public Reddit URL has a .json version. No API keys. No OAuth. No scraper libraries. Just append .json to any URL.

The .json Trick

Take any Reddit URL:

https://www.reddit.com/r/Python/new/

Append .json:

https://www.reddit.com/r/Python/new.json

That's it. You get structured JSON with posts, scores, timestamps, comment counts, and more.

Basic Usage with curl

curl 'https://www.reddit.com/r/Python/new.json?limit=25' \
  -H 'User-Agent: my-script/1.0'

Important: Reddit aggressively throttles requests that lack a User-Agent header. Always set a descriptive one, or you'll quickly start seeing 429 responses.

The response structure:

{
  "kind": "Listing",
  "data": {
    "children": [
      {
        "kind": "t3",
        "data": {
          "title": "Post title here",
          "author": "username",
          "score": 42,
          "num_comments": 7,
          "url": "https://...",
          "created_utc": 1741500000,
          "selftext": "Post body..."
        }
      }
    ],
    "after": "t3_abc123"
  }
}

Python: Fetching Subreddit Posts

import requests

def get_posts(subreddit, sort="new", limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json"
    headers = {"User-Agent": "my-script/1.0"}
    params = {"limit": limit}

    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()

    posts = resp.json()["data"]["children"]
    return [p["data"] for p in posts]

for post in get_posts("Python"):
    print(f"{post['score']:>5}  {post['title'][:80]}")

This works for any public subreddit. Change the sort parameter to hot, top, rising, or new.
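The top listing additionally accepts a t query parameter selecting the time window (hour, day, week, month, year, or all). A minimal sketch of that variant — get_top_posts and top_params are helper names of my own, not part of Reddit's API:

```python
import requests

VALID_WINDOWS = {"hour", "day", "week", "month", "year", "all"}

def top_params(t="week", limit=25):
    """Build query parameters for a /top listing, validating the time window."""
    if t not in VALID_WINDOWS:
        raise ValueError(f"invalid time window: {t}")
    return {"limit": limit, "t": t}

def get_top_posts(subreddit, t="week", limit=25):
    """Fetch a subreddit's top posts for the given time window."""
    url = f"https://www.reddit.com/r/{subreddit}/top.json"
    headers = {"User-Agent": "my-script/1.0"}
    resp = requests.get(url, headers=headers, params=top_params(t, limit))
    resp.raise_for_status()
    return [p["data"] for p in resp.json()["data"]["children"]]
```

Without t, Reddit defaults the top listing to a recent window, so pass t="all" explicitly when you want all-time top posts.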

Searching Reddit

Reddit's search also supports .json:

def search_subreddit(subreddit, query, sort="new", limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/search.json"
    headers = {"User-Agent": "my-script/1.0"}
    params = {
        "q": query,
        "sort": sort,
        "restrict_sr": "on",
        "limit": limit
    }

    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()

    posts = resp.json()["data"]["children"]
    return [p["data"] for p in posts]

results = search_subreddit("webdev", "scraping")
for r in results:
    print(f"{r['score']:>5}  {r['title'][:80]}")

The restrict_sr=on parameter limits search to the specified subreddit. Remove it to search all of Reddit.
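Site-wide search goes through /search.json at the domain root. As a sketch of both variants, a small parameter helper (search_params is a name of my own invention) makes the toggle explicit:

```python
import requests

def search_params(query, sort="new", limit=25, subreddit_only=True):
    """Build search query params; omitting restrict_sr widens the search site-wide."""
    params = {"q": query, "sort": sort, "limit": limit}
    if subreddit_only:
        params["restrict_sr"] = "on"
    return params

def search_all_reddit(query, sort="new", limit=25):
    """Search across all of Reddit via the site-wide /search.json endpoint."""
    resp = requests.get(
        "https://www.reddit.com/search.json",
        headers={"User-Agent": "my-script/1.0"},
        params=search_params(query, sort, limit, subreddit_only=False),
    )
    resp.raise_for_status()
    return [p["data"] for p in resp.json()["data"]["children"]]
```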

Fetching Comments

Every post's comment thread is available as JSON:

def get_comments(subreddit, post_id):
    url = f"https://www.reddit.com/r/{subreddit}/comments/{post_id}.json"
    headers = {"User-Agent": "my-script/1.0"}

    resp = requests.get(url, headers=headers)
    resp.raise_for_status()

    data = resp.json()
    # data[0] = post info, data[1] = comment tree
    comments = data[1]["data"]["children"]
    return comments

# Example: get comments from a post
comments = get_comments("Python", "1abc2de")
for c in comments:
    if c["kind"] == "t1":  # t1 = comment
        d = c["data"]
        print(f"  {d['author']}: {d['body'][:100]}")

Comments are nested—each comment's replies field contains its child comments, forming a tree structure.
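One way to walk that tree is a small recursive flattener. A sketch with a hypothetical helper name (flatten_comments); it assumes, as the JSON shows in practice, that an empty replies field comes back as the empty string rather than a nested object, and it skips non-comment nodes such as "more" placeholders:

```python
def flatten_comments(children, depth=0):
    """Recursively walk a comment Listing, returning (depth, comment_data) pairs."""
    flat = []
    for child in children:
        if child.get("kind") != "t1":
            continue  # skip "more" placeholders and other non-comment nodes
        data = child["data"]
        flat.append((depth, data))
        replies = data.get("replies")
        if isinstance(replies, dict):  # empty replies arrive as "", not a dict
            flat.extend(flatten_comments(replies["data"]["children"], depth + 1))
    return flat
```

Paired with get_comments above, this turns the nested tree into an indented transcript: print two spaces per depth level before each body.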

Pagination with the after Cursor

Reddit returns a maximum of 100 posts per request. To get more, use the after parameter:

def get_all_posts(subreddit, sort="new", total=200):
    all_posts = []
    after = None
    headers = {"User-Agent": "my-script/1.0"}

    while len(all_posts) < total:
        params = {"limit": 100}
        if after:
            params["after"] = after

        url = f"https://www.reddit.com/r/{subreddit}/{sort}.json"
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()

        data = resp.json()["data"]
        posts = [p["data"] for p in data["children"]]

        if not posts:
            break

        all_posts.extend(posts)
        after = data.get("after")

        if not after:
            break

    return all_posts[:total]

posts = get_all_posts("Python", total=200)
print(f"Fetched {len(posts)} posts")

The after value is a fullname identifier (like t3_abc123) pointing to the last item in the current page. Reddit uses this cursor-based approach instead of page numbers.

Rate Limits

Reddit's public JSON endpoints enforce rate limits:

  • ~60 requests per minute for unauthenticated access
  • Add 1-2 second delays between requests in loops
  • Always set a descriptive User-Agent header

For example, pausing between subreddits in a loop:

import time

for subreddit in ["Python", "webdev", "datascience"]:
    posts = get_posts(subreddit)
    print(f"r/{subreddit}: {len(posts)} posts")
    time.sleep(2)  # respect rate limits

Exceeding the limit returns 429 Too Many Requests. Reddit may temporarily block your IP if you consistently ignore rate limits.
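A simple way to recover from the occasional 429 is retrying with exponential backoff. A minimal sketch — backoff_delays and get_with_retry are made-up helper names, and the delay schedule is an arbitrary choice, not anything Reddit specifies:

```python
import time
import requests

def backoff_delays(retries=4, base=2):
    """Exponential backoff schedule in seconds: 2, 4, 8, 16 for the defaults."""
    return [base ** (i + 1) for i in range(retries)]

def get_with_retry(url, params=None, retries=4):
    """GET a Reddit JSON URL, backing off and retrying on 429 responses."""
    headers = {"User-Agent": "my-script/1.0"}
    for delay in backoff_delays(retries):
        resp = requests.get(url, headers=headers, params=params)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(delay)  # throttled: wait, then retry
    resp.raise_for_status()  # still 429 after all retries: surface the error
```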

What You Can't Do with curl

The .json endpoints handle simple use cases well. But they hit walls fast:

  • Pagination depth: Reddit caps browsable history at ~1,000 posts per listing. You can't cursor through an entire subreddit's history.
  • Date filtering: No native parameter to fetch "posts from March 2026." You'd need to paginate and filter client-side—and you'll hit the 1,000-post ceiling before getting far.
  • Cross-subreddit monitoring: Watching 50 subreddits for keyword mentions requires 50+ requests per cycle, burning through rate limits quickly.
  • Scheduled runs: Building cron jobs with retry logic, proxy rotation, and data storage around curl scripts is reinventing the wheel.
  • Deleted/removed content: The JSON endpoints only return currently visible content.
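Within that ~1,000-post ceiling you can still filter by date client-side, since every post carries a created_utc epoch timestamp. A small sketch (posts_between is a hypothetical helper, meant to run over the output of get_all_posts above):

```python
from datetime import datetime, timezone

def posts_between(posts, start, end):
    """Keep posts whose created_utc falls in the half-open window [start, end)."""
    lo, hi = start.timestamp(), end.timestamp()
    return [p for p in posts if lo <= p["created_utc"] < hi]

# Usage: posts_between(get_all_posts("Python", total=1000),
#                      datetime(2026, 3, 1, tzinfo=timezone.utc),
#                      datetime(2026, 4, 1, tzinfo=timezone.utc))
```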

Scaling Up: Automated Reddit Scraping

For production use cases—continuous brand monitoring, building research datasets, tracking trends across multiple subreddits—purpose-built tools handle the infrastructure:

  • Automatic pagination beyond Reddit's cursor limits
  • Keyword monitoring across multiple subreddits
  • Scheduled runs with configurable intervals
  • Structured output (JSON, CSV) ready for analysis
  • Proxy rotation and rate limit management

Check out our Reddit Scraper on Apify Store for when curl isn't enough.

Quick Reference

| Use Case | URL Pattern |
| --- | --- |
| Subreddit posts | `/r/{sub}/{sort}.json?limit=100` |
| Search subreddit | `/r/{sub}/search.json?q={query}&restrict_sr=on` |
| Search all Reddit | `/search.json?q={query}` |
| Post comments | `/r/{sub}/comments/{id}.json` |
| User's posts | `/user/{username}/submitted.json` |
| User's comments | `/user/{username}/comments.json` |
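The user listings in the table work just like subreddit listings. A hedged sketch — user_listing_url and get_user_listing are my own names, not Reddit's:

```python
import requests

def user_listing_url(username, kind="submitted"):
    """Build the URL for a user's posts ('submitted') or comments ('comments')."""
    if kind not in ("submitted", "comments"):
        raise ValueError("kind must be 'submitted' or 'comments'")
    return f"https://www.reddit.com/user/{username}/{kind}.json"

def get_user_listing(username, kind="submitted", limit=25):
    """Fetch a user's recent posts or comments as a list of data dicts."""
    resp = requests.get(
        user_listing_url(username, kind),
        headers={"User-Agent": "my-script/1.0"},
        params={"limit": limit},
    )
    resp.raise_for_status()
    return [p["data"] for p in resp.json()["data"]["children"]]
```

Note that comment items come back as kind t1 (with a body field) while posts are t3 (with title and selftext), so downstream code should branch on which listing it requested.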

Reddit's .json endpoints are the simplest way to get structured data from the platform. No libraries, no API registration, no OAuth flow. For quick scripts and one-off data pulls, they're all you need. For anything larger, pair them with proper infrastructure and respect Reddit's rate limits.
