Reddit hosts 2.8B+ monthly visits across 100,000+ active communities. Whether you're tracking brand mentions, monitoring industry trends, or building datasets, Reddit's data is invaluable.
Here's the secret most developers miss: every public Reddit URL has a .json version. No API keys. No OAuth. No scraper libraries. Just append .json to any URL.
## The .json Trick
Take any Reddit URL:

```
https://www.reddit.com/r/Python/new/
```

Append `.json`:

```
https://www.reddit.com/r/Python/new.json
```

That's it. You get structured JSON with posts, scores, timestamps, comment counts, and more.
## Basic Usage with curl

```bash
curl 'https://www.reddit.com/r/Python/new.json?limit=25' \
  -H 'User-Agent: my-script/1.0'
```

**Important:** Reddit blocks requests that omit the `User-Agent` header. Always set one, or you'll get a `429 Too Many Requests` response.
The response structure:

```json
{
  "kind": "Listing",
  "data": {
    "children": [
      {
        "kind": "t3",
        "data": {
          "title": "Post title here",
          "author": "username",
          "score": 42,
          "num_comments": 7,
          "url": "https://...",
          "created_utc": 1741500000,
          "selftext": "Post body..."
        }
      }
    ],
    "after": "t3_abc123"
  }
}
```
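Pulling the post objects out of that listing takes only a couple of lines. A minimal sketch, using a sample dict shaped like the response above (the helper name `extract_posts` is my own):

```python
def extract_posts(listing):
    """Pull the post dicts out of a Reddit Listing response."""
    return [child["data"] for child in listing["data"]["children"]]

# Sample shaped like the Listing response shown above
sample = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "Post title here", "score": 42}}
        ],
        "after": "t3_abc123",
    },
}

posts = extract_posts(sample)
print(posts[0]["title"])  # Post title here
```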
## Python: Fetching Subreddit Posts

```python
import requests

def get_posts(subreddit, sort="new", limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json"
    headers = {"User-Agent": "my-script/1.0"}
    params = {"limit": limit}
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    posts = resp.json()["data"]["children"]
    return [p["data"] for p in posts]

for post in get_posts("Python"):
    print(f"{post['score']:>5} {post['title'][:80]}")
```

This works for any public subreddit. Change the `sort` parameter to `hot`, `top`, `rising`, or `new`.
## Searching Reddit

Reddit's search also supports `.json`:

```python
def search_subreddit(subreddit, query, sort="new", limit=25):
    url = f"https://www.reddit.com/r/{subreddit}/search.json"
    headers = {"User-Agent": "my-script/1.0"}
    params = {
        "q": query,
        "sort": sort,
        "restrict_sr": "on",
        "limit": limit,
    }
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    posts = resp.json()["data"]["children"]
    return [p["data"] for p in posts]

results = search_subreddit("webdev", "scraping")
for r in results:
    print(f"{r['score']:>5} {r['title'][:80]}")
```

The `restrict_sr=on` parameter limits the search to the specified subreddit. Remove it to search all of Reddit.
## Fetching Comments

Every post's comment thread is available as JSON:

```python
def get_comments(subreddit, post_id):
    url = f"https://www.reddit.com/r/{subreddit}/comments/{post_id}.json"
    headers = {"User-Agent": "my-script/1.0"}
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    data = resp.json()
    # data[0] = post info, data[1] = comment tree
    return data[1]["data"]["children"]

# Example: get comments from a post
comments = get_comments("Python", "1abc2de")
for c in comments:
    if c["kind"] == "t1":  # t1 = comment
        d = c["data"]
        print(f" {d['author']}: {d['body'][:100]}")
```

Comments are nested: each comment's `replies` field contains its child comments, forming a tree structure.
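Walking that tree is a natural fit for recursion. A sketch (the helper name `flatten_comments` is my own); note that `replies` is an empty string when a comment has no children and a nested Listing dict otherwise:

```python
def flatten_comments(children, depth=0):
    """Recursively walk a Reddit comment tree, yielding (depth, comment) pairs."""
    for child in children:
        if child["kind"] != "t1":  # skip "more" stubs and non-comments
            continue
        data = child["data"]
        yield depth, data
        replies = data.get("replies")
        # replies is "" for leaf comments, a Listing dict otherwise
        if isinstance(replies, dict):
            yield from flatten_comments(replies["data"]["children"], depth + 1)

# Example on a synthetic tree shaped like Reddit's response
tree = [
    {"kind": "t1", "data": {"author": "alice", "body": "parent", "replies": {
        "kind": "Listing", "data": {"children": [
            {"kind": "t1", "data": {"author": "bob", "body": "child", "replies": ""}}
        ]}
    }}}
]
for depth, c in flatten_comments(tree):
    print("  " * depth + f"{c['author']}: {c['body']}")
# alice: parent
#   bob: child
```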
## Pagination with the after Cursor

Reddit returns a maximum of 100 posts per request. To get more, use the `after` parameter:

```python
def get_all_posts(subreddit, sort="new", total=200):
    all_posts = []
    after = None
    headers = {"User-Agent": "my-script/1.0"}
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json"
    while len(all_posts) < total:
        params = {"limit": 100}
        if after:
            params["after"] = after
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        data = resp.json()["data"]
        posts = [p["data"] for p in data["children"]]
        if not posts:
            break
        all_posts.extend(posts)
        after = data.get("after")
        if not after:
            break
    return all_posts[:total]

posts = get_all_posts("Python", total=200)
print(f"Fetched {len(posts)} posts")
```

The `after` value is a fullname identifier (like `t3_abc123`) pointing to the last item in the current page. Reddit uses this cursor-based approach instead of page numbers.
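A fullname is just the kind prefix (`t3` for posts, `t1` for comments) joined to the item's `id`, so you can also construct a cursor yourself, for example to resume a crawl from a post you already have. A tiny sketch (the helper name is my own):

```python
def fullname(post, kind="t3"):
    """Build a fullname cursor (kind prefix + id) from a post or comment dict."""
    return f"{kind}_{post['id']}"

print(fullname({"id": "abc123"}))  # t3_abc123
```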
## Rate Limits

Reddit's public JSON endpoints enforce rate limits:

- ~60 requests per minute for unauthenticated access
- Add 1-2 second delays between requests in loops
- Always set a descriptive `User-Agent` header

```python
import time

for subreddit in ["Python", "webdev", "datascience"]:
    posts = get_posts(subreddit)
    print(f"r/{subreddit}: {len(posts)} posts")
    time.sleep(2)  # respect rate limits
```

Exceeding the limit returns `429 Too Many Requests`, and Reddit may temporarily block your IP if you consistently ignore rate limits.
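For longer-running scripts, it's worth handling the 429 instead of crashing on it. A sketch of a fetch wrapper with backoff (the function name and retry policy are my own; the `session` parameter exists so the logic can be exercised without hitting Reddit):

```python
import time
import requests

def get_json(url, params=None, max_retries=3, session=None):
    """GET a Reddit .json URL, backing off and retrying on 429 responses."""
    s = session or requests.Session()
    headers = {"User-Agent": "my-script/1.0"}
    for attempt in range(max_retries + 1):
        resp = s.get(url, headers=headers, params=params)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # honor Retry-After if present, else back off exponentially: 2, 4, 8s
        wait = int(resp.headers.get("Retry-After", 2 ** (attempt + 1)))
        time.sleep(wait)
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```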
## What You Can't Do with curl
The .json endpoints handle simple use cases well. But they hit walls fast:
- Pagination depth: Reddit caps browsable history at ~1,000 posts per listing. You can't cursor through an entire subreddit's history.
- Date filtering: No native parameter to fetch "posts from March 2026." You'd need to paginate and filter client-side—and you'll hit the 1,000-post ceiling before getting far.
- Cross-subreddit monitoring: Watching 50 subreddits for keyword mentions requires 50+ requests per cycle, burning through rate limits quickly.
- Scheduled runs: Building cron jobs with retry logic, proxy rotation, and data storage around curl scripts is reinventing the wheel.
- Deleted/removed content: The JSON endpoints only return currently visible content.
## Scaling Up: Automated Reddit Scraping
For production use cases—continuous brand monitoring, building research datasets, tracking trends across multiple subreddits—purpose-built tools handle the infrastructure:
- Automatic pagination beyond Reddit's cursor limits
- Keyword monitoring across multiple subreddits
- Scheduled runs with configurable intervals
- Structured output (JSON, CSV) ready for analysis
- Proxy rotation and rate limit management
Check out our Reddit Scraper on Apify Store for when curl isn't enough.
## Quick Reference

| Use Case | URL Pattern |
|---|---|
| Subreddit posts | `/r/{sub}/{sort}.json?limit=100` |
| Search subreddit | `/r/{sub}/search.json?q={query}&restrict_sr=on` |
| Search all Reddit | `/search.json?q={query}` |
| Post comments | `/r/{sub}/comments/{id}.json` |
| User's posts | `/user/{username}/submitted.json` |
| User's comments | `/user/{username}/comments.json` |
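The user endpoints follow the same pattern as the subreddit ones. A small URL builder sketched from the table above (the helper name `user_url` is my own):

```python
BASE = "https://www.reddit.com"

def user_url(username, kind="submitted"):
    """Build the .json URL for a user's posts ('submitted') or 'comments'."""
    if kind not in {"submitted", "comments"}:
        raise ValueError("kind must be 'submitted' or 'comments'")
    return f"{BASE}/user/{username}/{kind}.json"

print(user_url("spez"))  # https://www.reddit.com/user/spez/submitted.json
```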
Reddit's .json endpoints are the simplest way to get structured data from the platform. No libraries, no API registration, no OAuth flow. For quick scripts and one-off data pulls, they're all you need. For anything larger, pair them with proper infrastructure and respect Reddit's rate limits.