Reddit ended free, unrestricted API access in July 2023. What used to be a simple PRAW call now requires OAuth approval that can take weeks, rate limits that make bulk collection impractical, and pricing that starts at $0.24 per 1,000 API calls.
But Reddit's data is still public. And there are still ways to collect it — legally, reliably, and at scale. Here's what actually works in 2026.
Method 1: Reddit's Hidden JSON Endpoints
This is the best-kept secret in web scraping. Reddit serves JSON for every single page. Just append .json to any URL:
https://www.reddit.com/r/technology/top.json?t=week&limit=25
No API key. No OAuth. No approval process. Just raw JSON.
Here's a working Python example:
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
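As a rough illustration of the request described above, here is a minimal sketch that uses only Python's standard library. The helper names (`build_listing_url`, `fetch_listing`, `extract_posts`) and the User-Agent string are assumptions made up for this example, and the field names follow the listing layout Reddit currently returns (`data.children[].data`), which may change.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical User-Agent for this sketch; use one that identifies your script.
USER_AGENT = "example-reddit-reader/0.1"


def build_listing_url(subreddit, sort="top", t="week", limit=25, after=None):
    """Build the .json URL for a subreddit listing."""
    params = {"t": t, "limit": limit}
    if after:
        params["after"] = after
    query = urllib.parse.urlencode(params)
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?{query}"


def fetch_listing(url, timeout=10):
    """Download one listing page and return the decoded JSON payload."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_posts(payload):
    """Flatten a listing payload into a list of small dicts."""
    posts = []
    for child in payload.get("data", {}).get("children", []):
        data = child.get("data", {})
        posts.append({
            "id": data.get("id"),
            "title": data.get("title"),
            "score": data.get("score"),
            "permalink": data.get("permalink"),
        })
    return posts
```

Calling `extract_posts(fetch_listing(build_listing_url("technology")))` should return the week's top posts as plain dictionaries, assuming the endpoint responds as described.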
Pagination works with the after parameter:
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
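One way to follow the `after` cursor is sketched below. Each listing response carries an `after` value in its `data` object; passing it back as a query parameter requests the next page, and a null value means there is nothing left. The names `fetch_page` and `iter_posts`, the `/new.json` listing, and the 2-second pause are illustrative choices for this sketch, not a prescribed implementation.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_page(subreddit, after=None, limit=100):
    """Fetch one page of /new.json; `after` is the cursor from the previous page."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    url = f"https://www.reddit.com/r/{subreddit}/new.json?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": "example-reddit-reader/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def iter_posts(get_page, max_pages=5, delay=2.0, sleep=time.sleep):
    """Yield post dicts page by page, following the `after` cursor.

    `get_page` is any callable that takes a cursor (or None) and returns a
    listing payload, which keeps the loop testable without network access.
    """
    after = None
    for page_number in range(max_pages):
        payload = get_page(after)
        listing = payload.get("data", {})
        for child in listing.get("children", []):
            yield child["data"]
        after = listing.get("after")
        if not after:  # a null cursor means there are no more pages
            break
        if page_number < max_pages - 1:
            sleep(delay)  # pause between pages to stay under the rate limit
```

For a live run, something like `iter_posts(lambda cursor: fetch_page("technology", cursor))` ties the two functions together.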
Limitations: Reddit rate-limits these endpoints aggressively. You'll get 429 errors after ~60 requests per minute from a single IP. For casual scraping, this is fine. For anything bigger, you need Method 2.
Method 2: Proxy Rotation for Scale
The JSON endpoint works — until Reddit recognizes your IP. The fix is rotating residential proxies.
ScraperAPI handles this automatically: proxy rotation, CAPTCHA solving, and retry logic in a single API call.
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
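A hedged sketch of routing the same JSON request through ScraperAPI follows. It assumes ScraperAPI's query-string interface (an `api_key` and a target `url` sent to `api.scraperapi.com`, with an optional `country_code`); confirm the parameter names against their current documentation before relying on them. The function names and the 70-second timeout are choices made for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against ScraperAPI's documentation.
SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"


def build_scraperapi_url(api_key, target_url, country_code=None):
    """Wrap a target URL in a ScraperAPI request URL."""
    params = {"api_key": api_key, "url": target_url}
    if country_code:
        params["country_code"] = country_code  # optional geotargeting
    return SCRAPERAPI_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_via_scraperapi(api_key, target_url, timeout=70):
    """Fetch a Reddit .json URL through the proxy service and decode it."""
    proxied_url = build_scraperapi_url(api_key, target_url)
    # Proxied requests can be slow, hence the generous timeout.
    with urllib.request.urlopen(proxied_url, timeout=timeout) as response:
        return json.load(response)
```

The target URL is percent-encoded by `urlencode`, so its own query string (`?t=week&limit=25`) survives the round trip intact.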
With ScraperAPI, you get:
- 40M+ residential IPs, which makes IP-based blocking much harder
- Automatic retries on failures
- Geotargeting if you need location-specific results
- Free tier with 5,000 API credits to test
This is the move when you need 1,000+ posts or are scraping continuously.
If you'd rather manage proxies yourself, ThorData offers residential proxy access that works with any HTTP client — just set the proxy in your requests.get() call. Useful if you want more control over rotation logic or need proxies for multiple targets beyond Reddit.
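If you manage rotation yourself, a simple round-robin is a reasonable starting point. The sketch below uses placeholder proxy URLs (the hosts, ports, and credentials are hypothetical, not real ThorData values) and builds the same `{"http": ..., "https": ...}` mapping that `requests.get()` accepts through its `proxies` argument.

```python
import itertools
import urllib.request

# Hypothetical proxy URLs; substitute the host, port, and credentials
# from your provider's dashboard.
PROXY_URLS = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
]


def proxy_cycle(proxy_urls):
    """Yield proxy mappings in round-robin order, forever."""
    for proxy_url in itertools.cycle(proxy_urls):
        yield {"http": proxy_url, "https": proxy_url}


def opener_for(proxies):
    """Build a urllib opener that routes traffic through the given proxy."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

With requests, each mapping from `proxy_cycle(PROXY_URLS)` can be passed directly as `requests.get(url, proxies=mapping)`; with the standard library, `opener_for(mapping).open(url)` does the equivalent.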
Method 3: Pre-Built Scrapers (Zero Code)
If you don't want to write code at all, Apify's Reddit Scraper handles everything — pagination, rate limits, proxy rotation, structured output.
You configure it with a subreddit URL, set the number of posts, and it exports clean JSON or CSV. It's useful for one-off data collection, market research, or feeding data into an analysis pipeline.
You can also call it programmatically:
from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("your_apify_token")

# Start the actor and wait for the run to finish
run = client.actor("cryptosignals/reddit-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.reddit.com/r/technology/"}],
        "maxItems": 500,
        "sort": "top",
        "time": "month",
    }
)

# Read the scraped posts from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["score"])
Complete Example: Monitor r/technology Daily
Here's a production-ready script that scrapes daily, deduplicates, and saves to CSV:
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
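For readers who want to build the pipeline themselves, here is a minimal sketch of the fetch, deduplicate, and save steps. It is an illustration under stated assumptions rather than the script referred to above: the column list in `FIELDS`, the function names, and the choice of the `top.json?t=day` listing are all made up for this example. Deduplication keys on the post `id`.

```python
import csv
import json
import os
import urllib.request

# Columns kept in the CSV; an assumption for this sketch.
FIELDS = ["id", "title", "score", "created_utc", "permalink"]


def fetch_posts(subreddit="technology", limit=100):
    """Fetch today's top posts from the public .json listing."""
    url = f"https://www.reddit.com/r/{subreddit}/top.json?t=day&limit={limit}"
    request = urllib.request.Request(url, headers={"User-Agent": "example-reddit-monitor/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return [child["data"] for child in payload["data"]["children"]]


def load_seen_ids(csv_path):
    """Return the set of post IDs already stored in the CSV (empty if no file)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row["id"] for row in csv.DictReader(handle)}


def append_new_posts(csv_path, posts):
    """Append posts whose IDs are not yet in the CSV; return how many were added."""
    seen = load_seen_ids(csv_path)
    write_header = not os.path.exists(csv_path)
    added = 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops the many fields we do not store
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        for post in posts:
            if post["id"] in seen:
                continue  # already saved on an earlier run
            writer.writerow(post)
            seen.add(post["id"])
            added += 1
    return added
```

A script whose last line is `append_new_posts("technology.csv", fetch_posts())` is enough for a daily job; because the CSV itself is the record of what has been seen, reruns on the same day add nothing.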
Run this with cron once a day and you've got a free Reddit monitoring pipeline.
Anti-Bot Tips
Reddit's anti-scraping has gotten smarter. Here's how to avoid detection:
1. Rotate User-Agents
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101",
]

headers = {"User-Agent": random.choice(USER_AGENTS)}
2. Rate limit yourself: send at most one request every 2 seconds. Reddit tracks request patterns.
3. Respect 429s — Back off exponentially:
# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
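Exponential backoff on 429 responses could look like the sketch below, which doubles the wait after each rate-limited attempt up to a cap. The names `backoff_delay` and `call_with_backoff`, the 2-second base, and the 60-second cap are assumptions for this example. It wraps any zero-argument callable that raises `urllib.error.HTTPError`, as `urllib.request.urlopen` does on a 429.

```python
import time
import urllib.error


def backoff_delay(attempt, base=2.0, cap=60.0):
    """Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(call, max_retries=5, base=2.0, cap=60.0, sleep=time.sleep):
    """Run `call()`, retrying with exponential backoff while it raises HTTP 429."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or if retries are exhausted.
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(backoff_delay(attempt, base, cap))
```

With the defaults, the waits are 2, 4, 8, 16, and 32 seconds before the final attempt. Adding a small random jitter to each delay is a common refinement when several workers share one IP.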
4. Use sessions — requests.Session() reuses TCP connections and looks more like a real browser.
5. Don't scrape logged-in pages — Stick to public endpoints. Scraping behind auth violates Reddit's TOS.
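The self-imposed rate limit from tip 2 can be enforced with a small helper like the one sketched here. The `Throttle` class is a hypothetical name for this example; the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls to wait()."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create one `Throttle()` per scraper and call `throttle.wait()` immediately before every request. It pairs naturally with a single shared `requests.Session()` from tip 4.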
When to Use Each Method
| Method | Best For | Cost | Scale |
|---|---|---|---|
| JSON endpoints | Side projects, research, <1K posts | Free | Low |
| ScraperAPI + proxies | Production pipelines, daily collection | ~$49/mo | High |
| Apify pre-built | One-off exports, non-developers | Pay per use | Medium |
My recommendation: Start with Method 1. It's free and handles most use cases. When you hit rate limits consistently, add ScraperAPI for proxy rotation. Only go to Apify if you need a managed solution.
Key Takeaways
- Reddit's .json endpoints are still the easiest way to get structured data
- Always rotate User-Agents and respect rate limits
- For scale, proxy rotation is non-negotiable
- Save yourself time — deduplicate with post IDs, not URLs
- Stick to public data. Don't scrape anything that requires login
The code in this article is tested and working as of March 2026. Reddit changes things periodically, so if something breaks, check the response format first — the field names occasionally shift.
Building a scraping pipeline? I write about Python automation, web scraping, and developer tools. Follow for more practical guides.