Most Reddit scrapers parse HTML and break on every redesign. But Reddit has a public JSON API hiding in plain sight.
The Secret: Just Add .json
Append .json to the path of almost any Reddit URL (before the query string, if there is one):
https://www.reddit.com/r/programming/hot.json
https://www.reddit.com/search.json?q=web+scraping
https://www.reddit.com/r/programming/comments/abc123/title.json
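The URL rewrite can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the helper name `to_json_url` is made up for this post, and it assumes the only change needed is adding `.json` to the end of the path while leaving the query string alone.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Return the .json variant of a Reddit page URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Drop a trailing slash so "/r/programming/hot/" becomes "/r/programming/hot".
    path = parts.path.rstrip("/")
    # Avoid doubling the suffix if the URL already points at the JSON endpoint.
    if not path.endswith(".json"):
        path += ".json"
    # Query string and fragment are passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, `to_json_url("https://www.reddit.com/search?q=web+scraping")` gives back the search URL shown above, with `.json` placed before the `?`.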
What You Get
Structured JSON with 20+ fields per post:
- title, author, score, upvote_ratio
- num_comments, flair, awards
- selftext (full post body)
- url, domain, is_video, thumbnail
- created_utc, permalink
Plus full comment trees with nested replies.
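To show how those fields and the nested replies might be read, here is a rough Python sketch that works on an already-decoded response. It assumes the response shape I have seen from these endpoints: a Listing object whose `data.children` entries carry a `kind` (`t3` for posts, `t1` for comments) and a `data` dict, with a comment's `replies` being either another Listing or an empty string. The function names are mine, and the field list is only a subset.

```python
from typing import Any, Iterator

# A handful of the fields listed above; extend as needed.
POST_FIELDS = ("title", "author", "score", "num_comments", "created_utc", "permalink")


def extract_posts(listing: dict[str, Any]) -> list[dict[str, Any]]:
    """Pull selected fields out of every post in a decoded listing."""
    posts = []
    for child in listing["data"]["children"]:
        if child.get("kind") != "t3":  # assumption: "t3" marks a post
            continue
        data = child["data"]
        posts.append({name: data.get(name) for name in POST_FIELDS})
    return posts


def walk_comments(listing: dict[str, Any], depth: int = 0) -> Iterator[tuple[int, Any, Any]]:
    """Yield (depth, author, body) for each comment, depth-first."""
    for child in listing["data"]["children"]:
        if child.get("kind") != "t1":  # skips "more" placeholders and anything else
            continue
        data = child["data"]
        yield depth, data.get("author"), data.get("body")
        replies = data.get("replies")
        # Assumption: replies is "" when a comment has no replies, else a nested listing.
        if isinstance(replies, dict):
            yield from walk_comments(replies, depth + 1)
```

For a comments page, the decoded JSON is a two-element list in my experience: the first element is the listing holding the post and the second is the comment tree, so you would call `walk_comments(response[1])`.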
Why It's Better
- Never breaks on redesigns — JSON API is separate from the UI
- Complete data — fields not visible in the UI
- Faster — JSON is lighter than HTML
- Pagination: pass the after value from one response as the after parameter on the next request
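Pagination with `after` could look roughly like the sketch below. It assumes each listing response exposes the next cursor at `data.after` and that a missing or null cursor means there are no more pages. `iter_posts`, `fetch_json`, the `max_pages` cap, and the User-Agent string are all placeholders of mine, not part of Reddit's API.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterator
from urllib.parse import urlencode

USER_AGENT = "my-reddit-reader/0.1"  # placeholder; use something that identifies you


def fetch_json(url: str) -> dict[str, Any]:
    """Fetch and decode one JSON page."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def iter_posts(
    base_url: str,
    fetch: Callable[[str], dict[str, Any]] = fetch_json,
    max_pages: int = 3,
    delay: float = 1.0,
) -> Iterator[dict[str, Any]]:
    """Yield post data dicts page by page, following the after cursor."""
    after = None
    for _ in range(max_pages):
        url = base_url
        if after:
            # Use "&" if the base URL already has a query string (e.g. search.json?q=...).
            separator = "&" if "?" in base_url else "?"
            url = f"{base_url}{separator}{urlencode({'after': after})}"
        listing = fetch(url)
        for child in listing["data"]["children"]:
            yield child["data"]
        after = listing["data"].get("after")
        if not after:  # no cursor means this was the last page
            break
        time.sleep(delay)  # pause between pages to stay near 1 request per second
```

Calling `iter_posts("https://www.reddit.com/r/programming/hot.json")` would then walk up to three pages. The `fetch` argument is injectable so the loop can be exercised with canned responses instead of live requests.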
Caveats
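- Send a descriptive User-Agent header; requests with a generic or missing one tend to get throttled or rejected
- Rate limit: stay at or below 1 request per second
- Scraping from cloud servers usually needs a residential proxy, because Reddit blocks many datacenter IPs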
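The first two caveats can be handled with a small amount of code. The following is one possible sketch using only the standard library: `Throttle`, `build_request`, and the User-Agent string are illustrative names and values I picked, and the 1-second interval simply mirrors the limit mentioned above. Proxy setup is left out.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# Placeholder value; put your own app name and contact details here.
USER_AGENT = "my-reddit-reader/0.1 (contact: you@example.com)"


class Throttle:
    """Block so that consecutive calls to wait() are at least min_interval apart."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Attach an explicit User-Agent instead of urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url: str, throttle: Throttle) -> Any:
    """Wait for the throttle, then fetch and decode one JSON document."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return json.load(response)
```

Create one `Throttle()` and pass it to every `fetch_json` call so the spacing applies across all requests, not per URL.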
I built a Reddit scraper based on this approach — free on Apify Store (search knotless_cadence). But the API is simple enough to use directly.