Why my Reddit scraper went from 92% to 61% success rate in 30 days (and the one-line fix)

#webscraping #reddit #apify #showdev

Update (June 2026): this story now has a sequel. Reddit shut down the public .json endpoints entirely - universal 403, datacenter AND residential. The fix below stopped mattering overnight. The scraper survived by falling back to Reddit's RSS feeds (yes, RSS, like it's 2008) behind a circuit breaker, with degraded fields tagged honestly as source: rss-fallback. Lesson upgraded: the platform doesn't send you a deprecation notice - your failure-rate chart does.
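
A minimal sketch of the circuit-breaker-plus-fallback idea, not the actor's actual code. The names `CircuitBreaker`, `fetchPosts`, `fetchJson` and `fetchRss`, the threshold, the cooldown, and the `source: 'json'` tag are all assumptions for illustration; only the `rss-fallback` tag comes from the story above.

```javascript
// After `threshold` consecutive primary failures, skip the primary source
// for `cooldownMs` and go straight to the fallback.
class CircuitBreaker {
  constructor({ threshold = 3, cooldownMs = 60_000, now = Date.now } = {}) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;
  }

  isOpen() {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: allow one trial call to the primary (half-open).
      // A single further failure re-opens the breaker.
      this.openedAt = null;
      this.failures = this.threshold - 1;
      return false;
    }
    return true;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}

// `fetchJson` and `fetchRss` are hypothetical async functions that each
// resolve to an array of post objects.
async function fetchPosts(breaker, fetchJson, fetchRss) {
  if (!breaker.isOpen()) {
    try {
      const posts = await fetchJson();
      breaker.recordSuccess();
      return posts.map((p) => ({ ...p, source: 'json' }));
    } catch (err) {
      breaker.recordFailure();
    }
  }
  // Degraded path: tag every item so consumers know which fields to trust.
  const posts = await fetchRss();
  return posts.map((p) => ({ ...p, source: 'rss-fallback' }));
}
```

The point of the breaker is that once the primary is clearly dead, runs stop paying for doomed requests and go straight to the degraded source until the cooldown expires.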


I publish a small Reddit scraper actor on the Apify Store. It was my most-used actor: ~$30/mo in revenue, 70+ unique users per month, 92% success rate.

A month later, the success rate had collapsed to 61%. The actor was failing on nearly 40% of runs. Users stopped coming back. Revenue dropped to almost zero.

Here's what happened and the one-line fix that brought it back.

The symptom

Every failed run looked like this in the logs:

2026-05-12T05:59:31  HTTP 403 (proxy IP likely blocked). Rotating proxy and retrying (2/6)...
2026-05-12T05:59:34  HTTP 403 (proxy IP likely blocked). Rotating proxy and retrying (3/6)...
2026-05-12T05:59:37  HTTP 403 (proxy IP likely blocked). Rotating proxy and retrying (4/6)...
2026-05-12T05:59:39  HTTP 403 (proxy IP likely blocked). Rotating proxy and retrying (5/6)...
2026-05-12T05:59:42  HTTP 403 (proxy IP likely blocked). Rotating proxy and retrying (6/6)...
Failed r/SideProject: Reddit blocked request after 6 attempts (HTTP 403).

Every proxy IP I could rotate to was getting 403'd. All six retry attempts, every subreddit. The retries weren't broken - Reddit was just blocking the entire proxy pool.

Why this happened

Apify's residential proxy pool gets shared across all customers running scrapers. If a few popular Reddit scrapers run thousands of requests per hour, Reddit eventually fingerprints those IPs and blocks them at the edge.

I checked: my retry logic already rotated proxies on every retry, used a varied set of User-Agent strings, and respected rate-limit hints. None of it mattered. The proxies were the bottleneck and I couldn't get to fresh IPs because every proxy in the pool was already burned.
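
For readers who haven't written one, a rotate-on-retry loop looks roughly like the sketch below. It is an illustration, not the actor's code: `fetchWithRotation` and the injected `fetchViaProxy(url, sessionId)` helper are hypothetical names, and the session id is just a stand-in for "ask the proxy provider for a different IP".

```javascript
// Retry loop that requests a fresh proxy session on every attempt.
// `fetchViaProxy(url, sessionId)` is a hypothetical helper that resolves to
// an object with a numeric `status` (for example a fetch Response).
async function fetchWithRotation(url, fetchViaProxy, maxAttempts = 6) {
  let lastStatus = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A new session id per attempt stands in for rotating to a new IP.
    const suffix = Math.random().toString(36).slice(2, 10);
    const sessionId = `session_${attempt}_${suffix}`;

    const res = await fetchViaProxy(url, sessionId);
    if (res.status !== 403) return res;

    lastStatus = res.status;
    if (attempt < maxAttempts) {
      console.log(
        `HTTP 403 (proxy IP likely blocked). Rotating proxy and retrying (${attempt + 1}/${maxAttempts})...`
      );
    }
  }
  throw new Error(
    `Reddit blocked request after ${maxAttempts} attempts (HTTP ${lastStatus}).`
  );
}
```

The loop itself is fine. If every IP it can rotate to is already blocked, all six attempts fail the same way, which is exactly what the logs above show.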

What surprised me was how fast it happened. 30 days from 92% to 61%. No code change on my side. Just IP reputation decay against www.reddit.com.

The fix that worked

Change one hostname.

- const BASE = 'https://www.reddit.com';
+ const BASE = 'https://old.reddit.com';

old.reddit.com serves the same JSON API: same routes, same response shape. But its bot-detection thresholds turned out to be far more permissive - apparently because most legit human traffic moved to the redesigned www.reddit.com years ago, so the old subdomain's load is lower and the WAF rules are looser.
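
To make "same routes, same response shape" concrete, here is a small sketch of building a listing URL from `BASE` and flattening the response. The `/r/<subreddit>/new.json` route, the `limit` of 100 and the selected fields are assumptions for illustration (the post doesn't say which route or fields the actor uses); the `data.children[].data` nesting and the `t3` kind for posts are how Reddit's listing JSON is structured.

```javascript
const BASE = 'https://old.reddit.com';

// Build a listing URL. Swapping BASE is the only change the host fix needs.
function listingUrl(subreddit, { sort = 'new', limit = 100, after = null } = {}) {
  const url = new URL(`/r/${encodeURIComponent(subreddit)}/${sort}.json`, BASE);
  url.searchParams.set('limit', String(limit));
  if (after) url.searchParams.set('after', after); // pagination cursor
  return url.toString();
}

// Flatten a listing response into plain post objects.
// Children of kind "t3" are posts; anything else is skipped.
function extractPosts(listing) {
  const children = listing?.data?.children ?? [];
  return children
    .filter((child) => child.kind === 't3')
    .map(({ data }) => ({
      id: data.id,
      title: data.title,
      permalink: data.permalink,
      score: data.score,
    }));
}
```

Because the host lives in one constant, the parsing code never needs to know which subdomain answered.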

After deploying the change to my actor (v0.1.18), the very next test run looked like this:

2026-05-17T14:57:43.069  Starting Reddit scraper: 1 subreddits...
2026-05-17T14:57:43.545  Fetching r/test page 1 (0 so far)...
2026-05-17T14:57:48.351  Done r/test: extracted 5 posts (saw 100 raw children).
2026-05-17T14:57:48.353  Reddit scraper finished. Pushed 5 items. 0 target(s) failed.

Zero retries. Zero 403s. Five posts in 4.8 seconds.

What else I added

While I was in the file, I also added:

Browser-like header set: Sec-Ch-Ua, Sec-Ch-Ua-Mobile, Sec-Ch-Ua-Platform, Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site, Accept-Encoding, Referer. Reddit appears to fingerprint on these, so an unusual combination or a missing header is a tell.
300-900ms jitter before each request: prevents the pattern-matching that catches scrapers running at a perfectly steady rate.
Two more User-Agent strings (Firefox + Safari) on top of the three Chrome variants, so the UA-rotation doesn't degenerate into the same Chrome string 90% of the time.
Automatic fallback to www.reddit.com: if old.reddit.com returns 403 six times in a row, the actor switches the host and retries. (For now this never triggers, but it's there.)
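
The four additions above could be sketched roughly as follows. This is an illustration under assumptions, not the actor's code: the function names, the header values and the `{ family, value }` shape for User-Agent entries are all made up for the example. One deliberate choice in the sketch: the Sec-Ch-Ua client hints are only attached to Chrome User-Agents, since Firefox and Safari do not send them and the combination would itself look unusual.

```javascript
const HOSTS = ['https://old.reddit.com', 'https://www.reddit.com'];

// Random delay in [minMs, maxMs], so requests don't arrive at a steady rate.
function jitterMs(minMs = 300, maxMs = 900, rand = Math.random) {
  return Math.floor(minMs + rand() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Uniform pick across all entries, so no single string dominates.
// Each entry is assumed to look like { family: 'chrome', value: '<UA string>' }.
function pickUserAgent(userAgents, rand = Math.random) {
  return userAgents[Math.floor(rand() * userAgents.length)];
}

// Header values below are illustrative examples, not a verified fingerprint.
function buildHeaders(userAgent, host) {
  const headers = {
    'User-Agent': userAgent.value,
    'Accept-Encoding': 'gzip, deflate, br',
    Referer: `${host}/`,
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
  };
  if (userAgent.family === 'chrome') {
    // Client hints are a Chromium feature; skip them for other browsers.
    headers['Sec-Ch-Ua'] = '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"';
    headers['Sec-Ch-Ua-Mobile'] = '?0';
    headers['Sec-Ch-Ua-Platform'] = '"Windows"';
  }
  return headers;
}

// After `limit` consecutive 403s on the current host, move to the next one.
function nextHost(currentHost, consecutive403s, limit = 6) {
  if (consecutive403s < limit) return currentHost;
  const i = HOSTS.indexOf(currentHost);
  return HOSTS[(i + 1) % HOSTS.length];
}
```

In a request loop this would be used as `await sleep(jitterMs())` before each fetch, with `nextHost(host, count)` consulted whenever a 403 comes back.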

The actor is back to 92%+ success rate and processing live runs as I write this.

Why I'm sharing this

Three reasons.

One: if you're running a scraper against Reddit (or any high-volume target), check whether you're using www when there's an old or m subdomain alternative. The WAF rules are often very different.

Two: in the proxy reputation game, you can't out-rotate a poisoned pool. The architectural fix (switch endpoint) beats the tactical fix (more retries) every time.

Three: I publish a public Apify scraper that pulls Reddit posts and comments. It's running on the fixed version right now. If you find it useful, a 30-second review on the Apify Store helps a solo developer compete with bigger publishers. I read every one.

I also do $9/mo curated Reddit digests by email if you'd rather not manage scraper runs yourself - stripe link here.

Renzo, solo dev. Building supabase-security, Apify actors, and other things at the intersection of "I should automate this" and "let me ship it as a product."