
Short Play Skits

Why I ditched cloud scrapers and built a local-first Reddit tool

So I've been doing Reddit marketing for my SaaS for about a year now.

The strategy is pretty simple: find posts in relevant subreddits, write helpful comments, occasionally mention my product when it makes sense. Nothing groundbreaking. Just consistent presence.

The problem? Finding those posts was killing me.

The cloud scraper trap

I tried a bunch of cloud-based tools. Monitoring services. Browser extensions that phone home to some server. Even some Python scripts running on my VPS.

They all had the same problem: Reddit blocks server IPs.

Like, aggressively. My VPS got blocked within 5 minutes of running a simple scraper. Tried rotating proxies. Tried residential IPs. Reddit kept catching on.

Every few weeks I'd get emails from my monitoring tool saying "we're experiencing issues with Reddit." Yeah no kidding.

The obvious solution

A friend said something offhand that stuck with me:

"Why don't you just run it on your computer?"

I had objections. Distribution is harder. Can't do recurring billing easily. No usage tracking.

But here's the thing: if my app runs from my laptop, Reddit sees my home IP. Just a normal person browsing. No detection to evade. It just works.

What I built

Python + PyQt6. Desktop app. SQLite for storage. I called it Reddit Toolbox.

The core is embarrassingly simple:

import requests
import sqlite3  # results get stored locally; storage code not shown here

def scrape_subreddit(name, limit=100):
    url = f"https://reddit.com/r/{name}.json?limit={limit}"

    # That's it. Just a GET request from the user's IP.
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    }, timeout=10)

    if response.status_code == 200:
        return response.json()['data']['children']
    else:
        # Fallback to RSS if the JSON endpoint is blocked
        return scrape_via_rss(name)

The RSS fallback is key. Sometimes Reddit blocks JSON for certain patterns but leaves RSS open. Having both means it rarely fails completely.
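The fallback itself is short. A minimal sketch of the idea: Reddit serves each subreddit as an Atom feed at /.rss, so the standard-library XML parser is enough. The way I squeeze entries back into the JSON listing shape here is approximate, not the exact code:

import requests
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def scrape_via_rss(name):
    # Reddit serves every subreddit as an Atom feed at /.rss
    url = f"https://www.reddit.com/r/{name}/.rss"
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    }, timeout=10)
    response.raise_for_status()

    posts = []
    for entry in ET.fromstring(response.content).findall(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        posts.append({
            # keep the same {'data': {...}} shape as the JSON endpoint
            # so callers don't care which path the post came from
            'data': {
                'title': entry.findtext(f"{ATOM}title"),
                'url': link.get('href') if link is not None else None,
                'id': entry.findtext(f"{ATOM}id"),
            }
        })
    return posts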

The features that actually matter

After using it daily for a month, here's what I actually use:

1. Batch scraping with filters

Paste 5 subreddit names, scrape 200 posts each in 10 seconds. Then filter by (rough sketch after the list):

  • Max comment count (I set 8 - anything more is too late)
  • Min score (filter out downvoted stuff)
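Both filters are just comparisons on fields Reddit already returns in the JSON listing (num_comments and score). A rough sketch, with the function name and defaults being my own:

def filter_posts(posts, max_comments=8, min_score=1):
    # posts = the 'children' list returned by scrape_subreddit()
    keep = []
    for post in posts:
        data = post['data']
        if data.get('num_comments', 0) > max_comments:
            continue  # thread is already crowded, too late to add value
        if data.get('score', 0) < min_score:
            continue  # skip downvoted / ignored posts
        keep.append(data)
    return keep

# the "paste 5 subreddits" flow is basically this loop
for sub in ['SaaS', 'startups', 'marketing', 'Entrepreneur', 'smallbusiness']:
    fresh = filter_posts(scrape_subreddit(sub, limit=200))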

2. Right-click AI replies

Not for copy-pasting. Just to get a starting draft. I always rewrite heavily.
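A minimal sketch of the draft step, assuming an OpenAI-style chat API; the model name and prompt here are placeholders, not what ships in the app:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_reply(post_title, post_body):
    # Returns a rough first draft, meant to be rewritten, never pasted as-is
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Write a short, genuinely helpful Reddit comment. No sales pitch."},
            {"role": "user",
             "content": f"Post title: {post_title}\n\nPost body: {post_body}"},
        ],
    )
    return response.choices[0].message.content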

3. User analysis

Before DMing someone, I check their history. Account age, karma, active subreddits. Quick sanity check.
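The user check is another plain JSON fetch, this time against the account's public about.json. A minimal sketch (the active-subreddits part would need a second request to the user's post history, which I've left out):

import time
import requests

def check_user(username):
    # Public profile info: karma, account age, etc.
    url = f"https://www.reddit.com/user/{username}/about.json"
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    }, timeout=10)
    response.raise_for_status()
    data = response.json()['data']

    age_days = (time.time() - data['created_utc']) / 86400
    return {
        'age_days': round(age_days),
        'link_karma': data.get('link_karma', 0),
        'comment_karma': data.get('comment_karma', 0),
    }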

The monetization question

With web apps you control everything. Logins. Feature gates. Server-side limits.

With a desktop app? User has the binary. They can do whatever.

I thought about DRM. License keys. Hardware fingerprinting.

Then I realized: the kind of person who would crack a $15/mo tool was never going to pay anyway.

So I kept it simple. App checks subscription status once per session. API call to Supabase. If it fails, defaults to free tier (15 scrapes/day).
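A sketch of the shape of that check, assuming Supabase's REST interface; the project URL, key, and table name below are placeholders, not the real ones:

import requests

SUPABASE_URL = "https://YOUR-PROJECT.supabase.co"  # placeholder
SUPABASE_ANON_KEY = "public-anon-key"              # placeholder
FREE_TIER_SCRAPES_PER_DAY = 15

def get_plan(user_email):
    # Called once per session. Any failure at all means free tier.
    try:
        response = requests.get(
            f"{SUPABASE_URL}/rest/v1/subscriptions",  # hypothetical table
            params={"email": f"eq.{user_email}", "select": "plan"},
            headers={
                "apikey": SUPABASE_ANON_KEY,
                "Authorization": f"Bearer {SUPABASE_ANON_KEY}",
            },
            timeout=5,
        )
        response.raise_for_status()
        rows = response.json()
        if rows and rows[0].get("plan") == "pro":
            return {"plan": "pro", "daily_scrapes": None}  # no cap
    except requests.RequestException:
        pass  # offline, blocked, or Supabase down: just degrade
    return {"plan": "free", "daily_scrapes": FREE_TIER_SCRAPES_PER_DAY}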

Could someone bypass this? Sure. Do I care? Not really. The people who need this for actual work are happy to pay.

Trade-offs I accepted

No cross-device sync. Data lives on one machine.

Manual updates. Working on auto-updater but not there yet.

Zero telemetry. No idea how people actually use it. Kind of nice honestly.

Results

Zero support tickets about blocking. Not one. Used to get these daily with cloud tools.

App size is 50MB. Electron would be 150MB+.

Users actually thank me for not requiring login to try it. One email just said: "Finally a tool that doesn't want my email first." That one made my week.

When local-first makes sense

This isn't for everything. You need a server for:

  • Real-time collaboration
  • Multi-device sync
  • Anything with social features

But for single-user tools that talk to APIs that actively fight scrapers? Local-first is worth considering.


The tool is called Reddit Toolbox. Free tier available if you want to try it.

Happy to answer questions about PyQt, the architecture, or why I now have strong opinions about User-Agent strings.
