Okay so this is going to sound backwards. But hear me out.
I spent two weeks building a cloud-based Reddit scraper. Authentication, Postgres, the whole thing. Then I threw it all away and started over with a desktop app that has zero cloud infrastructure.
Best decision I made on this project. Here's why.
The cloud version kept getting blocked
I'm building a tool called Reddit Toolbox. It scrapes subreddits, filters posts by comment count, that kind of thing. Useful for people doing Reddit marketing or research.
First version was a web app. Standard stack - Next.js frontend, Python backend on a VPS, Supabase for auth and data.
Problem is, Reddit really doesn't like requests coming from datacenter IPs. Like, really doesn't like it.
My VPS IP got flagged within literally 5 minutes of testing. Five. Minutes. Tried rotating proxies, tried residential IPs, tried slowing down requests. Nothing worked reliably.
Every time I thought I'd solved it, Reddit would update their detection and I'd be back to square one. Users kept asking "why am I getting blocked?" and I had no good answer.
The obvious solution I kept ignoring
At some point I was venting to a friend about this and he said something like "why don't you just run it on the user's machine?"
I had all these objections. Distribution is harder. Updates are manual. Can't track usage. No recurring revenue model that works easily.
But I kept coming back to one fact: if the app runs from somebody's home IP, Reddit sees a normal person browsing. Because that's exactly what it is.
No proxy games. No cat-and-mouse with detection. It just... works.
What I rebuilt
Threw away the web stack. Started fresh with Python + PyQt6.
# The core of it is embarrassingly simple
import requests
import sqlite3

def scrape_subreddit(name, limit=100):
    url = f"https://reddit.com/r/{name}.json?limit={limit}"
    # That's it. Just a GET request from the user's IP.
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    })
    if response.status_code == 200:
        return response.json()['data']['children']
    else:
        # Fallback to RSS if JSON is blocked
        return scrape_via_rss(name)
The RSS fallback is important. Sometimes Reddit blocks the JSON API for certain patterns but leaves RSS open. Having both means the tool almost never fails completely.
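For reference, a minimal version of that fallback could look something like the sketch below. It's simplified, not the real implementation: Reddit exposes an Atom feed at /r/<subreddit>/.rss, and the standard library's XML parser is enough to pull the basics out of it. The feed returns a flatter shape than the JSON listing, so the caller has to normalize the two.

# Sketch of an RSS/Atom fallback - simplified, fields trimmed
import xml.etree.ElementTree as ET
import requests

ATOM_NS = {'atom': 'http://www.w3.org/2005/Atom'}

def scrape_via_rss(name):
    url = f"https://www.reddit.com/r/{name}/.rss"
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
    })
    response.raise_for_status()

    root = ET.fromstring(response.content)
    posts = []
    for entry in root.findall('atom:entry', ATOM_NS):
        posts.append({
            'title': entry.findtext('atom:title', default='', namespaces=ATOM_NS),
            'link': entry.find('atom:link', ATOM_NS).get('href'),
            'published': entry.findtext('atom:published', default='', namespaces=ATOM_NS),
        })
    return posts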
For data storage - SQLite. One file. Lives next to the app. User can back it up by copying a single file. No database server, no connection strings, no "is my DB running" debugging at 2am.
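If you haven't used SQLite from Python before, the setup is about as small as it gets. Here's a rough sketch (schema and file name are simplified stand-ins, not the real ones) - it also shows why the "filter by comment count" feature is just a WHERE clause:

# Minimal sketch of the storage layer - schema simplified
import sqlite3

conn = sqlite3.connect("reddit_toolbox.db")  # one file, lives next to the app
conn.execute("""
    CREATE TABLE IF NOT EXISTS posts (
        id TEXT PRIMARY KEY,
        subreddit TEXT,
        title TEXT,
        num_comments INTEGER,
        created_utc REAL
    )
""")

def save_posts(posts):
    # posts are the 'children' dicts from the JSON endpoint
    rows = [(p['data']['id'], p['data']['subreddit'], p['data']['title'],
             p['data']['num_comments'], p['data']['created_utc']) for p in posts]
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

def posts_with_min_comments(subreddit, min_comments):
    # comment-count filtering is a plain SQL query
    return conn.execute(
        "SELECT title, num_comments FROM posts WHERE subreddit = ? AND num_comments >= ?",
        (subreddit, min_comments),
    ).fetchall()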
The part that scared me: how to make money
With a web app, you can gate features behind auth. Easy.
With a desktop app? User downloads the binary. They can... do whatever they want with it. Crack it, patch it, share it.
I spent way too long thinking about DRM solutions. Then I realized something.
The kind of person who would crack a $15/month tool was never going to pay anyway. I'd rather have good UX for paying customers than annoying DRM that punishes everyone.
So I did the simple thing. The app phones home once per session to check subscription status. Quick API call to Supabase. If the user is offline or blocks it, the app still works - defaults to free tier limits (15 scrapes/day).
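The check is just an HTTP call with a short timeout and a permissive failure path. Sketch below - the endpoint and response fields are placeholders, not the real API:

# Sketch of the once-per-session subscription check - endpoint and fields are placeholders
import requests

FREE_TIER_DAILY_LIMIT = 15

def get_daily_limit(license_key):
    try:
        # hypothetical Supabase endpoint; the real URL and response shape differ
        response = requests.get(
            "https://example.supabase.co/functions/v1/check-subscription",
            params={"key": license_key},
            timeout=3,
        )
        if response.status_code == 200 and response.json().get("active"):
            return None  # paid tier: no daily cap
    except requests.RequestException:
        pass  # offline or blocked: fall through to free tier
    return FREE_TIER_DAILY_LIMIT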
Could someone bypass this? Sure. Do I care? Not really. The people who actually need this tool for business are happy to pay. The rest were never going to be customers anyway.
Trade-offs I accepted
Not gonna pretend this is perfect. Here's what I gave up:
No cross-device sync. Your data lives on one machine. If you want it on another computer, you export and import. Is it annoying? A little. But most users work from one machine anyway.
Manual updates. I'm working on an auto-updater but for now users download new versions themselves. Actually got some feedback that people prefer this - they like knowing exactly when their software changes.
Zero telemetry. I have no idea how people actually use the app. Kind of flying blind here. Might add opt-in analytics later but honestly... it's kind of nice not drowning in dashboards.
Results after a few weeks
Not going to share exact numbers because I'm still early. But some observations:
Zero support tickets about blocking. Used to get these daily with the web version. Now everyone's using their home IP and Reddit treats them like normal humans. Because they are.
App size is 50MB. An Electron version would be 150MB+. PyQt6 is just... better for native-feeling apps that don't need a browser engine.
Users actually thank me for the simplicity. No sign-up required. Download, open, start using. Got an email that said "finally a tool that doesn't want my email before I can try it." That made my week.
When to go local-first
This approach isn't for everything. You probably still need a server if you're building:
- Real-time collaboration
- Mobile apps that sync across devices
- Anything with social features
But for single-user productivity tools that talk to external APIs - especially APIs that actively fight scrapers - local-first is worth considering.
You might be adding cloud complexity that doesn't actually serve your users. I was.
If you want to see what I built, it's called Reddit Toolbox. Search "Reddit Toolbox wappkit" or check wappkit.com. Free tier available.
Happy to answer questions about PyQt, the architecture, or why I hate proxies now.