I maintain a lot of scrapers by myself. People ask how that does not collapse into chaos. The honest answer is that almost all of it is boring discipline, not clever code. Here is the setup.
One repo, many small actors
Every scraper is its own Apify actor, but they all live in one repo and share a small set of libraries: cookie extraction, a link policy filter for promotion, and a few parsing helpers. When I fix a parser bug once, every actor that imports it gets the fix.
A boring, identical shape
Every actor is Node with Crawlee and Playwright, ESM, a single src/main.js entry point. Same structure, same scripts, same deploy command. Sameness is the point. I never have to relearn a project. A new actor is mostly: copy the skeleton, change the parsing, ship.
Test before every deploy
One broken actor that a user reports publicly costs more than ten quiet good ones. So nothing deploys without a check and a smoke test. It is not fancy. It just has to run before every push.
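A minimal sketch of what the smoke test part of such a check could look like. The function name `smokeCheck` and the field names in the usage note are hypothetical; the idea is only that a sample run must return at least one item and that every item carries the fields users rely on.

```javascript
// Hypothetical pre-deploy smoke check (illustrative, not a fixed recipe).
// `items` is the output of a small sample run; `requiredFields` lists the
// keys every item is expected to carry.
function smokeCheck(items, requiredFields) {
  const problems = [];

  if (items.length === 0) {
    problems.push('no items returned');
  }

  items.forEach((item, index) => {
    for (const field of requiredFields) {
      const value = item[field];
      if (value === undefined || value === null || value === '') {
        problems.push(`item ${index} is missing "${field}"`);
      }
    }
  });

  // An empty array means the sample run looks healthy.
  return problems;
}
```

A deploy script could call `smokeCheck(sampleItems, ['url', 'title'])` and exit with a non-zero code when the returned list is not empty, so the push never happens.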
Fail fast on anti-bot
Scrapers hang. A target throws a captcha, a page never settles, and a run that should take two minutes burns twenty. The fix that saved me was a wall-clock soft deadline inside each actor. If the run is past its budget, stop cleanly, return what you have, and never charge for an empty result.
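A minimal sketch of a soft deadline, assuming the actor processes pages in a loop it controls. The names `createDeadline`, `runWithBudget`, and `handlePage` are hypothetical, and `shouldCharge` is only a stand-in flag for whatever billing call the actor makes.

```javascript
// Hypothetical helper: a wall-clock soft deadline for one actor run.
// `now` is injectable so the logic can be tested without waiting.
function createDeadline(budgetMs, now = Date.now) {
  const startedAt = now();
  return {
    elapsedMs: () => now() - startedAt,
    expired: () => now() - startedAt >= budgetMs,
  };
}

// Process URLs one at a time and check the deadline between pages.
// `handlePage` stands in for whatever fetches and parses a single page
// and resolves to an array of items.
async function runWithBudget(urls, handlePage, deadline) {
  const items = [];
  let stoppedEarly = false;

  for (const url of urls) {
    if (deadline.expired()) {
      stoppedEarly = true; // stop cleanly, keep what we already have
      break;
    }
    items.push(...(await handlePage(url)));
  }

  // Only signal a charge when there is something to deliver.
  return { items, stoppedEarly, shouldCharge: items.length > 0 };
}
```

The deadline is soft because it is only checked between pages: a page that is already in flight finishes (or hits its own timeout), and the run then returns partial results instead of being killed mid-write.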
Let the platform bill
Apify charges per event, so I do not run servers or chase invoices. I do watch one number per actor, though: compute cost per run against revenue per run. A heavy browser actor with thin output can quietly run at a loss. Light HTTP and JSON actors win.
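The per-run comparison is simple arithmetic. A sketch, with a hypothetical function name and made-up dollar figures in the usage note:

```javascript
// Hypothetical margin check for a single run (all names are illustrative).
// revenueUsd: what the run earned; computeUsd: what the run cost.
function runMargin({ revenueUsd, computeUsd }) {
  const marginUsd = revenueUsd - computeUsd;
  return { marginUsd, atLoss: marginUsd < 0 };
}
```

With made-up numbers, a browser run that earns 0.50 and costs 0.75 gives `runMargin({ revenueUsd: 0.5, computeUsd: 0.75 })`, which reports a margin of -0.25 and `atLoss: true`.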
Watch for silent failure
The scary failures are the quiet ones. An actor that returns zero rows because the site changed, not because there was no data. So I keep probes that tell the difference between a real empty result and a block, and I get flagged when an actor goes quiet.
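One way such a probe could work, as a sketch: treat an empty result as real only when the page shows a positive "no results" signal, and flag everything else. The status codes, marker strings, and the `emptyMarker` parameter are assumptions for illustration; each target site would need its own.

```javascript
// Hypothetical probe: decide what a zero-row run actually means.
// The codes and marker strings below are illustrative assumptions.
const BLOCK_STATUS_CODES = [403, 429];
const BLOCK_MARKERS = ['captcha', 'access denied', 'unusual traffic'];

function classifyRun({ itemCount, statusCode, html, emptyMarker }) {
  if (itemCount > 0) return 'ok';

  const body = (html || '').toLowerCase();

  // Signs of an anti-bot wall rather than a genuine empty page.
  if (BLOCK_STATUS_CODES.includes(statusCode)) return 'blocked';
  if (BLOCK_MARKERS.some((marker) => body.includes(marker))) return 'blocked';

  // A real empty result: the site itself says there is nothing to show.
  if (emptyMarker && body.includes(emptyMarker.toLowerCase())) return 'empty';

  // Zero rows with no explanation: the layout probably changed.
  return 'suspect';
}
```

Only `'empty'` is a quiet success. `'blocked'` and `'suspect'` are the cases worth an alert, since both look identical to "no data" from the outside.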
Automate the boring promotion too
Even the marketing runs on scripts. Posting, logging what went where, respecting per-platform limits so nothing gets an account banned. If I had to do it by hand I would not do it at all.
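A sketch of how a posting script could respect per-platform limits using its own log. The platform names, the limits, and the log entry shape (`platform`, `postedAt`) are all made up for illustration.

```javascript
// Hypothetical per-platform daily limits (placeholder names and numbers).
const DAILY_LIMITS = { forumA: 2, forumB: 1 };
const DAY_MS = 24 * 60 * 60 * 1000;

// `log` is an array of { platform, postedAt } entries, with postedAt in
// epoch milliseconds. Returns true if one more post fits within the limit.
function canPost(platform, log, now = Date.now()) {
  const limit = DAILY_LIMITS[platform];
  if (limit === undefined) return false; // unknown platform: do not post

  const recentCount = log.filter(
    (entry) => entry.platform === platform && now - entry.postedAt < DAY_MS
  ).length;

  return recentCount < limit;
}
```

Defaulting to `false` for a platform with no configured limit is a deliberate choice here: a script that refuses to post is cheaper than one that gets an account banned.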
None of this is clever. It is the same boring shape repeated, with guardrails, so one person can keep a lot of plates spinning. The full set lives on Apify at https://apify.com/scrapemint and I think out loud about the build in the Discord at https://discord.gg/Ed2VNSHbr.
If you run a lot of small projects solo, what is the one piece of discipline that keeps it from falling apart?