DEV Community

Ken-Mutisya


I Ship a New Data Scraper Every Few Days. Here Is What I Have Learned

A while back I kept hitting the same wall. I wanted some specific slice of public data (a list of local businesses, fresh SEC filings, app store reviews), and the options were always the same: pay a bloated subscription for a dashboard I did not need, or write yet another scraper from scratch.

So I started building the scrapers anyway, and then I did the obvious thing: I put each one online as a small pay-per-use API. As I write this there are around 85 of them live, and I ship a new one every few days, so by the time you read this the count is higher.

Here is what the stack looks like and what I learned.

The stack

Every actor is a Node.js project built with Crawlee and Playwright and deployed on Apify: ES modules, a single src/main.js entry point, and automated checks before every deploy. Apify handles the hosting and the pay-per-event billing, so a buyer is charged per result instead of a flat monthly fee. You run it, you pay for what comes out.

That billing model changed how I think about products. A free tier is not charity; it is the top of the funnel. People test on a few rows, then run a real job.
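To make the per-result arithmetic concrete, here is a minimal sketch. The function name, the free allowance, and the price are all hypothetical numbers chosen for illustration; they are not actual Apify pricing or the rates on any real actor.

```javascript
// Hypothetical pay-per-result pricing: the first few results are free,
// every result after that is billed at a price quoted per 1,000 results.
// Amounts are in cents so small runs do not produce awkward fractions of a dollar.
function chargeInCents({ results, freeResults = 0, centsPerThousand }) {
  if (results < 0 || freeResults < 0 || centsPerThousand < 0) {
    throw new RangeError('results, freeResults and centsPerThousand must be non-negative');
  }
  // Only results beyond the free allowance are billable.
  const billable = Math.max(0, results - freeResults);
  return Math.round((billable * centsPerThousand) / 1000);
}

// A tester pulling 20 rows stays inside the (assumed) 25-row free allowance.
const testRun = chargeInCents({ results: 20, freeResults: 25, centsPerThousand: 200 });

// A real job of 5,000 rows pays only for the 4,975 rows past the allowance.
const realJob = chargeInCents({ results: 5000, freeResults: 25, centsPerThousand: 200 });

console.log({ testRun, realJob });
```

With these assumed numbers the test run costs nothing and the real job costs a few dollars, which is the funnel in miniature: the free rows let someone check the output before committing to a paid run.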

The interesting problems

Most of the work was not the happy path; it was the edges.

Anti-bot defenses were the constant fight. Some targets fold to residential proxies and a good browser fingerprint. Others, like a couple of the big marketplaces, defeat both and need a different door entirely.

Keyless public APIs turned out to be gold. SEC EDGAR, openFDA, USAspending, the npm registry search, the PyPI simple index. No key, no signup, just clean JSON if you read the docs closely. A surprising number of useful datasets sit behind endpoints nobody talks about.
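As a rough illustration of how little ceremony these endpoints need, here is a sketch that builds request URLs for two of them and fetches JSON with Node's built-in fetch (Node 18 or newer). The helper names and the User-Agent placeholder are assumptions for this example; check each provider's documentation for current rate limits and header requirements before relying on it.

```javascript
// Build a URL for the public npm registry search endpoint (no API key needed).
function npmSearchUrl(text, size = 10) {
  const url = new URL('https://registry.npmjs.org/-/v1/search');
  url.searchParams.set('text', text);
  url.searchParams.set('size', String(size));
  return url.toString();
}

// Build the SEC EDGAR submissions URL for a company.
// EDGAR expects the CIK zero-padded to 10 digits.
function edgarSubmissionsUrl(cik) {
  const padded = String(cik).padStart(10, '0');
  return `https://data.sec.gov/submissions/CIK${padded}.json`;
}

// Fetch and parse JSON from a keyless endpoint.
// SEC asks automated clients to identify themselves in the User-Agent header;
// the default value here is a placeholder, not a working contact.
async function fetchJson(url, userAgent = 'example-app contact@example.com') {
  const res = await fetch(url, {
    headers: { 'User-Agent': userAgent, Accept: 'application/json' },
  });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}: ${url}`);
  }
  return res.json();
}

console.log(npmSearchUrl('crawlee', 5));
console.log(edgarSubmissionsUrl(320193));
```

Calling fetchJson(npmSearchUrl('crawlee', 5)) returns the parsed search response; the URL builders are kept separate from the network call so they can be checked without hitting the endpoints.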

The pattern that paid off most was chaining. A single scraper is worth a little. A pipeline that takes the output of one and enriches it with two more is worth far more, because it does the boring glue work the buyer would otherwise do by hand.
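The chaining idea can be sketched in a few lines. This is a simplified illustration, not the code behind any real pipeline: runPipeline, the stand-in scrapers, and the sample data are all hypothetical, and the stand-ins make no network calls.

```javascript
// Run one source step, then pass every row through each enricher in order.
// Each enricher receives the row so far and returns extra fields to merge in.
async function runPipeline(source, enrichers) {
  const rows = await source();
  const enriched = [];
  for (const row of rows) {
    let current = { ...row };
    for (const enrich of enrichers) {
      try {
        current = { ...current, ...(await enrich(current)) };
      } catch (err) {
        // One failed enrichment should not discard the whole row,
        // so record the error message and keep going.
        current = { ...current, errors: [...(current.errors ?? []), err.message] };
      }
    }
    enriched.push(current);
  }
  return enriched;
}

// Stand-ins for real scrapers (hypothetical data, no network calls).
const listBusinesses = async () => [{ name: 'Acme Bakery', domain: 'acme.example' }];
const addEmail = async (row) => ({ email: `hello@${row.domain}` });
const addReviewCount = async () => ({ reviewCount: 42 });

runPipeline(listBusinesses, [addEmail, addReviewCount]).then((rows) => {
  console.log(rows);
});
```

The try/catch inside the loop is the glue work in question: a buyer stitching three scrapers together by hand has to decide what happens when the second one fails on row 400, and a pipeline can make that decision once.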

What actually moved the needle

Positioning beat features every time. Renaming a generic scraper to the outcome a buyer wants did more for traffic than any code change.

Power users matter more than total users. A handful of people running thousands of jobs out-earn hundreds of casual testers. So I optimize for runs per user, not signups.

And shipping beats polishing. A live actor that solves one real problem earns more than a perfect one that never goes out.

Where it is now

The full catalog lives on Apify at https://apify.com/scrapemint, and I just opened a Discord to keep the new drops and questions in one place at https://discord.gg/Ed2VNSHbr.

If you build things for your own problems and then wonder whether anyone else would pay for them, I would like to hear how you decided what was worth selling. What did you ship that surprised you?
