<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ken-Mutisya</title>
    <description>The latest articles on DEV Community by Ken-Mutisya (@scrapemint).</description>
    <link>https://dev.to/scrapemint</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4006428%2F450d271d-24fa-4dcd-b75c-cda1fe0182dc.jpg</url>
      <title>DEV Community: Ken-Mutisya</title>
      <link>https://dev.to/scrapemint</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/scrapemint"/>
    <language>en</language>
    <item>
      <title>How I Run Dozens of Scrapers as One Person</title>
      <dc:creator>Ken-Mutisya</dc:creator>
      <pubDate>Sun, 28 Jun 2026 15:00:24 +0000</pubDate>
      <link>https://dev.to/scrapemint/how-i-run-dozens-of-scrapers-as-one-person-3d9g</link>
      <guid>https://dev.to/scrapemint/how-i-run-dozens-of-scrapers-as-one-person-3d9g</guid>
      <description>&lt;p&gt;I maintain a lot of scrapers by myself. People ask how that does not collapse into chaos. The honest answer is that almost all of it is boring discipline, not clever code. Here is the setup.&lt;/p&gt;

&lt;h2&gt;One repo, many small actors&lt;/h2&gt;

&lt;p&gt;Every scraper is its own Apify actor, but they all live in one repo and share a small set of libraries: cookie extraction, a link policy filter for promotion, and a few parsing helpers. When I fix a parser bug once, every actor that imports it gets the fix.&lt;/p&gt;
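
&lt;p&gt;Here is a minimal sketch of the kind of helper that could live in a shared library like this. The function name parseCount and the input formats it handles are hypothetical illustrations, not code from the actual repo.&lt;/p&gt;

```javascript
// Hypothetical shared parsing helper (e.g. lib/parsing.js). It turns display
// counts like "1.2K reviews" or "3,400" into plain numbers, so every actor
// that imports it gets the same behavior and the same bug fixes.
function parseCount(text) {
  if (typeof text !== "string") return null;
  // Drop thousands separators, then grab the first number plus an optional
  // K/M/B suffix. The \b keeps "12 books" from being read as 12 billion.
  const match = text.replace(/,/g, "").match(/(\d+(?:\.\d+)?)\s*([kmb])?\b/i);
  if (!match) return null;
  const value = parseFloat(match[1]);
  const suffix = (match[2] || "").toLowerCase();
  const multiplier = { k: 1e3, m: 1e6, b: 1e9 }[suffix] || 1;
  return Math.round(value * multiplier);
}

console.log(parseCount("1.2K reviews")); // 1200
console.log(parseCount("3,400"));        // 3400
console.log(parseCount("no data"));      // null
```

&lt;p&gt;The regex matters less than where it lives. If a site starts formatting counts differently, the fix lands in one file instead of in every actor.&lt;/p&gt;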

&lt;h2&gt;A boring, identical shape&lt;/h2&gt;

&lt;p&gt;Every actor is Node with Crawlee and Playwright, written as ESM with a single src/main.js entry point. Same structure, same scripts, same deploy command. Sameness is the point: I never have to relearn a project. A new actor is mostly a matter of copying the skeleton, changing the parsing, and shipping.&lt;/p&gt;

&lt;h2&gt;Test before every deploy&lt;/h2&gt;

&lt;p&gt;One broken actor that a user reports publicly costs more than ten quiet good ones. So nothing deploys without a check and a smoke test. It is not fancy. It just has to run before every push.&lt;/p&gt;
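
&lt;p&gt;A smoke test does not need a framework. Here is a minimal sketch, assuming the actor's parser can be run against a small known input first. The name smokeCheck and the field names are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical pre-deploy smoke check: run the parser on a small, known input
// and list everything that looks wrong. An empty list means "safe to ship".
function smokeCheck(items, requiredFields) {
  const problems = [];
  if (!Array.isArray(items) || items.length === 0) {
    problems.push("no items returned");
    return problems;
  }
  items.forEach((item, index) => {
    for (const field of requiredFields) {
      const value = item[field];
      if (value === undefined || value === null || value === "") {
        problems.push(`item ${index} is missing "${field}"`);
      }
    }
  });
  return problems;
}

// Example: the second item has a blank name, so the check reports it.
const sample = [
  { name: "Acme Plumbing", url: "https://example.com/acme" },
  { name: "", url: "https://example.com/blank" },
];
console.log(smokeCheck(sample, ["name", "url"]));
```

&lt;p&gt;In a deploy script, any reported problem would exit with a non-zero code before the deploy command runs.&lt;/p&gt;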

&lt;h2&gt;Fail fast on anti-bot measures&lt;/h2&gt;

&lt;p&gt;Scrapers hang. A target throws a captcha, a page never settles, and a run that should take two minutes burns twenty. The fix that saved me was a wall-clock soft deadline inside each actor. Once a run is past its budget, it stops cleanly, returns what it has, and never charges for an empty result.&lt;/p&gt;
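
&lt;p&gt;A soft deadline like this can be sketched in a few lines. The clock is injectable so the logic can be tested. The names createDeadline, runWithBudget, scrapeOne, and charge are hypothetical, and charge stands in for whatever pay-per-event call the actor actually makes.&lt;/p&gt;

```javascript
// Wall-clock soft deadline. In a real actor, pass Date.now as the clock.
function createDeadline(budgetMs, now = Date.now) {
  const start = now();
  return {
    expired: () => now() - start >= budgetMs,
    remainingMs: () => Math.max(0, budgetMs - (now() - start)),
  };
}

// Work through items until the budget runs out, then stop cleanly and return
// whatever was collected. scrapeOne and charge are hypothetical callbacks.
async function runWithBudget({ items, scrapeOne, charge, budgetMs, now = Date.now }) {
  const deadline = createDeadline(budgetMs, now);
  const results = [];
  let stoppedEarly = false;
  for (const item of items) {
    if (deadline.expired()) {
      stoppedEarly = true;
      break;
    }
    const row = await scrapeOne(item);
    if (row) results.push(row);
  }
  // Bill only for results that exist: an empty run costs the user nothing.
  if (results.length > 0) await charge(results.length);
  return { results, stoppedEarly };
}
```

&lt;p&gt;This only checks the budget between items, so one page that never settles still needs its own per-request limit, such as Crawlee's navigationTimeoutSecs and requestHandlerTimeoutSecs crawler options.&lt;/p&gt;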

&lt;h2&gt;Let the platform bill&lt;/h2&gt;

&lt;p&gt;Apify charges per event, so I do not run servers or chase invoices. I do watch one number per actor, though: the compute cost per run against the revenue per run. A heavy browser actor with thin output can quietly run at a loss. Light HTTP and JSON actors win.&lt;/p&gt;
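
&lt;p&gt;The per-run check is simple arithmetic. Here is a minimal sketch. The function runMargin is hypothetical, and the prices in the example are made-up placeholders, not actual Apify rates.&lt;/p&gt;

```javascript
// Hypothetical per-run unit economics: revenue from results vs. compute spent.
// Look up real prices on your own pricing pages; these inputs are placeholders.
function runMargin({ resultCount, pricePerResult, computeUnits, pricePerComputeUnit }) {
  const revenue = resultCount * pricePerResult;
  const cost = computeUnits * pricePerComputeUnit;
  return { revenue, cost, margin: revenue - cost };
}

// A heavy browser run with thin output vs. a light JSON run (illustrative numbers).
const browserRun = runMargin({ resultCount: 20, pricePerResult: 0.002, computeUnits: 0.5, pricePerComputeUnit: 0.3 });
const jsonRun = runMargin({ resultCount: 500, pricePerResult: 0.002, computeUnits: 0.05, pricePerComputeUnit: 0.3 });
console.log(browserRun.margin > 0); // false: 0.04 revenue against 0.15 compute
console.log(jsonRun.margin > 0);    // true: 1.00 revenue against 0.015 compute
```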

&lt;h2&gt;Watch for silent failure&lt;/h2&gt;

&lt;p&gt;The scary failures are the quiet ones: an actor that returns zero rows because the site changed, not because there was no data. So I keep probes that tell a real empty result apart from a block, and I get flagged when an actor goes quiet.&lt;/p&gt;
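
&lt;p&gt;One way to build such a probe is to classify every zero-row run instead of treating it as success. Here is a minimal sketch. classifyRun and both marker lists are hypothetical examples, and each real target would need its own patterns.&lt;/p&gt;

```javascript
// Hypothetical probe that explains a run which returned zero rows.
// The marker patterns are examples; each target needs its own.
const BLOCK_MARKERS = [/captcha/i, /access denied/i, /unusual traffic/i];
const EMPTY_MARKERS = [/\bno results\b/i, /\b0 results\b/i];

function classifyRun({ rowCount, pageText, httpStatus }) {
  if (rowCount > 0) return "ok";
  if (httpStatus === 403 || httpStatus === 429) return "blocked";
  if (BLOCK_MARKERS.some((re) => re.test(pageText))) return "blocked";
  // The site explicitly said there was nothing: a real empty result.
  if (EMPTY_MARKERS.some((re) => re.test(pageText))) return "empty";
  // Zero rows and no explanation: the layout probably changed. Flag it.
  return "suspect";
}
```

&lt;p&gt;Alerting on "suspect" and on a streak of "blocked" runs catches both kinds of silent failure without paging on genuinely empty searches.&lt;/p&gt;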

&lt;h2&gt;Automate the boring promotion too&lt;/h2&gt;

&lt;p&gt;Even the marketing runs on scripts: posting, logging what went where, and respecting per-platform limits so no account gets banned. If I had to do it by hand I would not do it at all.&lt;/p&gt;
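
&lt;p&gt;A posting guard with a log can be sketched as a rolling 24-hour cap per platform. The limits, the platform keys, and the names canPost and recordPost are placeholders. They are not any platform's actual rules, so check each platform's terms yourself.&lt;/p&gt;

```javascript
// Hypothetical per-platform caps: placeholders, not real platform rules.
const LIMITS = {
  devto: { maxPerDay: 2 },
  exampleForum: { maxPerDay: 1 },
};
const DAY_MS = 24 * 60 * 60 * 1000;

// postLog is an array of { platform, url, at } entries: what went where, and when.
function canPost(postLog, platform, now = Date.now()) {
  const limit = LIMITS[platform];
  if (!limit) return false; // unknown platform: refuse rather than guess
  const recent = postLog
    .filter((entry) => entry.platform === platform)
    .filter((entry) => DAY_MS > now - entry.at); // rolling 24-hour window
  return limit.maxPerDay > recent.length;
}

function recordPost(postLog, platform, url, now = Date.now()) {
  postLog.push({ platform, url, at: now });
}
```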

&lt;p&gt;None of this is clever. It is the same boring shape repeated, with guardrails, so one person can keep a lot of plates spinning. The full set lives on Apify at &lt;a href="https://apify.com/scrapemint" rel="noopener noreferrer"&gt;https://apify.com/scrapemint&lt;/a&gt; and I think out loud about the build in the Discord at &lt;a href="https://discord.gg/Ed2VNSHbr" rel="noopener noreferrer"&gt;https://discord.gg/Ed2VNSHbr&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you run a lot of small projects solo, what is the one piece of discipline that keeps it from falling apart?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>api</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Keyless Public APIs Most Developers Do Not Know About</title>
      <dc:creator>Ken-Mutisya</dc:creator>
      <pubDate>Sun, 28 Jun 2026 14:43:40 +0000</pubDate>
      <link>https://dev.to/scrapemint/keyless-public-apis-most-developers-do-not-know-about-28oe</link>
      <guid>https://dev.to/scrapemint/keyless-public-apis-most-developers-do-not-know-about-28oe</guid>
      <description>&lt;p&gt;Building scrapers taught me that a lot of scraping is unnecessary. A surprising amount of valuable public data ships as clean JSON behind endpoints with no key and no signup. Here are the ones I reach for most.&lt;/p&gt;

&lt;h2&gt;SEC EDGAR&lt;/h2&gt;

&lt;p&gt;The US securities regulator publishes every corporate filing as JSON. The path data.sec.gov/submissions/CIK##########.json, where the ten placeholder characters stand for the company's CIK zero-padded to ten digits, gives a company's entire filing history, and efts.sec.gov runs full-text search across filings. No key. The only rules are a fair-use rate limit and a descriptive user agent. Insider trades, 8-K events, and fund holdings are all there.&lt;/p&gt;
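
&lt;p&gt;Here is a minimal sketch of a polite submissions lookup. The User-Agent value is a placeholder, since SEC asks for one that identifies who you are. The parallel-array layout under filings.recent is my reading of the response format, so verify it against a live response before relying on it. It assumes Node 18 or later for the global fetch.&lt;/p&gt;

```javascript
// Placeholder User-Agent: SEC asks for a descriptive one (name plus contact).
const EDGAR_HEADERS = { "User-Agent": "Example Research contact@example.com" };

// CIKs are zero-padded to 10 digits in the submissions path.
function edgarSubmissionsUrl(cik) {
  const padded = String(cik).replace(/\D/g, "").padStart(10, "0");
  return `https://data.sec.gov/submissions/CIK${padded}.json`;
}

async function fetchFilingHistory(cik) {
  const res = await fetch(edgarSubmissionsUrl(cik), { headers: EDGAR_HEADERS });
  if (!res.ok) throw new Error(`EDGAR returned HTTP ${res.status}`);
  const data = await res.json();
  // Recent filings come back as parallel arrays under filings.recent.
  const recent = data.filings.recent;
  return recent.form.map((form, i) => ({
    form,
    filingDate: recent.filingDate[i],
    accessionNumber: recent.accessionNumber[i],
  }));
}

console.log(edgarSubmissionsUrl(320193));
// https://data.sec.gov/submissions/CIK0000320193.json
```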

&lt;h2&gt;openFDA&lt;/h2&gt;

&lt;p&gt;openFDA exposes drug approvals, recalls, adverse events, and device clearances at api.fda.gov. One gotcha that cost me an hour: drug sponsor names are case sensitive and must be uppercase, while device applicant names are not case sensitive. A 404 just means an empty result set, not an error.&lt;/p&gt;
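
&lt;p&gt;Both gotchas can be handled in one small wrapper. Here is a minimal sketch. The endpoint paths and the field names sponsor_name and applicant are my reading of the openFDA docs, so treat them as assumptions to verify.&lt;/p&gt;

```javascript
const OPENFDA = "https://api.fda.gov";

function drugsBySponsorUrl(sponsor, limit = 10) {
  // Drug sponsor names only match in uppercase.
  const params = new URLSearchParams({
    search: `sponsor_name:"${sponsor.toUpperCase()}"`,
    limit: String(limit),
  });
  return `${OPENFDA}/drug/drugsfda.json?${params}`;
}

function devicesByApplicantUrl(applicant, limit = 10) {
  // Device applicant names are not case sensitive, so pass them through as-is.
  const params = new URLSearchParams({
    search: `applicant:"${applicant}"`,
    limit: String(limit),
  });
  return `${OPENFDA}/device/510k.json?${params}`;
}

// openFDA answers "no matches" with a 404, so treat it as an empty result set.
async function fetchResults(url) {
  const res = await fetch(url);
  if (res.status === 404) return [];
  if (!res.ok) throw new Error(`openFDA returned HTTP ${res.status}`);
  const body = await res.json();
  return body.results ?? [];
}
```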

&lt;h2&gt;USAspending&lt;/h2&gt;

&lt;p&gt;Every US federal contract and grant is queryable at api.usaspending.gov. You POST a JSON filter and get awards with amounts, recipients, and dates. If you want new awards rather than old multi-year giants, filter by date signed, not the default action date.&lt;/p&gt;
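
&lt;p&gt;Here is a minimal sketch of the request body that tip implies. The endpoint path, the field labels, the date_type key, and the contract award type codes A to D reflect my reading of the v2 API docs, so double check them before use.&lt;/p&gt;

```javascript
const AWARDS_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/";

function newAwardsQuery({ startDate, endDate, limit = 50, page = 1 }) {
  return {
    filters: {
      award_type_codes: ["A", "B", "C", "D"], // contract award types
      time_period: [
        // date_signed surfaces newly signed awards; the default action date
        // also matches old multi-year awards that merely had a recent action.
        { start_date: startDate, end_date: endDate, date_type: "date_signed" },
      ],
    },
    fields: ["Award ID", "Recipient Name", "Award Amount", "Start Date"],
    sort: "Award Amount",
    order: "desc",
    limit,
    page,
  };
}

async function searchNewAwards(range) {
  const res = await fetch(AWARDS_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(newAwardsQuery(range)),
  });
  if (!res.ok) throw new Error(`USAspending returned HTTP ${res.status}`);
  return (await res.json()).results;
}
```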

&lt;h2&gt;npm and PyPI&lt;/h2&gt;

&lt;p&gt;The npm registry search at registry.npmjs.org/-/v1/search returns packages with maintainers and metadata. On the Python side, pypi.org/simple is the entire package index as one document, and pypi.org/pypi/{package}/json gives structured metadata per project. No key, very stable, great for mapping an ecosystem.&lt;/p&gt;
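
&lt;p&gt;Here is a minimal sketch of both lookups. The shape of the npm search response (an objects array whose entries wrap a package object) is my reading of the registry's behavior. The helper names are hypothetical.&lt;/p&gt;

```javascript
function npmSearchUrl(text, size = 20) {
  const params = new URLSearchParams({ text, size: String(size) });
  return `https://registry.npmjs.org/-/v1/search?${params}`;
}

function pypiProjectUrl(name) {
  return `https://pypi.org/pypi/${encodeURIComponent(name)}/json`;
}

// Each npm search hit is wrapped as { package: { name, version, maintainers, ... } }.
function summarizeNpmHits(body) {
  return body.objects.map(({ package: pkg }) => ({
    name: pkg.name,
    version: pkg.version,
    maintainers: (pkg.maintainers || []).map((m) => m.username),
  }));
}

async function npmSearch(text) {
  const res = await fetch(npmSearchUrl(text));
  if (!res.ok) throw new Error(`npm returned HTTP ${res.status}`);
  return summarizeNpmHits(await res.json());
}
```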

&lt;h2&gt;A few more worth a look&lt;/h2&gt;

&lt;p&gt;ClinicalTrials.gov has a clean v2 API for trials. Google Patents has an undocumented XHR query endpoint that returns JSON. Hacker News ships a Firebase API. Wikipedia and OpenStreetMap both have generous public endpoints.&lt;/p&gt;
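
&lt;p&gt;The Hacker News API is a good example of how little code these need. This sketch makes one call for the ranked story IDs and then one call per item. The helper names are hypothetical.&lt;/p&gt;

```javascript
const HN = "https://hacker-news.firebaseio.com/v0";

async function getJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}

async function topStories(count = 10) {
  const ids = await getJson(`${HN}/topstories.json`);
  const items = await Promise.all(
    ids.slice(0, count).map((id) => getJson(`${HN}/item/${id}.json`)),
  );
  // Deleted items can come back as null, so drop them.
  return items
    .filter(Boolean)
    .map((item) => ({ id: item.id, title: item.title, score: item.score, url: item.url }));
}
```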

&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;Half the time I see someone reaching for a paid data vendor or spinning up a headless browser, the data is already sitting behind one of these. Read the docs slowly, send a polite user agent, respect the rate limits, and you skip a lot of pain.&lt;/p&gt;
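
&lt;p&gt;"Send a polite user agent, respect the rate limits" fits in one small wrapper. Here is a minimal sketch that spaces requests at least minIntervalMs apart. The names and the User-Agent value are placeholders, and setting User-Agent this way works in Node but not in browsers. For EDGAR, a 100 ms spacing keeps you under the 10 requests per second SEC documents as its ceiling.&lt;/p&gt;

```javascript
// Reserve the next free send slot and report how long the caller should wait.
// The clock is injectable for testing.
function createThrottle(minIntervalMs, now = Date.now) {
  let nextFree = 0;
  return function reserve() {
    const t = now();
    const start = Math.max(t, nextFree);
    nextFree = start + minIntervalMs;
    return start - t;
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wraps fetch with the throttle and a descriptive User-Agent.
function politeFetcher({ minIntervalMs, userAgent }) {
  const reserve = createThrottle(minIntervalMs);
  return async (url) => {
    await sleep(reserve());
    return fetch(url, { headers: { "User-Agent": userAgent } });
  };
}
```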

&lt;p&gt;These endpoints quietly power a good chunk of the actors I publish on Apify at &lt;a href="https://apify.com/scrapemint" rel="noopener noreferrer"&gt;https://apify.com/scrapemint&lt;/a&gt;, and I keep notes on new ones in the Discord at &lt;a href="https://discord.gg/Ed2VNSHbr" rel="noopener noreferrer"&gt;https://discord.gg/Ed2VNSHbr&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What is your favorite keyless API that more people should know about?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>api</category>
      <category>javascript</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Ship a New Data Scraper Every Few Days. Here Is What I Have Learned</title>
      <dc:creator>Ken-Mutisya</dc:creator>
      <pubDate>Sun, 28 Jun 2026 13:15:22 +0000</pubDate>
      <link>https://dev.to/scrapemint/i-ship-a-new-data-scraper-every-few-days-here-is-what-i-have-learned-49ph</link>
      <guid>https://dev.to/scrapemint/i-ship-a-new-data-scraper-every-few-days-here-is-what-i-have-learned-49ph</guid>
      <description>&lt;p&gt;A while back I kept hitting the same wall. I wanted some specific slice of public data, a list of local businesses, fresh SEC filings, app store reviews, and the options were always the same. Pay a bloated subscription for a dashboard I did not need, or write yet another scraper from scratch.&lt;/p&gt;

&lt;p&gt;So I started building the scrapers anyway, and then I did the obvious thing: I put each one online as a small pay-per-use API. As I write this, there are around 85 of them live, and I ship a new one every few days, so by the time you read this the count is higher.&lt;/p&gt;

&lt;p&gt;Here is what the stack looks like and what I learned.&lt;/p&gt;

&lt;h2&gt;The stack&lt;/h2&gt;

&lt;p&gt;Every actor is Node with Crawlee and Playwright, deployed on Apify: ESM modules, a single src/main.js entry point, and automated checks before every deploy. Apify handles the hosting and the pay-per-event billing, so a buyer is charged per result instead of a flat monthly fee. You run it, you pay for what comes out.&lt;/p&gt;

&lt;p&gt;That billing model changed how I think about products. A free tier is not charity, it is the top of the funnel. People test on a few rows, then run a real job.&lt;/p&gt;

&lt;h2&gt;The interesting problems&lt;/h2&gt;

&lt;p&gt;Most of the work was not the happy path, it was the edges.&lt;/p&gt;

&lt;p&gt;Anti-bot defenses were the constant fight. Some targets fold to residential proxies and a good browser fingerprint. Others, like a couple of the big marketplaces, defeat both and need a different door entirely.&lt;/p&gt;
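
&lt;p&gt;One way to structure that progression is as an escalation ladder: try the cheapest access method first and only move up when it is blocked. This is a sketch of that idea, not necessarily how the actors do it. The strategy functions (plain HTTP, browser, browser plus residential proxy) and the isBlocked check are hypothetical callbacks you would supply.&lt;/p&gt;

```javascript
// Try each strategy in order, cheapest first, and stop at the first one that
// is not blocked. Every attempt is recorded for debugging.
async function fetchWithEscalation(url, strategies, isBlocked) {
  const attempts = [];
  for (const { name, run } of strategies) {
    try {
      const page = await run(url);
      if (!isBlocked(page)) return { page, via: name, attempts };
      attempts.push({ name, outcome: "blocked" });
    } catch (err) {
      attempts.push({ name, outcome: `error: ${err.message}` });
    }
  }
  // Every rung failed: this target needs a different door entirely.
  return { page: null, via: null, attempts };
}
```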

&lt;p&gt;Keyless public APIs turned out to be gold. SEC EDGAR, openFDA, USAspending, the npm registry search, the PyPI simple index. No key, no signup, just clean JSON if you read the docs closely. A surprising number of useful datasets sit behind endpoints nobody talks about.&lt;/p&gt;

&lt;p&gt;The pattern that paid off most was chaining. A single scraper is worth a little. A pipeline that takes the output of one and enriches it with two more is worth far more, because it does the boring glue work the buyer would otherwise do by hand.&lt;/p&gt;
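
&lt;p&gt;Chaining can be sketched as a loop over enrichment stages, where each stage receives the record so far and returns extra fields to merge in. The name runPipeline and the example enrichers are hypothetical.&lt;/p&gt;

```javascript
// Feed each record from one scraper through a list of enrichment stages.
// Later stages see the fields added by earlier ones.
async function runPipeline(records, enrichers) {
  const out = [];
  for (const record of records) {
    const merged = { ...record };
    for (const enrich of enrichers) {
      Object.assign(merged, await enrich(merged));
    }
    out.push(merged);
  }
  return out;
}

// Example: a business listing gets a domain, then a contact address built from it.
runPipeline([{ name: "Acme" }], [
  async (r) => ({ domain: r.name.toLowerCase() + ".example" }),
  async (r) => ({ email: "info@" + r.domain }),
]).then((rows) => console.log(rows[0].email)); // info@acme.example
```

&lt;p&gt;Running stages sequentially per record keeps the sketch simple. A production pipeline would more likely batch each stage and respect each source's rate limits.&lt;/p&gt;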

&lt;h2&gt;What actually moved the needle&lt;/h2&gt;

&lt;p&gt;Positioning beat features every time. Renaming a generic scraper to the outcome a buyer wants did more for traffic than any code change.&lt;/p&gt;

&lt;p&gt;Power users matter more than total users. A handful of people running thousands of jobs outearn hundreds of casual testers. So I optimize for runs per user, not signups.&lt;/p&gt;

&lt;p&gt;And shipping beats polishing. A live actor that solves one real problem earns more than a perfect one that never goes out.&lt;/p&gt;

&lt;h2&gt;Where it is now&lt;/h2&gt;

&lt;p&gt;The full catalog lives on Apify at &lt;a href="https://apify.com/scrapemint" rel="noopener noreferrer"&gt;https://apify.com/scrapemint&lt;/a&gt;, and I just opened a Discord to keep the new drops and questions in one place at &lt;a href="https://discord.gg/Ed2VNSHbr" rel="noopener noreferrer"&gt;https://discord.gg/Ed2VNSHbr&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you build things for your own problems and then wonder whether anyone else would pay for them, I would like to hear how you decided what was worth selling. What did you ship that surprised you?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>api</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
