<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Do</title>
    <description>The latest articles on DEV Community by Do (@itflowbot).</description>
    <link>https://dev.to/itflowbot</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3717453%2Fe2c431f4-136a-4b06-a855-76a13633b052.png</url>
      <title>DEV Community: Do</title>
      <link>https://dev.to/itflowbot</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/itflowbot"/>
    <language>en</language>
    <item>
      <title>I Built 18 Web Scrapers in One Week - Here's What I Learned About Modern Scraping</title>
      <dc:creator>Do</dc:creator>
      <pubDate>Sun, 18 Jan 2026 08:19:02 +0000</pubDate>
      <link>https://dev.to/itflowbot/i-built-18-web-scrapers-in-one-week-heres-what-i-learned-about-modern-scraping-1kei</link>
      <guid>https://dev.to/itflowbot/i-built-18-web-scrapers-in-one-week-heres-what-i-learned-about-modern-scraping-1kei</guid>
      <description>&lt;p&gt;Last week, I challenged myself to build and publish 18 production-ready web scrapers on &lt;a href="https://apify.com/store" rel="noopener noreferrer"&gt;Apify Store&lt;/a&gt;. Not toy projects - real tools that handle pagination, anti-bot measures, and edge cases.&lt;/p&gt;

&lt;p&gt;Here's what I learned (and the mistakes I made).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Build scrapers for different categories - jobs, news, crypto, social media, developer tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Node.js, Cheerio, Crawlee, and FireCrawl API for the tough sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 18 working scrapers, 350+ test runs, ~1 paying user (we'll get there).&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: Free APIs Are Everywhere (And Nobody Uses Them)
&lt;/h2&gt;

&lt;p&gt;Before writing a single line of scraping code, I discovered something surprising: &lt;strong&gt;many "protected" sites have completely free, undocumented APIs&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Examples I Found:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Site&lt;/th&gt;
&lt;th&gt;API Type&lt;/th&gt;
&lt;th&gt;Auth Required&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Remotive.com&lt;/td&gt;
&lt;td&gt;REST API&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CoinGecko&lt;/td&gt;
&lt;td&gt;Public API&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Greenhouse Job Boards&lt;/td&gt;
&lt;td&gt;JSON endpoints&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hacker News&lt;/td&gt;
&lt;td&gt;Firebase API&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reddit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.json&lt;/code&gt; URL suffix&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Spend 30 minutes looking for APIs before writing a scraper. Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network tab in DevTools&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;robots.txt&lt;/code&gt; for API hints&lt;/li&gt;
&lt;li&gt;GitHub for unofficial API wrappers&lt;/li&gt;
&lt;li&gt;Adding &lt;code&gt;.json&lt;/code&gt; to URLs
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Instead of scraping Reddit HTML:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.reddit.com/r/webscraping.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// Clean JSON with all post data!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lesson 2: The 403 Tier List
&lt;/h2&gt;

&lt;p&gt;Not all websites are created equal. After building 18 scrapers, here's my tier list:&lt;/p&gt;

&lt;h3&gt;
  
  
  S-Tier (Easy - Use APIs)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Hacker News&lt;/li&gt;
&lt;li&gt;CoinGecko&lt;/li&gt;
&lt;li&gt;GitHub API&lt;/li&gt;
&lt;li&gt;Stack Overflow API&lt;/li&gt;
&lt;li&gt;NPM Registry&lt;/li&gt;
&lt;/ul&gt;
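&lt;p&gt;To make the S-tier point concrete: Hacker News serves its entire dataset through a public Firebase API, no key required. A minimal sketch (the endpoints are real; error handling omitted):&lt;/p&gt;

```javascript
// Hacker News public Firebase API: no auth, plain JSON responses.
const HN_BASE = "https://hacker-news.firebaseio.com/v0";
const itemUrl = (id) => `${HN_BASE}/item/${id}.json`;

// Fetch the current top stories (IDs first, then one request per item).
const topStories = async (limit = 5) => {
  const ids = await (await fetch(`${HN_BASE}/topstories.json`)).json();
  return Promise.all(
    ids.slice(0, limit).map((id) => fetch(itemUrl(id)).then((r) => r.json()))
  );
};
```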

&lt;h3&gt;
  
  
  A-Tier (Medium - Standard Scraping Works)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Dev.to&lt;/li&gt;
&lt;li&gt;RemoteOK&lt;/li&gt;
&lt;li&gt;Arbeitnow&lt;/li&gt;
&lt;li&gt;Eventbrite&lt;/li&gt;
&lt;li&gt;Google News RSS&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  B-Tier (Hard - Need Stealth)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Product Hunt&lt;/li&gt;
&lt;li&gt;Glassdoor&lt;/li&gt;
&lt;li&gt;TripAdvisor&lt;/li&gt;
&lt;li&gt;Bark.com&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  F-Tier (Basically Impossible Without $$$)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;LinkedIn (DataDome)&lt;/li&gt;
&lt;li&gt;Yelp (Custom WAF)&lt;/li&gt;
&lt;li&gt;DoorDash (Bot Detection)&lt;/li&gt;
&lt;li&gt;Amazon (CAPTCHA + IP blocks)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Pick your battles. Start with S and A tier sites.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: The "Works on My Machine" Problem
&lt;/h2&gt;

&lt;p&gt;My scrapers worked perfectly locally. Then I deployed them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What changed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apify's IP ranges are well-known (blocked by many sites)&lt;/li&gt;
&lt;li&gt;No residential proxy by default&lt;/li&gt;
&lt;li&gt;Different User-Agent detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use external scraping APIs for tough sites:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// For B-tier sites, use a scraping API&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scrapePage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// FireCrawl, ScrapingBee, or similar&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.firecrawl.dev/v1/scrape&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;formats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;markdown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lesson 4: Pagination is Where Scrapers Die
&lt;/h2&gt;

&lt;p&gt;Most scraper tutorials show you how to scrape one page. Real scrapers need to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infinite scroll&lt;/li&gt;
&lt;li&gt;"Load more" buttons&lt;/li&gt;
&lt;li&gt;URL-based pagination (?page=2)&lt;/li&gt;
&lt;li&gt;Cursor-based pagination&lt;/li&gt;
&lt;li&gt;Rate limits between pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My pagination pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scrapeWithPagination&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxPages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;maxPages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?page=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;scrapePage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// No more results&lt;/span&gt;

    &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Be nice to servers&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
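&lt;p&gt;The same loop adapts to cursor-based pagination, the one case in the list above that the snippet doesn't cover. A hedged sketch: the &lt;code&gt;nextCursor&lt;/code&gt; field name is hypothetical, and the fetcher is injected so the pattern stays API-agnostic:&lt;/p&gt;

```javascript
// Cursor-based pagination: follow an opaque token instead of a page number.
// `fetchPage` is any async function mapping a cursor (or null) to
// { items, nextCursor } - these field names are illustrative, not a real API.
const scrapeWithCursor = async (fetchPage, maxPages = 10) => {
  const results = [];
  let cursor = null;
  for (let pages = 0; pages !== maxPages; pages++) {
    const data = await fetchPage(cursor);
    if (!data.items?.length) break; // empty page: stop
    results.push(...data.items);
    cursor = data.nextCursor;
    if (!cursor) break; // no token: last page reached
  }
  return results;
};
```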



&lt;h2&gt;
  
  
  Lesson 5: Error Handling &amp;gt; Feature Count
&lt;/h2&gt;

&lt;p&gt;My first scrapers had great features and terrible error handling. They crashed on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Empty responses&lt;/li&gt;
&lt;li&gt;Changed HTML structure&lt;/li&gt;
&lt;li&gt;Rate limit responses&lt;/li&gt;
&lt;li&gt;Network timeouts&lt;/li&gt;
&lt;li&gt;Partial data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Now every scraper has:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeScrape&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User-Agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getRandomUA&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Rate limited, waiting...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;safeScrape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Retry&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`HTTP &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; for &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Error scraping &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
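&lt;p&gt;One caveat with the retry above: a permanently rate-limited URL recurses forever. A capped exponential backoff helper (my own sketch, not part of any library) keeps retries bounded:&lt;/p&gt;

```javascript
// Retry an async operation with capped exponential backoff.
const withBackoff = async (fn, maxRetries = 3, baseDelayMs = 1000) => {
  for (let attempt = 0; attempt !== maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries - 1) throw error; // out of retries
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((r) => setTimeout(r, delay));
    }
  }
};
```

&lt;p&gt;Wrap the whole request, e.g. &lt;code&gt;await withBackoff(() =&amp;gt; safeScrape(url))&lt;/code&gt;, rather than recursing inside the response handler.&lt;/p&gt;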



&lt;h2&gt;
  
  
  Lesson 6: The MCP Revolution
&lt;/h2&gt;

&lt;p&gt;The most exciting discovery: &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt; lets AI agents use scrapers directly.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Human requests data&lt;/li&gt;
&lt;li&gt;Human runs scraper&lt;/li&gt;
&lt;li&gt;Human processes results&lt;/li&gt;
&lt;li&gt;Human gives to AI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI agent calls scraper via MCP&lt;/li&gt;
&lt;li&gt;AI processes results automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;This changes everything.&lt;/strong&gt; Scrapers aren't just for developers anymore - they're tools for AI agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Worked (And What Didn't)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Worked:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Job board scrapers&lt;/strong&gt; - High demand, structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;News aggregators&lt;/strong&gt; - RSS feeds are reliable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer tools&lt;/strong&gt; (GitHub, NPM, Stack Overflow) - Great APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crypto data&lt;/strong&gt; - Free APIs everywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Didn't Work:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;E-commerce&lt;/strong&gt; - Too protected, need expensive proxies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social media&lt;/strong&gt; - API changes, legal gray area&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review sites&lt;/strong&gt; - Heavy anti-bot (Yelp, TripAdvisor)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Numbers (Honest)
&lt;/h2&gt;

&lt;p&gt;After one week:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;18 scrapers published&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;350+ test runs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~21 MAU&lt;/strong&gt; (Monthly Active Users)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$0 revenue&lt;/strong&gt; (so far)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Apify $1M Challenge requires 50 MAU by January 31st. I'm getting there!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; Building is the easy part. Distribution is everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Content marketing&lt;/strong&gt; - This article is part of that&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub awesome-lists&lt;/strong&gt; - PRs submitted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community engagement&lt;/strong&gt; - Discord, Reddit (carefully)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better SEO&lt;/strong&gt; - Optimizing actor descriptions&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try My Scrapers
&lt;/h2&gt;

&lt;p&gt;All 18 scrapers are free to try on Apify Store:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Jobs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/remoteok-scraper" rel="noopener noreferrer"&gt;RemoteOK Scraper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/arbeitnow-scraper" rel="noopener noreferrer"&gt;Arbeitnow Scraper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/greenhouse-scraper" rel="noopener noreferrer"&gt;Greenhouse Jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/remotive-scraper" rel="noopener noreferrer"&gt;Remotive Jobs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Developer Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/github-scraper" rel="noopener noreferrer"&gt;GitHub Scraper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/npm-scraper" rel="noopener noreferrer"&gt;NPM Package Scraper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/stackoverflow-scraper" rel="noopener noreferrer"&gt;Stack Overflow Scraper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/devto-scraper" rel="noopener noreferrer"&gt;Dev.to Scraper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;News &amp;amp; Social:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/hackernews-scraper" rel="noopener noreferrer"&gt;Hacker News Scraper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/google-news-scraper" rel="noopener noreferrer"&gt;Google News Scraper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/reddit-scraper" rel="noopener noreferrer"&gt;Reddit Scraper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Other:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/coingecko-scraper" rel="noopener noreferrer"&gt;CoinGecko Crypto&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/email-verifier" rel="noopener noreferrer"&gt;Email Verifier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apify.com/muscular_quadruplet/eventbrite-event-scraper" rel="noopener noreferrer"&gt;Eventbrite Events&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Questions?
&lt;/h2&gt;

&lt;p&gt;Drop a comment if you want me to dive deeper into any of these topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anti-bot bypass techniques&lt;/li&gt;
&lt;li&gt;Pagination patterns&lt;/li&gt;
&lt;li&gt;MCP integration for AI agents&lt;/li&gt;
&lt;li&gt;Monetizing scrapers&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Building in public. Follow the journey on &lt;a href="https://twitter.com/flowbot_ai" rel="noopener noreferrer"&gt;Twitter/X&lt;/a&gt; or check the &lt;a href="https://flowbot.company" rel="noopener noreferrer"&gt;portfolio&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>javascript</category>
      <category>automation</category>
      <category>api</category>
    </item>
    <item>
      <title>How to Build an AI-Powered Data Pipeline with Web Scrapers</title>
      <dc:creator>Do</dc:creator>
      <pubDate>Sun, 18 Jan 2026 06:23:09 +0000</pubDate>
      <link>https://dev.to/itflowbot/how-to-build-an-ai-powered-data-pipeline-with-web-scrapers-364e</link>
      <guid>https://dev.to/itflowbot/how-to-build-an-ai-powered-data-pipeline-with-web-scrapers-364e</guid>
      <description>&lt;p&gt;Web scraping is essential for AI agents that need real-time data. In this tutorial, I'll show you how to set up a complete data extraction pipeline using Apify actors.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;AI agents need fresh data to make decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Job aggregators need current listings&lt;/li&gt;
&lt;li&gt;Lead generation tools need verified contacts&lt;/li&gt;
&lt;li&gt;Market research needs competitor data&lt;/li&gt;
&lt;li&gt;News monitoring needs latest articles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Manual data collection doesn't scale. APIs are often limited or expensive. Web scraping fills the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution: Pre-built Scrapers + AI
&lt;/h2&gt;

&lt;p&gt;Instead of building scrapers from scratch, use production-ready actors. Here's my toolkit:&lt;/p&gt;

&lt;h3&gt;
  
  
  Job Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RemoteOK Scraper&lt;/strong&gt; - Remote job listings with salary data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Greenhouse Scraper&lt;/strong&gt; - ATS job boards (thousands of companies use Greenhouse)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arbeitnow Scraper&lt;/strong&gt; - European job market&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Developer Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Scraper&lt;/strong&gt; - Repository stats, stars, languages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack Overflow Scraper&lt;/strong&gt; - Q&amp;amp;A for training data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NPM Scraper&lt;/strong&gt; - Package ecosystem analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  News &amp;amp; Social
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hacker News Scraper&lt;/strong&gt; - Tech news and discussions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit Scraper&lt;/strong&gt; - Community sentiment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google News Scraper&lt;/strong&gt; - Headlines by topic&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Business
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Email Verifier&lt;/strong&gt; - Clean your lead lists&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CoinGecko Scraper&lt;/strong&gt; - Crypto market data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Get Apify Account
&lt;/h3&gt;

&lt;p&gt;Sign up at &lt;a href="https://apify.com" rel="noopener noreferrer"&gt;apify.com&lt;/a&gt; - free tier includes $5/month credits.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Run a Scraper
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Using Apify Client&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ApifyClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;apify-client&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;YOUR_API_TOKEN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Scrape remote jobs&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;muscular_quadruplet/remoteok-scraper&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;maxItems&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Get results&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;listItems&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Found &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; jobs`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Use with AI Agents (MCP)
&lt;/h3&gt;

&lt;p&gt;Connect to &lt;a href="https://mcp.apify.com" rel="noopener noreferrer"&gt;mcp.apify.com&lt;/a&gt; and use natural language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Scrape 50 remote JavaScript jobs from RemoteOK"
"Get top 100 cryptocurrencies from CoinGecko"
"Find trending posts from r/webdev"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
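Wiring up an MCP client usually takes one small config entry. Here's a minimal sketch for a client that accepts remote MCP servers — the exact shape varies by client, and `YOUR_APIFY_TOKEN` is a placeholder:

```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```

Once connected, the agent discovers the actors as tools and can run prompts like the ones above.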



&lt;h2&gt;
  
  
  Integration Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  n8n Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Add Apify node&lt;/li&gt;
&lt;li&gt;Select actor (e.g., &lt;code&gt;muscular_quadruplet/hackernews-scraper&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Connect to your AI processing nodes&lt;/li&gt;
&lt;/ol&gt;
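No dedicated node in your tool? Any generic HTTP Request step works too. A sketch of building the URL for Apify's run-sync endpoint (a POST to it runs the actor and returns its dataset items in one request; note the `/` in the actor name becomes `~` in the API path, and the token is a placeholder):

```python
from urllib.parse import urlencode

# Actor ID for the API path: "username/actor-name" -> "username~actor-name"
actor_id = "muscular_quadruplet/hackernews-scraper".replace("/", "~")

# run-sync-get-dataset-items runs the actor and returns its results directly
base = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
url = f"{base}?{urlencode({'token': 'YOUR_API_TOKEN'})}"
print(url)
```

Paste that URL into any HTTP node (method: POST, body: the actor's JSON input) and you get the same results as the dedicated integration.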

&lt;h3&gt;
  
  
  Python Script
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Verify emails before outreach
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;muscular_quadruplet/email-verifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lead1@company.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lead2@startup.io&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;valid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Valid: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Pre-built Scrapers?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Maintained&lt;/strong&gt; - I update them when sites change&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tested&lt;/strong&gt; - E2E tests ensure they work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable&lt;/strong&gt; - Apify handles proxies and retries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Ready&lt;/strong&gt; - Works with Claude, Cursor, and AI agents&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Available Actors
&lt;/h2&gt;

&lt;p&gt;All my actors are available on Apify Store, with free tier credits to get started:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Actor&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/muscular_quadruplet/email-verifier" rel="noopener noreferrer"&gt;Email Verifier&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Lead cleaning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/muscular_quadruplet/remoteok-scraper" rel="noopener noreferrer"&gt;RemoteOK Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Remote jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/muscular_quadruplet/github-scraper" rel="noopener noreferrer"&gt;GitHub Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Developer analytics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/muscular_quadruplet/hackernews-scraper" rel="noopener noreferrer"&gt;Hacker News Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Tech news&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/muscular_quadruplet/coingecko-scraper" rel="noopener noreferrer"&gt;CoinGecko Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Crypto data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://apify.com/muscular_quadruplet/reddit-scraper" rel="noopener noreferrer"&gt;Reddit Scraper&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Community insights&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Pick an actor for your use case&lt;/li&gt;
&lt;li&gt;Test with free tier credits&lt;/li&gt;
&lt;li&gt;Integrate into your AI workflow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Questions? Drop a comment below.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building AI-ready data tools at &lt;a href="https://flowbot.company" rel="noopener noreferrer"&gt;flowbot.company&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
