<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yassine</title>
    <description>The latest articles on DEV Community by Yassine (@yassine_739cd69df).</description>
    <link>https://dev.to/yassine_739cd69df</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3368283%2Fa6994024-7318-4289-b907-65806f1ae8ed.png</url>
      <title>DEV Community: Yassine</title>
      <link>https://dev.to/yassine_739cd69df</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yassine_739cd69df"/>
    <language>en</language>
    <item>
      <title>How Do You Handle Web Scraping at Scale Without Getting Blocked?</title>
      <dc:creator>Yassine</dc:creator>
      <pubDate>Thu, 24 Jul 2025 21:15:21 +0000</pubDate>
      <link>https://dev.to/yassine_739cd69df/how-do-you-handle-web-scraping-at-scale-without-getting-blocked-bch</link>
      <guid>https://dev.to/yassine_739cd69df/how-do-you-handle-web-scraping-at-scale-without-getting-blocked-bch</guid>
      <description>&lt;p&gt;Hey devs 👋&lt;/p&gt;

&lt;p&gt;Over the past few months, I’ve been working on a side project that involves collecting structured data from various websites (mostly product listings and user reviews). At first, I was using traditional tools like requests, BeautifulSoup, and Scrapy — and they worked fine, until they didn’t.&lt;/p&gt;

&lt;p&gt;Once I started scaling things up even a little, I hit all the usual walls:&lt;br&gt;
❌ IP bans&lt;br&gt;
❌ CAPTCHAs&lt;br&gt;
❌ Anti-bot protections&lt;br&gt;
❌ Frequent layout changes&lt;/p&gt;

&lt;p&gt;Eventually, I experimented with proxy solutions. I tried a few, and one that worked decently well for me was Bright Data — it allowed me to test scraping across different regions and IPs without too much setup. I'm still not sure if I’ll stick with it long-term, but it definitely helped bypass some of those annoying blocks.&lt;/p&gt;
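&lt;p&gt;For anyone curious what the rotation side of that looks like, here's a minimal sketch with requests — the proxy gateways and User-Agent strings below are placeholders, not real endpoints, so swap in whatever your provider gives you:&lt;/p&gt;

```python
import random

import requests

# Placeholder proxy gateways -- substitute your provider's real endpoints.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]

# A small pool of User-Agent strings to vary the request fingerprint.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]


def build_request_kwargs(proxies, user_agents):
    """Pick a random proxy and User-Agent for one outgoing request."""
    proxy = random.choice(proxies)
    return {
        "headers": {"User-Agent": random.choice(user_agents)},
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }


def fetch(url):
    """Fetch a page through a randomly rotated proxy."""
    resp = requests.get(url, **build_request_kwargs(PROXIES, USER_AGENTS))
    resp.raise_for_status()
    return resp.text
```

&lt;p&gt;Rotation alone isn't magic, of course — adding a polite delay between requests and respecting robots.txt did as much for me as the proxies themselves.&lt;/p&gt;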

&lt;p&gt;That got me wondering:&lt;/p&gt;

&lt;p&gt;🔍 What tools or platforms are you using for scraping at scale?&lt;br&gt;
🔧 Do you still roll your own stack, or do you rely more on third-party services for proxy management, headless browsers, or data extraction?&lt;/p&gt;

&lt;p&gt;#webscraping #python #datascience #proxies&lt;/p&gt;

</description>
    </item>
    <item>
      <title>🚀 Just tackled large-scale web scraping for ML datasets! Faced lots of issues with CAPTCHAs &amp; bot detection. Found a tool that solved it all — fingerprinting, stealth, proxies. Happy to share tips if you're struggling too! 🔍💡 #MachineLearning #WebScrapi</title>
      <dc:creator>Yassine</dc:creator>
      <pubDate>Fri, 18 Jul 2025 20:00:44 +0000</pubDate>
      <link>https://dev.to/yassine_739cd69df/just-tackled-large-scale-web-scraping-for-ml-datasets-faced-lots-of-issues-with-captchas-bot-9ik</link>
      <guid>https://dev.to/yassine_739cd69df/just-tackled-large-scale-web-scraping-for-ml-datasets-faced-lots-of-issues-with-captchas-bot-9ik</guid>
      <description></description>
      <category>webscraping</category>
      <category>security</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
