<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Codes Me</title>
    <description>The latest articles on DEV Community by Codes Me (@codes_me_734c93c2eb65de65).</description>
    <link>https://dev.to/codes_me_734c93c2eb65de65</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3906269%2Fb40d095d-e2c9-4e4d-9c37-3e46305cf57d.png</url>
      <title>DEV Community: Codes Me</title>
      <link>https://dev.to/codes_me_734c93c2eb65de65</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codes_me_734c93c2eb65de65"/>
    <language>en</language>
    <item>
      <title>Stop Getting Blocked: Recon Your Target Website Before Scraping It</title>
      <dc:creator>Codes Me</dc:creator>
      <pubDate>Thu, 30 Apr 2026 16:01:46 +0000</pubDate>
      <link>https://dev.to/codes_me_734c93c2eb65de65/stop-getting-blocked-recon-your-target-website-before-scraping-it-2mn8</link>
      <guid>https://dev.to/codes_me_734c93c2eb65de65/stop-getting-blocked-recon-your-target-website-before-scraping-it-2mn8</guid>
      <description>&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;You spend hours writing a scraper, run it, and immediately get a 403. Or you build it with &lt;code&gt;requests&lt;/code&gt;, only to realize the site needs JavaScript to render.&lt;/p&gt;

&lt;p&gt;I got tired of this loop, so I built &lt;strong&gt;scrapalyser&lt;/strong&gt;, a Python library that scans any website before you write a single line of scraper code.&lt;/p&gt;

&lt;h2&gt;Install&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;pip install scrapalyser&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;Usage&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;import scrapalyser

report = scrapalyser.scan(
    url="https://example.com",
    output="txt",
    lang="en",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;What it detects&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anti-bot&lt;/strong&gt;: Cloudflare, DataDome, PerimeterX, Akamai, Kasada, reCAPTCHA, hCaptcha&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tech stack&lt;/strong&gt;: React, Vue, Angular, Next.js, Nuxt, WordPress, Shopify...&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JS required&lt;/strong&gt;: tells you whether plain &lt;code&gt;requests&lt;/code&gt; is enough or whether you need Playwright&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API endpoints&lt;/strong&gt;: via CSP headers, inline scripts, or XHR interception (Playwright mode)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;robots.txt &amp;amp; sitemap&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Login wall&lt;/strong&gt;: login forms, login redirects, login buttons, and OAuth&lt;/li&gt;
&lt;/ul&gt;
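&lt;p&gt;Some of these checks can be made from the headers and cookies of a single response. The snippet below is a minimal, hypothetical sketch of that idea, not scrapalyser's actual code: the vendor signature table (for example the &lt;code&gt;cf-ray&lt;/code&gt; header or the &lt;code&gt;_abck&lt;/code&gt; cookie) is an illustrative and incomplete assumption, and &lt;code&gt;detect_antibot&lt;/code&gt; and &lt;code&gt;csp_connect_sources&lt;/code&gt; are names made up for this example.&lt;/p&gt;

```python
# Hypothetical sketch: header/cookie heuristics for two of the checks above.
# The signatures are illustrative assumptions, NOT scrapalyser's real rules.

# Lower-cased header names and cookie-name prefixes associated with each vendor.
ANTIBOT_SIGNATURES = {
    "Cloudflare": {"headers": ["cf-ray"], "cookies": ["__cf_bm", "cf_clearance"]},
    "DataDome": {"headers": ["x-datadome"], "cookies": ["datadome"]},
    "PerimeterX": {"headers": [], "cookies": ["_px"]},
    "Akamai": {"headers": [], "cookies": ["_abck", "bm_sz"]},
    "Kasada": {"headers": ["x-kpsdk-ct"], "cookies": []},
}


def detect_antibot(headers: dict[str, str], cookie_names: list[str]) -> list[str]:
    """Return the vendors whose header or cookie signature appears in a response."""
    header_names = {name.lower() for name in headers}  # header names are case-insensitive
    found = []
    for vendor, sig in ANTIBOT_SIGNATURES.items():
        header_hit = any(h in header_names for h in sig["headers"])
        cookie_hit = any(
            cookie.startswith(prefix) for cookie in cookie_names for prefix in sig["cookies"]
        )
        if header_hit or cookie_hit:
            found.append(vendor)
    return found


def csp_connect_sources(csp_header: str) -> list[str]:
    """Return the sources listed in a Content-Security-Policy connect-src directive."""
    for directive in csp_header.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "connect-src":
            # Skip quoted keywords such as 'self'; keep hosts/URLs that may be API origins.
            return [src for src in parts[1:] if not src.startswith("'")]
    return []


if __name__ == "__main__":
    headers = {
        "CF-RAY": "8a1b2c3d4e5f6a7b-CDG",
        "Content-Security-Policy": "default-src 'self'; connect-src 'self' https://api.example.com",
    }
    print(detect_antibot(headers, ["__cf_bm"]))  # ['Cloudflare']
    print(csp_connect_sources(headers["Content-Security-Policy"]))  # ['https://api.example.com']
```

&lt;p&gt;Matching cookie names by prefix keeps variants such as &lt;code&gt;_px3&lt;/code&gt; and &lt;code&gt;_pxhd&lt;/code&gt; under a single entry.&lt;/p&gt;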

&lt;h2&gt;Two engines&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;curl_cffi&lt;/strong&gt; (default): fast, no browser, one HTTP request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;playwright&lt;/strong&gt;: a full browser, with XHR interception and optional screenshots.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;report = scrapalyser.scan(
    url="https://example.com",
    engine="playwright",
    headless=False,
    screenshot="capture.png",
)
&lt;/code&gt;&lt;/pre&gt;
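&lt;p&gt;With the default engine, the choice between plain HTTP and a browser has to be made from the raw HTML alone. As a rough illustration (a hypothetical heuristic, not scrapalyser's implementation), a page can be flagged as JavaScript-dependent when it has almost no visible text but ships a typical SPA mount point such as &lt;code&gt;id="root"&lt;/code&gt; or &lt;code&gt;id="__next"&lt;/code&gt;, or at least one script tag. The 200-character threshold, the id list and the names &lt;code&gt;ShellInspector&lt;/code&gt; and &lt;code&gt;needs_browser&lt;/code&gt; are assumptions made for this sketch.&lt;/p&gt;

```python
# Hypothetical "JS required" heuristic applied to raw HTML fetched without a browser.
# Thresholds and mount-point ids are illustrative assumptions, not scrapalyser's logic.
from html.parser import HTMLParser

# Mount-point ids commonly used by React, Vue, Next.js and Nuxt apps.
SPA_ROOT_IDS = {"root", "app", "__next", "__nuxt"}
NON_VISIBLE_TAGS = ("script", "style", "noscript")


class ShellInspector(HTMLParser):
    """Collects the signals needed to tell an empty SPA shell from a rendered page."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0  # visible text length outside script/style/noscript
        self.script_tags = 0
        self.has_spa_root = False
        self._hidden_depth = 0  # nesting depth inside non-visible tags

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in NON_VISIBLE_TAGS:
            self._hidden_depth += 1
        if dict(attrs).get("id") in SPA_ROOT_IDS:
            self.has_spa_root = True

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.text_chars += len(data.strip())


def needs_browser(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a page must be rendered with JavaScript (e.g. via Playwright)."""
    inspector = ShellInspector()
    inspector.feed(html)
    inspector.close()
    almost_empty = min_text_chars > inspector.text_chars
    return almost_empty and (inspector.has_spa_root or inspector.script_tags > 0)
```

&lt;p&gt;A page for which such a check returns &lt;code&gt;True&lt;/code&gt; is the case where &lt;code&gt;engine="playwright"&lt;/code&gt; is worth its extra cost.&lt;/p&gt;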

&lt;h2&gt;If you get blocked&lt;/h2&gt;

&lt;p&gt;If the site returns a 403 or a captcha page, the report tells you right away which anti-bot system blocked you, and every other field is set to "blocked by antibot". No guessing.&lt;/p&gt;
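&lt;p&gt;The short-circuit behaviour described above can be sketched as follows. This is a hypothetical illustration, not scrapalyser's code: the status codes, the captcha/challenge markers (such as the &lt;code&gt;g-recaptcha&lt;/code&gt; and &lt;code&gt;h-captcha&lt;/code&gt; widget classes) and the report field names are assumptions made for the example; only the "blocked by antibot" sentinel comes from the description above.&lt;/p&gt;

```python
# Hypothetical sketch of the "stop early when blocked" report logic.
# Status codes, markers and field names are assumptions, not scrapalyser's API.
from typing import Optional

BLOCK_STATUSES = {403, 429, 503}
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "/cdn-cgi/challenge-platform/", "captcha-delivery.com")
REPORT_FIELDS = ("tech_stack", "js_required", "api_endpoints", "robots_txt", "login_wall")
BLOCKED = "blocked by antibot"


def looks_blocked(status: int, body: str) -> bool:
    """True if the status code or a captcha/challenge marker suggests a block page."""
    lowered = body.lower()
    return status in BLOCK_STATUSES or any(marker in lowered for marker in CAPTCHA_MARKERS)


def build_report(status: int, body: str, antibot: Optional[str]) -> dict:
    """Fill every field with the BLOCKED sentinel when the page is a block page."""
    if looks_blocked(status, body):
        report = {field: BLOCKED for field in REPORT_FIELDS}
        report["antibot"] = antibot or "unknown"
        return report
    # Not blocked: real analysers would fill these fields; None means "not analysed yet".
    report = {field: None for field in REPORT_FIELDS}
    report["antibot"] = antibot
    return report


if __name__ == "__main__":
    print(build_report(403, "", "Cloudflare")["login_wall"])  # blocked by antibot
```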

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Website: &lt;a href="https://codes-me.com/" rel="noopener noreferrer"&gt;https://codes-me.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/codesme34/scrapalyser" rel="noopener noreferrer"&gt;https://github.com/codesme34/scrapalyser&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;a href="https://pypi.org/project/scrapalyser/" rel="noopener noreferrer"&gt;https://pypi.org/project/scrapalyser/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;YouTube channel (French): &lt;a href="https://www.youtube.com/@CodesMe" rel="noopener noreferrer"&gt;https://www.youtube.com/@CodesMe&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>showdev</category>
      <category>tooling</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
