<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Karim Shoair</title>
    <description>The latest articles on DEV Community by Karim Shoair (@d4vinci).</description>
    <link>https://dev.to/d4vinci</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3111912%2F7e0f9644-59c4-4d2a-a8b5-500458af5687.jpeg</url>
      <title>DEV Community: Karim Shoair</title>
      <link>https://dev.to/d4vinci</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/d4vinci"/>
    <language>en</language>
    <item>
      <title>In case you missed it</title>
      <dc:creator>Karim Shoair</dc:creator>
      <pubDate>Mon, 05 May 2025 15:07:39 +0000</pubDate>
      <link>https://dev.to/d4vinci/in-case-you-missed-it-4mi1</link>
      <guid>https://dev.to/d4vinci/in-case-you-missed-it-4mi1</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/d4vinci/creating-self-healing-spiders-with-scrapling-in-python-without-ai-web-scraping-2kfa"&gt;Creating self-healing spiders with Scrapling in Python without AI (Web Scraping)&lt;/a&gt; by Karim Shoair, May 5 '25. 10 min read, 14 reactions.&lt;/p&gt;</description>
      <category>webscraping</category>
      <category>python</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Creating self-healing spiders with Scrapling in Python without AI (Web Scraping)</title>
      <dc:creator>Karim Shoair</dc:creator>
      <pubDate>Mon, 05 May 2025 03:10:48 +0000</pubDate>
      <link>https://dev.to/d4vinci/creating-self-healing-spiders-with-scrapling-in-python-without-ai-web-scraping-2kfa</link>
      <guid>https://dev.to/d4vinci/creating-self-healing-spiders-with-scrapling-in-python-without-ai-web-scraping-2kfa</guid>
      <description>&lt;p&gt;Hello everyone, this is my first article here, so I hope you like it!&lt;/p&gt;

&lt;p&gt;If you have been doing Web Scraping for a long time, you have probably noticed the same recurring problems, like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rapidly changing website structures&lt;/strong&gt; — Sites frequently update their DOM structures, breaking static XPath/CSS selectors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unstable selectors&lt;/strong&gt; — Class names and IDs often change or use randomly generated values that break scrapers or make scraping these websites difficult.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increasingly complex anti-bot measures&lt;/strong&gt; — CAPTCHA systems, browser fingerprinting, and behavior analysis make traditional scraping difficult, among other hurdles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But that's only if you are doing targeted Web Scraping for known websites, in which case you can write specific code for every website.&lt;/p&gt;

&lt;p&gt;If you start thinking about bigger goals like Broad Scraping or Generic Web Scraping (or whatever you prefer to call it), the above issues intensify, and you will face new ones like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extreme Website Diversity&lt;/strong&gt; — Generic scraping must handle countless variations in HTML structures, CSS usage, JavaScript frameworks, and backend technologies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identifying Relevant Data&lt;/strong&gt; — How does the scraper know what data is important on a page it has never seen before?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pagination variations&lt;/strong&gt; — Infinite scroll, traditional pagination, and "load more" buttons all require different approaches, among other patterns.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;How are you going to solve that manually? I'm talking about generic web scraping of different websites that don't share any technologies.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI to the rescue
&lt;/h2&gt;

&lt;p&gt;Recently, there's been a noticeable shift toward AI-based web scraping, driven by its potential to address these challenges.&lt;/p&gt;

&lt;p&gt;AI can solve most of these issues easily: it understands the page source, tells you where the fields you want are, or generates selectors for them. That is, of course, assuming you already bypassed the anti-bot measures with other tools 😄&lt;/p&gt;

&lt;p&gt;This approach is attractive, of course. I love AI and find it fascinating to keep learning about, especially GenAI. You will probably spend a lot of time on prompt engineering and tweaking prompts, but even if that's fine with you, you will soon hit the real issue with using AI here.&lt;/p&gt;

&lt;p&gt;Most websites have huge content per page, which you will need to pass to the AI somehow so it can do its magic. This will burn through tokens like fire in a haystack, quickly building up high costs!&lt;/p&gt;
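&lt;p&gt;To put rough numbers on it (every figure below is a hypothetical assumption for illustration, not any provider's actual pricing), here is a quick back-of-the-envelope estimate:&lt;/p&gt;

```python
# Back-of-the-envelope LLM cost estimate for broad scraping.
# All numbers here are hypothetical assumptions, not real provider pricing.

def scraping_cost(pages: int, chars_per_page: int, usd_per_million_tokens: float) -> float:
    """Rough cost of feeding raw HTML to an LLM.

    ~4 characters per token is a common approximation for English text.
    """
    tokens_per_page = chars_per_page / 4
    total_tokens = pages * tokens_per_page
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Scraping 100,000 pages of ~200 KB of HTML each, at a hypothetical
# $1 per million input tokens:
print(f"${scraping_cost(100_000, 200_000, 1.0):,.0f}")  # $5,000
```

And that is input tokens only, before retries, output tokens, or re-scraping the same pages over time.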

&lt;p&gt;Unless money is irrelevant to you, you will try to find cheaper approaches, and that's why I made Scrapling 😄&lt;/p&gt;

&lt;h2&gt;
  
  
  Scrapling got you covered
&lt;/h2&gt;

&lt;p&gt;After years of working in Web Scraping and manually scraping hundreds, if not thousands, of websites with Python spiders, I got tired of maintaining spiders and dealing with the same repeating issues we all face in this field.&lt;/p&gt;

&lt;p&gt;So, 8 months ago, I decided to take the first step and camped in my house for ~50 days, doing nothing other than finishing my Web Scraping job and then working on Scrapling for the rest of the day. Between the job and Scrapling, I was working 8-14 hours daily, and I rewrote the first version more than 5 times to get better performance and an easier API. In the end, it was worth it: Scrapling version 0.1 was born, and we are at version 0.2.99 as I write this 😄&lt;/p&gt;

&lt;p&gt;Scrapling is an undetectable, high-performance, intelligent Web Scraping library for Python that makes Web Scraping as easy and effortless as it should be. Or should I say, as it once was? The goal is to provide powerful features while maintaining simplicity and minimal boilerplate code.&lt;/p&gt;

&lt;p&gt;Scrapling can deal with almost all issues you will face during Web Scraping, and the following updates will cover the rest carefully.&lt;/p&gt;

&lt;p&gt;Also, did I mention that Scrapling's parsing engine is 400-600 times faster than BeautifulSoup in benchmarks, while offering more features, using less memory, and exposing a very similar API? Oh, I think I just did, but that's a subject for another time 😄&lt;/p&gt;

&lt;p&gt;Below, we will cover how to install Scrapling, then walk through how it solves each of the issues above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Scrapling is a breeze to get started with! Starting from version 0.2.9, it requires Python 3.9 or newer.&lt;/p&gt;

&lt;p&gt;Run this command to install it with Python's pip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;scrapling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You are ready if you plan to use the parser only (the &lt;code&gt;Adaptor&lt;/code&gt; class).&lt;/p&gt;

&lt;p&gt;But if you are going to make requests or fetch pages with Scrapling, run this command to install the browser dependencies needed by the Fetchers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;scrapling &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Solving issue T1: Rapidly changing website structures
&lt;/h2&gt;

&lt;p&gt;One of Scrapling's most powerful features is &lt;a href="https://scrapling.readthedocs.io/en/latest/parsing/automatch/" rel="noopener noreferrer"&gt;Automatch&lt;/a&gt;. It allows your scraper to survive website changes by intelligently tracking and relocating elements.&lt;/p&gt;

&lt;p&gt;With automatch enabled, you can save any element's unique location properties while scraping, so the element can be found again later if the website's structure changes. The most frustrating thing about such changes is that anything about an element can change, so there is no single property you can rely on.&lt;/p&gt;

&lt;p&gt;That's how the automatch feature works: it stores everything unique about an element's location in the DOM. When the website structure changes, it returns the element with the highest similarity score with the saved properties.&lt;/p&gt;
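&lt;p&gt;To make the idea concrete, here is a heavily simplified, hypothetical sketch of similarity-based relocation in plain Python. It is not Scrapling's actual code (the real algorithm tracks far more properties), but it shows the core trick: score every candidate against the saved element's properties and pick the best match.&lt;/p&gt;

```python
# Hypothetical sketch of similarity-based element relocation.
# This is NOT Scrapling's real implementation, just the core idea.

def similarity(saved: dict, candidate: dict) -> float:
    """Fraction of the saved properties (tag, text, attributes) that still match."""
    score = 0
    total = 2 + len(saved["attrs"])  # tag + text + each saved attribute
    if saved["tag"] == candidate["tag"]:
        score += 1
    if saved["text"] == candidate["text"]:
        score += 1
    for key, value in saved["attrs"].items():
        if candidate["attrs"].get(key) == value:
            score += 1
    return score / total

def relocate(saved: dict, candidates: list) -> dict:
    """Return the candidate element most similar to the saved one."""
    return max(candidates, key=lambda c: similarity(saved, c))

# The element we saved before the redesign
saved = {"tag": "article", "attrs": {"class": "product", "id": "p1"}, "text": "Product 1"}

# Elements found after the redesign: classes changed and "id" became "data-id"
candidates = [
    {"tag": "article", "attrs": {"class": "product new-class", "data-id": "p1"}, "text": "Product 1"},
    {"tag": "article", "attrs": {"class": "product new-class", "data-id": "p2"}, "text": "Product 2"},
]

best = relocate(saved, candidates)
print(best["attrs"]["data-id"])  # the first product still scores highest
```

Even though the class changed and the `id` attribute disappeared, the first product wins because its tag and text survived the redesign.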

&lt;p&gt;I will give you two examples of how to use it, to hammer the idea home.&lt;/p&gt;

&lt;p&gt;Let's say you are scraping a page with a structure like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"container"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;section&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"products"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;article&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"product"&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"p1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;h3&amp;gt;&lt;/span&gt;Product 1&lt;span class="nt"&gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;p&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Description 1&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/article&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;article&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"product"&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"p2"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;h3&amp;gt;&lt;/span&gt;Product 2&lt;span class="nt"&gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;p&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"description"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Description 2&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/article&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/section&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And you want to scrape the first product, the one with the &lt;code&gt;p1&lt;/code&gt; ID. You will probably write a selector like this&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#p1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When website owners implement structural changes like&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"new-container"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"product-wrapper"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;section&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"products"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;article&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"product new-class"&lt;/span&gt; &lt;span class="na"&gt;data-id=&lt;/span&gt;&lt;span class="s"&gt;"p1"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"product-info"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;h3&amp;gt;&lt;/span&gt;Product 1&lt;span class="nt"&gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;p&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"new-description"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Description 1&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/article&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;article&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"product new-class"&lt;/span&gt; &lt;span class="na"&gt;data-id=&lt;/span&gt;&lt;span class="s"&gt;"p2"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"product-info"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;h3&amp;gt;&lt;/span&gt;Product 2&lt;span class="nt"&gt;&amp;lt;/h3&amp;gt;&lt;/span&gt;
                    &lt;span class="nt"&gt;&amp;lt;p&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"new-description"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Description 2&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/article&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/section&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The selector will no longer function, and your code needs maintenance. That's where Scrapling's auto-matching feature comes into play.&lt;/p&gt;

&lt;p&gt;With Scrapling, you enable the &lt;code&gt;automatch&lt;/code&gt; feature the first time you select an element, and Scrapling saves that element's properties. The next time you select it and it no longer exists, Scrapling searches the page for the element with the highest similarity to the saved one, all without AI 😄&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrapling&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Adaptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Fetcher&lt;/span&gt;
&lt;span class="c1"&gt;# Before the change
&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Adaptor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auto_match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;example.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# or
&lt;/span&gt;&lt;span class="n"&gt;Fetcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auto_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Fetcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://example.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# then
&lt;/span&gt;&lt;span class="n"&gt;element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#p1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="n"&gt;auto_save&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;element&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# One day website changes?
&lt;/span&gt;    &lt;span class="n"&gt;element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#p1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auto_match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Scrapling still finds it!
# the rest of your code...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's look at a &lt;strong&gt;Real-World Scenario&lt;/strong&gt;. To stage one, we would need to find a website that is about to change its design/structure, take a copy of its source, and then wait for the change to happen. Of course, predicting that is nearly impossible unless I know the website's owner, and that would make it a staged test anyway, haha.&lt;/p&gt;

&lt;p&gt;To work around this, I will use &lt;a href="https://archive.org/" rel="noopener noreferrer"&gt;The Internet Archive&lt;/a&gt;'s &lt;a href="https://web.archive.org/" rel="noopener noreferrer"&gt;Wayback Machine&lt;/a&gt;. Here is a copy of &lt;a href="https://web.archive.org/web/20100102003420/http://stackoverflow.com/" rel="noopener noreferrer"&gt;Stack Overflow's website from 2010&lt;/a&gt;; pretty old, eh?&lt;/p&gt;

&lt;p&gt;Let's test if the automatch feature can extract the same button in the old design from 2010 and the current design using the same selector 😄&lt;/p&gt;

&lt;p&gt;If I want to extract the Questions button from the old design, I can use a selector like this: &lt;code&gt;#hmenus &amp;gt; div:nth-child(1) &amp;gt; ul &amp;gt; li:nth-child(1) &amp;gt; a&lt;/code&gt;. This selector is overly specific because it was generated by Google Chrome's DevTools.&lt;/p&gt;

&lt;p&gt;Now, let's test the same selector in both versions&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scrapling&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Fetcher&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;selector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#hmenus &amp;gt; div:nth-child(1) &amp;gt; ul &amp;gt; li:nth-child(1) &amp;gt; a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;old_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://web.archive.org/web/20100102003420/http://stackoverflow.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;new_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://stackoverflow.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Fetcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auto_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;automatch_domain&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stackoverflow.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; 
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Fetcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;old_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;element1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;css_first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auto_save&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; 
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="c1"&gt;# Same selector but used in the updated website
&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Fetcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;element2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;css_first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auto_match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; 
&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;element1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;element2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Scrapling found the same element in the old and new designs!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Scrapling found the same element in the old and new designs!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that I used a new argument called &lt;code&gt;automatch_domain&lt;/code&gt;. This is because, to Scrapling, these are two different domains (&lt;code&gt;archive.org&lt;/code&gt; and &lt;code&gt;stackoverflow.com&lt;/code&gt;), so Scrapling would isolate their &lt;code&gt;auto_match&lt;/code&gt; data. To tell Scrapling they are the same website, we pass the custom domain to use while saving auto-match data for both, so the data isn't isolated.&lt;/p&gt;

&lt;p&gt;The code will be the same in a real-world scenario, except it will use the same URL for both requests, so you won't need to use the &lt;code&gt;automatch_domain&lt;/code&gt; argument. This is the closest example I can give to real-world cases, so I hope it didn't confuse you 😄&lt;/p&gt;
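&lt;p&gt;A rough sketch of why the override matters (hypothetical storage logic for illustration, not Scrapling's internals): auto-match data is keyed by domain, so the two URLs would land in separate buckets unless you force a common key.&lt;/p&gt;

```python
# Hypothetical sketch of domain-keyed auto-match storage (not Scrapling's internals).
from urllib.parse import urlparse

def storage_key(url: str, automatch_domain: str = "") -> str:
    """Auto-match data is saved per domain unless a custom domain overrides the key."""
    return automatch_domain or urlparse(url).netloc

old_url = "https://web.archive.org/web/20100102003420/http://stackoverflow.com/"
new_url = "https://stackoverflow.com/"

# Without the override, the two pages land in different buckets...
print(storage_key(old_url))  # web.archive.org
print(storage_key(new_url))  # stackoverflow.com

# ...with it, both pages save and load under the same key.
print(storage_key(old_url, "stackoverflow.com") == storage_key(new_url, "stackoverflow.com"))  # True
```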

&lt;p&gt;The rest of the details are on the &lt;a href="https://scrapling.readthedocs.io/en/latest/parsing/automatch/" rel="noopener noreferrer"&gt;automatch&lt;/a&gt; page on Scrapling's documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving issue T2: Unstable selectors
&lt;/h2&gt;

&lt;p&gt;If you have been doing Web Scraping for long enough, you have likely experienced this at least once: a website that uses poor design patterns, is built on plain HTML without any IDs or classes, or uses random class names that change frequently, with no identifiers or attributes to rely on, and the list goes on!&lt;/p&gt;

&lt;p&gt;In these cases, standard CSS/XPath selectors won't be optimal, which is why Scrapling provides three more selection methods:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://scrapling.readthedocs.io/en/latest/parsing/selection/#text-content-selection" rel="noopener noreferrer"&gt;Selection by element content&lt;/a&gt; - Through text content (&lt;code&gt;find_by_text&lt;/code&gt;) or regex that match a text content (&lt;code&gt;find_by_regex&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapling.readthedocs.io/en/latest/parsing/selection/#finding-similar-elements" rel="noopener noreferrer"&gt;Selecting elements similar to another element&lt;/a&gt; - You find an element, and we will do the rest!&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapling.readthedocs.io/en/latest/parsing/selection/#filters-based-searching" rel="noopener noreferrer"&gt;Selecting elements by filters&lt;/a&gt; - You just specify conditions that this element must fulfill!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each of these selection methods deserves its own article, so I won't explain them here; just follow the documentation links above and take a deep dive!&lt;/p&gt;
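&lt;p&gt;To illustrate the idea behind content-based selection without the library, here is a minimal standard-library sketch (this is &lt;em&gt;not&lt;/em&gt; Scrapling's API; the mock page, the &lt;code&gt;find_by_regex&lt;/code&gt; helper, and the class name are made up for the example): locate an element by the text it holds instead of by fragile classes or IDs, then climb to its parent.&lt;/p&gt;

```python
import re
import xml.etree.ElementTree as ET

# Build a tiny mock page programmatically (it stands in for parsed HTML;
# the random-looking class name mimics a badly designed site).
root = ET.Element("div")
card = ET.SubElement(root, "div", attrib={"class": "x7f3a"})
ET.SubElement(card, "span").text = "Laptop"
ET.SubElement(card, "b").text = "£1,299.99"

# Map every node to its parent so we can climb upward after a match.
parent_of = {child: parent for parent in root.iter() for child in parent}

def find_by_regex(tree, pattern):
    """Return the first element whose text matches the pattern, else None."""
    for el in tree.iter():
        if el.text and re.search(pattern, el.text):
            return el
    return None

price_el = find_by_regex(root, r"£[\d.,]+")
container = parent_of[price_el]   # the product card that holds the price
print(price_el.text)              # £1,299.99
print(container.get("class"))     # x7f3a
```

&lt;p&gt;Scrapling's real &lt;code&gt;find_by_text&lt;/code&gt;/&lt;code&gt;find_by_regex&lt;/code&gt; methods work on a fully parsed page and are far more capable; see the documentation links above.&lt;/p&gt;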

&lt;h2&gt;
  
  
  Solving issue T3: Increasingly complex anti-bot measures
&lt;/h2&gt;

&lt;p&gt;It's well known that building an undetectable spider takes more than residential/mobile proxies and human-like behavior; it also needs a hard-to-detect browser, and Scrapling provides two main options for that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://scrapling.readthedocs.io/en/latest/fetching/dynamic/" rel="noopener noreferrer"&gt;PlayWrightFetcher&lt;/a&gt; — This fetcher provides not only stealth mode suitable for small-medium protections but also more flexible options, like using your real browser.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://scrapling.readthedocs.io/en/latest/fetching/stealthy/" rel="noopener noreferrer"&gt;StealthyFetcher&lt;/a&gt; — Because we live in a harsh world and you need to take &lt;a href="https://www.youtube.com/watch?v=7BE4QcwX4dU" rel="noopener noreferrer"&gt;full measure instead of half measures&lt;/a&gt;, &lt;code&gt;StealthyFetcher&lt;/code&gt; was born. This fetcher uses a modified Firefox browser called &lt;a href="https://camoufox.com/stealth/" rel="noopener noreferrer"&gt;Camoufox&lt;/a&gt; that almost passes all known tests and adds more tricks on top of it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The links above lead to the documentation for these two classes. Both will be improved a lot in upcoming updates, so stay tuned 😄&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving issues B1 &amp;amp; B2: Extreme Website Diversity / Identifying Relevant Data
&lt;/h2&gt;

&lt;p&gt;This one is tough to handle, but it's possible with Scrapling's flexibility. &lt;/p&gt;

&lt;p&gt;I talked with someone who uses AI to extract prices from different websites. He is only interested in prices and titles, so he uses AI to find the price for him.&lt;/p&gt;

&lt;p&gt;I told him he didn't need AI here and gave him this code as an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;price_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_by_regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;£[\d\.,]+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Get the first element that contains a text that matches price regex eg. £10.50
# If you want the container/element that contains the price element
&lt;/span&gt;&lt;span class="n"&gt;price_element_container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;price_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_ancestor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;ancestor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ancestor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has_class&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# or other methods...
&lt;/span&gt;&lt;span class="n"&gt;target_element_selector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price_element_container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_css_selector&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;price_element_container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_full_css_selector&lt;/span&gt; &lt;span class="c1"&gt;# or xpath
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then he asked: what about cases like this?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;'currency'&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt; $ &lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;&amp;lt;span&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;'a-price'&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt; 45,000 &lt;span class="nt"&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, I updated the code like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;price_element_container&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_by_regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[\d,]+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;first_match&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="c1"&gt;# Adjusted the regex for this example
&lt;/span&gt;&lt;span class="n"&gt;full_price_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;price_element_container&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Returns '$45,000' in this case
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was enough for his use case. You can try the first regex and, if it doesn't match anything, fall back to the next one, and so on. Cover the most common patterns first, then the less common ones.&lt;br&gt;
It's a bit tedious, but it's definitely cheaper than AI.&lt;/p&gt;
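&lt;p&gt;The fallback strategy can be sketched in plain Python. This is a hedged illustration: the pattern list and the &lt;code&gt;extract_price&lt;/code&gt; helper are hypothetical, and you would extend the patterns for the sites you actually target.&lt;/p&gt;

```python
import re

# Try the most common price patterns first, then fall back to rarer ones.
# This list is illustrative, not exhaustive.
PRICE_PATTERNS = [
    r"[£$€]\s?[\d.,]+",   # symbol-prefixed: £10.50, $ 45,000
    r"[\d.,]+\s?[£$€]",   # symbol-suffixed: 45,50 €
    r"\b\d[\d.,]*\b",     # bare number as a last resort
]

def extract_price(text):
    """Return the first match from the first pattern that hits, else None."""
    for pattern in PRICE_PATTERNS:
        match = re.search(pattern, text)
        if match:
            return match.group().strip()
    return None

print(extract_price("Now only £10.50!"))   # £10.50
print(extract_price("$ 45,000"))           # $ 45,000
print(extract_price("Preis: 45,50 €"))     # 45,50 €
```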

&lt;p&gt;This example demonstrates the idea I wanted to get across: not every challenge needs AI to be solved. Sometimes you need to be creative, and that might save you a lot of money :)&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving issue B3: Pagination variations
&lt;/h2&gt;

&lt;p&gt;Scrapling doesn't currently have a direct method to extract pagination URLs for you automatically, but one will be added in upcoming updates 😄&lt;/p&gt;

&lt;p&gt;In the meantime, you can handle most websites by searching for the most common patterns with &lt;code&gt;page.find_by_text('Next').attrib['href']&lt;/code&gt; or &lt;code&gt;page.find_by_text('load more').attrib['href']&lt;/code&gt;, or with selectors like &lt;code&gt;a[href*='?page=']&lt;/code&gt; or &lt;code&gt;a[href*='/page/']&lt;/code&gt;—you get the idea.&lt;/p&gt;
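&lt;p&gt;As a rough, standard-library sketch of that heuristic (the hint words, URL patterns, and &lt;code&gt;find_next_url&lt;/code&gt; helper are illustrative assumptions, not Scrapling API): once you have the page's links as (text, href) pairs, via Scrapling or any parser, pick the first one that looks like pagination.&lt;/p&gt;

```python
import re

# Hypothetical helper: scan (text, href) link pairs for common
# pagination patterns, most specific signal first.
NEXT_HINTS = ("next", "load more", "older posts")

def find_next_url(links):
    """Return the href of the first link that looks like pagination, else None."""
    # 1) Links whose visible text is a known "next" hint
    for text, href in links:
        if text.strip().lower() in NEXT_HINTS:
            return href
    # 2) Links whose URL matches common pagination shapes
    for _, href in links:
        if re.search(r"[?&]page=\d+|/page/\d+", href):
            return href
    return None

links = [("About", "/about"), ("Next", "/blog/2")]
print(find_next_url(links))                    # /blog/2
print(find_next_url([("3", "/blog?page=3")]))  # /blog?page=3
```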

&lt;h2&gt;
  
  
  Cost Comparison and Savings
&lt;/h2&gt;

&lt;p&gt;Here's a quick comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Scrapling&lt;/th&gt;
&lt;th&gt;AI-Based Tools (e.g., Browse AI, Oxylabs)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost Structure&lt;/td&gt;
&lt;td&gt;Free and open source; no per-use fees&lt;/td&gt;
&lt;td&gt;Starts at $19/month (Browse AI) to $49/month (Oxylabs), scales with usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup Effort&lt;/td&gt;
&lt;td&gt;Requires technical expertise, manual setup&lt;/td&gt;
&lt;td&gt;Often no-code, easier for non-technical users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Depends on user implementation&lt;/td&gt;
&lt;td&gt;Built-in support for large-scale, managed services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adaptability&lt;/td&gt;
&lt;td&gt;High with features like automatch&lt;/td&gt;
&lt;td&gt;High, automatic with AI, but costly for frequent changes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is based on pricing from &lt;a href="https://www.browse.ai/pricing" rel="noopener noreferrer"&gt;Browse AI Pricing&lt;/a&gt; and &lt;a href="https://oxylabs.io/products/scraper-api/web/pricing" rel="noopener noreferrer"&gt;Oxylabs Web Scraper API Pricing&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;No challenge stays challenging for long; it all depends on how you look at it and how you decide to solve it. Will you go for the maybe-easier but expensive solution, or will you sit with the problem a while longer until you find a better one? It's up to you. I always like to remember how DeepSeek initially rivaled OpenAI with far fewer resources by thinking up more efficient solutions. Sometimes it's like that 😄&lt;/p&gt;

&lt;p&gt;In the end, nothing is perfect. So, if you find an issue in &lt;a href="https://scrapling.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;Scrapling&lt;/a&gt;, &lt;a href="https://github.com/D4Vinci/Scrapling/issues" rel="noopener noreferrer"&gt;please don't hesitate to report it&lt;/a&gt;, since &lt;a href="https://scrapling.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;Scrapling&lt;/a&gt; is under heavy development. (The next update is going to be insane, so you'd better &lt;a href="https://discord.gg/EMgGbDceNQ" rel="noopener noreferrer"&gt;join our Discord server&lt;/a&gt; to try beta features before anyone else!)&lt;/p&gt;

&lt;p&gt;I hope you liked the article, and please let me know if you have any feedback!&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;small&gt;Disclaimer: This article is an improved and expanded version of the original article written by me &lt;a href="https://scrapling.readthedocs.io/en/latest/tutorials/replacing_ai/" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/small&gt;&lt;/small&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
