<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: datacollection</title>
    <description>The latest articles on DEV Community by datacollection (@datacollectionscraper).</description>
    <link>https://dev.to/datacollectionscraper</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2589871%2Fa57573ca-482f-4cce-a773-2ce85cd96b39.png</url>
      <title>DEV Community: datacollection</title>
      <link>https://dev.to/datacollectionscraper</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/datacollectionscraper"/>
    <language>en</language>
    <item>
      <title>Build Smart Business News Monitoring with Dify + Deep SerpApi</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Tue, 17 Jun 2025 10:36:19 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/build-smart-business-news-monitoring-with-dify-deep-serpapi-2307</link>
      <guid>https://dev.to/datacollectionscraper/build-smart-business-news-monitoring-with-dify-deep-serpapi-2307</guid>
      <description>&lt;p&gt;In today’s highly competitive landscape, staying informed of brand reputation, industry developments, and competitor intelligence in real time is crucial for effective decision-making. However, manually monitoring news and information is time-consuming, labor-intensive, and prone to missing critical insights.&lt;/p&gt;

&lt;p&gt;This solution integrates Dify, a leading no-code AI automation platform, with the &lt;a href="https://www.scrapeless.com/en/product/deep-serp-api" rel="noopener noreferrer"&gt;Scrapeless Deep SerpApi&lt;/a&gt;, an enterprise-grade Google Search data interface, to build a smart and scalable business news monitoring system that enables enterprises to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collect and filter real-time news automatically&lt;/li&gt;
&lt;li&gt;Leverage AI for intelligent analysis and actionable insights&lt;/li&gt;
&lt;li&gt;Push alerts and reports across multiple channels automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g1mt0i0xhhr6nmbqi6z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g1mt0i0xhhr6nmbqi6z.png" alt="Build Smart Business News Monitoring with Dify + Deep SerpAPI" width="800" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Solution Overview
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dify Intelligent Workflow Platform&lt;/td&gt;
&lt;td&gt;No-code workflow design and execution with drag-and-drop support for AI and API integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scrapeless Deep SerpApi&lt;/td&gt;
&lt;td&gt;High-speed, stable, anti-blocking Google Search API supporting multi-region and multilingual queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Models (e.g., GPT-4 / Claude)&lt;/td&gt;
&lt;td&gt;Performs automatic semantic analysis and generates intelligent news summaries and business insights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notification Plugins (e.g., Discord Webhook)&lt;/td&gt;
&lt;td&gt;Real-time push of monitoring reports to ensure rapid information delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  2. Enterprise-Grade Tooling Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dify Intelligent Workflow Platform
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;A no-code AI automation platform designed for flexible, enterprise-grade workflows&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual interface for drag-and-drop workflow building—no coding required
&lt;/li&gt;
&lt;li&gt;Seamless integration with mainstream AI models (GPT-4, Claude 3, Gemini, etc.)
&lt;/li&gt;
&lt;li&gt;Plugin ecosystem for connecting with APIs and external data sources
&lt;/li&gt;
&lt;li&gt;Real-time monitoring with detailed logs and error tracing
&lt;/li&gt;
&lt;li&gt;Role-based access control and team collaboration support
&lt;/li&gt;
&lt;li&gt;Suitable for private deployment in secure enterprise environments
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Scrapeless Deep SerpApi
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;A real-time, high-fidelity Google SERP API engineered for AI workflows and business intelligence&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://www.scrapeless.com/en/product/deep-serp-api?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=dify" rel="noopener noreferrer"&gt;Scrapeless Deep SerpApi&lt;/a&gt; is purpose-built for enterprise-grade use cases like brand monitoring, market intelligence, content generation, and AI-powered decision-making. It extracts real-time, structured data directly from Google search results (HTML parsing), ensuring &lt;strong&gt;accuracy, freshness, and reliability&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Key Advantages&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant access to real-time Google SERP data&lt;/strong&gt; (under 3s response)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive result coverage&lt;/strong&gt;: organic results, Google Local, Google Images, Google News, and more
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero caching&lt;/strong&gt;: Direct HTML parsing ensures up-to-date, verifiable results
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-scraping technology&lt;/strong&gt;: 99.9% success rate, no manual proxy configuration needed
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supports 195+ countries and multiple languages&lt;/strong&gt; for global monitoring
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured output&lt;/strong&gt; in common data formats, making it easy for AI models and automated workflows to parse and analyze&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent, usage-based billing&lt;/strong&gt; with no hidden limits or field restrictions&lt;/li&gt;
&lt;/ul&gt;
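&lt;p&gt;To make the "structured output" advantage concrete, here is a minimal, illustrative sketch of consuming a SERP-style JSON response. The response shape shown (an &lt;code&gt;organic_results&lt;/code&gt; array with &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;link&lt;/code&gt; fields) matches the fields this tutorial's Template node uses later, but the exact schema is an assumption — consult the Deep SerpApi documentation for the authoritative field list.&lt;/p&gt;

```python
# Hypothetical Deep SerpApi-style response body; the exact schema is an
# assumption here -- consult the official docs for authoritative field names.
sample_response = {
    "organic_results": [
        {"title": "Acme Corp announces Q2 partnership", "link": "https://example.com/a"},
        {"title": "Acme Corp expands into new markets", "link": "https://example.com/b"},
    ]
}

# Structured output means no HTML parsing on the client side: just walk fields.
for result in sample_response["organic_results"]:
    print(f"{result['title']} -- {result['link']}")
```

&lt;p&gt;Because the API returns parsed JSON rather than raw HTML, downstream AI nodes can consume the fields directly without any scraping logic of their own.&lt;/p&gt;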

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp6npq973qn94swwm5ke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffp6npq973qn94swwm5ke.png" alt="scrapeless deep serpapi" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📌 &lt;em&gt;Ideal for:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building enterprise-grade media monitoring and alert systems
&lt;/li&gt;
&lt;li&gt;Tracking competitor activity and market trends globally
&lt;/li&gt;
&lt;li&gt;Creating search-tuned datasets for retrieval-augmented generation (RAG)
&lt;/li&gt;
&lt;li&gt;Powering SEO and content automation at scale
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Environment Setup &amp;amp; Account Registration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Register a Scrapeless Account and Obtain API Token
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Visit the &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=dify" rel="noopener noreferrer"&gt;Scrapeless Dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Register a business account&lt;/li&gt;
&lt;li&gt;After logging in, navigate to the &lt;strong&gt;API Management&lt;/strong&gt; page to obtain your API token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwlhdygh6wewrn3tdifp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwlhdygh6wewrn3tdifp.png" alt="Scrapeless API Token" width="800" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Important&lt;/strong&gt;: Keep your API token secure and never share it publicly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Register a Dify account and install the Deep SerpApi plugin
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Sign up for Dify if you haven't already, then install the &lt;a href="https://marketplace.dify.ai/plugins/scrapelesshq/deep_serpapi" rel="noopener noreferrer"&gt;Deep SerpApi plugin&lt;/a&gt; from the Dify Marketplace&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a new application and select "Workflow"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the workflow studio, click the "+" button to add a new tool&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Navigate to the "Tools" tab in the panel&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Look for "Deep SerpApi" by scrapelesshq (as shown in the tools list)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on "Deep SerpApi" to add it to your workflow&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuh0spmxbqe3zyw2melp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuh0spmxbqe3zyw2melp.png" alt="Deep SerpAPI in Dify Tools" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Detailed Configuration Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Add the Deep SerpApi Node
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click the "+" button in the workflow editor&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;Tools&lt;/strong&gt; tab&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;Deep SerpApi (Scrapeless)&lt;/strong&gt; and add it to your workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5a3x2a5dmk5wk18jfn6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5a3x2a5dmk5wk18jfn6.png" alt="Add Deep SerpAPI Node" width="800" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the configuration panel, paste the API Token copied earlier
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitzzzz0no1prbl67qisu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitzzzz0no1prbl67qisu.png" alt="ApiToken2 Configuration" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2: Configure Search Parameters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In the &lt;strong&gt;Query String&lt;/strong&gt; field of the Deep SerpApi node, enter your search query, for example:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"Your Company Name" news&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Supports advanced search syntax such as:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"Your Company Name" OR "Industry Keyword"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"Company Name" AND (announcement OR partnership)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;In this example, we use:
&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{{ company }} latest business news June 2025 site:reuters.com OR site:bloomberg.com OR site:cnn.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
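&lt;p&gt;If you monitor many companies, you may prefer to assemble these queries programmatically rather than by hand. The helper below is a plain string-building sketch: the &lt;code&gt;OR&lt;/code&gt; and &lt;code&gt;site:&lt;/code&gt; operators are standard Google search syntax, and the company and domain values are placeholders.&lt;/p&gt;

```python
def build_news_query(company, sites=None, extra_terms="latest business news"):
    """Assemble a Google search query string using OR / site: operators."""
    query = f'"{company}" {extra_terms}'
    if sites:
        # site:domain restricts results to a domain; OR combines trusted sources
        site_filter = " OR ".join(f"site:{s}" for s in sites)
        query = f"{query} {site_filter}"
    return query

print(build_news_query("Acme Corp", ["reuters.com", "bloomberg.com", "cnn.com"]))
# prints: "Acme Corp" latest business news site:reuters.com OR site:bloomberg.com OR site:cnn.com
```

&lt;p&gt;The resulting string can be pasted into the Query String field, or fed in through a workflow variable such as &lt;code&gt;{{ company }}&lt;/code&gt;.&lt;/p&gt;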



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewmyb7e8r5wzfbpxp8r2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewmyb7e8r5wzfbpxp8r2.png" alt="Configure Search Parameters" width="800" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Add Template Node to Format Search Results
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Click the &lt;strong&gt;“+”&lt;/strong&gt; button after the &lt;strong&gt;Deep SerpApi&lt;/strong&gt; node.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;“Template”&lt;/strong&gt; from the available blocks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfsst8f241z8mzeswd99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfsst8f241z8mzeswd99.png" alt="Template Block" width="800" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the &lt;strong&gt;Template&lt;/strong&gt; field, enter the following formatting template:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Search Results:
{% for item in arg1[0].organic_results %}
- Title: {{ item.title }}
- Link: {{ item.link }}
{% endfor %}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;This template will display the search results in a structured manner to facilitate subsequent AI analysis.&lt;/li&gt;
&lt;/ul&gt;
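&lt;p&gt;To check what the Jinja-style template above produces, here is a plain-Python equivalent of the same loop. The input shape (&lt;code&gt;arg1&lt;/code&gt; as a list whose first element carries &lt;code&gt;organic_results&lt;/code&gt;) mirrors how the Template node receives the Deep SerpApi output in this workflow.&lt;/p&gt;

```python
def format_results(arg1):
    """Plain-Python equivalent of the Template node's Jinja loop."""
    lines = ["Search Results:"]
    for item in arg1[0]["organic_results"]:
        lines.append(f"- Title: {item['title']}")
        lines.append(f"- Link: {item['link']}")
    return "\n".join(lines)

serp_output = [{"organic_results": [
    {"title": "Acme Corp partners with Example Inc", "link": "https://example.com/news"},
]}]
print(format_results(serp_output))
```

&lt;p&gt;Running it on a sample payload produces the same "Title / Link" bullet list the LLM node consumes in the next step.&lt;/p&gt;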

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpyi07oy1n1j3tlt6zka.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpyi07oy1n1j3tlt6zka.png" alt="Template Configuration" width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Configure the AI Analysis Node
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Click the &lt;strong&gt;“+”&lt;/strong&gt; button after the &lt;strong&gt;Template&lt;/strong&gt; node.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;“LLM”&lt;/strong&gt; from the available blocks.&lt;/li&gt;
&lt;li&gt;Choose your preferred AI model (&lt;strong&gt;GPT-4 is recommended&lt;/strong&gt;). Note that you may need to open &lt;strong&gt;“Model Provider Settings”&lt;/strong&gt; to install or activate your model first.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bqg1zinp6bypbhy9vit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bqg1zinp6bypbhy9vit.png" alt="AI Analysis Node" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzieak9uy9r60jk8j5mqd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzieak9uy9r60jk8j5mqd.png" alt="Deep SerpAPI Configuration" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You'll be taken to a model selection page where you can choose any supported LLM. For this example, we'll use Claude.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghnzfjxmrunagxaqe9ws.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghnzfjxmrunagxaqe9ws.png" alt="Deep SerpAPI Configuration" width="800" height="616"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the &lt;strong&gt;System prompt&lt;/strong&gt;, reference the search results:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a business intelligence analyst.

Based on the following search results, generate a concise B2B intelligence report for the company "{{ company }}". Your report should include:

1. Overall sentiment (Positive/Neutral/Negative)
2. Major news developments or updates
3. Business risks or opportunities
4. Strategic implications for the company
5. Any urgent or noteworthy items

If the search results are too generic or lack company-specific content, please point that out and suggest how to improve the query.

Use bullet points where appropriate. Keep the tone professional and actionable.


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;In the &lt;strong&gt;User prompt&lt;/strong&gt;, reference the formatted template results:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Please analyze these search results for the company and generate insights based on the news titles, contents, and sources found.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;In the prompt text box, type &lt;strong&gt;/&lt;/strong&gt; to open the variable selector, which lists the available variables such as &lt;code&gt;output&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, and &lt;code&gt;sys.*&lt;/code&gt; that you can insert into your prompts, as shown in the screenshots below.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wzt98ktabanbna536c5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wzt98ktabanbna536c5.png" alt="Deep SerpAPI Configuration" width="800" height="654"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1xsykacjipu4bfnvk8l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1xsykacjipu4bfnvk8l.png" alt="Deep SerpAPI Configuration" width="776" height="1280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Run and Debug the Workflow
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click the &lt;strong&gt;Run&lt;/strong&gt; button in the top-right corner of the interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkcdp9vuis7lc6pypac6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkcdp9vuis7lc6pypac6.png" alt="Run and Debug the Workflow" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wait for the workflow to execute and check the output results
&lt;/li&gt;
&lt;li&gt;Based on the analysis results, adjust the search keywords and AI prompts to optimize performance&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Step 6: Integrate Enterprise Notification Channels (e.g., Discord Webhook) &lt;em&gt;(Optional)&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;To receive notifications directly in your Discord server when the workflow completes, you can add a webhook integration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a New Block:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Click the &lt;strong&gt;“+”&lt;/strong&gt; button after your &lt;strong&gt;LLM analysis&lt;/strong&gt; step
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;“Tools”&lt;/strong&gt; from the block menu
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8cqvg1o4j5u41m0mllz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8cqvg1o4j5u41m0mllz.png" alt="Adding Tools Block" width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find Discord Webhook in the Marketplace:
&lt;ul&gt;
&lt;li&gt;In the Tools section, click on &lt;strong&gt;“Marketplace”&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Search for &lt;strong&gt;“Discord”&lt;/strong&gt; or &lt;strong&gt;“webhook”&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Install the &lt;strong&gt;Discord webhook tool&lt;/strong&gt; if it’s not already available: &lt;a href="https://marketplace.dify.ai/plugins/langgenius/discord" rel="noopener noreferrer"&gt;Discord Plugin on Marketplace&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonb2qv89bt2b0l7czan2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonb2qv89bt2b0l7czan2.png" alt="discord" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Configure Your Webhook:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Select the &lt;strong&gt;Discord Webhook&lt;/strong&gt; tool
&lt;/li&gt;
&lt;li&gt;Enter your &lt;strong&gt;Discord Webhook URL&lt;/strong&gt; (you can obtain this from your Discord server settings)
&lt;/li&gt;
&lt;li&gt;Customize the message format to include the &lt;strong&gt;analysis results&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Use variables from previous steps to include &lt;strong&gt;dynamic content&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71dhopkr72h5h0d9a5x6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71dhopkr72h5h0d9a5x6.png" alt="Discord Webhook Configuration" width="800" height="728"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Message Customization:
&lt;ul&gt;
&lt;li&gt;Include the &lt;strong&gt;search query&lt;/strong&gt; in the notification&lt;/li&gt;
&lt;li&gt;Add a &lt;strong&gt;summary of key findings&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Format the message for &lt;strong&gt;easy reading in Discord&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔍 **Daily Business Intelligence Report**

/ context

---
📊 *Generated by Dify + Scrapeless Deep SerpAPI*

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
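&lt;p&gt;If you later want to trigger the same notification outside Dify, Discord webhooks accept a simple JSON POST with a &lt;code&gt;content&lt;/code&gt; field, capped at 2,000 characters per message. The sketch below builds that payload; the report text and webhook URL are placeholders you supply from your own Discord server settings.&lt;/p&gt;

```python
import json
import urllib.request

def build_discord_payload(report_text):
    """Discord webhook messages take a JSON body with a 'content' field,
    limited to 2,000 characters per message."""
    content = (
        "🔍 **Daily Business Intelligence Report**\n\n"
        f"{report_text}\n\n"
        "---\n📊 *Generated by Dify + Scrapeless Deep SerpAPI*"
    )
    return {"content": content[:2000]}

def send_to_discord(webhook_url, report_text):
    # POST the payload; webhook_url comes from Server Settings -- Integrations
    data = json.dumps(build_discord_payload(report_text)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)
```

&lt;p&gt;Truncating to 2,000 characters avoids a rejected request when a long LLM report exceeds Discord's per-message limit; for longer reports, split the text across several webhook calls.&lt;/p&gt;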



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can use any webhook service of your choice (Slack, Microsoft Teams, etc.) by following the same process and searching for the appropriate tool in the marketplace.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 7: Add an end node to complete the workflow configuration
&lt;/h3&gt;

&lt;p&gt;To properly complete your workflow, add an End block:&lt;br&gt;
&lt;strong&gt;1. Add Final Block:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click the "+" button after your webhook step (or LLM step if you skipped the webhook)&lt;/li&gt;
&lt;li&gt;Select "End" from the block menu&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Configure End Block:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The End block marks the completion of your workflow&lt;/li&gt;
&lt;li&gt;You can optionally configure output variables that will be returned when the workflow completes&lt;/li&gt;
&lt;li&gt;This is useful if you want to use this workflow as part of a larger automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxutgvq0j4qyx5bhm0y91.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxutgvq0j4qyx5bhm0y91.png" alt="End Block Configuration" width="800" height="368"&gt;&lt;/a&gt;&lt;br&gt;
Your complete workflow should now look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcddsgp7x3c7qhy5bfkxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcddsgp7x3c7qhy5bfkxc.png" alt="AllConfiguration" width="800" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 8: Output the results
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptoz1jw81lb91v7syrkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptoz1jw81lb91v7syrkw.png" alt="Output the results" width="800" height="812"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🚀 Ready to Power Your Intelligence Workflows?&lt;/p&gt;

&lt;p&gt;Sign up for &lt;strong&gt;Scrapeless Google SERP API&lt;/strong&gt; today and instantly receive &lt;strong&gt;2,500 free API calls&lt;/strong&gt; — no credit card required.&lt;br&gt;&lt;br&gt;
Experience real-time, structured search data built for scale, precision, and AI-native workflows.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=dify" rel="noopener noreferrer"&gt;Get Started for Free&lt;/a&gt; and supercharge your next project!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Workflow Demo
&lt;/h2&gt;

&lt;p&gt;To help you better understand how this smart business news monitoring workflow runs from start to finish, we’ve created a short GIF demo. It shows each step in action — from fetching real-time search results with Deep SerpApi, formatting them with a Template block, analyzing the data using an LLM, and finally sending the insights via Discord webhook.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fposts%2Fbuild-smart-business-news-monitoring-with-dify%2Fb4dbedfdd265b58bd1cc5b8cc19cf1f7.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fposts%2Fbuild-smart-business-news-monitoring-with-dify%2Fb4dbedfdd265b58bd1cc5b8cc19cf1f7.gif" alt="Workflow Demo" width="8" height="5"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Success Stories &amp;amp; Performance Impact
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Leading Financial Institution
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;“From Reactive to Proactive” — Real-Time News Monitoring with 95% Accuracy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A major financial institution faced challenges in monitoring fast-moving news cycles related to banking regulations, reputational risks, and macroeconomic events. Prior to deploying the system, their compliance and risk teams relied heavily on manual media tracking, which was time-consuming and often delayed critical responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After integrating the Dify + Scrapeless monitoring system:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;News detection latency was reduced by &lt;strong&gt;80%&lt;/strong&gt;, enabling near real-time awareness of regulatory or reputational risks.&lt;/li&gt;
&lt;li&gt;The accuracy of sentiment-based alerting models improved to &lt;strong&gt;95%&lt;/strong&gt;, thanks to high-quality structured SERP data feeding AI classifiers.&lt;/li&gt;
&lt;li&gt; Cross-departmental collaboration improved, as alerts were pushed directly into internal Slack channels and BI dashboards.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result&lt;/strong&gt;: Risk mitigation windows were shortened from &lt;strong&gt;hours to minutes&lt;/strong&gt;, reducing potential damage from negative press or misinformation.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Global Manufacturing Enterprise
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;“Global Eyes, Local Insights” — Multi-language Market Intelligence at Scale&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This multinational manufacturing firm needed to monitor global news across diverse markets to inform its supply chain strategy, trade risk exposure, and competitor activity—especially across Europe, Southeast Asia, and Latin America.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With the integrated solution in place:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated SERP-based monitoring covered &lt;strong&gt;20+ languages&lt;/strong&gt; and &lt;strong&gt;100+ country-specific domains&lt;/strong&gt;, reducing blind spots in non-English media.&lt;/li&gt;
&lt;li&gt;Alerts about policy shifts, environmental incidents, or labor disputes were surfaced up to &lt;strong&gt;72 hours earlier&lt;/strong&gt; than previous manual workflows.&lt;/li&gt;
&lt;li&gt;Internal dashboards consolidated insights across time zones and teams, allowing senior decision-makers to act faster on global disruptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result&lt;/strong&gt;: Strategic responsiveness improved significantly, particularly in &lt;strong&gt;procurement&lt;/strong&gt; and &lt;strong&gt;logistics planning&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;🔧 Want to Build More Intelligent Workflows?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're looking to take your data monitoring system to the next level, don’t miss these in-depth guides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;📈 &lt;a href="https://www.scrapeless.com/en/blog/build-intelligent-trend-monitoring-systems-with-make?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=dify" rel="noopener noreferrer"&gt;Build an Intelligent Trend Monitoring System with Make&lt;/a&gt;&lt;br&gt;
Learn how to combine Scrapeless with Make to create automated trend alerts and real-time dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🌍 &lt;a href="https://www.scrapeless.com/en/blog/build-google-trends-monitor-with-pipedream?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=dify" rel="noopener noreferrer"&gt;Build a Google Trends Monitor with Pipedream&lt;/a&gt;&lt;br&gt;
Discover how to set up a scalable trend tracking system using the Google Trends API and Pipedream workflows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Explore these tutorials and start building smarter, faster, and more automated intelligence pipelines today!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  6. FAQs &amp;amp; Best Practices
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Recommended Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No search results&lt;/td&gt;
&lt;td&gt;Check your API token validity and permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inaccurate search results&lt;/td&gt;
&lt;td&gt;Refine keywords and exclude irrelevant search terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI analysis is not accurate&lt;/td&gt;
&lt;td&gt;Improve the prompt to clarify the main focus of the analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API quota exceeded or errors&lt;/td&gt;
&lt;td&gt;Monitor usage frequency and plan API calls accordingly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  7. Summary
&lt;/h2&gt;

&lt;p&gt;This solution leverages the deep integration between the Dify intelligent workflow platform and Scrapeless Deep SerpApi to enable automated monitoring and intelligent analysis of enterprise-level business news. With this system, companies can stay informed of brand developments in real time, gain insights into industry trends, respond quickly to market changes, and empower decision-makers to strategically plan for the future.&lt;/p&gt;

</description>
      <category>brightdatachallenge</category>
      <category>top7</category>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building an AI-Powered Web Data Pipeline with n8n, Scrapeless, and Claude</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Mon, 19 May 2025 10:59:09 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/building-an-ai-powered-web-data-pipeline-with-n8n-scrapeless-and-claude-4eg6</link>
      <guid>https://dev.to/datacollectionscraper/building-an-ai-powered-web-data-pipeline-with-n8n-scrapeless-and-claude-4eg6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In today's data-driven landscape, organizations need efficient ways to extract, process, and analyze web content. Traditional web scraping faces numerous challenges: anti-bot protections, complex JavaScript rendering, and the need for constant maintenance. Furthermore, making sense of unstructured web data requires sophisticated processing.&lt;/p&gt;

&lt;p&gt;This guide demonstrates how to build a complete web data pipeline using n8n workflow automation, Scrapeless web scraping, Claude AI for intelligent extraction, and Qdrant vector database for semantic storage. Whether you're building a knowledge base, conducting market research, or developing an AI assistant, this workflow provides a powerful foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Build
&lt;/h2&gt;

&lt;p&gt;Our n8n workflow combines several cutting-edge technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scrapeless Web Unlocker: Advanced web scraping with JavaScript rendering&lt;/li&gt;
&lt;li&gt;Claude 3.7 Sonnet: AI-powered data extraction and structuring&lt;/li&gt;
&lt;li&gt;Ollama Embeddings: Local vector embedding generation&lt;/li&gt;
&lt;li&gt;Qdrant Vector Database: Semantic storage and retrieval&lt;/li&gt;
&lt;li&gt;Notification System: Real-time monitoring via webhooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This end-to-end pipeline transforms messy web data into structured, vectorized information ready for semantic search and AI applications.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8gg5ezli4bqhb0eed87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8gg5ezli4bqhb0eed87.png" alt="Building an AI-Powered Web Data Pipeline with n8n, Scrapeless, and Claude" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Installation and Setup
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Installing n8n
&lt;/h3&gt;

&lt;p&gt;n8n requires Node.js v18, v20, or v22. If you encounter version compatibility issues:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Check your Node.js version
node -v

# If you have a newer unsupported version (e.g., v23+), install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash
# Or for Windows, use NVM for Windows installer

# Install a compatible Node.js version
nvm install 20

# Use the installed version
nvm use 20

# Install n8n globally
npm install n8n -g

# Run n8n
n8n

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your n8n instance should now be available at &lt;a href="http://localhost:5678" rel="noopener noreferrer"&gt;http://localhost:5678&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up Claude API
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Visit Anthropic Console and create an account&lt;/li&gt;
&lt;li&gt;Navigate to API Keys section&lt;/li&gt;
&lt;li&gt;Click "Create Key" and set appropriate permissions&lt;/li&gt;
&lt;li&gt;Copy your API key for use in the n8n workflow (In AI Data Checker, Claude Data extractor and Claude AI Agent)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5rcu30edf1kp5ooty3v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr5rcu30edf1kp5ooty3v.png" alt="Setting up Claude API" width="800" height="631"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up Scrapeless
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Visit &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=n8n" rel="noopener noreferrer"&gt;Scrapeless&lt;/a&gt; and create an account&lt;/li&gt;
&lt;li&gt;Navigate to the Universal Scraping API section in your dashboard &lt;a href="https://app.scrapeless.com/exemple/overview" rel="noopener noreferrer"&gt;https://app.scrapeless.com/exemple/overview&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8chblhb64rblkb9h12c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8chblhb64rblkb9h12c.png" alt="Setting up Scrapeless" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy your token for use in the n8n workflow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijhssk70435fulm5h40q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijhssk70435fulm5h40q.png" alt="Copy your token for use in the n8n workflow" width="800" height="769"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can customize your Scrapeless web scraping request using this curl command and import it directly into the HTTP Request node in n8n:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST "https://api.scrapeless.com/api/v1/unlocker/request" \
  -H "Content-Type: application/json" \
  -H "x-api-token: scrapeless_api_key" \
  -d '{
    "actor": "unlocker.webunlocker",
    "proxy": {
      "country": "ANY"
    },
    "input": {
      "url": "https://www.scrapeless.com",
      "method": "GET",
      "redirect": true,
      "js_render": true,
      "js_instructions": [{"wait":100}],
      "block": {
        "resources": ["image","font","script"],
        "urls": ["https://example.com"]
      }
    }
  }'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqpf3obzysieow1dvf4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwqpf3obzysieow1dvf4s.png" alt="You can customize your Scrapeless web scraping request" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing Qdrant with Docker
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# Pull Qdrant image
docker pull qdrant/qdrant

# Run Qdrant container with data persistence
docker run -d \
  --name qdrant-server \
  -p 6333:6333 \
  -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify Qdrant is running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:6333/healthz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Installing Ollama
&lt;/h3&gt;

&lt;p&gt;macOS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Windows: Download and install from Ollama's website.&lt;/p&gt;

&lt;p&gt;Start Ollama server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the required embedding model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull all-minilm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify model installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting Up the n8n Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Workflow Overview
&lt;/h3&gt;

&lt;p&gt;Our workflow consists of these key components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Manual/Scheduled Trigger: Starts the workflow&lt;/li&gt;
&lt;li&gt;Collection Check: Verifies if Qdrant collection exists&lt;/li&gt;
&lt;li&gt;URL Configuration: Sets the target URL and parameters&lt;/li&gt;
&lt;li&gt;Scrapeless Web Request: Extracts HTML content&lt;/li&gt;
&lt;li&gt;Claude Data Extraction: Processes and structures the data&lt;/li&gt;
&lt;li&gt;Ollama Embeddings: Generates vector embeddings&lt;/li&gt;
&lt;li&gt;Qdrant Storage: Saves vectors and metadata&lt;/li&gt;
&lt;li&gt;Notification: Sends status updates via webhook&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 1: Configure Workflow Trigger and Collection Check
&lt;/h3&gt;

&lt;p&gt;Start by adding a Manual Trigger node, then add an HTTP Request node to check whether your Qdrant collection exists. You can customize the collection name in this initial step; the workflow will automatically create the collection if it doesn't exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important Note:&lt;/strong&gt; If you want to use a different collection name than the default "hacker-news", make sure to change it consistently in ALL nodes that reference Qdrant.&lt;/p&gt;
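&lt;p&gt;For reference, the check this step performs maps onto two calls against Qdrant's REST API. The sketch below is illustrative: it assumes the local Qdrant instance from the setup section and the 384-dimensional vectors that all-minilm produces.&lt;/p&gt;

```javascript
// Illustrative collection check/create, assuming the local Qdrant instance
// from the setup section and 384-dim all-minilm vectors.
const QDRANT_URL = "http://localhost:6333";
const COLLECTION = "hacker-news"; // must match every Qdrant node in the workflow

function createCollectionBody(vectorSize) {
  // Qdrant expects the vector size and distance metric at creation time
  return JSON.stringify({ vectors: { size: vectorSize, distance: "Cosine" } });
}

async function ensureCollection() {
  const check = await fetch(`${QDRANT_URL}/collections/${COLLECTION}`);
  if (check.status === 404) {
    // Collection is missing: create it before any points are upserted
    await fetch(`${QDRANT_URL}/collections/${COLLECTION}`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: createCollectionBody(384),
    });
  }
}
```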

&lt;h3&gt;
  
  
  Step 2: Configure Scrapeless Web Request
&lt;/h3&gt;

&lt;p&gt;Add an HTTP Request node for Scrapeless web scraping. Configure the node using the curl command provided earlier as a reference, replacing the &lt;code&gt;scrapeless_api_key&lt;/code&gt; placeholder with your actual Scrapeless API token.&lt;/p&gt;

&lt;p&gt;You can configure more advanced scraping parameters at Scrapeless Web Unlocker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Claude Data Extraction
&lt;/h3&gt;

&lt;p&gt;Add a node to process the HTML content using Claude. You'll need to provide your Claude API key for authentication. The Claude extractor analyzes the HTML content and returns structured data in JSON format.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Format Claude Output
&lt;/h3&gt;

&lt;p&gt;This node takes Claude's response and prepares it for vectorization by extracting the relevant information and formatting it appropriately.&lt;/p&gt;
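&lt;p&gt;The exact shape of this step depends on your extraction prompt, but a minimal sketch looks like the following. The &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;summary&lt;/code&gt;, and &lt;code&gt;url&lt;/code&gt; fields are placeholder names, not part of any fixed Claude schema.&lt;/p&gt;

```javascript
// Minimal formatting sketch. Claude's Messages API returns content as an
// array of blocks; the extractor's JSON payload sits in the first text block.
// The "title"/"summary"/"url" fields are placeholders for your own schema.
function formatClaudeOutput(claudeResponse) {
  const data = JSON.parse(claudeResponse.content[0].text);
  return {
    // the string handed to the embedding node
    text: `${data.title}\n${data.summary}`,
    // metadata stored alongside the vector in Qdrant
    metadata: { source: data.url, extractedAt: new Date().toISOString() },
  };
}
```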

&lt;h3&gt;
  
  
  Step 5: Ollama Embeddings Generation
&lt;/h3&gt;

&lt;p&gt;This node sends the structured text to Ollama for embedding generation. Make sure your Ollama server is running and the all-minilm model is installed.&lt;/p&gt;
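&lt;p&gt;Under the hood this is a single call to Ollama's embeddings endpoint. A minimal sketch, assuming the default local server and the all-minilm model installed earlier:&lt;/p&gt;

```javascript
// Sketch of the embedding call, assuming Ollama's default port and the
// all-minilm model from the setup section.
const OLLAMA_URL = "http://127.0.0.1:11434/api/embeddings"; // direct IP avoids IPv6 localhost issues

function buildEmbeddingRequest(text) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "all-minilm", prompt: text }),
  };
}

async function embed(text) {
  const res = await fetch(OLLAMA_URL, buildEmbeddingRequest(text));
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = await res.json();
  return data.embedding; // all-minilm produces a 384-dimensional vector
}
```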

&lt;h3&gt;
  
  
  Step 6: Qdrant Vector Storage
&lt;/h3&gt;

&lt;p&gt;This node takes the generated embeddings and stores them in your Qdrant collection along with relevant metadata.&lt;/p&gt;
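&lt;p&gt;A minimal sketch of the upsert against Qdrant's REST API, assuming the default "hacker-news" collection; the payload carries whatever metadata the formatting step produced:&lt;/p&gt;

```javascript
// Sketch of the point upsert the storage node performs.
function buildUpsertBody(id, vector, payload) {
  return JSON.stringify({ points: [{ id: id, vector: vector, payload: payload }] });
}

async function storePoint(id, vector, payload) {
  // wait=true blocks until the point is persisted, simplifying error handling
  const res = await fetch("http://localhost:6333/collections/hacker-news/points?wait=true", {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: buildUpsertBody(id, vector, payload),
  });
  if (!res.ok) throw new Error(`Qdrant returned ${res.status}`);
}
```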

&lt;h3&gt;
  
  
  Step 7: Notification System
&lt;/h3&gt;

&lt;p&gt;The final node sends a notification with the status of the workflow execution via your configured webhook.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting Common Issues
&lt;/h2&gt;

&lt;h3&gt;
  
  
  n8n Node.js Version Issues
&lt;/h3&gt;

&lt;p&gt;If you see an error like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Node.js version X is currently not supported by n8n.
Please use Node.js v18.17.0 (recommended), v20, or v22 instead!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix by installing nvm and using a compatible Node.js version as described in the setup section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scrapeless API Connection Issues
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Verify your API token is correct&lt;/li&gt;
&lt;li&gt;Check if you're hitting API rate limits&lt;/li&gt;
&lt;li&gt;Ensure proper URL formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ollama Embedding Errors
&lt;/h3&gt;

&lt;p&gt;Common error: &lt;code&gt;connect ECONNREFUSED ::1:11434&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure Ollama is running: &lt;code&gt;ollama serve&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Verify the model is installed: &lt;code&gt;ollama pull all-minilm&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Use the direct IP (127.0.0.1) instead of localhost&lt;/li&gt;
&lt;li&gt;Check if another process is using port 11434&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Usage Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Batch Processing Multiple URLs
&lt;/h3&gt;

&lt;p&gt;To process multiple URLs in one workflow execution:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use a Split In Batches node to process URLs in parallel&lt;/li&gt;
&lt;li&gt;Configure proper error handling for each batch&lt;/li&gt;
&lt;li&gt;Use the Merge node to combine results&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Scheduled Data Updates
&lt;/h3&gt;

&lt;p&gt;Keep your vector database current with scheduled updates:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Replace the manual trigger with a Schedule node&lt;/li&gt;
&lt;li&gt;Configure update frequency (daily, weekly, etc.)&lt;/li&gt;
&lt;li&gt;Use the If node to process only new or changed content&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Custom Extraction Templates
&lt;/h3&gt;

&lt;p&gt;Adapt Claude's extraction for different content types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create specific prompts for news articles, product pages, documentation, etc.&lt;/li&gt;
&lt;li&gt;Use the Switch node to select the appropriate prompt&lt;/li&gt;
&lt;li&gt;Store extraction templates as environment variables&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This n8n workflow creates a powerful data pipeline combining the strengths of Scrapeless web scraping, Claude AI extraction, vector embeddings, and Qdrant storage. By automating these complex processes, you can focus on using the extracted data rather than the technical challenges of obtaining it.&lt;/p&gt;

&lt;p&gt;The modular nature of n8n allows you to extend this workflow with additional processing steps, integration with other systems, or custom logic to meet your specific needs. Whether you're building an AI knowledge base, conducting competitive analysis, or monitoring web content, this workflow provides a solid foundation.&lt;/p&gt;

</description>
      <category>n8n</category>
      <category>tooling</category>
      <category>scraping</category>
      <category>scrapingbrowser</category>
    </item>
    <item>
      <title>Best Practices for Automation and Web Scraping Using Scrapeless Scraping Browser</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Thu, 08 May 2025 13:47:37 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/best-practices-for-automation-and-web-scraping-using-scrapeless-scraping-browser-373l</link>
      <guid>https://dev.to/datacollectionscraper/best-practices-for-automation-and-web-scraping-using-scrapeless-scraping-browser-373l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: A New Paradigm of Browser Automation and Data Collection in the AI Era
&lt;/h2&gt;

&lt;p&gt;With the rapid rise of generative AI, AI agents, and data-intensive applications, browsers are evolving from traditional "user interaction tools" into "data execution engines" for intelligent systems. In this new paradigm, many tasks no longer rely on single API endpoints but instead leverage automated browser control to handle complex page interactions, content scraping, task orchestration, and context retrieval.&lt;/p&gt;

&lt;p&gt;From price comparisons on e-commerce sites and map screenshots to search engine result parsing and social media content extraction, the browser is becoming a crucial interface for AI to access real-world data. However, the complexity of modern web structures, robust anti-bot measures, and high concurrency demands pose significant technical and operational challenges for traditional solutions like local Puppeteer/Playwright instances or proxy rotation strategies.&lt;/p&gt;

&lt;p&gt;Enter the Scrapeless Scraping Browser—an advanced, cloud-based browser platform purpose-built for large-scale automation. It overcomes key technical barriers such as anti-scraping mechanisms, fingerprint detection, and proxy maintenance. Furthermore, it offers cloud-native concurrency scheduling, human-like behavior simulation, and structured data extraction, positioning itself as a vital infrastructure component in the next generation of automation systems and data pipelines.&lt;/p&gt;

&lt;p&gt;This article explores the core capabilities of Scrapeless and its practical applications in browser automation and web scraping. By analyzing current industry trends and future directions, we aim to provide developers, product builders, and data teams with a comprehensive and systematic guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  I. Background: Why Do We Need Scrapeless Scraping Browser?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1.1 The Evolution of Browser Automation
&lt;/h3&gt;

&lt;p&gt;In the AI-driven automation era, browsers are no longer just tools for human interaction—they have become essential execution endpoints for acquiring both structured and unstructured data. In many real-world scenarios, APIs are either unavailable or limited, making it necessary to simulate human behavior via browsers for data collection, task execution, and information extraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common use cases include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Price comparison on e-commerce sites&lt;/strong&gt;: Price and stock data are often loaded asynchronously in the browser.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parsing search engine result pages&lt;/strong&gt;: Content must be fully loaded by scrolling and clicking on page elements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual websites, legacy systems, and intranet platforms&lt;/strong&gt;: Data access is impossible via API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional scraping solutions (e.g., locally run Puppeteer/Playwright or proxy rotation setups) often suffer from poor stability under high concurrency, frequent anti-bot blocking, and high maintenance costs. Scrapeless Scraping Browser, with its cloud-native deployment and real browser behavior simulation, provides developers with a high-availability, reliable browser automation platform—serving as critical infrastructure for AI automation systems and data workflows.&lt;/p&gt;




&lt;h3&gt;
  
  
  1.2 The Challenge of Anti-Bot Mechanisms
&lt;/h3&gt;

&lt;p&gt;At the same time, as anti-bot technologies evolve, traditional crawler tools are increasingly flagged as bot traffic by target websites, resulting in IP bans and access restrictions. Common anti-scraping mechanisms include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser fingerprinting&lt;/strong&gt;: Detects abnormal access patterns via User-Agent, canvas rendering, TLS handshake, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CAPTCHA verification&lt;/strong&gt;: Requires users to prove they are human.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IP blacklisting&lt;/strong&gt;: Blocks IPs that access too frequently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioral analysis algorithms&lt;/strong&gt;: Detect unusual mouse movement, scroll speeds, and interaction logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scrapeless Scraping Browser effectively overcomes these challenges through precise browser fingerprint customization, built-in CAPTCHA solving, and flexible proxy support—becoming core infrastructure for the next generation of automation tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  II. Core Capabilities of Scrapeless
&lt;/h2&gt;

&lt;p&gt;The Scrapeless Scraping Browser delivers powerful core capabilities, offering users stable, efficient, and scalable data interaction features. Below are its main functional modules and technical details:&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Real Browser Environment
&lt;/h3&gt;

&lt;p&gt;Scrapeless is built on the &lt;strong&gt;Chromium engine&lt;/strong&gt;, providing a complete browser environment capable of simulating real user behavior. Key features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TLS fingerprint spoofing&lt;/strong&gt;: Fakes TLS handshake parameters to bypass traditional anti-bot mechanisms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic fingerprint obfuscation&lt;/strong&gt;: Adjusts User-Agent, screen resolution, timezone, etc., to make each session appear highly human-like.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Localization support&lt;/strong&gt;: Customize language, region, and timezone settings to make interactions with target websites more natural.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Deep Customization of Browser Fingerprints
&lt;/h5&gt;

&lt;p&gt;Scrapeless offers comprehensive customization of browser fingerprints, allowing users to create more "authentic" browsing environments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User-Agent control&lt;/strong&gt;: Define the User-Agent string in browser HTTP requests, including browser engine, version, and OS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Screen resolution mapping&lt;/strong&gt;: Set the return values of &lt;code&gt;screen.width&lt;/code&gt; and &lt;code&gt;screen.height&lt;/code&gt; to simulate common display sizes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform property locking&lt;/strong&gt;: Specify the return value of &lt;code&gt;navigator.platform&lt;/code&gt; in JavaScript to simulate the operating system type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Localized environment emulation&lt;/strong&gt;: Fully supports custom localization settings, affecting content rendering, time format, and language preference detection on websites.&lt;/li&gt;
&lt;/ul&gt;
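&lt;p&gt;To see which of these surfaces a session actually exposes, you can probe them from inside the page. The helper below is a generic sketch (not a Scrapeless API); passing it the page's &lt;code&gt;window&lt;/code&gt; via Puppeteer's &lt;code&gt;page.evaluate&lt;/code&gt; returns the values a target site would observe.&lt;/p&gt;

```javascript
// Generic probe for the fingerprint surfaces listed above. Run it inside a
// page by passing the page's window object.
function collectFingerprint(env) {
  return {
    userAgent: env.navigator.userAgent,   // User-Agent control
    platform: env.navigator.platform,     // platform property locking
    width: env.screen.width,              // screen resolution mapping
    height: env.screen.height,
    language: env.navigator.language,     // localized environment emulation
  };
}
// In a live session: await page.evaluate(`(${collectFingerprint})(window)`)
```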




&lt;h3&gt;
  
  
  2.2 Cloud-Based Deployment and Scalability
&lt;/h3&gt;

&lt;p&gt;Scrapeless is fully deployed in the cloud and offers the following advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No local resources required&lt;/strong&gt;: Reduces hardware costs and improves deployment flexibility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Globally distributed nodes&lt;/strong&gt;: Supports large-scale concurrent tasks and overcomes geographic restrictions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High concurrency support&lt;/strong&gt;: From 50 to unlimited concurrent sessions—ideal for everything from small tasks to complex automation workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Performance Comparison
&lt;/h4&gt;

&lt;p&gt;Compared with traditional tools such as &lt;strong&gt;Selenium&lt;/strong&gt; and &lt;strong&gt;Playwright&lt;/strong&gt;, Scrapeless excels in high-concurrency scenarios. Below is a simple comparison table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Scrapeless&lt;/th&gt;
&lt;th&gt;Selenium&lt;/th&gt;
&lt;th&gt;Playwright&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency Support&lt;/td&gt;
&lt;td&gt;Unlimited (Enterprise-grade customization)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fingerprint Customization&lt;/td&gt;
&lt;td&gt;Advanced&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CAPTCHA Solving&lt;/td&gt;
&lt;td&gt;Built-in (98% success rate)  &lt;br&gt; Supports reCAPTCHA, Cloudflare Turnstile/Challenge, AWS WAF, DataDome, etc.&lt;/td&gt;
&lt;td&gt;External dependency&lt;/td&gt;
&lt;td&gt;External dependency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At the same time, Scrapeless outperforms competing products in high-concurrency scenarios. The following table summarizes its capabilities across several dimensions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature / Platform&lt;/th&gt;
&lt;th&gt;Scrapeless&lt;/th&gt;
&lt;th&gt;Browserless&lt;/th&gt;
&lt;th&gt;Browserbase&lt;/th&gt;
&lt;th&gt;HyperBrowser&lt;/th&gt;
&lt;th&gt;Bright Data&lt;/th&gt;
&lt;th&gt;ZenRows&lt;/th&gt;
&lt;th&gt;Steel.dev&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deployment Method&lt;/td&gt;
&lt;td&gt;Cloud-based&lt;/td&gt;
&lt;td&gt;Cloud-based Puppeteer containers&lt;/td&gt;
&lt;td&gt;Multi-browser cloud cluster&lt;/td&gt;
&lt;td&gt;Cloud-based headless browser platform&lt;/td&gt;
&lt;td&gt;Cloud deployment&lt;/td&gt;
&lt;td&gt;Browser API interface&lt;/td&gt;
&lt;td&gt;Browser cloud cluster + Browser API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency Support&lt;/td&gt;
&lt;td&gt;50 to Unlimited&lt;/td&gt;
&lt;td&gt;3–50&lt;/td&gt;
&lt;td&gt;3–50&lt;/td&gt;
&lt;td&gt;1–250&lt;/td&gt;
&lt;td&gt;Up to unlimited (depending on plan)&lt;/td&gt;
&lt;td&gt;Up to 100 (Business plan)&lt;/td&gt;
&lt;td&gt;No official data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anti-Detection Capability&lt;/td&gt;
&lt;td&gt;Free CAPTCHA recognition &amp;amp; bypass, supports reCAPTCHA, Cloudflare Turnstile/Challenge, AWS WAF, DataDome, etc.&lt;/td&gt;
&lt;td&gt;CAPTCHA bypass&lt;/td&gt;
&lt;td&gt;CAPTCHA bypass + Incognito Mode&lt;/td&gt;
&lt;td&gt;CAPTCHA bypass + Incognito + Session Mgmt&lt;/td&gt;
&lt;td&gt;CAPTCHA bypass + Fingerprint spoofing + Proxy&lt;/td&gt;
&lt;td&gt;Custom browser fingerprints&lt;/td&gt;
&lt;td&gt;Proxy + Fingerprint recognition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser Runtime Cost&lt;/td&gt;
&lt;td&gt;$0.063 – $0.090/hour (includes free CAPTCHA bypass)&lt;/td&gt;
&lt;td&gt;$0.084 – $0.15/hour (unit-based)&lt;/td&gt;
&lt;td&gt;$0.10 – $0.198/hour (includes 2–5GB free proxy)&lt;/td&gt;
&lt;td&gt;$30–$100/month&lt;/td&gt;
&lt;td&gt;~$0.10/hour&lt;/td&gt;
&lt;td&gt;~$0.09/hour&lt;/td&gt;
&lt;td&gt;$0.05 – $0.08/hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proxy Cost&lt;/td&gt;
&lt;td&gt;$1.26 – $1.80/GB&lt;/td&gt;
&lt;td&gt;$4.3/GB&lt;/td&gt;
&lt;td&gt;$10/GB (beyond free quota)&lt;/td&gt;
&lt;td&gt;No official data&lt;/td&gt;
&lt;td&gt;$9.5/GB (standard); $12.5/GB (premium domains)&lt;/td&gt;
&lt;td&gt;$2.8 – $5.42/GB&lt;/td&gt;
&lt;td&gt;$3 – $8.25/GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2.3 Automatic CAPTCHA Solving and Event Monitoring
&lt;/h3&gt;

&lt;p&gt;Scrapeless provides advanced CAPTCHA solutions and extends a series of custom functions through Chrome DevTools Protocol (CDP) to enhance the reliability of browser automation.&lt;/p&gt;

&lt;h4&gt;
  
  
  CAPTCHA Solving Capabilities
&lt;/h4&gt;

&lt;p&gt;Scrapeless can automatically handle mainstream CAPTCHA types, including reCAPTCHA, Cloudflare Turnstile/Challenge, AWS WAF, DataDome, and more.&lt;/p&gt;

&lt;h4&gt;
  
  
  Event Monitoring Mechanism
&lt;/h4&gt;

&lt;p&gt;Scrapeless provides three core events for monitoring the CAPTCHA solving process:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Captcha.detected&lt;/td&gt;
&lt;td&gt;CAPTCHA detected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Captcha.solveFinished&lt;/td&gt;
&lt;td&gt;CAPTCHA solved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Captcha.solveFailed&lt;/td&gt;
&lt;td&gt;CAPTCHA solving failed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h5&gt;
  
  
  Event Response Data Structure
&lt;/h5&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;type&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;CAPTCHA type (e.g., recaptcha, turnstile)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;success&lt;/td&gt;
&lt;td&gt;boolean&lt;/td&gt;
&lt;td&gt;Result of solving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;message&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Status message (e.g., "NOT_DETECTED", "SOLVE_FINISHED")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;token?&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Returned token upon success (optional)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
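&lt;p&gt;To make this response structure concrete, here is a minimal, SDK-independent sketch in plain Python of how a handler might interpret an event payload with these fields. The function name and return strings are illustrative only, not part of the Scrapeless API:&lt;/p&gt;

```python
def handle_captcha_event(event: dict) -> str:
    """Interpret a CAPTCHA event payload with the fields described above."""
    captcha_type = event.get("type", "unknown")  # e.g. "recaptcha", "turnstile"
    success = event.get("success", False)        # result of solving
    message = event.get("message", "")           # e.g. "NOT_DETECTED", "SOLVE_FINISHED"
    token = event.get("token")                   # optional; present only on success

    if success and token:
        return f"{captcha_type} solved ({message}); token length={len(token)}"
    if success:
        return f"{captcha_type} solved ({message}); no token returned"
    return f"{captcha_type} not solved ({message})"

print(handle_captcha_event(
    {"type": "recaptcha", "success": True, "message": "SOLVE_FINISHED", "token": "abc123"}
))
```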

&lt;h3&gt;
  
  
  2.4 Powerful proxy support
&lt;/h3&gt;

&lt;p&gt;Scrapeless provides a flexible and controllable proxy integration system that supports multiple proxy modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built-in residential proxies: geo-targeted proxies in 195 countries/regions worldwide, available out of the box.&lt;/li&gt;
&lt;li&gt;Custom proxies (premium subscription): connect your own proxy service; this traffic is not counted toward Scrapeless's proxy billing.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2.5 Session replay
&lt;/h3&gt;

&lt;p&gt;Session replay is one of the most powerful features of Scrapeless Scraping Browser. It lets you replay a session step by step to inspect the operations performed and the network requests made.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Code Examples: Integrating and Using Scrapeless
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Using Scrapeless Scraping Browser
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Puppeteer Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer-core');
const connectionURL = 'wss://browser.scrapeless.com/browser?token=your-scrapeless-api-key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY';

(async () =&amp;gt; {
    const browser = await puppeteer.connect({browserWSEndpoint: connectionURL});
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    console.log(await page.title());
    await browser.close();
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Playwright Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const {chromium} = require('playwright-core');
const connectionURL = 'wss://browser.scrapeless.com/browser?token=your-scrapeless-api-key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY';

(async () =&amp;gt; {
    const browser = await chromium.connectOverCDP(connectionURL);
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    console.log(await page.title());
    await browser.close();
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.2 Scrapeless Scraping Browser Fingerprint Parameters Example Code
&lt;/h3&gt;

&lt;p&gt;The following examples show how to use Scrapeless's browser fingerprint customization with Puppeteer and Playwright:&lt;br&gt;
&lt;strong&gt;Puppeteer Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer-core');

// custom browser fingerprint
const fingerprint = {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.1.2.3 Safari/537.36',
    platform: 'Windows',
    screen: {
        width: 1280, height: 1024
    },
    localization: {
        languages: ['zh-HK', 'en-US', 'en'], timezone: 'Asia/Hong_Kong',
    }
}

const query = new URLSearchParams({
  token: 'APIKey', // required
  session_ttl: 180,
  proxy_country: 'ANY',
  fingerprint: encodeURIComponent(JSON.stringify(fingerprint)),
});

const connectionURL = `wss://browser.scrapeless.com/browser?${query.toString()}`;

(async () =&amp;gt; {
    const browser = await puppeteer.connect({browserWSEndpoint: connectionURL});
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    const info = await page.evaluate(() =&amp;gt; {
        return {
            screen: {
                width: screen.width,
                height: screen.height,
            },
            userAgent: navigator.userAgent,
            timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
            languages: navigator.languages
        };
    });
    console.log(info);
    await browser.close();
})();


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Playwright Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { chromium } = require('playwright-core');

// custom browser fingerprint
const fingerprint = {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.1.2.3 Safari/537.36',
    platform: 'Windows',
    screen: {
        width: 1280, height: 1024
    },
    localization: {
        languages: ['zh-HK', 'en-US', 'en'], timezone: 'Asia/Hong_Kong',
    }
}

const query = new URLSearchParams({
  token: 'APIKey', // required
  session_ttl: 180,
  proxy_country: 'ANY',
  fingerprint: encodeURIComponent(JSON.stringify(fingerprint)),
});

const connectionURL = `wss://browser.scrapeless.com/browser?${query.toString()}`;

(async () =&amp;gt; {
    const browser = await chromium.connectOverCDP(connectionURL);
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    const info = await page.evaluate(() =&amp;gt; {
        return {
            screen: {
                width: screen.width,
                height: screen.height,
            },
            userAgent: navigator.userAgent,
            timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
            languages: navigator.languages
        };
    });
    console.log(info);
    await browser.close();
})();


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.3 CAPTCHA event monitoring example
&lt;/h3&gt;

&lt;p&gt;The following example shows how to use Scrapeless to monitor CAPTCHA events and track their solving status in real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Listen for CAPTCHA solving events
// (assumes `page` comes from a Puppeteer browser connected to Scrapeless)
const client = await page.createCDPSession();

client.on('Captcha.detected', (result) =&amp;gt; {
  console.log('Captcha detected:', result);
});

await new Promise((resolve, reject) =&amp;gt; {
  client.on('Captcha.solveFinished', (result) =&amp;gt; {
    if (result.success) resolve();
  });
  client.on('Captcha.solveFailed', () =&amp;gt;
    reject(new Error('Captcha solve failed'))
  );
  setTimeout(() =&amp;gt;
      reject(new Error('Captcha solve timeout')),
    5 * 60 * 1000
  );
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Having covered the core features and advantages of Scrapeless Scraping Browser, we can now look at how to apply it in common, concrete scenarios to automate and scrape websites more efficiently and securely.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Best Practices for Automation and Web Scraping Using Scrapeless Scraping Browser
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Legal Disclaimer and Precautions&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This tutorial covers popular web scraping techniques for educational purposes. Interacting with public servers requires diligence and respect; here is a summary of what not to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not scrape at rates that could damage the website.&lt;/li&gt;
&lt;li&gt;Do not scrape data that's not available publicly.&lt;/li&gt;
&lt;li&gt;Do not store PII of EU citizens who are protected by GDPR.&lt;/li&gt;
&lt;li&gt;Do not repurpose entire public datasets, which can be illegal in some countries.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Understanding Cloudflare Protection
&lt;/h3&gt;




&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What is Cloudflare?&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cloudflare is a cloud platform that integrates a content delivery network (CDN), DNS acceleration, and security protection. Websites use Cloudflare to mitigate Distributed Denial of Service (DDoS) attacks (attempts to knock a site offline by flooding it with requests) and to keep their sites consistently available.&lt;br&gt;&lt;br&gt;
Here’s a simple example to understand how Cloudflare works:&lt;br&gt;&lt;br&gt;
When you visit a website that has Cloudflare enabled (such as example.com), your request first reaches Cloudflare’s edge server, not the origin server. Cloudflare will then determine whether to allow your request to continue based on several rules, such as:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether the cached page can be returned directly;
&lt;/li&gt;
&lt;li&gt;Whether you need to pass a CAPTCHA test;
&lt;/li&gt;
&lt;li&gt;Whether your request will be blocked;
&lt;/li&gt;
&lt;li&gt;Whether the request will be forwarded to the actual website server (origin).
&lt;/li&gt;
&lt;/ul&gt;
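&lt;p&gt;The decision flow above can be sketched as a simple function. The rule names and their ordering here are a conceptual illustration only, not Cloudflare's actual logic:&lt;/p&gt;

```python
def edge_decision(request: dict) -> str:
    """Conceptual sketch of an edge server's routing decision (illustrative only)."""
    if request.get("cached"):        # a cached copy of the page is available
        return "serve_from_cache"
    if request.get("suspicious"):    # fingerprint or behavior looks automated
        return "require_captcha"
    if request.get("blocked_ip"):    # IP is on a block list or rate-limited
        return "block"
    return "forward_to_origin"       # legitimate request reaches the origin server

print(edge_decision({"cached": False, "suspicious": True}))
```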

&lt;p&gt;If you are identified as a legitimate user, Cloudflare will forward the request to the origin server and return the content to you. This mechanism greatly enhances the website's security but also presents significant challenges for automated access.&lt;br&gt;&lt;br&gt;
Bypassing Cloudflare is one of the toughest technical challenges in many data collection tasks. Below, we will dive deeper into why bypassing Cloudflare is difficult.&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Challenges in Bypassing Cloudflare Protection&lt;/strong&gt;
Bypassing Cloudflare is not easy, especially when advanced anti-bot features (such as Bot Management, Managed Challenge, Turnstile verification, and JS challenges) are enabled. Many traditional scraping tools (like Selenium and Puppeteer) are detected and blocked before they can even load a page, owing to telltale fingerprint features or unnatural behavior simulation.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Although there are some open-source tools specifically designed to bypass Cloudflare (such as FlareSolverr, undetected-chromedriver), these tools typically have a short lifespan. Once they are widely used, Cloudflare quickly updates its detection rules to block them. This means that to bypass Cloudflare's protection mechanisms in a sustained and stable manner, teams often need in-house development capabilities and continuous resource investment for maintenance and updates.&lt;br&gt;&lt;br&gt;
Here are the main challenges in bypassing Cloudflare protection:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strict Browser Fingerprint Recognition&lt;/strong&gt;: Cloudflare detects fingerprint features in requests such as User-Agent, language settings, screen resolution, time zone, and Canvas/WebGL rendering. If it detects abnormal browsers or automation behaviors, it blocks the request.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex JS Challenge Mechanisms&lt;/strong&gt;: Cloudflare dynamically generates JavaScript challenges (such as CAPTCHA, delayed redirects, logical calculations, etc.), and automated scripts often struggle to correctly parse or execute these complex logics.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioral Analysis Systems&lt;/strong&gt;: In addition to static fingerprints, Cloudflare also analyzes user behavior trajectories, such as mouse movements, time spent on a page, scrolling actions, etc. This requires high precision in simulating human behavior.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate and Concurrency Control&lt;/strong&gt;: High-frequency access can easily trigger Cloudflare’s rate limiting and IP blocking strategies. Proxy pools and distributed scheduling must be highly optimized.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invisible Server-Side Validation&lt;/strong&gt;: Since Cloudflare is an edge interceptor, many real requests are blocked before reaching the origin server, making traditional packet capture analysis methods ineffective.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, successfully bypassing Cloudflare requires simulating real browser behavior, executing JavaScript dynamically, configuring fingerprints flexibly, and using high-quality proxies and dynamic scheduling mechanisms.&lt;/p&gt;
&lt;h2&gt;
  
  
  Bypassing Idealista Cloudflare with Scrapeless Scraping Browser to Collect Real Estate Data
&lt;/h2&gt;



&lt;p&gt;In this chapter, we will demonstrate how to use Scrapeless Scraping Browser to build an efficient, stable automation system that withstands anti-scraping defenses, and use it to collect real estate data from Idealista, a leading European real estate platform. Idealista employs multiple protection mechanisms, including Cloudflare, dynamic loading, IP rate limiting, and user behavior recognition, making it a highly challenging target.&lt;br&gt;&lt;br&gt;
We will focus on the following technical aspects:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bypassing Cloudflare verification pages
&lt;/li&gt;
&lt;li&gt;Custom fingerprinting and simulating real user behavior
&lt;/li&gt;
&lt;li&gt;Using Session Replay
&lt;/li&gt;
&lt;li&gt;High-concurrency scraping with multiple proxy pools
&lt;/li&gt;
&lt;li&gt;Cost optimization
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Understanding the Challenge: Idealista's Cloudflare Protection
&lt;/h3&gt;

&lt;p&gt;Idealista is a leading online real estate platform in Southern Europe, offering millions of listings for various types of properties, including residential homes, apartments, and shared rooms. Given the high commercial value of its property data, the platform has implemented strict anti-scraping measures.&lt;br&gt;&lt;br&gt;
To combat automated scraping, Idealista has deployed Cloudflare — a widely used anti-bot and security protection system designed to defend against malicious bots, DDoS attacks, and data abuse. Cloudflare's anti-scraping mechanisms primarily consist of the following elements:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Access Verification Mechanisms&lt;/strong&gt;: Including JS Challenge, browser integrity checks, and CAPTCHA verification, to determine whether the visitor is a real user.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioral Analysis&lt;/strong&gt;: Detecting real users through actions such as mouse movements, clicking patterns, and scroll speeds.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Header Analysis&lt;/strong&gt;: Inspecting browser types, language settings, and referrer data to check for discrepancies. Suspicious headers may expose attempts to disguise automated bots.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fingerprint Detection and Blocking&lt;/strong&gt;: Identifying traffic generated by automation tools (like Selenium and Puppeteer) through browser fingerprints, TLS fingerprints, and header information.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge Node Filtering&lt;/strong&gt;: Requests first enter Cloudflare's global edge network, which evaluates their risk. Only requests deemed low-risk are forwarded to Idealista's origin servers.
&lt;/li&gt;
&lt;/ul&gt;
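&lt;p&gt;As a toy illustration of the header-analysis idea above (these checks are ours, not Cloudflare's actual rules), a consistency check might flag requests whose headers do not match what a real browser would send:&lt;/p&gt;

```python
def headers_look_suspicious(headers: dict) -> bool:
    """Toy header-consistency check in the spirit of the analysis described above."""
    ua = headers.get("User-Agent", "")
    # No User-Agent at all, or an explicit automation marker, is an immediate red flag.
    if not ua or "HeadlessChrome" in ua:
        return True
    # A Chrome UA without an Accept-Language header is inconsistent with real browsers.
    if "Chrome" in ua and "Accept-Language" not in headers:
        return True
    return False

print(headers_look_suspicious({"User-Agent": "Mozilla/5.0 (X11) HeadlessChrome/120"}))
```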

&lt;p&gt;Next, we will explain in detail how to use Scrapeless Scraping Browser to bypass Idealista's Cloudflare protection and successfully collect real estate data.&lt;/p&gt;
&lt;h3&gt;
  
  
  Bypassing Idealista Cloudflare with Scrapeless Scraping Browser
&lt;/h3&gt;


&lt;h4&gt;
  
  
  Prerequisites
&lt;/h4&gt;

&lt;p&gt;Before we begin, let's make sure we have the necessary tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt;: If you haven't installed Python yet, please download the latest version and install it on your system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Required Libraries&lt;/strong&gt;: You need to install several Python libraries. Open a terminal or command prompt and run the following command:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  pip install requests beautifulsoup4 lxml selenium selenium-wire undetected-chromedriver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChromeDriver&lt;/strong&gt;: Download &lt;a href="https://developer.chrome.com/docs/chromedriver/downloads" rel="noopener noreferrer"&gt;ChromeDriver&lt;/a&gt;. Make sure to choose the version that matches your installed version of Chrome.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scrapeless Account&lt;/strong&gt;: To bypass Idealista's bot protection, you’ll need a Scrapeless Scraping Browser account. You can &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=scrapingbrowser" rel="noopener noreferrer"&gt;sign up here&lt;/a&gt; and receive a $2 free trial.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Locating the Data
&lt;/h4&gt;

&lt;p&gt;Our goal is to extract detailed information about each property listing on Idealista. We can use the browser’s developer tools to understand the structure of the site and identify the HTML elements we need to target.&lt;/p&gt;

&lt;p&gt;Right-click anywhere on the page and select &lt;strong&gt;Inspect&lt;/strong&gt; to view the page source.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzu55pdt0mac2589uxb7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzu55pdt0mac2589uxb7.png" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, we will focus on scraping property listings from Alcala de Henares, Madrid using the following URL:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.idealista.com/venta-viviendas/alcala-de-henares-madrid/" rel="noopener noreferrer"&gt;https://www.idealista.com/venta-viviendas/alcala-de-henares-madrid/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We want to extract the following data points from each listing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Title&lt;/li&gt;
&lt;li&gt;Price&lt;/li&gt;
&lt;li&gt;Area information&lt;/li&gt;
&lt;li&gt;Property description&lt;/li&gt;
&lt;li&gt;Image URLs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below you can see the annotated property listing page showing where all the information for each property is located.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72q8h2m4xmyy1bo86pcn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72q8h2m4xmyy1bo86pcn.png" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By inspecting the HTML source code, we can identify the CSS selector for each data point. CSS selectors are patterns used to select elements in an HTML document.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flksqx5ltzjh55wiqibdc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flksqx5ltzjh55wiqibdc.png" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By inspecting the HTML source code, we found that each property listing is contained within an &lt;code&gt;&amp;lt;article&amp;gt;&lt;/code&gt; tag with the class &lt;code&gt;item&lt;/code&gt;. Within each item:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The title is located in an &lt;code&gt;&amp;lt;a&amp;gt;&lt;/code&gt; tag with the class &lt;code&gt;item-link&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The price is found in a &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; tag with the class &lt;code&gt;item-price&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;And so on for other data points.&lt;/li&gt;
&lt;/ul&gt;
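&lt;p&gt;To illustrate how these class names map to extraction code, here is a dependency-free sketch using Python's built-in &lt;code&gt;html.parser&lt;/code&gt; on a made-up listing fragment (the markup below mirrors the structure described above, not Idealista's real HTML):&lt;/p&gt;

```python
from html.parser import HTMLParser

# Made-up fragment mirroring the structure described above
SAMPLE = """
<article class="item">
  <a class="item-link" title="Flat in Alcala de Henares">Flat in Alcala de Henares</a>
  <span class="item-price">250,000</span>
</article>
"""

class ListingParser(HTMLParser):
    """Extract the title and price of one listing via the classes above."""
    def __init__(self):
        super().__init__()
        self.title = None
        self.price = None
        self._price_parts = None  # collects text while inside the price span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item-link":
            self.title = attrs.get("title")
        elif tag == "span" and attrs.get("class") == "item-price":
            self._price_parts = []

    def handle_data(self, data):
        if self._price_parts is not None:
            self._price_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._price_parts is not None:
            self.price = "".join(self._price_parts).strip()
            self._price_parts = None

parser = ListingParser()
parser.feed(SAMPLE)
print(parser.title, "|", parser.price)
```

In the actual tutorial below, BeautifulSoup performs the same mapping far more concisely with `find` and `find_all`.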
&lt;h3&gt;
  
  
  Step 1: Set Up Selenium with ChromeDriver
&lt;/h3&gt;

&lt;p&gt;First, we need to configure Selenium to use ChromeDriver. Start by setting up &lt;code&gt;chrome_options&lt;/code&gt; and initializing the ChromeDriver.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time
from datetime import datetime
import json
def listings(url):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    s = Service("Replace with your path to ChromeDriver")
    driver = webdriver.Chrome(service=s, options=chrome_options)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code imports the necessary modules, including &lt;code&gt;seleniumwire&lt;/code&gt; (which extends Selenium with request-inspection capabilities) and &lt;code&gt;BeautifulSoup&lt;/code&gt; for HTML parsing. &lt;/p&gt;

&lt;p&gt;We define a function &lt;code&gt;listings(url)&lt;/code&gt; and configure Chrome to run in headless mode by adding the &lt;code&gt;--headless&lt;/code&gt; argument to &lt;code&gt;chrome_options&lt;/code&gt;. Then, we initialize ChromeDriver using the specified service path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Load the Target URL
&lt;/h3&gt;

&lt;p&gt;Next, we load the target URL and wait for the page to fully load.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    driver.get(url)
    time.sleep(8)  # Adjust based on website's load time

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;code&gt;driver.get(url)&lt;/code&gt; command instructs the browser to navigate to the specified URL. &lt;/p&gt;

&lt;p&gt;We use &lt;code&gt;time.sleep(8)&lt;/code&gt; to pause the script for 8 seconds, allowing enough time for the web page to fully load. This wait time can be adjusted depending on the website's loading speed.&lt;/p&gt;
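&lt;p&gt;A fixed sleep either wastes time on fast loads or is too short on slow ones. As an optional, Selenium-independent sketch, a generic polling helper can wait for a condition instead; in real code the predicate could be a lambda checking &lt;code&gt;driver.find_elements(...)&lt;/code&gt;:&lt;/p&gt;

```python
import time

def wait_for(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError("condition not met within %.1fs" % timeout)

# Example: the predicate becomes true on the third poll.
calls = {"n": 0}
def ready():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for(ready, timeout=5.0, interval=0.01))  # prints True
```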

&lt;h3&gt;
  
  
  Step 3: Parse the Page Content
&lt;/h3&gt;

&lt;p&gt;Once the page is loaded, we use BeautifulSoup to parse its content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    soup = BeautifulSoup(driver.page_source, "lxml")
    driver.quit()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we use &lt;code&gt;driver.page_source&lt;/code&gt; to retrieve the HTML content of the loaded page, and parse it using BeautifulSoup with the &lt;code&gt;lxml&lt;/code&gt; parser. Finally, we call &lt;code&gt;driver.quit()&lt;/code&gt; to close the browser instance and clean up resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Extract Data from the Parsed HTML
&lt;/h3&gt;

&lt;p&gt;Next, we extract the relevant data from the parsed HTML.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    house_listings = soup.find_all("article", class_="item")
    extracted_data = []
    for listing in house_listings:
        description_elem = listing.find("div", class_="item-description")
        description_text = description_elem.get_text(strip=True) if description_elem else "nil"
        item_details = listing.find_all("span", class_="item-detail")
        bedrooms = item_details[0].get_text(strip=True) if len(item_details) &amp;gt; 0 else "nil"
        area = item_details[1].get_text(strip=True) if len(item_details) &amp;gt; 1 else "nil"
        image_urls = [img["src"] for img in listing.find_all("img") if img.get("src")]
        first_image_url = image_urls[0] if image_urls else "nil"
        listing_info = {
            "Title": listing.find("a", class_="item-link").get("title", "nil"),
            "Price": listing.find("span", class_="item-price").get_text(strip=True),
            "Bedrooms": bedrooms,
            "Area": area,
            "Description": description_text,
            "Image URL": first_image_url,
        }
        extracted_data.append(listing_info)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we look for all elements matching the &lt;code&gt;article&lt;/code&gt; tag with the class name &lt;code&gt;item&lt;/code&gt;, which represent individual property listings. For each listing, we extract its title, details (such as number of bedrooms and area), and the image URL. We store these details in a dictionary and append each dictionary to a list called &lt;code&gt;extracted_data&lt;/code&gt;.&lt;/p&gt;
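&lt;p&gt;As an optional refactoring (not part of the original script), the repeated &lt;code&gt;... if element else "nil"&lt;/code&gt; pattern can be factored into a small helper:&lt;/p&gt;

```python
def text_or_nil(element) -> str:
    """Return an element's stripped text, or "nil" when the element is missing."""
    return element.get_text(strip=True) if element is not None else "nil"

# Works with any object exposing get_text(), e.g. a BeautifulSoup tag.
# FakeTag below is a stand-in used only for this demonstration.
class FakeTag:
    def __init__(self, text):
        self._text = text
    def get_text(self, strip=False):
        return self._text.strip() if strip else self._text

print(text_or_nil(FakeTag("  3 hab.  ")))  # prints 3 hab.
print(text_or_nil(None))                   # prints nil
```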

&lt;h3&gt;
  
  
  Step 5: Save the Extracted Data
&lt;/h3&gt;

&lt;p&gt;Finally, we save the extracted data into a JSON file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    current_datetime = datetime.now().strftime("%Y%m%d%H%M%S")
    json_filename = f"new_revised_data_{current_datetime}.json"
    with open(json_filename, "w", encoding="utf-8") as json_file:
        json.dump(extracted_data, json_file, ensure_ascii=False, indent=2)
    print(f"Extracted data saved to {json_filename}")
url = "https://www.idealista.com/venta-viviendas/alcala-de-henares-madrid/"
idealista_listings = listings(url)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the complete code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time
from datetime import datetime
import json
def listings(url):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    s = Service("Replace with your path to ChromeDriver")
    driver = webdriver.Chrome(service=s, options=chrome_options)
    driver.get(url)
    time.sleep(8)  # Adjust based on website's load time
    soup = BeautifulSoup(driver.page_source, "lxml")
    driver.quit()
    house_listings = soup.find_all("article", class_="item")
    extracted_data = []
    for listing in house_listings:
        description_elem = listing.find("div", class_="item-description")
        description_text = description_elem.get_text(strip=True) if description_elem else "nil"
        item_details = listing.find_all("span", class_="item-detail")
        bedrooms = item_details[0].get_text(strip=True) if len(item_details) &amp;gt; 0 else "nil"
        area = item_details[1].get_text(strip=True) if len(item_details) &amp;gt; 1 else "nil"
        image_urls = [img["src"] for img in listing.find_all("img") if img.get("src")]
        first_image_url = image_urls[0] if image_urls else "nil"
        listing_info = {
            "Title": listing.find("a", class_="item-link").get("title", "nil"),
            "Price": listing.find("span", class_="item-price").get_text(strip=True),
            "Bedrooms": bedrooms,
            "Area": area,
            "Description": description_text,
            "Image URL": first_image_url,
        }
        extracted_data.append(listing_info)
    current_datetime = datetime.now().strftime("%Y%m%d%H%M%S")
    json_filename = f"new_revised_data_{current_datetime}.json"
    with open(json_filename, "w", encoding="utf-8") as json_file:
        json.dump(extracted_data, json_file, ensure_ascii=False, indent=2)
    print(f"Extracted data saved to {json_filename}")
url = "https://www.idealista.com/venta-viviendas/alcala-de-henares-madrid/"
idealista_listings = listings(url)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bypassing Bot Detection
&lt;/h3&gt;

&lt;p&gt;If you’ve run the script at least twice during this tutorial, you may have noticed that a CAPTCHA page appears.&lt;/p&gt;

&lt;p&gt;The Cloudflare Challenge page initially loads the &lt;code&gt;cf-chl-bypass&lt;/code&gt; script and performs JavaScript computations, which typically takes about 5 seconds.&lt;/p&gt;

&lt;p&gt;Scrapeless offers a simple and reliable way to access data from sites like Idealista without building and maintaining your own scraping infrastructure. Scrapeless Scraping Browser is a high-concurrency, cost-effective, anti-blocking browser platform built for AI and large-scale data scraping. It simulates highly human-like behavior and can solve reCAPTCHA, Cloudflare Turnstile/Challenge, AWS WAF, DataDome, and more in real time, making it an efficient web scraping solution.&lt;/p&gt;

&lt;p&gt;Below are the steps to bypass Cloudflare protection using Scrapeless:&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Preparation
&lt;/h4&gt;

&lt;h5&gt;
  
  
  1.1 Create a Project Folder
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;Create a new folder for your project, for example, &lt;code&gt;scrapeless-bypass&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Navigate to the folder in your terminal:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd path/to/scrapeless-bypass

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  1.2 Initialize the Node.js project
&lt;/h5&gt;

&lt;p&gt;Run the following command to create the package.json file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm init -y

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  1.3 Install required dependencies
&lt;/h5&gt;

&lt;p&gt;Install &lt;code&gt;puppeteer-core&lt;/code&gt;, which lets you connect to a remote browser instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install puppeteer-core

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Puppeteer is not already installed on your system, install the full version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install puppeteer puppeteer-core

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Get Your Scrapeless API Key
&lt;/h4&gt;

&lt;h5&gt;
  
  
  2.1 Sign Up on Scrapeless
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=scrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless&lt;/a&gt; and create an account.&lt;/li&gt;
&lt;li&gt;Navigate to the &lt;strong&gt;API Key Management&lt;/strong&gt; section.&lt;/li&gt;
&lt;li&gt;Generate a new API key and copy it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu20e0jra72r01ruf5swa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu20e0jra72r01ruf5swa.png" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: Connect to Scrapeless Browserless
&lt;/h4&gt;

&lt;h5&gt;
  
  
  3.1 Get the WebSocket connection URL
&lt;/h5&gt;

&lt;p&gt;Scrapeless exposes a WebSocket connection URL that Puppeteer uses to interact with the cloud-based browser.&lt;/p&gt;

&lt;p&gt;The format is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wss://browser.scrapeless.com/browser?token=APIKey&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Replace APIKey with your actual Scrapeless API key.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h5&gt;
  
  
  3.2 Configure Connection Parameters
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;token&lt;/code&gt;: Your Scrapeless API key
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session_ttl&lt;/code&gt;: Duration of the browser session (in seconds), e.g., &lt;code&gt;180&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;proxy_country&lt;/code&gt;: Country code of the proxy server (e.g., &lt;code&gt;GB&lt;/code&gt; for the United Kingdom, &lt;code&gt;US&lt;/code&gt; for the United States)&lt;/li&gt;
&lt;/ul&gt;
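&lt;p&gt;These parameters can be assembled into the connection URL with &lt;code&gt;URLSearchParams&lt;/code&gt;, which also handles escaping for you. A small helper sketch (not an official API):&lt;br&gt;
&lt;/p&gt;

```javascript
// Build the Scrapeless WebSocket endpoint from its query parameters.
function buildConnectionURL(apiKey, sessionTTL = 180, proxyCountry = 'ANY') {
  const query = new URLSearchParams({
    token: apiKey,
    session_ttl: String(sessionTTL),
    proxy_country: proxyCountry,
  });
  return `wss://browser.scrapeless.com/browser?${query.toString()}`;
}

console.log(buildConnectionURL('your_api_key', 180, 'GB'));
```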




&lt;h4&gt;
  
  
  Step 4: Write the Puppeteer Script
&lt;/h4&gt;

&lt;h5&gt;
  
  
  4.1 Create the Script File
&lt;/h5&gt;

&lt;p&gt;Inside your project folder, create a new JavaScript file named &lt;code&gt;bypass-cloudflare.js&lt;/code&gt;.&lt;/p&gt;

&lt;h5&gt;
  
  
  4.2 Connect to Scrapeless and Launch Puppeteer
&lt;/h5&gt;

&lt;p&gt;Add the following code to &lt;code&gt;bypass-cloudflare.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import puppeteer from 'puppeteer-core';

const API_KEY = 'your_api_key'; // Replace with your actual API Key
const host = 'wss://browser.scrapeless.com';
const query = new URLSearchParams({
  token: API_KEY,
  session_ttl: '180', // Browser session duration in seconds
  proxy_country: 'GB', // Proxy country code
  proxy_session_id: 'test_session', // Proxy session ID (keeps the same IP)
  proxy_session_duration: '5' // Proxy session duration in minutes
}).toString();

const connectionURL = `${host}/browser?${query}`;

const browser = await puppeteer.connect({
  browserWSEndpoint: connectionURL,
  defaultViewport: null,
});
console.log('Connected to Scrapeless');

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  4.3 Open a webpage and bypass Cloudflare
&lt;/h5&gt;

&lt;p&gt;Extend the script to open a new page and navigate to a website protected by Cloudflare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const page = await browser.newPage();
await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', { waitUntil: 'domcontentloaded' });

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  4.4 Waiting for page elements to load
&lt;/h5&gt;

&lt;p&gt;Make sure Cloudflare protection is bypassed before proceeding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;await page.waitForSelector('main.page-content .challenge-info', { timeout: 30000 }); // Adjust selector as needed

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  4.5 Take a screenshot
&lt;/h5&gt;

&lt;p&gt;To verify whether Cloudflare protection has been successfully bypassed, take a screenshot of the page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;await page.screenshot({ path: 'challenge-bypass.png' });
console.log('Screenshot saved as challenge-bypass.png');

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  4.6 Complete script
&lt;/h5&gt;

&lt;p&gt;The following is the complete script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import puppeteer from 'puppeteer-core';

const API_KEY = 'your_api_key'; // Replace with your actual API Key
const host = 'wss://browser.scrapeless.com';
const query = new URLSearchParams({
  token: API_KEY,
  session_ttl: '180',
  proxy_country: 'GB',
  proxy_session_id: 'test_session',
  proxy_session_duration: '5'
}).toString();

const connectionURL = `${host}/browser?${query}`;

(async () =&amp;gt; {
  try {
    // Connect to Scrapeless
    const browser = await puppeteer.connect({
      browserWSEndpoint: connectionURL,
      defaultViewport: null,
    });
    console.log('Connected to Scrapeless');

    // Open a new page and navigate to the target website
    const page = await browser.newPage();
    await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', { waitUntil: 'domcontentloaded' });

    // Wait for the page to load completely
    await new Promise((resolve) =&amp;gt; setTimeout(resolve, 5000)); // Fixed delay; adjust if necessary (page.waitForTimeout was removed in recent Puppeteer versions)
    await page.waitForSelector('main.page-content', { timeout: 30000 });

    // Capture a screenshot
    await page.screenshot({ path: 'challenge-bypass.png' });
    console.log('Screenshot saved as challenge-bypass.png');

    // Close the browser
    await browser.close();
    console.log('Browser closed');
  } catch (error) {
    console.error('Error:', error);
  }
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 5: Run the script
&lt;/h4&gt;

&lt;h5&gt;
  
  
  5.1 Save the script
&lt;/h5&gt;

&lt;p&gt;Make sure the script is saved as &lt;code&gt;bypass-cloudflare.js&lt;/code&gt;.&lt;/p&gt;

&lt;h5&gt;
  
  
  5.2 Execute the script
&lt;/h5&gt;

&lt;p&gt;Run the script using Node.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node bypass-cloudflare.js

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  5.3 Expected Output
&lt;/h5&gt;

&lt;p&gt;If everything is set up correctly, the terminal will display:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Connected to Scrapeless
Screenshot saved as challenge-bypass.png
Browser closed

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;challenge-bypass.png&lt;/code&gt; file will appear in your project folder, confirming that Cloudflare protection has been successfully bypassed.&lt;/p&gt;

&lt;p&gt;You can also integrate Scrapeless Scraping Browser directly into your scraping code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer-core');
const connectionURL = 'wss://browser.scrapeless.com/browser?token=your_api_key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY'; // Replace your_api_key with your actual API key

(async () =&amp;gt; {
    const browser = await puppeteer.connect({browserWSEndpoint: connectionURL});
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    console.log(await page.title());
    await browser.close();
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fingerprint Customization
&lt;/h3&gt;

&lt;p&gt;When scraping data from websites—especially large real estate platforms like &lt;strong&gt;Idealista&lt;/strong&gt;—even if you successfully bypass &lt;strong&gt;Cloudflare&lt;/strong&gt; challenges using &lt;strong&gt;Scrapeless&lt;/strong&gt;, you might still be flagged as a bot due to repetitive or high-volume access.&lt;/p&gt;

&lt;p&gt;Websites often use &lt;strong&gt;browser fingerprinting&lt;/strong&gt; to detect automated behavior and restrict access.&lt;/p&gt;




&lt;h4&gt;
  
  
  ⚠️ Common Issues You May Encounter
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Slow response times after multiple scrapes&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The site may throttle requests based on IP or behavioral patterns.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Page layout fails to render&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Dynamic content may rely on real browser environments, causing missing or broken data during scraping.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Missing listings in certain regions&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Websites may block or hide content based on suspicious traffic patterns.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;These problems are usually caused by identical browser configurations for each request. If your browser fingerprint remains unchanged, it’s easy for anti-bot systems to detect automation.&lt;/p&gt;




&lt;h4&gt;
  
  
  Solution: Custom Fingerprinting with Scrapeless
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Scrapeless Scraping Browser&lt;/strong&gt; provides built-in support for fingerprint customization to mimic real user behavior and avoid detection.&lt;/p&gt;

&lt;p&gt;You can &lt;strong&gt;randomize or customize&lt;/strong&gt; the following fingerprint elements:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fingerprint Element&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User-Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mimic various OS/browser combinations (e.g., Chrome on Windows/Mac).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simulate different operating systems (Windows, macOS, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Screen Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Emulate various device resolutions to avoid mobile/desktop mismatches.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Localization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Align language and timezone with geolocation for consistency.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;By rotating or customizing these values, each request appears more natural—reducing the risk of detection and improving data extraction reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer-core');

const query = new URLSearchParams({
  token: 'your-scrapeless-api-key', // required
  session_ttl: 180,
  proxy_country: 'ANY',
  // Set fingerprint parameters
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.6998.45 Safari/537.36',
  platform: 'macOS', // Keep the platform consistent with the Macintosh User-Agent above
  screen: JSON.stringify({ width: 1280, height: 1024 }),
  localization: JSON.stringify({
    locale: 'zh-HK',
    languages: ['zh-HK', 'en-US', 'en'],
    timezone: 'Asia/Hong_Kong',
  })
});

const connectionURL = `wss://browser.scrapeless.com/browser?${query.toString()}`;

(async () =&amp;gt; {
    const browser = await puppeteer.connect({browserWSEndpoint: connectionURL});
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    console.log(await page.title());
    await browser.close();
})();



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
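&lt;p&gt;To rotate fingerprints across sessions, you could also pick a profile at random per connection. A sketch, assuming the parameter names above; the profile values are illustrative, and each profile keeps its user agent, platform, and screen size mutually consistent:&lt;br&gt;
&lt;/p&gt;

```javascript
// A few internally consistent fingerprint profiles (illustrative values only).
const PROFILES = [
  {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36',
    platform: 'Windows',
    screen: { width: 1920, height: 1080 },
  },
  {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36',
    platform: 'macOS',
    screen: { width: 1440, height: 900 },
  },
];

// Pick one profile at random and serialize it for the connection query string.
function randomFingerprintParams() {
  const p = PROFILES[Math.floor(Math.random() * PROFILES.length)];
  return {
    userAgent: p.userAgent,
    platform: p.platform,
    screen: JSON.stringify(p.screen),
  };
}

console.log(randomFingerprintParams().platform);
```

&lt;p&gt;Spread the returned fields into the &lt;code&gt;URLSearchParams&lt;/code&gt; object before connecting, so each new session presents a different profile.&lt;/p&gt;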



&lt;h3&gt;
  
  
  Session Replay
&lt;/h3&gt;

&lt;p&gt;After customizing browser fingerprints, page stability significantly improves, and content extraction becomes more reliable.&lt;/p&gt;

&lt;p&gt;However, during large-scale scraping operations, unexpected issues may still cause extraction failures. To address this, &lt;strong&gt;Scrapeless&lt;/strong&gt; offers a powerful &lt;strong&gt;Session Replay&lt;/strong&gt; feature.&lt;/p&gt;




&lt;h4&gt;
  
  
  What is Session Replay?
&lt;/h4&gt;

&lt;p&gt;Session Replay records the entire browser session in detail, capturing all interactions, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page load process
&lt;/li&gt;
&lt;li&gt;Network request and response data
&lt;/li&gt;
&lt;li&gt;JavaScript execution behavior
&lt;/li&gt;
&lt;li&gt;Dynamically loaded but unparsed content
&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  Why Use Session Replay?
&lt;/h4&gt;

&lt;p&gt;When scraping complex websites like &lt;strong&gt;Idealista&lt;/strong&gt;, Session Replay can greatly improve debugging efficiency.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Precise Issue Tracking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quickly identify failed requests without guesswork&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No Need to Re-run Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Analyze issues directly from the replay instead of rerunning the scraper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Improved Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Share replay logs with team members for easier troubleshooting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dynamic Content Analysis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Understand how dynamically loaded data behaves during scraping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h4&gt;
  
  
  Usage Tip
&lt;/h4&gt;

&lt;p&gt;Once &lt;strong&gt;Session Replay&lt;/strong&gt; is enabled, check the replay logs first whenever a scrape fails or data looks incomplete. This helps you diagnose the issue faster and reduce debugging time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Proxy Configuration
&lt;/h3&gt;

&lt;p&gt;When scraping Idealista, it's important to note that the platform is highly sensitive to non-local IP addresses—especially when accessing listings from specific cities. If your IP originates from outside the country, Idealista may:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Block the request entirely
&lt;/li&gt;
&lt;li&gt;Return a simplified or stripped-down version of the page
&lt;/li&gt;
&lt;li&gt;Serve empty or incomplete data, even without triggering a CAPTCHA
&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  Scrapeless Built-in Proxy Support
&lt;/h4&gt;

&lt;p&gt;Scrapeless offers &lt;strong&gt;built-in proxy configuration&lt;/strong&gt;, allowing you to specify your geographic source directly.&lt;/p&gt;

&lt;p&gt;You can configure this using either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;proxy_country&lt;/code&gt;: A two-letter country code (e.g., &lt;code&gt;'ES'&lt;/code&gt; for Spain)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;proxy_url&lt;/code&gt;: Your own proxy server URL
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;proxy_country: 'ES',

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
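&lt;p&gt;Both options go into the same connection query string. A small sketch, assuming the parameter names above; pass either a country code or your own proxy URL, not both:&lt;br&gt;
&lt;/p&gt;

```javascript
// Assemble proxy options into the Scrapeless connection query string.
function proxyQuery(token, options = {}) {
  const params = new URLSearchParams({ token: token, session_ttl: '180' });
  if (options.proxyUrl) {
    params.set('proxy_url', options.proxyUrl); // your own proxy server
  } else {
    params.set('proxy_country', options.proxyCountry ?? 'ANY');
  }
  return params.toString();
}

console.log(proxyQuery('your_api_key', { proxyCountry: 'ES' }));
```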



&lt;h3&gt;
  
  
  High Concurrency
&lt;/h3&gt;

&lt;p&gt;The page we just scraped from Idealista—&lt;a href="https://www.idealista.com/venta-viviendas/alcala-de-henares-madrid/" rel="noopener noreferrer"&gt;Alcalá de Henares Real Estate Listings&lt;/a&gt;—has as many as 6 pages of listings. &lt;/p&gt;

&lt;p&gt;When you're researching industry trends or gathering competitive marketing strategies, you might need to scrape real estate data from &lt;strong&gt;20+ cities daily&lt;/strong&gt;, covering &lt;strong&gt;thousands of pages&lt;/strong&gt;. In some cases, you may even need to refresh this data every hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxwxqd6l7acuomi5s85b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxwxqd6l7acuomi5s85b.png" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  High-Concurrency Requirements
&lt;/h4&gt;

&lt;p&gt;To handle this volume efficiently, consider the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple concurrent connections&lt;/strong&gt;: To scrape data from hundreds of pages without long wait times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation tools&lt;/strong&gt;: Use Scrapeless Scraping Browser or similar tools that can handle concurrent requests at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session management&lt;/strong&gt;: Maintain persistent sessions to avoid excessive CAPTCHAs or IP blocks.&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  Scrapeless Scalability
&lt;/h4&gt;

&lt;p&gt;Scrapeless is specifically designed for high-concurrency scraping. It offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallel browser sessions&lt;/strong&gt;: Handle multiple requests simultaneously, allowing you to scrape large amounts of data across many cities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-cost, high-efficiency scraping&lt;/strong&gt;: Scraping in parallel reduces the cost per page scraped while optimizing throughput.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bypass high-volume anti-bot defenses&lt;/strong&gt;: Automatically handles CAPTCHA and other verification systems, even during high-load scraping.&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: Ensure your requests are spaced out enough to mimic human-like browsing behavior and prevent rate-limiting or bans from Idealista.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Scalability &amp;amp; Cost Efficiency
&lt;/h4&gt;

&lt;p&gt;Regular Puppeteer struggles to efficiently scale sessions and integrate with queuing systems. However, Scrapeless Scraping Browser supports seamless scaling from &lt;strong&gt;dozens&lt;/strong&gt; of concurrent sessions to &lt;strong&gt;unlimited&lt;/strong&gt; concurrent sessions, ensuring &lt;strong&gt;zero queue time and zero timeouts&lt;/strong&gt; even during peak task loads.&lt;/p&gt;

&lt;p&gt;Here’s a comparison of various tools for high-concurrency scraping. Even with Scrapeless' high-concurrency browser, you don’t need to worry about costs—in fact, it can help you save nearly &lt;strong&gt;50%&lt;/strong&gt; in fees.&lt;/p&gt;




&lt;h4&gt;
  
  
  Tool Comparison
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Tool Name&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Hourly Rate (USD/hour)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Proxy Fees (USD/GB)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Concurrent Support&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scrapeless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.063 – $0.090/hour (depends on concurrency &amp;amp; usage)&lt;/td&gt;
&lt;td&gt;$1.26 – $1.80/GB&lt;/td&gt;
&lt;td&gt;50 / 100 / 200 / 400 / 600 / 1000 / Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Browserbase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.10 – $0.198/hour (includes 2-5GB free proxies)&lt;/td&gt;
&lt;td&gt;$10/GB (after the free allocation)&lt;/td&gt;
&lt;td&gt;3 (Basic) / 50 (Advanced)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Brightdata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.10/hour&lt;/td&gt;
&lt;td&gt;$9.5/GB (Standard); $12.5/GB (Advanced domains)&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zenrows&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.09/hour&lt;/td&gt;
&lt;td&gt;$2.8 – $5.42/GB&lt;/td&gt;
&lt;td&gt;Up to 100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Browserless&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.084 – $0.15/hour (unit-based billing)&lt;/td&gt;
&lt;td&gt;$4.3/GB&lt;/td&gt;
&lt;td&gt;3 / 10 / 50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: If you require &lt;strong&gt;massive-scale scraping&lt;/strong&gt; and &lt;strong&gt;high-concurrency support&lt;/strong&gt;, &lt;strong&gt;Scrapeless&lt;/strong&gt; offers the best cost-to-performance ratio.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Cost Control Strategies for Web Scraping
&lt;/h3&gt;

&lt;p&gt;Observant readers may have noticed that the Idealista pages we scrape often contain large numbers of high-definition property images, interactive maps, video presentations, and ad scripts. While these elements are friendly to end users, they are unnecessary for data extraction and significantly increase bandwidth consumption and cost.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoch05yctopi11hu1slj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoch05yctopi11hu1slj.png" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To optimize traffic usage, we recommend users employ the following strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Resource Interception&lt;/strong&gt;: Intercept unnecessary resource requests to reduce traffic consumption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request URL Interception&lt;/strong&gt;: Intercept specific requests based on URL characteristics to further minimize traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simulate Mobile Devices&lt;/strong&gt;: Use mobile device configurations to fetch lighter page versions.&lt;/li&gt;
&lt;/ol&gt;




&lt;h4&gt;
  
  
  Detailed Strategies
&lt;/h4&gt;

&lt;h5&gt;
  
  
  1. &lt;strong&gt;Resource Interception&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;Enabling resource interception can significantly improve scraping efficiency. By configuring Puppeteer's &lt;code&gt;setRequestInterception&lt;/code&gt; function, we can block resources such as images, media, fonts, and stylesheets, avoiding large content downloads.&lt;/p&gt;

&lt;h5&gt;
  
  
  2. &lt;strong&gt;Request URL Filtering&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;By examining request URLs, we can filter out irrelevant requests like advertising services and third-party analytics scripts that are unrelated to the data extraction. This reduces unnecessary network traffic.&lt;/p&gt;

&lt;h5&gt;
  
  
  3. &lt;strong&gt;Simulating Mobile Devices&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;Simulating a mobile device (e.g., setting the user agent to an iPhone) allows you to fetch a lighter, mobile-optimized version of the page. This results in fewer resources being loaded and speeds up the scraping process.&lt;/p&gt;
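&lt;p&gt;With Puppeteer, one way to do this is to set a mobile user agent and viewport before navigating (a sketch; the iPhone values are illustrative):&lt;br&gt;
&lt;/p&gt;

```javascript
// Configure a page to present itself as a mobile device (illustrative values).
async function emulateMobile(page) {
  await page.setUserAgent(
    'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1'
  );
  await page.setViewport({ width: 390, height: 844, isMobile: true, hasTouch: true });
}

// Usage sketch: call emulateMobile(page) before page.goto(...).
```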

&lt;blockquote&gt;
&lt;p&gt;For more information, please refer to the &lt;a href="https://docs.scrapeless.com/en/scraping-browser/guides/optimizing-cost/" rel="noopener noreferrer"&gt;Scrapeless official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Example Code
&lt;/h4&gt;

&lt;p&gt;Here’s an example of combining these three strategies using Scrapeless Cloud Browser + Puppeteer for optimized resource scraping:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import puppeteer from 'puppeteer-core';

const scrapelessUrl = 'wss://browser.scrapeless.com/browser?token=your_api_key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY';

async function scrapeWithResourceBlocking(url) {
    const browser = await puppeteer.connect({
        browserWSEndpoint: scrapelessUrl,
        defaultViewport: null
    });
    const page = await browser.newPage();

    // Enable request interception
    await page.setRequestInterception(true);

    // Define resource types to block
    const BLOCKED_TYPES = new Set([
        'image',
        'font',
        'media',
        'stylesheet',
    ]);

    // Intercept requests
    page.on('request', (request) =&amp;gt; {
        if (BLOCKED_TYPES.has(request.resourceType())) {
            request.abort();
            console.log(`Blocked: ${request.resourceType()} - ${request.url().substring(0, 50)}...`);
        } else {
            request.continue();
        }
    });

    await page.goto(url, {waitUntil: 'domcontentloaded'});

    // Extract data
    const data = await page.evaluate(() =&amp;gt; {
        return {
            title: document.title,
            content: document.body.innerText.substring(0, 1000)
        };
    });

    await browser.close();
    return data;
}

// Usage
scrapeWithResourceBlocking('https://www.scrapeless.com')
    .then(data =&amp;gt; console.log('Scraping result:', data))
    .catch(error =&amp;gt; console.error('Scraping failed:', error));

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this way, you not only cut bandwidth costs significantly but also speed up crawling while preserving data quality, improving the overall stability and efficiency of the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  5.  Security and Compliance Recommendations
&lt;/h2&gt;

&lt;p&gt;When using Scrapeless for data scraping, developers should pay attention to the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Comply with the target website's &lt;code&gt;robots.txt&lt;/code&gt; file and relevant laws and regulations&lt;/strong&gt;: Ensure that your scraping activities are legal and respect the site's guidelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid excessive requests that could lead to website downtime&lt;/strong&gt;: Be mindful of scraping frequency to prevent server overload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not scrape sensitive information&lt;/strong&gt;: Do not collect user privacy data, payment information, or any other sensitive content.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  6.  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the age of big data, data collection has become a crucial foundation for digital transformation across industries. Especially in fields such as market intelligence, e-commerce price comparison, competitive analysis, financial risk management, and real estate analysis, the demand for data-driven decision-making has become increasingly urgent. However, with the continuous evolution of web technologies, particularly the widespread use of dynamically loaded content, traditional web scrapers are gradually revealing their limitations. These limitations not only make scraping more difficult but also lead to the escalation of anti-scraping mechanisms, raising the barrier for web scraping.&lt;/p&gt;

&lt;p&gt;With the advancement of web technologies, traditional scrapers can no longer meet complex scraping needs. Below are some key challenges and corresponding solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Content Loading&lt;/strong&gt;: Browser-based scrapers, by simulating real browser rendering of JavaScript content, ensure they can scrape dynamically loaded web data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-Scraping Mechanisms&lt;/strong&gt;: Using proxy pools, fingerprint recognition, behavior simulation, and other techniques, we can bypass the anti-scraping mechanisms commonly triggered by traditional scrapers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-Concurrency Scraping&lt;/strong&gt;: Headless browsers support high-concurrency task deployment, paired with proxy scheduling, to meet the needs of large-scale data scraping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Issues&lt;/strong&gt;: By using legal APIs and proxy services, scraping activities can be ensured to comply with the terms of the target websites.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a result, browser-based scrapers have become the new trend in the industry. This technology not only simulates user behavior through real browsers but also flexibly handles modern websites' complex structures and anti-scraping mechanisms, offering developers more stable and efficient scraping solutions.&lt;/p&gt;

&lt;p&gt;Scrapeless Scraping Browser embraces this technological trend by combining browser rendering, proxy management, anti-detection technologies, and high-concurrency task scheduling, helping developers efficiently and stably complete data scraping tasks in complex online environments. It improves scraping efficiency and stability through several core advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-Concurrency Browser Solutions&lt;/strong&gt;: Scrapeless supports large-scale, high-concurrency tasks, enabling rapid deployment of thousands of scraping tasks to meet long-term scraping demands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-Detection as a Service&lt;/strong&gt;: Built-in CAPTCHA Solvers and customizable fingerprints help developers bypass fingerprint and behavior recognition mechanisms, greatly reducing the risk of being blocked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual Debugging Tool - Session Replay&lt;/strong&gt;: By replaying each browser interaction during the scraping process, developers can easily debug and diagnose issues in the scraping process, especially for handling complex pages and dynamically loaded content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance and Transparency Assurance&lt;/strong&gt;: Scrapeless emphasizes compliant data scraping, supporting adherence to website &lt;code&gt;robots.txt&lt;/code&gt; rules and providing detailed scraping logs to ensure that users' data scraping activities comply with target websites' policies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible Scalability&lt;/strong&gt;: Scrapeless integrates seamlessly with Puppeteer, allowing users to customize their scraping strategies and connect with other tools or platforms for a one-stop data scraping and analysis workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether scraping e-commerce platforms for price comparisons, extracting real estate website data, or applying it in financial risk monitoring and market intelligence analysis, Scrapeless provides high-efficiency, intelligent, and reliable solutions for various industries.&lt;/p&gt;

&lt;p&gt;With the technical details and best practices covered in this article, you now understand how to leverage Scrapeless for large-scale data scraping. Whether handling dynamic pages, extracting complex interactive data, optimizing traffic usage, or overcoming anti-scraping mechanisms, Scrapeless helps you achieve your scraping goals swiftly and efficiently.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
    <item>
      <title>Optimizing Headless Browser Traffic: Cost Reduction Strategies with Puppeteer for Efficient Data Scraping</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Sun, 27 Apr 2025 02:30:58 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/optimizing-headless-browser-traffic-cost-reduction-strategies-with-puppeteer-for-efficient-data-21d0</link>
      <guid>https://dev.to/datacollectionscraper/optimizing-headless-browser-traffic-cost-reduction-strategies-with-puppeteer-for-efficient-data-21d0</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;When using Puppeteer for data scraping, traffic consumption is an important consideration, especially when proxy services are involved, since traffic costs can rise sharply. To optimize traffic usage, we can adopt the following strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Resource interception&lt;/strong&gt;: Reduce traffic consumption by intercepting unnecessary resource requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request URL interception&lt;/strong&gt;: Further reduce traffic by intercepting specific requests based on URL characteristics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simulate mobile devices&lt;/strong&gt;: Use mobile device configurations to obtain lighter page versions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive optimization&lt;/strong&gt;: Combine the above methods to achieve the best results.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Optimization Scheme 1: Resource Interception
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Resource Interception Introduction
&lt;/h3&gt;

&lt;p&gt;In Puppeteer, &lt;code&gt;page.setRequestInterception(true)&lt;/code&gt; can capture every network request initiated by the browser and decide to &lt;strong&gt;continue&lt;/strong&gt; (&lt;code&gt;request.continue()&lt;/code&gt;), &lt;strong&gt;terminate&lt;/strong&gt; (&lt;code&gt;request.abort()&lt;/code&gt;), or &lt;strong&gt;customize the response&lt;/strong&gt; (&lt;code&gt;request.respond()&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This method can significantly reduce bandwidth consumption and is especially suitable for &lt;strong&gt;crawling&lt;/strong&gt;, &lt;strong&gt;screenshotting&lt;/strong&gt;, and &lt;strong&gt;performance optimization&lt;/strong&gt; scenarios.&lt;/p&gt;
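
&lt;p&gt;As a quick illustration of &lt;code&gt;request.respond()&lt;/code&gt;, the sketch below answers image requests with a tiny stub body instead of aborting them; some pages treat aborted images as load errors, and a stub response avoids that while still saving bandwidth. The handler function name is illustrative, not part of the Puppeteer API.&lt;/p&gt;

```javascript
// Sketch: answer image requests with a 1x1 transparent GIF via
// request.respond() instead of aborting them. The handler is pure
// logic, so it can be exercised with a mock request object.
const TRANSPARENT_GIF = Buffer.from(
    'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
    'base64'
);

function handleRequest(request) {
    if (request.resourceType() === 'image') {
        // Serve a stub body instead of hitting the network
        request.respond({
            status: 200,
            contentType: 'image/gif',
            body: TRANSPARENT_GIF,
        });
        return 'responded';
    }
    request.continue();
    return 'continued';
}
```

&lt;p&gt;After calling &lt;code&gt;page.setRequestInterception(true)&lt;/code&gt;, register the handler with &lt;code&gt;page.on('request', handleRequest)&lt;/code&gt;.&lt;/p&gt;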

&lt;h3&gt;
  
  
  Interceptable Resource Types and Suggestions
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Impact After Interception&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;image&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Image resources&lt;/td&gt;
&lt;td&gt;JPG/PNG/GIF/WebP images&lt;/td&gt;
&lt;td&gt;Images will not be displayed&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;font&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Font files&lt;/td&gt;
&lt;td&gt;TTF/WOFF/WOFF2 fonts&lt;/td&gt;
&lt;td&gt;System default fonts will be used instead&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;media&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Media files&lt;/td&gt;
&lt;td&gt;Video/audio files&lt;/td&gt;
&lt;td&gt;Media content cannot be played&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;manifest&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Web App Manifest&lt;/td&gt;
&lt;td&gt;PWA configuration file&lt;/td&gt;
&lt;td&gt;PWA functionality may be affected&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prefetch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prefetch resources&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;link rel="prefetch"&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Minimal impact on the page&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stylesheet&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CSS Stylesheet&lt;/td&gt;
&lt;td&gt;External CSS files&lt;/td&gt;
&lt;td&gt;Page styles are lost, may affect layout&lt;/td&gt;
&lt;td&gt;⚠️ Caution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;websocket&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;WebSocket&lt;/td&gt;
&lt;td&gt;Real-time communication connection&lt;/td&gt;
&lt;td&gt;Real-time functionality disabled&lt;/td&gt;
&lt;td&gt;⚠️ Caution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;eventsource&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server-Sent Events&lt;/td&gt;
&lt;td&gt;Server push data&lt;/td&gt;
&lt;td&gt;Push functionality disabled&lt;/td&gt;
&lt;td&gt;⚠️ Caution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;preflight&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CORS preflight request&lt;/td&gt;
&lt;td&gt;OPTIONS request&lt;/td&gt;
&lt;td&gt;Cross-origin requests fail&lt;/td&gt;
&lt;td&gt;⚠️ Caution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;script&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;JavaScript scripts&lt;/td&gt;
&lt;td&gt;External JS files&lt;/td&gt;
&lt;td&gt;Dynamic functionality disabled, SPA may not render&lt;/td&gt;
&lt;td&gt;❌ Avoid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;xhr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;XHR requests&lt;/td&gt;
&lt;td&gt;AJAX data requests&lt;/td&gt;
&lt;td&gt;Unable to obtain dynamic data&lt;/td&gt;
&lt;td&gt;❌ Avoid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fetch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch requests&lt;/td&gt;
&lt;td&gt;Modern AJAX requests&lt;/td&gt;
&lt;td&gt;Unable to obtain dynamic data&lt;/td&gt;
&lt;td&gt;❌ Avoid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;document&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Main document&lt;/td&gt;
&lt;td&gt;HTML page itself&lt;/td&gt;
&lt;td&gt;Page cannot load&lt;/td&gt;
&lt;td&gt;❌ Avoid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Recommendation Level Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⭐ &lt;strong&gt;Safe&lt;/strong&gt;: Interception has almost no impact on data scraping or first-screen rendering; it is recommended to block by default.&lt;/li&gt;
&lt;li&gt;⚠️ &lt;strong&gt;Caution&lt;/strong&gt;: May break styles, real-time functions, or cross-origin requests; requires business judgment.&lt;/li&gt;
&lt;li&gt;❌ &lt;strong&gt;Avoid&lt;/strong&gt;: Very likely to prevent SPA/dynamic sites from rendering or fetching data; block these only if you are absolutely sure you don't need them.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resource Interception Example Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wss://browser.scrapeless.com/browser?token=your_api_key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scrapeWithResourceBlocking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;browserWSEndpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;defaultViewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Enable request interception&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestInterception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Define resource types to block&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BLOCKED_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;font&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;media&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stylesheet&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Intercept requests&lt;/span&gt;
    &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BLOCKED_TYPES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resourceType&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Blocked: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resourceType&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt; - &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;...`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;domcontentloaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Extract data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="nf"&gt;scrapeWithResourceBlocking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.scrapeless.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping result:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Optimization Scheme 2: Request URL Interception
&lt;/h2&gt;

&lt;p&gt;In addition to intercepting by resource type, more granular interception control can be performed based on URL characteristics. This is particularly effective for blocking ads, analytics scripts, and other unnecessary third-party requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  URL Interception Strategies
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intercept by domain&lt;/strong&gt;: Block all requests from a specific domain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intercept by path&lt;/strong&gt;: Block requests to a specific path&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intercept by file type&lt;/strong&gt;: Block files with specific extensions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intercept by keyword&lt;/strong&gt;: Block requests whose URLs contain specific keywords&lt;/li&gt;
&lt;/ol&gt;
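
&lt;p&gt;The full example later in this section covers domain and path interception; the file-type and keyword strategies can be sketched with a small URL matcher like the one below. The function and constant names are illustrative, not from any library.&lt;/p&gt;

```javascript
// Sketch: extension- and keyword-based URL checks (strategies 3 and 4).
const BLOCKED_EXTENSIONS = ['.mp4', '.webm', '.mp3', '.woff2'];
const BLOCKED_KEYWORDS = ['pixel', 'beacon', 'tracker'];

function shouldBlockUrl(url) {
    // Match extensions against the path only, so a query string
    // that merely contains ".mp4" cannot cause a false positive.
    const pathname = new URL(url).pathname;
    return BLOCKED_EXTENSIONS.some(ext => pathname.endsWith(ext)) ||
        BLOCKED_KEYWORDS.some(kw => url.includes(kw));
}
```

&lt;p&gt;Inside the &lt;code&gt;page.on('request', ...)&lt;/code&gt; handler, call &lt;code&gt;shouldBlockUrl(request.url())&lt;/code&gt; and abort the request when it returns true.&lt;/p&gt;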

&lt;h3&gt;
  
  
  Common Interceptable URL Patterns
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;URL Pattern&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Advertising services&lt;/td&gt;
&lt;td&gt;Advertising network domains&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ad.doubleclick.net&lt;/code&gt;, &lt;code&gt;googleadservices.com&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analytics services&lt;/td&gt;
&lt;td&gt;Statistics and analytics scripts&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google-analytics.com&lt;/code&gt;, &lt;code&gt;hotjar.com&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Social media plugins&lt;/td&gt;
&lt;td&gt;Social sharing buttons, etc.&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;platform.twitter.com&lt;/code&gt;, &lt;code&gt;connect.facebook.net&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tracking pixels&lt;/td&gt;
&lt;td&gt;Pixels that track user behavior&lt;/td&gt;
&lt;td&gt;URLs containing &lt;code&gt;pixel&lt;/code&gt;, &lt;code&gt;beacon&lt;/code&gt;, &lt;code&gt;tracker&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large media files&lt;/td&gt;
&lt;td&gt;Large video, audio files&lt;/td&gt;
&lt;td&gt;Extensions like &lt;code&gt;.mp4&lt;/code&gt;, &lt;code&gt;.webm&lt;/code&gt;, &lt;code&gt;.mp3&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Font services&lt;/td&gt;
&lt;td&gt;Online font services&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;fonts.googleapis.com&lt;/code&gt;, &lt;code&gt;use.typekit.net&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;⭐ Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDN resources&lt;/td&gt;
&lt;td&gt;Static resource CDN&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;cdn.jsdelivr.net&lt;/code&gt;, &lt;code&gt;unpkg.com&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;⚠️ Caution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  URL Interception Example Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wss://browser.scrapeless.com/browser?token=your_api_key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scrapeWithUrlBlocking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;browserWSEndpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;defaultViewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Enable request interception&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestInterception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Define domains and URL patterns to block&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BLOCKED_DOMAINS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google-analytics.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;googletagmanager.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;doubleclick.net&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;facebook.net&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;twitter.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;linkedin.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adservice.google.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BLOCKED_PATHS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/ads/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/analytics/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/pixel/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tracking/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/stats/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="c1"&gt;// Intercept requests&lt;/span&gt;
    &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Check domain&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BLOCKED_DOMAINS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Blocked domain: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;...`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Check path&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BLOCKED_PATHS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Blocked path: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;...`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Allow other requests&lt;/span&gt;
        &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;domcontentloaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Extract data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="nf"&gt;scrapeWithUrlBlocking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.scrapeless.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping result:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Optimization Scheme 3: Simulate Mobile Devices
&lt;/h2&gt;

&lt;p&gt;Simulating mobile devices is another effective traffic optimization strategy because mobile websites usually provide lighter page content.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages of Mobile Device Simulation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Lighter page versions&lt;/strong&gt;: Many websites provide more concise content for mobile devices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller image resources&lt;/strong&gt;: Mobile versions usually load smaller images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified CSS and JavaScript&lt;/strong&gt;: Mobile versions usually use simplified styles and scripts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced ads and non-core content&lt;/strong&gt;: Mobile versions often remove some non-core functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Responsive layouts&lt;/strong&gt;: Obtain content layouts optimized for small screens&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Mobile Device Simulation Configuration
&lt;/h3&gt;

&lt;p&gt;Here is a configuration for a commonly used mobile device (iPhone X); other devices follow the same shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;iPhoneX&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;viewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;375&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;812&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;deviceScaleFactor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;isMobile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;hasTouch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;isLandscape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, use Puppeteer's built-in &lt;code&gt;KnownDevices&lt;/code&gt; registry to emulate a device directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;KnownDevices&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;iPhone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;KnownDevices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;iPhone 15 Pro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emulate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;iPhone&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mobile Device Simulation Example Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;KnownDevices&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wss://browser.scrapeless.com/browser?token=your_api_key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scrapeWithMobileEmulation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;browserWSEndpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;defaultViewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Set mobile device simulation&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;iPhone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;KnownDevices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;iPhone 15 Pro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emulate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;iPhone&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;domcontentloaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="c1"&gt;// Extract data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="nf"&gt;scrapeWithMobileEmulation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.scrapeless.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping result:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Comprehensive Optimization Example
&lt;/h2&gt;

&lt;p&gt;Here is a comprehensive example combining all optimization schemes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;KnownDevices&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wss://browser.scrapeless.com/browser?token=your_api_key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;optimizedScraping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Starting optimized scraping: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Record traffic usage&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;totalBytesUsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;browserWSEndpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;defaultViewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Set mobile device simulation&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;iPhone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;KnownDevices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;iPhone 15 Pro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;emulate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;iPhone&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Set request interception&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestInterception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Define resource types to block&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BLOCKED_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;media&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;font&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="c1"&gt;// Define domains to block&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BLOCKED_DOMAINS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google-analytics.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;googletagmanager.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;facebook.net&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;doubleclick.net&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adservice.google.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="c1"&gt;// Define URL paths to block&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BLOCKED_PATHS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/ads/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/analytics/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tracking/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="c1"&gt;// Intercept requests&lt;/span&gt;
    &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resourceType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resourceType&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// Check resource type&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BLOCKED_TYPES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resourceType&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Blocked resource type: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;resourceType&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; - &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;...`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Check domain&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BLOCKED_DOMAINS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Blocked domain: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;...`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Check path&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;BLOCKED_PATHS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Blocked path: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;...`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Allow other requests&lt;/span&gt;
        &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Monitor network traffic&lt;/span&gt;
    &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contentLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-length&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-length&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nx"&gt;totalBytesUsed&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;contentLength&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;domcontentloaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Simulate scrolling to trigger lazy-loading content&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scrollBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHeight&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;// Extract data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="na"&gt;links&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;href&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt;
            &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Output traffic usage statistics&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`\nTraffic Usage Statistics:`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Used: &lt;/span&gt;&lt;span class="p"&gt;${(&lt;/span&gt;&lt;span class="nx"&gt;totalBytesUsed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; MB`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="nf"&gt;optimizedScraping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.scrapeless.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping complete:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Optimization Comparison
&lt;/h3&gt;

&lt;p&gt;To compare traffic before and after optimization, we strip the optimizations out of the comprehensive example above. Here is the unoptimized version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wss://browser.scrapeless.com/browser?token=your_api_key&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;optimizedScraping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Starting optimized scraping: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Record traffic usage&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;totalBytesUsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;browserWSEndpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;scrapelessUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;defaultViewport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Set request interception&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setRequestInterception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Intercept requests&lt;/span&gt;
  &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;request&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Monitor network traffic&lt;/span&gt;
  &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;response&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contentLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-length&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-length&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;totalBytesUsed&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;contentLength&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;domcontentloaded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Simulate scrolling to trigger lazy-loading content&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scrollBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHeight&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

  &lt;span class="c1"&gt;// Extract data&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;links&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;href&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt;
      &lt;span class="p"&gt;}))&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Output traffic usage statistics&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`\nTraffic Usage Statistics:`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Used: &lt;/span&gt;&lt;span class="p"&gt;${(&lt;/span&gt;&lt;span class="nx"&gt;totalBytesUsed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; MB`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="nf"&gt;optimizedScraping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.scrapeless.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping complete:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Scraping failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running the unoptimized code, the printed statistics make the traffic difference immediately clear:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Traffic Used (MB)&lt;/th&gt;
&lt;th&gt;Saving Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unoptimized&lt;/td&gt;
&lt;td&gt;6.03&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Optimized&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.81&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≈ 86.6 %&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
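&lt;p&gt;The saving ratio in the table follows directly from the two measurements: (6.03 − 0.81) / 6.03 ≈ 86.6%. A quick sanity check:&lt;/p&gt;

```javascript
// Verify the saving ratio reported in the comparison table.
const unoptimizedMB = 6.03;
const optimizedMB = 0.81;
const savingRatio = ((unoptimizedMB - optimizedMB) / unoptimizedMB) * 100;
console.log(`${savingRatio.toFixed(1)} %`); // prints "86.6 %"
```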

&lt;p&gt;By combining the optimization techniques above, you can significantly reduce proxy traffic consumption and improve scraping efficiency while still capturing the core content you need.&lt;/p&gt;
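&lt;p&gt;A common source of such savings is aborting requests for heavy resource types instead of continuing every one (the unoptimized code above calls &lt;code&gt;request.continue()&lt;/code&gt; for all requests). Below is a minimal, self-contained sketch of that decision logic; the specific list of blocked types is an illustrative assumption and should be tuned per target site:&lt;/p&gt;

```javascript
// Resource types worth aborting to save proxy traffic.
// (Illustrative set; adjust per target site.)
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

// Returns true when a Puppeteer request of this resourceType should be aborted.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// In a page.on('request') handler:
//   shouldBlock(request.resourceType()) ? request.abort() : request.continue();
console.log(shouldBlock('image'));    // true: images are skipped
console.log(shouldBlock('document')); // false: the HTML itself is kept
```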

</description>
      <category>scraping</category>
      <category>puppeteer</category>
      <category>browser</category>
    </item>
    <item>
      <title>Scrapeless Scraping Browser - Browser Fingerprint Customization</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Thu, 24 Apr 2025 09:09:20 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/scrapeless-scraping-browser-browser-fingerprint-customization-340a</link>
      <guid>https://dev.to/datacollectionscraper/scrapeless-scraping-browser-browser-fingerprint-customization-340a</guid>
      <description>&lt;p&gt;Over the past three decades, browsers have consistently served as the primary gateway to the Internet. From early pioneers like Mosaic and Internet Explorer that transformed how people accessed the web, to today’s mainstream products led by Chrome, browsers have remained the core environment for information retrieval, task execution, and contextual interaction.&lt;/p&gt;

&lt;p&gt;With the rapid rise of artificial intelligence, the role of the browser is undergoing an unprecedented transformation. Whether it’s Opera Aria, Perplexity, or products currently incubated by OpenAI, a shared understanding is emerging: AI needs a browser of its own—a platform purpose-built for task execution and contextual understanding, rather than merely functioning as a plugin embedded in traditional browsers.&lt;/p&gt;

&lt;p&gt;From the perspective of AI integration, AI browser products can be roughly categorized into three types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Traditional browsers enhanced with AI&lt;/strong&gt;, typically in the form of copilot-style assistants, such as browser extensions for Microsoft Edge and Chrome.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Browsers with built-in AI capabilities&lt;/strong&gt; at the core level, enabling enhanced permissions and interactions—for instance, Arc Max for organizing tabs or Opera Aria for executing tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dedicated AI-native browsers&lt;/strong&gt;, which is the foundational vision behind Scrapeless. In this model, users interact with an AI that operates within a browser running in a virtual machine, providing a more complete and autonomous solution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;Scrapeless Scraping Browser&lt;/strong&gt; was born from this vision. Designed specifically for AI agents, it not only addresses the challenges of &lt;a href="https://www.scrapeless.com/en/blog/scrapeless-scraping-browser-for-ai?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=fingerprintcustomization" rel="noopener noreferrer"&gt;high-concurrency and task automation&lt;/a&gt; but also pushes the boundaries of AI execution capabilities. However, through real-world deployment, a critical limitation has become evident: despite having powerful control over commands and web pages, all advantages vanish if the system is flagged as bot traffic by the target website. This reveals a key technical bottleneck in the current generation of AI browsers—&lt;strong&gt;the authenticity and diversity of browser fingerprints&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In response, Scrapeless has significantly enhanced its fingerprint customization capabilities in its latest product update. By deeply customizing the Chromium engine, Scrapeless enables highly granular fingerprint strategies, ensuring that each virtual browser instance possesses uniquely &lt;strong&gt;“human-like”&lt;/strong&gt; characteristics. This drastically reduces the risk of being flagged by platform security systems. The upgrade not only improves the stability of AI operations in high-frequency tasks but also provides a safer and more reliable execution environment for future agent-based systems.&lt;/p&gt;

&lt;p&gt;In the following sections, we’ll take a deep dive into the technical details behind Scrapeless’ fingerprinting layer and explore how it is becoming a critical component in the infrastructure of the next generation of AI-native browsers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scrapeless Scraping Browser: Advantages and Core Features
&lt;/h2&gt;

&lt;p&gt;Scrapeless Scraping Browser is a future-oriented cloud-based browser solution specifically designed for AI agents and automated task execution. It integrates a high-performance concurrent processing architecture, advanced browser fingerprint customization, and intelligent anti-bot evasion logic to provide users with a stable, efficient, and scalable data interaction platform.&lt;/p&gt;

&lt;p&gt;Whether used in intelligent agent systems for executing large-scale web tasks, or in complex scenarios like multi-account marketing, dynamic content extraction, and public opinion monitoring, Scrapeless delivers a secure, stealthy, and intelligent environment simulation capability—effectively bypassing traditional anti-bot mechanisms and fingerprint detection limits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Technical Advantages
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Authentic Browser Environment
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Chromium Engine Support: Provides a fully functional browser environment to simulate real user behavior.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TLS Fingerprint Spoofing: Masks the TLS fingerprint so traffic bypasses conventional bot-detection systems and appears to come from a regular browser.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Dynamic Fingerprint Obfuscation: Randomly adjusts browser environment variables (e.g., User-Agent, Canvas, WebGL) to enhance human-like behavior and evade sophisticated anti-bot strategies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Cloud-Based Architecture and Scalability
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cloud Deployment: Fully cloud-based, requiring no local resources, and supports global distributed deployments.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High Concurrency Support: Scalable from dozens to unlimited concurrent sessions—ideal for large-scale scraping and complex automation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy Integration: Fully compatible with existing automation frameworks (e.g., Playwright and Puppeteer) with no code refactoring required.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Purpose-Built for AI Agents
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Automation Proxy Support: Offers powerful proxy capabilities to help AI agents execute complex browser automation tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Flexible Invocation: Supports multi-task parallel execution, making it an ideal tool for building intelligent agent systems and AI-driven applications.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Features
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Deep Customization of Browser Fingerprints
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.scrapeless.com/en/glossary/browser-fingerprint%EF%BC%9Fsutm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=fingerprintcustomization" rel="noopener noreferrer"&gt;Browser fingerprints&lt;/a&gt; are unique digital identifiers generated from browser and device configurations, often used to track user activity even without cookies. Scrapeless Scraping Browser allows full customization of these fingerprints—supporting adjustments to User-Agent, timezone, language, screen resolution, and other key parameters—to enhance multi-account management, data collection, and privacy protection.&lt;/p&gt;

&lt;p&gt;By enabling controlled adjustments to standardized parameters exposed by the browser, Scrapeless helps users construct highly “authentic” browsing environments. Below are the main fingerprint customization features currently supported:&lt;/p&gt;

&lt;h5&gt;
  
  
  User-Agent Control
&lt;/h5&gt;

&lt;p&gt;Allows custom User-Agent strings in HTTP request headers to simulate specific browser versions, operating systems, and device environments—enhancing stealth and compatibility.&lt;/p&gt;

&lt;h5&gt;
  
  
  Screen Resolution Mapping
&lt;/h5&gt;

&lt;p&gt;Permits custom values for screen.width and screen.height to emulate common device display dimensions, supporting responsive rendering and resisting device fingerprinting strategies.&lt;/p&gt;

&lt;h5&gt;
  
  
  Platform Property Locking
&lt;/h5&gt;

&lt;p&gt;Enables customization of navigator.platform return values to simulate standard platform types (e.g., Windows, macOS, Linux), influencing how websites adapt to different OS environments.&lt;/p&gt;

&lt;h5&gt;
  
  
  Localization Environment Simulation
&lt;/h5&gt;

&lt;p&gt;Fully supports customization of browser localization settings, affecting website content localization, time format rendering, and language preference inference. Supported parameters include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;localization.timezone&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sets the timezone identifier (compliant with IANA format, e.g., &lt;code&gt;Asia/Shanghai&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;localization.locale&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sets the language and region (compliant with BCP 47 format, e.g., &lt;code&gt;zh-CN&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;localization.languages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defines the language priority list, mapped to &lt;code&gt;navigator.languages&lt;/code&gt; and the &lt;code&gt;Accept-Language&lt;/code&gt; HTTP header&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
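&lt;p&gt;These localization parameters matter because page scripts can read them directly through standard JavaScript APIs. The snippet below is plain Node/browser JavaScript, independent of Scrapeless, showing how locale and timezone change what a fingerprinting script observes:&lt;/p&gt;

```javascript
// Locale and timezone surface through Intl, which fingerprinting scripts inspect.
const zh = new Intl.DateTimeFormat('zh-CN', { timeZone: 'Asia/Shanghai', dateStyle: 'full' });
const en = new Intl.DateTimeFormat('en-US', { timeZone: 'America/New_York', dateStyle: 'full' });

const epoch = new Date(0); // 1970-01-01T00:00:00Z
console.log(zh.format(epoch)); // Chinese wording, Shanghai local date (Jan 1, 1970)
console.log(en.format(epoch)); // English wording, New York local date (Dec 31, 1969)
```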

&lt;p&gt;More advanced fingerprint customization (such as Canvas, WebGL, and font detection) is under active development. In the future, Scrapeless will support even finer-grained environment simulation capabilities—stay tuned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Detailed Explanation of Scrapeless Scraping Browser Fingerprint Parameters&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;userAgent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Defines the User-Agent string in the browser's HTTP request header, which includes browser engine, version, OS, and other key identifiers. Websites use this for client environment detection, affecting content adaptation and feature availability. &lt;strong&gt;Default:&lt;/strong&gt; Follow browser&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;platform&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;enum&lt;/td&gt;
&lt;td&gt;Specifies the return value of the JavaScript &lt;code&gt;navigator.platform&lt;/code&gt; property, indicating the OS type of the runtime environment. Optional values: &lt;code&gt;"Windows"&lt;/code&gt;, &lt;code&gt;"macOS"&lt;/code&gt;, &lt;code&gt;"Linux"&lt;/code&gt;. This is used for feature detection and enabling OS-specific behaviors. &lt;strong&gt;Default:&lt;/strong&gt; Windows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;screen&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;td&gt;Defines the physical display characteristics reported by the browser, directly mapped to JavaScript's &lt;code&gt;window.screen&lt;/code&gt; object.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;screen.width&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;td&gt;Physical screen width (in pixels), mapped to &lt;code&gt;screen.width&lt;/code&gt;, affects media queries and responsive layouts. &lt;strong&gt;Default:&lt;/strong&gt; Randomized with fingerprint, minimum 640&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;screen.height&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;td&gt;Physical screen height (in pixels), mapped to &lt;code&gt;screen.height&lt;/code&gt;, together with width defines resolution. &lt;strong&gt;Default:&lt;/strong&gt; Randomized with fingerprint, minimum 480&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;localization&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;object&lt;/td&gt;
&lt;td&gt;Controls the browser’s localization settings, including language, region, and timezone. These settings influence formatting and content localization.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;localization.timezone&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Timezone identifier compliant with the IANA database (e.g., &lt;code&gt;"Asia/Shanghai"&lt;/code&gt;), controls JavaScript date object behavior and &lt;code&gt;Intl.DateTimeFormat&lt;/code&gt; output. A key part of timezone fingerprinting. &lt;strong&gt;Default:&lt;/strong&gt; America/New_York&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;localization.languages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;[string]&lt;/td&gt;
&lt;td&gt;A prioritized list of supported languages, mapped to &lt;code&gt;navigator.languages&lt;/code&gt; and HTTP &lt;code&gt;Accept-Language&lt;/code&gt; header, influencing site language selection. &lt;strong&gt;Default:&lt;/strong&gt; &lt;code&gt;"en"&lt;/code&gt;, &lt;code&gt;"en-US"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  2. CAPTCHA Solving Capabilities
&lt;/h4&gt;

&lt;p&gt;Scraping Browser features an advanced CAPTCHA solving solution that can automatically handle most mainstream CAPTCHA types, including reCAPTCHA and Cloudflare Turnstile.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Industry-Leading Success Rate:&lt;/strong&gt; Scrapeless delivers highly effective CAPTCHA solving with a success rate exceeding 98%.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Extra Cost:&lt;/strong&gt; While most competitors charge additional fees for CAPTCHA-solving features, Scrapeless includes this functionality as part of its core service—no extra charges required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-Time Processing:&lt;/strong&gt; The CAPTCHA solving engine in Scrapeless operates with millisecond-level response times, ensuring smooth task execution.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Flexible and Controllable Proxy Integration System
&lt;/h4&gt;

&lt;p&gt;Scraping Browser comes with a highly configurable proxy support system, allowing for fine-grained routing and traffic management in automated workflows.&lt;/p&gt;

&lt;h5&gt;
  
  
  3.1 Built-in Residential Proxies
&lt;/h5&gt;

&lt;p&gt;With Scrapeless’s built-in, managed residential proxy network, you can instantly route traffic across the globe—perfect for bypassing geo-restrictions and anti-bot measures.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No configuration required – ready to use out of the box&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Supports geolocation-based proxies in 195 countries and regions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stable, high-anonymity proxies suitable for large-scale automation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy to test and deploy via the built-in Playground&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  3.2 Bring Your Own Proxies
&lt;/h5&gt;

&lt;p&gt;If you have your own proxy service or prefer a specific provider, Scrapeless offers flexible proxy integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Assign proxies directly to tasks by specifying parameters during session creation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using your own proxies will not count towards Scrapeless’s proxy usage billing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
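&lt;p&gt;Passing your own proxy at session creation can be sketched in the same style as the connection-string examples elsewhere in this post. Note that &lt;code&gt;proxy_url&lt;/code&gt; below is a hypothetical parameter name used only for illustration; consult the Scrapeless documentation for the exact field:&lt;/p&gt;

```javascript
// Sketch: attaching a self-supplied proxy when creating a session.
// `proxy_url` is a hypothetical field name for illustration only.
const query = new URLSearchParams({
  token: 'your_api_key', // required
  session_ttl: '180',
  proxy_url: 'http://user:pass@proxy.example.com:8080', // your own proxy (not billed by Scrapeless)
});
const connectionURL = `wss://browser.scrapeless.com/browser?${query.toString()}`;
console.log(connectionURL);
```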

&lt;h4&gt;
  
  
  4. Toolkit Support
&lt;/h4&gt;

&lt;p&gt;Comprehensive Automation Tool Compatibility: Scrapeless supports popular browser automation tools like Puppeteer and Playwright, making it easy for developers to integrate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI Integration Capabilities:&lt;/strong&gt; Scrapeless is planning deep integrations with tools like Browser Use, Computer Use, and LangChain. Future updates will further unlock the potential of large language models in dynamic web interactions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ease of Use:&lt;/strong&gt; Comes with detailed documentation and example code to help users get started quickly.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. Concurrency Support
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible Concurrency Options:&lt;/strong&gt; Scrapeless supports anywhere from 50 to unlimited concurrent sessions, scalable from small tasks to large-scale automation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No Extra Concurrency Fees:&lt;/strong&gt; While competitors often charge for high-concurrency use cases, Scrapeless offers a transparent and flexible pricing model with no hidden costs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Scrapeless Scraping Browser Fingerprint Parameters Example Code
&lt;/h2&gt;

&lt;p&gt;The following is a simple example code showing how to integrate Scrapeless's browser fingerprint customization function through Puppeteer and Playwright:&lt;/p&gt;

&lt;h3&gt;
  
  
  Puppeteer Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer-core');

// custom browser fingerprint
const fingerprint = {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.1.2.3 Safari/537.36',
    platform: 'Windows',
    screen: {
        width: 1280, height: 1024
    },
    localization: {
        languages: ['zh-HK', 'en-US', 'en'], timezone: 'Asia/Hong_Kong',
    }
}

const query = new URLSearchParams({
  token: 'APIKey', // required
  session_ttl: 180,
  proxy_country: 'ANY',
  fingerprint: encodeURIComponent(JSON.stringify(fingerprint)),
});

const connectionURL = `wss://browser.scrapeless.com/browser?${query.toString()}`;

(async () =&amp;gt; {
    const browser = await puppeteer.connect({browserWSEndpoint: connectionURL});
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    const info = await page.evaluate(() =&amp;gt; {
        return {
            screen: {
                width: screen.width,
                height: screen.height,
            },
            userAgent: navigator.userAgent,
            timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
            languages: navigator.languages
        };
    });
    console.log(info);
    await browser.close();
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Playwright Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const { chromium } = require('playwright-core');

// custom browser fingerprint
const fingerprint = {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.1.2.3 Safari/537.36',
    platform: 'Windows',
    screen: {
        width: 1280, height: 1024
    },
    localization: {
        languages: ['zh-HK', 'en-US', 'en'], timezone: 'Asia/Hong_Kong',
    }
}

const query = new URLSearchParams({
  token: 'APIKey', // required
  session_ttl: 180,
  proxy_country: 'ANY',
  fingerprint: encodeURIComponent(JSON.stringify(fingerprint)),
});

const connectionURL = `wss://browser.scrapeless.com/browser?${query.toString()}`;

(async () =&amp;gt; {
    const browser = await chromium.connectOverCDP(connectionURL);
    const page = await browser.newPage();
    await page.goto('https://www.scrapeless.com');
    const info = await page.evaluate(() =&amp;gt; {
        return {
            screen: {
                width: screen.width,
                height: screen.height,
            },
            userAgent: navigator.userAgent,
            timeZone: Intl.DateTimeFormat().resolvedOptions().timeZone,
            languages: navigator.languages
        };
    });
    console.log(info);
    await browser.close();
})();


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Applicable Scenarios for Scrapeless Scraping Browser Fingerprint Customization
&lt;/h2&gt;

&lt;p&gt;The fingerprint customization feature of Scrapeless Scraping Browser is suitable for a variety of use cases, including but not limited to the following:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Basic Multi-Account Isolation and Risk Control
&lt;/h3&gt;

&lt;p&gt;For users who manage multiple accounts—such as those in cross-border e-commerce or social media marketing—Scrapeless enables flexible configuration of browser fingerprint parameters like User-Agent, screen resolution, timezone, and language preferences. This helps avoid environmental overlap between accounts, significantly reducing the risk of platform detection and account linkage.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Typical Applications:&lt;/strong&gt; Account environment isolation on platforms like Shopify, Facebook, and Google Ads.&lt;/p&gt;
&lt;/blockquote&gt;
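The isolation described above boils down to never reusing an environment across accounts. A minimal sketch, assuming the fingerprint fields shown in this article's examples (userAgent, platform, screen, localization), of keeping one profile per account:

```python
import json

# One fingerprint profile per managed account, so no two accounts share
# an identical environment. Field names follow the fingerprint examples
# in this article; the profile values here are illustrative.
ACCOUNT_PROFILES = {
    "shop-us": {
        "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
        "platform": "Windows",
        "screen": {"width": 1920, "height": 1080},
        "localization": {"languages": ["en-US", "en"], "timezone": "America/New_York"},
    },
    "shop-hk": {
        "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
        "platform": "macOS",
        "screen": {"width": 1440, "height": 900},
        "localization": {"languages": ["zh-HK", "en"], "timezone": "Asia/Hong_Kong"},
    },
}

def fingerprint_param(account_id):
    """Serialize the per-account fingerprint for the connection URL."""
    return json.dumps(ACCOUNT_PROFILES[account_id])

print(fingerprint_param("shop-hk"))
```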

&lt;h3&gt;
  
  
  2. Lightweight Data Collection and Anti-Bot Evasion
&lt;/h3&gt;

&lt;p&gt;When performing web scraping tasks, Scrapeless Scraping Browser helps users disguise their automation as "real user" traffic rather than bot activity. By simulating mainstream device configurations (e.g., Windows 10 + Chrome 114 + 1080p monitor) and fine-tuning fingerprint details, users can effectively bypass basic anti-bot mechanisms of target websites, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;User-Agent blacklists&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without the need for complex scripts or large-scale IP pool scheduling, users can achieve fast and stable data collection.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Typical Applications:&lt;/strong&gt; Price monitoring, public opinion tracking, product comparison, SEO data scraping.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. Compatibility Testing
&lt;/h3&gt;

&lt;p&gt;Frontend developers and QA engineers can use Scrapeless to quickly switch between different operating systems (e.g., Windows/macOS), screen sizes, and other parameters to simulate diverse access environments. This allows for testing rendering behavior and functional integrity across multiple configurations.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Typical Applications:&lt;/strong&gt; A/B testing for ad campaigns, responsive UI validation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Ethical Statement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We advocate responsible fingerprint customization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use it only in legally authorized scenarios (such as corporate data compliance collection and internal risk-control testing).&lt;/li&gt;
&lt;li&gt;Do not use forged fingerprints to commit online fraud or infringe on user privacy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future Roadmap of Scrapeless Scraping Browser
&lt;/h2&gt;

&lt;p&gt;Looking ahead, &lt;a href="https://www.scrapeless.com/en/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=fingerprintcustomization" rel="noopener noreferrer"&gt;Scrapeless Scraping Browser&lt;/a&gt; will continue to optimize its core functionalities to meet a wide range of needs—from basic data scraping to advanced AI-driven automation. Our goal is to provide users with even more powerful tools and seamless experiences. The following are our key development directions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Debugging and Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Live Preview: Real-time view within the Playground to facilitate debugging and task takeover.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session Management: Support for session replay, inspector tools, and metadata querying to enhance task monitoring and control.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. File Handling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Upload: Easily upload files to target websites using Playwright, Puppeteer, or Selenium.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download: Downloaded files are automatically stored in the cloud, with Unix timestamps appended to filenames (e.g., sample-1719265797164.pdf) to avoid conflicts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrieval: Quickly access downloaded files via API—ideal for data extraction and report generation scenarios.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
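The timestamped naming scheme described above is easy to mirror locally; a small sketch that appends a millisecond Unix timestamp before the extension:

```python
import os
import time

def timestamped_name(filename, ts_ms=None):
    """Append a millisecond Unix timestamp before the extension,
    e.g. sample.pdf -> sample-1719265797164.pdf."""
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    stem, ext = os.path.splitext(filename)
    return f"{stem}-{ts_ms}{ext}"

print(timestamped_name("sample.pdf", ts_ms=1719265797164))
```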

&lt;h3&gt;
  
  
  3. Context API &amp;amp; Extension Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Context API: Enables session persistence to optimize login flows and multi-step automation scenarios.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extension Support: Enhance browser sessions with your own Chrome extensions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Metadata Query
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use custom tags and metadata queries to filter and locate specific sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. SDK and API Enhancements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Session API: Offers robust session management capabilities to simplify workflow operations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;CDP Event Enhancements: Broaden support for Chrome DevTools Protocol (CDP) features, including retrieving page HTML, clicking elements, scrolling, and capturing screenshots.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the previous sections, we discussed the various challenges that current browser automation tools face when supporting AI-driven automation tasks. These issues significantly impact developers' productivity and the feasibility of tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High Concurrency Bottleneck:&lt;/strong&gt; Traditional browsers often struggle under heavy parallel requests, leading to frequent task failures. In high concurrency scenarios, they cannot effectively support AI-driven automation tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Easily Detected by Anti-Scraping Mechanisms:&lt;/strong&gt; Traditional browsers exhibit predictable behaviors and lack human-like intelligent behavior simulation, making it easy for websites' anti-scraping systems to detect and block them, preventing them from bypassing these protections.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High Costs&lt;/strong&gt;: In large-scale tasks, traditional browsers consume significant resources and incur high operational costs, limiting task scale and frequency, thereby reducing efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complex Integration and Learning Curve:&lt;/strong&gt; Integrating traditional browsers for automation tasks typically requires complex configurations and coding, increasing the learning difficulty for developers and reducing development efficiency.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address these issues, Scrapeless Scraping Browser has redefined the concept of the "browser for AI," aiming to provide a more efficient, intelligent, and cost-effective solution for AI-driven automation tasks. Below are the key innovations we have already implemented:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Breaking the High Concurrency Bottleneck:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Elastic Scaling:&lt;/strong&gt; With an innovative cloud architecture, Scrapeless has achieved seamless scaling from fifty to unlimited concurrent sessions, greatly improving throughput and ensuring task stability and efficiency. Even in high concurrency scenarios, tasks can be executed smoothly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Human-like Behavior and Fingerprint Customization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full-Stack Human Protection:&lt;/strong&gt; Scrapeless deeply customizes the browser engine to simulate real user browsing behaviors, bypassing anti-scraping detection mechanisms. This upgrade particularly enhances fingerprint customization features, allowing developers to fine-tune browser fingerprint attributes, including but not limited to User-Agent, screen resolution, etc., further enhancing the browser's stealth and flexibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Significantly Reducing Costs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unmatched Cost Efficiency:&lt;/strong&gt; Compared to other solutions, Scrapeless offers a &lt;strong&gt;60%-80%&lt;/strong&gt; cost reduction while ensuring compatibility with tools like Playwright and Puppeteer, enabling developers to automate large-scale tasks at a lower cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Simplified Integration and Usability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility and Ease of Use:&lt;/strong&gt; Scrapeless lowers the development threshold, reducing integration complexity and allowing developers to quickly get started without facing a steep learning curve. With intuitive APIs and interfaces, Scrapeless makes browser automation simpler and more efficient.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While we have made significant progress, Scrapeless continues to evolve. Future versions will include more intelligent features, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;More precise fingerprint spoofing and behavior simulation;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session Replay Debug and extended support;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SDK and API support;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deep integration with the Browser Use framework, offering powerful LLM crawling capabilities, full-site extraction, and deep research capabilities to further enhance the efficiency and accuracy of automated data scraping and deep research.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scrapeless Scraping Browser, as the "browser for AI," not only addresses key current issues but is also continuously improving to meet future challenges. We invite developers and teams to join us on this innovative journey, share your needs and suggestions, and work together to drive browser automation technology into a smarter and more efficient new era.&lt;/p&gt;

&lt;h2&gt;
  
  
  About Scrapeless
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.scrapeless.com/en?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=fingerprintcustomization" rel="noopener noreferrer"&gt;Scrapeless official website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/Np4CAHxB9a?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=fingerprintcustomization" rel="noopener noreferrer"&gt;Scrapeless Discord&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://apidocs.scrapeless.com/?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=fingerprintcustomization" rel="noopener noreferrer"&gt;Scrapeless API documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>scraping</category>
      <category>browser</category>
      <category>scrapingbrowser</category>
    </item>
    <item>
      <title>Scrape Naver Smart Store Data in 10 Lines of Code – From API Call to Structured Output</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Tue, 22 Apr 2025 07:51:41 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/yong-10-xing-dai-ma-pa-qu-naver-zhi-neng-shang-dian-shu-ju-cong-api-diao-yong-dao-jie-gou-hua-shu-chu-kn2</link>
      <guid>https://dev.to/datacollectionscraper/yong-10-xing-dai-ma-pa-qu-naver-zhi-neng-shang-dian-shu-ju-cong-api-diao-yong-dao-jie-gou-hua-shu-chu-kn2</guid>
      <description>&lt;p&gt;In today's data-driven era, extracting valuable insights from e-commerce platforms like Naver Smart Store can give businesses a competitive edge. Whether you are analyzing product trends, monitoring competitors, or optimizing pricing strategies, scraping data efficiently is key. This article shows you how to scrape Naver Smart Store data with Scrapeless, a powerful and developer-friendly tool, in just 10 lines of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Scrape Naver Smart Store?
&lt;/h2&gt;

&lt;p&gt;Naver Smart Store is one of South Korea's largest online shopping platforms, hosting millions of products across categories. Extracting data from it can help businesses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gain insight into market trends and consumer preferences.&lt;/li&gt;
&lt;li&gt;Monitor competitors' pricing and product performance.&lt;/li&gt;
&lt;li&gt;Identify emerging product categories and customer sentiment.&lt;/li&gt;
&lt;li&gt;Automate inventory tracking and sales analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, collecting this data manually is time-consuming and inefficient. That's where Scrapeless comes in: a cutting-edge scraping tool designed for simplicity, scalability, and reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Scrape Naver Smart Store: Traditional Methods vs. Modern Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  (1) Traditional Web Scraping
&lt;/h3&gt;

&lt;p&gt;The traditional approach involves writing custom scripts with tools like BeautifulSoup, Selenium, or Playwright. While these tools are powerful, they come with notable drawbacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High maintenance cost: scripts need frequent updates to keep up with website changes.&lt;/li&gt;
&lt;li&gt;Anti-scraping hurdles: CAPTCHA solving, IP rotation, and TLS fingerprinting must be implemented manually.&lt;/li&gt;
&lt;li&gt;Limited scalability: scaling to thousands of requests demands substantial resources.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  (2) Modern API-Based Solutions
&lt;/h3&gt;

&lt;p&gt;Modern solutions such as the Scrapeless Naver Scraping API eliminate many of the challenges of traditional scraping. The Scrapeless API offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A robust built-in infrastructure with unlocking capabilities, so a simple API call gets you structured data at scale.&lt;/li&gt;
&lt;li&gt;Fast conversion of raw HTML into structured data formats such as JSON or CSV files.&lt;/li&gt;
&lt;li&gt;Ease of use, streamlining structured data extraction with minimal setup.&lt;/li&gt;
&lt;li&gt;Full compatibility with mainstream programming languages and tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Scrapeless Simplifies the Process
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Scrapeless advocates the lawful and compliant scraping of publicly available data. Make sure the information you collect is used only for legitimate purposes and avoid any form of for-profit misuse. Strictly follow the applicable laws, regulations, and scraping rules to maintain a healthy data ecosystem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Scrapeless provides an intuitive API that handles complex scraping tasks behind the scenes. With features like smart IP rotation, CAPTCHA bypass, and real-time data extraction, it ensures a high success rate while minimizing the risk of being blocked. Let's see how to scrape Naver Smart Store with Scrapeless in just 10 lines of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-Step Guide: Scraping Naver Smart Store Data with Scrapeless
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Set Up Your Scrapeless Account
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=devto&amp;amp;utm_campaign=scrapenavercoupon" rel="noopener noreferrer"&gt;注册&lt;/a&gt;一个Scrapeless免费账户&lt;/li&gt;
&lt;li&gt;从仪表板获取您的 API 密钥。此密钥将用于验证您的请求&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oqofi1baifbm158ujf0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oqofi1baifbm158ujf0.png" alt="获取api密钥" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Select Naver and Enter the Scrapeless Dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2uwey8h4otwpm6yi44n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2uwey8h4otwpm6yi44n.png" alt="进入Scrapeless仪表板界面" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Set the Scraping Parameters
&lt;/h3&gt;

&lt;p&gt;The product ID and store ID can be found directly in the product URL. Take &lt;a href="https://brand.naver.com/barudak/products/4469033180?NaPm=ct%3Dm9mo5x4g%7Cci%3D800b828f830f1d3d81df0575f6009efc9235fd9a%7Ctr%3Dnshsnx%7Csn%3D727239%7Cic%3D%7Chk%3De39ed35e26996b18c35ced568d18f83bc39fdf94" rel="noopener noreferrer"&gt;[바르닭] 닭가슴살 143종 크런치 소품닭 닭스테이크 소스큐브 골라담기 [원산지:국산(경기도 포천시) 등]&lt;/a&gt; as an example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Store ID: barudak&lt;/p&gt;

&lt;p&gt;Product ID: 4469033180&lt;/p&gt;
&lt;/blockquote&gt;
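Both identifiers sit directly in the URL path, so they can also be pulled out programmatically; a minimal sketch:

```python
from urllib.parse import urlparse

def parse_naver_product_url(url):
    """Extract (store_id, product_id) from a brand.naver.com product URL.

    Expected path shape: /<store_id>/products/<product_id>
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) >= 3 and parts[1] == "products":
        return parts[0], parts[2]
    raise ValueError(f"unrecognized Naver product URL: {url}")

store_id, product_id = parse_naver_product_url(
    "https://brand.naver.com/barudak/products/4469033180?NaPm=ct%3Dm9mo5x4g"
)
print(store_id, product_id)  # barudak 4469033180
```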

&lt;h3&gt;
  
  
  Step 4: Scrape Basic Product Information
&lt;/h3&gt;

&lt;p&gt;After setting the required scraping parameters, click "Start Scraping" and the results will appear on the right.&lt;/p&gt;

&lt;p&gt;Here are some sample scrape results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"additionalAttributes": {"A/S 안내": ["********","********"],"영수증발급": "신용카드전표, 현금영수증발급"},"adultAuthorizationType": "NOT_LOGIN","afterServiceInfo": {"afterServiceGuideContent": "********","afterServiceTelephoneNumber": "********"},"arrivalGuarantee": false,"authenticationType": "NORMAL","authorizationDisplay": "NORMAL","averageDeliveryLeadTime": {"productAverageDeliveryLeadTime": 1.6511627,"sellerAverageDeliveryLeadTime": 1.6331967},"benefitsPolicy": {"givePresent": true,"managerBankbookAccumulatePolicyNo": 12306300388384,"managerBankbookAccumulateValue": 0.5,"managerBankbookAccumulateValueUnit": "PERCENT","managerMaxBankbookAccumulateAmount": 10000,"managerMaxPaymoneyAccumulateAmount": 30000,"managerMaxPurchasePointAmount": 100000,"managerPaymoneyAccumulatePolicyNo": 439583905,"managerPaymoneyAccumulateValue": 1.5,"managerPaymoneyAccumulateValueUnit": "PERCENT","managerPurchasePointPolicyNo": 10511031105304,"managerPurchasePointValue": 1,"managerPurchasePointValueUnit": "PERCENT","sellerImmediateDiscountPolicyNo": "SE_4460099867","sellerImmediateDiscountValue": 1220,"sellerImmediateDiscountValueUnit": "WON"},"benefitsView": {"afterUsePhotoVideoReviewPoint": 0,"afterUseTextReviewPoint": 0,"discountedRatio": 55,"discountedSalePrice": 990,"generalPurchaseReviewPoint": 0,"givePresent": true,"managerAfterUsePhotoVideoReviewPoint": 0,"managerAfterUseTextReviewPoint": 0,"managerArrivalGuaranteePoint": 0,"managerBankbookAccumulatePoint": 4,"managerGeneralPurchaseReviewPoint": 50,"managerImmediateDiscountAmount": 0,"managerMembershipArrivalGuaranteePoint": 0,"managerPaymoneyAccumulatePoint": 14,"managerPhotoVideoReviewPoint": 150,"managerPremiumPurchaseReviewPoint": 150,"managerPurchaseExtraPoint": 0,"managerPurchasePoint": 9,"managerTextReviewPoint": 50,"mobileDiscountedRatio": 55,"mobileDiscountedSalePrice": 990,"mobileManagerArrivalGuaranteePoint": 0,"mobileManagerBankbookAccumulatePoint": 
4,"mobileManagerImmediateDiscountAmount": 0,"mobileManagerMembershipArrivalGuaranteePoint": 0,"mobileManagerPaymoneyAccumulatePoint": 14,"mobileManagerPurchaseExtraPoint": 0,"mobileManagerPurchasePoint": 9,"mobileSellerCustomerManagementPoint": 0,"mobileSellerImmediateDiscountAmount": 1220,"mobileSellerPurchasePoint": 0,"photoVideoReviewPoint": 0,"premiumPurchaseReviewPoint": 0,"sellerCustomerManagementPoint": 0,"sellerImmediateDiscountAmount": 1220,"sellerPurchasePoint": 0,"specialDiscountAmount": {},"storeMemberReviewPoint": 0,"textReviewPoint": 0},"best": false,"cardPromotions": [],"category": {"category1Id": "50000006","category1Name": "식품","category2Id": "50000145","category2Name": "축산물","category3Id": "50001172","category3Name": "닭고기","category4Id": "50013800","category4Name": "닭가슴살","categoryId": "50013800","categoryLevel": 4,"categoryName": "닭가슴살","exceptionalCategoryTypes": ["FREE_RETURN_INSURANCE","ORIGINAREA_PRODUCTS","REGULAR_SUBSCRIPTION","REVIEW_UNEXPOSE","GROUP_PRODUCT_MAX"],
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
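Once a result like the above comes back, the useful fields can be picked out directly. A sketch using a trimmed stand-in with the same field names as the sample (benefitsView, category):

```python
import json

# A trimmed stand-in for the scrape result shown above (same field names).
raw = """
{
  "benefitsView": {"discountedSalePrice": 990, "discountedRatio": 55},
  "category": {
    "wholeCategoryId": "50000006>50000145>50001172>50013800",
    "categoryName": "닭가슴살"
  }
}
"""

def summarize(result):
    """Pull price and category info out of a Naver product scrape result."""
    benefits = result.get("benefitsView", {})
    category = result.get("category", {})
    return {
        "sale_price": benefits.get("discountedSalePrice"),
        "discount_ratio": benefits.get("discountedRatio"),
        "category": category.get("categoryName"),
        "category_path": category.get("wholeCategoryId", "").split(">"),
    }

summary = summarize(json.loads(raw))
print(summary)
```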



&lt;h3&gt;
  
  
  Step 5: Scrape Naver Product Coupon Information
&lt;/h3&gt;

&lt;p&gt;From the scrape results above, we can see the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"productNo": "4460099867"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can also find other product-related unique identifiers, such as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"productId": "10217226674"&lt;/p&gt;

&lt;p&gt;categoryId: 50013800 corresponds to the category 닭가슴살&lt;/p&gt;

&lt;p&gt;"wholeCategoryId": "50000006&amp;gt;50000145&amp;gt;50001172&amp;gt;50013800",&lt;/p&gt;

&lt;p&gt;"channelUid": "2sWDx0OygJl5sQcE9f6rD"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once the scraping parameters are set, you can scrape the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fbxeymnvi3be0329qv8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fbxeymnvi3be0329qv8.png" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the Scrapeless Naver Scraping API to fetch the coupon data. Here is a sample Python request:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Just replace the token field with your own API key.&lt;/p&gt;
&lt;/blockquote&gt;
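A hedged sketch of such a request using only the Python standard library. The endpoint URL, header name, and payload fields here are illustrative assumptions rather than the documented API; consult the Scrapeless API docs for the real schema, and substitute your API key for the token.

```python
import json
import urllib.request

API_TOKEN = "YOUR_API_KEY"  # replace with your Scrapeless API key

# NOTE: the endpoint URL, header name, and payload fields below are
# illustrative assumptions; check the Scrapeless API docs for the real schema.
API_URL = "https://api.scrapeless.com/api/v1/scraper/request"

payload = {
    "actor": "scraper.naver",  # hypothetical scraper identifier
    "input": {"storeId": "barudak", "productId": "4469033180"},
}

def build_request():
    """Prepare (but do not send) the coupon-scrape HTTP request."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        headers={"Content-Type": "application/json", "x-api-token": API_TOKEN},
        method="POST",
    )

req = build_request()
print(req.full_url, req.get_method())
```

Sending it is then a single `urllib.request.urlopen(req)` call once the real endpoint and token are in place.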

&lt;h2&gt;
  
  
  How to Bypass Naver Shop's Anti-Bot Measures
&lt;/h2&gt;

&lt;p&gt;Scrapeless provides a premium global clean-IP proxy service focused on dynamic residential IPv4 proxies. With more than 70 million IP addresses across 195 countries and regions, the Scrapeless residential proxy network delivers comprehensive global proxy coverage to support your business growth.&lt;/p&gt;

&lt;p&gt;Steps to obtain a proxy:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Log In
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=csdn&amp;amp;utm_campaign=scrapenavercoupon" rel="noopener noreferrer"&gt;Log in&lt;/a&gt; to Scrapeless.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Click "Proxies" and Create a Channel
&lt;/h3&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uyjlp2zkd2azftx132r.png" alt="Click Proxies and create a channel" width="800" height="304"&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Get the Code
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click "Start", fill in the information you need in the panel, then click "Generate". After a moment you will see the rotating proxy generated for you on the right. Click "Copy" to use it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4syoi1fwx6gf8jiezkd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4syoi1fwx6gf8jiezkd.png" alt="获取代码" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alternatively, you can integrate our proxy code into your own project:&lt;/p&gt;

&lt;p&gt;Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl --proxy host:port --proxy-user username:password API_URL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Browser:&lt;/p&gt;

&lt;p&gt;Selenium&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from seleniumbase import Driver

proxy = 'username:password@gw-us.scrapeless.com:8789'
driver = Driver(browser="chrome", headless=False, proxy=proxy)
driver.get("API_URL")
driver.quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Puppeteer&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer');

(async () =&amp;gt; {
  const proxyUrl = 'http://gw-us.scrapeless.com:8789';
  const username = 'username';
  const password = 'password';
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyUrl}`],
    headless: false
  });
  const page = await browser.newPage();
  await page.authenticate({ username, password });
  await page.goto('API_URL');
  await browser.close();
})();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
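The same gateway also works from plain Python. A sketch using the standard library, with credentials in the username:password@gw-us.scrapeless.com:8789 format shown above:

```python
import urllib.request

# Credentials and gateway follow the proxy string format shown above:
# username:password@gw-us.scrapeless.com:8789
PROXY = "http://username:password@gw-us.scrapeless.com:8789"

def make_opener(proxy=PROXY):
    """Build a urllib opener that routes HTTP(S) traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)

opener = make_opener()
# opener.open("https://example.com")  # would go through the rotating proxy
print(type(opener).__name__)
```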



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Scraping Naver Smart Store data is no small task. With Scrapeless, you can extract valuable data in just 10 lines of code, saving time and effort. Whether you are a developer, an analyst, or a business owner, Scrapeless lets you focus on gaining insights instead of wrestling with technical hurdles.&lt;/p&gt;

&lt;p&gt;Ready to get started? &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=devto&amp;amp;utm_campaign=scrapenavercoupon" rel="noopener noreferrer"&gt;Sign in&lt;/a&gt; now to get the tools you need and unlock the full potential of e-commerce data!&lt;/p&gt;

&lt;h2&gt;
  
  
  More About Scrapeless
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://www.scrapeless.com/en?utm_source=official&amp;amp;utm_medium=devto&amp;amp;utm_campaign=scrapenavershopcoupon" rel="noopener noreferrer"&gt;官方网站&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://discord.gg/Np4CAHxB9a?utm_source=official&amp;amp;utm_medium=devto&amp;amp;utm_campaign=scrapenavershopcoupon" rel="noopener noreferrer"&gt;Discord社区&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=devto&amp;amp;utm_campaign=scrapenavershopcoupon" rel="noopener noreferrer"&gt;Scrapeless仪表盘&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>scraping</category>
      <category>webscraping</category>
      <category>scrapingtool</category>
      <category>python</category>
    </item>
    <item>
      <title>Scrapeless Scraping Browser: A High-Concurrency AI Automation Solution</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Fri, 18 Apr 2025 13:34:15 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/wu-gua-ca-zhua-qu-liu-lan-qi-chong-gao-bing-fa-de-aizi-dong-hua-jie-jue-fang-an-5313</link>
      <guid>https://dev.to/datacollectionscraper/wu-gua-ca-zhua-qu-liu-lan-qi-chong-gao-bing-fa-de-aizi-dong-hua-jie-jue-fang-an-5313</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: Upgrading the Concurrency Capabilities of Scrapeless Scraping Browser
&lt;/h2&gt;

&lt;p&gt;As the developers and founding team of &lt;a href="https://www.scrapeless.com/zh/?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=scrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless&lt;/a&gt;, we are genuinely passionate about the future of AI automation. Our mission is to create an automated browser truly designed for AI. Over the past few years, from Browserless.io to the "Browser as a Service" (BaaS) offerings launched by numerous cloud vendors, the market has proven that AI agents urgently need a new interaction medium: a cloud-based browser built for AI. For example, Auto-GPT can autonomously search Booking.com for the best flights or automatically submit survey responses in Google Forms. Likewise, ChainGPT's intelligent customer-service system can log into an e-commerce backend in real time to retrieve order data and complete multi-step operations. What these capabilities ultimately demand is the extreme of high concurrency and "human-like" simulation.&lt;/p&gt;




&lt;p&gt;However, we have observed that existing solutions often fall short on two key points:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. High-concurrency scalability:&lt;/strong&gt; When hundreds or thousands of agent tasks target a website simultaneously, a single node quickly becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Realistic browsing behavior:&lt;/strong&gt; Multi-dimensional disguises such as fingerprint rotation, TLS characteristics, and mouse trajectories are quickly flagged by the risk-control systems of e-commerce platforms and social media if they are not precise enough.&lt;/p&gt;

&lt;p&gt;With these challenges in mind, we focused on two key areas during product design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloud elastic scaling:&lt;/strong&gt; Scrapeless supports seamless scaling from ten to unlimited concurrent sessions, ensuring zero queuing and zero timeouts under peak task loads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Full-stack human-like protection:&lt;/strong&gt; Through deep customization of the Chromium kernel, Scrapeless implements multi-dimensional fingerprint obfuscation, controllable TLS handshake strategies, and progressive mouse/keyboard simulation, making it nearly impossible for target websites to detect anomalies.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even more notably, while delivering top-tier performance, we have driven costs well below industry-standard solutions, helping developers save 60%-80% on large-scale testing and long-running tasks. Whether you need to monitor thousands of SKUs with daily scrapes or drive thousands of customer-service bots across multiple sites, Scrapeless provides the most reliable and cost-effective infrastructure.&lt;/p&gt;

&lt;p&gt;In the following sections, we take a deep dive into the pricing advantages, core features, and future roadmap of &lt;a href="https://www.scrapeless.com/zh/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=scrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Scraping Browser&lt;/a&gt;, giving you a complete picture of why it is the ultimate choice for the "AI browser" era.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scrapeless Scraping Browser Price Comparison
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yhnob6320su4b3dnsfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yhnob6320su4b3dnsfl.png" alt="Scrapeless 抓取浏览器价格" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Hourly Rate and Proxy Fee Comparison
&lt;/h3&gt;

&lt;p&gt;Below is a comparison of the hourly-rate and proxy-fee price ranges of competing products. We have distilled approximate pricing ranges to help users quickly grasp Scrapeless's price-performance advantage.&lt;/p&gt;

&lt;h4&gt;
  
  
  Table: Price Range Comparison
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;工具名称&lt;/th&gt;
&lt;th&gt;每小时费率范围（美元/小时）&lt;/th&gt;
&lt;th&gt;代理费范围（美元/GB）&lt;/th&gt;
&lt;th&gt;并发支持&lt;/th&gt;
&lt;th&gt;备注&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scrapeless&lt;/td&gt;
&lt;td&gt;$0.063 – $0.090 /小时（根据并发和使用情况有所不同）&lt;/td&gt;
&lt;td&gt;$1.26 - $1.80 / GB&lt;/td&gt;
&lt;td&gt;50 / 100 / 200 / 400 / 600 / 1000 / 无限&lt;/td&gt;
&lt;td&gt;- 支持自定义代理&lt;br&gt;- 免费解决 Cloudflare、reCAPTCHA、AWS WAF 的 CAPTCHA；未来支持 Imagetotext CAPTCHA&lt;br&gt;- 费率根据实际使用情况而异&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browserbase&lt;/td&gt;
&lt;td&gt;$0.10 – $0.198 /小时（包括 2-5GB 免费代理）&lt;/td&gt;
&lt;td&gt;$10 / GB（超出免费配额后）&lt;/td&gt;
&lt;td&gt;3（基础） / 50（高级）&lt;/td&gt;
&lt;td&gt;- 支持自定义代理&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brightdata&lt;/td&gt;
&lt;td&gt;$0.10 /小时&lt;/td&gt;
&lt;td&gt;$9.5 / GB（标准）；$12.5 / GB（优质域名）&lt;/td&gt;
&lt;td&gt;无限&lt;/td&gt;
&lt;td&gt;- 不支持自定义代理&lt;br&gt;- 实际并发会话可能受以下因素影响：&lt;br&gt;  - 账户计划和使用限制&lt;br&gt;  - 可用带宽和系统资源&lt;br&gt;  - 计费设置和信用余额&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zenrows&lt;/td&gt;
&lt;td&gt;每小时 $0.09&lt;/td&gt;
&lt;td&gt;每GB $2.8 - $5.42&lt;/td&gt;
&lt;td&gt;多达 100&lt;/td&gt;
&lt;td&gt;- 可根据需求定制计划，价格为每GB $2.8&lt;br&gt;- 商业计划支持最高 100 个并发&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browserless&lt;/td&gt;
&lt;td&gt;每小时 $0.084 – $0.15（按“单位”计费）&lt;/td&gt;
&lt;td&gt;每GB $4.3&lt;/td&gt;
&lt;td&gt;3 / 10 / 50&lt;/td&gt;
&lt;td&gt;- 支持定制代理&lt;br&gt;- 每1000个hCaptcha和reCaptcha解决方案$7&lt;br&gt;- 每个“单位”等于0.00833小时的浏览器时间&lt;br&gt;- Cloudflare旁路功能免费提供&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  2. Price Comparison in Concurrency Scenarios
&lt;/h3&gt;

&lt;p&gt;To show Scrapeless's price advantage more intuitively, we compare typical usage scenarios.&lt;/p&gt;

&lt;h4&gt;
  
  
  Case 1: Single Request (1 Browser Instance)
&lt;/h4&gt;

&lt;p&gt;Suppose a user launches a single request (for example, logging in to ChatGPT) that runs for 1 hour and consumes 1 GB of traffic:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scrapeless (at the Standard plan rate):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hourly rate: $0.072&lt;/li&gt;
&lt;li&gt;Proxy fee: $1.44&lt;/li&gt;
&lt;li&gt;Total cost = 0.072 + 1.44 = $1.512&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Competitor (Brightdata as an example):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hourly rate: $0.10&lt;/li&gt;
&lt;li&gt;Proxy fee: $9.5 (standard)&lt;/li&gt;
&lt;li&gt;Total cost = 0.10 + 9.5 = $9.6&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost advantage: Scrapeless saves roughly 84.25%.&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Case 2: Large-Scale Concurrency Scenario (100 Browser Instances)
&lt;/h4&gt;

&lt;p&gt;A Scrapeless user is building an LLM-based market-ranking monitoring system that scrapes data from multiple websites in real time and generates dynamic ranking reports. Their current workload requires running 100 browser instances simultaneously for 1 hour, consuming 40 GB of traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scrapeless (at the Standard plan rate):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hourly rate: 0.072 × 100 = $7.2&lt;/li&gt;
&lt;li&gt;Proxy fee: 1.44 × 40 = $57.6&lt;/li&gt;
&lt;li&gt;Total cost = 7.2 + 57.6 = $64.8&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Competitor (Zenrows as an example):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hourly rate: 0.09 × 100 = $9&lt;/li&gt;
&lt;li&gt;Proxy fee: 2.8 × 40 = $112&lt;/li&gt;
&lt;li&gt;Total cost = 9 + 112 = $121&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost advantage: Scrapeless saves roughly 46.45%.&lt;/strong&gt;&lt;/p&gt;
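
&lt;p&gt;Both cases follow the same arithmetic: hourly rate × instances × hours, plus per-GB proxy fee × traffic. A minimal JavaScript sketch of that calculation, using the rates quoted above (which are illustrative and may change; verify against each vendor's pricing page):&lt;/p&gt;

```javascript
// Total session cost = hourly rate × instances × hours + proxy fee × GB.
// The rates used below are the figures quoted in this comparison; they may
// change over time, so verify against each vendor's pricing page.
function sessionCost({ hourlyRate, perGbFee, instances = 1, hours = 1, gb = 1 }) {
  return hourlyRate * instances * hours + perGbFee * gb;
}

function savingsPercent(ours, theirs) {
  return ((1 - ours / theirs) * 100).toFixed(2);
}

// Case 1: one instance, 1 hour, 1 GB
const case1Scrapeless = sessionCost({ hourlyRate: 0.072, perGbFee: 1.44 }); // 1.512
const case1Brightdata = sessionCost({ hourlyRate: 0.1, perGbFee: 9.5 });    // 9.6

// Case 2: 100 instances, 1 hour, 40 GB
const case2Scrapeless = sessionCost({ hourlyRate: 0.072, perGbFee: 1.44, instances: 100, gb: 40 }); // 64.8
const case2Zenrows = sessionCost({ hourlyRate: 0.09, perGbFee: 2.8, instances: 100, gb: 40 });      // 121

console.log(savingsPercent(case1Scrapeless, case1Brightdata)); // "84.25"
console.log(savingsPercent(case2Scrapeless, case2Zenrows));    // "46.45"
```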




&lt;p&gt;This user ran a detailed price and performance comparison of mainstream browser automation tools in the early stages of the project. They found that many competitors struggle with large-scale concurrent tasks in the following ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Insufficient high-concurrency support: most tools have low maximum concurrency limits that cannot meet the 100-instance requirement. The user's future concurrency needs will exceed 500, a level few products on the market can support.&lt;/li&gt;
&lt;li&gt;High surcharges: some products charge extra for high-concurrency tasks, causing overall costs to balloon.&lt;/li&gt;
&lt;li&gt;Limited technical support: some tools lack built-in solutions for CAPTCHAs or anti-scraping mechanisms, adding development complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a comprehensive evaluation, the user ultimately chose the &lt;a href="https://www.scrapeless.com/zh/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=scrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Scraping Browser&lt;/a&gt;. They reported that Scrapeless not only delivered a significant cost advantage (saving nearly 47%) but also ensured the efficiency and reliability of their data scraping system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scrapeless Scraping Browser: A Cloud-Based Browser Automation Tool for AI Agents
&lt;/h2&gt;

&lt;p&gt;Scrapeless Scraping Browser is a cloud-based browser automation tool built for data scraping, AI agents, and agent systems. It provides a realistic browser environment through deep emulation, supporting dynamic fingerprint obfuscation and TLS fingerprint spoofing to ensure highly human-like behavior. It is also fully user-controlled and stores no data, ensuring compliance and privacy protection.&lt;/p&gt;




&lt;h3&gt;
  
  
  Technical Advantages
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Real Browser Environment
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Chrome kernel support: provides a complete browser environment that simulates real user behavior.&lt;/li&gt;
&lt;li&gt;TLS fingerprint spoofing: defeats traditional anti-scraping mechanisms by forging TLS fingerprints to pass as an ordinary browser.&lt;/li&gt;
&lt;li&gt;Dynamic fingerprint obfuscation: dynamically adjusts browser environment variables (e.g., User-Agent, Canvas, WebGL) to appear more human and bypass advanced anti-scraping strategies.&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  2. Cloud Deployment and Scalability
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Cloud architecture: fully cloud-based, eliminating the need for local resources and supporting seamless globally distributed deployment.&lt;/li&gt;
&lt;li&gt;High concurrency: supports unlimited parallel tasks, suitable for large-scale data scraping and complex automation scenarios.&lt;/li&gt;
&lt;li&gt;Easy integration: integrates seamlessly with existing automation frameworks (such as Playwright and Puppeteer) without code refactoring.&lt;/li&gt;
&lt;/ul&gt;
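
&lt;p&gt;"Easy integration" in practice usually means pointing Playwright or Puppeteer at a remote browser's WebSocket endpoint instead of launching a local Chrome. The sketch below is a hedged illustration only: the host and query parameter names (&lt;code&gt;token&lt;/code&gt;, &lt;code&gt;session_ttl&lt;/code&gt;) are placeholder assumptions, not the documented Scrapeless connection format, so consult the official docs for the real URL shape:&lt;/p&gt;

```javascript
// Build a WebSocket endpoint for a remote (cloud) browser session.
// NOTE: the host and parameter names below are hypothetical placeholders,
// not the documented Scrapeless API; check the official docs for the
// actual connection string.
function buildBrowserWsEndpoint({ host, token, sessionTtl = 180 }) {
  const params = new URLSearchParams({ token, session_ttl: String(sessionTtl) });
  return `wss://${host}/browser?${params.toString()}`;
}

const endpoint = buildBrowserWsEndpoint({
  host: "browser.example.com", // placeholder host
  token: "YOUR_API_KEY",
});

// With puppeteer-core you would then connect rather than launch:
//   const browser = await puppeteer.connect({ browserWSEndpoint: endpoint });
console.log(endpoint); // wss://browser.example.com/browser?token=YOUR_API_KEY&session_ttl=180
```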




&lt;h4&gt;
  
  
  3. Designed for AI Agents
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Automated agent support: provides powerful agent capabilities to help AI agents execute complex browser automation tasks.&lt;/li&gt;
&lt;li&gt;Flexible invocation: supports parallel multi-task processing, making it an ideal tool for building intelligent agent systems and AI-driven applications.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Core Features
&lt;/h3&gt;

&lt;p&gt;The core competitiveness of Scrapeless Scraping Browser lies in its powerful functionality and flexibility, standing out in the following three areas:&lt;/p&gt;

&lt;h4&gt;
  
  
  (1) CAPTCHA Solving
&lt;/h4&gt;

&lt;p&gt;Scrapeless Scraping Browser has advanced CAPTCHA-solving capabilities and can automatically handle mainstream CAPTCHA types such as reCAPTCHA and Cloudflare Turnstile.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Industry-leading success rate: Scrapeless provides an efficient CAPTCHA solution with a success rate above 98%.&lt;/li&gt;
&lt;li&gt;No extra fees: while most competitors charge separately for CAPTCHA solving, Scrapeless includes this capability in the base service at no additional cost.&lt;/li&gt;
&lt;li&gt;Real-time handling: the CAPTCHA-solving engine completes its work within milliseconds, keeping task execution smooth.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  (2) Tool Integration Support
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Comprehensive automation tool support: Scrapeless supports popular browser automation tools such as Puppeteer and Playwright, enabling rapid developer integration.&lt;/li&gt;
&lt;li&gt;AI integration: Scrapeless plans deep integrations with Browser Use, Computer Use, and LangChain to further explore the capabilities of large language models (LLMs) and expand AI-driven dynamic web interaction use cases.&lt;/li&gt;
&lt;li&gt;Ease of use: detailed documentation and sample code help users get started quickly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  (3) Concurrency Support
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Flexible concurrency: Scrapeless supports anywhere from 50 to unlimited concurrent sessions, covering both small tasks and large-scale automation.&lt;/li&gt;
&lt;li&gt;No extra fees: while competitors typically charge extra in high-concurrency scenarios, Scrapeless offers a transparent and flexible pricing model.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Future Plans for Scrapeless Scraping Browser
&lt;/h2&gt;

&lt;p&gt;Going forward, Scrapeless Scraping Browser will continue to refine its core capabilities to meet diverse needs, from basic scraping to complex AI-driven automation, giving users more powerful tools. Here are our updated focus areas:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Core Feature Enhancements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fingerprint configuration: flexible configuration of environment variables such as time zone, language, user agent, and screen resolution to enhance human-like behavior.&lt;/li&gt;
&lt;li&gt;Proxy routing rules: custom proxy routing that directs traffic to different proxies by domain or location, plus a session API for session management.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Debugging and Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Live view: a real-time view in the Playground for easy debugging and task takeover.&lt;/li&gt;
&lt;li&gt;Session management: session replay, an inspector, and metadata queries to strengthen task monitoring.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. File Handling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Upload: easily upload files to target websites using Playwright, Puppeteer, or Selenium.&lt;/li&gt;
&lt;li&gt;Download: downloaded files are automatically stored in the cloud with a Unix timestamp appended to the filename (e.g., sample-1719265797164.pdf) to avoid conflicts.&lt;/li&gt;
&lt;li&gt;Retrieval: quickly retrieve files via the API, useful for scenarios such as data scraping and report generation.&lt;/li&gt;
&lt;/ul&gt;
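
&lt;p&gt;The timestamp-suffix naming can also be reproduced or parsed client-side when you need to correlate downloaded files. A small sketch; the exact rule is inferred from the &lt;code&gt;sample-1719265797164.pdf&lt;/code&gt; example above and may not match the service's implementation byte-for-byte:&lt;/p&gt;

```javascript
// Append a Unix timestamp (milliseconds) before the extension, mirroring
// the "sample-1719265797164.pdf" pattern described above. This naming rule
// is inferred from the example, not taken from official documentation.
function timestampedName(filename, now = Date.now()) {
  const dot = filename.lastIndexOf(".");
  if (dot === -1) return `${filename}-${now}`;
  return `${filename.slice(0, dot)}-${now}${filename.slice(dot)}`;
}

// Recover the original name and timestamp from a stored filename.
function parseTimestampedName(stored) {
  const m = stored.match(/^(.*)-(\d{13})(\.[^.]*)?$/);
  if (!m) return null;
  return { original: m[1] + (m[3] || ""), timestamp: Number(m[2]) };
}

console.log(timestampedName("sample.pdf", 1719265797164)); // "sample-1719265797164.pdf"
```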




&lt;h3&gt;
  
  
  4. Context API and Extension Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Context API: contextual session persistence to streamline login and multi-step automation scenarios.&lt;/li&gt;
&lt;li&gt;Extension support: enhance browser sessions by loading your own Chrome extensions.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  5. Metadata Queries
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Query sessions using custom tags and metadata.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  6. SDK and API Upgrades
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Session API: session management features that simplify task operations.&lt;/li&gt;
&lt;li&gt;CDP event optimization: expanded CDP support, including fetching page HTML, clicking elements, scrolling, and taking screenshots.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Current browser automation tools face many challenges when powering AI-driven scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-concurrency bottlenecks cause task failures.&lt;/li&gt;
&lt;li&gt;Insufficiently human-like behavior makes automation easy for anti-scraping mechanisms to detect.&lt;/li&gt;
&lt;li&gt;High costs limit the feasibility of large-scale tasks.&lt;/li&gt;
&lt;li&gt;Complex integration creates a steep learning curve and hurts efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scrapeless Scraping Browser redefines the "browser for AI" through three key innovations:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Elastic cloud scaling: seamless scaling from dozens to unlimited concurrent sessions, fully unlocking high-throughput potential.&lt;/li&gt;
&lt;li&gt;Full-stack human-like protection: deep customization of the Chromium kernel delivers fingerprint obfuscation, TLS handshake strategies, and progressive behavior simulation to bypass anti-scraping restrictions with ease.&lt;/li&gt;
&lt;li&gt;Unmatched cost efficiency and compatibility: 60%–80% lower cost than comparable solutions while remaining compatible with Playwright and Puppeteer, lowering the development barrier.
We are also actively exploring next-generation AI-centric technologies. We warmly welcome developers and teams to share optimization suggestions or feature requests. Your feedback is invaluable and will help us keep improving Scrapeless Scraping Browser for a better experience.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Learn More About Scrapeless
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.scrapeless.com/en?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=scrapingbrowser" rel="noopener noreferrer"&gt;&lt;strong&gt;官方网站&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.gg/Np4CAHxB9a?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=scrapingbrowser" rel="noopener noreferrer"&gt;&lt;strong&gt;Discord社区&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://app.scrapeless.com/passport/login" rel="noopener noreferrer"&gt;&lt;strong&gt;Scrapeless仪表板&lt;/strong&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>scraping</category>
      <category>scrapingbrowser</category>
    </item>
    <item>
      <title>Scrapeless MCP co-creation plan is coming!</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Thu, 17 Apr 2025 12:15:06 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/scrapeless-mcp-co-creation-plan-is-coming-cok</link>
      <guid>https://dev.to/datacollectionscraper/scrapeless-mcp-co-creation-plan-is-coming-cok</guid>
      <description>&lt;p&gt;Scrapeless officially launches the MCP (Model Context Protocol) ecosystem partner program, targeting AI application developers, industry solution providers, and toolchain developers, opening up our next-generation AI real-time enhancement capabilities and sharing market dividends!&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Scrapeless MCP Server?
&lt;/h2&gt;

&lt;p&gt;Scrapeless MCP Server is an AI-enhanced server built on the MCP protocol that helps LLMs (such as Claude and GPT) access external information. It directly integrates all Scrapeless tools: Scraping Browser, Scraping API, and SerpAPI.&lt;br&gt;
You can view the Scrapeless MCP Server listings here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP.SO&lt;/li&gt;
&lt;li&gt;Github&lt;/li&gt;
&lt;li&gt;NPM&lt;/li&gt;
&lt;li&gt;Glama.ai&lt;/li&gt;
&lt;li&gt;Smithery.ai&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;We need you:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Help Scrapeless Build MCP Server

&lt;ul&gt;
&lt;li&gt;You can also submit a PR to our existing Scrapeless Server &lt;a href="https://github.com/scrapeless-ai/scrapeless-mcp-server" rel="noopener noreferrer"&gt;Github repository &lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Use Scrapeless MCP Server to build tools for specific scenarios.
You can submit your content to us as a document, or publish it on any platform.
See our sample case: &lt;a href="https://www.scrapeless.com/en/blog/mcp-cursor-ecommerce-assistant" rel="noopener noreferrer"&gt;https://www.scrapeless.com/en/blog/mcp-cursor-ecommerce-assistant&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  🎁 Rewards
&lt;/h2&gt;

&lt;p&gt;Best Project Award (3 winners)&lt;br&gt;
Scrapeless offers a free annual subscription to recognize the most creative and innovative proposals.&lt;br&gt;
Special Award (5 winners)&lt;br&gt;
A $99 Scrapeless monthly subscription to recognize outstanding MCP application scenarios, tutorials, or documentation.&lt;br&gt;
Sharing Award (several winners)&lt;/p&gt;

&lt;p&gt;Publicly share your proposal on social media with a hashtag (such as #Scrapeless MCP Server) and we will select lucky participants to receive a free trial of Scrapeless.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to apply?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Join the &lt;a href="https://discord.gg/Np4CAHxB9a" rel="noopener noreferrer"&gt;Scrapeless Discord community&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Contact &lt;a class="mentioned-user" href="https://dev.to/liam"&gt;@liam&lt;/a&gt; to submit a cooperation application&lt;/li&gt;
&lt;li&gt;We will complete the preliminary assessment and onboarding within three working days&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Related Documents&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scrapeless Documentation: &lt;a href="https://apidocs.scrapeless.com/" rel="noopener noreferrer"&gt;https://apidocs.scrapeless.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Scrapeless Discord: &lt;a href="https://discord.gg/Np4CAHxB9a" rel="noopener noreferrer"&gt;https://discord.gg/Np4CAHxB9a&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you have any ideas or suggestions about our products, please feel free to join our Discord community and communicate with us directly. We look forward to hearing from you!&lt;/p&gt;

</description>
      <category>mcp</category>
    </item>
    <item>
      <title>Why Browserless (Scrapeless scraping browser) can be the infrastructure of your AI Agent</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Thu, 17 Apr 2025 02:07:35 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/why-browserless-scrapeless-scraping-browser-can-be-the-infrastructure-of-your-ai-agent-22l9</link>
      <guid>https://dev.to/datacollectionscraper/why-browserless-scrapeless-scraping-browser-can-be-the-infrastructure-of-your-ai-agent-22l9</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;




&lt;p&gt;In the context of the rapid development of artificial intelligence technology, AI agents are playing an increasingly important role in automating tasks, especially those that involve retrieving web information. For such tasks, efficiently and accurately scraping and parsing web content presents a significant challenge. In this article, we will explore the recently released Browser Use and Scrapeless Scraping Browser and their impact on AI agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  PART 1. Browser Use: Enabling AI Agents to Parse Web Pages Efficiently
&lt;/h2&gt;




&lt;p&gt;On March 23, 2025, the startup Browser Use announced the completion of a $17 million funding round, led by Felicis Ventures with support from several well-known investment firms. Browser Use is an AI-driven browser automation agent capable of efficiently parsing web content and helping AI agents automate a variety of online tasks. The company was founded by Gregor Žunič and Magnus Müller, who initially developed a prototype within four days and successfully launched it on Hacker News, gaining widespread attention.&lt;/p&gt;

&lt;p&gt;The core technology of Browser Use is transforming each website into structured text, helping AI agents better understand and interact with webpages without relying on costly and inefficient computer vision methods. This approach allows AI agents to parse webpages as if handling databases, improving task execution efficiency and addressing common issues like IP bans and captchas. With proxy rotation and persistent session support, Browser Use ensures the stability and efficiency of tasks, enhancing the web browsing speed and accuracy of AI agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3pm6ksyt7aq43271j6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3pm6ksyt7aq43271j6f.png" alt="Browser Use" width="800" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, simply enabling AI agents to "understand webpages" is not enough. In reality, websites are constantly changing and implementing various anti-scraping measures such as IP blocking, captcha triggers, and user behavior detection, creating significant obstacles for AI agents when performing tasks. &lt;/p&gt;

&lt;p&gt;While Browser Use addresses some of these issues through proxy rotation and persistent sessions, AI agents may still face challenges like fingerprint detection, dynamic rendering, and TLS anti-detection in more complex scenarios. &lt;/p&gt;

&lt;p&gt;This is where the Scrapeless Scraping Browser comes into play. It is also better suited for large-scale scraping and automation tasks, supporting parallel scraping and efficient management of large volumes of data requests to ensure task stability and efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  PART 2. Browserless (Scrapeless Scraping Browser): The Ideal Infrastructure for AI Agents
&lt;/h2&gt;




&lt;p&gt;In the previous section, we explored how Browser Use helps &lt;a href="https://www.scrapeless.com/en/ai-agent?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;AI agents&lt;/a&gt; handle web tasks more effectively through efficient webpage parsing and information structuring. However, to truly enable AI agents to perform various online tasks in a stable and intelligent manner, the Scrapeless Scraping Browser offers a more advanced and comprehensive infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  PART 2.1 How to Achieve Browser Use's Capabilities and Enhance Data Scraping Performance with Scraping Browser
&lt;/h3&gt;

&lt;p&gt;Before diving into a detailed comparison between Scraping Browser and Browser Use, it's important to first understand their respective functionalities and technical implementations. While both involve browser automation and data scraping, they differ significantly in many aspects and are suitable for different use cases. In this section, we will analyze the differences between the two in terms of functionality, technical implementation, use cases, and ease of use, and explore how Scraping Browser can achieve the existing capabilities of Browser Use.&lt;/p&gt;

&lt;h4&gt;
  
  
  1.1 Functionality Overview
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a Python library focused on automation, Browser Use is primarily aimed at developers and provides AI agents with browser control to facilitate automated tasks. It offers users a simple API that makes it easy to navigate, interact with, and scrape data from websites.&lt;/p&gt;

&lt;p&gt;Its core strength lies in its flexibility, making it ideal for developers who wish to perform customized browser operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In comparison, &lt;a href="https://www.scrapeless.com/en/product/scraping-browser" rel="noopener noreferrer"&gt;Scraping Browser&lt;/a&gt; is more focused on offering efficient web scraping solutions, especially when it comes to bypassing anti-scraping technologies. With cloud fingerprinting technology, Scraping Browser simulates real user behavior to minimize the risk of being detected as a bot by target websites.&lt;/p&gt;

&lt;p&gt;Its functionality is better suited for large-scale data scraping, especially in scenarios involving complex anti-scraping measures.&lt;/p&gt;

&lt;h4&gt;
  
  
  1.2 Technical Implementation
&lt;/h4&gt;

&lt;p&gt;Next, we’ll take a deeper look at the technical differences between the two:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browser Use relies on powerful browser automation frameworks (such as Playwright) to perform browser operations locally or on the cloud. Its technical implementation is highly flexible, making it suitable for developers with custom needs.&lt;/p&gt;

&lt;p&gt;Users can highly customize operations according to specific requirements, such as simulating different user behaviors or controlling the browser to perform specific tasks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike Browser Use, Scraping Browser uses cloud services and fingerprint technology, employing methods like dynamic IP rotation and user agent masking to ensure simulated user behavior appears more realistic. This allows it to bypass target websites' anti-scraping measures, resulting in more efficient data scraping.&lt;/p&gt;

&lt;p&gt;Scraping Browser’s technical advantage lies in its ability to support large-scale scraping tasks, handle complex anti-scraping mechanisms, and ensure successful data scraping even when frequently changing IPs and user agents.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Don’t let complex anti-scraping measures slow you down! &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Log in now&lt;/a&gt; and use Scrapeless Scraping Browser to enhance your web scraping tasks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  1.3 Use Cases
&lt;/h4&gt;

&lt;p&gt;The differences in functionality naturally lead to different use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browser Use is more suitable for developers performing small-scale, customized automation tasks, or in scenarios where AI agents are involved. For tasks that don't require large-scale, high-frequency data scraping, Browser Use offers sufficient flexibility and customization options.&lt;/p&gt;

&lt;p&gt;For example, developers might use Browser Use to automate data extraction tasks from specific websites or create AI tools that integrate browser control.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scraping Browser shines in its adaptability, particularly in large-scale data scraping tasks that involve overcoming complex anti-scraping technologies. For tasks requiring frequent access and scraping of vast amounts of data, Scraping Browser is undoubtedly the better choice.&lt;/p&gt;

&lt;p&gt;It is particularly useful for high-frequency, large-scale scraping tasks, such as e-commerce websites or social media data scraping, where it can effectively bypass stringent anti-scraping measures.&lt;/p&gt;

&lt;h4&gt;
  
  
  1.4 Ease of Use
&lt;/h4&gt;

&lt;p&gt;While both tools offer automation features, there are notable differences in terms of ease of use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:
As a Python library aimed at developers, Browser Use provides extensive documentation, examples, and tutorials to help developers get started quickly. However, it requires users to have a certain level of programming skills to customize operations as needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers with programming experience, Browser Use's flexibility makes it an attractive choice.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scraping Browser typically offers a more comprehensive service where users don’t need to focus on technical details and can focus more on data scraping itself. It provides a more intuitive user interface and better usability, especially for those without programming skills.&lt;/p&gt;

&lt;p&gt;Since it uses cloud fingerprinting technology behind the scenes, users only need to configure scraping tasks without diving deep into the technical implementation.&lt;/p&gt;

&lt;p&gt;In summary, Browser Use is more flexible and suited for developers looking to perform customized automation tasks, while Scraping Browser focuses on efficient and secure data scraping, particularly when dealing with anti-scraping technologies. The choice of which tool to use depends on specific needs and use cases.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Start scraping smarter today! No more hassle with complex webpage parsing—use Scrapeless' scraping browser to make your AI agent tasks faster and more accurate. Log in now and begin your journey: &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Login Here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  PART 2.2 Scraping Browser vs. Browser Use
&lt;/h3&gt;

&lt;p&gt;In this section, we explore how &lt;a href="https://www.scrapeless.com/en/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scraping Browser&lt;/a&gt; achieves the existing capabilities of Browser Use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgnw8g7lixt1wvp38l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgnw8g7lixt1wvp38l.png" alt="Scraping Browser" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Optimize web scraping and boost your productivity! Let Scrapeless' scraping browser become the backbone of your AI agent, solving web scraping challenges. Log in now and experience its powerful features: &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Login Here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  1. Strong Anti-Blocking Capability
&lt;/h4&gt;

&lt;p&gt;For most network tasks, especially data scraping tasks, preventing blocking and &lt;a href="https://www.scrapeless.com/en/blog/get-around-anti-bot?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;bypassing anti-crawling mechanisms&lt;/a&gt; is crucial. Scrapeless Scraping Browser provides multiple layers of protection in this regard.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proxy IP pool and auto-rotation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scrapeless Scraping Browser provides a rich proxy IP pool that automatically rotates IPs to avoid blocks caused by frequent requests from the same IP. This dynamic IP switching greatly reduces the probability that the target website detects the crawler.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Efficient CAPTCHA solving technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many websites employ CAPTCHA mechanisms such as reCAPTCHA or &lt;a href="https://www.scrapeless.com/en/blog/bypass-cloudflare-challenges?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Cloudflare Turnstile Challenge&lt;/a&gt; to block automated tools. Scrapeless Scraping Browser has strong CAPTCHA handling capabilities, using intelligent algorithms and automated unlocking techniques to quickly bypass these challenges, ensuring that AI agents can continue scraping data without interruptions due to CAPTCHAs. This makes &lt;a href="https://www.scrapeless.com/en/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Scraping Browser&lt;/a&gt; highly effective and stable when working with highly secure websites.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Highly Human-Like Interaction Simulation
&lt;/h4&gt;

&lt;p&gt;To ensure that AI agents can browse the web like real users, Scrapeless Scraping Browser integrates multiple human-like interaction simulation techniques.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic Fingerprint Obfuscation Technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This technology allows Scraping Browser to simulate user behaviors such as mouse movements, scrolling, and clicking at the Chrome kernel level, thus avoiding recognition as an automation tool by the target website. In this way, Scrapeless Scraping Browser makes requests from AI agents appear almost identical to ordinary user behavior, effectively bypassing common anti-scraping strategies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support dynamic rendering of JavaScript-heavy websites&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many modern websites rely on JavaScript to dynamically load content, which poses challenges for traditional crawlers. Scrapeless Scraping Browser can handle JavaScript-heavy websites, ensuring that AI agents can access all dynamically rendered content on a page, not just static HTML. This allows it to crawl more complex webpage data and meet the needs of the modern internet.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Advanced Anti-Detection Mechanisms
&lt;/h4&gt;

&lt;p&gt;Scrapeless Scraping Browser uses various techniques to mask the automation signatures of AI agents, avoiding detection and blocking by target websites.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS Fingerprint Forgery Technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through &lt;a href="https://www.scrapeless.com/en/blog/tls-fingerprinting?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;TLS fingerprint&lt;/a&gt; forgery, Scrapeless Scraping Browser can disguise itself as a normal browser, avoiding detection as a crawler by target websites. TLS (Transport Layer Security) fingerprint forgery simulates a browser's unique identity during the connection handshake, strengthening resistance to anti-crawling technology.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real browser environment for anti-detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To avoid being recognized as a crawler, Scrapeless Scraping Browser keeps the browsing environment as close as possible to real user behavior by performing tasks in a genuine browser environment. Unlike crawlers that rely on computer vision and image recognition, this approach effectively reduces the risk of detection and interception, ensuring that requests from AI agents are not flagged as malicious by the target website.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Real-time data statistics and session management
&lt;/h4&gt;

&lt;p&gt;Scrapeless Scraping Browser introduces real-time data statistics to ensure an efficient and controllable data acquisition process. Users can track session status in real time, view the progress of each browser session (such as running, success, or failure), and intuitively grasp task execution status to ensure smooth data capture.&lt;/p&gt;

&lt;p&gt;In addition, &lt;a href="https://www.scrapeless.com/en/product?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless&lt;/a&gt; has enhanced session management capabilities, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session list and records: users can easily view historical and current sessions, making them easy to manage and monitor.&lt;/li&gt;
&lt;li&gt;Session stop function: through the dashboard, users can directly terminate running sessions, greatly improving operational efficiency and flexibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Start scraping smarter today! No more hassle with complex webpage parsing—use Scrapeless' scraping browser to make your AI agent tasks faster and more accurate. Log in now and begin your journey: &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Login Here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  PART 2.3 Use Cases of Scraping Browser
&lt;/h3&gt;

&lt;p&gt;To more clearly demonstrate the powerful capabilities of Scraping Browser, let's look at a few typical use cases and how AI agents enable more intelligent data scraping.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. E-Commerce Website Data Collection and Price Monitoring
&lt;/h4&gt;

&lt;p&gt;Use Case: Cross-border e-commerce companies need to regularly monitor product prices and stock information on competitor websites to optimize their own pricing strategies.&lt;/p&gt;

&lt;p&gt;Challenges: The target website employs strict anti-scraping mechanisms, including dynamic IP blocking, CAPTCHA detection, and JavaScript-rendered pages.&lt;/p&gt;

&lt;p&gt;Solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic IP Rotation: Use Scraping Browser’s proxy pool functionality to regularly change IPs and avoid being blocked.&lt;/li&gt;
&lt;li&gt;Advanced Fingerprint Simulation: Implement dynamic fingerprint obfuscation to make the browsing behavior resemble that of a real user.&lt;/li&gt;
&lt;li&gt;Automatic JavaScript Parsing: Ensure the scraped pages include all dynamically rendered content.&lt;/li&gt;
&lt;/ul&gt;
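
&lt;p&gt;Conceptually, dynamic IP rotation just cycles requests across a pool so that no single IP carries enough traffic to get blocked. A simplified round-robin model (the proxy addresses are placeholders; in a managed service like Scrapeless the rotation happens server-side, so this only models the behavior):&lt;/p&gt;

```javascript
// Round-robin proxy rotation: each call returns the next proxy in the pool,
// so consecutive requests leave from different IPs. The addresses below are
// placeholders, not real Scrapeless proxy endpoints.
function makeProxyRotator(proxies) {
  let i = 0;
  return () => proxies[i++ % proxies.length];
}

const nextProxy = makeProxyRotator([
  "http://10.0.0.1:8000", // placeholder addresses
  "http://10.0.0.2:8000",
  "http://10.0.0.3:8000",
]);

console.log(nextProxy()); // "http://10.0.0.1:8000"
console.log(nextProxy()); // "http://10.0.0.2:8000"
```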

&lt;blockquote&gt;
&lt;p&gt;Whether you're monitoring eCommerce prices or collecting real-time travel data, Scrapeless is the solution you need. &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Log in now&lt;/a&gt; and streamline your data scraping with advanced automation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  2. Travel Industry Information Scraping and Price Comparison Analysis
&lt;/h4&gt;

&lt;p&gt;Use Case: A travel booking platform wants to scrape real-time price information from multiple airline and hotel websites to provide the best booking recommendations.&lt;/p&gt;

&lt;p&gt;Challenges: Many travel websites use dynamic loading technologies and have strict anti-scraping measures, such as TLS fingerprint detection and CAPTCHA validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS Fingerprint Spoofing: Scraping Browser simulates TLS fingerprints from different devices and browsers, making requests appear to come from real users.&lt;/li&gt;
&lt;li&gt;Intelligent CAPTCHA Solving: Use Scraping Browser’s CAPTCHA solution to automatically handle CAPTCHAs during login and query processes.&lt;/li&gt;
&lt;li&gt;Parallel Scraping: Improve the speed of data collection through multithreading and distributed architecture.&lt;/li&gt;
&lt;li&gt;AI Agent Predictive Analysis: Combine AI Agent to predict price trends and provide users with more accurate booking recommendations.&lt;/li&gt;
&lt;/ul&gt;
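&lt;p&gt;The parallel scraping step above can be sketched as a small concurrency pool. The pattern below is generic JavaScript; &lt;code&gt;runPool&lt;/code&gt; and the simulated per-site tasks are illustrative stand-ins for real browser sessions, not a Scrapeless API:&lt;/p&gt;

```javascript
// Sketch: run scraping tasks in parallel with a concurrency cap, so many
// airline/hotel pages can be fetched at once without opening unbounded
// browser sessions. runPool is generic; the tasks below are simulated
// stand-ins for real per-site scraping functions.
async function runPool(tasks, limit) {
  const results = new Array(tasks.length);
  let nextIndex = 0;
  async function worker() {
    while (nextIndex < tasks.length) {
      const i = nextIndex++;
      results[i] = await tasks[i]();
    }
  }
  const workers = [];
  for (let w = 0; w < Math.min(limit, tasks.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Simulated per-site tasks; in practice each would drive a browser session.
const sites = ['airlineA', 'airlineB', 'hotelC', 'hotelD'];
const tasks = sites.map((site, i) => async () => ({ site, price: 100 + i }));

const pooled = runPool(tasks, 2).then((prices) => {
  console.log(prices);
  return prices;
});
```

&lt;p&gt;Capping concurrency keeps session counts (and costs) predictable while still collecting results in the original order.&lt;/p&gt;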

&lt;h2&gt;
  
  
  PART 3. Bonus Tip: Bypass Cloudflare using Scraping Browser and Puppeteer
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;We firmly protect the privacy of the website. All data in this blog is public and is only used as a demonstration of the crawling process. We do not save any information and data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Scrapeless requires puppeteer-core, a Puppeteer version that doesn't download the Chrome binary. So, ensure you install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install puppeteer-core

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 1. Sign up for Scrapeless, click API Key Management &amp;gt; Create API Key to create your Scrapeless API Key.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowse" rel="noopener noreferrer"&gt;Sign up for Scrapeless&lt;/a&gt; and get a free trial. If you have any questions, you can also contact Liam via &lt;a href="https://discord.com/invite/xBcTfGPjCQ?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowse" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Step 2. Then, go to Scraping Browser and copy your Browser URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3xkbdqvcds3a6kmpoom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3xkbdqvcds3a6kmpoom.png" alt="Bypass Cloudflare using Scraping Browser and Puppeteer" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Integrate the copied browser URL into your Puppeteer script like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer-core');
const connectionURL = 'wss://browser.scrapeless.com/browser?token=&amp;lt;YOUR_Scrapeless_API_KEY&amp;gt;&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY';
(async () =&amp;gt; {
    // set up the browser environment
    const browser = await puppeteer.connect({
        browserWSEndpoint: connectionURL,
    });

    // create a new page
    const page = await browser.newPage();

    // navigate to a URL
    await page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {
        waitUntil: 'networkidle0',
    });

    // wait for the challenge to resolve
    await new Promise(function (resolve) {
        setTimeout(resolve, 10000);
    });

    // take a page screenshot
    await page.screenshot({ path: 'screenshot.png' });

    // close the browser instance
    await browser.close();
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;You need to replace &lt;a href="https://www.scrapingcourse.com/cloudflare-challenge" rel="noopener noreferrer"&gt;https://www.scrapingcourse.com/cloudflare-challenge&lt;/a&gt; with any website protected by a Cloudflare challenge, and replace the token value with your Scrapeless API Key.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The above code accesses and screenshots the protected page. See the result below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw171wkgn1yxbh1uk9i7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw171wkgn1yxbh1uk9i7.png" alt="cf challenge bypass" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations 🎉! You've successfully bypassed Cloudflare using Puppeteer and Scrapeless.&lt;/p&gt;
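&lt;p&gt;One possible refinement: the script above sleeps a fixed 10 seconds while the challenge resolves. A generic polling helper can return as soon as a condition holds instead. The sketch below is plain JavaScript; the page-title check mentioned in the comment is an illustrative heuristic, not a documented Scrapeless feature:&lt;/p&gt;

```javascript
// Sketch: poll a condition until it holds or a timeout elapses, instead of
// sleeping a fixed 10 seconds. With Puppeteer you might pass a condition
// like: async () => !(await page.title()).includes('Just a moment');
// that title check is a heuristic, not a documented Scrapeless feature.
async function waitUntil(condition, { timeoutMs = 10000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false; // timed out; the caller decides whether to retry or fail
}

// Simulated condition that becomes true on the third check.
let checks = 0;
const resolvedOk = waitUntil(async () => ++checks >= 3, { timeoutMs: 1000, intervalMs: 5 });
```

&lt;p&gt;With this helper, the screenshot step runs as soon as the challenge page is gone rather than always waiting the full 10 seconds.&lt;/p&gt;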

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Scrapeless's technology provides the infrastructure that lets AI agents perform online tasks efficiently. For developers and enterprises alike, the Scrapeless scraping browser offers a flexible, low-cost solution for building and optimizing AI agents, making it an ideal choice for improving work efficiency, reducing development costs, and accelerating technological progress.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Unlock the power of seamless web scraping! With Scrapeless' scraping browser, you can turn any website into structured data effortlessly, boosting your AI agent’s performance. Log in and start today: &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Login Here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. How does Scraping Browser bypass anti-scraping systems such as Cloudflare?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scraping Browser combines dynamic IP proxies, automatic JavaScript parsing, and browser fingerprint camouflage to bypass most anti-scraping mechanisms. Compared with traditional Puppeteer/Playwright solutions, it can simulate real user behavior and automatically adjust the request frequency through built-in strategies to increase the success rate. For specific methods, please refer to this article: &lt;a href="https://www.scrapeless.com/en/blog/cloudflare-challenge-bypass?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;How to Bypass Cloudflare Challenge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. How to bypass CAPTCHA for web scraping?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can use Web Unlocker to efficiently bypass CAPTCHA protection and improve scraping success rates. For a detailed guide, check out: &lt;a href="https://www.scrapeless.com/en/blog/use-web-unlocker-to-bypass-captcha?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;How to Use Web Unlocker to Bypass CAPTCHA&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. What is the best Scraping API?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scrapeless Scraping API supports scenarios such as Amazon, Shopee, Walmart, SHEIN, TikTok, and Instagram, covering e-commerce, social media, and other fields. It also offers SERP APIs for more than 20 Google search scenarios, including Google Flights, Google Maps, and Google Trends. See: &lt;a href="https://app.scrapeless.com/dashboard/products/scraper?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Scraping API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. How to scrape Google SERP data?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google SERP data covers multiple scenarios, and the Scrapeless API supports over 20 of them. You can start scraping after a simple registration. For more details, refer to: &lt;a href="https://www.scrapeless.com/en/blog/scrapeless-deep-serp-api?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Deep SerpApi&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>scraper</category>
      <category>scrapingbrowser</category>
      <category>scrapeless</category>
    </item>
    <item>
      <title>Why Browserless (Scrapeless scraping browser) can be the infrastructure of your AI Agent</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Thu, 03 Apr 2025 10:19:28 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/why-browserless-scrapeless-scraping-browser-can-be-the-infrastructure-of-your-ai-agent-2747</link>
      <guid>https://dev.to/datacollectionscraper/why-browserless-scrapeless-scraping-browser-can-be-the-infrastructure-of-your-ai-agent-2747</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;




&lt;p&gt;In the context of the rapid development of artificial intelligence technology, AI agents are playing an increasingly important role in automating tasks, especially those that involve retrieving web information. For such tasks, efficiently and accurately scraping and parsing web content presents a significant challenge. In this article, we will explore the recently released Browser Use and Scrapeless Scraping Browser and their impact on AI agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  PART 1. Browser Use: Enable AI agents to efficiently parse web pages
&lt;/h2&gt;




&lt;p&gt;On March 23, 2025, the startup Browser Use announced the completion of a $17 million funding round, led by Felicis Ventures with support from several well-known investment firms. Browser Use is an AI-driven browser automation agent capable of efficiently parsing web content and helping AI agents automate a variety of online tasks. The company was founded by Gregor Žunič and Magnus Müller, who initially developed a prototype within four days and successfully launched it on Hacker News, gaining widespread attention.&lt;/p&gt;

&lt;p&gt;The core technology of Browser Use is transforming each website into structured text, helping AI agents better understand and interact with webpages without relying on costly and inefficient computer vision methods. This approach allows AI agents to parse webpages as if handling databases, improving task execution efficiency and addressing common issues like IP bans and captchas. With proxy rotation and persistent session support, Browser Use ensures the stability and efficiency of tasks, enhancing the web browsing speed and accuracy of AI agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3pm6ksyt7aq43271j6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3pm6ksyt7aq43271j6f.png" alt="Browser Use" width="800" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, simply enabling AI agents to "understand webpages" is not enough. In reality, websites are constantly changing and implementing various anti-scraping measures such as IP blocking, captcha triggers, and user behavior detection, creating significant obstacles for AI agents when performing tasks. &lt;/p&gt;

&lt;p&gt;While Browser Use addresses some of these issues through proxy rotation and persistent sessions, AI agents may still face challenges like fingerprint detection, dynamic rendering, and TLS anti-detection in more complex scenarios. &lt;/p&gt;

&lt;p&gt;This is where the Scrapeless Scraping Browser comes into play. It is also better suited for large-scale scraping and automation tasks, supporting parallel scraping and efficient management of large volumes of data requests to ensure task stability and efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  PART 2. Browserless (Scrapeless Scraping Browser): The Ideal Infrastructure for AI Agents
&lt;/h2&gt;




&lt;p&gt;In the previous section, we explored how Browser Use helps &lt;a href="https://www.scrapeless.com/en/ai-agent?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;AI agents&lt;/a&gt; handle web tasks more effectively through efficient webpage parsing and information structuring. However, to truly enable AI agents to perform various online tasks in a stable and intelligent manner, the Scrapeless Scraping Browser offers a more advanced and comprehensive infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  PART 2.1 How to Achieve Browser Use's Capabilities and Enhance Data Scraping Performance with Scraping Browser
&lt;/h3&gt;

&lt;p&gt;Before diving into a detailed comparison between Scraping Browser and Browser Use, it's important to first understand their respective functionalities and technical implementations. While both involve browser automation and data scraping, they differ significantly in many aspects and are suitable for different use cases. In this section, we will analyze the differences between the two in terms of functionality, technical implementation, use cases, and ease of use, and explore how Scraping Browser can achieve the existing capabilities of Browser Use.&lt;/p&gt;

&lt;h4&gt;
  
  
  1.1 Functionality Overview
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a Python library focused on automation, Browser Use is primarily aimed at developers and provides AI agents with browser control to facilitate automated tasks. It offers users a simple API that makes it easy to navigate, interact with, and scrape data from websites.&lt;/p&gt;

&lt;p&gt;Its core strength lies in its flexibility, making it ideal for developers who wish to perform customized browser operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In comparison, &lt;a href="https://www.scrapeless.com/en/product/scraping-browser" rel="noopener noreferrer"&gt;Scraping Browser&lt;/a&gt; is more focused on offering efficient web scraping solutions, especially when it comes to bypassing anti-scraping technologies. With cloud fingerprinting technology, Scraping Browser simulates real user behavior to minimize the risk of being detected as a bot by target websites.&lt;/p&gt;

&lt;p&gt;Its functionality is better suited for large-scale data scraping, especially in scenarios involving complex anti-scraping measures.&lt;/p&gt;

&lt;h4&gt;
  
  
  1.2 Technical Implementation
&lt;/h4&gt;

&lt;p&gt;Next, we’ll take a deeper look at the technical differences between the two:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browser Use relies on powerful browser automation frameworks (such as Playwright) to perform browser operations locally or on the cloud. Its technical implementation is highly flexible, making it suitable for developers with custom needs.&lt;/p&gt;

&lt;p&gt;Users can highly customize operations according to specific requirements, such as simulating different user behaviors or controlling the browser to perform specific tasks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike Browser Use, Scraping Browser uses cloud services and fingerprint technology, employing methods like dynamic IP rotation and user agent masking to ensure simulated user behavior appears more realistic. This allows it to bypass target websites' anti-scraping measures, resulting in more efficient data scraping.&lt;/p&gt;

&lt;p&gt;Scraping Browser’s technical advantage lies in its ability to support large-scale scraping tasks, handle complex anti-scraping mechanisms, and ensure successful data scraping even when frequently changing IPs and user agents.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Don’t let complex anti-scraping measures slow you down! &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Log in now&lt;/a&gt; and use Scrapeless Scraping Browser to enhance your web scraping tasks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  1.3 Use Cases
&lt;/h4&gt;

&lt;p&gt;The differences in functionality naturally lead to different use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browser Use is more suitable for developers performing small-scale, customized automation tasks, or in scenarios where AI agents are involved. For tasks that don't require large-scale, high-frequency data scraping, Browser Use offers sufficient flexibility and customization options.&lt;/p&gt;

&lt;p&gt;For example, developers might use Browser Use to automate data extraction tasks from specific websites or create AI tools that integrate browser control.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scraping Browser shines in its adaptability, particularly in large-scale data scraping tasks that involve overcoming complex anti-scraping technologies. For tasks requiring frequent access and scraping of vast amounts of data, Scraping Browser is undoubtedly the better choice.&lt;/p&gt;

&lt;p&gt;It is particularly useful for high-frequency, large-scale scraping tasks, such as e-commerce websites or social media data scraping, where it can effectively bypass stringent anti-scraping measures.&lt;/p&gt;

&lt;h4&gt;
  
  
  1.4 Ease of Use
&lt;/h4&gt;

&lt;p&gt;While both tools offer automation features, there are notable differences in terms of ease of use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser Use:
As a Python library aimed at developers, Browser Use provides extensive documentation, examples, and tutorials to help developers get started quickly. However, it requires users to have a certain level of programming skills to customize operations as needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers with programming experience, Browser Use's flexibility makes it an attractive choice.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scraping browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scraping Browser typically offers a more comprehensive service: users don’t need to worry about technical details and can concentrate on the data itself. It provides a more intuitive user interface and better usability, especially for those without programming skills.&lt;/p&gt;

&lt;p&gt;Since it uses cloud fingerprinting technology behind the scenes, users only need to configure scraping tasks without diving deep into the technical implementation.&lt;/p&gt;

&lt;p&gt;In summary, Browser Use is more flexible and suited for developers looking to perform customized automation tasks, while Scraping Browser focuses on efficient and secure data scraping, particularly when dealing with anti-scraping technologies. The choice of which tool to use depends on specific needs and use cases.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Start scraping smarter today! No more hassle with complex webpage parsing—use Scrapeless' scraping browser to make your AI agent tasks faster and more accurate. Log in now and begin your journey: &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Login Here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  PART 2.2 Scraping Browser vs. Browser Use
&lt;/h3&gt;

&lt;p&gt;In this section, we explore how &lt;a href="https://www.scrapeless.com/en/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scraping Browser&lt;/a&gt; achieves the existing capabilities of Browser Use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgnw8g7lixt1wvp38l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgnw8g7lixt1wvp38l.png" alt="Scraping Browser" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Optimize web scraping and boost your productivity! Let Scrapeless' scraping browser become the backbone of your AI agent, solving web scraping challenges. Log in now and experience its powerful features: &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Login Here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  1. Strong anti-blockade capability
&lt;/h4&gt;

&lt;p&gt;For most network tasks, especially data scraping tasks, preventing blocking and &lt;a href="https://www.scrapeless.com/en/blog/get-around-anti-bot?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;bypassing anti-crawling mechanisms&lt;/a&gt; is crucial. Scrapeless Scraping Browser provides multiple layers of protection in this regard.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proxy IP pool and auto-rotation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scrapeless Scraping Browser provides a rich proxy IP pool that automatically rotates IPs to avoid blocks caused by frequent requests from the same address. This dynamic IP switching greatly reduces the probability that the target website detects the crawler.&lt;/p&gt;
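&lt;p&gt;If a request is still blocked despite rotation, a common complementary pattern is to retry with exponential backoff, opening a fresh session (and hence a fresh exit IP) on each attempt. The sketch below is generic JavaScript; the &lt;code&gt;attemptFn&lt;/code&gt; callback and the simulated "blocked" error are illustrative, not part of any Scrapeless API:&lt;/p&gt;

```javascript
// Sketch: when a request is blocked despite IP rotation, retry with
// exponential backoff, creating a fresh session (and thus a fresh exit IP)
// on each attempt. attemptFn and the simulated "blocked" error below are
// illustrative assumptions, not a documented Scrapeless API.
async function withRetries(attemptFn, maxAttempts = 3, baseDelayMs = 10) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      // e.g. connect with a freshly built session URL and scrape
      return await attemptFn(attempt);
    } catch (err) {
      lastError = err;
      // back off 10ms, 20ms, 40ms... (seconds would be typical in practice)
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Simulated task: "blocked" twice, succeeds on the third attempt.
let attemptsMade = 0;
const outcome = withRetries(async () => {
  attemptsMade++;
  if (attemptsMade < 3) throw new Error('blocked');
  return 'ok';
});
```

&lt;p&gt;Pairing rotation with backoff keeps the overall success rate high without hammering a site that has just flagged you.&lt;/p&gt;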

&lt;ul&gt;
&lt;li&gt;Efficient Captcha unlocking technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many websites employ CAPTCHA mechanisms such as reCAPTCHA or &lt;a href="https://www.scrapeless.com/en/blog/bypass-cloudflare-challenges?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Cloudflare Turnstile Challenge&lt;/a&gt; to block automated tools. Scrapeless Scraping Browser has strong CAPTCHA handling capabilities, using intelligent algorithms and automated unlocking techniques to quickly bypass these challenges, ensuring that AI agents can continue scraping data without interruptions due to CAPTCHAs. This makes &lt;a href="https://www.scrapeless.com/en/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Scraping Browser&lt;/a&gt; highly effective and stable when working with highly secure websites.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Highly personified interactive simulation
&lt;/h4&gt;

&lt;p&gt;To ensure that AI agents can browse the web like a real user, the Scrapeless Scraping Browser integrates multiple human-like interaction simulation techniques.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic Fingerprint Obfuscation Technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This technology allows Scraping Browser to simulate user behaviors such as mouse movement, scrolling, and clicking at the Chrome kernel level, thus avoiding being recognized as an automation tool by the target website. In this way, Scrapeless Scraping Browser makes requests from AI agents appear almost identical to the behavior of ordinary users, effectively bypassing common anti-crawling strategies.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Support dynamic rendering of JavaScript-heavy websites&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many modern websites rely on JavaScript to dynamically load content, which poses challenges to traditional crawlers. Scrapeless Scraping Browser can handle JavaScript-heavy websites, ensuring that AI agents can access all dynamically rendered content on the webpage, not just the static HTML. This enables it to crawl more complex webpage data and meet the needs of the modern web.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Advanced anti-detection mechanism
&lt;/h4&gt;

&lt;p&gt;Scrapeless Scraping Browser uses various technologies to hide the crawling features of AI agents, avoiding recognition and blocking by target websites.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS Fingerprint Forgery Technology&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through &lt;a href="https://www.scrapeless.com/en/blog/tls-fingerprinting?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;TLS fingerprint&lt;/a&gt; forgery, Scrapeless Scraping Browser can disguise its traffic as that of a normal browser, avoiding detection as a crawler tool by target websites. TLS (Transport Layer Security) fingerprint forgery simulates a real browser's unique identity during connection setup, making automated requests much harder for anti-scraping systems to distinguish from genuine browser traffic.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real browser environment for anti-detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To avoid being recognized as a crawler, Scrapeless Scraping Browser performs tasks in a real browser environment, keeping its behavior as close as possible to that of real users. Unlike crawlers that rely on computer vision and image recognition, this approach effectively reduces the risk of recognition and interception, ensuring that requests from AI agents are not flagged as malicious by the target website.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Real-time data statistics and session management
&lt;/h4&gt;

&lt;p&gt;Scrapeless Scraping Browser introduces real-time data statistics to keep the data acquisition process efficient and controllable. Users can track session status in real time, view the progress of each browser session (such as running, success, or failure), and see at a glance how task execution is going, ensuring smooth data capture.&lt;/p&gt;

&lt;p&gt;In addition, &lt;a href="https://www.scrapeless.com/en/product?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless&lt;/a&gt; has enhanced session management capabilities, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session list and records: users can easily view historical and current sessions, making it simple to manage and monitor all of them.&lt;/li&gt;
&lt;li&gt;Session stop function: Through the dashboard, users can directly terminate running sessions without manual intervention, greatly improving operational efficiency and flexibility.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  PART 2.3 Use Cases of Scraping Browser
&lt;/h3&gt;

&lt;p&gt;To more clearly demonstrate the powerful capabilities of Scraping Browser, let's look at a few typical use cases and how AI agents enable more intelligent data scraping.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. E-Commerce Website Data Collection and Price Monitoring
&lt;/h4&gt;

&lt;p&gt;Use Case: Cross-border e-commerce companies need to regularly monitor product prices and stock information on competitor websites to optimize their own pricing strategies.&lt;/p&gt;

&lt;p&gt;Challenges: The target website employs strict anti-scraping mechanisms, including dynamic IP blocking, CAPTCHA detection, and JavaScript-rendered pages.&lt;/p&gt;

&lt;p&gt;Solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic IP Rotation: Use Scraping Browser’s proxy pool functionality to regularly change IPs and avoid being blocked.&lt;/li&gt;
&lt;li&gt;Advanced Fingerprint Simulation: Implement dynamic fingerprint obfuscation to make the browsing behavior resemble that of a real user.&lt;/li&gt;
&lt;li&gt;Automatic JavaScript Parsing: Ensure the scraped pages include all dynamically rendered content.&lt;/li&gt;
&lt;/ul&gt;
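&lt;p&gt;As a rough sketch of the dynamic IP rotation step, the Scraping Browser connection URL shown later in this article accepts a &lt;code&gt;proxy_country&lt;/code&gt; parameter. A helper that builds a fresh connection URL per session might look like this; the country list and round-robin rotation are illustrative assumptions, not a documented API:&lt;/p&gt;

```javascript
// Sketch: build a fresh Scraping Browser connection URL per session,
// cycling proxy_country so successive sessions exit from different regions.
// token/session_ttl/proxy_country mirror the connection URL used later in
// this article; the country list and rotation strategy are illustrative.
const COUNTRIES = ['US', 'DE', 'JP', 'GB']; // assumed example values
let nextCountry = 0;

function buildConnectionURL(apiKey) {
  const params = new URLSearchParams({
    token: apiKey,
    session_ttl: '180',
    proxy_country: COUNTRIES[nextCountry++ % COUNTRIES.length],
  });
  return 'wss://browser.scrapeless.com/browser?' + params.toString();
}

const url1 = buildConnectionURL('MY_KEY'); // proxy_country=US
const url2 = buildConnectionURL('MY_KEY'); // proxy_country=DE
console.log(url1);
console.log(url2);
```

&lt;p&gt;Each new session then exits from a different region, spreading requests across IPs without any per-request proxy bookkeeping.&lt;/p&gt;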

&lt;blockquote&gt;
&lt;p&gt;Whether you're monitoring eCommerce prices or collecting real-time travel data, Scrapeless is the solution you need. &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Log in now&lt;/a&gt; and streamline your data scraping with advanced automation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  2. Travel Industry Information Scraping and Price Comparison Analysis
&lt;/h4&gt;

&lt;p&gt;Use Case: A travel booking platform wants to scrape real-time price information from multiple airline and hotel websites to provide the best booking recommendations.&lt;/p&gt;

&lt;p&gt;Challenges: Many travel websites use dynamic loading technologies and have strict anti-scraping measures, such as TLS fingerprint detection and CAPTCHA validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS Fingerprint Spoofing: Scraping Browser simulates TLS fingerprints from different devices and browsers, making requests appear to come from real users.&lt;/li&gt;
&lt;li&gt;Intelligent CAPTCHA Solving: Use Scraping Browser’s CAPTCHA solution to automatically handle CAPTCHAs during login and query processes.&lt;/li&gt;
&lt;li&gt;Parallel Scraping: Improve the speed of data collection through multithreading and distributed architecture.&lt;/li&gt;
&lt;li&gt;AI Agent Predictive Analysis: Combine AI Agent to predict price trends and provide users with more accurate booking recommendations.&lt;/li&gt;
&lt;/ul&gt;
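&lt;p&gt;The parallel scraping step above can be sketched as a small concurrency pool. The pattern below is generic JavaScript; &lt;code&gt;runPool&lt;/code&gt; and the simulated per-site tasks are illustrative stand-ins for real browser sessions, not a Scrapeless API:&lt;/p&gt;

```javascript
// Sketch: run scraping tasks in parallel with a concurrency cap, so many
// airline/hotel pages can be fetched at once without opening unbounded
// browser sessions. runPool is generic; the tasks below are simulated
// stand-ins for real per-site scraping functions.
async function runPool(tasks, limit) {
  const results = new Array(tasks.length);
  let nextIndex = 0;
  async function worker() {
    while (nextIndex < tasks.length) {
      const i = nextIndex++;
      results[i] = await tasks[i]();
    }
  }
  const workers = [];
  for (let w = 0; w < Math.min(limit, tasks.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Simulated per-site tasks; in practice each would drive a browser session.
const sites = ['airlineA', 'airlineB', 'hotelC', 'hotelD'];
const tasks = sites.map((site, i) => async () => ({ site, price: 100 + i }));

const pooled = runPool(tasks, 2).then((prices) => {
  console.log(prices);
  return prices;
});
```

&lt;p&gt;Capping concurrency keeps session counts (and costs) predictable while still collecting results in the original order.&lt;/p&gt;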

&lt;h2&gt;
  
  
  PART 3. Bonus Tip: Bypass Cloudflare using Scraping Browser and Puppeteer
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;We firmly protect the privacy of the website. All data in this blog is public and is only used as a demonstration of the crawling process. We do not save any information and data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Scrapeless requires puppeteer-core, a Puppeteer version that doesn't download the Chrome binary. So, ensure you install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install puppeteer-core

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 1. Sign up for Scrapeless, click API Key Management &amp;gt; Create API Key to create your Scrapeless API Key.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowse" rel="noopener noreferrer"&gt;Sign up for Scrapeless&lt;/a&gt; and get a free trial. If you have any questions, you can also contact Liam via &lt;a href="https://discord.com/invite/xBcTfGPjCQ?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowse" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Step 2. Then, go to Scraping Browser and copy your Browser URL.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3xkbdqvcds3a6kmpoom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3xkbdqvcds3a6kmpoom.png" alt="Bypass Cloudflare using Scraping Browser and Puppeteer" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Integrate the copied browser URL into your Puppeteer script like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const puppeteer = require('puppeteer-core');
const connectionURL = 'wss://browser.scrapeless.com/browser?token=&amp;lt;YOUR_Scrapeless_API_KEY&amp;gt;&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY';
(async () =&amp;gt; {// set up browser environmentconst browser = await puppeteer.connect({browserWSEndpoint: connectionURL,
    });
// create a new pageconst page = await browser.newPage();
// navigate to a URLawait page.goto('https://www.scrapingcourse.com/cloudflare-challenge', {waitUntil: 'networkidle0',
    });
// wait for the challenge to resolveawait new Promise(function (resolve) {setTimeout(resolve, 10000);
    });
//take page screenshotawait page.screenshot({ path: 'screenshot.png' });// close the browser instanceawait browser.close();
})();

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Replace &lt;a href="https://www.scrapingcourse.com/cloudflare-challenge" rel="noopener noreferrer"&gt;https://www.scrapingcourse.com/cloudflare-challenge&lt;/a&gt; with any Cloudflare-protected page you want to test, and substitute your Scrapeless API Key for the token placeholder in the connection URL.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The above code accesses and screenshots the protected page. See the result below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw171wkgn1yxbh1uk9i7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyw171wkgn1yxbh1uk9i7.png" alt="cf challenge bypass" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations 🎉! You've successfully bypassed Cloudflare using Puppeteer and Scrapeless.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Scrapeless provides the infrastructure that lets AI agents perform online tasks efficiently. For developers and enterprises alike, its scraping browser is a flexible, low-cost option for building and optimizing AI agents, improving work efficiency, reducing development costs, and accelerating technological progress.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Unlock the power of seamless web scraping! With Scrapeless' scraping browser, you can turn any website into structured data effortlessly, boosting your AI agent’s performance. Log in and start today: &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Login Here&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. How does Scraping Browser bypass anti-scraping systems such as Cloudflare?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scraping Browser combines dynamic IP proxies, automatic JavaScript rendering, and browser-fingerprint camouflage to bypass most anti-scraping mechanisms. Compared with traditional Puppeteer/Playwright setups, it simulates real user behavior and automatically adjusts request frequency through built-in strategies to increase the success rate. For specific methods, please refer to this article: &lt;a href="https://www.scrapeless.com/en/blog/cloudflare-challenge-bypass?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;How to Bypass Cloudflare Challenge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. How to bypass CAPTCHA for web scraping?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can use Web Unlocker to efficiently bypass CAPTCHA protection and improve scraping success rates. For a detailed guide, check out: &lt;a href="https://www.scrapeless.com/en/blog/use-web-unlocker-to-bypass-captcha?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;How to Use Web Unlocker to Bypass CAPTCHA&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. What is the best Scraping API?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scrapeless Scraping API supports sites such as Amazon, Shopee, Walmart, SHEIN, TikTok, and Instagram, covering e-commerce, social media, and other fields. It also provides SERP APIs for more than 20 Google search scenarios, including Google Flights, Google Maps, and Google Trends. See: &lt;a href="https://app.scrapeless.com/dashboard/products/scraper?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Scraping API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. How to scrape Google SERP data?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Google SERP data covers multiple scenarios, and the Scrapeless API supports more than 20 of them. You can start scraping after a simple registration. For more details, refer to: &lt;a href="https://www.scrapeless.com/en/blog/scrapeless-deep-serp-api?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=hotscrapingbrowser" rel="noopener noreferrer"&gt;Scrapeless Deep SerpApi&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>scraper</category>
      <category>scrapingbrowser</category>
      <category>scrapeless</category>
    </item>
    <item>
      <title>How to Use Undetected ChromeDriver for Web Scraping</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Mon, 17 Mar 2025 09:14:49 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/how-to-use-undetected-chromedriver-for-web-scraping-3f2n</link>
      <guid>https://dev.to/datacollectionscraper/how-to-use-undetected-chromedriver-for-web-scraping-3f2n</guid>
      <description>&lt;p&gt;Discover how Undetected ChromeDriver helps &lt;a href="https://www.scrapeless.com/en/blog/how-to-avoid-anti-bot?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;bypass anti-bot systems for web scraping&lt;/a&gt;, along with step-by-step guidance, advanced methods, and key limitations. Plus, learn about Scrapeless - a more robust alternative for professional scraping needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this guide, you will learn:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What Undetected ChromeDriver is and how it can be useful&lt;/li&gt;
&lt;li&gt;How it minimizes bot detection&lt;/li&gt;
&lt;li&gt;How to use it with Python for web scraping&lt;/li&gt;
&lt;li&gt;Advanced usage and methods&lt;/li&gt;
&lt;li&gt;Its key limitations and drawbacks&lt;/li&gt;
&lt;li&gt;Recommended alternative: Scrapeless&lt;/li&gt;
&lt;li&gt;Technical analysis of anti-bot detection mechanisms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Undetected ChromeDriver?
&lt;/h2&gt;




&lt;p&gt;Undetected ChromeDriver is a Python library that provides an optimized version of Selenium's ChromeDriver, patched to limit detection by anti-bot services such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Imperva&lt;/li&gt;
&lt;li&gt;DataDome&lt;/li&gt;
&lt;li&gt;Distil Networks&lt;/li&gt;
&lt;li&gt;and more ...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It can also help &lt;a href="https://www.scrapeless.com/en/blog/cf-protected-website-bypass?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;bypass certain Cloudflare protections&lt;/a&gt;, although that can be more challenging.&lt;/p&gt;

&lt;p&gt;If you have ever used browser automation tools like Selenium, you know they let you control browsers programmatically. To make that possible, they configure browsers differently from regular user setups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.scrapeless.com/en/blog/get-around-anti-bot?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;Anti-bot&lt;/a&gt; systems look for those differences, or "leaks," to identify automated browser bots. Undetected ChromeDriver patches Chrome drivers to minimize these telltale signs, reducing bot detection. This makes it ideal for web scraping sites protected by anti-scraping measures!&lt;/p&gt;

&lt;h2&gt;
  
  
  How does Undetected ChromeDriver work?
&lt;/h2&gt;




&lt;p&gt;Undetected ChromeDriver reduces detection from Cloudflare, Imperva, DataDome, and similar solutions by employing the following techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Renaming Selenium variables to mimic those used by real browsers&lt;/li&gt;
&lt;li&gt;Using legitimate, real-world User-Agent strings to avoid detection&lt;/li&gt;
&lt;li&gt;Allowing the user to simulate natural human interaction&lt;/li&gt;
&lt;li&gt;Managing cookies and sessions properly while navigating websites&lt;/li&gt;
&lt;li&gt;Enabling the use of proxies to &lt;a href="https://www.scrapeless.com/en/blog/crawl-a-website-without-getting-blocked?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;bypass IP blocking&lt;/a&gt; and prevent rate limiting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These methods help the browser controlled by the library bypass various anti-scraping defenses effectively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Undetected ChromeDriver for Web Scraping: Step-By-Step Guide
&lt;/h2&gt;




&lt;h3&gt;
  
  
  Step #1: Prerequisites and Project Setup
&lt;/h3&gt;

&lt;p&gt;Undetected ChromeDriver has the following prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latest version of Chrome&lt;/li&gt;
&lt;li&gt;Python 3.6+: If Python 3.6 or later is not installed on your machine, download it from the official site and follow the installation instructions.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The library automatically downloads and patches the driver binary for you, so there is no need to manually download ChromeDriver.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Create a directory for your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir undetected-chromedriver-scraper
cd undetected-chromedriver-scraper
python -m venv env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Activate the virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# On Linux or macOS
source env/bin/activate

# On Windows
env\Scripts\activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step #2: Install Undetected ChromeDriver
&lt;/h3&gt;

&lt;p&gt;Install Undetected ChromeDriver via the pip package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install undetected_chromedriver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This library will automatically install Selenium, as it is one of its dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step #3: Initial Setup
&lt;/h3&gt;

&lt;p&gt;Create a scraper.py file and import undetected_chromedriver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
import json

# Initialize a Chrome instance
driver = uc.Chrome()

# Connect to the target page
driver.get("https://scrapeless.com")

# Scraping logic...

# Close the browser
driver.quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step #4: Implement the Scraping Logic
&lt;/h3&gt;

&lt;p&gt;Now let's add the logic to extract data from the Apple page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
import json
import time

# Create a Chrome web driver instance
driver = uc.Chrome()

# Connect to the Apple website
driver.get("https://www.apple.com/fr/")

# Give the page some time to fully load
time.sleep(3)

# Dictionary to store product info
apple_products = {}

try:
    # Find product sections (using the classes from the provided HTML)
    product_sections = driver.find_elements(By.CSS_SELECTOR, ".homepage-section.collection-module .unit-wrapper")

    for i, section in enumerate(product_sections):
        try:
            # Extract product name (headline)
            headline = section.find_element(By.CSS_SELECTOR, ".headline, .logo-image").get_attribute("textContent").strip()

            # Extract description (subhead)
            subhead_element = section.find_element(By.CSS_SELECTOR, ".subhead")
            subhead = subhead_element.text

            # Get the link if available
            link = ""
            try:
                link_element = section.find_element(By.CSS_SELECTOR, ".unit-link")
                link = link_element.get_attribute("href")
            except:
                pass

            apple_products[f"product_{i+1}"] = {
                "name": headline,
                "description": subhead,
                "link": link
            }
        except Exception as e:
            print(f"Error processing section {i+1}: {e}")

    # Export the scraped data to JSON
    with open("apple_products.json", "w", encoding="utf-8") as json_file:
        json.dump(apple_products, json_file, indent=4, ensure_ascii=False)

    print(f"Successfully scraped {len(apple_products)} Apple products")

except Exception as e:
    print(f"Error during scraping: {e}")

finally:
    # Close the browser and release its resources
    driver.quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python scraper.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Undetected ChromeDriver: Advanced Usage
&lt;/h2&gt;

&lt;p&gt;Now that you know how the library works, you're ready to explore some more advanced scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose a Specific Chrome Version
&lt;/h3&gt;

&lt;p&gt;You can specify a particular version of Chrome for the library to use by setting the version_main argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import undetected_chromedriver as uc

# Specify the target version of Chrome
driver = uc.Chrome(version_main=105)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  With Syntax
&lt;/h3&gt;

&lt;p&gt;To avoid manually calling the quit() method when you no longer need the driver, you can use the with syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import undetected_chromedriver as uc

with uc.Chrome() as driver:
    driver.get("https://example.com")
    # Rest of your code...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Limitations of Undetected ChromeDriver
&lt;/h2&gt;

&lt;p&gt;While undetected_chromedriver is a powerful Python library, it does have some known limitations:&lt;/p&gt;

&lt;h3&gt;
  
  
  IP Blocks
&lt;/h3&gt;

&lt;p&gt;The library does not hide your IP address. If you're running a script from a datacenter, chances are high that detection will still occur. Similarly, if your home IP has a poor reputation, you may also be blocked.&lt;/p&gt;

&lt;p&gt;To hide your IP, you need to route the controlled browser through a proxy server.&lt;/p&gt;
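&lt;p&gt;As a sketch of what such an integration looks like (the proxy endpoint below is a placeholder, not a real server), Chrome accepts a proxy through its --proxy-server switch, which you can pass to undetected_chromedriver via ChromeOptions:&lt;/p&gt;

```python
def proxy_arg(host: str, port: int, scheme: str = "http") -> str:
    """Build the Chrome --proxy-server switch for a proxy endpoint."""
    return f"--proxy-server={scheme}://{host}:{port}"

# Placeholder endpoint: substitute your own proxy provider's details.
arg = proxy_arg("proxy.example.com", 8080)
print(arg)  # --proxy-server=http://proxy.example.com:8080

# With undetected_chromedriver installed, pass the switch like so:
#   import undetected_chromedriver as uc
#   options = uc.ChromeOptions()
#   options.add_argument(arg)
#   driver = uc.Chrome(options=options)
```

&lt;p&gt;Note that Chrome's --proxy-server switch does not accept inline credentials, so authenticated proxies need additional handling.&lt;/p&gt;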

&lt;h3&gt;
  
  
  No Support for GUI Navigation
&lt;/h3&gt;

&lt;p&gt;Due to the inner workings of the module, you must browse programmatically using the get() method. Avoid using the browser GUI for manual navigation—interacting with the page using your keyboard or mouse increases the risk of detection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited Support for Headless Mode
&lt;/h3&gt;

&lt;p&gt;Officially, headless mode is not fully supported by the undetected_chromedriver library. However, you can experiment with it using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;driver = uc.Chrome(headless=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Stability Issues
&lt;/h3&gt;

&lt;p&gt;Results may vary due to numerous factors. No guarantees are provided, other than continuous efforts to understand and counter detection algorithms. A script that successfully bypasses anti-bot systems today might fail tomorrow if the protection methods receive updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Alternative: Scrapeless
&lt;/h2&gt;




&lt;p&gt;Given the limitations of Undetected ChromeDriver, Scrapeless offers a more robust and reliable alternative for web scraping without getting blocked.&lt;/p&gt;


&lt;h3&gt;
  
  
  Why Scrapeless is Superior
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.scrapeless.com/en/product/scraping-browser?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;Scrapeless&lt;/a&gt; is a remote browser service that addresses the inherent problems with the Undetected ChromeDriver approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Constant updates: Unlike Undetected ChromeDriver which may stop working after anti-bot system updates, Scrapeless is continuously updated by its team.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Built-in IP rotation: Scrapeless offers automatic IP rotation, eliminating the IP blocking issue of Undetected ChromeDriver.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Optimized configuration: Scrapeless browsers are already optimized to avoid detection, which greatly simplifies the process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automatic CAPTCHA solving: Scrapeless can automatically solve CAPTCHAs you might encounter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compatible with multiple frameworks: Works with Playwright, Puppeteer, and other automation tools.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;Sign in to Scrapeless&lt;/a&gt; for a free trial.&lt;/p&gt;

&lt;p&gt;Recommended reading: &lt;a href="https://www.scrapeless.com/en/blog/puppeteer-cloudflare-bypass?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;How to Bypass Cloudflare With Puppeteer&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to use Scrapeless to scrape the web (without getting blocked)
&lt;/h2&gt;

&lt;p&gt;Here's how to implement a similar solution with Scrapeless using Playwright:&lt;/p&gt;

&lt;p&gt;Step 1: Register and &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;log in to Scrapeless&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 2: Get the Scrapeless API KEY&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52m0m1x7t6guw22cfw7p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52m0m1x7t6guw22cfw7p.png" alt="Get the Scrapeless API KEY" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 3: You can integrate the following code into your project&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const {chromium} = require('playwright-core');

// Scrapeless connection URL with your token
const connectionURL = 'wss://browser.scrapeless.com/browser?token=YOUR_TOKEN_HERE&amp;amp;session_ttl=180&amp;amp;proxy_country=ANY';

(async () =&amp;gt; {
  // Connect to the remote Scrapeless browser
  const browser = await chromium.connectOverCDP(connectionURL);

  try {
    // Create a new page
    const page = await browser.newPage();

    // Navigate to Apple's website
    console.log('Navigating to Apple website...');
    await page.goto('https://www.apple.com/fr/', {
      waitUntil: 'domcontentloaded',
      timeout: 60000
    });

    console.log('Page loaded successfully');

    // Wait for the product sections to be available
    await page.waitForSelector('.homepage-section.collection-module', { timeout: 10000 });

    // Get featured products from the homepage
    const products = await page.evaluate(() =&amp;gt; {
      const results = [];

      // Get all product sections
      const productSections = document.querySelectorAll('.homepage-section.collection-module .unit-wrapper');

      productSections.forEach((section, index) =&amp;gt; {
        try {
          // Get product name - could be in .headline or .logo-image
          const headlineEl = section.querySelector('.headline') || section.querySelector('.logo-image');
          const headline = headlineEl ? headlineEl.textContent.trim() : 'Unknown Product';

          // Get product description
          const subheadEl = section.querySelector('.subhead');
          const subhead = subheadEl ? subheadEl.textContent.trim() : '';

          // Get product link
          const linkEl = section.querySelector('.unit-link');
          const link = linkEl ? linkEl.getAttribute('href') : '';

          results.push({
            name: headline,
            description: subhead,
            link: link
          });
        } catch (err) {
          console.error(`Error processing section ${index}: ${err.message}`);
        }
      });

      return results;
    });

    // Display the results
    console.log('Found Apple products:');
    console.log(JSON.stringify(products, null, 2));
    console.log(`Total products found: ${products.length}`);

  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Close the browser
    await browser.close();
    console.log('Browser closed');
  }
})();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;You can also join the Scrapeless &lt;a href="https://discord.com/invite/PCEFG8bV?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; to participate in the developer support program and receive up to 500k SERP API usage credits for free.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Enhanced Technical Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bot Detection: How It Works
&lt;/h3&gt;

&lt;p&gt;Anti-bot systems use several techniques to detect automation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Browser fingerprinting: Collects dozens of browser properties (fonts, canvas, WebGL, etc.) to create a unique signature.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;WebDriver detection: Looks for the presence of the WebDriver API or its artifacts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Behavioral analysis: Analyzes mouse movements, clicks, typing speed that differ between humans and bots.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Navigation anomaly detection: Identifies suspicious patterns like too-fast requests or lack of image/CSS loading.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
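
&lt;p&gt;To make the fingerprinting idea concrete, here is a minimal, illustrative sketch (the property names and values are invented for demonstration; real systems collect dozens of signals): the detector hashes a canonical view of browser properties, so a single automation artifact such as the webdriver flag shifts the entire signature.&lt;/p&gt;

```python
import hashlib
import json

def fingerprint(properties: dict) -> str:
    """Hash a canonical view of browser properties into a short signature."""
    canonical = json.dumps(properties, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Invented example properties, for demonstration only.
human = {"webdriver": False, "plugins": 3, "languages": ["en-US", "en"]}
bot = dict(human, webdriver=True)

# Flipping the single webdriver flag changes the whole signature.
print(fingerprint(human) != fingerprint(bot))  # True
```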

&lt;blockquote&gt;
&lt;p&gt;Recommended reading: &lt;a href="https://www.scrapeless.com/en/blog/how-to-avoid-anti-bot?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;How to Bypass Anti Bot&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  How Undetected ChromeDriver Bypasses Detection
&lt;/h3&gt;

&lt;p&gt;Undetected ChromeDriver circumvents these detections by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Removing WebDriver indicators: Eliminates the &lt;code&gt;navigator.webdriver&lt;/code&gt; property and other WebDriver traces.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Patching Cdc_: Modifies Chrome Driver Controller variables that are known signatures of ChromeDriver.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using realistic User-Agents: Replaces default User-Agents with up-to-date strings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Minimizing configuration changes: Reduces changes to Chrome browser's default behavior.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Technical code showing how Undetected ChromeDriver patches the driver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Simplified extract from Undetected ChromeDriver source code

def _patch_driver_executable():
    """
    Patches the ChromeDriver binary to remove telltale signs of automation
    """
    linect = 0
    replacement = os.urandom(32).hex()
    with io.open(self.executable_path, "r+b") as fh:
        for line in iter(lambda: fh.readline(), b""):
            if b"cdc_" in line.lower():
                fh.seek(-len(line), 1)
                newline = re.sub(
                    b"cdc_.{22}", b"cdc_" + replacement.encode(), line
                )
                fh.write(newline)
                linect += 1
    return linect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
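
&lt;p&gt;You can try the same substitution on an in-memory byte string; the line below imitates how the injected cdc_ identifier appears inside the driver (the surrounding JavaScript is illustrative):&lt;/p&gt;

```python
import os
import re

# A line imitating the driver's injected identifier: "cdc_" plus 22 characters.
line = b"var cdc_adoQpoasnfa76pfcZLmcfl_Array = window.Array;"

replacement = os.urandom(11).hex().encode()  # 22 hex characters
patched = re.sub(b"cdc_.{22}", b"cdc_" + replacement, line)

# The well-known signature is gone, but the surrounding code is untouched.
print(b"cdc_adoQpoasnfa76pfcZLmcfl" in patched)  # False
print(patched.endswith(b"_Array = window.Array;"))  # True
```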



&lt;h3&gt;
  
  
  Why Scrapeless is More Effective
&lt;/h3&gt;

&lt;p&gt;Scrapeless takes a different approach by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Pre-configured environment: Using browsers already optimized to mimic human users.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cloud-based infrastructure: Running browsers in the cloud with proper fingerprinting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Intelligent proxy rotation: Automatically rotating IPs based on the target site.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Advanced fingerprint management: Maintaining consistent browser fingerprints throughout the session.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;WebRTC, Canvas, and Plugin suppression: Blocking common fingerprinting techniques.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=chromedriver" rel="noopener noreferrer"&gt;Sign in to Scrapeless&lt;/a&gt; for a free trial.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, you've learned how to deal with bot detection in Selenium using Undetected ChromeDriver. This library provides a patched version of ChromeDriver for web scraping without getting blocked.&lt;/p&gt;

&lt;p&gt;The challenge is that advanced anti-bot technologies like Cloudflare will still be able to detect and block your scripts. Libraries like undetected_chromedriver are unstable—while they may work today, they might not work tomorrow.&lt;/p&gt;

&lt;p&gt;For professional scraping needs, cloud-based solutions like Scrapeless offer a more robust alternative. They provide pre-configured remote browsers specifically designed to bypass anti-bot measures, with additional features like IP rotation and CAPTCHA solving.&lt;/p&gt;

&lt;p&gt;The choice between Undetected ChromeDriver and Scrapeless depends on your specific needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Undetected ChromeDriver: Good for smaller projects, free and open-source, but requires more maintenance and can be less reliable.&lt;/li&gt;
&lt;li&gt;Scrapeless: Better for professional scraping needs, more reliable, constantly updated, but comes with a subscription cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By understanding how these anti-bot bypass technologies work, you can choose the right tool for your web scraping projects and avoid the common pitfalls of automated data collection.&lt;/p&gt;

</description>
      <category>chromedriver</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How to Scrape Google Finance Ticker Quote Data in Python</title>
      <dc:creator>datacollection</dc:creator>
      <pubDate>Fri, 14 Mar 2025 10:30:50 +0000</pubDate>
      <link>https://dev.to/datacollectionscraper/how-to-scrape-google-finance-ticker-quote-data-in-python-2mc7</link>
      <guid>https://dev.to/datacollectionscraper/how-to-scrape-google-finance-ticker-quote-data-in-python-2mc7</guid>
      <description>&lt;p&gt;In the fast-paced world of finance, access to up-to-date and accurate stock market data is essential for investors, traders, and analysts. Google Finance is an invaluable resource that provides real-time stock quotes, historical financial data, news, and currency rates. Learning how to scrape this data using Python can be of great benefit to those looking to aggregate data, perform sentiment analysis, make market forecasts, or effectively manage risk.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Scrape Google Finance?
&lt;/h2&gt;

&lt;p&gt;Scraping Google Finance can be beneficial for various reasons, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-Time Stock Data – Access up-to-date stock prices, market trends, and historical performance.&lt;/li&gt;
&lt;li&gt;Automated Market Analysis – Collect financial data at scale for trend analysis, portfolio management, or algorithmic trading.&lt;/li&gt;
&lt;li&gt;Company Insights – Gather financial summaries, earnings reports, and stock performance for investment research.&lt;/li&gt;
&lt;li&gt;Competitor &amp;amp; Industry Research – Monitor competitors’ financial health and industry trends to make data-driven decisions.&lt;/li&gt;
&lt;li&gt;News &amp;amp; Sentiment Analysis – Extract news articles and updates related to specific stocks or industries for sentiment tracking.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What will be scraped
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fposts%2Fscrape-google-finance-python%2Ffc133805488c95fa6626fa416ae31826.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.scrapeless.com%2Fprod%2Fposts%2Fscrape-google-finance-python%2Ffc133805488c95fa6626fa416ae31826.png" alt="What will be scraped" width="800" height="1777"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Scrape Google Finance Ticker Quote Data in Python
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1. Configure the environment
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Python: &lt;a href="https://www.python.org/downloads/" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;The interpreter&lt;/strong&gt;&lt;/a&gt; is what runs your Python code; download it from the official website as shown below. It is not necessary to install the very latest release: a version one or two behind the latest is often a safer choice.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Python IDE: Any IDE that supports Python will work, but we recommend PyCharm. It is a development tool specifically designed for Python. For the PyCharm version, we recommend the  &lt;a href="https://www.jetbrains.com/pycharm/download/" rel="nofollow noopener noreferrer"&gt;&lt;strong&gt;free PyCharm Community Edition&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsm9ffe2itzi25g3av6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsm9ffe2itzi25g3av6x.png" alt="Python IDE" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: If you are a Windows user, do not forget to check the "Add python.exe to PATH" option during the installation wizard, so that you can run the python and pip commands from the terminal. Python 3.4 and later bundle pip by default, so you do not need to install it separately.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now you can check if Python is installed by opening the terminal or command prompt and entering the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python --version

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2. Install Dependencies
&lt;/h3&gt;

&lt;p&gt;It is recommended to create a virtual environment to manage project dependencies and avoid conflicts with other Python projects. Navigate to the project directory in the terminal and execute the following command to create a virtual environment named google_finance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -m venv google_finance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Activate the virtual environment based on your system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Windows:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;google_finance\Scripts\activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;MacOS/Linux:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source google_finance/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After activating the virtual environment, install the required Python libraries for web scraping: requests for sending HTTP requests, beautifulsoup4 for parsing the HTML, and optionally playwright if you later need to render JavaScript-heavy pages. Install them using the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install requests
pip install beautifulsoup4
pip install playwright
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3. Scrape Data
&lt;/h3&gt;

&lt;p&gt;To extract stock information from Google Finance, we first need to understand how the website's URL identifies the stock we want to scrape. Let's take the Nasdaq index as an example, which lists many stocks we can pull information from. To find the symbol of each stock, you can use the Nasdaq stock screener. Here we will target META as our example stock. With the index and symbol in hand, we can build the first snippet of the script.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We firmly respect the privacy of the website. All data in this blog is publicly available and is used only to demonstrate the scraping process. We do not store any of the collected information.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
import requests
from bs4 import BeautifulSoup
BASE_URL = "https://www.google.com/finance"
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
TARGET_URL = f"{BASE_URL}/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can use the Requests library to make an HTTP request on TARGET_URL and create a Beautiful Soup instance to scrape the HTML content.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# make an HTTP request
page = requests.get(TARGET_URL)

# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before we start scraping, we first need to identify the relevant HTML elements on the page at TARGET_URL by inspecting it with the browser's developer tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wwlkagktns9pvah420j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wwlkagktns9pvah420j.png" alt="Stock Description" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Items describing the stock are represented by the class gyFHrc. Inside each such element there is a child element holding the item's title (e.g. "Previous Close") and another holding the corresponding value (e.g. $597.99). The title can be obtained from the mfs7Fc class, while the value comes from the P6K39c class.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk78apz2n9tw4dcto55l5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk78apz2n9tw4dcto55l5.png" alt="Stock title" width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7luz5mf4w7d8yctwyhf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7luz5mf4w7d8yctwyhf.png" alt="Stock Value" width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The complete list of items to be crawled is as follows:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous Close&lt;/li&gt;
&lt;li&gt;Day Range&lt;/li&gt;
&lt;li&gt;Year Range&lt;/li&gt;
&lt;li&gt;Market Cap&lt;/li&gt;
&lt;li&gt;AVG Volume&lt;/li&gt;
&lt;li&gt;P/E Ratio&lt;/li&gt;
&lt;li&gt;Dividend Yield&lt;/li&gt;
&lt;li&gt;Primary Exchange&lt;/li&gt;
&lt;li&gt;CEO&lt;/li&gt;
&lt;li&gt;Founded&lt;/li&gt;
&lt;li&gt;Website&lt;/li&gt;
&lt;li&gt;Employees&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let's see how to fetch these items using Python code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})


# create a dictionary to store the stock description
stock_description = {}

# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value


print(stock_description)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is just an example of a simple script that can be integrated into a trading bot, application, or a simple dashboard to track your favorite stocks.&lt;/p&gt;
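&lt;p&gt;As a sketch of how that integration might look, the steps above can be wrapped into small reusable functions. The names parse_stock_description and get_stock_description below are introduced here for illustration, and the code assumes the gyFHrc/mfs7Fc/P6K39c class names are unchanged on the live page (Google rotates them from time to time):&lt;/p&gt;

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.google.com/finance"


def parse_stock_description(html):
    """Extract title/value pairs from a Google Finance quote page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    description = {}
    for item in soup.find_all("div", {"class": "gyFHrc"}):
        title = item.find("div", {"class": "mfs7Fc"})
        value = item.find("div", {"class": "P6K39c"})
        if title and value:  # skip malformed items
            description[title.text] = value.text
    return description


def get_stock_description(symbol, index, language="en"):
    """Fetch the quote page for one symbol and parse it."""
    url = f"{BASE_URL}/quote/{symbol}:{index}?hl={language}"
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return parse_stock_description(page.text)
```

&lt;p&gt;Keeping the parsing separate from the HTTP request makes the parser easy to unit-test on saved HTML, and lets a dashboard loop over several symbols with one call per ticker.&lt;/p&gt;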

&lt;h2&gt;
  
  
  Full Code
&lt;/h2&gt;

&lt;p&gt;There are many more data attributes you can grab from the page, but for now, the full code looks a little like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.google.com/finance"
INDEX = "NASDAQ"
SYMBOL = "META"
LANGUAGE = "en"
TARGET_URL = f"{BASE_URL}/quote/{SYMBOL}:{INDEX}?hl={LANGUAGE}"

# make an HTTP request
page = requests.get(TARGET_URL)

# use an HTML parser to grab the content from "page"
soup = BeautifulSoup(page.content, "html.parser")

# get the items that describe the stock
items = soup.find_all("div", {"class": "gyFHrc"})

# create a dictionary to store the stock description
stock_description = {}

# iterate over the items and append them to the dictionary
for item in items:
    item_description = item.find("div", {"class": "mfs7Fc"}).text
    item_value = item.find("div", {"class": "P6K39c"}).text
    stock_description[item_description] = item_value

print(stock_description)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following are some examples of the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr8tor7p6jkwpsamai5i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr8tor7p6jkwpsamai5i.png" alt="some examples of the results" width="800" height="85"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Limitations when scraping Google Finance
&lt;/h2&gt;

&lt;p&gt;Using the above method you can build a small scraper, but it will not keep delivering data at scale. Google is very sensitive to automated scraping and will eventually block your IP.&lt;/p&gt;

&lt;p&gt;Once your IP is blocked, you will not be able to scrape anything and your data pipeline will break. So how do you overcome this problem? A simple solution is to use a Google Finance Scraping API.&lt;/p&gt;
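&lt;p&gt;Before reaching for an API, a common stopgap is to slow down and retry politely. The sketch below is a generic retry helper with exponential backoff; fetch is any zero-argument callable (for example a wrapper around requests.get), and the delay values are illustrative, not tuned recommendations:&lt;/p&gt;

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0):
    """Call fetch() until it succeeds, sleeping exponentially longer
    after each failure (1s, 2s, 4s, ...). Re-raises the last error."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))
```

&lt;p&gt;This smooths over transient failures, but it will not defeat a hard IP block, which is why a dedicated scraping API becomes attractive at scale.&lt;/p&gt;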

&lt;p&gt;Let's see how to scrape unlimited data from Google Finance using this API.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why use Scrapeless Google Finance Scraping API
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data quality and accuracy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-precision data:&lt;/strong&gt; &lt;a href="https://www.scrapeless.com/en/product/deep-serp-api" rel="noopener noreferrer"&gt;Scrapeless SerpApi&lt;/a&gt; provides accurate, reliable, and up-to-date Google Finance data, ensuring that users obtain the most authentic and useful market information.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time updates:&lt;/strong&gt; The API returns the latest Google Finance data in real time, including live stock quotes and market trends, which is essential for users who need to make timely investment decisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Multi-language and location support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-language support:&lt;/strong&gt; Financial data can be retrieved in multiple languages, serving users in different regions around the world.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location customization:&lt;/strong&gt; You can obtain customized search results based on a specified geographic location, device type, and other parameters, which is very useful for analyzing market conditions in different regions or conducting localized market research.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance and cost advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Super fast speed:&lt;/strong&gt; With an average response time of only &lt;u&gt;1-2 seconds&lt;/u&gt;, Scrapeless SerpApi is one of the fastest search scraping APIs on the market.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-effective:&lt;/strong&gt; Scrapeless SerpApi provides Google Search APIs at &lt;u&gt;only $0.1 per thousand queries&lt;/u&gt;, a pricing model that is very economical for large-scale data scraping projects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Integration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Easy integration:&lt;/strong&gt; Scrapeless SerpApi supports a variety of popular programming languages (&lt;strong&gt;such as Python, Node.js, Golang, etc.&lt;/strong&gt;), so users can easily embed it into their own applications or analysis tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stability and reliability
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High availability:&lt;/strong&gt; Scrapeless SerpApi offers high service availability and stability, ensuring uninterrupted service during long-term, high-frequency data scraping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professional support:&lt;/strong&gt; Scrapeless SerpApi provides professional technical support and customer service to help users resolve problems encountered during use.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How to Scrape Google Finance data with Scrapeless
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Sign up for Scrapeless and get an API key
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;If you don't have a Scrapeless account yet, visit the Scrapeless website and sign up. You can get &lt;u&gt;20,000 free search queries&lt;/u&gt;.&lt;/li&gt;
&lt;li&gt;Once &lt;a href="https://app.scrapeless.com/passport/login?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=googlefinance" rel="noopener noreferrer"&gt;signed up&lt;/a&gt;, log in to your dashboard.&lt;/li&gt;
&lt;li&gt;In the dashboard, navigate to &lt;strong&gt;API Key Management&lt;/strong&gt; and click &lt;strong&gt;Create API Key&lt;/strong&gt;. Copy the generated API key, which will be your authentication credential when calling the &lt;a href="https://www.scrapeless.com/en/product/scraping-api?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=googlefinance" rel="noopener noreferrer"&gt;Scrapeless API&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs2q30n017h6v0056ttg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs2q30n017h6v0056ttg.png" alt="get an API key" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Access the Deep SerpApi Playground
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Then navigate to the "&lt;strong&gt;Deep SerpApi&lt;/strong&gt;" section.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pzlw4mh7wp2y9hmsf2u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pzlw4mh7wp2y9hmsf2u.png" alt="navigate to the " width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Set search parameters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;In the Playground, enter your search keyword, such as "GOOGL:NASDAQ".&lt;/li&gt;
&lt;li&gt;Set other parameters, such as the query term, language, time, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;You can also check the &lt;a href="https://apidocs.scrapeless.com/doc-873763" rel="noopener noreferrer"&gt;official API documentation&lt;/a&gt; of Scrapeless to learn about the Google Finance parameters.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2htt255r1pbezvdt1vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2htt255r1pbezvdt1vm.png" alt="Set search parameters" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Perform a search
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click the "&lt;strong&gt;Start Search&lt;/strong&gt;" button, and the Playground will send a request to the Deep SerpApi and return structured JSON data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 5: View and export data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Browse the returned JSON data to view detailed information.&lt;/li&gt;
&lt;li&gt;If necessary, you can click "&lt;strong&gt;Copy&lt;/strong&gt;" in the upper right corner to export the data in CSV or JSON format for further analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Free developer support:&lt;/strong&gt;&lt;br&gt;
Integrate Scrapeless Deep SerpApi into your AI tool, application or project (we already support Dify, and will support Langchain, Langflow, FlowiseAI and other frameworks in the future).&lt;br&gt;
Share your integration results on social media and you will get 1 to 12 months of free developer support, up to 500K usage per month.&lt;br&gt;
Seize this opportunity to improve your project and enjoy more development support! You can also contact Liam via &lt;a href="https://discord.com/invite/xBcTfGPjCQ?utm_source=official&amp;amp;utm_medium=blog&amp;amp;utm_campaign=googlefinance" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; for more details.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to integrate the Scrapeless API
&lt;/h2&gt;

&lt;p&gt;Here is the sample code for scraping Google Finance results using the Scrapeless API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import requests


class Payload:
    def __init__(self, actor, input_data):
        self.actor = actor
        self.input = input_data


def send_request():
    host = "api.scrapeless.com"
    url = f"https://{host}/api/v1/scraper/request"
    token = "YOUR-API-KEY"  # replace with the API key from your dashboard

    headers = {
        "x-api-token": token
    }

    input_data = {
        "q": "GOOG:NASDAQ",
        "window": "MAX",
        # ... add further query parameters as needed
    }

    payload = Payload("scraper.google.finance", input_data)

    json_payload = json.dumps(payload.__dict__)

    response = requests.post(url, headers=headers, data=json_payload)

    if response.status_code != 200:
        print("Error:", response.status_code, response.text)
        return

    print("body", response.text)


if __name__ == "__main__":
    send_request()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Adjust the query parameters as needed to get more precise results. For more information on API parameters, check the official Scrapeless API documentation.&lt;br&gt;
Remember to replace YOUR-API-KEY with the API key you copied earlier.&lt;/p&gt;
&lt;/blockquote&gt;
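&lt;p&gt;The sample above prints the response body as raw text. To work with it programmatically, you would typically decode it as JSON and persist it for later analysis. The helper below is a minimal sketch; it makes no assumptions about the response schema beyond it being valid JSON (consult the Scrapeless documentation for the actual field structure):&lt;/p&gt;

```python
import json


def save_response(body, path):
    """Decode an API response body as JSON and write it, pretty-printed,
    to `path` for later analysis. Returns the decoded object."""
    data = json.loads(body)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return data
```

&lt;p&gt;Inside send_request, you could call save_response(response.text, "finance.json") instead of printing, so each run leaves a file ready for downstream processing.&lt;/p&gt;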




&lt;h2&gt;
  
  
  Additional Resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.scrapeless.com/en/blog/scrape-google-news" rel="noopener noreferrer"&gt;How to Scrape Google News with Python&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.scrapeless.com/en/blog/puppeteer-cloudflare-bypass" rel="noopener noreferrer"&gt;How to Bypass Cloudflare With Puppeteer&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.scrapeless.com/en/blog/scrape-google-lens" rel="noopener noreferrer"&gt;How to Scrape Google Lens Results with Scrapeless&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In conclusion, scraping Google Finance ticker quote data in Python is a powerful technique for accessing real-time financial information. By utilizing libraries like requests and BeautifulSoup, or more advanced tools like Selenium, you can efficiently extract and analyze market data to inform your investment decisions. Remember to respect website terms of service and consider using official APIs when available for sustainable data access.&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
      <category>api</category>
    </item>
  </channel>
</rss>
