<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Usama Jamil</title>
    <description>The latest articles on DEV Community by Usama Jamil (@usamaapify).</description>
    <link>https://dev.to/usamaapify</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1093645%2F122bcda1-605c-42df-a941-cc20c5fa6a8d.jpg</url>
      <title>DEV Community: Usama Jamil</title>
      <link>https://dev.to/usamaapify</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/usamaapify"/>
    <language>en</language>
    <item>
      <title>Web scraping with Python Requests</title>
      <dc:creator>Usama Jamil</dc:creator>
      <pubDate>Wed, 24 May 2023 16:15:56 +0000</pubDate>
      <link>https://dev.to/apify/web-scraping-with-python-requests-46j5</link>
      <guid>https://dev.to/apify/web-scraping-with-python-requests-46j5</guid>
      <description>&lt;p&gt;Python has become a language widely used by developers because it's easy, efficient, and flexible. Here we're going to learn how to scrape websites with Python. We will learn a few basic concepts, but our main focus will be web scraping in Python with the Requests library.&lt;/p&gt;

&lt;p&gt;In simple words, &lt;a href="https://blog.apify.com/what-is-web-scraping-and-web-scraping-tools/" rel="noopener noreferrer"&gt;web scraping&lt;/a&gt; means getting a website's content and extracting relevant data from that content. Almost all programming languages provide support to automate this process.&lt;/p&gt;

&lt;p&gt;Likewise, Python has also libraries to make this process easier and faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why is Python used for web scraping?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;There are many reasons &lt;a href="https://blog.apify.com/why-is-python-used-for-web-scraping/" rel="noopener noreferrer"&gt;why you should use Python for web scraping&lt;/a&gt;. Here are a few of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Python has an enormous community that provides hundreds of articles and documentation for beginners. This makes things easier for beginners, and they can learn quickly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Python is renowned for its readability, making it easier to understand and write code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It has pre-defined functions, which means that we don't need to write functions. This saves time and makes the process faster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We don't have to define the data types: just define a variable and give it a value of any type.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It provides data processing and visualization tools, making data analysis and structuring easier.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/why-is-python-used-for-web-scraping/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.apify.com%2Fcontent%2Fimages%2Fsize%2Fw1200%2F2023%2F01%2FPython-web-scraping-pros-and-cons.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/why-is-python-used-for-web-scraping/" rel="noopener noreferrer" class="c-link"&gt;
          Why is Python used for web scraping? The pros and cons.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          What are the pros and cons of web scraping with Python?
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.apify.com%2Fcontent%2Fimages%2Fsize%2Fw256h256%2F2025%2F07%2Ffavicon.png" width="48" height="48"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;What are the best Python tools and libraries for web scraping?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Depending on your needs and use cases, Python has several tools and libraries for web scraping. Let's take a look at a few of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://scrapy.org/?ref=blog.apify.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Scrapy&lt;/strong&gt;&lt;/a&gt;: A web crawling framework that provides a complete set of tools for web scraping and helps to structure data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/?ref=blog.apify.com" rel="noopener noreferrer"&gt;&lt;strong&gt;BeautifulSoup&lt;/strong&gt;&lt;/a&gt;: Used for parsing HTML and XML documents. It creates a parsed tree for the web pages and allows us to extract data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.selenium.dev/?ref=blog.apify.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Selenium&lt;/strong&gt;&lt;/a&gt;: A web scraping and automation tool that supports multiple browsers like Chrome, Firefox, and Edge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://pyquery.readthedocs.io/en/latest/?ref=blog.apify.com" rel="noopener noreferrer"&gt;&lt;strong&gt;PyQuery&lt;/strong&gt;&lt;/a&gt;: A Python library that uses a jQuery-like syntax. It uses the ElementTree Python API, allowing us to manipulate and extract data from HTML documents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://newspaper.readthedocs.io/en/latest/?ref=blog.apify.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Newspaper3k&lt;/strong&gt;&lt;/a&gt;: Explicitly designed to scrape news websites. It's built on top of the libraries like BeautifulSoup and lxml. It automatically detects and extracts news content, and it can handle pagination as well.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://requests.readthedocs.io/en/latest/?ref=blog.apify.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Requests&lt;/strong&gt;&lt;/a&gt;: A Python library that allows us to make HTTP requests to get and manipulate web pages.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, our main focus will be the Requests library. So, let's start our journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Getting started with Python Requests&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The Python Requests library has made HTTP requests very simple because it's very lightweight and efficient, making it a great choice for web scraping.&lt;/p&gt;

&lt;p&gt;It's the most downloaded Python package, and there's a good reason for that. It does a really good job of taking on tasks that used to be complicated and confusing. That's why the tagline for the Requests library is &lt;strong&gt;HTTP for humans&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to set up Python Requests?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's install the library so we can use it. Open up your terminal or command line and enter the following command to create a new directory and a Python file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;pythonRequests
&lt;span class="nb"&gt;cd &lt;/span&gt;pythonRequests
&lt;span class="nb"&gt;touch &lt;/span&gt;main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We'll use &lt;code&gt;pip3&lt;/code&gt; to install Requests:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This should be a fairly quick download. Once the command is executed, you can confirm the installation by simply running the following command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3 show requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This should print the name of the library with other information. Once we're done with the installation, we're ready to make our HTTP requests.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 You need to be careful while installing packages through &lt;code&gt;pip&lt;/code&gt;. If you're using &lt;strong&gt;Python 3&lt;/strong&gt; or above, you need to use &lt;code&gt;pip3&lt;/code&gt; with that. Otherwise, it will create issues for you.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to send HTTP requests with Requests?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With the Requests library, you can not only get information from different websites but also send information to the pages, download images, and perform many other tasks. Let's see what this library offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GET&lt;/strong&gt; : The GET method allows us to get information from a website.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;POST&lt;/strong&gt; : The POST method will enable us to send information back to the server, like submitting forms.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PUT&lt;/strong&gt; : The PUT method is used to update the existing data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DELETE&lt;/strong&gt; : This is simply used to delete data from the server.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These methods work just like having a discussion with the server. In return for these methods, we also get a response from the server. This response comes with different codes called &lt;strong&gt;status codes&lt;/strong&gt;. For example, the server returns a &lt;strong&gt;200&lt;/strong&gt; when it has that particular requested thing or file; &lt;strong&gt;404&lt;/strong&gt; means it has nothing related to the request, and so on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftut5vsjoyo4wtw4lzlss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftut5vsjoyo4wtw4lzlss.png" alt="How to send HTTP Requests" width="332" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now comes the point where you make HTTP requests. Simply create a Request object using the appropriate method to make a request. Let's say you simply want to get the data from a website; use the &lt;code&gt;GET&lt;/code&gt; method for this.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.lipsum.com/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a GET request to the &lt;a href="https://www.lipsum.com/?ref=blog.apify.com" rel="noopener noreferrer"&gt;Lorem Ipsum&lt;/a&gt; website and prints the response status code and content.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 One thing that needs to be mentioned here is that the Requests library is a great tool for making HTTP requests, but it's not meant to parse the HTML pages and get information. If you want to do that, you need to use it with other libraries, like Beautiful Soup.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Handling HTTP responses with Requests&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once you've got the response, you can apply different methods to it; a few of those methods are below:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;response.status_code&lt;/code&gt;: This method returns the status code of the response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;response.content&lt;/code&gt;: The content method is used to get the content in bytes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;response.text&lt;/code&gt;: This is used to get the content in the form of Unicode text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;response.json()&lt;/code&gt;: This method is useful when the response is in the JSON format. It returns data as a dictionary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;response.headers&lt;/code&gt;: This method gives us the headers having information like the type of data, length, and encoding; it's the meta-data - &lt;em&gt;information about the information&lt;/em&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 The &lt;code&gt;response.content&lt;/code&gt; and &lt;code&gt;response.text&lt;/code&gt; extract data in different formats. The &lt;code&gt;response.content&lt;/code&gt; is useful when dealing with images or pdf. The &lt;code&gt;response.text&lt;/code&gt; helps us to extract data in the form of HTML.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's make a request and see whether it's successful.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.wikipedia.org&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s a successful Get requested&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The server returned the Status code: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a &lt;code&gt;GET&lt;/code&gt; request to the website. It returns " &lt;strong&gt;It's a successful Get requested&lt;/strong&gt;" if the response is successful. Otherwise, it prints " &lt;strong&gt;The server returned the Status code:&lt;/strong&gt;" with the status code.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;How the Requests library handles different types of data&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;While extracting data from websites, we may interact with different types of data. Requests specifically provides built-in methods to handle different data types. Here are a few of those with examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTML&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After making a request to a website, if the server returns the HTML of the webpage, this is how you can handle it:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.wikipedia.org&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;htmlResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;htmlResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The code sends the &lt;code&gt;GET&lt;/code&gt; request to the &lt;a href="https://www.wikipedia.org/?ref=blog.apify.com" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt; website and converts the response into HTML.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;JSON&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's say you're extracting a dataset in JSON: the Requests library provides a function that automatically converts the data to a Python dictionary.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://jsonplaceholder.typicode.com/posts/1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;jsonResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jsonResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The code sends the &lt;code&gt;GET&lt;/code&gt; request to the &lt;a href="https://jsonplaceholder.typicode.com/posts/1?ref=blog.apify.com" rel="noopener noreferrer"&gt;jsonplaceholder&lt;/a&gt; website and returns the data in the form of a Python dictionary.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Binary data&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Requests library provides support to handle binary data or the content in bytes, like images or pdf files. Let's say you want to download the logo of Google:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.google.com/images/branding/googlelogo/2x/googlelogo_light_color_272x92dp.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;googleLogo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logo.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;googleLogo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a GET request to the provided link and returns the response in bytes. The &lt;code&gt;open()&lt;/code&gt; method is used here to write data in the &lt;code&gt;wb&lt;/code&gt; (Write Binary) mode that opens a file &lt;code&gt;logo.png&lt;/code&gt; and writes the data in that file.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 The &lt;code&gt;open()&lt;/code&gt; method opens the file provided as an argument. If the file is not present, it creates one with the same name and writes the data in it.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-with-beautiful-soup/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.apify.com%2Fcontent%2Fimages%2F2024%2F08%2Fweb_scraping_with_beautiful_soup_requests.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-with-beautiful-soup/" rel="noopener noreferrer" class="c-link"&gt;
          Web scraping in Python with Beautiful Soup &amp;amp; Requests
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Detailed tutorial with code examples and some handy tricks.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.apify.com%2Fcontent%2Fimages%2Fsize%2Fw256h256%2F2025%2F07%2Ffavicon.png" width="48" height="48"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;How the HTML data is parsed and extracted&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As we've already seen, the Requests library only allows us to get the data of a website, but it doesn't provide support to parse that data. So, we need another library for this. In this tutorial, we're going to get help from &lt;a href="https://blog.apify.com/web-scraping-with-beautiful-soup/" rel="noopener noreferrer"&gt;Beautiful Soup&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To install Beautiful Soup in your project, just enter the following command on the command line and press Enter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;beautifulsoup4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now you're ready to parse and extract data from any website. Let's say you want all the blog titles from the &lt;a href="https://blog.apify.com/" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; website.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fga6ch0acmlu4240hq3e2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fga6ch0acmlu4240hq3e2.png" alt="How HTML data is parsed and extracted: Apify blog" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The code for it looks like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://blog.apify.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Create an object of BeautifulSoup by parsing the content
&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Find all the elements with the class "post-title" and "h2" heading
&lt;/span&gt;&lt;span class="n"&gt;postTitles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post-title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Loop through the list of all the post_titles
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;postTitle&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;postTitles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;postTitle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a &lt;code&gt;GET&lt;/code&gt; request to the &lt;a href="https://blog.apify.com/" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; website, parses the HTML content with Beautiful Soup, and finds all blog titles.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;What are the challenges of web scraping with Python Requests?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With every tool comes functionalities and limitations. Python Requests has limitations as well. Let's go through a few of these.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;All the requests are &lt;strong&gt;synchronous&lt;/strong&gt; , which means every request will block the execution of the program. It can cause issues while making a large number of requests. To avoid this, you can use &lt;code&gt;asyncio&lt;/code&gt; and &lt;code&gt;gevent&lt;/code&gt; Python libraries. These libraries allow you to make &lt;strong&gt;asynchronous&lt;/strong&gt; requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Python Requests library is only designed to make &lt;strong&gt;HTTP&lt;/strong&gt; or &lt;strong&gt;HTTPS&lt;/strong&gt; requests. It doesn't provide support for non-HTTP protocols. To make non-HTTP requests like &lt;strong&gt;FTP&lt;/strong&gt; or &lt;strong&gt;SSH&lt;/strong&gt; , you need to use other Python libraries like &lt;code&gt;ftplib&lt;/code&gt; or &lt;code&gt;paramiko&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It loads the entire response to the memory. This can cause problems when dealing with &lt;strong&gt;large-sized&lt;/strong&gt; files. To download files in chunks, you can pass the &lt;code&gt;stream&lt;/code&gt; parameter or use other libraries like &lt;code&gt;wget&lt;/code&gt; or &lt;code&gt;urllib3&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It doesn't handle &lt;strong&gt;retries&lt;/strong&gt; automatically. Try using the &lt;code&gt;requests-retry&lt;/code&gt; library, which provides a retry decorator for the Requests library.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Advanced web scraping techniques&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Python Requests provides some advanced features like handling cookies, authentication, session management, etc. They can be used to avoid blocking and improve scraping efficiency.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to handle Cookies with Python Requests?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A cookie is a small piece of text sent from the website to the user. It is stored in the user's browser to remember information like products in the cart and login information. So, next time you add a product to the cart or sign in somewhere and accidentally close the browser window, you open the website and find the same state. That's done through cookies. They maintain the state of a website. They're also used for tracking users' behavior, such as which pages they visit and how long they stayed on the site.&lt;/p&gt;

&lt;p&gt;Through Python Requests, we have the control to access and add cookies. Let's see how we can access cookies:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="c1"&gt;# send the GET request
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://stackoverflow.com/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Get the cookies from the response
&lt;/span&gt;&lt;span class="n"&gt;cookies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cookies&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a &lt;code&gt;GET&lt;/code&gt; request to the &lt;a href="https://stackoverflow.com/?ref=blog.apify.com" rel="noopener noreferrer"&gt;StackOverflow&lt;/a&gt; website and gets the cookies from it.&lt;/p&gt;

&lt;p&gt;Now, say we want to send cookies to the &lt;a href="http://httpbin.org/cookies?ref=blog.apify.com" rel="noopener noreferrer"&gt;httpbin&lt;/a&gt; website. The &lt;code&gt;cookies&lt;/code&gt; parameter in the request allows us to send cookies back to the server.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="c1"&gt;# Create a dictionary of cookies
&lt;/span&gt;&lt;span class="n"&gt;cookies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exampleCookie&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;123456789&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://httpbin.org/cookies&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cookies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a &lt;code&gt;Get&lt;/code&gt; a request and &lt;code&gt;cookies&lt;/code&gt; as a parameter. The server will get the cookies from the request and process them according to its own implementation.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to authenticate using Python Requests?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To authenticate websites, Requests provides automatic authentication support. We just need to give the names of the fields with the correct credentials, and BOOM! It automatically finds the form, fills in the fields, and presses the sign-in button. It's so easy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jh34jp1iqr4chwt6iog.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jh34jp1iqr4chwt6iog.png" alt="How to authenticate using Python Requests: filling in a form" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's how it's done:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="c1"&gt;# Login credentials
&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yourEmail&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yourPassword&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://newsapi.org/login&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;homePage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Authentication Successful!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Authentication failed!&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a &lt;strong&gt;POST&lt;/strong&gt; request to the news api website with the authentication credentials. It stores the HTML of the homepage in the &lt;code&gt;homePage&lt;/code&gt; variable and prints " &lt;strong&gt;Authentication Successful!"&lt;/strong&gt; if the authentication is successful; otherwise, it prints " &lt;strong&gt;Authentication failed!"&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 You can still post your credentials by using the &lt;code&gt;GET&lt;/code&gt; request and including it in the query string. This is generally not recommended because the query string can be visible to third parties and may be cached by intermediaries such as proxies or caching servers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How does session management work in Requests?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A session can be used to store users' information on the server throughout their interaction with the website.&lt;/p&gt;

&lt;p&gt;Instead of storing the changing information of the user through cookies in the browser, the server gives the browser a unique id and stores some temporary variables. Every time that particular browser makes a request, a server receives the id and retrieves the variables.&lt;/p&gt;

&lt;p&gt;In the context of web scraping, sometimes we need to authenticate a website and navigate to different routes. Sessions are used to maintain the performance and stateful information between those pages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw1pnffy85zyrmdnj1wg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faw1pnffy85zyrmdnj1wg.png" alt="Session management with Python Requests" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's an example in which we'll first log in and then try to access the subscription page.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 To make this example executable and to get the desired output, you need to provide the same credentials you provided in the above example. If you dont have account, you can create one for free.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We'll also extract the &lt;strong&gt;current plan&lt;/strong&gt; from the subscription page.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="c1"&gt;# create session object
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# Login credentials
&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yourEmail&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;yourPassword&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://newsapi.org/login&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Show an error if the status_code is not 200
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Login failed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;subscriptionResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://newsapi.org/account/manage-subscription&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscriptionResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;subscriptionPlan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mb2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;#If the subscriptionPlan is not found
&lt;/span&gt;  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;subscriptionPlan&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscriptionPlan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to find plan element.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to use proxies with Python Requests?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When trying to make multiple HTTP requests to a website through a script, the website may detect it and block us from making more requests. So, in order to avoid this, we can use different proxies. With multiple proxies, the website will think that the request is coming from different sources and won't limit the usage.&lt;/p&gt;

&lt;p&gt;Here's an example of using proxies with Requests:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Different proxies
&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://103.69.108.78:8191&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://198.199.86.11:8080&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http://200.105.215.22:33630&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Make four GET requests
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://httpbin.org&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;proxies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Print **Request successful** if the status code is **200**
&lt;/span&gt;  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request successful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Print **Request failed** otherwise
&lt;/span&gt;  &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends four &lt;code&gt;GET&lt;/code&gt; requests to the &lt;a href="https://httpbin.org/?ref=blog.apify.com" rel="noopener noreferrer"&gt;HTTP&lt;/a&gt; website using multiple proxies and prints the message accordingly. You can get different proxies from this &lt;a href="https://free-proxy-list.net/?ref=blog.apify.com" rel="noopener noreferrer"&gt;Free Proxy List&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;User-agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Modern-day websites can easily detect the requests made through any script. To avoid detection, we can add user agents in the headers of our request to make it look like a real browser request. It's a very important technique that has been used to avoid blocking.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;User-Agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://apify.com/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request successful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Print Request failed otherwise
&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends a &lt;code&gt;GET&lt;/code&gt; request to the &lt;a href="https://httpbin.org/?ref=blog.apify.com" rel="noopener noreferrer"&gt;HTTP&lt;/a&gt; website using the user-agents and prints a message according to the response. &lt;a href="https://generate-name.net/user-agent?ref=blog.apify.com" rel="noopener noreferrer"&gt;Generate User-agents Online&lt;/a&gt; allows you to generate user-agents as well.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Throttling and rate limiting&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Throttling and rate limiting are very important techniques that allow us to limit the number of requests and produce a delay between multiple requests. If we send requests rapidly, the website may detect our bot and block us. So, to avoid this thing, we can use these techniques:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="c1"&gt;# Import the time library to add delays between the requests
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="c1"&gt;# Make 10 requests
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://blog.apify.com/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Add a 1-second delay between requests
&lt;/span&gt;    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request successful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Print Request failed otherwise
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code sends 10 &lt;code&gt;GET&lt;/code&gt; requests to &lt;a href="https://blog.apify.com/" rel="noopener noreferrer"&gt;Apify&lt;/a&gt; with a 1-second delay between each request.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Testing and debugging web scraping with Requests&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The process of web scraping can lead to errors and technical issues. It's good to test the code before deployment to avoid potential errors, but every beginner may get stuck on a few common problems:&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;What are common errors in web scraping?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here are a few errors that can affect our scraping scripts.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;400 Bad Request&lt;/strong&gt; : This status code means that something is wrong on the user end that the server can't understand. For example, invalid request message framing, malformed request syntax, or deceptive request routing. We can confirm whether or not the URL or the headers are correct.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;401 Unauthorized&lt;/strong&gt; : This error occurs when we try to access a resource that requires some authentication. So, if we're using some credentials, we need to double-check them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;403 Forbidden&lt;/strong&gt; : This error occurs when the server denies access to the user or the script attempting to access the website. We'll see this error many times when scraping. We can use user-agents, request headers, or rotate proxies to avoid this error.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;404 Not Found&lt;/strong&gt; : This error occurs when we try to access a resource that is not available or may have been deleted. This error also occurs when the URL is incorrect.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;500 Internal Server Error&lt;/strong&gt; : This error occurs when the server encounters an unexpected condition that prevents it from fulfilling the request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;501 Not Implemented&lt;/strong&gt; : This error indicates that the server does not support the functionality that we've asked for in the request.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's very helpful to have some knowledge about these common errors that may halt our scraping process. Why not begin with our article on &lt;a href="https://blog.apify.com/web-scraping-how-to-solve-403-errors/" rel="noopener noreferrer"&gt;how to solve 403 errors?&lt;/a&gt;&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-how-to-solve-403-errors/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.apify.com%2Fcontent%2Fimages%2F2024%2F06%2Fweb_scraping_how_to_solve_403_errors.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-how-to-solve-403-errors/" rel="noopener noreferrer" class="c-link"&gt;
          Web scraping: how to solve 403 errors
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          403 Forbidden error keeps reappearing? Try our workarounds.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.apify.com%2Fcontent%2Fimages%2Fsize%2Fw256h256%2F2025%2F07%2Ffavicon.png" width="48" height="48"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;You can also read about other Python web scraping libraries we mentioned here below.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Further reading&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you want to learn more about web scraping tools and libraries for Python, here are some useful resources that might help you:&lt;/p&gt;

&lt;p&gt;🔖 &lt;a href="https://blog.apify.com/web-scraping-with-scrapy/" rel="noopener noreferrer"&gt;&lt;strong&gt;Web scraping with Scrapy&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔖 &lt;a href="https://blog.apify.com/alternatives-scrapy-web-scraping/" rel="noopener noreferrer"&gt;&lt;strong&gt;Alternatives to Scrapy&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔖 &lt;a href="https://blog.apify.com/web-scraping-with-beautiful-soup/" rel="noopener noreferrer"&gt;&lt;strong&gt;Web scraping with Beautiful Soup&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔖 &lt;a href="https://blog.apify.com/beautiful-soup-vs-scrapy-web-scraping/" rel="noopener noreferrer"&gt;&lt;strong&gt;Beautiful Soup vs. Scrapy&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔖 &lt;a href="https://blog.apify.com/web-scraping-with-selenium-and-python/" rel="noopener noreferrer"&gt;&lt;strong&gt;Web scraping with Selenium&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔖 &lt;a href="https://blog.apify.com/how-to-scrape-the-web-with-playwright-ece1ced75f73/" rel="noopener noreferrer"&gt;&lt;strong&gt;Web scraping with Playwright&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>requests</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Web scraping with Cheerio in 2023</title>
      <dc:creator>Usama Jamil</dc:creator>
      <pubDate>Fri, 21 Apr 2023 13:59:22 +0000</pubDate>
      <link>https://dev.to/apify/web-scraping-with-cheerio-in-2023-k6o</link>
      <guid>https://dev.to/apify/web-scraping-with-cheerio-in-2023-k6o</guid>
      <description>&lt;p&gt;If you're new to web scraping and need help figuring out where to start, this Cheerio web scraping tutorial is for you. We'll cover each step of web scraping from basics to advanced. Let's start our journey.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Getting started with Cheerio and web scraping&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Web scraping is basically an automated process of extracting data from different websites using a bot or software. Once fetched, the data is stored in a database like an Excel sheet, JSON, or XML.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Y5tSniua--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Web-scraping.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Y5tSniua--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Web-scraping.png" alt="Web scraping" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's an outstanding method for researchers and business people who need to gather large amounts of data quickly and efficiently. For instance, an e-commerce website's owner can use web scraping to keep an eye on the pricing information of a competitor's website.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://apify.com/apify/cheerio-scraper?ref=blog.apify.com" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--Nva8sP_0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://images.apifyusercontent.com/DdDlYmU_I8ZdBAtODS2ArWssYjyOKOidJ9OS9OMCyBo/aHR0cHM6Ly9zMy5hbWF6b25hd3MuY29tL2FwaWZ5LXVwbG9hZHMtcHJvZC9vZy1pbWFnZXMvYWN0b3IvY2gyUWd0UmFkREJLUkNNQ3otWXJRdUVrb3drTkNMZGs0ajI" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://apify.com/apify/cheerio-scraper?ref=blog.apify.com" rel="noopener noreferrer" class="c-link"&gt;
          Cheerio Scraper - HTML scraping tool · Apify
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using a Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--WE9XeacI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://apify.com/img/favicon.svg" width="800" height="800"&gt;
        apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;What is Cheerio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now, what is Cheerio all about? Well, Cheerio is JavaScript technology used for web scraping in server-side implementations, and it's designed explicitly for Node.js. It's a lightweight library that allows you to crawl web pages and extract data using CSS-style selectors. Cheerio can load HTML as a string and returns an object to extract data using its built-in methods.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-with-axios-and-cheerio/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--4hSRsNJE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/01/nodejs_axios_cheerio.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-with-axios-and-cheerio/" rel="noopener noreferrer" class="c-link"&gt;
          Web scraping in Node.js with Axios and Cheerio
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Using Axios and Cheerio in Node.js. With code examples.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_zdUqT4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/size/w256h256/2021/03/favicon-128x128.png" width="128" height="128"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;One of the best things about Cheerio is that it runs directly in Node.js without a browser, making it faster and more efficient than other web scraping tools. It was first released in 2010 by Matt Mueller and has gained significant popularity among developers due to its versatility and ease of use.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why use Cheerio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's explore the reasons to use Cheerio:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Its jQuery-like syntax makes HTML manipulation and data extraction easier for developers familiar with jQuery.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It's super easy to set up, even for beginner developers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Its modularity allows us to extend it with Node modules and customize it according to our needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The best thing about Cheerio is it doesn't require any browser for data extraction. Because it's designed explicitly for Node.js, we can use it on the server side without any worries.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's enough for an introduction to web scraping and Cheerio. Let's get our hands dirty and jump into the environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Prerequisites&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before diving into the code, there are a few prerequisites for learning Cheerio.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An IDE installed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Basic knowledge of JavaScript and Node.js&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Basic knowledge of Devtools&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Setting up the Cheerio environment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now we need to set up our development environment. The good news is that it's super easy!&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/how-to-install-nodejs/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--geWLtXnQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/01/install-nodejs-best-way.jpg" height="491" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/how-to-install-nodejs/" rel="noopener noreferrer" class="c-link"&gt;
          How to install Node.js properly in 2023
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Skip nvm and use fnm - here's how to do it right.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_zdUqT4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/size/w256h256/2021/03/favicon-128x128.png" width="128" height="128"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;First, we'll need to install Node.js on our local machine. For that, head to the &lt;a href="https://nodejs.org/en/download?ref=blog.apify.com"&gt;Node.js website&lt;/a&gt; and download the appropriate installer for your operating system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--znsGWsQV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Node.js-website.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--znsGWsQV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Node.js-website.png" alt="Node.js website" width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Node.js website&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once Node.js is installed, we'll use a package manager called &lt;code&gt;npm&lt;/code&gt; to install and manage all the third-party libraries in Node.js. Luckily, &lt;code&gt;npm&lt;/code&gt; comes bundled with Node.js, so we don't need to install anything extra.&lt;/p&gt;

&lt;p&gt;We've set up the Node js and &lt;code&gt;npm&lt;/code&gt;. We can create a new Node js project using the command line interface. From there, we can add Cheerio js as a dependency and start using it to extract data and manipulate HTML easily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;webscraper         
&lt;span class="nb"&gt;cd &lt;/span&gt;webscraper            
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;               
npm &lt;span class="nb"&gt;install &lt;/span&gt;cheerio

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We can use simple commands to ensure we have Node.js and Cheerio installed correctly.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt; node &lt;span class="nt"&gt;-v&lt;/span&gt;

 npm list cheerio

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can review the installation process and fix the issue if anything goes wrong, such as downloading the wrong Node.js installer, network connectivity, missing dependencies, etc.&lt;/p&gt;

&lt;p&gt;Next, we need to specify the module format. We will use &lt;strong&gt;ES modules&lt;/strong&gt; in this tutorial because it allows us to use modern JavaScript features, such as &lt;strong&gt;await&lt;/strong&gt; , that will be used in the later section of this tutorial. To specify the format, we'll go to the &lt;code&gt;package.json&lt;/code&gt; and add the following field:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;module&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's do that:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2KZ6rOP9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Web-scraping-with-Cheerio-package.json-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2KZ6rOP9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Web-scraping-with-Cheerio-package.json-1.png" alt="module in package.json" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;package.json&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We've completed our installations and made the environment ready. Now let's learn about the Cheerio API.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;The Cheerio API&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Cheerio API refers specifically to the set of methods and functions provided by the &lt;a href="https://apify.com/apify/cheerio-scraper?ref=blog.apify.com"&gt;Cheerio library&lt;/a&gt;. These methods help you to extract and modify HTML very quickly.&lt;/p&gt;

&lt;p&gt;The selector API in Cheerio can be accessed through the &lt;code&gt;$()&lt;/code&gt; method, which has the following structure:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$(selector, [context], [root])&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It takes three arguments: the first is compulsory, and the other two are optional.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;selector&lt;/code&gt; argument specifies the elements you want to select from the HTML document. It can be a string, DOM element, array of elements, or Cheerio objects.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The optional &lt;code&gt;context&lt;/code&gt; argument is actually the scope of where to begin looking for the target elements. It can be a string, DOM element, array of elements, or Cheerio objects. If no context is specified, it searches the entire document.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The optional &lt;code&gt;root&lt;/code&gt; argument is usually the markup string you want to traverse or manipulate. It can be used to specify a different root element for the selected elements.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the elements have been selected, we can use other functions to manipulate them. For example, use the &lt;code&gt;attr()&lt;/code&gt; function to get or set the value of an attribute or the &lt;code&gt;text()&lt;/code&gt; function to get or set the text content of an element.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;How to Scrape web pages with Cheerio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now it's time to dive into some practical examples of using Cheerio for web scraping and HTML parsing. So, for that, create an &lt;code&gt;index.js&lt;/code&gt; file in your directory or by using the command line.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;touch &lt;/span&gt;index.js

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;First, we'll need to load the HTML or XML document we want to parse using the &lt;code&gt;load&lt;/code&gt; function.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Load the HTML&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We can load the HTML using the &lt;code&gt;load&lt;/code&gt; function. This function takes a string containing the HTML as an argument and returns an object.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// import the library&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//Load the HTML string.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  &amp;lt;body&amp;gt;
    &amp;lt;h1&amp;gt;Hello from Cheerio&amp;lt;/h1&amp;gt;
  &amp;lt;/body&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;📔 &lt;strong&gt;Note:&lt;/strong&gt; The resulting object is assigned to the variable &lt;code&gt;$&lt;/code&gt;, which is a common convention used to refer to jQuery objects in JavaScript.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After executing this line of code, we can manipulate the HTML by calling methods on the &lt;code&gt;$&lt;/code&gt; object provided by Cheerio.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Cheerio selectors&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cheerio makes it easy to select elements using CSS-style selectors. It allows us to select elements based on &lt;code&gt;tag&lt;/code&gt;, &lt;code&gt;class&lt;/code&gt;, and &lt;code&gt;attribute&lt;/code&gt; values.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Select elements with a&lt;/strong&gt; &lt;code&gt;tag&lt;/code&gt; name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;tagName&lt;/code&gt; selector allows us to select elements with a specific tag name. For example, to select all &lt;code&gt;h3&lt;/code&gt; tags in the document, we can use the selector like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    &amp;lt;body&amp;gt;
       &amp;lt;h3&amp;gt;I'm learning Cheerio&amp;lt;/h3&amp;gt;
       &amp;lt;h3&amp;gt;It's super easy&amp;lt;/h3&amp;gt;
    &amp;lt;/body&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$divItems&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;h3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$divItems&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;//Output: I'm learning Cheerio It's super easy&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The selector &lt;code&gt;'h3'&lt;/code&gt; gets all the &lt;code&gt;&amp;lt;h3&amp;gt;&lt;/code&gt; elements from the document and returns a Cheerio object stored in the &lt;code&gt;$divItems&lt;/code&gt; object.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📔 &lt;strong&gt;Note:&lt;/strong&gt; The &lt;code&gt;.text()&lt;/code&gt; method in Cheerio is used to extract the text content of an HTML element. It's used on a Cheerio object representing a single element or a collection of elements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Select elements with a&lt;/strong&gt; &lt;code&gt;class&lt;/code&gt; name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can also select elements with their class names using the &lt;code&gt;'.className'&lt;/code&gt; selector. Let's take an example to get all the elements with a &lt;code&gt;class&lt;/code&gt; name &lt;code&gt;classSelector&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;` 
    &amp;lt;body&amp;gt;
        &amp;lt;h3 class="classSelector"&amp;gt; Learning platforms: &amp;lt;/h3&amp;gt;
        &amp;lt;ul&amp;gt;
            &amp;lt;li class="classSelector"&amp;gt;Apify&amp;lt;/li&amp;gt;
            &amp;lt;li&amp;gt;Coursera&amp;lt;/li&amp;gt;
            &amp;lt;li class="classSelector"&amp;gt;Udemy&amp;lt;/li&amp;gt;
            &amp;lt;li&amp;gt;Freecodecamp&amp;lt;/li&amp;gt;
        &amp;lt;/ul&amp;gt;
    &amp;lt;/body&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$selection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.classSelector&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Select the classSelector class&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$selection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;//Output: Learning platforms: Apify classSelector&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Select elements with an attribute value&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can select an element with its attribute using the &lt;code&gt;'[atrribute]'&lt;/code&gt; selector. The value of the attribute filters out attributes further.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    &amp;lt;body&amp;gt;
      &amp;lt;h3&amp;gt;Terms and Conditions: &amp;lt;/h3&amp;gt;
      &amp;lt;form&amp;gt; 
        &amp;lt;button name="Accept" type="submit"&amp;gt;Accept&amp;lt;/button&amp;gt;
        &amp;lt;button name="Reject" type="submit"&amp;gt;Reject&amp;lt;/button&amp;gt;      
      &amp;lt;/form&amp;gt;
    &amp;lt;/body&amp;gt;
  `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$selectedElements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[name]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Selects both the buttons&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$selectedElements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;📔 &lt;strong&gt;Note:&lt;/strong&gt; To select the first element, we can specify the value of the attribute like &lt;code&gt;$('[name=Accept]')&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
The attribute selectors can be used with any HTML attribute, not just &lt;code&gt;data-*&lt;/code&gt; attributes. For example, &lt;code&gt;$('img[alt="example"]')&lt;/code&gt; will select all &lt;code&gt;img&lt;/code&gt; elements with an &lt;code&gt;alt&lt;/code&gt; attribute having a value of &lt;code&gt;"example&lt;/code&gt;".&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Selecting elements with a combination of Cheerio selectors&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Combining Cheerio selectors is an effective way to select specific elements from an HTML. Here are a few examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting an element with a particular tag and class:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;h3 .class &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Selecting all the elements that match a selector:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ul &amp;gt; li&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Selecting the next sibling element:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;p + ul&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Selecting all elements that match multiple selectors:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;h1, h2, h3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Selecting elements based on their position in the document:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;li:nth-child(odd)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;//This selects all odd-numbered li elements in the document.&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;These are just a few examples of how selectors can be combined to select elements that match multiple criteria. By using various combinations of selectors, we can select very specific elements within an HTML document.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Traversing the DOM&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cheerio provides methods for navigating in any direction of the selected elements. For example, find the child or parent of any selected element. Let's discuss them one by one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;find&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;find&lt;/code&gt; method helps us to further filter out the selected elements based on any selector. It takes a selector as an argument and returns a new group of elements based on that.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  &amp;lt;div class="parent"&amp;gt;
    &amp;lt;p&amp;gt;Hello, world!&amp;lt;/p&amp;gt;
    &amp;lt;span&amp;gt;How are you?&amp;lt;/span&amp;gt;
    &amp;lt;p class="day"&amp;gt;Have a nice day!&amp;lt;/p&amp;gt;
  &amp;lt;/div&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;welcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.parent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.day&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;welcome&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="c1"&gt;//Output : Have a nice day!&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In the example above, first, we selected the element with the &lt;code&gt;parent&lt;/code&gt; class, and then from that element, we found an element with a class name &lt;code&gt;.day&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;children&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;children&lt;/code&gt; method allows us to select the children of any selected element. Let's take an example.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  &amp;lt;div class="parent"&amp;gt;
    &amp;lt;p&amp;gt;Hello, world!&amp;lt;/p&amp;gt;
    &amp;lt;span&amp;gt;How are you?&amp;lt;/span&amp;gt;
    &amp;lt;p class="day"&amp;gt;Have a nice day!&amp;lt;/p&amp;gt;
    &amp;lt;div&amp;gt;
      &amp;lt;span class="day"&amp;gt;
          Granchildren of ".parent" class
        &amp;lt;/span&amp;gt;
      &amp;lt;/div&amp;gt;
  &amp;lt;/div&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;welcome&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.parent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;span&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;welcome&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="c1"&gt;//Output: How are you?&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If there's no argument passed to the &lt;code&gt;children&lt;/code&gt; function, it will return all the child elements.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📔 &lt;strong&gt;Note&lt;/strong&gt; : &lt;code&gt;find&lt;/code&gt; searches for matching descendant elements at any level below the selected element, whereas &lt;code&gt;children&lt;/code&gt; only looks for direct child elements of the selected element.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3WH6y60J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Traversing-functions.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3WH6y60J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Traversing-functions.png" alt="Traversing functions" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Traversing functions&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There are many other methods, but we'll not go into all the details. Let's just have a quick look at these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;contents&lt;/code&gt;: This method works just like &lt;code&gt;children&lt;/code&gt;; additionally, it also selects the comments from the HTML string.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;parent&lt;/code&gt;: It gives us the parent of the selected element.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;parents&lt;/code&gt;: It gives us all the parents of the element till the root element.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;parentsUntil&lt;/code&gt;: We can specify the limit of the parents using this method and how far we want to go upwards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;closest&lt;/code&gt;: It allows us to select the nearest parent element of a specific type that matches a given Cheerio selector. For example, &lt;code&gt;$('.child').closest('div')&lt;/code&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;next&lt;/code&gt; and &lt;code&gt;prev&lt;/code&gt;: The &lt;code&gt;next&lt;/code&gt; method allows us to select the next element and the &lt;code&gt;prev&lt;/code&gt; method gives us the previous element.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are so many other methods provided by Cheerio. If you're interested in learning more, you can see the documentation &lt;a href="https://cheerio.js.org/docs/basics/traversing?ref=blog.apify.com"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to loop over elements&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If we recall, JavaScript methods like &lt;code&gt;each&lt;/code&gt;, &lt;code&gt;map&lt;/code&gt;, etc., facilitate looping over elements to perform specific operations. For example, let's look at &lt;code&gt;each&lt;/code&gt; function and see how it works.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
  &amp;lt;body&amp;gt;
      &amp;lt;h3 class="classSelector"&amp;gt; Learning platforms: &amp;lt;/h3&amp;gt;
      &amp;lt;ul&amp;gt;
          &amp;lt;li class="classSelector"&amp;gt;Apify&amp;lt;/li&amp;gt;
          &amp;lt;li&amp;gt;Coursera&amp;lt;/li&amp;gt;
          &amp;lt;li class="classSelector"&amp;gt;Udemy&amp;lt;/li&amp;gt;
          &amp;lt;li&amp;gt;Freecodecamp&amp;lt;/li&amp;gt;
      &amp;lt;/ul&amp;gt;
  &amp;lt;/body&amp;gt;`&lt;/span&gt; &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;listItems&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;li&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;listItems&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;element&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
 &lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;each&lt;/code&gt; method takes a callback function as an argument. It has two parameters: the first one is the &lt;code&gt;index&lt;/code&gt; starting from zero, and the second is the current element.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Selecting elements using regular expressions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To use regular expressions with Cheerio, we can use the &lt;code&gt;.match()&lt;/code&gt; function provided by the JavaScript. The following example finds all the email addresses that have &lt;code&gt;@&lt;/code&gt; in an HTML document:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    &amp;lt;body&amp;gt;
      &amp;lt;ul class="email"&amp;gt;
        &amp;lt;li&amp;gt;Apify@gmail.com&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;ondra@gmail.com&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;Chris&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;amanda@gmail.com&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;notAnEmail&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;notAnEmailAtAll&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;Queue&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;jamesbond@gmail.com&amp;lt;/li&amp;gt;
      &amp;lt;/ul&amp;gt;
    &amp;lt;/body&amp;gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.email li&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Get the emails&lt;/span&gt;
&lt;span class="nx"&gt;userNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// Iterating over the emails&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;regex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/@/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Expression to be matched&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Match the text of each li item with the expression&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
    &lt;span class="nx"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;//Push, if the return value is not NULL&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;📔 &lt;strong&gt;Note&lt;/strong&gt; : The .&lt;code&gt;match()&lt;/code&gt; function returns an object if it matches with an expression or returns &lt;code&gt;null&lt;/code&gt; if it doesn't. That's the reason we're using &lt;code&gt;email.input&lt;/code&gt; to get the text from the object.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Filtering elements&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With filtering, we can select only the specific elements we want and ignore the rest. Cheerio provides several methods for filtering elements within a selection, like &lt;code&gt;filter&lt;/code&gt;, &lt;code&gt;not&lt;/code&gt;, &lt;code&gt;has&lt;/code&gt;, etc. Let's go through them one by one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;filter&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;filter&lt;/code&gt; method in Cheerio works just like the &lt;code&gt;filter&lt;/code&gt; method of JavaScript. It lets us cherry-pick elements based on a specific selector. It's super handy when we're dealing with large amounts of data and want to narrow down the focus.&lt;/p&gt;

&lt;p&gt;Let's look at an example of filtering &lt;code&gt;li&lt;/code&gt; with a specific class.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
&amp;lt;ul class="birds"&amp;gt;
  &amp;lt;li class="parrot"&amp;gt;
    &amp;lt;span class="sharp"&amp;gt;Parrots&amp;lt;/span&amp;gt; are beautiful birds with &amp;lt;span class="beaks"&amp;gt;sharp beaks&amp;lt;/span&amp;gt;
  &amp;lt;/li&amp;gt;

  &amp;lt;li class="sharp"&amp;gt;
    They are superfast.
  &amp;lt;/li&amp;gt;

  &amp;lt;li class="crow"&amp;gt;
    Crows are smart
  &amp;lt;/li&amp;gt;

&amp;lt;/ul&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;$selectedElements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;li .parrot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Get a 'li' with 'parrot' class&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;$parrot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;span&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.sharp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//. Filter the span elements with a className 'sharp' &lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$parrot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;&lt;code&gt;not&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;not&lt;/code&gt; method is opposite to the &lt;code&gt;filter&lt;/code&gt; method. With this clever tool, we can easily exclude elements we don't want and focus on the important ones. It can save us much time and effort, especially when dealing with extensive data.&lt;/p&gt;

&lt;p&gt;Let's take the previous example, but we're not going to select the ones with the &lt;code&gt;sharp&lt;/code&gt; class this time.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
&amp;lt;ul class="birds"&amp;gt;
  &amp;lt;li class="parrot"&amp;gt;
    &amp;lt;span class="sharp"&amp;gt;Parrots&amp;lt;/span&amp;gt; are beautiful birds with &amp;lt;span class="beaks"&amp;gt;sharp beaks&amp;lt;/span&amp;gt;
  &amp;lt;/li&amp;gt;

  &amp;lt;li class="sharp"&amp;gt;
    They are superfast.
  &amp;lt;/li&amp;gt;

  &amp;lt;li class="crow"&amp;gt;
    Crows are smart
  &amp;lt;/li&amp;gt;

&amp;lt;/ul&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;$selectedElements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;li .parrot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// Get a 'li' with 'parrot' class&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;$parrot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;span&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.sharp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Exclude elements with a className 'sharp'&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$parrot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;// Output: sharp beaks&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;&lt;code&gt;has&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While web scraping with Cheerio, you might want to find elements that contain a specific child element, like a &lt;code&gt;span&lt;/code&gt; or an &lt;code&gt;image&lt;/code&gt;. The &lt;code&gt;has&lt;/code&gt; method works the same way. It takes a Cheerio selector as an argument and returns a child element that matches the selector.&lt;/p&gt;

&lt;p&gt;For example, we're searching for products that are on a discount.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
&amp;lt;div class="product"&amp;gt;
  &amp;lt;h2 class="name"&amp;gt;Product 1&amp;lt;/h2&amp;gt;
  &amp;lt;span class="price"&amp;gt;$10&amp;lt;/span&amp;gt;
  &amp;lt;span class="discount"&amp;gt;20% off&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
&amp;lt;div class="product"&amp;gt;
  &amp;lt;h2 class="name"&amp;gt;Product 2&amp;lt;/h2&amp;gt;
  &amp;lt;span class="price"&amp;gt;$20&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
&amp;lt;div class="product"&amp;gt;
  &amp;lt;h2 class="name"&amp;gt;Product 3&amp;lt;/h2&amp;gt;
  &amp;lt;span class="price"&amp;gt;$30&amp;lt;/span&amp;gt;
  &amp;lt;span class="discount"&amp;gt;10% off&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;script&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;discountedProducts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="c1"&gt;// search for a parent element that has a child with a &lt;/span&gt;
&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.product&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.discount&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// class name discount&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;discountedProducts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="c1"&gt;//Output : Product 1&lt;/span&gt;
&lt;span class="c1"&gt;// $10&lt;/span&gt;
&lt;span class="c1"&gt;// 20% off&lt;/span&gt;

&lt;span class="c1"&gt;// Product 3&lt;/span&gt;
&lt;span class="c1"&gt;// $30&lt;/span&gt;
&lt;span class="c1"&gt;// 10% off&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is handy when web scraping eCommerce websites and looking for discounted products. It allows you to filter out the products that have a discount quickly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;eq&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;eq&lt;/code&gt; works just like an array indexing from the selected elements. It allows us to select an element from a specific index.&lt;/p&gt;

&lt;p&gt;Let's take an example where we select the second element from &lt;code&gt;li&lt;/code&gt; elements.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    &amp;lt;ul&amp;gt;
      &amp;lt;li&amp;gt;Item 1&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;Item 2&amp;lt;/li&amp;gt;
    &amp;lt;/ul&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secondItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;li&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;secondItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;//Output: Item 2&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;📔 &lt;strong&gt;Note:&lt;/strong&gt; It doesn't throw an error if we provide the index that's out of bounds; rather it simply returns nothing. For example, if we provide &lt;code&gt;4&lt;/code&gt; to the &lt;code&gt;.eq&lt;/code&gt; function, it will return nothing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;first&lt;/code&gt; and &lt;code&gt;last&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The methods &lt;code&gt;first&lt;/code&gt; and &lt;code&gt;last&lt;/code&gt; work the same way as accessing elements from the array. The &lt;code&gt;first&lt;/code&gt; method returns the first element from the selection, and the &lt;code&gt;last&lt;/code&gt; method returns the last element.&lt;/p&gt;

&lt;p&gt;Let's take an example where we'll select the last and first element of the &lt;code&gt;ol&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`&amp;lt;ol&amp;gt;
      &amp;lt;li&amp;gt;Item 1&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;Item 2&amp;lt;/li&amp;gt;
      &amp;lt;li&amp;gt;Item 3&amp;lt;/li&amp;gt;
    &amp;lt;/ol&amp;gt;`&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;firstItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;li&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lastItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;li&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;last&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;firstItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;lastItem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="c1"&gt;//Output: Item 1 Item 3&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What would happen if the selection contained just one element? The first and last elements would be the same.&lt;/p&gt;

&lt;p&gt;So far, we've learned to extract elements from the HTML document, but what if we want the retrieved data to be structured and well formatted? Let's find out how to do that.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to extract data from HTML tags using Cheerio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We've learned to extract data from HTML documents in a simple way. However, what if the data you need to extract is large and needs to be structured? That's where Cheerio's objects come in. They allow us to extract data from the HTML in a structured way.&lt;/p&gt;

&lt;p&gt;You can specify what data you want to extract by passing the &lt;strong&gt;keys&lt;/strong&gt; and &lt;strong&gt;values&lt;/strong&gt; to the object. The &lt;strong&gt;keys&lt;/strong&gt; in the &lt;code&gt;map&lt;/code&gt; object represent the names of the properties you want to create, while the &lt;strong&gt;values&lt;/strong&gt; are the Cheerio selectors you'll use to extract the data.&lt;/p&gt;

&lt;p&gt;Here's an example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    &amp;lt;div class="book-info"&amp;gt;
        &amp;lt;h2 class="book-title"&amp;gt;The Great Gatsby&amp;lt;/h2&amp;gt;
        &amp;lt;p class="author"&amp;gt;By F. Scott Fitzgerald&amp;lt;/p&amp;gt;
        &amp;lt;p class="release-date"&amp;gt;Released: April 10, 1925&amp;lt;/p&amp;gt;
        &amp;lt;p class="price"&amp;gt;Price: $12.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;structuredData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;Book_Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.book-title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.author&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;Release_Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.release-date&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.price&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;structuredData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We're creating a JavaScript object &lt;code&gt;structuredData&lt;/code&gt; with properties &lt;code&gt;Book_Title&lt;/code&gt; , &lt;code&gt;Author&lt;/code&gt; , &lt;code&gt;Release_Data&lt;/code&gt; , and &lt;code&gt;Price&lt;/code&gt; . We're using the class selectors to select all the elements. The output will be an object with the extracted data:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;Book_Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The Great Gatsby&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;By F. Scott Fitzgerald&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;Release_Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Released: April 10, 1925&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Price: $12.99&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;What if we have multiple books and want to get all of them from the document? Let's look into that.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    &amp;lt;div class="book-info"&amp;gt;
        &amp;lt;h2 class="book-title"&amp;gt;The Great Gatsby&amp;lt;/h2&amp;gt;
        &amp;lt;p class="author"&amp;gt;By F. Scott Fitzgerald&amp;lt;/p&amp;gt;
        &amp;lt;p class="release-date"&amp;gt;Released: April 10, 1925&amp;lt;/p&amp;gt;
        &amp;lt;p class="price"&amp;gt;Price: $12.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
    &amp;lt;div class="book-info"&amp;gt;
        &amp;lt;h2 class="book-title"&amp;gt;To Kill a Mockingbird&amp;lt;/h2&amp;gt;
        &amp;lt;p class="author"&amp;gt;By Harper Lee&amp;lt;/p&amp;gt;
        &amp;lt;p class="release-date"&amp;gt;Released: July 11, 1960&amp;lt;/p&amp;gt;
        &amp;lt;p class="price"&amp;gt;Price: $10.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
    &amp;lt;div class="book-info"&amp;gt;
        &amp;lt;h2 class="book-title"&amp;gt;1984&amp;lt;/h2&amp;gt;
        &amp;lt;p class="author"&amp;gt;By George Orwell&amp;lt;/p&amp;gt;
        &amp;lt;p class="release-date"&amp;gt;Released: June 8, 1949&amp;lt;/p&amp;gt;
        &amp;lt;p class="price"&amp;gt;Price: $9.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;books&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;books1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.book-info&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;books1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;structuredData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;Book_Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.book-title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.author&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;Release_Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.release-date&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.price&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nx"&gt;books&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;structuredData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt; 
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;books&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The output of the code is an array of objects:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;Book_Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The Great Gatsby&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;By F. Scott Fitzgerald&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Release_Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Released: April 10, 1925&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Price: $12.99&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;Book_Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;To Kill a Mockingbird&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;By Harper Lee&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Release_Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Released: July 11, 1960&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Price: $10.99&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;Book_Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1984&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;By George Orwell&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Release_Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Released: June 8, 1949&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Price: $9.99&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code uses &lt;code&gt;each&lt;/code&gt; method to iterate over a collection of selected &lt;code&gt;.book-info&lt;/code&gt; elements. For each element, the code creates an object &lt;code&gt;structuredData&lt;/code&gt; that contains the extracted data. The data is extracted using the &lt;code&gt;find&lt;/code&gt; method to search for child elements with the suitable classes.&lt;/p&gt;

&lt;p&gt;Finally, the &lt;code&gt;structuredData&lt;/code&gt; object is pushed to the books array using the &lt;code&gt;push&lt;/code&gt; method. This result is an array of objects, each representing a book and containing the extracted data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8nVOUpP3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Extracted-data.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8nVOUpP3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Extracted-data.png" alt="Extracted data: JSON, CSV, XML" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to write extracted data in a file&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;What if we want to store the scraped data in our local storage? We'll use the example above to store the data in a JSON file. We can use the built-in &lt;code&gt;fs&lt;/code&gt; module in Node. It allows us to interact with the file system on a computer. Here's how you can modify the code to write the &lt;code&gt;books&lt;/code&gt; array to a JSON file:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Import the "fs" module&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    &amp;lt;div class="book-info"&amp;gt;
        &amp;lt;h2 class="book-title"&amp;gt;The Great Gatsby&amp;lt;/h2&amp;gt;
        &amp;lt;p class="author"&amp;gt;By F. Scott Fitzgerald&amp;lt;/p&amp;gt;
        &amp;lt;p class="release-date"&amp;gt;Released: April 10, 1925&amp;lt;/p&amp;gt;
        &amp;lt;p class="price"&amp;gt;Price: $12.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
    &amp;lt;div class="book-info"&amp;gt;
        &amp;lt;h2 class="book-title"&amp;gt;To Kill a Mockingbird&amp;lt;/h2&amp;gt;
        &amp;lt;p class="author"&amp;gt;By Harper Lee&amp;lt;/p&amp;gt;
        &amp;lt;p class="release-date"&amp;gt;Released: July 11, 1960&amp;lt;/p&amp;gt;
        &amp;lt;p class="price"&amp;gt;Price: $10.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
    &amp;lt;div class="book-info"&amp;gt;
        &amp;lt;h2 class="book-title"&amp;gt;1984&amp;lt;/h2&amp;gt;
        &amp;lt;p class="author"&amp;gt;By George Orwell&amp;lt;/p&amp;gt;
        &amp;lt;p class="release-date"&amp;gt;Released: June 8, 1949&amp;lt;/p&amp;gt;
        &amp;lt;p class="price"&amp;gt;Price: $9.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;books&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;books1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.book-info&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;books1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;structuredData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;Book_Title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.book-title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;Author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.author&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;Release_Data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.release-date&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;Price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;book&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.price&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nx"&gt;books&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;structuredData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;//push the object to the array&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jsonData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;books&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Convert the array to JSON format&lt;/span&gt;

&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;books.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;jsonData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;//Using the fs.writeFile , it's used to write data in a file&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Data written to file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Display "Data written to file" in the call back function.&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;📔 &lt;strong&gt;Note&lt;/strong&gt; : The fs.writeFile takes three arguments:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;books.json&lt;/strong&gt; : The file in which we will write data. If the file is already there, it will just write data in it; if it's not, it will first create the file and then write data in it.&lt;br&gt;&lt;br&gt;
&lt;code&gt;jsonData&lt;/code&gt; : Data to be written in the file.&lt;br&gt;&lt;br&gt;
&lt;code&gt;()=&amp;gt;{}&lt;/code&gt; : A callback function that will be called after the data has been written to the file.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to handle errors and exceptions with Cheerio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As students of web scraping with Cheerio, we may encounter errors and exceptions as we work with the code. Don't worry: this is a common experience for most beginners. We're exploring how to handle these errors and exceptions so that you can become a more confident and successful web scraper developer.&lt;/p&gt;

&lt;p&gt;To handle errors, we can use &lt;code&gt;try-catch&lt;/code&gt; blocks in our code. A &lt;code&gt;try&lt;/code&gt; block allows us to try a block of code when we are not sure if the code will execute or not and &lt;code&gt;catch&lt;/code&gt; any errors that might occur. If an error occurs, the &lt;code&gt;catch&lt;/code&gt; block will execute, allowing us to handle the error appropriately.&lt;/p&gt;

&lt;p&gt;Here's an example of how to use try-catch blocks with Cheerio:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This code uses a &lt;code&gt;try-catch&lt;/code&gt; block to handle errors that might occur while loading an HTML document. Overall, this code ensures that the program does not crash if an error occurs during the loading of the HTML document and provides a way to handle the error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Congratulations&lt;/strong&gt; on completing the basics of web scraping! It's been an exciting journey so far, and now we're ready to take the next step and dive into extracting the HTML of a website. But that's just the tip of the iceberg - there's so much more to learn. So let's continue this journey together and see how we can interact with actual websites.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to use Axios with Cheerio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Have you ever wondered how we get the HTML document of a web page? Axios allows us to make HTTP requests and get the HTML document of a website.&lt;/p&gt;

&lt;p&gt;To use Axios in our project, we need to install it in our environment by entering the following command:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;npm&lt;/span&gt; &lt;span class="nx"&gt;install&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If you're interested in learning more about Axios with Cheerio, go to this &lt;a href="https://blog.apify.com/web-scraping-with-axios-and-cheerio/"&gt;blog post&lt;/a&gt; and learn more.&lt;/p&gt;

&lt;p&gt;Let's use the Axios library in the following example.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Scraping multiple pages using Cheerio&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It's worth noting that there's a difference between pagination and scraping multiple pages. Pagination refers to dividing content into separate pages, often with navigation links to move between them. On the other hand, scraping multiple pages involves extracting data from many pages of a website, whether they're paginated or not.&lt;/p&gt;

&lt;p&gt;Why does this matter? When scraping paginated content, we can use the navigation links to move between pages and extract data from each page. However, when scraping multiple pages that are not paginated, we'll require a different approach to identify and extract the data we need.&lt;/p&gt;

&lt;p&gt;Let's see how we can scrape multiple pages with Cheerio.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--f-_eXMeD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Web-scraping-with-Cheerio-IMDb-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--f-_eXMeD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Web-scraping-with-Cheerio-IMDb-1.png" alt="Scraping multiple pages with Cheerio: IMDb" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we can see in the link that the website is using pagination, and each Actor is at a different link. We can use Cheerio to extract different Actors from different pages using a loop. Let's do this.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Actors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="c1"&gt;//Declare an empty actors array to store actors' data.&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// It means we will change the last digit of the url to get the data of another actor&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`https://www.imdb.com/name/nm000015&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;//Change the last digit of the data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// Add headers to axios request&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;h1 .sc-afe43def-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Get the name of the actor&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Date_of_Birth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.sc-dec7a8b-2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;// Get the Date of Birth&lt;/span&gt;
    &lt;span class="nx"&gt;Actors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Date_of_Birth&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Actors&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;But when we try to access the website to fetch data, we get a &lt;strong&gt;403 error&lt;/strong&gt; because the server has detected that the request is coming from an automated script. It's a very common &lt;a href="https://blog.apify.com/bypass-antiscraping-protections/"&gt;anti-scraping&lt;/a&gt; technique used by most modern websites. So, we need to find a way to tackle this problem. One way is to use headers in our Axios request to act like an actual browser request. In this example, we'll use a &lt;strong&gt;user agent&lt;/strong&gt; that indicates that the request is made from an actual browser rather than an automated script.&lt;/p&gt;

&lt;p&gt;Let's see how we can do this.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Actors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="c1"&gt;//Declare an empty actors array to store actors' data.&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// It means we will change the last digit of the url to get the data of another actor&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`https://www.imdb.com/name/nm000015&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;//Change the last digit of the data&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// Add headers to axios request&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;h1 .sc-afe43def-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Get the name of the actor&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Date_of_Birth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.sc-dec7a8b-2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;// Get the Date of Birth&lt;/span&gt;
    &lt;span class="nx"&gt;Actors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Date_of_Birth&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Actors&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-how-to-solve-403-errors/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--whQC5EGK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2022/12/Copy-of--LinkedIn_Social-media-images-template-3.png" height="420" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/web-scraping-how-to-solve-403-errors/" rel="noopener noreferrer" class="c-link"&gt;
          Web scraping: how to solve 403 errors
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          403 Forbidden error keeps reappearing? Try our workarounds.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_zdUqT4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/size/w256h256/2021/03/favicon-128x128.png" width="128" height="128"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Challenges of web scraping with Cheerio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Like all web scraping libraries, Cheerio has its pros and cons. We have already discussed the pros of using Cheerio. Now let's discuss what challenges we can face while working with Cheerio.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Dynamic websites and JavaScript&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's start with one of the biggest challenges of web scraping with Cheerio. It's, dealing with dynamic websites and JavaScript. Nowadays, modern websites use dynamic content and JavaScript to update or modify the page content without needing an entire page to reload. That can make it tough to scrape the website with Cheerio alone. The content may not be fully loaded or loaded asynchronously after the initial page load.&lt;/p&gt;

&lt;p&gt;When Cheerio loads an HTML page, it only sees the static HTML content. Any JavaScript or dynamic content that's loaded later is hidden from Cheerio. That can be a letdown if the website you're trying to scrape uses JavaScript to load or modify content.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Anti-scraping measures&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Do you know why some scrapers get blocked and don't work as efficiently as they should? If we're smart enough to make scrapers scrape websites, the developers are also ready to block us and stop us from extracting data from their websites. They use different anti-scraping techniques to do this. Let's discuss a few of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CAPTCHAs&lt;/strong&gt; : &lt;a href="https://blog.apify.com/why-captchas-are-bad/"&gt;CAPTCHAs&lt;/a&gt; are tests that are designed to differentiate between human users and bots. They require users to complete a task, such as typing in a sequence of letters or identifying objects in an image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;IP blocking&lt;/strong&gt; : Websites block IP addresses that are associated with web scraping. This means that if a particular IP address is identified as a scraper, the website will block all requests from that IP address.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User-agent detection&lt;/strong&gt; : User-agent detection is a technique that identifies the type of browser or device that is being used to access the website. Websites can use this information to identify scrapers, as many scrapers use non-standard user agents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dynamic web pages&lt;/strong&gt; : Some websites use &lt;a href="https://blog.apify.com/what-is-a-dynamic-page/"&gt;dynamic web pages&lt;/a&gt; that are generated using JavaScript. These pages can be more difficult to scrape, as the content is generated on the fly and may not be present in the page source.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a web scraper developer, it's important to be aware of these anti-scraping measures and to take steps to avoid them. This may include using rotating proxies or user agents, cookies, implementing delays between requests, and much more. You can learn about these measures &lt;a href="https://docs.apify.com/academy/api-scraping/general-api-scraping/cookies-headers-tokens?ref=blog.apify.com"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Performance issues&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When working with large amounts of data or complex HTML structures, using intermediate results to optimize performance is common. It means that we store the results of certain operations and reuse them later rather than re-calculating them each time. jQuery can do that to optimize performance because it's designed to run in a browser with more memory.&lt;/p&gt;

&lt;p&gt;However, Cheerio is designed to run in Node.js, which has a more limited memory than the browser. As a result, Cheerio has a hard time saving intermediate results. It can quickly run into memory issues when working with large datasets.&lt;/p&gt;

&lt;p&gt;Without the ability to save intermediate results, Cheerio has to parse the entire HTML document to perform each operation. It can be slow and resource-intensive, especially when working with large documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Congratulations&lt;/strong&gt; on completing the basics of web scraping and Cheerio! Now it's time to level up and learn a few advanced concepts of web scraping.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/crawl-without-getting-blocked/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--Licg0MnR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/01/50.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/crawl-without-getting-blocked/" rel="noopener noreferrer" class="c-link"&gt;
          Web scraping: how to crawl without getting blocked
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Guide on how to solve or avoid anti-scraping protections.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_zdUqT4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/size/w256h256/2021/03/favicon-128x128.png" width="128" height="128"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;h3&gt;
  
  
  &lt;strong&gt;Scraping websites with dynamic content&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Advanced websites load data at runtime, and hence it becomes difficult to extract data using Cheerio. That's where other libraries come in to help Cheerio. Let's take a look at these libraries and see how they help.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Puppeteer&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before we start learning Puppeteer, let's discuss where Cheerio fails. While Cheerio is an excellent tool, it fails when it comes to parsing websites that load data at runtime and need a browser to open them. Now, what Puppeteer does for us is that it helps us open the website in a browser known as a &lt;a href="https://blog.apify.com/headless-browsers-what-are-they-and-how-do-they-work/"&gt;headless browser&lt;/a&gt; that acts like a browser, but in reality, it's not. Once the website is loaded using Puppeteer, we can use Cheerio to load its HTML and do whatever we want. Pretty amazing, right?&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/puppeteer-web-scraping-tutorial/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--sjEl_Tna--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/03/How_to_scrape_web_with_puppetter_in_2023.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/puppeteer-web-scraping-tutorial/" rel="noopener noreferrer" class="c-link"&gt;
          How to scrape the web with Puppeteer in 2023
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Complete web scraping and crawling tutorial for Puppeteer.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_zdUqT4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/size/w256h256/2021/03/favicon-128x128.png" width="128" height="128"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Here are some of the core functions provided by Puppeteer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;puppeteer.launch([options])&lt;/code&gt;: This function launches an instance of Chrome or Chromium with a set of options specified in an object. It returns a Promise that resolves to an instance of the &lt;code&gt;Browser&lt;/code&gt; class.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;browser.newPage()&lt;/code&gt;: This function creates a new blank page in the browser instance. It returns a Promise that resolves to an instance of the &lt;code&gt;Page&lt;/code&gt; class.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;page.goto(URL, [options])&lt;/code&gt;: This function navigates to the specified URL. It returns a Promise that resolves when the page is loaded.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;page.content()&lt;/code&gt;: This function returns the HTML content of the current page. It returns a Promise that resolves to a string.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;browser.close()&lt;/code&gt; is a function in Puppeteer that is used to close the browser instance that was opened with &lt;code&gt;puppeteer.launch()&lt;/code&gt;. It will terminate the browser process and all of its pages.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are just some of the functions provided by Puppeteer. There are other functions available as well.&lt;/p&gt;

&lt;p&gt;To install Puppeteer in your environment, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;puppeteer

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's take a closer look at how this works with an example. We want to extract the Top 250 Movies from the IMDb website in this example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n3I6wX9i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/IMDb-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n3I6wX9i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/IMDb-1.png" alt="Extracting the top 250 movies from the IMDb website" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see from the image above, all the movies are in one body tag &lt;code&gt;&amp;lt;tbody&amp;gt;&lt;/code&gt; having a class of &lt;code&gt;lister-list&lt;/code&gt;, and each movie is the &lt;code&gt;&amp;lt;tr&amp;gt;&lt;/code&gt; .Now, let's start using puppeteer with Cheerio.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
 &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Launch the browser  &lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Open a new page &lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.imdb.com/chart/top&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//The opened page goes to the link provided in the .goto() function&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Get the content of the page using the .content() function&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;moviesTr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tbody.lister-list tr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;//Select all the table rows using the Cheerio selectors&lt;/span&gt;
  &lt;span class="nx"&gt;moviesTr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;//Iterate over all the movies one by one&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.titleColumn a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.titleColumn .secondaryInfo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.ratingColumn strong&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
   &lt;span class="nx"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;//Push the extracted data&lt;/span&gt;
   &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Playwright&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Playwright is also an open-source Node.js library that was developed by the same team that developed Puppeteer. It is a powerful and versatile &lt;a href="https://blog.apify.com/playwright-vs-puppeteer-which-is-better/"&gt;alternative to Puppeteer&lt;/a&gt;. The Puppeteer team needed a tool that could automate not just the Chromium-based browsers but other browsers like Firefox and Safari. So, they developed Playwright, which supports other browsers as well.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/how-to-scrape-the-web-with-playwright-ece1ced75f73/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--w_BPB1GY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/03/How_to_scrape_web_with_Playwright.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/how-to-scrape-the-web-with-playwright-ece1ced75f73/" rel="noopener noreferrer" class="c-link"&gt;
          How to scrape the web with Playwright in 2023
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          Complete Playwright web scraping and crawling tutorial.
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_zdUqT4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/size/w256h256/2021/03/favicon-128x128.png" width="128" height="128"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;To install Puppeteer in your environment, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;playwright

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's implement the above code with Playwright.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;chromium&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;playwright&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;//import chromium from playwright&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;chromium&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Launch the browser&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;newContext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Create a new context&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Create a new page&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.imdb.com/chart/top&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Navigate to the provided URL&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Get the page content&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;moviesTr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tbody.lister-list tr&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Select all the table rows using the Cheerio selectors&lt;/span&gt;
&lt;span class="nx"&gt;moviesTr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// Iterate over all the movies one by one&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.titleColumn a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.titleColumn .secondaryInfo&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movie&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.ratingColumn strong&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// Push the extracted data&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Close the browser&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Playwright is a more flexible and powerful automation framework than Puppeteer, which makes it an excellent alternative for advanced web automation.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How to implement authentication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;What if a website requires authentication? That's another challenge, but don't worry: we've got you covered. It's not possible to extract data from such websites without logging in. Therefore, we need to find a way to authenticate ourselves through our code to open the page from where we can extract the data.&lt;/p&gt;

&lt;p&gt;For this purpose, we need the help of the Puppeteer library, which will do the authentication task. And obviously, our well-known tool, Cheerio, will extract the data. With Puppeteer and Cheerio, we can automate the authentication process and extract the data we need.&lt;/p&gt;

&lt;p&gt;Let's look at an example to understand how to implement authentication with Cheerio and Puppeteer. We'll try to authenticate the &lt;a href="https://newsapi.org/login?ref=blog.apify.com"&gt;newsapi&lt;/a&gt; website, which looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--adn7sz61--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Login-newsapi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--adn7sz61--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Login-newsapi.png" alt="Using Puppeteer and Cheerio to authenticate the newsapi website" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's dive into the code and see how this magic happens:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;puppeteer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Launch the browser using .launch() function&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;newPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Open a new page using .newPage() function&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://newsapi.org/login&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//The opened page goes to the link provided in the .goto() function&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#Email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your_email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Enter the email and password in the respective input fields&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#Password&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;you_password&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//The await waits for the website to load these fields.&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;button[type=submit]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Click the submit button&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;waitForNavigation&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Wait for the website to finish the login process&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://newsapi.org/account&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;//Go to the home page&lt;/span&gt;
    &lt;span class="c1"&gt;//Now we can manipulate the website according to our needs&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.mb2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt; 
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;})();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to handle asynchronous requests, errors, and retries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When web scraping with Cheerio, it's important to consider errors and retries to ensure our code is reliable and robust. As we saw earlier, error handling allows us to prevent and detect possible errors and whether the error occurs while fetching the website. It would require retries, and that's what we'll cover now. We'll see how things go from requesting a website, handling errors, and trying again. We've already seen the &lt;code&gt;axios&lt;/code&gt; library for making HTTP requests. That same &lt;code&gt;axios&lt;/code&gt; library allows us to retry the HTTP requests as well.&lt;/p&gt;

&lt;p&gt;To use &lt;code&gt;axios-retry&lt;/code&gt;, first we need to install it as a dependency and import it into our file.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;axios-retry

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Let's see an example.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;axiosRetry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios-retry&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;axiosRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;retryDelay&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;retryCount&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// multiple the retry time with 2000 miliseconds&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;scrapeWebsite&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;link-to-the-website&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;scrapeWebsite&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;First, &lt;code&gt;scrapeWebsite&lt;/code&gt; will be called, and it will try to fetch the website provided in the &lt;code&gt;axios.get&lt;/code&gt; function, if the request fails, then the &lt;code&gt;axiosRetry&lt;/code&gt; function will be called. It will try to fetch the data three times, and the time between each try increases to a multiple of 2,000 milliseconds. The function will return an error if it gets nothing after three attempts.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;How to use a testing framework with Cheerio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;It's important to test your code thoroughly before it goes into production. By testing with various inputs and outputs, we can be confident that the code works correctly and meets our requirements. Deploying code without proper testing can lead to unexpected problems, which can cause delays and additional costs to the project. Therefore, testing the code is a good practice to ensure the quality and reliability of the software.&lt;/p&gt;

&lt;p&gt;Let's see how we can use a framework to test our Cheerio codes.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Test code with Jest&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Jest is a popular framework developed to test JavaScript applications. It was developed by Facebook, and one of its key features is that it can run tests in parallel, which makes it super fast and efficient. It's easy to set up and test the codes.&lt;/p&gt;

&lt;p&gt;Let's see how we can add Jest to our environment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the terminal and enter the command below:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;npm&lt;/span&gt; &lt;span class="nx"&gt;install&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;save&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;dev&lt;/span&gt; &lt;span class="nx"&gt;jest&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We have passed this &lt;code&gt;--save-dev&lt;/code&gt; optional argument to make that this dependency is only needed in the development phase.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open the package.json and replace:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;with the following:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node --experimental-vm-modules node_modules/.bin/jest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Write the tests in the .test file and enter the command below to see the results:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;npm&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to write a test with Jest&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let's use some code that we wrote earlier in this Cheerio tutorial.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;getBookInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;htmlString&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;htmlString&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;structuredData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;bookTitle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.book-title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.author&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;releaseDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.release-date&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.price&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;structuredData&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;getBookInfo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;We have a function named &lt;code&gt;getBookInfo&lt;/code&gt; that accepts an HTML string as an input and returns an object named &lt;code&gt;structuredData&lt;/code&gt;. In the end, we're exporting the function so that we can test it in our &lt;code&gt;test.js&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Let's see how well this code is working. We'll open our IDE and create a file name &lt;code&gt;structureData.test.js&lt;/code&gt; and write the following code.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;getBookInfo&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./index&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;//Import the function getBookInfo&lt;/span&gt;
&lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getBookInfo returns the correct book information&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;htmlString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
    &amp;lt;div class="book-info"&amp;gt; //HTML string that will be used for testing
      &amp;lt;h2 class="book-title"&amp;gt;The Great Gatsby&amp;lt;/h2&amp;gt;
      &amp;lt;p class="author"&amp;gt;By F. Scott Fitzgerald&amp;lt;/p&amp;gt;
      &amp;lt;p class="release-date"&amp;gt;Released: April 10, 1925&amp;lt;/p&amp;gt;
      &amp;lt;p class="price"&amp;gt;Price: $12.99&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//Declare a test having two arguments. The first argument is the description,&lt;/span&gt;
&lt;span class="c1"&gt;//The second argument is an anonymous function that will return if the test fails or not.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bookInfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;getBookInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;htmlString&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bookInfo&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;bookTitle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The Great Gatsby&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;By F. Scott Fitzgerald&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;releaseDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Released: April 10, 1925&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Price: $12.99&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;//The .expect() function takes the results as an argument&lt;/span&gt;
&lt;span class="c1"&gt;// .toEqual() compares the results with the desired outcome.&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Like this, we can write as many tests using Jest as we want for our code, like missing values, extra spaces, etc.&lt;/p&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
      &lt;div class="c-embed__cover"&gt;
        &lt;a href="https://blog.apify.com/playwright-testing-how-to-write-and-run-e2e-tests-properly/" class="c-link s:max-w-50 align-middle" rel="noopener noreferrer"&gt;
          &lt;img alt="" src="https://res.cloudinary.com/practicaldev/image/fetch/s--vgdyMkvc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/02/Playwright_testing.png" height="450" class="m-0" width="800"&gt;
        &lt;/a&gt;
      &lt;/div&gt;
    &lt;div class="c-embed__body"&gt;
      &lt;h2 class="fs-xl lh-tight"&gt;
        &lt;a href="https://blog.apify.com/playwright-testing-how-to-write-and-run-e2e-tests-properly/" rel="noopener noreferrer" class="c-link"&gt;
          Playwright testing: how to write &amp;amp; run E2E tests properly
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;p class="truncate-at-3"&gt;
          How to write and run E2E tests the proper way 💡
        &lt;/p&gt;
      &lt;div class="color-secondary fs-s flex items-center"&gt;
          &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://res.cloudinary.com/practicaldev/image/fetch/s--q_zdUqT4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/size/w256h256/2021/03/favicon-128x128.png" width="128" height="128"&gt;
        blog.apify.com
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Finding and fixing bugs in your Cheerio code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;While creating scraping scripts and manipulating the DOM, we might encounter some challenges, but we have tools and strategies to make the process easier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unit tests&lt;/strong&gt; : Testing the code before deployment and ensuring it works correctly can save us from major problems. Unit testing ensures that we get the expected results. Different technologies like Jest and Mocha are used for this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Console.log&lt;/strong&gt; : The console.log is often used for debugging programs by writing it at various positions in the code. The same is used to troubleshoot Cheerio programs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zF2ooON9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Debug-using-the-console.log-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zF2ooON9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Debug-using-the-console.log-1.png" alt="Debugging using the console.log" width="800" height="654"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Debug using the console.log&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser developer tools&lt;/strong&gt; : The browser developer tools can be used to inspect the DOM and spot problems in the code. Such tools include Chrome DevTools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HmJzlcbD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Debug-using-the-inspect-element-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HmJzlcbD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/Debug-using-the-inspect-element-1.png" alt="Debugging using the inspect element" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Debug using the inspect element&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Best practices for web scraping with Cheerio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The following approaches can be used to optimize our code and improve performance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Handling dynamic content&lt;/strong&gt; : As we mentioned earlier, dynamic content creates a lot of issues while extracting data, so it's a good practice to always keep in mind that Cheerio may not help us in this scenario. We need to use other libraries to load the HTML and perform operations using Cheerio.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Handling complex selectors&lt;/strong&gt; : It can be tricky to work on websites that use complex selectors, i.e., nested selectors. It's recommended to break down the selectors and select elements very carefully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Handling version compatibility issues&lt;/strong&gt; : Cheerio has different versions that may not be compatible with specific versions of Node.js or other libraries. Check compatibility before using Cheerio, and update to the latest version if necessary.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can visit this great &lt;a href="https://blog.apify.com/how-to-super-efficiently-scrape-any-website-for-beginners/"&gt;blog post&lt;/a&gt; to learn more about scraping websites more efficiently using Cheerio.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Alternatives to Cheerio&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's explore some of the alternatives to web scraping with Cheerio. We'll look at the pros and cons of these libraries as well.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Library&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Advantages&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Maintenance/Up-to-date?&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cheerio&lt;/td&gt;
&lt;td&gt;Lightweight, fast, and easy to use&lt;/td&gt;
&lt;td&gt;Limited functionality for dynamic web pages&lt;/td&gt;
&lt;td&gt;Maintained and up-to-date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Puppeteer&lt;/td&gt;
&lt;td&gt;Full-fledged automation tool with Chrome DevTools integration&lt;/td&gt;
&lt;td&gt;Requires more setup than Cheerio, slower, resource-intensive&lt;/td&gt;
&lt;td&gt;Maintained and up-to-date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSDOM&lt;/td&gt;
&lt;td&gt;Lightweight, allows for easy DOM manipulation&lt;/td&gt;
&lt;td&gt;Limited functionality for dynamic web pages&lt;/td&gt;
&lt;td&gt;Maintained and up-to-date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NightmareJS&lt;/td&gt;
&lt;td&gt;High-level API, supports multiple browsers&lt;/td&gt;
&lt;td&gt;Slower than Puppeteer, outdated and not maintained&lt;/td&gt;
&lt;td&gt;Outdated and not maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PhantomJS&lt;/td&gt;
&lt;td&gt;Lightweight, supports multiple browsers&lt;/td&gt;
&lt;td&gt;Outdated and not maintained&lt;/td&gt;
&lt;td&gt;Outdated and not maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Playwright&lt;/td&gt;
&lt;td&gt;Multi-browser support, faster than Puppeteer&lt;/td&gt;
&lt;td&gt;Requires more setup than Cheerio, resource-intensive&lt;/td&gt;
&lt;td&gt;Maintained and up-to-date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;node-html-parser&lt;/td&gt;
&lt;td&gt;Supports parsing of dynamic HTML, easy to use&lt;/td&gt;
&lt;td&gt;Limited functionality for web automation&lt;/td&gt;
&lt;td&gt;Maintained and up-to-date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;got-scraping&lt;/td&gt;
&lt;td&gt;Lightweight, fast, and easy to use&lt;/td&gt;
&lt;td&gt;Developed specifically to address drawbacks of modern scraping tools&lt;/td&gt;
&lt;td&gt;Maintained and up-to-date&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Let's take a peek at the downloads of these packages as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--18YCjaS7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/package-downloads.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--18YCjaS7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://blog.apify.com/content/images/2023/04/package-downloads.png" alt="Downloads of packages: cheerio, jsdom, nightmare, node-html-parser, phantomjs, playwright, puppeteer" width="800" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Cheerio is a robust and adaptable framework with an easy-to-use API for parsing and manipulating HTML. With its jQuery-like syntax, extracting data from web pages and manipulating the HTML to meet your needs is simple.&lt;/p&gt;

&lt;p&gt;Although Cheerio is an excellent scraping tool in many cases, it does have its challenges, such as anti-scraping measures, dynamic websites, and performance issues. Yet, refined approaches and tools can assist you in overcoming these challenges and achieving your web scraping objectives.&lt;/p&gt;

&lt;p&gt;As you may have observed, this blog post by no means tries to pitch Cheerio as the ultimate scraping tool. Cheerio has its fair share of shortcomings and areas to improve. Having said that, Cheerio still looks pretty promising. Also, it's always easy to switch to alternative tools after mastering it.&lt;/p&gt;

&lt;p&gt;As the saying goes, &lt;strong&gt;&lt;em&gt;Perpetuam Uitae Doctrina (Life long learning)&lt;/em&gt;&lt;/strong&gt;. Complimenting this tutorial with practice is essential. Also, a good tutorial is neither perfect nor the only source of information on the subject. So, if you have any feedback for improvement, please let us know. Happy learning!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Frequently asked questions&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is Cheerio js?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cheerio js is an easier and more efficient way to extract data in Node.js. It's a lightweight library that allows us to crawl web pages and extract data using CSS-style selectors. It enables us to load HTML as a string and returns an object that can be used for extracting data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What are the benefits of Cheerio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Cheerio's jQuery-like syntax is a big advantage for many developers. Setting it up is easy compared to other tools. Even beginners can get started with just a little bit of configuration. Cheerio is also modular and can be extended with Node.js modules. Another big advantage of Cheerio is that it requires no browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you install Cheerio in Node.js?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Simply head over to your working environment and open the terminal. Enter the command &lt;strong&gt;npm install cheerio&lt;/strong&gt;. Create a new Node js project using the command line interface. From there, add Cheerio js as a dependency and start using it to extract data and manipulate HTML easily.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you scrape data from a single web page using Cheerio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can specify what data you want to extract by passing the &lt;strong&gt;keys&lt;/strong&gt; and &lt;strong&gt;values&lt;/strong&gt; to the object. The &lt;strong&gt;keys&lt;/strong&gt; in the map object represent the names of the properties you want to create on the object, while the &lt;strong&gt;values&lt;/strong&gt; are the Cheerio selectors you'll use to extract the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you use Puppeteer with Cheerio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use Puppeteer to open the website in a headless browser. Once the website is loaded using Puppeteer, you can use Cheerio to load its HTML.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you scrape data from multiple pages using Cheerio and pagination?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When scraping paginated content, you can use the navigation links to move between pages and extract data from each one. However, when scraping multiple pages that aren't paginated, you'll need to use a different approach to identify and extract the data you need.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you handle errors while scraping with Cheerio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To handle errors, we can use &lt;code&gt;try-catch&lt;/code&gt; blocks in our code. A &lt;code&gt;try&lt;/code&gt; block allows us to try a block of code when we're not sure whether the code will execute and &lt;code&gt;catch&lt;/code&gt; any errors that might occur. If an error occurs, the &lt;code&gt;catch&lt;/code&gt; block will execute, allowing us to handle the error appropriately.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do you write scraped data to a local file using Cheerio?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Store the data in a JSON file. You can use the built-in &lt;code&gt;fs&lt;/code&gt; module in Node.js, which lets you interact with the file system on a computer.&lt;/p&gt;

</description>
      <category>cheerio</category>
      <category>javascript</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
