<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jacob Goh</title>
    <description>The latest articles on DEV Community by Jacob Goh (@jacobgoh101).</description>
    <link>https://dev.to/jacobgoh101</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F72765%2F0be3f25f-353f-4cf6-8ee4-5152b5fc7e78.jpg</url>
      <title>DEV Community: Jacob Goh</title>
      <link>https://dev.to/jacobgoh101</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jacobgoh101"/>
    <language>en</language>
    <item>
      <title>3 Ways You Could Customize 3rd Party React Component</title>
      <dc:creator>Jacob Goh</dc:creator>
      <pubDate>Sat, 08 Dec 2018 08:09:08 +0000</pubDate>
      <link>https://dev.to/jacobgoh101/3-ways-you-could-customize-3rd-party-react-component-3dpl</link>
      <guid>https://dev.to/jacobgoh101/3-ways-you-could-customize-3rd-party-react-component-3dpl</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Component libraries make our lives easier.&lt;/p&gt;

&lt;p&gt;But as a developer, you will often find yourself in situations where 3rd-party components don't provide the functionality or customization capability the project needs.&lt;/p&gt;

&lt;p&gt;We are left with 2 choices:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the component from scratch yourself&lt;/li&gt;
&lt;li&gt;Customize the 3rd party components&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What to choose depends on the component and the situation that you are in. &lt;/p&gt;

&lt;p&gt;Admittedly, some components are not customizable, and some feature requirements are not feasible. But most of the time, customizing a 3rd-party component is the less time-consuming option. Here's how.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before we start
&lt;/h2&gt;

&lt;p&gt;As an example, we are going to customize the &lt;a href="https://github.com/ericgio/react-bootstrap-typeahead" rel="noopener noreferrer"&gt;react-bootstrap-typeahead&lt;/a&gt; component.&lt;/p&gt;

&lt;p&gt;Here's the starter project if you want to follow along: &lt;a href="https://stackblitz.com/edit/react-hznpca" rel="noopener noreferrer"&gt;https://stackblitz.com/edit/react-hznpca&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  1. Overwriting CSS
&lt;/h1&gt;

&lt;p&gt;This is fairly straightforward. &lt;/p&gt;

&lt;p&gt;Just find out which CSS classes the component uses and override them with your own CSS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Add a dropdown icon to the input box, so that it looks like a drop-down.&lt;/p&gt;

&lt;p&gt;Just add Font Awesome to &lt;code&gt;index.html&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FzDSsKct3%2Fcarbon-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FzDSsKct3%2Fcarbon-2.png" alt="Add FontAwesome" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and add this CSS to &lt;code&gt;style.css&lt;/code&gt;&lt;br&gt;
&lt;a href="https://postimg.cc/xNP15njW" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzt2pfpbqrngdrar01bw.png" alt="carbon-4.png" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Demo: &lt;a href="https://stackblitz.com/edit/react-wdjptx" rel="noopener noreferrer"&gt;https://stackblitz.com/edit/react-wdjptx&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  2. Wrapper Component
&lt;/h1&gt;

&lt;p&gt;This is where you could alter the default behavior of the 3rd party component.&lt;/p&gt;

&lt;p&gt;Start by creating a wrapper component &lt;code&gt;CustomizedTypeahead&lt;/code&gt; and replace &lt;code&gt;Typeahead&lt;/code&gt; with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FNFYxQbmz%2Fcarbon-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FNFYxQbmz%2Fcarbon-1.png" alt="Wrapper Component" width="800" height="400"&gt;&lt;/a&gt;&lt;a href="https://stackblitz.com/edit/react-rwyjmm" rel="noopener noreferrer"&gt;https://stackblitz.com/edit/react-rwyjmm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This wrapper component has no effect for now. It's simply passing &lt;code&gt;props&lt;/code&gt; down to the Typeahead component.&lt;/p&gt;

&lt;p&gt;We are going to customize the component behavior by making changes to &lt;code&gt;props&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example: Setting Default Props
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Adding default props&lt;/p&gt;

&lt;p&gt;Let's start with the simplest customization.&lt;/p&gt;

&lt;p&gt;Let's say we want every &lt;code&gt;CustomizedTypeahead&lt;/code&gt; to have the &lt;code&gt;clearButton&lt;/code&gt; prop enabled by default.&lt;/p&gt;

&lt;p&gt;We can do so by &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2F90x5TLPD%2Fcarbon-5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2F90x5TLPD%2Fcarbon-5.png" alt="carbon-5.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is equivalent to &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2F9XkLR3ZG%2Fcarbon-6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2F9XkLR3ZG%2Fcarbon-6.png" alt="carbon-6.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We create &lt;code&gt;injectedProps&lt;/code&gt; and put all the &lt;code&gt;props&lt;/code&gt; modifications inside it to keep the code manageable.&lt;/p&gt;

&lt;p&gt;Demo: &lt;a href="https://stackblitz.com/edit/react-tk9pau" rel="noopener noreferrer"&gt;https://stackblitz.com/edit/react-tk9pau&lt;/a&gt;&lt;/p&gt;
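&lt;p&gt;Because the code above is only shown as images, here is a minimal text sketch of the same idea. It assumes the wrapper spreads &lt;code&gt;props&lt;/code&gt; first and &lt;code&gt;injectedProps&lt;/code&gt; second onto the wrapped component. To stay runnable without React, the hypothetical helper &lt;code&gt;buildTypeaheadProps&lt;/code&gt; (not part of react-bootstrap-typeahead) only computes the final props object that the wrapper would forward.&lt;/p&gt;

```javascript
// Hypothetical helper (not part of react-bootstrap-typeahead): computes the
// props a wrapper component would forward to the wrapped component.
function buildTypeaheadProps(props) {
  // All prop modifications live in one place.
  const injectedProps = {
    clearButton: true
  };
  // Later spreads win, so injectedProps overrides whatever the caller passed.
  return { ...props, ...injectedProps };
}

// In a React wrapper this could be used roughly as:
//   const CustomizedTypeahead = props =>
//     React.createElement(Typeahead, buildTypeaheadProps(props));

const finalProps = buildTypeaheadProps({ options: ['Texas'], multiple: true });
console.log(finalProps);
```

&lt;p&gt;If callers should be able to opt out of the default, spreading &lt;code&gt;injectedProps&lt;/code&gt; before &lt;code&gt;props&lt;/code&gt; would let their value win instead.&lt;/p&gt;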
&lt;h2&gt;
  
  
  Example: Modifying Props
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Sort all options in alphabetical order&lt;/p&gt;

&lt;p&gt;We are receiving &lt;code&gt;options&lt;/code&gt;, which is an array of objects, and &lt;code&gt;labelKey&lt;/code&gt;, which tells us that the option's label should be &lt;code&gt;optionObject[labelKey]&lt;/code&gt;. Our goal is to sort the options by &lt;code&gt;optionObject[labelKey]&lt;/code&gt; in alphabetical order.&lt;/p&gt;

&lt;p&gt;We can do so by using &lt;a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort" rel="noopener noreferrer"&gt;Array.prototype.sort()&lt;/a&gt; to sort the &lt;code&gt;options&lt;/code&gt; array. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2F59h11R7t%2Fcarbon-7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2F59h11R7t%2Fcarbon-7.png" alt="carbon-7.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
This way, the &lt;code&gt;options&lt;/code&gt; in &lt;code&gt;injectedProps&lt;/code&gt; will overwrite the original &lt;code&gt;options&lt;/code&gt; in &lt;code&gt;props&lt;/code&gt;. That's how we can sort all options in alphabetical order by default.&lt;/p&gt;

&lt;p&gt;Demo: &lt;a href="https://stackblitz.com/edit/react-cqv5vz" rel="noopener noreferrer"&gt;https://stackblitz.com/edit/react-cqv5vz&lt;/a&gt;&lt;/p&gt;
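&lt;p&gt;As a text version of the sorting step, here is a small sketch. It assumes &lt;code&gt;labelKey&lt;/code&gt; is a string key and that labels can be compared as strings. The helper name &lt;code&gt;sortOptionsByLabel&lt;/code&gt; and the sample data are made up for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: returns a sorted copy of options, ordered by option[labelKey].
function sortOptionsByLabel(options, labelKey) {
  // Copy first: Array.prototype.sort() sorts in place, and the incoming
  // props array should not be mutated.
  return [...options].sort((a, b) =>
    String(a[labelKey]).localeCompare(String(b[labelKey]))
  );
}

// Sample data for illustration only.
const options = [
  { name: 'Texas' },
  { name: 'Alabama' },
  { name: 'California' }
];

// The sorted array goes into injectedProps so it replaces props.options.
const injectedProps = { options: sortOptionsByLabel(options, 'name') };
console.log(injectedProps.options.map(option => option.name));
```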
&lt;h2&gt;
  
  
  Example: Intercepting Event Listeners
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; When the user selects an option, if the user has selected both "California" and "Texas" together, alert the user and clear the selection (for no particular reason other than for demo).&lt;/p&gt;

&lt;p&gt;This is the fun part where you can do lots of customization.&lt;/p&gt;

&lt;p&gt;Basically, this is how it will work,&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FHn99PfnV%2Fcarbon-3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2FHn99PfnV%2Fcarbon-3.png" alt="carbon.png" width="800" height="400"&gt;&lt;/a&gt;Note the &lt;code&gt;if(onChange) onChange(selectedOptions);&lt;/code&gt;. This makes sure that the original onChange event listener continues to run after we intercept it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2Fmr4sg8Kx%2Fcarbon-5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.postimg.cc%2Fmr4sg8Kx%2Fcarbon-5.png" alt="carbon-5.png" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
Here's what we did in the code above:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We create an &lt;code&gt;onChange&lt;/code&gt; function with the same signature as the default &lt;code&gt;onChange&lt;/code&gt; function: it receives an array of selected options.&lt;/li&gt;
&lt;li&gt;We scan through the selected options and check whether the selection is valid.&lt;/li&gt;
&lt;li&gt;If it's invalid,

&lt;ul&gt;
&lt;li&gt;show an alert&lt;/li&gt;
&lt;li&gt;clear the input&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Run the original &lt;code&gt;onChange&lt;/code&gt;  event listener&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Demo: &lt;a href="https://stackblitz.com/edit/react-ravwmw" rel="noopener noreferrer"&gt;https://stackblitz.com/edit/react-ravwmw&lt;/a&gt;&lt;/p&gt;
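&lt;p&gt;The steps above can be sketched in plain JavaScript, outside of React. This is only an illustration under a few assumptions: each option is an object with a &lt;code&gt;name&lt;/code&gt; property, and the hypothetical &lt;code&gt;clear&lt;/code&gt; and &lt;code&gt;notify&lt;/code&gt; callbacks stand in for clearing the input and calling &lt;code&gt;alert()&lt;/code&gt;.&lt;/p&gt;

```javascript
// Hypothetical factory: wraps the caller's onChange with a validation step.
// `clear` and `notify` are stand-ins for clearing the input and alert().
function createInterceptedOnChange({ onChange, clear, notify }) {
  return function interceptedOnChange(selectedOptions) {
    const names = selectedOptions.map(option => option.name);
    // Invalid when both states are selected together.
    const invalid = ['California', 'Texas'].every(name => names.includes(name));

    if (invalid) {
      notify('California and Texas cannot be selected together.');
      clear();
    }

    // Keep the original listener running after the interception.
    if (onChange) onChange(selectedOptions);
  };
}

const handler = createInterceptedOnChange({
  onChange: selected => console.log('original onChange received', selected.length, 'options'),
  clear: () => console.log('selection cleared'),
  notify: message => console.log(message)
});

handler([{ name: 'California' }, { name: 'Texas' }]);
```

&lt;p&gt;Passing the side effects in as callbacks keeps the validation logic easy to test without rendering the component.&lt;/p&gt;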
&lt;h1&gt;
  
  
  3. Modifying the source code
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Caution: Don't overuse this! This is your last resort. You should only do this if there is no other choice.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If none of the above works for you, the choices you have are now limited to: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find another component library&lt;/li&gt;
&lt;li&gt;Write your own component from scratch&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Modify the component source code&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not uncommon to have to modify a package's source code to fit a project's needs, especially if you found a bug in a package and need it fixed urgently.&lt;/p&gt;

&lt;p&gt;But there are a few cons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some packages are written in other languages, such as CoffeeScript or TypeScript. If you don't know the language, editing the source is much harder.&lt;/li&gt;
&lt;li&gt;It can be time-consuming to study the source code and figure out where exactly to put your modification.&lt;/li&gt;
&lt;li&gt;You may unintentionally break some part of the package.&lt;/li&gt;
&lt;li&gt;When the package is updated, you need to apply the update to your modified copy manually.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you decide to go ahead and modify the source code, here's how.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Fork the GitHub Repository
&lt;/h3&gt;

&lt;p&gt;In our example case, go to &lt;a href="https://github.com/ericgio/react-bootstrap-typeahead" rel="noopener noreferrer"&gt;https://github.com/ericgio/react-bootstrap-typeahead&lt;/a&gt; and fork the repo to your own GitHub account.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Clone the repo to your machine
&lt;/h3&gt;
&lt;h3&gt;
  
  
  3. Make the modification
&lt;/h3&gt;
&lt;h3&gt;
  
  
  4. Push the repo to your GitHub account
&lt;/h3&gt;
&lt;h3&gt;
  
  
  5. Install your repo as a dependency
&lt;/h3&gt;

&lt;p&gt;After you fork the repo, your GitHub repo's URL should be &lt;code&gt;https://github.com/&amp;lt;your GitHub username&amp;gt;/react-bootstrap-typeahead&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can install this git repo as a dependency by executing this command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm i https://github.com/&amp;lt;your GitHub username&amp;gt;/react-bootstrap-typeahead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, you should see this in package.json&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"react-bootstrap-typeahead"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"git+https://github.com/&amp;lt;your github username&amp;gt;/react-bootstrap-typeahead.git"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;We talked about 3 ways to customize a 3rd-party React component.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Overwriting CSS&lt;/li&gt;
&lt;li&gt;Using Wrapper Component&lt;/li&gt;
&lt;li&gt;Modifying the source code&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hopefully, this will make your life as a React developer easier.&lt;/p&gt;

&lt;p&gt;In the meantime, let's all take a moment and be grateful to all the open source creators/contributors out there. Without these open source packages, we wouldn't be able to move as fast as we do today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;What's your experience with 3rd party component libraries? What other method would you use to customize them? Leave a comment!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>react</category>
      <category>javascript</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Simple &amp; Customizable Web Scraper using RxJS and Node</title>
      <dc:creator>Jacob Goh</dc:creator>
      <pubDate>Sat, 10 Nov 2018 17:08:00 +0000</pubDate>
      <link>https://dev.to/jacobgoh101/simple--customizable-web-scraper-using-rxjs-and-node-1on7</link>
      <guid>https://dev.to/jacobgoh101/simple--customizable-web-scraper-using-rxjs-and-node-1on7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffiqa000msc4fptfbkyvf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffiqa000msc4fptfbkyvf.jpg" alt="spider" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;After getting to know RxJS (thanks to Angular!), I realized that it's a surprisingly good fit for handling web scraping operations.&lt;/p&gt;

&lt;p&gt;I tried it out in a side project and I would like to share my experience with you. Hopefully, this will open your eyes to how reactive programming can make your life simpler.&lt;/p&gt;

&lt;p&gt;The code can be found at  &lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/jacobgoh101" rel="noopener noreferrer"&gt;
        jacobgoh101
      &lt;/a&gt; / &lt;a href="https://github.com/jacobgoh101/web-scraping-with-rxjs" rel="noopener noreferrer"&gt;
        web-scraping-with-rxjs
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;Codes for article  &lt;a href="https://dev.to/jacobgoh101/simple--customizable-web-scraper-using-rxjs-and-node-1on7" rel="nofollow"&gt;Simple &amp;amp; Customizable Web Scraper using RxJS and Node&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;



&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/jacobgoh101/web-scraping-with-rxjs" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;
 

&lt;h1&gt;
  
  
  Requirements
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Node&lt;/li&gt;
&lt;li&gt;RxJS and intermediate understanding of it&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/cheerio" rel="noopener noreferrer"&gt;cheerio&lt;/a&gt;: it allows you to use jQuery like syntax to extract information out of HTML codes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/request-promise-native" rel="noopener noreferrer"&gt;request-promise-native&lt;/a&gt;: for sending HTTP request&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Hypothetical Goal
&lt;/h1&gt;

&lt;p&gt;Everybody loves a good comedy movie. &lt;/p&gt;

&lt;p&gt;Let's make it our goal to scrape a list of good comedy movies from IMDB.&lt;/p&gt;

&lt;p&gt;There are only 3 requirements that the target data needs to fulfill:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it is a movie (not a TV show, music video, etc.)&lt;/li&gt;
&lt;li&gt;it is a comedy&lt;/li&gt;
&lt;li&gt;it has a rating of 7 or higher&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Get Started
&lt;/h1&gt;

&lt;p&gt;Let's set our base URL and define a BehaviorSubject &lt;code&gt;allUrl$&lt;/code&gt; that uses the base URL as the initial value. &lt;/p&gt;

&lt;p&gt;(A BehaviorSubject is a &lt;a href="https://www.youtube.com/watch?v=rdK92pf3abs" rel="noopener noreferrer"&gt;subject&lt;/a&gt; with an initial value.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BehaviorSubject&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rxjs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt;  &lt;span class="nx"&gt;baseUrl&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="s2"&gt;`https://imdb.com`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt;  &lt;span class="nx"&gt;allUrl$&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="k"&gt;new&lt;/span&gt;  &lt;span class="nc"&gt;BehaviorSubject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;allUrl$&lt;/code&gt; is going to be the starting point of all crawling operations. Every URL will be passed into &lt;code&gt;allUrl$&lt;/code&gt; and processed later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making sure that we scrape each URL only once
&lt;/h3&gt;

&lt;p&gt;With the help of the &lt;a href="https://rxjs-dev.firebaseapp.com/api/operators/distinct" rel="noopener noreferrer"&gt;distinct&lt;/a&gt; operator and &lt;a href="https://www.npmjs.com/package/normalize-url" rel="noopener noreferrer"&gt;normalize-url&lt;/a&gt;, we can easily make sure that we never scrape the same URL twice.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;distinct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rxjs/operators&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt;  &lt;span class="nx"&gt;normalizeUrl&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;normalize-url&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// ...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt;  &lt;span class="nx"&gt;uniqueUrl$&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nx"&gt;allUrl$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;// only crawl IMDB url&lt;/span&gt;
    &lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="c1"&gt;// normalize url for comparison&lt;/span&gt;
    &lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;  &lt;span class="nf"&gt;normalizeUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;removeQueryParameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ref&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ref_&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="p"&gt;})),&lt;/span&gt;
    &lt;span class="c1"&gt;// distinct is a RxJS operator that filters out duplicated values&lt;/span&gt;
    &lt;span class="nf"&gt;distinct&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
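&lt;p&gt;To make the effect of "normalize, then &lt;code&gt;distinct&lt;/code&gt;" concrete, here is a rough sketch that uses only Node built-ins (the global &lt;code&gt;URL&lt;/code&gt; class and a &lt;code&gt;Set&lt;/code&gt;) instead of RxJS and normalize-url. The helpers &lt;code&gt;simplifyUrl&lt;/code&gt; and &lt;code&gt;createDistinctFilter&lt;/code&gt; and the sample URLs are hypothetical, and &lt;code&gt;simplifyUrl&lt;/code&gt; covers only a small part of what normalize-url does.&lt;/p&gt;

```javascript
// Hypothetical stand-in for normalize-url: strips the given query parameters,
// the hash, and a trailing slash so equivalent URLs compare as equal.
function simplifyUrl(rawUrl, removeQueryParameters = ['ref', 'ref_']) {
  const parsed = new URL(rawUrl);
  removeQueryParameters.forEach(name => parsed.searchParams.delete(name));
  parsed.hash = '';
  return parsed.toString().replace(/\/$/, '');
}

// Mimics what distinct() does for a stream: remember every key seen so far
// and let a value through only the first time its key appears.
function createDistinctFilter() {
  const seen = new Set();
  return rawUrl => {
    const key = simplifyUrl(rawUrl);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  };
}

// Sample URLs for illustration only.
const urls = [
  'https://imdb.com/title/tt0000001/?ref_=nv',
  'https://imdb.com/title/tt0000001/',
  'https://imdb.com/title/tt0000002/'
];
console.log(urls.filter(createDistinctFilter()));
```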



&lt;h3&gt;
  
  
  It's time to start scraping
&lt;/h3&gt;

&lt;p&gt;We are going to make a request to each unique URL and map the content of each URL into another observable.&lt;/p&gt;

&lt;p&gt;To do that, we use &lt;a href="https://www.learnrxjs.io/operators/transformation/mergemap.html" rel="noopener noreferrer"&gt;mergeMap&lt;/a&gt; to map the result of the request to another observable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BehaviorSubject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rxjs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;distinct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mergeMap&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rxjs/operators&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;request-promise-native&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt;  &lt;span class="nx"&gt;cheerio&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cheerio&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urlAndDOM$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;uniqueUrl$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nf"&gt;mergeMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;rp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="c1"&gt;// get the cheerio function $&lt;/span&gt;
      &lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
      &lt;span class="c1"&gt;// add URL to the result. It will be used later for crawling&lt;/span&gt;
      &lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;url&lt;/span&gt;
      &lt;span class="p"&gt;}))&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;urlAndDOM$&lt;/code&gt; emits objects consisting of 2 properties, &lt;code&gt;$&lt;/code&gt; and &lt;code&gt;url&lt;/code&gt;. &lt;code&gt;$&lt;/code&gt; is a Cheerio function that lets you use something like &lt;code&gt;$('div').text()&lt;/code&gt; to extract information from raw HTML.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crawl all the URLs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resolve&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;url&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;

&lt;span class="c1"&gt;// get all the next crawlable URLs&lt;/span&gt;
&lt;span class="nx"&gt;urlAndDOM$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;each&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;elem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;href&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// build the absolute url&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;absoluteUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;href&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;allUrl$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;absoluteUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the code above, we collect all the links inside the page and send them to &lt;code&gt;allUrl$&lt;/code&gt; to be crawled later.&lt;/p&gt;
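&lt;p&gt;As a side note, recent Node documentation marks &lt;code&gt;url.resolve()&lt;/code&gt; as a legacy API. A possible alternative for building the absolute URL is the global &lt;code&gt;URL&lt;/code&gt; class, sketched below. The helper name &lt;code&gt;toAbsoluteUrl&lt;/code&gt; and the sample links are hypothetical and not part of the original project.&lt;/p&gt;

```javascript
// Hypothetical helper: resolves an href found on a page against that page's URL.
// Returns null when the href cannot be parsed as a URL.
function toAbsoluteUrl(pageUrl, href) {
  try {
    return new URL(href, pageUrl).href;
  } catch (error) {
    return null;
  }
}

// Sample links for illustration only.
console.log(toAbsoluteUrl('https://imdb.com/chart/top', '/title/tt0000001/'));
console.log(toAbsoluteUrl('https://imdb.com/chart/top', 'https://example.com/about'));
```

&lt;p&gt;Returning &lt;code&gt;null&lt;/code&gt; for unparsable links lets the caller skip them before calling &lt;code&gt;allUrl$.next()&lt;/code&gt;.&lt;/p&gt;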

&lt;h3&gt;
  
  
  Scrape and save the movies we want!
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isMovie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[property='og:type']`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;video.movie&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isComedy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`.title_wrapper .subtext`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Comedy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isHighlyRated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[itemprop="ratingValue"]`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;urlAndDOM$&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;isMovie&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;isComedy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;isHighlyRated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// append the data we want to a file named "comedy.txt"&lt;/span&gt;
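    &lt;span class="c1"&gt;// fs.appendFile requires a callback in current Node versions&lt;/span&gt;
    &lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appendFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;comedy.txt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;\n`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;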
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Yup, we have just created a web scraper
&lt;/h2&gt;

&lt;p&gt;In around 70 lines of code, we have created a web scraper that&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;automatically crawls URLs without unnecessary duplicates&lt;/li&gt;
&lt;li&gt;automatically scrapes the info we want and saves it in a text file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can see the code up to this point at &lt;a href="https://github.com/jacobgoh101/web-scraping-with-rxjs/blob/86ff05e893dec5f1b39647350cb0f74efe258c86/index.js" rel="noopener noreferrer"&gt;https://github.com/jacobgoh101/web-scraping-with-rxjs/blob/86ff05e893dec5f1b39647350cb0f74efe258c86/index.js&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have ever tried writing a web scraper from scratch, you should now be able to see how elegant it is to write one with RxJS.&lt;/p&gt;

&lt;h2&gt;
  
  
  But we are not done yet...
&lt;/h2&gt;

&lt;p&gt;In an ideal world, the code above may work forever without any problem.&lt;/p&gt;

&lt;p&gt;But in reality, &lt;del&gt;shits&lt;/del&gt; errors happen.&lt;/p&gt;

&lt;h1&gt;
  
  
  Handling Errors
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Limit the number of active concurrent connections
&lt;/h3&gt;

&lt;p&gt;If we send too many requests to a server in a short period of time, our IP is likely to be temporarily blocked from making any further requests, especially by an established website like IMDB.&lt;/p&gt;

&lt;p&gt;It's also considered &lt;strong&gt;rude/unethical&lt;/strong&gt; to send too many requests at once, because it puts a heavier load on the server and, in some cases, can &lt;strong&gt;crash the server&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.learnrxjs.io/operators/transformation/mergemap.html" rel="noopener noreferrer"&gt;mergeMap&lt;/a&gt; has built-in functionality to control concurrency. Simply pass a number as its 3rd argument and it will limit the number of active concurrent connections automatically; the 2nd argument is the optional &lt;code&gt;resultSelector&lt;/code&gt;, which we leave as &lt;code&gt;null&lt;/code&gt;. Graceful!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;maxConcurrentReq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urlAndDOM$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;uniqueUrl$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nf"&gt;mergeMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;//...&lt;/span&gt;
    &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;maxConcurrentReq&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Code Diff: &lt;a href="https://github.com/jacobgoh101/web-scraping-with-rxjs/commit/6aaed6dae230d2dde1493f1b6d78282ce2e8f316" rel="noopener noreferrer"&gt;https://github.com/jacobgoh101/web-scraping-with-rxjs/commit/6aaed6dae230d2dde1493f1b6d78282ce2e8f316&lt;/a&gt;&lt;/p&gt;
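
&lt;p&gt;To make the idea of a concurrency limit concrete outside of RxJS, here is a rough plain-JavaScript sketch. It is not how &lt;code&gt;mergeMap&lt;/code&gt; is implemented; &lt;code&gt;runWithLimit&lt;/code&gt; and &lt;code&gt;fakeRequest&lt;/code&gt; are hypothetical names used only for this illustration.&lt;/p&gt;

```javascript
// Run `task` for every item, with at most `limit` tasks in flight at once.
// `runWithLimit` is a hypothetical helper, not part of RxJS.
async function runWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // each worker keeps pulling the next unprocessed item until none are left
  async function worker() {
    while (next !== items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// demo: a fake request that records how many calls run at the same time
let active = 0;
let maxActive = 0;
const fakeRequest = async url => {
  active++;
  maxActive = Math.max(maxActive, active);
  await new Promise(resolve => setTimeout(resolve, 10));
  active--;
  return `body of ${url}`;
};

const urls = ['a', 'b', 'c', 'd', 'e'];
const done = runWithLimit(urls, 2, fakeRequest).then(results => {
  console.log(results.length, maxActive);
  return results;
});
```

&lt;p&gt;With a limit of 2, the remaining URLs simply wait until a slot frees up, which is the same effect the concurrency argument has on the inner requests.&lt;/p&gt;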

&lt;h3&gt;
  
  
  Handle and Retry Failed Requests
&lt;/h3&gt;

&lt;p&gt;Requests may fail randomly due to dead links or server-side rate limiting. Handling these failures is crucial for web scrapers.&lt;/p&gt;

&lt;p&gt;We can use the &lt;a href="https://www.learnrxjs.io/operators/error_handling/catch.html" rel="noopener noreferrer"&gt;catchError&lt;/a&gt; and &lt;a href="https://www.learnrxjs.io/operators/error_handling/retry.html" rel="noopener noreferrer"&gt;retry&lt;/a&gt; operators to handle this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BehaviorSubject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;defer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rxjs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="nx"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;catchError&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rxjs/operators&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;//...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// ...&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;urlAndDOM$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;uniqueUrl$&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nf"&gt;mergeMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// defer creates a new request on every (re)subscription, so retry really re-sends it;&lt;/span&gt;
      &lt;span class="c1"&gt;// from(rp(url)) would only replay the same already-rejected promise&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;defer&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;rp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nf"&gt;catchError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;uri&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
          &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Error requesting &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; after &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; retries.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="c1"&gt;// return null on error&lt;/span&gt;
          &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="c1"&gt;// filter out errors&lt;/span&gt;
        &lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Code Diff: &lt;a href="https://github.com/jacobgoh101/web-scraping-with-rxjs/commit/3098b48ca91a59aa5171bc2aa9c17801e769fcbb" rel="noopener noreferrer"&gt;https://github.com/jacobgoh101/web-scraping-with-rxjs/commit/3098b48ca91a59aa5171bc2aa9c17801e769fcbb&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Improved Retry for Failed Requests
&lt;/h3&gt;

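&lt;p&gt;With the &lt;code&gt;retry&lt;/code&gt; operator, the retry happens immediately after the request fails. This is not ideal, since a server that has just rate-limited us is likely to reject an immediate retry too.&lt;/p&gt;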

&lt;p&gt;It's better to retry after a certain amount of delay.&lt;/p&gt;

&lt;p&gt;We can use the &lt;code&gt;genericRetryStrategy&lt;/code&gt; suggested in &lt;a href="https://www.learnrxjs.io/operators/error_handling/retrywhen.html" rel="noopener noreferrer"&gt;learnrxjs&lt;/a&gt; to achieve this.&lt;/p&gt;

&lt;p&gt;Code Diff: &lt;a href="https://github.com/jacobgoh101/web-scraping-with-rxjs/commit/e194f4ff128a573241055ffc0d1969d54ca8c270" rel="noopener noreferrer"&gt;https://github.com/jacobgoh101/web-scraping-with-rxjs/commit/e194f4ff128a573241055ffc0d1969d54ca8c270&lt;/a&gt;&lt;/p&gt;
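
&lt;p&gt;The idea of retrying after a delay can also be sketched without RxJS. The snippet below is not the &lt;code&gt;genericRetryStrategy&lt;/code&gt; from learnrxjs, only a plain-promise illustration of the same idea; &lt;code&gt;retryWithDelay&lt;/code&gt;, its options and &lt;code&gt;flakyRequest&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```javascript
const wait = ms => new Promise(resolve => setTimeout(resolve, ms));

// Call `makeRequest` again after a growing delay each time it fails.
// `retryWithDelay` and its options are hypothetical names for this sketch.
async function retryWithDelay(makeRequest, { maxRetries = 3, delayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await makeRequest();
    } catch (error) {
      // give up once the retry budget is used
      if (attempt === maxRetries) throw error;
      // wait longer after each failure: delayMs, 2 * delayMs, ...
      await wait(delayMs * (attempt + 1));
    }
  }
}

// demo: a fake request that fails twice before succeeding
let calls = 0;
const flakyRequest = async () => {
  calls++;
  if (calls !== 3) throw new Error('temporary failure');
  return 'ok';
};

const result = retryWithDelay(flakyRequest, { maxRetries: 5, delayMs: 10 });
result.then(value => console.log(value, calls));
```

&lt;p&gt;Note that &lt;code&gt;makeRequest&lt;/code&gt; is a function, so every attempt starts a fresh request instead of reusing the result of a failed one.&lt;/p&gt;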

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;To recap, in this post, we discussed&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how to crawl a web page using Cheerio&lt;/li&gt;
&lt;li&gt;how to avoid duplicate crawls using RxJS operators like filter and distinct&lt;/li&gt;
&lt;li&gt;how to use mergeMap to create an observable of request responses&lt;/li&gt;
&lt;li&gt;how to limit concurrency in mergeMap&lt;/li&gt;
&lt;li&gt;how to handle errors&lt;/li&gt;
&lt;li&gt;how to retry failed requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope this has been helpful to you and has deepened your understanding of RxJS and web scraping.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>rxjs</category>
      <category>node</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
