<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kajal</title>
    <description>The latest articles on DEV Community by Kajal (@techykajal).</description>
    <link>https://dev.to/techykajal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F409592%2F080037eb-4b30-4fd9-81f9-149077959fe6.png</url>
      <title>DEV Community: Kajal</title>
      <link>https://dev.to/techykajal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/techykajal"/>
    <language>en</language>
    <item>
      <title>8 ML/AI projects to make your portfolio stand out.🔥</title>
      <dc:creator>Kajal</dc:creator>
      <pubDate>Tue, 04 Aug 2020 12:34:39 +0000</pubDate>
      <link>https://dev.to/techykajal/8-ml-ai-projects-to-make-your-portfolio-stand-out-33n</link>
      <guid>https://dev.to/techykajal/8-ml-ai-projects-to-make-your-portfolio-stand-out-33n</guid>
      <description>&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sentiment analysis&lt;/strong&gt; for depression based on social media posts
&lt;/li&gt;
&lt;li&gt;Sports match video-to-text summarization using a &lt;strong&gt;neural network&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Handwritten equation solver using a &lt;strong&gt;CNN&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Business meeting summary generation using &lt;strong&gt;NLP&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Facial recognition&lt;/strong&gt; to detect mood and suggest songs accordingly
&lt;/li&gt;
&lt;li&gt;Finding habitable exoplanets from &lt;strong&gt;images captured&lt;/strong&gt; by space vehicles like Kepler
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image regeneration&lt;/strong&gt; for old, damaged reel pictures
&lt;/li&gt;
&lt;li&gt;Music generation using &lt;strong&gt;deep learning&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;If you want to explore the ideas listed above, you can go through this &lt;a href="https://towardsdatascience.com/8-ml-ai-projects-to-make-your-portfolio-stand-out-bfc5be94e063"&gt;blog&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Scraping websites using Octoparse </title>
      <dc:creator>Kajal</dc:creator>
      <pubDate>Tue, 28 Jul 2020 16:34:59 +0000</pubDate>
      <link>https://dev.to/techykajal/scraping-websites-using-software-1n29</link>
      <guid>https://dev.to/techykajal/scraping-websites-using-software-1n29</guid>
      <description>&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Web Scraping Series : Using Python and Software&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. &lt;a href="https://dev.to/techykajal/scraping-1000-s-of-news-articles-using-10-simple-steps-bb6"&gt;Scraping web pages without using Software : Python&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. &lt;a href="https://dev.to/techykajal/scraping-websites-using-software-1n29"&gt;Scraping web Pages using Software : Octoparse&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;INTRODUCTION&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;WHY THIS ARTICLE?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is the second article in my web-scraping series.&lt;br&gt;
As I mentioned in my first article, I chose to write about scraping because, while building my project &lt;strong&gt;Fake-News Detection System&lt;/strong&gt;, it took me days of research, as I wasn't able to find a dataset that matched my needs.&lt;/p&gt;

&lt;p&gt;So, if you haven't gone through my &lt;a href="https://dev.to/techykajal/scraping-1000-s-of-news-articles-using-10-simple-steps-bb6"&gt;first article&lt;/a&gt;, I strongly recommend reading it once, and if you have a programming background, you really should read it.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;WHO IS THIS ARTICLE USEFUL FOR?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For readers with a programming background, and knowledge of Python in particular, I have already written a blog post. I would suggest scraping with Python rather than with any software, because I find it easier than spending days understanding the interface of a particular tool.&lt;/p&gt;

&lt;p&gt;But if you don't have a programming background, you can follow along with me and get familiar with the interface and workings of this software.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;OVERVIEW&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This article covers the second part of the series, scraping web pages using software: &lt;strong&gt;Octoparse&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, there are many other tools that you can easily find on the internet for automating this task, such as:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ParseHub&lt;/strong&gt;, &lt;strong&gt;ScrapeSimple&lt;/strong&gt;, &lt;strong&gt;Diffbot&lt;/strong&gt;, &lt;strong&gt;Mozenda&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Brief introduction to different automation tools:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. ParseHub:&lt;/strong&gt;&lt;br&gt;
Website: &lt;a href="https://www.parsehub.com/" rel="noopener noreferrer"&gt;https://www.parsehub.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; ParseHub is a phenomenal tool for building web scrapers without coding to extract large amounts of data. It is used by data scientists, data journalists, data analysts, e-commerce websites, job boards, marketing &amp;amp; sales, finance &amp;amp; many more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features:&lt;/strong&gt; Its interface is dead simple to use: you can build web scrapers simply by clicking on the data that you want. It then exports the data in JSON or Excel format. It has many handy features such as automatic IP rotation, allowing scraping behind login walls, going through dropdowns and tabs, getting data from tables and maps, and much more. In addition, it has a generous free tier, allowing users to scrape up to 200 pages of data in just 40 minutes! ParseHub is also nice in that it provides desktop clients for Windows, Mac OS, and Linux, so you can use them from your computer no matter what system you’re running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. ScrapeSimple:&lt;/strong&gt;&lt;br&gt;
Website: &lt;a href="https://www.scrapesimple.com" rel="noopener noreferrer"&gt;https://www.scrapesimple.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; ScrapeSimple is the perfect service for people who want a custom scraper built for them. Web scraping is made as simple as filling out a form with instructions for what kind of data you want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features:&lt;/strong&gt; ScrapeSimple lives up to its name with a fully managed service that builds and maintains custom web scrapers for customers. Just tell them what information you need from which sites, and they will design a custom web scraper to deliver the information to you periodically (daily, weekly, monthly, or on whatever schedule you like) in CSV format directly to your inbox. This service is perfect for businesses that just want an HTML scraper without needing to write any code themselves. Response times are quick and the service is incredibly friendly and helpful, making it a good fit for people who just want the full data extraction process taken care of for them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Diffbot:&lt;/strong&gt;&lt;br&gt;
Website: &lt;a href="https://www.diffbot.com" rel="noopener noreferrer"&gt;https://www.diffbot.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Enterprises that have specific data crawling and screen scraping needs, particularly those that scrape websites that often change their HTML structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features:&lt;/strong&gt; Diffbot is different from most page scraping tools out there in that it uses computer vision (instead of HTML parsing) to identify relevant information on a page. This means that even if the HTML structure of a page changes, your web scrapers will not break as long as the page looks the same visually. This is an incredible feature for long-running, mission-critical web scraping jobs. While it may be a bit pricey (the cheapest plan is $299/month), it does a great job of offering a premium service that may be worth it for large customers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Mozenda:&lt;/strong&gt;&lt;br&gt;
Website: &lt;a href="https://www.mozenda.com/" rel="noopener noreferrer"&gt;https://www.mozenda.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Enterprises looking for a cloud-based, self-serve webpage scraping platform need look no further. With over 7 billion pages scraped, Mozenda has experience in serving enterprise customers from all around the world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features:&lt;/strong&gt; Mozenda allows enterprise customers to run web scrapers on its robust cloud platform. It sets itself apart with its customer service (providing both phone and email support to all paying customers). Its platform is highly scalable and allows for on-premise hosting as well. Like Diffbot, it is a bit pricey, and its lowest plans start at $250/month.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In this article, though, I am going to talk about &lt;strong&gt;Octoparse&lt;/strong&gt; in detail, since it is the only one I have used.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;OCTOPARSE&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Website: &lt;a href="https://www.octoparse.com/" rel="noopener noreferrer"&gt;https://www.octoparse.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Octoparse is a fantastic tool for people who want to extract data from websites without having to code, while still having control over the full process through its easy-to-use interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features:&lt;/strong&gt; Octoparse is the perfect tool for people who want to scrape websites without learning to code. It features a point-and-click screen scraper, allowing users to scrape behind login forms, fill in forms, input search terms, scroll through infinite scroll, render JavaScript, and more. It also includes a site parser and a hosted solution for users who want to run their scrapers in the cloud. Best of all, it comes with a generous free tier allowing users to build up to 10 crawlers for free. For enterprise-level customers, it also offers fully customized crawlers and managed solutions, where they take care of running everything for you and just deliver the data to you directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-by-step explanation of extracting data from thousands of news articles&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-1: Download Octoparse&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to the website: &lt;a href="https://www.octoparse.com/download" rel="noopener noreferrer"&gt;https://www.octoparse.com/download&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;and follow the guidelines given there.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step-2: Sign-up&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After downloading and installing it, sign up for an account if you haven't created one before.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step-3: Explore it&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before starting on your own, I strongly recommend exploring its different sections; this will help you interact with the interface when you work with it later on.

&lt;ul&gt;
&lt;li&gt;Go through the popular templates section; it has templates for popular websites, and you might find the data you need there.&lt;/li&gt;
&lt;li&gt;Go through the tutorials on both template mode and advanced mode.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step-4: Enter URL&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If you want to scrape data from just one website, you can simply paste the copied URL on the home page and click Start.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7kff58yiln724rx8lzcz.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7kff58yiln724rx8lzcz.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;But if you want to scrape data from more than one website, go to the NEW tab and then click the ADVANCED option.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzme7zmn3ilm02xwxlho9.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzme7zmn3ilm02xwxlho9.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You will see a new window like this, in which you can easily organize your work with more advanced options and keep track of your directories too.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flmocbeyv1zrbt59z5w6t.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flmocbeyv1zrbt59z5w6t.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can upload up to 10k different URLs here, on the condition that the pages behind these URLs share the same layout, so that their data can be extracted together. Otherwise, it will still automate the process but give you separate results for each website layout, and you will not get all the data merged together.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
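&lt;p&gt;If you do have some Python at hand, a URL list can be prepared before uploading it. The snippet below is only a rough sketch and not part of Octoparse: it assumes that pages on the same host share a layout (which is not always true), and the function names and example URLs are made up for illustration. It groups URLs by host and splits each group into batches of at most 10k, the limit mentioned above.&lt;/p&gt;

```python
from collections import defaultdict
from urllib.parse import urlparse

# Upper bound on URLs per upload, taken from the limit mentioned in this article.
MAX_URLS_PER_TASK = 10_000


def group_urls_by_site(urls):
    """Group URLs by host name, a rough stand-in for 'same page layout'."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc.lower()
        groups[host].append(url)
    return dict(groups)


def split_into_batches(urls, batch_size=MAX_URLS_PER_TASK):
    """Split one group into lists of at most batch_size URLs."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Hypothetical example URLs, used only to show the grouping.
example_urls = [
    "https://news.example.com/page/1",
    "https://news.example.com/page/2",
    "https://blog.example.org/archive",
]
grouped = group_urls_by_site(example_urls)
for host, site_urls in grouped.items():
    print(host, len(split_into_batches(site_urls)))
```

&lt;p&gt;Each batch could then be pasted into its own task, so that every task only sees pages that are likely to share one layout.&lt;/p&gt;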

&lt;p&gt;&lt;strong&gt;Step-5: Specifying scraping details &amp;amp; attributes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Click save and you will see a window like this:&lt;br&gt;
Here, the left section is for maintaining the workflow, the center displays the web page of the first URL that you entered, and the bottom section shows the data preview.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2w2xl43cbw8s1h5yuxeg.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2w2xl43cbw8s1h5yuxeg.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Here, you have two options to proceed:
&lt;strong&gt;Auto-detect web-page data&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can choose "Auto-detect web page data", which will scrape the important features according to its own understanding and return five different results. You can skip the results that are of no use to you, or keep one if it scraped all the attributes you wanted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fh56zk6he0kcy46vtpwzd.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fh56zk6he0kcy46vtpwzd.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the dialog box below, you can edit the settings as needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffb6u0k7ic9wqfc68t4yd.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffb6u0k7ic9wqfc68t4yd.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, in the image above, you are given 3 options that you can enable or disable as needed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;First: &lt;strong&gt;Scroll down the page to load more data.&lt;/strong&gt; If a website is not divided into separate pages, all of the data may be on the same page, and to extract all of it you only need to enable the "load more data" option.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Second: &lt;strong&gt;Click the next button to capture multiple pages&lt;/strong&gt;&lt;br&gt;
Enabling it paginates through the site by clicking whatever element you select as the "&lt;strong&gt;next button&lt;/strong&gt;".&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It will allow you to &lt;strong&gt;check&lt;/strong&gt; or &lt;strong&gt;edit&lt;/strong&gt; the next button. On clicking &lt;strong&gt;check&lt;/strong&gt;, you will see the automatically detected &lt;strong&gt;next button&lt;/strong&gt; highlighted in the web-page section.&lt;/li&gt;
&lt;li&gt;If it is not detected correctly, click &lt;strong&gt;edit&lt;/strong&gt; and then click the element on the web-page screen that you want to be treated as the "&lt;strong&gt;next&lt;/strong&gt;" button. For instance, there may be no "&lt;strong&gt;next&lt;/strong&gt;" button, and "&lt;strong&gt;&amp;gt;&lt;/strong&gt;" may be working as the "&lt;strong&gt;next button&lt;/strong&gt;" instead. Or, if you don't want to scrape all the way to the last page, you can choose to paginate up to a specific page, such as "&lt;strong&gt;1&lt;/strong&gt;", "&lt;strong&gt;2&lt;/strong&gt;", "&lt;strong&gt;3&lt;/strong&gt;" and so on.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Third: &lt;strong&gt;Click the state_url to capture data on the page that follows&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It will allow you to capture the content or text of the page that follows and create another attribute containing that text (that is, the content of the page that opens when a particular URL is clicked).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2jx91af6kdk2exz875ji.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2jx91af6kdk2exz875ji.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;There is one more option under TIPS, "&lt;strong&gt;Switch auto-detect results(1/5)&lt;/strong&gt;". On clicking this link, you will be able to see 5 different sets of auto-detected data. You can keep whichever suits your needs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you are done editing, click "&lt;strong&gt;save settings&lt;/strong&gt;".&lt;/li&gt;
&lt;li&gt;You can see the scraped results in "&lt;strong&gt;Data Preview&lt;/strong&gt;" and edit the "Attribute names" too.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjigbp41kz0si3pyxkx09.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjigbp41kz0si3pyxkx09.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You will see changes in your workflow as follows:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1zfpv3ehoe8xs87cwo0m.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1zfpv3ehoe8xs87cwo0m.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edit task workflow manually&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Or you can choose to edit the workflow manually and select the specific elements from the web page that should appear in your dataset as attributes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fm4cx9kk5l6oonbe8jktn.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fm4cx9kk5l6oonbe8jktn.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There, on hovering over the down arrows, you will find a "+" sign to add elements as needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0f6sz2ugfda5mdutzjvj.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0f6sz2ugfda5mdutzjvj.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It will help you to be specific and organized with your work.
In the previous case of auto-detecting data, too many irrelevant attributes were scraped automatically. So, to be specific about what you need, I would suggest opting for the second option.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It also allows you to rename, delete or edit any specific element, or to change settings as needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I will demonstrate this with an example of extracting thousands of articles with 6 attributes: &lt;strong&gt;News Headline&lt;/strong&gt;, &lt;strong&gt;Link&lt;/strong&gt;, &lt;strong&gt;Source&lt;/strong&gt;, &lt;strong&gt;Stated on&lt;/strong&gt;, &lt;strong&gt;Date&lt;/strong&gt;, &lt;strong&gt;Image_url&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
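&lt;p&gt;For readers who plan to process the result in Python later, the six attributes can be thought of as one record per article. The sketch below is only an illustration: the class name, field names and sample values are my own assumptions and do not come from Octoparse. It also includes a small helper that reports which of the six attributes are missing from a row.&lt;/p&gt;

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class NewsArticle:
    """One scraped row; field names are hypothetical renderings of the six attributes."""
    news_headline: str
    link: str
    source: str
    stated_on: str
    date: str
    image_url: str


def missing_attributes(row):
    """Return the names of expected attributes that are absent or empty in a row dict."""
    return [f.name for f in fields(NewsArticle) if not row.get(f.name)]


# Made-up sample values, used only to show the shape of a row.
sample = NewsArticle(
    news_headline="Example headline",
    link="https://news.example.com/article-1",
    source="Example source",
    stated_on="stated on July 1, 2020",
    date="July 2, 2020",
    image_url="https://news.example.com/img/1.png",
)
print(missing_attributes(asdict(sample)))
print(missing_attributes({"news_headline": "Only a headline"}))
```

&lt;p&gt;A check like this can be handy after export, to spot rows where an attribute such as the image URL was not captured.&lt;/p&gt;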

&lt;p&gt;&lt;strong&gt;Extract data manually&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To extract all of them, go to the web-page section and select the specific details of the very first article, such as "News Headline", "Link of news", "Source of news", "Stated On" and "Date", by simply clicking on these items. The selected portions will be highlighted as shown in the window below:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3goapijwpdv6t62d0ohb.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3goapijwpdv6t62d0ohb.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data extraction for all news-articles on first page of listed url&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Then, choosing the "&lt;strong&gt;Select all&lt;/strong&gt;" option will select the same details for every article down to the end of the web page. You will see 30 captured rows in the data preview, as below:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fr678md3sloe6h8h151he.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fr678md3sloe6h8h151he.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Now, clicking the "&lt;strong&gt;Extract data&lt;/strong&gt;" option will extract all the details of all articles listed on the first page of the entered URL.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can now see changes in the workflow:&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcv18c8ru8apg8ud5tpcj.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcv18c8ru8apg8ud5tpcj.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A dialog box will open asking whether to extract more elements:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjekmttyzcv3lamtqlb0j.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjekmttyzcv3lamtqlb0j.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Since we haven't scraped image_url yet, we will select it separately, following the same procedure as above:

&lt;ul&gt;
&lt;li&gt;Select the image in the web-page section:&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0gugpzsgfeltw4sm8df6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0gugpzsgfeltw4sm8df6.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A dialog box will appear with different options &amp;gt; choose the "&lt;strong&gt;Select All&lt;/strong&gt;" option.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7kt4ft51fzry8otkzgru.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F7kt4ft51fzry8otkzgru.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Another pop-up window will appear with different options to choose from &amp;gt; select "&lt;strong&gt;Extract image urls&lt;/strong&gt;".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjsnlufv5louqlzgs7dqz.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjsnlufv5louqlzgs7dqz.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You are now done scraping the image URLs for all news articles on the first page as well, and a new attribute will be added to your data preview.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is what your data preview will look like after editing the attribute names:&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgdy1hl6hu5blwbhpi8r0.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgdy1hl6hu5blwbhpi8r0.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Again, a pop-up window will appear asking whether to extract more elements. Since we want to extract data from more than one page, we will set up pagination.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pagination&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now, if you need a large amount of data, you can loop up to a specific page or to the last page of the listed URL.

&lt;ul&gt;
&lt;li&gt;To paginate, all you need to do is find the element that leads to the next page on that particular website, such as "&lt;strong&gt;next&lt;/strong&gt;", "&lt;strong&gt;&amp;gt;&lt;/strong&gt;", or anything similar, and click on it. In my case, it is the "&lt;strong&gt;next&lt;/strong&gt;" button itself. Once clicked, it will be highlighted and a new window will pop up.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6kj2ejuuvk72elmdjnes.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6kj2ejuuvk72elmdjnes.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select "&lt;strong&gt;Loop click next page&lt;/strong&gt;".&lt;/li&gt;
&lt;li&gt;After you are done, your workflow will look similar to the one produced by auto-detection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5oxy7bg71nim7871s3g2.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5oxy7bg71nim7871s3g2.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
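&lt;p&gt;The idea behind "Loop click next page" can be pictured as a simple loop: read the items on the current page, follow the "next" pointer, and stop at the last page or at a page limit. The sketch below is not how Octoparse is implemented; it is a minimal illustration in plain Python, and the function name, the &lt;code&gt;max_pages&lt;/code&gt; parameter and the in-memory fake site are all assumptions made for the example.&lt;/p&gt;

```python
def scrape_all_pages(fetch_page, first_page, max_pages=None):
    """Collect items page by page, following each page's 'next' pointer.

    fetch_page is any callable that returns a dict with an 'items' list and
    an optional 'next' page id. Stops when there is no next page or when
    max_pages pages have been visited.
    """
    rows = []
    page_id = first_page
    visited = 0
    while page_id is not None:
        if max_pages is not None and visited == max_pages:
            break
        page = fetch_page(page_id)
        rows.extend(page["items"])
        visited += 1
        page_id = page.get("next")
    return rows


# Hypothetical in-memory site: three listing pages linked by "next".
FAKE_SITE = {
    1: {"items": ["a1", "a2"], "next": 2},
    2: {"items": ["b1", "b2"], "next": 3},
    3: {"items": ["c1"], "next": None},
}

all_rows = scrape_all_pages(FAKE_SITE.__getitem__, first_page=1)
first_two = scrape_all_pages(FAKE_SITE.__getitem__, first_page=1, max_pages=2)
print(len(all_rows), len(first_two))
```

&lt;p&gt;Running to the last page corresponds to leaving &lt;code&gt;max_pages&lt;/code&gt; unset, while paginating up to a specific page corresponds to setting it.&lt;/p&gt;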

&lt;ul&gt;
&lt;li&gt;When you are done with all editing and have organized data, click save &amp;amp; run.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fb06c7acu2xx7f0mh0g4o.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fb06c7acu2xx7f0mh0g4o.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-6: Exporting data to your machine&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clicking the &lt;strong&gt;Save&lt;/strong&gt; &amp;amp; &lt;strong&gt;Run&lt;/strong&gt; option will open a new window listing 3 options for how to run the task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcg3svfdpitwc9co0aa6f.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcg3svfdpitwc9co0aa6f.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Only premium users can access the last two options, which come with features such as scraping a website on a daily, weekly, or monthly schedule, among others. Their servers run the task, organize the data the way you chose, and send it to you.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you are a free user, select the first option, "&lt;strong&gt;Run task on your device&lt;/strong&gt;".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It will start extracting the data. Even though the extraction is automated, keep an eye on it: if the data limit of 10k is exceeded, it will stop, and you will have to wait another hour while it extracts everything from scratch.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You also need to keep your system awake: if the screen goes to sleep during extraction, the extraction may stop at that point, and you will have to start it again to extract the rest.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose "&lt;strong&gt;export data&lt;/strong&gt;": &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fb06c7acu2xx7f0mh0g4o.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fb06c7acu2xx7f0mh0g4o.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the format in which to save your file:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwm07m77lzhzap6z89gjd.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwm07m77lzhzap6z89gjd.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I chose to export my file as a .csv file and saved it to my desktop. Don't close this window; we will use it again to export the data in .xlsx format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxmvxxo4sx5dhh1j42aey.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxmvxxo4sx5dhh1j42aey.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now, let's have a look at the data that we have extracted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq3x6nrqwvq7zrtoyfvnx.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq3x6nrqwvq7zrtoyfvnx.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Uh-oh! The data looks garbled: it is neither readable nor organized.&lt;/p&gt;

&lt;p&gt;Let's go to the export window again: export data &amp;gt; choose .xlsx format this time &amp;gt; click ok.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Now, let's have a look at the exported data in .xlsx format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fprypz1373m1uhwiksk0l.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fprypz1373m1uhwiksk0l.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Voila! Now it makes sense; all the ambiguity is gone from our data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step-7: Formatting Excel file using formulas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inspecting Dataset&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first thing to do is inspect your dataset. On inspecting mine, I found some irrelevant content that I wasn't able to edit at the time of scraping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqdmpah69f7b596dpnsd0.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqdmpah69f7b596dpnsd0.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So we will do some formatting on the Excel file.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I.&lt;/strong&gt; Look at my "&lt;strong&gt;Image url&lt;/strong&gt;" attribute. I extracted the image URL in order to get the label from it: the label value is written inside the image URL, and I didn't find any better way to extract it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2x7f4p44st901iniv25v.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2x7f4p44st901iniv25v.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you look closely at the "&lt;strong&gt;Image url&lt;/strong&gt;" attribute, there is a short string "&lt;strong&gt;.jpg&lt;/strong&gt;" and a longer string "&lt;strong&gt;&lt;a href="https://static.politifact.com/img/meter-" rel="noopener noreferrer"&gt;https://static.politifact.com/img/meter-&lt;/a&gt;&lt;/strong&gt;" that are common to all rows of the "&lt;strong&gt;Image url&lt;/strong&gt;" attribute.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;So, we will replace both strings with "" to get the label values.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Press Ctrl+H &amp;gt; fill the "&lt;strong&gt;find what&lt;/strong&gt;" field with ".jpg" &amp;gt; leave the "&lt;strong&gt;replace with&lt;/strong&gt;" field empty &amp;gt; click "&lt;strong&gt;Replace All&lt;/strong&gt;" &amp;gt; press "&lt;strong&gt;OK&lt;/strong&gt;". Repeat the same steps for the longer string, and you are done with your labels.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkxq1kutapef5d6txh8e3.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkxq1kutapef5d6txh8e3.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5ob3qu9z1oo3jq4r81x0.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5ob3qu9z1oo3jq4r81x0.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw0thk5wqn3bqd8icblgx.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw0thk5wqn3bqd8icblgx.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is how the attribute looks after formatting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1mp29hbrhg0eavuvusf6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1mp29hbrhg0eavuvusf6.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As you can see, there are two more problems with it: first, the value in the first cell is a hyperlink; second, there are extra spaces.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;To remove the hyperlink from a particular cell &amp;gt; right-click on it &amp;gt; select "&lt;strong&gt;remove hyperlinks&lt;/strong&gt;" from the drop-down. To remove hyperlinks from an entire column &amp;gt; select the entire column &amp;gt; right-click on it &amp;gt; select "&lt;strong&gt;remove hyperlinks&lt;/strong&gt;" from the drop-down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To remove extra spaces from a particular attribute &amp;gt; go to any empty cell &amp;gt; write the formula &lt;strong&gt;=TRIM(address of the first cell of the attribute)&lt;/strong&gt; &amp;gt; press Enter &amp;gt; you will see the formatted value of the first cell &amp;gt; to apply the change to all cells, drag the first cell down to the last cell of the attribute &amp;gt; you will see all values in the trimmed format &amp;gt; now replace the old column with the new one: select the new column entirely &amp;gt; copy it &amp;gt; select the old column entirely &amp;gt; go to paste options &amp;gt; select the paste values (V) option from the drop-down.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq8gvrdwlcllrrk38fx49.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq8gvrdwlcllrrk38fx49.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Fantastic! You are all done with the "&lt;strong&gt;Label&lt;/strong&gt;" column.&lt;br&gt;
Have a look now:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgpnr4czo12qie8mndlmq.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgpnr4czo12qie8mndlmq.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
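
&lt;p&gt;For readers who prefer a script, the find-and-replace and TRIM steps above can be sketched in Python. This is a minimal sketch, not the method used in this article; the helper name &lt;code&gt;clean_label&lt;/code&gt; and the sample URL are assumptions made for illustration.&lt;/p&gt;

```python
# The two strings shared by every value in the "Image url" column.
PREFIX = "https://static.politifact.com/img/meter-"
SUFFIX = ".jpg"


def clean_label(image_url):
    """Hypothetical helper: strip the shared strings, then tidy the spaces."""
    # Same idea as Ctrl+H with an empty "replace with" field, done twice.
    text = image_url.replace(PREFIX, "").replace(SUFFIX, "")
    # split() + join() drops leading/trailing whitespace and collapses
    # repeated inner whitespace, roughly what Excel's TRIM does for spaces.
    return " ".join(text.split())


# Illustrative value only, not copied from the scraped dataset.
print(clean_label("  https://static.politifact.com/img/meter-false.jpg  "))
```

&lt;p&gt;Doing it in code makes the cleaning repeatable if you scrape the site again later, while the Excel route needs no programming at all.&lt;/p&gt;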

&lt;p&gt;&lt;strong&gt;II.&lt;/strong&gt; Look at my other attribute, "&lt;strong&gt;Stated On&lt;/strong&gt;", from which the only data I care about is the date.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcye9jf39937d01c2frp6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fcye9jf39937d01c2frp6.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will delete the rest of the text in two steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As you have already seen, when the same string repeats throughout a column, we can find it and replace it with nothing. So we will replace the sub-string "stated on" with nothing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how our column will look:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqntokxf6vwml1c1r9r2b.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fqntokxf6vwml1c1r9r2b.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Looking at the image above, you will find that the remaining sub-string is different in every row of this column. Since we only want the date, this time we will extract it:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a new empty cell &amp;gt; type the formula "&lt;strong&gt;=MID(address of the first cell of the Stated On attribute, starting index of the string you want to extract, number of characters to extract)&lt;/strong&gt;" &amp;gt; press Enter &amp;gt; you will see the first formatted value &amp;gt; then repeat the steps above to fill in all values and replace the old column with the new one.&lt;/p&gt;

&lt;p&gt;Format of writing formula:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo5f1yutinfdqaj9pbghg.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo5f1yutinfdqaj9pbghg.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F94dgd7ej9s2q91o8y6bi.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F94dgd7ej9s2q91o8y6bi.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On pressing enter:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fu30mt86acsfcxsrt8ise.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fu30mt86acsfcxsrt8ise.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have a look at new "&lt;strong&gt;Stated on&lt;/strong&gt;" attribute:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmpdkq1kfjjk8vwau8gko.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmpdkq1kfjjk8vwau8gko.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
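
&lt;p&gt;The MID formula maps onto Python string slicing. The sketch below is only an illustration of that mapping: the function name &lt;code&gt;mid&lt;/code&gt; and the sample cell value are assumptions, and the real cells may be laid out differently.&lt;/p&gt;

```python
def mid(text, start, num_chars):
    """Mimic Excel's MID: `start` is 1-based, `num_chars` is the length."""
    begin = start - 1  # Python slices count from 0, Excel counts from 1
    return text[begin:begin + num_chars]


# Illustrative value only; it is not taken from the scraped dataset.
cell = "on June 30, 2020 in a Facebook post:"
print(mid(cell, 4, 13))
```

&lt;p&gt;As with the Excel formula, a fixed start and length only work when the date sits at the same position in every cell.&lt;/p&gt;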

&lt;p&gt;&lt;strong&gt;III.&lt;/strong&gt; In the "&lt;strong&gt;Date&lt;/strong&gt;" attribute, we don't want any text other than the date. We can't use the "&lt;strong&gt;MID&lt;/strong&gt;" formula here, because the date sits at the end of the string and its starting position differs from cell to cell.&lt;/p&gt;

&lt;p&gt;So, we will achieve this using "&lt;strong&gt;RIGHT&lt;/strong&gt;":&lt;/p&gt;

&lt;p&gt;Go to a new empty cell &amp;gt; type the formula "&lt;strong&gt;=RIGHT(E2,LEN(E2)-FIND("•",E2))&lt;/strong&gt;" &amp;gt; press Enter &amp;gt; then follow the same steps as above to replace the old column's values with the new ones.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Write the formula:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fixu3mxq0gx3qbw41fc5y.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fixu3mxq0gx3qbw41fc5y.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Press Enter:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkbtwofo433tarcgzx7ft.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkbtwofo433tarcgzx7ft.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;New column will look like:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffz3pleb0ugcajzo0trg4.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffz3pleb0ugcajzo0trg4.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
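
&lt;p&gt;The RIGHT, LEN and FIND combination keeps everything after the "•" marker. A rough Python equivalent is sketched below; the function name &lt;code&gt;text_after_marker&lt;/code&gt; and the sample value are assumptions for illustration. Unlike the Excel formula, this sketch also strips the surrounding spaces and returns the text unchanged when the marker is missing.&lt;/p&gt;

```python
def text_after_marker(text, marker="•"):
    """Return the part of `text` that follows the first `marker`."""
    position = text.find(marker)  # 0-based index, -1 when the marker is absent
    if position == -1:
        return text.strip()
    # Skip past the marker itself, then drop the spaces around the result.
    return text[position + len(marker):].strip()


# Illustrative value only; the name and date are made up.
print(text_after_marker("By Jane Doe • July 1, 2020"))
```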

&lt;p&gt;Our Final Dataset after all formatting:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flpo51lr0ecvnlt6czyjc.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Flpo51lr0ecvnlt6czyjc.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, this data is now cleaned and ready to use. I hope you find this article informative and useful. Do share your thoughts in the comment box, and let me know if you have any queries. ✌️&lt;/p&gt;

&lt;p&gt;You can reach me via the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Subscribe to my &lt;a href="https://www.youtube.com/channel/UCdwAaZMWiRmvIBIT96ApVjw" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt; for upcoming video content.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connect and reach me on &lt;a href="http://www.linkedin.com/in/techykajal" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>beginners</category>
      <category>datascience</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Scraping 1000's of News Articles using 10 simple steps</title>
      <dc:creator>Kajal</dc:creator>
      <pubDate>Tue, 28 Jul 2020 16:29:36 +0000</pubDate>
      <link>https://dev.to/techykajal/scraping-1000-s-of-news-articles-using-10-simple-steps-bb6</link>
      <guid>https://dev.to/techykajal/scraping-1000-s-of-news-articles-using-10-simple-steps-bb6</guid>
      <description>&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Web Scraping Series: Using Python and Software&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. &lt;a href="https://dev.to/techykajal/scraping-1000-s-of-news-articles-using-10-simple-steps-bb6"&gt;Scraping web pages without using Software: Python&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. &lt;a href="https://dev.to/techykajal/scraping-websites-using-software-1n29"&gt;Scraping web Pages using Software: Octoparse&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table of Contents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;    1.1 Introduction
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;          1.1.1 Why This article?
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;          1.1.2 WHOM THIS ARTICLE IS USEFUL FOR ?
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;    1.2 Overview
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;          1.2.1 A brief introduction to webpage design and HTML
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;          1.2.2 Web-scraping using BeautifulSoup in PYTHON
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-1:Installing Packages
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-2:Importing Libraries
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-3:Making Simple requests
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-4:Inspecting the Response Object
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-5:Delaying request time
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-6:Extracting Content from HTML
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-7:Finding elements and attributes
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-8:Making Dataset
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-9:Visualising Dataset
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;             Step-10:Making csv file &amp;amp; saving it to your machine
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;    1.3 Suggestion &amp;amp; conclusion
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;          1.3.1 Full Code
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;INTRODUCTION&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;WHY THIS ARTICLE?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The aim of this article is to scrape news articles from different websites using Python. Generally, web scraping involves accessing numerous websites and collecting data from them. However, we can limit ourselves to collecting a large amount of information from a single source and using it as a dataset.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Web Scraping is a technique employed to extract large amounts of data from websites whereby the data is extracted and saved to a local file on your computer or to a database in table (spreadsheet) format.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I was motivated to do web scraping while working on my Machine-Learning project on &lt;strong&gt;Fake News Detection System&lt;/strong&gt;. Whenever we begin a machine learning project, the first thing we need is a dataset. While you can find many datasets online with varied information, sometimes you wish to extract data on your own and begin your own investigation. I needed a dataset that I couldn't find anywhere.&lt;br&gt;
&lt;a href="https://i.giphy.com/media/3Q2hJ4FLN1UvS/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/3Q2hJ4FLN1UvS/giphy.gif" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So this motivated me to build my own dataset for my project, and that's how I did my project from scratch. My project was about classifying news articles into two main categories: &lt;strong&gt;FAKE&lt;/strong&gt; &amp;amp; &lt;strong&gt;REAL&lt;/strong&gt;.&lt;/p&gt;
&lt;h4&gt;
  
  
  FAKE-NEWS DATASET
&lt;/h4&gt;

&lt;p&gt;For this project, the first task was to get a dataset that is already labeled "&lt;em&gt;FAKE&lt;/em&gt;". This can be achieved by scraping data from verified and certified news websites that we can rely on for fact-checked news articles, because it is really very difficult to get genuine "&lt;em&gt;FAKE NEWS&lt;/em&gt;".&lt;br&gt;
&lt;strong&gt;&lt;em&gt;I went through these news websites to get my FAKE-NEWS dataset:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.boomlive.in/fake-news" rel="noopener noreferrer"&gt;Boom Live&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.snopes.com/fact-check/" rel="noopener noreferrer"&gt;Snopes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.politifact.com/truth-o-meter/" rel="noopener noreferrer"&gt;Politifact&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.allsides.com/unbiased-balanced-news" rel="noopener noreferrer"&gt;AllSides&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;But honestly speaking, I ended up scraping data from only one website, Politifact.&lt;/strong&gt;&lt;br&gt;
There is a strong reason for that. If you go through the links listed above, you will see that we need a dataset that is already labeled "&lt;strong&gt;FAKE&lt;/strong&gt;", but we also don't want the news articles in a modified form. We want the raw news article, without any keywords indicating whether it is "FAKE" or not.&lt;br&gt;
For example, if you go through &lt;a href="https://www.boomlive.in/fake-news" rel="noopener noreferrer"&gt;"BoomLive.in"&lt;/a&gt;, you will find that the news articles marked "FAKE" are not in their original form; they have been altered based on the fact-checking team's analysis. Training an ML model on such altered text gives a biased result: the model ends up a dumb one that only recognizes news articles containing keywords like "FAKE", "DID?", "IS?" and does not perform well on a new test set.&lt;br&gt;
That's why we use &lt;strong&gt;Politifact&lt;/strong&gt; to scrape our "FAKE-NEWS DATASET".&lt;br&gt;
There are challenges in labelling news articles too, but we will cover those in a later section.&lt;/p&gt;
&lt;h4&gt;
  
  
  REAL-NEWS DATASET
&lt;/h4&gt;

&lt;p&gt;The second task was to create a "&lt;strong&gt;REAL-NEWS&lt;/strong&gt;" dataset. That is easy if you scrape news articles from trusted or verified news websites like "TOI", "IndiaToday", "TheHindu" and many others. We trust these websites to report factual data, and even where they do not, we assume that they do and train our model accordingly.&lt;br&gt;
But for my project, I scraped both &lt;strong&gt;real&lt;/strong&gt; and &lt;strong&gt;fake&lt;/strong&gt; data from one website only (Politifact.com), since it gave me what I needed, and since it is advisable to scrape one website at a time when using Python. You can still scrape multiple pages of that website in one module by running an outer for loop.&lt;/p&gt;
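
&lt;p&gt;The outer for loop over pages can be sketched as below. This is only a sketch under assumptions: the listing URL pattern in &lt;code&gt;BASE_URL&lt;/code&gt; and the helper name &lt;code&gt;page_urls&lt;/code&gt; are hypothetical, so check the site's real pagination before relying on them.&lt;/p&gt;

```python
# Assumed listing URL pattern; verify it against the site before use.
BASE_URL = "https://www.politifact.com/factchecks/list/?page={}"


def page_urls(first_page, last_page):
    """Build one listing URL per page number, both ends included."""
    return [BASE_URL.format(number) for number in range(first_page, last_page + 1)]


for url in page_urls(1, 3):
    # A real scraper would request and parse each page at this point.
    print(url)
```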
&lt;h3&gt;
  
  
  &lt;strong&gt;WHOM THIS ARTICLE IS USEFUL FOR?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Whoever is working on a project that needs data scraped in the thousands, this article is definitely for you 😃. It doesn't matter whether you come from a programming background, because people from many other backgrounds often need data for a project, a survey, or some other purpose. Non-programmers can find a programming language hard to understand, so I will make scraping easy for them too by introducing software that can scrape large amounts of data easily. That said, scraping with Python is not that difficult if you follow along with me while reading this blog 😎; the only thing you need to focus on is the HTML source code of a webpage. Once you understand how webpages are written in HTML and can identify the elements and attributes of interest, you can scrape any website 😋. For non-programmers who want to do web scraping with Python: focus mainly on the HTML code. Python syntax is not that difficult to understand; it is just a matter of remembering and understanding a few libraries, functions, and keywords. I have tried to explain every step transparently, and I hope that by the end of this series you will be able to scrape webpages with different types of layout.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;OVERVIEW&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This post covers the first part: News articles web scraping using &lt;strong&gt;PYTHON&lt;/strong&gt;. We’ll create a script that scrapes the latest news articles from different newspapers and stores the text, which will be fed into the model afterward to get a prediction of its category.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;A brief introduction to webpage design and HTML:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If we want to be able to extract news articles (or, in fact, any other kind of text) from a website, the first step is to know how a website works. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;We will follow an example to understand this:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When we enter a URL into a web browser (e.g. Google Chrome, Firefox, etc.) and open it, what we see is the combination of three technologies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTML (HyperText Markup Language):&lt;/strong&gt; it is the standard language for adding content to a website. It allows us to insert text, images, and other things into our site. In short, HTML defines the content of every webpage on the internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CSS (Cascading Style Sheets):&lt;/strong&gt; this language allows us to set the visual design of a website. This means it determines the style/presentation of a webpage including colors, layouts, and fonts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JavaScript:&lt;/strong&gt; JavaScript is a dynamic computer programming language. It allows us to make the content and the style interactive &amp;amp; provides a dynamic interface between client-side script and user.&lt;/p&gt;

&lt;p&gt;Note that only JavaScript is a programming language in the strict sense; HTML is a markup language and CSS is a style sheet language. Together, the three allow us to create and manipulate every aspect of the design of a webpage.&lt;/p&gt;

&lt;p&gt;Let’s illustrate these concepts with an example. When we visit the Politifact page, we see the following:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjlkko3j8dk1v5j93ryf4.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjlkko3j8dk1v5j93ryf4.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we disabled &lt;strong&gt;JavaScript&lt;/strong&gt;, we would not be able to use this pop-up anymore. As you can see, the video pop-up window is gone now:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5ioba6jbhjx0g7cd2m2o.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5ioba6jbhjx0g7cd2m2o.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And if we deleted the &lt;strong&gt;CSS&lt;/strong&gt; content from the webpage, we would see something like this:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fewvnkv4my1hcsyuimw86.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fewvnkv4my1hcsyuimw86.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, let me ask you a question.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"&lt;strong&gt;&lt;em&gt;If you want to extract the content of a webpage via web-scraping, where do you need to look?&lt;/em&gt;&lt;/strong&gt;"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I hope you are now clear about which kind of source code we need to scrape. 😎 If you are thinking of &lt;strong&gt;HTML&lt;/strong&gt;, you are absolutely right. 😉&lt;/p&gt;

&lt;p&gt;So, the last step before performing web scraping is to understand a bit of the HTML language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTML&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HTML&lt;/strong&gt; is a "&lt;em&gt;hypertext markup language&lt;/em&gt;" that defines the content of a webpage. It consists of elements and attributes, and to scrape data you should be familiar with inspecting them.&lt;br&gt;
-An element could be a heading, paragraph, division, anchor tag, and so on.&lt;br&gt;
-An attribute supplies extra information about an element, such as the &lt;code&gt;href&lt;/code&gt; of an anchor tag or the &lt;code&gt;class&lt;/code&gt; of a division.&lt;br&gt;
Elements are written with an opening tag &lt;code&gt;&amp;lt;tag&amp;gt;&lt;/code&gt; and a closing tag &lt;code&gt;&amp;lt;/tag&amp;gt;&lt;/code&gt;&lt;br&gt;
e.g.,&lt;br&gt;
&lt;br&gt;
&lt;code&gt;&amp;lt;p&amp;gt;This is a paragraph.&amp;lt;/p&amp;gt; &lt;br&gt;
     &amp;lt;h1&amp;gt;&amp;lt;b&amp;gt;This is heading one in bold letters&amp;lt;/b&amp;gt;&amp;lt;/h1&amp;gt;&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Web-scraping using BeautifulSoup in PYTHON&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Enough talk, show me the code.&lt;br&gt;
&lt;a href="https://i.giphy.com/media/LmNwrBhejkK9EFP504/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/LmNwrBhejkK9EFP504/giphy.gif" title="Show me code" alt="alt text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step-1 : Installing Packages&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;We will begin by installing the necessary packages:&lt;br&gt;
&lt;code&gt;beautifulsoup4&lt;/code&gt;&lt;br&gt;
To install it, run the following command in a Jupyter notebook cell (drop the leading &lt;code&gt;!&lt;/code&gt; when running it in a terminal).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt; &lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;beautifulsoup4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;BeautifulSoup, provided by the bs4 package, is a library that parses HTML and XML documents in Python in an easy and convenient way and lets us access their elements by tag and attribute.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is an easy-to-use yet powerful package that can extract data from a webpage in just 5-6 lines of code.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;requests&lt;/code&gt;&lt;br&gt;
To install it, run the following command in a notebook cell or, without the leading &lt;code&gt;!&lt;/code&gt;, in a command shell.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt; &lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;To provide BeautifulSoup with the HTML code of a page, we need the requests module.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;urllib&lt;/code&gt;&lt;br&gt;
&lt;code&gt;urllib&lt;/code&gt; ships with Python's standard library, so it does not need to be installed with pip.&lt;br&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The urllib module is Python's URL handling module. It is used to fetch URLs (Uniform Resource Locators).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Alongside it, we will import two other standard-library modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;time (its sleep() function delays or suspends execution for the given number of seconds).&lt;/li&gt;
&lt;li&gt;sys (used here to get exception information, such as the type of error, the error object, and the traceback).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step-2 : Importing Libraries&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Now we will import all the required libraries:&lt;br&gt;
&lt;code&gt;BeautifulSoup&lt;/code&gt;&lt;br&gt;
To import it, use the following statement in your IDE.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;This library gives us the HTML structure of any page that we want to work with and provides functions to access specific elements and extract relevant information.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;urllib&lt;/code&gt;&lt;br&gt;
To import it, type the following statement.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;urllib.request : defines functions and classes that help in opening URLs.&lt;/li&gt;
&lt;li&gt;sys : a separate standard-library module (not part of urllib) whose functions help us retrieve exception information.&lt;/li&gt;
&lt;li&gt;time : another standard-library module (also not part of urllib), which provides several useful functions for time-related tasks. One of the most popular among them is sleep().&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;requests&lt;/code&gt;&lt;br&gt;
To import it, just write the import keyword before the library name.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;This module allows us to send HTTP requests to a web server using Python. (HTTP messages consist of requests from client to server and responses from server to client.)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;pandas&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;It is a high-level data-manipulation tool that we need to organize and view our structured scraped data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We will use this library to make a DataFrame (the key data structure of this library). DataFrames allow us to store and manipulate tabular data in rows of observations and columns of variables. The complete set of imports looks like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step-3 : Making Simple requests&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;With the &lt;code&gt;requests&lt;/code&gt; module, we can get the HTML content and store it in the &lt;code&gt;page&lt;/code&gt; variable.&lt;br&gt;
Make a simple GET request (just fetching a page):&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#url of the page that we want to Scarpe
#+str() is used to convert int datatype of the page no. and concatenate that to a URL for pagination purposes.
&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://www.politifact.com/factchecks/list/?page=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;#Use the browser to get the URL. This is a suspicious command that might blow up.
&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;Since &lt;code&gt;requests.get(url)&lt;/code&gt; can raise an exception (for example, when the network is down), we will call it in a try-except block.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
     &lt;span class="c1"&gt;# this might throw an exception if something goes wrong.
&lt;/span&gt;     &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
     &lt;span class="c1"&gt;# this describes what to do if an exception is thrown 
&lt;/span&gt;&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; 
    &lt;span class="c1"&gt;# get the exception information
&lt;/span&gt;    &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exc_info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;      
    &lt;span class="c1"&gt;#print the link that cause the problem
&lt;/span&gt;    &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ERROR FOR LINK:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;#print error info and line that threw the exception                          
&lt;/span&gt;    &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Line:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error_info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tb_lineno&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


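&lt;p&gt;Note that &lt;code&gt;continue&lt;/code&gt; is only valid inside a loop: in the full script, this block sits inside an outer for loop that we also use for pagination.&lt;/p&gt;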
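&lt;p&gt;As a rough sketch of how the outer loop and the try-except block fit together, the snippet below loops over page numbers, builds each URL, and skips any page whose request fails. To keep it runnable offline, the hypothetical &lt;code&gt;fetch&lt;/code&gt; parameter stands in for &lt;code&gt;requests.get&lt;/code&gt;; the names &lt;code&gt;scrape_pages&lt;/code&gt; and &lt;code&gt;fake_fetch&lt;/code&gt; are assumptions made for illustration and are not part of the original script.&lt;/p&gt;

```python
import sys

BASE_URL = 'https://www.politifact.com/factchecks/list/?page='


def scrape_pages(first_page, last_page, fetch):
    """Fetch every page in the range, skipping pages whose request fails.

    `fetch` is a hypothetical stand-in for requests.get so that this
    sketch runs without network access.
    """
    pages = {}
    for page_no in range(first_page, last_page + 1):
        # str() converts the integer page number so it can be appended to the URL
        url = BASE_URL + str(page_no)
        try:
            pages[page_no] = fetch(url)
        except Exception:
            # sys.exc_info() returns (type, value, traceback) for the active exception
            error_type, error_obj, error_info = sys.exc_info()
            print('ERROR FOR LINK:', url)
            print(error_type, 'Line:', error_info.tb_lineno)
            continue  # valid here because we are inside the for loop
    return pages


def fake_fetch(url):
    """Pretend that page 2 is unreachable and every other page loads."""
    if url.endswith('page=2'):
        raise ConnectionError('simulated network failure')
    return 'html for ' + url


results = scrape_pages(1, 3, fake_fetch)
print(sorted(results))
```

&lt;p&gt;In a real run you would pass &lt;code&gt;requests.get&lt;/code&gt; as the &lt;code&gt;fetch&lt;/code&gt; argument; the failed page is simply left out of the result.&lt;/p&gt;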
&lt;h4&gt;
  
  
  &lt;strong&gt;Step-4 : Inspecting the Response Object&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;I. See what response code the server sent back (useful for &lt;br&gt;
  detecting 4XX or 5XX errors).&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsqm7kzgwjbpj0gduogds.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsqm7kzgwjbpj0gduogds.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
  The HTTP 200 OK success status response code indicates that the request has succeeded.&lt;/p&gt;
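&lt;p&gt;To make the 2XX / 4XX / 5XX distinction concrete, here is a small sketch that labels a status code using only the standard library. The helper name &lt;code&gt;describe_status&lt;/code&gt; is an assumption for illustration; in the tutorial's script you would pass it &lt;code&gt;page.status_code&lt;/code&gt;.&lt;/p&gt;

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label for a standard HTTP status code."""
    phrase = HTTPStatus(code).phrase  # e.g. 200 maps to 'OK'
    family = code // 100              # 2 for 2XX, 4 for 4XX, 5 for 5XX
    if family == 2:
        return 'success: ' + phrase
    if family == 4:
        return 'client error: ' + phrase
    if family == 5:
        return 'server error: ' + phrase
    return 'other: ' + phrase


for code in (200, 404, 503):
    print(code, describe_status(code))
```

&lt;p&gt;Note that &lt;code&gt;HTTPStatus(code)&lt;/code&gt; raises a ValueError for non-standard codes, so this is a sketch rather than a complete handler.&lt;/p&gt;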

&lt;p&gt;II. Access the full response as text (get the HTML of the page as one big string).&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8kmn6yq5kcxxlvihj13l.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8kmn6yq5kcxxlvihj13l.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;page.text&lt;/code&gt; returns the HTML content of the response as a Unicode string.&lt;br&gt;
  &lt;strong&gt;Alternative:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fii8ehj5vbjwn6g1agqbw.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fii8ehj5vbjwn6g1agqbw.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
  &lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fr6cb2w9a1dmdj8y6tu5o.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fr6cb2w9a1dmdj8y6tu5o.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;page.content&lt;/code&gt;, by contrast, returns the content of the response as bytes.&lt;/p&gt;
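&lt;p&gt;The difference between a Unicode string and bytes can be sketched without any network call. The sample body below is made up for illustration; it stands in for what the text and content attributes of a response would hold.&lt;/p&gt;

```python
# Hypothetical response body as raw bytes (the kind of value page.content holds)
raw_body = 'Caf\u00e9 fact-check'.encode('utf-8')

# Decoding the bytes yields a str (the kind of value page.text holds)
decoded_body = raw_body.decode('utf-8')

# The accented character takes two bytes in UTF-8 but counts as one character
print(type(raw_body).__name__, len(raw_body))
print(type(decoded_body).__name__, len(decoded_body))
```

&lt;p&gt;The byte string is one unit longer than the decoded string because the accented letter is encoded as two bytes.&lt;/p&gt;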

&lt;p&gt;III. Look for a specific substring of text within the response.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;     &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Politifact&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Yes, Scarpe it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;IV. Check the response’s Content Type (see if you got back HTML, &lt;br&gt;
  JSON, XML, etc)&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;  &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content-type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffi2tec4p3yqghn3iqrck.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ffi2tec4p3yqghn3iqrck.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step-5 : Delaying request time&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Next, with the time module, we can call sleep(2), which pauses the script for 2 seconds. Here it delays each request to the web server by 2 seconds, so that we do not flood the site with requests.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;The sleep() function suspends execution of the current thread for a given number of seconds.&lt;/p&gt;
&lt;/blockquote&gt;
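&lt;p&gt;Here is one way the delay might be placed inside a loop, so that the script pauses between consecutive requests but not before the first one. The names &lt;code&gt;fetch_politely&lt;/code&gt; and &lt;code&gt;fetch&lt;/code&gt; are assumptions for illustration; &lt;code&gt;str.upper&lt;/code&gt; is used as a dummy fetcher so the sketch runs offline.&lt;/p&gt;

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    responses = []
    for index, url in enumerate(urls):
        if index != 0:
            # Pause before every request except the first one
            time.sleep(delay_seconds)
        responses.append(fetch(url))
    return responses


# A tiny delay keeps the demo fast; the tutorial itself uses 2 seconds
demo = fetch_politely(['page-1', 'page-2', 'page-3'], str.upper, delay_seconds=0.01)
print(demo)
```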
&lt;h4&gt;
  
  
  &lt;strong&gt;Step-6 : Extracting Content from HTML&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Now that you’ve made your HTTP request and gotten some HTML content, it’s time to parse it so that you can extract the values you’re looking for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A) Using Regular Expressions&lt;/strong&gt;&lt;br&gt;
Using regular expressions to parse HTML content is strongly discouraged, because HTML's nested structure is hard to match reliably with a pattern.&lt;/p&gt;

&lt;p&gt;However, regular expressions are still useful for finding specific string patterns like prices, email addresses, or phone numbers.&lt;/p&gt;

&lt;p&gt;Run a regular expression on the response text to look for specific string patterns:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;  &lt;span class="c1"&gt;# put this at the top of the file
&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\$[0-9,.]+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn15706uy5v5ibqwdvj5h.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn15706uy5v5ibqwdvj5h.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;B) Using BeautifulSoup's soup object&lt;/strong&gt;&lt;br&gt;
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The command below looks for all &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; tags whose class attribute has the value 'o-listicle__item'.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;li&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;o-listicle__item&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;INSPECTING WEBPAGE&lt;/strong&gt;&lt;br&gt;
To understand the code above, you need to inspect the webpage, so please follow along:&lt;br&gt;
1) Go to the URL listed above.&lt;br&gt;
2) Press Ctrl+Shift+I to inspect it.&lt;br&gt;
3) This is what your 'Inspect window' will look like:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmdgzoj5rfsgwv15uzoof.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmdgzoj5rfsgwv15uzoof.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Press Ctrl+Shift+C to select an element on the page and inspect it, or click the leftmost arrow in the header of the Inspect window.
4) To find the specific element and attribute used above in the Inspect window:&lt;/li&gt;
&lt;li&gt;First, try visiting every section of the webpage and watch the changes in your Inspect window. You will quickly grasp how webpages work, which element is which, and what each attribute contributes to the page.&lt;/li&gt;
&lt;li&gt;Once you are done with the step above, I assume you understand how the &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; element above and its attribute work.&lt;/li&gt;
&lt;li&gt;Since I needed the news section of a particular article, I went to that article section by choosing the inspect-element option in the Inspect window. It highlights that article section on the webpage and its HTML source in the Inspect window. Voila!✨
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fry3zt4iknrgvwz2b1t5t.PNG" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Were you able to locate the same tag on your machine?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgajd6p6yjkz9dbps1nep.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgajd6p6yjkz9dbps1nep.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If yes, you are all set to understand every HTML tag that I have used in my code.&lt;/p&gt;

&lt;p&gt;Continuing with my code: 😅&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;This command shows how many news articles there are on a given page.&lt;br&gt;
It helps you decide how many pages your loop needs to paginate through to extract a large amount of data.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;
  
  
  &lt;strong&gt;Step-7 : Finding elements and attributes&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Look for all anchor tags on the page (useful if you’re building a crawler and need to find the next pages to visit)
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;links&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;This finds a division tag inside the &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; tag, where the div tag must have the specified class attribute value.
Here 'j' is the loop variable that iterates over 'links', the list of &lt;code&gt;&amp;lt;li&amp;gt;&lt;/code&gt; tags for all news articles listed on a given page.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Statement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__quote&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;text.strip() returns the text contained within this tag and strips leading and trailing whitespace, such as spaces, '\n', and '\t', from the string.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;   &lt;span class="n"&gt;Statement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m- 
   statement__quote&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
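&lt;p&gt;It is worth seeing exactly what strip() removes. In this small sketch, the sample string is made up to resemble text pulled from a tag; strip() cleans only the two ends, so the extra step that collapses inner whitespace is an optional addition and not part of the original script.&lt;/p&gt;

```python
# Made-up text, as it might come out of a tag with surrounding whitespace
raw_text = '\n\t  A sample   statement. \n'

# strip() removes whitespace at both ends but leaves inner spacing alone
cleaned = raw_text.strip()
print(repr(cleaned))

# Optional extra step: collapse inner runs of whitespace into single spaces
collapsed = ' '.join(cleaned.split())
print(repr(collapsed))
```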


&lt;blockquote&gt;
&lt;p&gt;Voila! 🌟 We have scraped the first attribute of our dataset, i.e., the Statement 😋&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;In the same division section, this looks for the anchor tag and returns the value of its hypertext link (the href attribute).
Again, the strip() function tidies up the values so that our CSV file looks good.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Link&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__quote&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;To get the Date attribute, you need to inspect the web page first, because the date is embedded in a longer string. If you read the text without slicing it, you will get something like this:
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fl2lon5dnqana9q9nr4h0.PNG" alt="Alt Text"&gt;
But we don't need any text other than the date, so I slice the string to keep only the date. Alternatively, you can clean the attribute later using a regular expression.
'footer' is the element that contains the required text.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;footer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
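&lt;p&gt;The regex alternative to slicing can be sketched as follows. This is only a minimal sketch under assumptions: the footer string is a made-up example of a "name, then Month day, year" footer, and extract_date is a hypothetical helper name that is not part of the original code.&lt;/p&gt;

```python
import re

# Hypothetical footer text; the real string depends on the page you scrape.
footer_text = "\nBy Jane Doe • August 3, 2020\n"

# Month name, day, comma, four-digit year (for example "August 3, 2020").
DATE_PATTERN = re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}")


def extract_date(text):
    """Return the first date found in text, or an empty string if there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else ""


print(extract_date(footer_text))
```

&lt;p&gt;Unlike a fixed slice such as [-14:-1], the pattern does not depend on how long the month name or the day number is.&lt;/p&gt;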


&lt;ul&gt;
&lt;li&gt;Here, too, everything is the same as before except for get(), which returns the value of the attribute passed to it (here, title).
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;For my project, I needed a dataset that had not already been altered, and I also needed to know which category each of thousands of articles falls into for my training data.
No one can label that many articles manually.
On this website, the articles already come with labels, but the label text cannot be read directly because it is contained in an image. For this kind of task, you can use get() to retrieve the text from an attribute of the image tag.
Here, I am passing 'alt' to get(), because the alt attribute contains our Label text.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;img&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;c-image__original&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In the lines of code below, I have put all the concepts together to fetch the five attributes of my dataset.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;links&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;Statement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__quote&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
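        &lt;span class="n"&gt;Link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__quote&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;href&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;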
        &lt;span class="n"&gt;Date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;footer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;Source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__author&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;Label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attrs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m-statement__content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;img&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;attrs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;class&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;c-image__original&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;Statement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;upperframe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step-8 : Making Dataset&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For each article, append its attribute values as one row to the initially empty list 'frame'.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;Statement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Link&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then, for each page, extend the initially empty list 'upperframe' with the rows collected in 'frame'.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;upperframe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
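&lt;p&gt;To see how append() and extend() work together, here is a small self-contained sketch. The rows are made-up placeholders rather than scraped data, and resetting 'frame' at the start of every page is my assumption about how the loops are arranged.&lt;/p&gt;

```python
# Hypothetical rows standing in for the articles found on two scraped pages.
pages = [
    [["Statement A", "/a", "August 3, 2020", "Source A", "true"],
     ["Statement B", "/b", "August 2, 2020", "Source B", "false"]],
    [["Statement C", "/c", "August 1, 2020", "Source C", "half-true"]],
]

upperframe = []
for page_rows in pages:
    frame = []  # assumption: start with a fresh list for every page
    for row in page_rows:
        frame.append(row)     # append adds the whole row as ONE element
    upperframe.extend(frame)  # extend adds each row of frame individually

print(len(upperframe))
```

&lt;p&gt;If 'frame' were not reset for each page, extend() would add the rows of the earlier pages again, so it is worth checking the dataset for duplicates.&lt;/p&gt;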

&lt;h4&gt;
  
  
  &lt;strong&gt;Step-9 : Visualising Dataset&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If you want to view your data in Jupyter, you can load it into a pandas DataFrame.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;upperframe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Statement&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Link&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Source&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Step-10 : Making csv file &amp;amp; saving it to your machine&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;A) Opening &amp;amp; writing to file&lt;/strong&gt;&lt;br&gt;
The commands below write a CSV file and save it on your machine, in the directory you run your Python file from (usually the one the file is saved in).&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEWS.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Statement,Link,Date,Source,Label&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Statement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Link&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This line writes each attribute to the file, replacing any ',' inside a value with '^' so that it is not mistaken for a column separator.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Statement&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Link&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span 
class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;So, when you run this file from the command shell, it will create a CSV file in your .py file directory.&lt;br&gt;
On opening it, you might see messy data if you did not use strip() while scraping (try it once without strip() to see the difference). The data will also look odd until you replace '^' with ',' again.&lt;br&gt;
To do that, follow these simple steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open your .csv file in Excel.&lt;/li&gt;
&lt;li&gt;Press Ctrl+H (a pop-up window will ask for 'Find what' and 'Replace with').
&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvrfs11sl6edywxg20smt.PNG" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Enter '^' in 'Find what' and ',' in 'Replace with'.&lt;/li&gt;
&lt;li&gt;Press Replace All.&lt;/li&gt;
&lt;li&gt;Click Close.
Woohoo!😍 Your dataset is now in perfect form.
Also, don't forget to close your file with the following command once both for loops are done:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Note that running the same code again might throw an error if the CSV file it created earlier is still open in another program such as Excel.&lt;/p&gt;
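&lt;p&gt;An alternative to replacing ',' with '^' by hand is Python's built-in csv module, which puts quotes around any value that contains a comma. The following is a minimal sketch, not the method used in this article: the rows are made up, and a temporary directory stands in for the real output folder.&lt;/p&gt;

```python
import csv
import os
import tempfile

# Hypothetical rows; in the scraper these would be the values in upperframe.
rows = [
    ["Says taxes, on average, went up.", "/factchecks/example-1/", "August 3, 2020", "Jane Doe", "false"],
    ["A statement without commas", "/factchecks/example-2/", "August 2, 2020", "John Roe", "true"],
]

# A temporary directory is used here only so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "NEWS.csv")

# 'with' closes the file automatically, so no separate f.close() is needed.
# newline="" is what the csv module documentation recommends when opening files.
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Statement", "Link", "Date", "Source", "Label"])
    writer.writerows(rows)

# Reading the file back returns the original values, commas included.
with open(out_path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back[1][0])
```

&lt;p&gt;Because the commas are preserved inside quoted values, the find-and-replace step in Excel is not needed with this approach.&lt;/p&gt;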

&lt;p&gt;&lt;strong&gt;B) Converting the dataframe into a csv file using to_csv()&lt;/strong&gt;&lt;br&gt;
Instead of this lengthy method, you can opt for another one: to_csv() converts a dataframe into a CSV file and takes the destination path as an argument.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;C:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;Users&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;Kajal&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;Desktop&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;KAJAL&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;Project&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s"&gt;Datasets&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NEWS.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;To avoid path-separator mistakes and keep your code portable, you can use this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;NEWS.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This will append your CSV file name to your destination path correctly.&lt;/p&gt;
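&lt;p&gt;Here is a small runnable sketch of what os.path.join() does. The temporary directory is only a stand-in for your own destination folder, which is an assumption made so that the example runs on any machine.&lt;/p&gt;

```python
import os
import tempfile

# Hypothetical destination folder; a temp directory stands in for your own path.
path = tempfile.mkdtemp()

# os.path.join inserts the right separator for the current operating system,
# whether or not 'path' already ends with one.
csv_path = os.path.join(path, "NEWS.csv")
same_path = os.path.join(path + os.sep, "NEWS.csv")

print(csv_path)
```

&lt;p&gt;Both calls give the same result, so you do not have to remember to end the folder path with a separator.&lt;/p&gt;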
&lt;h2&gt;
  
  
  &lt;strong&gt;SUGGESTION &amp;amp; CONCLUSION&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I suggest using the first method: open the file, write to it, and then close it. I know it is a bit lengthy and tricky to implement, but in my experience it did not give me the ambiguous data that the to_csv method produced.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0dda26v5w8wic5rbwx0a.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0dda26v5w8wic5rbwx0a.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;br&gt;
See in the image above how the data for the Statement attribute comes out ambiguous.&lt;br&gt;
So, instead of spending hours cleaning your data manually, I would suggest writing a few extra lines of code specified in the first method.&lt;br&gt;
Now, you are done with it.✌️&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;IMPORTANT NOTE:&lt;/strong&gt; If you copy-paste my source code to scrape a different website and run it, it will almost certainly throw an error, because each webpage's layout is different. You need to adapt the tag names and class names to the page you are scraping.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;The Dataset:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo68ja5v74flchf3nnnrz.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo68ja5v74flchf3nnnrz.PNG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article is the first part of my web-scraping series. If you come from a non-technical background, read the second part of the series &lt;a href="https://dev.to/techykajal/scraping-websites-using-software-1and-temp-slug-2506096?preview=f5ec72a865c61e1ab7b093d13af096e1a7b5e86d148181c25ea9a88d849273f3794c776bc3b82a234b078f7d72fc7458fca403da3cc1ddab37c1cddb"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I hope you found this article useful and enjoyed reading it.😇 Please feel free to share your thoughts and hit me up with any queries you might have. You can reach me via the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Subscribe to my &lt;a href="https://www.youtube.com/channel/UCdwAaZMWiRmvIBIT96ApVjw" rel="noopener noreferrer"&gt;YouTube channel&lt;/a&gt; for upcoming video content&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Connect and reach me on &lt;a href="http://www.linkedin.com/in/techykajal" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>datascience</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
