<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mirzokhid Mukhsidov</title>
    <description>The latest articles on DEV Community by Mirzokhid Mukhsidov (@muxsidov).</description>
    <link>https://dev.to/muxsidov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F891922%2F3523d89f-245c-4b61-9e87-1833d0720adf.jpg</url>
      <title>DEV Community: Mirzokhid Mukhsidov</title>
      <link>https://dev.to/muxsidov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/muxsidov"/>
    <language>en</language>
    <item>
      <title>Web Scraper with Python (Beautiful Soup) &amp; Deployment of it into Heroku [Part2]</title>
      <dc:creator>Mirzokhid Mukhsidov</dc:creator>
      <pubDate>Mon, 10 Oct 2022 14:08:04 +0000</pubDate>
      <link>https://dev.to/muxsidov/web-scraper-with-python-beautiful-soup-deployment-of-it-into-heroku-part2-125p</link>
      <guid>https://dev.to/muxsidov/web-scraper-with-python-beautiful-soup-deployment-of-it-into-heroku-part2-125p</guid>
      <description>&lt;p&gt;After writing the &lt;a href="https://dev.to/mirzokhid/a-web-scraper-with-python-beautiful-soup-deployment-of-it-into-heroku-part1-22kb"&gt;code portion&lt;/a&gt; of my project and testing it, I pushed it into the &lt;a href="https://www.heroku.com/home" rel="noopener noreferrer"&gt;Heroku&lt;/a&gt; server. Since running the program regularly manually might get tedious over time I scheduled it (a.k.a cron job) so it runs automatically at a given time (every day in my case). Turns out Heroku does not allow unverified users (&lt;a href="https://devcenter.heroku.com/articles/account-verification#when-is-verification-required?c=&amp;amp;utm_campaign=freedynolimits&amp;amp;utm_medium=telex&amp;amp;utm_source=nurture&amp;amp;utm_content=devcenter&amp;amp;utm_term=when-verify" rel="noopener noreferrer"&gt;here is&lt;/a&gt; how to verify your account) to use &lt;a href="https://devcenter.heroku.com/articles/add-ons?c=&amp;amp;utm_campaign=freedynolimits&amp;amp;utm_medium=telex&amp;amp;utm_source=nurture&amp;amp;utm_content=devcenter&amp;amp;utm_term=add-ons" rel="noopener noreferrer"&gt;add-ons&lt;/a&gt; so I scheduled it manually with the &lt;a href="https://schedule.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;python schedule&lt;/a&gt; module. Later on, after I verified my account with a credit card I was able to use the &lt;a href="https://devcenter.heroku.com/articles/scheduler" rel="noopener noreferrer"&gt;Heroku Scheduler&lt;/a&gt;. In this post we will go through both of the ways. However, first we have to connect PostgreSQL to your database in Python.&lt;br&gt;
&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Connect Python to PostgreSQL&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://devcenter.heroku.com/articles/connecting-heroku-postgres#connecting-in-python" rel="noopener noreferrer"&gt;Connecting in Python&lt;/a&gt; describes how to connect to your Heroku Postgres database from Python. First, install the psycopg2 package:&lt;br&gt;
&lt;code&gt;pip install psycopg2-binary&lt;/code&gt;&lt;br&gt;
then connect using the DATABASE_URL config var:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import psycopg2

DATABASE_URL = os.environ['DATABASE_URL']

conn = psycopg2.connect(DATABASE_URL, sslmode='require')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
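On Heroku, DATABASE_URL is a postgres:// connection string. If you ever need its individual pieces (say, for a client that wants host, user, and password separately), the standard library can split it apart. A minimal sketch with a made-up example URL, not your real credentials:

```python
from urllib.parse import urlparse

# Made-up example of the postgres:// URL Heroku stores in DATABASE_URL
url = urlparse("postgres://user:secret@host.example.com:5432/mydb")

print(url.username)          # user
print(url.password)          # secret
print(url.hostname)          # host.example.com
print(url.port)              # 5432
print(url.path.lstrip("/"))  # mydb (the database name)
```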



&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Scheduling with Python Schedule&lt;/strong&gt;&lt;br&gt;
The Python schedule module, as the name suggests, runs Python functions (or any other callable) periodically, using a friendly, human-readable syntax.&lt;/p&gt;

&lt;p&gt;We install it with the command:&lt;br&gt;
&lt;code&gt;$ pip install schedule&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Import schedule and time module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import schedule
import time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define a function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def function_name():
    # ToDo

schedule.every(10).minutes.do(function_name)
schedule.every().hour.do(function_name)
schedule.every().day.at("10:30").do(function_name)
schedule.every().monday.do(function_name)
schedule.every().wednesday.at("13:15").do(function_name)
schedule.every().minute.at(":17").do(function_name)

while True:
    schedule.run_pending()
    time.sleep(1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
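To make it clearer what run_pending() actually does: schedule keeps a list of jobs, each with a next-run timestamp, and runs every job whose timestamp has passed. A stripped-down, stdlib-only illustration of that idea (not the real library's code; the names here are invented):

```python
import time

class Job:
    """A toy job: runs a function every `interval` seconds."""
    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self.next_run = time.time() + interval

    def should_run(self):
        return time.time() >= self.next_run

    def run(self):
        self.func()
        # push the next run one interval into the future
        self.next_run = time.time() + self.interval

jobs = []

def every(seconds, func):
    jobs.append(Job(seconds, func))

def run_pending():
    for job in jobs:
        if job.should_run():
            job.run()

# Demo: an interval of 0 makes the job due immediately
calls = []
every(0, lambda: calls.append("ran"))
run_pending()
print(calls)  # ['ran']
```

The real library adds the fluent every().day.at(...) syntax on top, but the pending-check loop is the same idea, which is why the while True / time.sleep(1) loop above is needed to keep it ticking.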



&lt;p&gt;Source: &lt;a href="https://schedule.readthedocs.io/en/stable/" rel="noopener noreferrer"&gt;https://schedule.readthedocs.io/en/stable/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=qquCAgwvL8Q" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=qquCAgwvL8Q&lt;/a&gt;&lt;br&gt;
&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Pushing the code into the Heroku Server&lt;/strong&gt;&lt;br&gt;
Heroku is quite a popular cloud platform. &lt;a href="https://devcenter.heroku.com/articles/getting-started-with-python?singlepage=true" rel="noopener noreferrer"&gt;Getting Started on Heroku with Python&lt;/a&gt; shows in detail how to install the Heroku CLI on your machine and push your project to the server using Git.&lt;br&gt;
Keep in mind that, unlike the tutorial above, our &lt;a href="https://devcenter.heroku.com/articles/procfile" rel="noopener noreferrer"&gt;Procfile&lt;/a&gt; must use the worker process type!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi802e2rkc5uh821ojmx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi802e2rkc5uh821ojmx.png" alt="Procfile" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;
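For reference, a Procfile for a scheduled scraper declares a worker process rather than a web one. Assuming the entry point is a file named scraper.py (adjust the name to your project), it can be a single line:

```
worker: python scraper.py
```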

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Scheduling with Heroku Scheduler&lt;/strong&gt;&lt;br&gt;
For a free dyno, Heroku gives you 550 hours per month (&lt;a href="https://www.heroku.com/dynos" rel="noopener noreferrer"&gt;read more about dynos&lt;/a&gt;), plus another 450 hours if you verify your account. &lt;br&gt;
Running the Python Schedule loop on Heroku keeps a worker dyno awake around the clock, which eats through those free dyno hours quickly. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzaf1ke76z3b2zoq80zcl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzaf1ke76z3b2zoq80zcl.png" alt="heroku ps" width="800" height="166"&gt;&lt;/a&gt;&lt;br&gt;
This is why we will take advantage of the &lt;a href="https://devcenter.heroku.com/articles/scheduler" rel="noopener noreferrer"&gt;Heroku Scheduler&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Go to the "Resources" section of your app&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F322z6x2ocz0obb09zy2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F322z6x2ocz0obb09zy2r.png" alt="Recources" width="800" height="374"&gt;&lt;/a&gt;&lt;br&gt;
Find Heroku Scheduler and add it&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftm30al2n73muq9at5b38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftm30al2n73muq9at5b38.png" alt="Search Heroku Scheduler" width="800" height="440"&gt;&lt;/a&gt;&lt;br&gt;
Click on the Heroku Scheduler add-on&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgj2sbzcxsze2nfk2ddr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgj2sbzcxsze2nfk2ddr.png" alt="Click into Heroku" width="800" height="449"&gt;&lt;/a&gt;&lt;br&gt;
Create a job with a suitable schedule and save it&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xsg28scfz975pmgqwjp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xsg28scfz975pmgqwjp.png" alt="Create a job" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, you can check your work with&lt;br&gt;
&lt;code&gt;heroku logs --tail&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Disclaimer!&lt;/strong&gt;&lt;br&gt;
Starting November 28th, 2022, free Heroku Dynos, free Heroku Postgres, and free Heroku Data for Redis will no longer be available.&lt;br&gt;
More information&lt;br&gt;
&lt;a href="https://blog.heroku.com/next-chapter" rel="noopener noreferrer"&gt;https://blog.heroku.com/next-chapter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>heroku</category>
    </item>
    <item>
      <title>Web Scraper with Python (Beautiful Soup) &amp; Deployment of it into Heroku [Part1]</title>
      <dc:creator>Mirzokhid Mukhsidov</dc:creator>
      <pubDate>Thu, 01 Sep 2022 18:38:17 +0000</pubDate>
      <link>https://dev.to/muxsidov/a-web-scraper-with-python-beautiful-soup-deployment-of-it-into-heroku-part1-22kb</link>
      <guid>https://dev.to/muxsidov/a-web-scraper-with-python-beautiful-soup-deployment-of-it-into-heroku-part1-22kb</guid>
      <description>&lt;p&gt;A while ago I decided to create a web crawling project using &lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="noopener noreferrer"&gt;Beautiful Soup&lt;/a&gt; (a Python library for pulling data out of HTML and XML files). Here is how I did it, hurdles I faced during development and how I overcame them. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrxmto0ixmnbuoje90j0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrxmto0ixmnbuoje90j0.jpg" alt="Meme" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
We will use a &lt;a href="https://docs.python.org/3/library/venv.html#:~:text=A%20virtual%20environment%20is%20a,part%20of%20your%20operating%20system." rel="noopener noreferrer"&gt;virtual environment&lt;/a&gt; throughout development; &lt;a href="https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/" rel="noopener noreferrer"&gt;here are&lt;/a&gt; the instructions for setting one up on Windows, and &lt;a href="https://realpython.com/python-virtual-environments-a-primer/#why-do-you-need-virtual-environments" rel="noopener noreferrer"&gt;here are&lt;/a&gt; the reasons why.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwk41wgqskzkh22q26r5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwk41wgqskzkh22q26r5.png" alt="Install virual environment" width="800" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To deactivate your virtual environment, simply type &lt;code&gt;deactivate&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dhii4555cc2t5txo8xb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dhii4555cc2t5txo8xb.png" alt="To deactivate your virtual Environment simply type deactivate" width="800" height="15"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we will create a &lt;code&gt;requirements.txt&lt;/code&gt; file for listing all the dependencies for our Python project.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F248i80vi00647u4ib178.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F248i80vi00647u4ib178.png" alt="requests.txt" width="800" height="185"&gt;&lt;/a&gt;Your requirements might differ depending on your case.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;requirements.txt&lt;/code&gt; is &lt;strong&gt;important!&lt;/strong&gt; I was too lazy to do this step on my first attempt... However, sooner or later you have to do it, if only to push the project to Heroku.&lt;br&gt;
&lt;code&gt;pip install -r requirements.txt&lt;/code&gt; is the command that installs the listed requirements.&lt;br&gt;
&lt;br&gt;&lt;br&gt;
Now let me show you how to write code that actually scrapes the given website. In a nutshell, web scraping is extracting data from websites into the form of your choice (I scraped "&lt;a href="https://www.scrapethissite.com/pages/" rel="noopener noreferrer"&gt;https://www.scrapethissite.com/pages/&lt;/a&gt;" into a CSV file).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8w3jvakc61owq84hskg9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8w3jvakc61owq84hskg9.png" alt="website =&amp;gt; csv" width="800" height="286"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdam9pconadel2s30eyo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdam9pconadel2s30eyo3.png" alt="web scrapping" width="530" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
First we make web requests using the Python &lt;a href="https://pypi.org/project/requests/" rel="noopener noreferrer"&gt;requests&lt;/a&gt; library.&lt;/p&gt;

&lt;p&gt;As you can see below, printing the content of the response gives the same HTML page you can view in Chrome with Ctrl+U, or by right-clicking and choosing View page source.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzcotj0y3x0dv6zfkuy8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqzcotj0y3x0dv6zfkuy8.png" alt="Way to source code" width="800" height="447"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup

link = "https://www.scrapethissite.com/pages/"
request = requests.get(link)
print(request.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid0stvjs5fv67r9g26qe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid0stvjs5fv67r9g26qe.png" alt="Source Code in Terminal" width="800" height="410"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fql2wahw2glvxuem1cvcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fql2wahw2glvxuem1cvcb.png" alt="Source Code Chrome" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To extract the actual data from the HTML tags, we turn to the Beautiful Soup library.&lt;/p&gt;

&lt;p&gt;Get &lt;code&gt;.text&lt;/code&gt; from &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; tag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup

link = "https://www.scrapethissite.com/pages/"
request = requests.get(link)

soup = BeautifulSoup(request.content, "html5lib")
print(soup.title.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fca20j4089pnbsxnbct6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fca20j4089pnbsxnbct6r.png" alt="Out put of soup.title.text" width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extract the first hyperlink with the &lt;code&gt;.a&lt;/code&gt; attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup

link = "https://www.scrapethissite.com/pages/"
request = requests.get(link)

soup = BeautifulSoup(request.content, "html5lib")
print(soup.a)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find every occurrence of a tag with &lt;code&gt;.find_all()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup

link = "https://www.scrapethissite.com/pages/"
request = requests.get(link)

soup = BeautifulSoup(request.content, "html5lib")

for i in soup.find_all('h3'):
    print(i.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97k07j5atxos0gyj6rm2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97k07j5atxos0gyj6rm2.png" alt=".find_all" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can even search by CSS class with &lt;code&gt;.find_all(class_="class_name")&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
from bs4 import BeautifulSoup

link = "https://www.scrapethissite.com/pages/"
request = requests.get(link)

soup = BeautifulSoup(request.content, "html5lib")

for i in soup.find_all(class_='class_name'):
    print(i.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0vp2vk9m9guo3cfs9sv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0vp2vk9m9guo3cfs9sv.png" alt="class_=''" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;
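Once the tags are extracted, getting them into a CSV file is just the standard library's csv module. A minimal sketch, with made-up column names and rows standing in for whatever your .find_all() calls returned:

```python
import csv

# Made-up rows standing in for the scraped data
rows = [
    {"title": "Countries of the World", "lesson": "A simple example"},
    {"title": "Hockey Teams", "lesson": "Forms and searching"},
]

# newline="" prevents blank lines between rows on Windows
with open("scraped.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "lesson"])
    writer.writeheader()
    writer.writerows(rows)

print(open("scraped.csv").read().splitlines()[0])  # title,lesson
```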

&lt;p&gt;&lt;br&gt;&lt;br&gt;
The rule of thumb here is to locate the piece of data in the source code of the website (via Ctrl+F in Chrome) and extract it using whatever tag it sits in.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdegun0g03nofqkha4xg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdegun0g03nofqkha4xg.png" alt="Ctrl + F" width="800" height="251"&gt;&lt;/a&gt;&lt;br&gt;
&lt;br&gt;&lt;br&gt;
There are many tags on &lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/#" rel="noopener noreferrer"&gt;Beautiful Soup&lt;/a&gt;, and in my experience tutorials and posts are often not a perfect fit for your case. Reading that in a post sounds like I am shooting myself in the foot, doesn't it? 😅 Do not get me wrong, posts and videos are by all means useful for getting a general idea of the topic. Nonetheless, if you are working on a different situation, it is better to skim the docs so you can tackle your problem with more suitable methods. Besides, by the time you watch or read a tutorial, things (versions) are very likely to have changed. So I would suggest taking what initially seems the hard way and reading the documentation, rather than trying to cut corners and ending up frustrated with wasted time.&lt;br&gt;&lt;br&gt;
Furthermore, if you need to insert the scraped data into a database on your local machine, I would recommend this &lt;a href="https://realpython.com/python-sql-libraries/#understanding-the-database-schema" rel="noopener noreferrer"&gt;Real Python&lt;/a&gt; article. &lt;br&gt;
In the next part we will see how I pushed the scraper to the Heroku server and how to build a database there.&lt;/p&gt;

&lt;p&gt;You can find the source code on my GitHub page: &lt;a href="https://github.com/Muxsidov/Scraper_Blog" rel="noopener noreferrer"&gt;https://github.com/Muxsidov/Scraper_Blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webscrapping</category>
      <category>python</category>
      <category>beautifulsoup</category>
    </item>
  </channel>
</rss>
