<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Edward Pasenidis</title>
    <description>The latest articles on DEV Community by Edward Pasenidis (@pasenidis).</description>
    <link>https://dev.to/pasenidis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F285547%2F76494172-1203-42a3-b093-6ecff2eae0e3.png</url>
      <title>DEV Community: Edward Pasenidis</title>
      <link>https://dev.to/pasenidis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pasenidis"/>
    <language>en</language>
    <item>
      <title>Basics of Scraping with Python 🐍</title>
      <dc:creator>Edward Pasenidis</dc:creator>
      <pubDate>Sun, 26 Jul 2020 14:57:13 +0000</pubDate>
      <link>https://dev.to/pasenidis/basics-of-scraping-with-python-40bo</link>
      <guid>https://dev.to/pasenidis/basics-of-scraping-with-python-40bo</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8ipi7mjpvgb1gclc1pu1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F8ipi7mjpvgb1gclc1pu1.png" alt="Snap"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Prologue
&lt;/h2&gt;

&lt;p&gt;Hello, in this post I am going to walk through writing a scraper script in Python, with the help of the Beautiful Soup library.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the dependencies
&lt;/h2&gt;

&lt;p&gt;First of all, since Beautiful Soup is a third-party community project, you have to install it from the PyPI registry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;beautifulsoup4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
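If the install succeeded, the package is importable. One gotcha worth knowing: the distribution name (`beautifulsoup4`) differs from the module name (`bs4`). A quick sanity check (the version printed will vary with whatever pip installed):

```python
# The package installs as "beautifulsoup4" but is imported as "bs4".
import bs4

print(bs4.__version__)  # prints whatever version pip installed
```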



&lt;h2&gt;
  
  
  Philosophy of Beautiful Soup
&lt;/h2&gt;

&lt;p&gt;Beautiful Soup is a library that sits on top of an HTML or XML parser (in our case, the former): the parser does the low-level work, and Beautiful Soup turns its output into a tree of Python objects you can search and navigate.&lt;/p&gt;
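To make that concrete, here is a minimal sketch: we hand Beautiful Soup a string of markup plus the name of a parser backend (`html.parser`, which ships with Python), and it gives back a navigable tree of tag objects:

```python
from bs4 import BeautifulSoup

# Beautiful Soup doesn't parse markup itself; it wraps a parser backend.
# "html.parser" is in the Python standard library, so no extra installs.
soup = BeautifulSoup("<p class='intro'>Hello, <b>world</b>!</p>", "html.parser")

print(soup.p.get_text())  # "Hello, world!" - text content with tags stripped
print(soup.b.name)        # "b" - each node knows its own tag name
```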

&lt;h2&gt;
  
  
  Basic Script
&lt;/h2&gt;

&lt;p&gt;Now that we know how it works, let's write a tiny script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urlopen&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;


&lt;span class="n"&gt;WEBSITE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://google.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;urlopen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WEBSITE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example we use &lt;code&gt;urlopen&lt;/code&gt; from the standard-library &lt;code&gt;urllib&lt;/code&gt; module, which downloads the HTML for us.&lt;br&gt;
Then we call &lt;code&gt;read()&lt;/code&gt; on the &lt;code&gt;html&lt;/code&gt; response object, which contains the google.com document, and hand the result to BeautifulSoup for parsing.&lt;/p&gt;
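The key detail is that `urlopen()` returns a file-like object whose `read()` yields raw bytes, and BeautifulSoup accepts those bytes directly, working out the encoding itself. A sketch that simulates the response with `io.BytesIO` (a stand-in so it runs without network access):

```python
from io import BytesIO

from bs4 import BeautifulSoup

# Stand-in for the file-like object urlopen() returns; a real run
# would fetch these bytes from the network instead.
fake_response = BytesIO(b"<html><head><title>Example</title></head></html>")

# .read() returns raw bytes; BeautifulSoup detects the encoding.
bs = BeautifulSoup(fake_response.read(), "html.parser")
print(bs.title.string)  # -> Example
```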
&lt;h2&gt;
  
  
  Parsing data
&lt;/h2&gt;

&lt;p&gt;Sometimes, we want to get specific parts of a document, such as a paragraph or an image.&lt;/p&gt;

&lt;p&gt;You can search for a specific HTML tag in BeautifulSoup with the &lt;code&gt;find()&lt;/code&gt; method.&lt;/p&gt;

&lt;p&gt;Let's scrape the Google logo tag from their homepage!&lt;br&gt;
Add the following lines of code to the already existing file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;google_logo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;img&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hplogo&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;google_logo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These two lines of code should produce output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; 
&lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"Google"&lt;/span&gt; 
&lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"92"&lt;/span&gt; 
&lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"hplogo"&lt;/span&gt; 
&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png"&lt;/span&gt;
&lt;span class="na"&gt;style=&lt;/span&gt;&lt;span class="s"&gt;"padding:28px 0 14px"&lt;/span&gt; 
&lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"272"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, how does this work?&lt;br&gt;
Well, we are using the &lt;code&gt;find()&lt;/code&gt; method and passing it two arguments: a tag name and a dictionary of attributes.&lt;br&gt;
To be exact, we are telling it to search for an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag whose id is &lt;code&gt;'hplogo'&lt;/code&gt;.&lt;/p&gt;
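Since Google can change its homepage markup at any time, here is the same `find()` call against a small hand-written document (the tags below are made up for illustration), which also shows how to pull individual attributes out of the match:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<html><body>
  <img id="hplogo" alt="Google" src="/images/branding/logo.png">
  <img id="other" alt="Other" src="/images/other.png">
</body></html>
"""
bs = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
logo = bs.find("img", {"id": "hplogo"})
print(logo["alt"])        # "Google" - attributes read like dict entries
print(logo.get("class"))  # None - .get() avoids a KeyError for absent attrs
```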

&lt;h2&gt;
  
  
  Epilogue
&lt;/h2&gt;

&lt;p&gt;That's all for now.&lt;br&gt;
To learn more about Beautiful Soup, read the &lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="noopener noreferrer"&gt;docs&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>A personal blog for fun</title>
      <dc:creator>Edward Pasenidis</dc:creator>
      <pubDate>Thu, 23 Jul 2020 19:09:13 +0000</pubDate>
      <link>https://dev.to/pasenidis/a-personal-blog-for-fun-2p2b</link>
      <guid>https://dev.to/pasenidis/a-personal-blog-for-fun-2p2b</guid>
      <description>&lt;p&gt;I made a blog with VueJS that makes AJAX requests to DEV.TO and parses the articles I've written. This way, I don't need any backend or DB.&lt;/p&gt;

</description>
      <category>blog</category>
      <category>vue</category>
      <category>frontend</category>
    </item>
    <item>
      <title>Making apps during quarantine!</title>
      <dc:creator>Edward Pasenidis</dc:creator>
      <pubDate>Thu, 26 Mar 2020 05:05:02 +0000</pubDate>
      <link>https://dev.to/pasenidis/making-apps-during-quarantine-1hl6</link>
      <guid>https://dev.to/pasenidis/making-apps-during-quarantine-1hl6</guid>
      <description>&lt;h2&gt;
  
  
  Boring, huh?
&lt;/h2&gt;

&lt;p&gt;Quarantine: a different take on "staying home as usual", only it's unusual and you can't go out when you get bored. Bad, huh? Eventually it made me so bored that I created a COVID-19 tracker.&lt;br&gt;
But how does it work? I mean, what sets it apart from the many other crappy trackers? Well, this one is developed by two people &amp;amp; it contains time charts :) (&lt;a href="https://covid-19-system.herokuapp.com/developers"&gt;https://covid-19-system.herokuapp.com/developers&lt;/a&gt;)&lt;/p&gt;

&lt;h2&gt;
  
  
  What is this tracker all about?
&lt;/h2&gt;

&lt;p&gt;I mean, now you can compare two time periods (e.g. December &amp;amp; March).&lt;br&gt;
Kinda useless? Maybe, but social media love phrases like "the COVID-19 infection rate has risen, 5% more than it was in February" and things like that. Who knows, maybe journalists will use this thing. The funny part is that the API wasn't even created by us, yeah - you heard right!&lt;br&gt;
Basically, we will soon be using a second API, which is also not ours!&lt;br&gt;
That's open source for you, beginners! (yes, contributing especially is amazing). Back to our topic: we haven't even implemented a custom API, although I may do that later. Anyway, we will be adding more charts, country search, better mobile responsiveness &amp;amp; much more.&lt;/p&gt;

&lt;p&gt;Now, let's see how that thing works under the hood...&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring the project
&lt;/h2&gt;

&lt;p&gt;So, start by running &lt;code&gt;git clone&lt;/code&gt; on the site repository to download it. Let's explore it - open the &lt;strong&gt;src&lt;/strong&gt; folder to get started. See? There are many files: some are Pug templates, others are browser JavaScript, and there is also a CSS file - there is a lot going on in that repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  But how do they talk?
&lt;/h2&gt;

&lt;p&gt;Well, if you type &lt;code&gt;npm start&lt;/code&gt;, a Node Express server starts up. Express is responsible for the routes &amp;amp; a few minor things inside the repo.&lt;/p&gt;

&lt;p&gt;Then comes Pug, an HTML pre-processor: a template engine that replaces placeholders inside HTML with real content!&lt;/p&gt;

&lt;p&gt;Next up is the public directory, which contains the CSS files and the JavaScript that runs in the browser (not related to Node; it's linked in by Pug).&lt;br&gt;
This code fetches information from an API, which you can find on the project's GitHub repository linked at the end of this article. [1]&lt;/p&gt;

&lt;p&gt;This was a brief walkthrough; I am not going to dive deeper - you will be able to do that yourself when the major release is ready!&lt;/p&gt;

&lt;p&gt;Let's not forget to mention the developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Me (Edward, also the writer of this post)&lt;/li&gt;
&lt;li&gt;Lean (Tasos, a cool dude who has built everything from Discord bots to an Arduino-to-Discord webhook system)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Some important links
&lt;/h2&gt;

&lt;p&gt;[1]. &lt;a href="https://github.com/pasenidis/covid19-stats"&gt;https://github.com/pasenidis/covid19-stats&lt;/a&gt;&lt;br&gt;
[2]. &lt;a href="https://github.com/pasenidis"&gt;https://github.com/pasenidis&lt;/a&gt;&lt;br&gt;
[3]. &lt;a href="https://github.com/TasosY2K"&gt;https://github.com/TasosY2K&lt;/a&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>api</category>
      <category>express</category>
      <category>coronavirus</category>
    </item>
  </channel>
</rss>
