<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kashif Aziz</title>
    <description>The latest articles on DEV Community by Kashif Aziz (@kashaziz).</description>
    <link>https://dev.to/kashaziz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F50149%2F927ab05a-0463-41b1-948e-243576052af2.jpeg</url>
      <title>DEV Community: Kashif Aziz</title>
      <link>https://dev.to/kashaziz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kashaziz"/>
    <language>en</language>
    <item>
      <title>My son's first GitHub Repository</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Thu, 17 Oct 2019 19:22:56 +0000</pubDate>
      <link>https://dev.to/kashaziz/my-son-s-first-github-repository-lpd</link>
      <guid>https://dev.to/kashaziz/my-son-s-first-github-repository-lpd</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zhhdPkQD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kw1ifuf7pay3jdsmbbc0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zhhdPkQD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kw1ifuf7pay3jdsmbbc0.jpg" alt="Cambridge Past Papers Downloader - Python Script"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proud moments for a Dad&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have finally convinced my son to take a step forward and publish something in the public domain. As he is learning Python these days, he wrote a scraper to download Cambridge past papers (GCSE / IGCSE).&lt;/p&gt;

&lt;p&gt;If you like it, encourage the kid by pressing the Star button :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/UsamaKashif/CambridgePastPapersDownloader"&gt;https://github.com/UsamaKashif/CambridgePastPapersDownloader&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>requests</category>
      <category>beautifulsoup</category>
    </item>
    <item>
      <title>Python Wrapper For Indeed Job Search API</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Tue, 31 Jul 2018 21:51:03 +0000</pubDate>
      <link>https://dev.to/kashaziz/python-wrapper-for-indeed-job-search-api-57of</link>
      <guid>https://dev.to/kashaziz/python-wrapper-for-indeed-job-search-api-57of</guid>
      <description>

&lt;h1&gt;Python Wrapper For Indeed Job Search API&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FvkrfKJx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/76u4lfx01t8dlfkq4pds.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FvkrfKJx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/76u4lfx01t8dlfkq4pds.jpg" alt="Python Wrapper For Indeed Job Search API"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recently, I was looking for a Python wrapper for the Indeed API. Unable to find one that fulfilled my requirements, I wrote a quick and simple Python script that consumes the Indeed job search API and stores the search results in a CSV file.&lt;/p&gt;

&lt;p&gt;In order to run the script and fetch jobs from Indeed job search API, you must have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Indeed Publisher API ID, available for free from &lt;a href="https://ads.indeed.com/jobroll/xmlfeed"&gt;here&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Python 3.x&lt;/li&gt;
&lt;li&gt;BeautifulSoup and Requests libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Usage:&lt;/h3&gt;

&lt;p&gt;The code for the Python wrapper for the Indeed Job Search API is available on GitHub. &lt;a href="https://github.com/kashaziz/indeed-python-wrapper"&gt;Download it from here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Open indeedapiwrapper.py and add the following parameters to fetch job listings through the Indeed Job Search API:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;params = {
    'publisher': "",    # publisher ID (Required)
    'q': "",            # Job search query
    'l': "",            # location (city / state)
    'co': "",           # Two letter Country Code
    'sort': "",         # Sort order, date or relevance
    'days': ""          # number of days to fetch jobs, maximum is 7 days
    }   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Publisher ID is required.&lt;/li&gt;
&lt;li&gt;To search jobs, provide either a query string or a combination of location and country code.&lt;/li&gt;
&lt;li&gt;If the query string is missing and either the location or the country code is also missing, the script falls back to &#8220;Karachi&#8221; and &#8220;pk&#8221; as the default location and country code.&lt;/li&gt;
&lt;/ul&gt;
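&lt;p&gt;The fallback described above can be sketched roughly like this; &lt;code&gt;apply_defaults&lt;/code&gt; is an illustrative helper, not a function from the actual script:&lt;/p&gt;

```python
# Illustrative sketch of the fallback logic; apply_defaults is a
# hypothetical helper, not part of the actual wrapper script.
def apply_defaults(params):
    # A query string alone is enough to search.
    if params.get("q"):
        return params
    # Without a query, both location and country code are needed;
    # fall back to the defaults if either is missing.
    if not params.get("l") or not params.get("co"):
        params["l"] = "Karachi"
        params["co"] = "pk"
    return params

params = {"publisher": "0000000000000000", "q": "", "l": "", "co": "",
          "sort": "date", "days": "3"}
print(apply_defaults(params)["l"])  # Karachi
```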

&lt;p&gt;For example, the following search parameters will search for all Python jobs in Karachi, Pakistan.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;params = {
    'publisher': "0000000000000000",    # Use valid Id to get results 
    'q': "python",  
    'l': "karachi",  
    'co': "pk",  
    'sort': "date",                # Sort by date
    'days': "3"                    # get jobs for 3 days, including today
    }   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;Output:&lt;/h3&gt;

&lt;p&gt;The list of jobs will be saved to a CSV file, &#8220;indeedjobs.csv&#8221;, in the same directory as the script.&lt;/p&gt;
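&lt;p&gt;As a rough sketch of that output step, the rows could be written with the csv module; the field names below are assumptions, not the script's actual columns:&lt;/p&gt;

```python
import csv

# Hypothetical job rows; the real script fills these from the API response.
jobs = [
    {"jobtitle": "Python Developer", "company": "Acme", "city": "Karachi"},
    {"jobtitle": "Data Engineer", "company": "Globex", "city": "Karachi"},
]

# Write the results to indeedjobs.csv next to the script.
with open("indeedjobs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["jobtitle", "company", "city"])
    writer.writeheader()
    writer.writerows(jobs)
```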

&lt;h3&gt;Resources:&lt;/h3&gt;

&lt;p&gt;Indeed job search API documentation is available &lt;a href="http://opensource.indeedeng.io/api-documentation/docs/job-search/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The code for the Python wrapper for the Indeed Job Search API is on GitHub. &lt;a href="https://github.com/kashaziz/indeed-python-wrapper"&gt;Download it from here&lt;/a&gt;.&lt;/p&gt;


</description>
      <category>python</category>
      <category>indeed</category>
      <category>jobsearch</category>
    </item>
    <item>
      <title>Proxy Server Rotation Script in Python</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Sat, 30 Dec 2017 15:54:07 +0000</pubDate>
      <link>https://dev.to/kashaziz/proxy-server-rotation-script-in-python-3ec1</link>
      <guid>https://dev.to/kashaziz/proxy-server-rotation-script-in-python-3ec1</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F18eaxrnq802o0xin8uew.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F18eaxrnq802o0xin8uew.jpg" alt="Proxy Server Rotation Script in Python"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;Rotating Proxy Servers in Python&lt;/h1&gt;

&lt;p&gt;Recently, I used a ProxyMesh proxy server for a project. ProxyMesh offers 15 proxy servers, each denoting a specific location (such as us-dc), with 10 IP addresses rotating twice per day.&lt;/p&gt;

&lt;p&gt;However, the free trial allows only one proxy server at a time. This means that if you are working with a server or CDN that strictly throttles IPs, you have to switch the proxy server manually to rotate among the 15 servers.&lt;/p&gt;

&lt;p&gt;Fortunately, ProxyMesh provides an API for adding and deleting proxy servers in the user dashboard. Once a proxy server is assigned to the user, it can be fetched and used. All we need is a script that rotates between the proxy servers, deleting and adding them as required.&lt;/p&gt;

&lt;p&gt;Based on this, I have written a script that rotates through ProxyMesh proxy servers using their API.&lt;/p&gt;

&lt;h2&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;An account with ProxyMesh, either a free trial or paid. Set the username and password in rotatingproxy.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;self.user = ""
self.password = "" 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Usage&lt;/h2&gt;

&lt;h3&gt;Setting the Proxy Server&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from rotatingproxy import RotatingProxy

rproxy = RotatingProxy()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy server can either be set randomly or selected from an available list of proxy servers. &lt;br&gt;
The active proxy server is saved in a text file which can be accessed as required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rproxy.set_proxy(israndom="r")  # select a random proxy server

rproxy.set_proxy(proxy_num=1)   # select proxy server with index=1 from the list of proxy servers.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Accessing the Proxy Server&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_proxy_from_file():
    # fetches proxy from proxy.txt
    with open("proxy.txt", "r") as f:
        return loads(f.read())

proxy = get_proxy_from_file()        
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy can now be used with requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
response = requests.get("url-to-fetch", proxies=proxy)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
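&lt;p&gt;If the target starts throttling the current IP, one way to use the rotation is a small retry loop. This is only a sketch: &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;rotate&lt;/code&gt; are injected so the idea is testable, and in practice &lt;code&gt;rotate&lt;/code&gt; would call &lt;code&gt;rproxy.set_proxy(israndom="r")&lt;/code&gt; and re-read the proxy file:&lt;/p&gt;

```python
# Sketch of a retry-with-rotation policy; the status codes checked
# are common throttling responses, and rotate stands in for switching
# the ProxyMesh server as described above.
def fetch_with_rotation(fetch, rotate, max_tries=3):
    for _ in range(max_tries):
        status, body = fetch()
        if status not in (403, 429, 503):
            return body          # success: return the response body
        rotate()                 # throttled: switch to a fresh proxy
    raise RuntimeError("all attempts were throttled")

# Simulated demo: the first response is throttled, the second succeeds.
responses = iter([(429, ""), (200, "ok")])
rotations = []
body = fetch_with_rotation(lambda: next(responses), lambda: rotations.append(1))
print(body, len(rotations))  # ok 1
```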



&lt;p&gt;Blog post: &lt;a href="http://www.kashifaziz.me/proxy-server-rotation-python.html/" rel="noopener noreferrer"&gt;http://www.kashifaziz.me/proxy-server-rotation-python.html/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/kashaziz/rotating-proxy-python" rel="noopener noreferrer"&gt;https://github.com/kashaziz/rotating-proxy-python&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>proxyserver</category>
    </item>
    <item>
      <title>Web Scraping with Python BeautifulSoup and Requests</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Wed, 20 Dec 2017 10:01:36 +0000</pubDate>
      <link>https://dev.to/kashaziz/web-scraping-with-python-beautifulsoup-and-requests-2n71</link>
      <guid>https://dev.to/kashaziz/web-scraping-with-python-beautifulsoup-and-requests-2n71</guid>
      <description>&lt;p&gt;This is an overview of a blog post I recently wrote about how to scrap web pages using Python BeautifulSoup and Requests libraries.&lt;/p&gt;

&lt;h3&gt;What is Web Scraping:&lt;/h3&gt;

&lt;p&gt;Web scraping is the process of automatically extracting information from a website. Web scraping, or data scraping, is useful for researchers, marketers and analysts interested in compiling, filtering and repackaging data.&lt;/p&gt;

&lt;p&gt;A word of caution: always respect the website&#8217;s privacy policy and check robots.txt before scraping. If a website offers an API to interact with its data, it is better to use that instead of scraping.&lt;/p&gt;
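&lt;p&gt;For the robots.txt check, Python&#8217;s standard library already has a parser; the rules below are a made-up example:&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt and check whether a path may be fetched.
rules = """
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/public/page"))   # True
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
```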

&lt;h3&gt;Web Scraping with Python and BeautifulSoup:&lt;/h3&gt;

&lt;p&gt;Web scraping in Python is a breeze. There are a number of ways to access a web page and scrape its data. I have used Python with BeautifulSoup for this purpose.&lt;/p&gt;

&lt;p&gt;In this example, I have scraped college footballer data from the ESPN website.&lt;/p&gt;

&lt;h3&gt;The Process:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install the Requests and BeautifulSoup libraries.&lt;/li&gt;
&lt;li&gt;Fetch the web page and store it in a BeautifulSoup object.&lt;/li&gt;
&lt;li&gt;Set a parser to parse the HTML in the web page. I have used the default html.parser.&lt;/li&gt;
&lt;li&gt;Extract the player name, school, city, playing position and grade.&lt;/li&gt;
&lt;li&gt;Append the data to a list which will be written to a CSV file at a later stage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9B8ZT2A---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://www.kashifaziz.me/wp-content/uploads/2017/10/college-footballer-data-scraping-python-beautifulsoup-code.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9B8ZT2A---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://www.kashifaziz.me/wp-content/uploads/2017/10/college-footballer-data-scraping-python-beautifulsoup-code.jpg" alt="Python BeautifulSoup Tutorial: Web Scraping In 20 Lines Of Code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;The Code:&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
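&lt;p&gt;As a minimal sketch of the steps above, here is the same BeautifulSoup approach run against an invented HTML sample rather than ESPN&#8217;s actual markup:&lt;/p&gt;

```python
from bs4 import BeautifulSoup

# Invented HTML sample for illustration; the real page structure differs.
html = """
<table>
  <tr class="player"><td>John Doe</td><td>Springfield High</td><td>QB</td><td>95</td></tr>
  <tr class="player"><td>Sam Roe</td><td>Riverdale High</td><td>WR</td><td>91</td></tr>
</table>
"""

# Parse with the default html.parser, as in the process above.
soup = BeautifulSoup(html, "html.parser")

# Extract each player's fields and append them to a list,
# ready to be written to a CSV file at a later stage.
players = []
for row in soup.find_all("tr", class_="player"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    players.append(cells)

print(players[0][0])  # John Doe
```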


&lt;p&gt;&lt;a href="http://www.kashifaziz.me/web-scraping-python-beautifulsoup.html/"&gt;Detailed blog post is available here.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beautifulsoup</category>
      <category>requests</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
