<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kashif Aziz</title>
    <description>The latest articles on DEV Community by Kashif Aziz (@kashaziz).</description>
    <link>https://dev.to/kashaziz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F50149%2F927ab05a-0463-41b1-948e-243576052af2.jpeg</url>
      <title>DEV Community: Kashif Aziz</title>
      <link>https://dev.to/kashaziz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kashaziz"/>
    <language>en</language>
    <item>
      <title>My son's first GitHub Repository</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Thu, 17 Oct 2019 19:22:56 +0000</pubDate>
      <link>https://dev.to/kashaziz/my-son-s-first-github-repository-lpd</link>
      <guid>https://dev.to/kashaziz/my-son-s-first-github-repository-lpd</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zhhdPkQD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kw1ifuf7pay3jdsmbbc0.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zhhdPkQD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/kw1ifuf7pay3jdsmbbc0.jpg" alt="Cambridge Past Papers Downloader - Python Script"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proud moments for a Dad&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I have finally convinced my son to take a step forward and publish something in the public domain. As he is learning Python these days, he wrote a scraper to download Cambridge past papers (GCSE / IGCSE).&lt;/p&gt;

&lt;p&gt;If you like it, encourage the kid by pressing the Star button :)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/UsamaKashif/CambridgePastPapersDownloader"&gt;https://github.com/UsamaKashif/CambridgePastPapersDownloader&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>requests</category>
      <category>beautifulsoup</category>
    </item>
    <item>
      <title>Python Wrapper For Indeed Job Search API</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Tue, 31 Jul 2018 21:51:03 +0000</pubDate>
      <link>https://dev.to/kashaziz/python-wrapper-for-indeed-job-search-api-57of</link>
      <guid>https://dev.to/kashaziz/python-wrapper-for-indeed-job-search-api-57of</guid>
      <description>

&lt;h1&gt;Python Wrapper For Indeed Job Search API&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FvkrfKJx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/76u4lfx01t8dlfkq4pds.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FvkrfKJx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/76u4lfx01t8dlfkq4pds.jpg" alt="Python Wrapper For Indeed Job Search API"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Recently, I was looking for a Python wrapper for the Indeed API. Unable to find one that fulfilled my requirements, I wrote a quick and simple Python script that consumes the Indeed job search API and stores the search results in a CSV file.&lt;/p&gt;

&lt;p&gt;In order to run the script and fetch jobs from Indeed job search API, you must have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Indeed Publisher API ID, available for free from &lt;a href="https://ads.indeed.com/jobroll/xmlfeed"&gt;here&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Python 3.x&lt;/li&gt;
&lt;li&gt;BeautifulSoup and Requests libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Usage:&lt;/h3&gt;

&lt;p&gt;The code for the Python wrapper for the Indeed Job Search API is available on GitHub. &lt;a href="https://github.com/kashaziz/indeed-python-wrapper"&gt;Download it from here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Open indeedapiwrapper.py and add the following parameters to fetch job listings through the Indeed Job Search API:&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;params = {
    'publisher': "",    # publisher ID (Required)
    'q': "",            # Job search query
    'l': "",            # location (city / state)
    'co': "",           # Two letter Country Code
    'sort': "",         # Sort order, date or relevance
    'days': ""          # number of days to fetch jobs, maximum is 7 days
    }   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Publisher ID is required.&lt;/li&gt;
&lt;li&gt;To search jobs, provide either a query string or a combination of location and country code.&lt;/li&gt;
&lt;li&gt;If the query string is missing and either the location or the country code is also missing, the script falls back to &#8220;Karachi&#8221; and &#8220;pk&#8221; as the default location and country code.&lt;/li&gt;
&lt;/ul&gt;
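&lt;p&gt;The fallback described above can be sketched roughly like this; &lt;code&gt;apply_defaults&lt;/code&gt; is an illustrative helper, not a function from the actual script:&lt;/p&gt;

```python
# Illustrative sketch of the fallback logic; apply_defaults is a
# hypothetical helper, not part of the actual wrapper script.
def apply_defaults(params):
    # A query string alone is enough to search.
    if params.get("q"):
        return params
    # Without a query, both location and country code are needed;
    # fall back to the defaults if either is missing.
    if not params.get("l") or not params.get("co"):
        params["l"] = "Karachi"
        params["co"] = "pk"
    return params

params = {"publisher": "0000000000000000", "q": "", "l": "", "co": "",
          "sort": "date", "days": "3"}
print(apply_defaults(params)["l"])  # Karachi
```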

&lt;p&gt;For example, the following search parameters will search for all Python jobs in Karachi, Pakistan.&lt;/p&gt;



&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;params = {
    'publisher': "0000000000000000",    # Use valid Id to get results 
    'q': "python",  
    'l': "karachi",  
    'co': "pk",  
    'sort': "date",                # Sort by date
    'days': "3"                    # get jobs for 3 days, including today
    }   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h3&gt;Output:&lt;/h3&gt;

&lt;p&gt;The list of jobs will be saved to a CSV file, &#8220;indeedjobs.csv&#8221;, in the same directory as the script.&lt;/p&gt;
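&lt;p&gt;As a rough sketch of that output step, the rows could be written with the csv module; the field names below are assumptions, not the script's actual columns:&lt;/p&gt;

```python
import csv

# Hypothetical job rows; the real script fills these from the API response.
jobs = [
    {"jobtitle": "Python Developer", "company": "Acme", "city": "Karachi"},
    {"jobtitle": "Data Engineer", "company": "Globex", "city": "Karachi"},
]

# Write the results to indeedjobs.csv next to the script.
with open("indeedjobs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["jobtitle", "company", "city"])
    writer.writeheader()
    writer.writerows(jobs)
```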

&lt;h3&gt;Resources:&lt;/h3&gt;

&lt;p&gt;Indeed job search API documentation is available &lt;a href="http://opensource.indeedeng.io/api-documentation/docs/job-search/"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The code for the Python wrapper for the Indeed Job Search API is on GitHub. &lt;a href="https://github.com/kashaziz/indeed-python-wrapper"&gt;Download it from here&lt;/a&gt;.&lt;/p&gt;


</description>
      <category>python</category>
      <category>indeed</category>
      <category>jobsearch</category>
    </item>
    <item>
      <title>Proxy Server Rotation Script in Python</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Sat, 30 Dec 2017 15:54:07 +0000</pubDate>
      <link>https://dev.to/kashaziz/proxy-server-rotation-script-in-python-3ec1</link>
      <guid>https://dev.to/kashaziz/proxy-server-rotation-script-in-python-3ec1</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F18eaxrnq802o0xin8uew.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F18eaxrnq802o0xin8uew.jpg" alt="Proxy Server Rotation Script in Python"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;Rotating Proxy Servers in Python&lt;/h1&gt;

&lt;p&gt;Recently, I used a ProxyMesh proxy server for a project. ProxyMesh offers 15 proxy servers, each denoting a specific location (such as us-dc), with 10 IP addresses rotating twice per day.&lt;/p&gt;

&lt;p&gt;However, the free trial allows only one proxy server at a time. This means that if you are working with a server or CDN that strictly throttles IPs, you have to switch the proxy server manually to rotate among the 15 servers.&lt;/p&gt;

&lt;p&gt;Fortunately, ProxyMesh provides an API for adding and deleting proxy servers in the user dashboard. Once a proxy server is assigned to the user, it can be fetched and used. All we need is a script that rotates between the proxy servers, deleting and adding them as required.&lt;/p&gt;

&lt;p&gt;Based on this, I have written a script that rotates through ProxyMesh proxy servers using their API.&lt;/p&gt;

&lt;h2&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;An account with ProxyMesh, either a free trial or paid. Set the username and password in rotatingproxy.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;self.user = ""
self.password = "" 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Usage&lt;/h2&gt;

&lt;h3&gt;Setting the Proxy Server&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from rotatingproxy import RotatingProxy

rproxy = RotatingProxy()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy server can either be set randomly or selected from an available list of proxy servers. &lt;br&gt;
The active proxy server is saved in a text file which can be accessed as required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rproxy.set_proxy(israndom="r")  # select a random proxy server

rproxy.set_proxy(proxy_num=1)   # select proxy server with index=1 from the list of proxy servers.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Accessing the Proxy Server&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_proxy_from_file():
    # fetches proxy from proxy.txt
    with open("proxy.txt", "r") as f:
        return loads(f.read())

proxy = get_proxy_from_file()        
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy can now be used with requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
response = requests.get("url-to-fetch", proxies=proxy)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
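&lt;p&gt;If the target starts throttling the current IP, one way to use the rotation is a small retry loop. This is only a sketch: &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;rotate&lt;/code&gt; are injected so the idea is testable, and in practice &lt;code&gt;rotate&lt;/code&gt; would call &lt;code&gt;rproxy.set_proxy(israndom="r")&lt;/code&gt; and re-read the proxy file:&lt;/p&gt;

```python
# Sketch of a retry-with-rotation policy; the status codes checked
# are common throttling responses, and rotate stands in for switching
# the ProxyMesh server as described above.
def fetch_with_rotation(fetch, rotate, max_tries=3):
    for _ in range(max_tries):
        status, body = fetch()
        if status not in (403, 429, 503):
            return body          # success: return the response body
        rotate()                 # throttled: switch to a fresh proxy
    raise RuntimeError("all attempts were throttled")

# Simulated demo: the first response is throttled, the second succeeds.
responses = iter([(429, ""), (200, "ok")])
rotations = []
body = fetch_with_rotation(lambda: next(responses), lambda: rotations.append(1))
print(body, len(rotations))  # ok 1
```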



&lt;p&gt;Blog post: &lt;a href="http://www.kashifaziz.me/proxy-server-rotation-python.html/" rel="noopener noreferrer"&gt;http://www.kashifaziz.me/proxy-server-rotation-python.html/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/kashaziz/rotating-proxy-python" rel="noopener noreferrer"&gt;https://github.com/kashaziz/rotating-proxy-python&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>proxyserver</category>
    </item>
    <item>
      <title>Web Scraping with Python BeautifulSoup and Requests</title>
      <dc:creator>Kashif Aziz</dc:creator>
      <pubDate>Wed, 20 Dec 2017 10:01:36 +0000</pubDate>
      <link>https://dev.to/kashaziz/web-scraping-with-python-beautifulsoup-and-requests-2n71</link>
      <guid>https://dev.to/kashaziz/web-scraping-with-python-beautifulsoup-and-requests-2n71</guid>
      <description>&lt;p&gt;This is an overview of a blog post I recently wrote about how to scrap web pages using Python BeautifulSoup and Requests libraries.&lt;/p&gt;

&lt;h3&gt;What is Web Scraping:&lt;/h3&gt;

&lt;p&gt;Web scraping is the process of automatically extracting information from a website. Web scraping, or data scraping, is useful for researchers, marketers and analysts interested in compiling, filtering and repackaging data.&lt;/p&gt;

&lt;p&gt;A word of caution: always respect the website&#8217;s privacy policy and check robots.txt before scraping. If a website offers an API to interact with its data, it is better to use that instead of scraping.&lt;/p&gt;
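&lt;p&gt;For the robots.txt check, Python&#8217;s standard library already has a parser; the rules below are a made-up example:&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt and check whether a path may be fetched.
rules = """
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/public/page"))   # True
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
```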

&lt;h3&gt;Web Scraping with Python and BeautifulSoup:&lt;/h3&gt;

&lt;p&gt;Web scraping in Python is a breeze. There are a number of ways to access a web page and scrape its data. I have used Python with BeautifulSoup for this purpose.&lt;/p&gt;

&lt;p&gt;In this example, I have scraped college footballer data from the ESPN website.&lt;/p&gt;

&lt;h3&gt;The Process:&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install the Requests and BeautifulSoup libraries.&lt;/li&gt;
&lt;li&gt;Fetch the web page and store it in a BeautifulSoup object.&lt;/li&gt;
&lt;li&gt;Set a parser to parse the HTML in the web page. I have used the default html.parser.&lt;/li&gt;
&lt;li&gt;Extract the player name, school, city, playing position and grade.&lt;/li&gt;
&lt;li&gt;Append the data to a list which will be written to a CSV file at a later stage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9B8ZT2A---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://www.kashifaziz.me/wp-content/uploads/2017/10/college-footballer-data-scraping-python-beautifulsoup-code.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9B8ZT2A---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://www.kashifaziz.me/wp-content/uploads/2017/10/college-footballer-data-scraping-python-beautifulsoup-code.jpg" alt="Python BeautifulSoup Tutorial: Web Scraping In 20 Lines Of Code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;The Code:&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
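&lt;p&gt;As a minimal sketch of the steps above, here is the same BeautifulSoup approach run against an invented HTML sample rather than ESPN&#8217;s actual markup:&lt;/p&gt;

```python
from bs4 import BeautifulSoup

# Invented HTML sample for illustration; the real page structure differs.
html = """
<table>
  <tr class="player"><td>John Doe</td><td>Springfield High</td><td>QB</td><td>95</td></tr>
  <tr class="player"><td>Sam Roe</td><td>Riverdale High</td><td>WR</td><td>91</td></tr>
</table>
"""

# Parse with the default html.parser, as in the process above.
soup = BeautifulSoup(html, "html.parser")

# Extract each player's fields and append them to a list,
# ready to be written to a CSV file at a later stage.
players = []
for row in soup.find_all("tr", class_="player"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    players.append(cells)

print(players[0][0])  # John Doe
```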


&lt;p&gt;&lt;a href="http://www.kashifaziz.me/web-scraping-python-beautifulsoup.html/"&gt;Detailed blog post is available here.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>beautifulsoup</category>
      <category>requests</category>
      <category>webscraping</category>
    </item>
  </channel>
</rss>
