What will be scraped
Note: Some queries may not display all sections. You can check your query in the playground.
Why use an API?
- No need to create a parser from scratch and maintain it.
- Bypass blocks from Google: CAPTCHAs and IP blocks are solved for you.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation.
SerpApi handles everything on the backend without browser automation, which makes it much faster: response times are under ~2.5 seconds per request (~1.2 seconds with Ludicrous speed). Response times and status rates are shown on the SerpApi Status page.
Full Code
If you don't need an explanation, have a look at the full code example in the online IDE.
from serpapi import GoogleSearch
import os, json


def extract_multiple_jobs():
    params = {
        # https://docs.python.org/3/library/os.html#os.getenv
        'api_key': os.getenv('API_KEY'),    # your serpapi api
        'engine': 'google_jobs',            # SerpApi search engine
        'gl': 'us',                         # country of the search
        'hl': 'en',                         # language of the search
        'q': 'barista new york',            # search query
    }

    search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
    results = search.get_dict()             # JSON -> Python dict

    return [job.get('job_id') for job in results['jobs_results']]


def scrape_google_jobs_listing(job_ids):
    data = []

    for job_id in job_ids:
        params = {
            # https://docs.python.org/3/library/os.html#os.getenv
            'api_key': os.getenv('API_KEY'),    # your serpapi api
            'engine': 'google_jobs_listing',    # SerpApi search engine
            'q': job_id,                        # search query (job_id)
        }

        search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
        results = search.get_dict()             # JSON -> Python dict

        data.append({
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings')
        })

    return data


def main():
    job_ids = extract_multiple_jobs()
    google_jobs_listing_results = scrape_google_jobs_listing(job_ids)

    print(json.dumps(google_jobs_listing_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()
Preparation
Install library:
pip install google-search-results
google-search-results is a SerpApi API package.
Code Explanation
Import libraries:
from serpapi import GoogleSearch
import os, json
Library | Purpose |
---|---|
GoogleSearch | to scrape and parse Google results using SerpApi web scraping library. |
os | to return environment variable (SerpApi API key) value. |
json | to convert extracted data to a JSON object. |
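Because the key is read with os.getenv('API_KEY'), a missing environment variable silently yields None and the request fails only later. A minimal sketch of an early check is shown below; the helper name get_api_key is hypothetical and is not part of the SerpApi library.

```python
import os


def get_api_key(env_var='API_KEY'):
    """Return the API key stored in `env_var`, or fail with a clear message."""
    key = os.getenv(env_var)  # None when the variable is not set
    if not key:
        raise RuntimeError(f'Set the {env_var} environment variable first.')
    return key
```

The returned value could then be used for the 'api_key' entry of the params dictionary instead of calling os.getenv directly.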
Top-level code environment
The extract_multiple_jobs() function is called to get all the job_id values. The resulting list of job_ids is passed to the scrape_google_jobs_listing(job_ids) function to retrieve the required data. Both functions are explained under the corresponding headings below.
This code follows the common convention of using the __name__ == "__main__" construct:
def main():
    job_ids = extract_multiple_jobs()
    google_jobs_listing_results = scrape_google_jobs_listing(job_ids)

    print(json.dumps(google_jobs_listing_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()
The code under this check runs only when the file is executed directly. If the file is imported into another module, the condition is false and main() is not called.
You can watch the video Python Tutorial: if __name__ == '__main__' for more details.
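The behavior of this construct can be illustrated with a small standalone file. This is only a sketch: the main() function here is a stand-in and does not call SerpApi.

```python
def main():
    # Stand-in for the real work done in the script.
    return 'main() was called'


# __name__ equals '__main__' only when this file is executed directly,
# e.g. `python script.py`; on import it holds the module's name instead.
if __name__ == '__main__':
    print(main())
```

Running the file prints the message, while importing it from another module defines main() without calling it.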
Extract Multiple Jobs
The function returns a list of job_id values. The value of this identifier will be used in the next function to create the request.
This function provides a code snippet for getting data from the first page. If you want to extract data using pagination, you can see it in the Scrape Google Jobs organic results with Python blog post.
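For orientation, a pagination loop might look roughly like the sketch below. It is not taken from the linked blog post. The fetch_page callable is hypothetical, and the location of the next-page token (serpapi_pagination.next_page_token) is an assumption that should be checked against the Google Jobs API documentation.

```python
def collect_job_ids(fetch_page, max_pages=3):
    """Gather job_id values across several pages.

    `fetch_page(token)` is a hypothetical callable that returns one page of
    results as a dict; `token` is None for the first page.
    """
    job_ids, token = [], None
    for _ in range(max_pages):
        results = fetch_page(token)
        job_ids.extend(job.get('job_id') for job in results.get('jobs_results', []))
        # Assumed location of the token that identifies the next page.
        token = results.get('serpapi_pagination', {}).get('next_page_token')
        if not token:
            break  # no further pages
    return job_ids
```

Passing the request logic in as a callable keeps the loop independent of how each page is actually fetched, and max_pages caps the number of requests spent on one query.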
At the beginning of the function, parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary.
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),    # your serpapi api
    'engine': 'google_jobs',            # SerpApi search engine
    'gl': 'us',                         # country of the search
    'hl': 'en',                         # language of the search
    'q': 'barista new york',            # search query
}
Parameters | Explanation |
---|---|
api_key | Parameter defines the SerpApi private key to use. |
engine | Set parameter to google_jobs to use the Google Jobs API engine. |
gl | Parameter defines the country to use for the Google search. It's a two-letter country code (e.g., us for the United States, uk for United Kingdom, or fr for France). Head to the Google countries page for a full list of supported Google countries. |
hl | Parameter defines the language to use for the Google Jobs search. It's a two-letter language code (e.g., en for English, es for Spanish, or fr for French). Head to the Google languages page for a full list of supported Google languages. |
q | Parameter defines the query you want to search. |
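The parameters in the table can also be assembled by a small helper, so that the query, country, and language are easy to change. This is a sketch only: build_jobs_params is a hypothetical name, and any extra keyword arguments are passed through unchecked.

```python
import os


def build_jobs_params(query, country='us', language='en', **extra):
    """Build the params dict for the google_jobs engine."""
    params = {
        'api_key': os.getenv('API_KEY'),  # SerpApi private key from the environment
        'engine': 'google_jobs',          # Google Jobs API engine
        'gl': country,                    # two-letter country code
        'hl': language,                   # two-letter language code
        'q': query,                       # search query
    }
    params.update(extra)  # additional parameters are added as-is
    return params
```

For example, build_jobs_params('barista paris', country='fr', language='fr') returns a dictionary that can be passed to GoogleSearch in place of the hard-coded one.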
Then, we create a search object where the data is retrieved from the SerpApi backend. The results dictionary holds the JSON response converted to a Python dict:
search = GoogleSearch(params) # where data extraction happens on the SerpApi backend
results = search.get_dict() # JSON -> Python dict
The function then returns a list of all job_id values, built with a list comprehension:
return [job.get('job_id') for job in results['jobs_results']]
The function looks like this:
def extract_multiple_jobs():
    params = {
        # https://docs.python.org/3/library/os.html#os.getenv
        'api_key': os.getenv('API_KEY'),    # your serpapi api
        'engine': 'google_jobs',            # SerpApi search engine
        'gl': 'us',                         # country of the search
        'hl': 'en',                         # language of the search
        'q': 'barista new york',            # search query
    }

    search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
    results = search.get_dict()             # JSON -> Python dict

    return [job.get('job_id') for job in results['jobs_results']]
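Since some queries may not display all sections, results['jobs_results'] raises a KeyError when that section is absent. A more defensive variant of the final step could look like the sketch below; job_ids_from_results is a hypothetical helper that works on the already fetched results dictionary.

```python
def job_ids_from_results(results):
    """Return the job_id values found in a results dict, skipping missing ones."""
    jobs = results.get('jobs_results', [])  # empty list when the section is absent
    return [job['job_id'] for job in jobs if job.get('job_id')]
```

With this variant, a response without job results yields an empty list, so the listing function simply makes no requests.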
Scrape Google Jobs Listing
This function takes the job_ids list and returns a list of all data.
First, declare the data list where the extracted data will be added:
data = []
For each job_id value in the job_ids list, separate requests will be made and the corresponding data will be retrieved:
for job_id in job_ids:
    # data extraction will be here
Next, we define the parameters for making a request:
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),    # your serpapi api
    'engine': 'google_jobs_listing',    # SerpApi search engine
    'q': job_id,                        # search query (job_id)
}
Parameters | Explanation |
---|---|
api_key | Parameter defines the SerpApi private key to use. |
engine | Set parameter to google_jobs_listing to use the Google Jobs Listing API engine. |
q | Parameter defines the job_id string which can be obtained from Google Jobs API. |
Then, we create a search object where the data is retrieved from the SerpApi backend. The results dictionary holds the JSON response converted to a Python dict:
search = GoogleSearch(params) # where data extraction happens on the SerpApi backend
results = search.get_dict() # JSON -> Python dict
We can then create a dictionary structure from values such as job_id, apply_options, salaries, and ratings. The extracted data is written according to the corresponding keys. After that, the dictionary is appended to the data list:
data.append({
    'job_id': job_id,
    'apply_options': results.get('apply_options'),
    'salaries': results.get('salaries'),
    'ratings': results.get('ratings')
})
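Using results.get() rather than results[...] means a missing section becomes None instead of raising an error, and json.dumps later writes it as null. The sketch below shows this on a made-up response; summarize_listing and the sample values are illustrative only.

```python
import json


def summarize_listing(job_id, results):
    """Pick the sections of interest; absent sections become None."""
    return {
        'job_id': job_id,
        'apply_options': results.get('apply_options'),
        'salaries': results.get('salaries'),
        'ratings': results.get('ratings'),
    }


# Made-up response that contains only one of the three sections.
sample = {'apply_options': [{'title': 'Apply on Craigslist'}]}
print(json.dumps(summarize_listing('example-id', sample), indent=2))
```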
At the end of the function, the data list is returned with the retrieved data for each job_id:
return data
The complete function to scrape all data would look like this:
def scrape_google_jobs_listing(job_ids):
    data = []

    for job_id in job_ids:
        params = {
            # https://docs.python.org/3/library/os.html#os.getenv
            'api_key': os.getenv('API_KEY'),    # your serpapi api
            'engine': 'google_jobs_listing',    # SerpApi search engine
            'q': job_id,                        # search query (job_id)
        }

        search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
        results = search.get_dict()             # JSON -> Python dict

        data.append({
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings')
        })

    return data
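Because one request is made per job_id, a single failed request should not discard the rest. The sketch below records failures separately. It assumes that a failed response is a dict containing an 'error' key, and fetch is a hypothetical callable standing in for the GoogleSearch(params).get_dict() call.

```python
def scrape_listings(job_ids, fetch):
    """Collect listing data, recording failed requests instead of stopping.

    `fetch(job_id)` is a hypothetical callable that returns the response
    dict for one job_id.
    """
    data, failed = [], []
    for job_id in job_ids:
        results = fetch(job_id)
        if 'error' in results:  # assumed shape of a failed response
            failed.append({'job_id': job_id, 'error': results['error']})
            continue
        data.append({
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings'),
        })
    return data, failed
```

Returning the failures alongside the data makes it possible to retry only the job_id values that did not succeed.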
Output
[
{
"job_id": "eyJqb2JfdGl0bGUiOiJCYXJpc3RhIiwiaHRpZG9jaWQiOiJuc3Y1d1hyNXdFOEFBQUFBQUFBQUFBPT0iLCJnbCI6InVzIiwiaGwiOiJlbiIsImZjIjoiRXVJQkNxSUJRVUYwVm14aVFtcFdYMjl0V0ZadU9USTNWV0ZZUlZRek9XRTJPVlJtYUc1RVZtaGpaRk5WT1VFMlNYZFpaR2ROU0dzdFoyMVBkMmxmUTNKS2RUQnJjMWxFT0dZNFNHWnFXRUZNTjB4eFRWVmtMV1JRVVRWaVJGbFVSMVo1YmxsVWVuazVPRzlxVVVsTmVXcFJjRXhPVWpWbWMwdFlTMlo2V21SUU1XSkZZa2hTY2pKaGRYcEdlRzVxTVVWNGIwZ3lhVXd3UlZGVVZ6Tk5XSGRNYXpKbVYyVjNFaGQzYkhCeFdTMWZUMHhNTW01d2RGRlFNRGhwUW05QmF4b2lRVVJWZVVWSFpqSTJWMjF3TjBoU2FtNDRPSHB5WkVWTldVMVhVWGRTU1hwMVFRIiwiZmN2IjoiMyIsImZjX2lkIjoiZmNfMSIsImFwcGx5X2xpbmsiOnsidGl0bGUiOiIubkZnMmVie2ZvbnQtd2VpZ2h0OjUwMH0uQmk2RGRje2ZvbnQtd2VpZ2h0OjUwMH1BcHBseSBkaXJlY3RseSBvbiBDdWxpbmFyeSBBZ2VudHMiLCJsaW5rIjoiaHR0cHM6Ly9jdWxpbmFyeWFnZW50cy5jb20vam9icy80MTc4NjMtQmFyaXN0YT91dG1fY2FtcGFpZ249Z29vZ2xlX2pvYnNfYXBwbHlcdTAwMjZ1dG1fc291cmNlPWdvb2dsZV9qb2JzX2FwcGx5XHUwMDI2dXRtX21lZGl1bT1vcmdhbmljIn19",
"apply_options": [
{
"title": "Apply on Trabajo.org",
"link": "https://us.trabajo.org/job-1683-20221107-34e191c4eb8c8ca3ec69adfa55061df2?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
},
{
"title": "Apply on Jobs",
"link": "https://us.fidanto.com/jobs/job-opening/nov-2022/barista-1432712052?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
},
{
"title": "Apply on Craigslist",
"link": "https://newyork.craigslist.org/mnh/fbh/d/new-york-cafe-barista/7553733276.html?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
},
{
"title": "Apply directly on Culinary Agents",
"link": "https://culinaryagents.com/jobs/417863-Barista?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
}
],
"salaries": null,
"ratings": null
},
... other results
]
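As a possible follow-up step, the returned list can be reduced to the apply links for each job. This is a sketch under the assumption that the data has the shape shown in the output; collect_apply_links is a hypothetical helper.

```python
def collect_apply_links(data):
    """Map each job_id to the list of its apply links."""
    links = {}
    for item in data:
        options = item.get('apply_options') or []  # None becomes an empty list
        links[item['job_id']] = [option['link'] for option in options if 'link' in option]
    return links
```

Entries whose apply_options value is null end up with an empty list, so the result can be iterated without extra checks.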
Links
Add a Feature Request or a Bug