How to Extract Bing Images Data with SerpApi and Python

#webscraping #tutorial #python #programming

Intro
What will be scraped
Why use an API?
Full Code
Preparation
Code Explanation
Output
Links

Intro

In this blog post, we'll go through the process of extracting Bing Images results using SerpApi's Bing Images API and the Python programming language. You can look at the complete code in the online IDE (Replit).

What will be scraped

Why use an API?

There are a couple of reasons to use an API, ours in particular. With it, you don't have to:

Create a parser from scratch and maintain it.
Bypass blocks from Bing, such as CAPTCHAs and IP blocks.
Pay for proxies and CAPTCHA solvers.
Figure out the legal side of scraping data on your own.

SerpApi handles everything on the backend without browser automation, so requests are fast: typically under about 1.7 seconds each. Current response times and success rates are shown on the SerpApi Status page.

Full Code

This code retrieves all the data with pagination:

from serpapi import BingSearch
import json

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'q': 'Coffee',                  # search query
    'engine': 'bing_images',        # search engine
    'cc': 'US',                     # country of the search
    'first': 1,                     # pagination
    'count': 50,                    # number of results per page
    # 'imagesize': 'wallpaper',     # filtering by size
    # 'color2': 'color',            # filtering by color
    # 'photo': 'photo',             # filtering by image type
    # 'aspect': 'wide',             # filtering by layout
    # 'face': 'portrait',           # filtering by people
    # 'age': 'lt525600',            # filtering by date
    # 'license': 'Type-Any'         # filtering by license
}

search = BingSearch(params)         # data extraction on the SerpApi backend
results = search.get_dict()         # JSON -> Python dict

bing_images_results = {
    'images_results': [],
    'suggested_searches': [],
    'refined_searches': results.get('refined_searches', []),
    'related_searches': results.get('related_searches', []),
    'shopping_results': results.get('shopping_results', [])
}

page_count = 0
page_limit = 10

while 'error' not in results and page_count < page_limit:
    bing_images_results['images_results'].extend(results.get('images_results', []))
    bing_images_results['suggested_searches'].extend(results.get('suggested_searches', []))

    params['first'] += params['count']  # next page offset; BingSearch reads this same params dict on every get_dict() call
    page_count += 1
    results = search.get_dict()

print(json.dumps(bing_images_results, indent=2, ensure_ascii=False))

Preparation

Install the library:

pip install google-search-results

google-search-results is SerpApi's Python package. It provides the BingSearch class used in this tutorial.

Code Explanation

Import libraries:

from serpapi import BingSearch
import json

Library	Purpose
`BingSearch`	to scrape and parse Bing results using the SerpApi web scraping library.
`json`	to serialize the extracted data into a JSON-formatted string.

The params dictionary defines the parameters used to build the request URL. To pass other parameters, add them to this dictionary:

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'q': 'Coffee',                  # search query
    'engine': 'bing_images',        # search engine
    'cc': 'US',                     # country of the search
    'first': 1,                     # pagination
    'count': 50,                    # number of results per page
    # 'imagesize': 'wallpaper',     # filtering by size
    # 'color2': 'color',            # filtering by color
    # 'photo': 'photo',             # filtering by image type
    # 'aspect': 'wide',             # filtering by layout
    # 'face': 'portrait',           # filtering by people
    # 'age': 'lt525600',            # filtering by date
    # 'license': 'Type-Any'         # filtering by license
}

Parameters	Explanation
`api_key`	Parameter defines the SerpApi private key to use.
`q`	Parameter defines the search query. You can use anything that you would use in a regular Bing Images search.
`engine`	Set parameter to `bing_images` to use the Bing Images API engine.
`cc`	Parameter defines the country to search from. It follows the two-letter ISO 3166-1 alpha-2 format (e.g., `us` for United States, `de` for Germany, `gb` for United Kingdom, etc.).
`first`	Parameter controls the offset of the organic results. This parameter defaults to `1` (e.g., `first=10` will move the 10th organic result to the first position).
`count`	Parameter controls the number of results per page. This parameter is only a suggestion and might not reflect the returned results.
`imagesize`	Parameter is used for filtering images by size. It can be set to: `small`, `medium`, `large`, `wallpaper`.
`color2`	Parameter is used for filtering images by color. It can be set to: `color` - Color Only, `bw` - Black & white, `FGcls_GREEN` - Green, etc.
`photo`	Parameter is used for filtering images by image type. It can be set to: `photo`, `clipart`, `linedrawing`, `animatedgif`, `transparent`.
`aspect`	Parameter is used for filtering images by layout. It can be set to: `square`, `wide`, `tall`.
`face`	Parameter is used for filtering images by people. It can be set to: `face` - Faces Only, `portrait` - Head & Shoulders.
`age`	Parameter is used for filtering images by date. It can be set to: `lt1440` - Past 24 hours, `lt10080` - Past week, `lt43200` - Past month, `lt525600` - Past year.
`license`	Parameter is used for filtering images by license. It can be set to: `Type-Any` - All Creative Commons, `L1` - Public Domain, `L2_L3_L4_L5_L6_L7` - Free to share and use, `L2_L3_L4` - Free to share and use commercially, `L2_L3_L5_L6` - Free to modify, share and use, `L2_L3` - Free to modify, share, and use commercially.

📌Note: You can also add other parameters supported by the Bing Images API.
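The `age` values in the table follow a simple pattern: `lt` followed by the length of the window in minutes. 1440 = 24 * 60 (one day), 10080 = 7 * 1440, 43200 = 30 * 1440 and 525600 = 365 * 1440. The helper below is a hypothetical convenience function (it is not part of the SerpApi library) that rebuilds those values from a number of days. Whether Bing accepts windows other than the four listed ones is an assumption you should verify before relying on it.

```python
def age_filter(days: int) -> str:
    """Hypothetical helper: build a Bing Images `age` value from a number of days.

    The documented values (lt1440, lt10080, lt43200, lt525600) are
    'lt' followed by the window length in minutes.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    minutes = days * 24 * 60
    return f"lt{minutes}"


# Reproduce the four documented options
for days in (1, 7, 30, 365):
    print(days, age_filter(days))
```

For example, `params['age'] = age_filter(7)` produces the same value as the documented "Past week" option, `lt10080`.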

Then, we create a search object that retrieves the data from the SerpApi backend. Its get_dict() method returns the JSON response as a Python dictionary, which we store in results:

search = BingSearch(params)         # data extraction on the SerpApi backend
results = search.get_dict()         # JSON -> Python dict
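Conceptually, `BingSearch(params).get_dict()` sends the dictionary as query parameters to SerpApi's `search.json` endpoint and decodes the JSON reply. This is the same endpoint that appears in the `serpapi_link` fields of the sample output. The sketch below only builds that URL with the standard library so you can inspect it. It sends no request. The `build_search_url` function is hypothetical, and the client library may add a few parameters of its own, so treat the result as an approximation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as seen in the serpapi_link fields of the output
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_search_url(params: dict) -> str:
    """Sketch: encode a params dict into a SerpApi search URL (no request is sent)."""
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


params = {
    'api_key': '...',        # placeholder, use your own key
    'q': 'Coffee',
    'engine': 'bing_images',
    'cc': 'US',
    'first': 1,
    'count': 50,
}

url = build_search_url(params)
print(url)
# Decode the query string again to check what would be sent
print(parse_qs(urlsplit(url).query))
```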

At this point, the results dictionary only holds data from the first page. Before extracting anything, we create the bing_images_results dictionary, which will collect the data from all pages. Some blocks (refined searches, related searches and shopping results) only appear on the first page, so they can be extracted right away:

bing_images_results = {
    'images_results': [],
    'suggested_searches': [],
    'refined_searches': results.get('refined_searches', []),
    'related_searches': results.get('related_searches', []),
    'shopping_results': results.get('shopping_results', [])
}
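Using `.get(key, [])` means that a block missing from the response becomes an empty list instead of raising `KeyError`. In the sample output, for instance, `shopping_results` ends up as `[]`. Here is a small sketch of the same initialization wrapped in a hypothetical `init_results` function. It is applied to a hand-made first page that has no shopping results.

```python
def init_results(first_page: dict) -> dict:
    """Create the accumulator dict; first-page-only blocks are copied immediately."""
    return {
        'images_results': [],       # filled page by page in the loop
        'suggested_searches': [],   # filled page by page in the loop
        # .get() with a default avoids KeyError when a block is absent
        'refined_searches': first_page.get('refined_searches', []),
        'related_searches': first_page.get('related_searches', []),
        'shopping_results': first_page.get('shopping_results', []),
    }


# A trimmed, hand-made first page without shopping results
sample_page = {
    'refined_searches': [{'name': 'Cart'}],
    'related_searches': [{'name': 'Teeccino Coffee'}],
}

print(init_results(sample_page)['shopping_results'])  # → []
```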

The page_limit variable sets the maximum number of pages to fetch. To extract data from a different number of pages, change this value:

page_limit = 10

To get all results, you need to paginate. The loop keeps running while there is no error in the results and page_count is less than page_limit. On each pass, it extracts the data from the current page and increases the first parameter by the value of count to request the next page. It then replaces results with that page's data:

page_count = 0

while 'error' not in results and page_count < page_limit:
    # data extraction from current page will be here

    params['first'] += params['count']  # next page offset; BingSearch reads this same params dict on every get_dict() call
    page_count += 1
    results = search.get_dict()
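To see the loop's control flow without spending API credits, here is a sketch in which a hypothetical `fetch_page(first, count)` callable stands in for `search.get_dict()`. The names `paginate` and `fake_fetch` are illustrative only. The fake source returns three pages and then a response with an `error` key. That stopping behaviour is invented for the demo and is not a documented SerpApi contract.

```python
from typing import Callable


def paginate(fetch_page: Callable[[int, int], dict],
             first: int = 1, count: int = 50, page_limit: int = 10) -> list:
    """Collect images_results across pages, mirroring the while loop above."""
    images = []
    page_count = 0
    results = fetch_page(first, count)
    while 'error' not in results and page_count < page_limit:
        images.extend(results.get('images_results', []))
        first += count            # offset of the next page: 1, 51, 101, ...
        page_count += 1
        results = fetch_page(first, count)
    return images


requested_offsets = []


def fake_fetch(first: int, count: int) -> dict:
    """Hypothetical stand-in for search.get_dict(): three pages, then an error."""
    requested_offsets.append(first)
    if first > 1 + 2 * count:
        return {'error': 'no more results'}
    return {'images_results': [{'position': first + i} for i in range(count)]}


images = paginate(fake_fetch, count=50)
print(len(images), requested_offsets)  # → 150 [1, 51, 101, 151]
```

The fourth request (offset 151) is the one that returns the error. Just like the original loop, the sketch always makes one request more than the number of pages it keeps.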

Inside the loop, the lists under the corresponding keys are extended with the data from each page:

bing_images_results['images_results'].extend(results.get('images_results', []))
bing_images_results['suggested_searches'].extend(results.get('suggested_searches', []))
# thumbnail = results['images_results'][0]['thumbnail']
# link = results['images_results'][0]['link']
# title = results['images_results'][0]['title']

📌Note: In the comments above, I showed how to extract specific fields. In results['images_results'][0], the index 0 refers to the first image result on the current page; results['images_results'][1] is the second result, and so on.
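Instead of indexing one result at a time, you can loop over all collected image results. The sketch below pulls `title`, `link`, `thumbnail` and `original` with `.get()`, so any missing field becomes `None`. The `extract_fields` function is hypothetical. The sample record reuses values from the output below, and the `link` field is omitted to keep it short.

```python
def extract_fields(images_results: list) -> list:
    """Keep only a few fields from each image result; missing keys become None."""
    return [
        {
            'title': image.get('title'),
            'link': image.get('link'),
            'thumbnail': image.get('thumbnail'),
            'original': image.get('original'),
        }
        for image in images_results
    ]


# Abbreviated record based on the sample output ('link' omitted on purpose)
images_results = [
    {
        'thumbnail': 'https://th.bing.com/th/id/OIP.-ACVKbQLg-aRb2bVxy1jfAHaE8?w=237&h=180&c=7&r=0&o=5&pid=1.7',
        'title': 'coffee Free Photo Download | FreeImages',
        'original': 'https://images.freeimages.com/images/large-previews/bc0/coffee-1317648.jpg',
        'position': 1,
    },
]

for item in extract_fields(images_results):
    print(item['title'], item['link'])  # link is None because it was omitted above
```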

After all the data is retrieved, it is printed in JSON format:

print(json.dumps(bing_images_results, indent=2, ensure_ascii=False))
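`ensure_ascii=False` matters because some fields contain non-ASCII characters, such as the middle dot in `"size": "1599 x 1066 · jpeg"`. With the default `ensure_ascii=True`, `json.dumps` escapes that character as `\u00b7`. Both strings decode to the same data, so the flag only changes readability:

```python
import json

record = {'size': '1599 x 1066 · jpeg'}  # value taken from the sample output

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)   # → {"size": "1599 x 1066 \u00b7 jpeg"}
print(readable)  # → {"size": "1599 x 1066 · jpeg"}
```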

Output

{
  "images_results": [
    {
      "thumbnail": "https://th.bing.com/th/id/OIP.-ACVKbQLg-aRb2bVxy1jfAHaE8?w=237&h=180&c=7&r=0&o=5&pid=1.7",
      "link": "https://www.bing.com/images/search?view=detailV2&ccid=%2bACVKbQL&id=16A3CD2F2C1C0D71AE4EA5B08475E79A9A2D543C&thid=OIP.-ACVKbQLg-aRb2bVxy1jfAHaE8&mediaurl=https%3a%2f%2fimages.freeimages.com%2fimages%2flarge-previews%2fbc0%2fcoffee-1317648.jpg&cdnurl=https%3a%2f%2fth.bing.com%2fth%2fid%2fR.f8009529b40b83e6916f66d5c72d637c%3frik%3dPFQtmprndYSwpQ%26pid%3dImgRaw%26r%3d0&exph=1066&expw=1599&q=Coffee&simid=608042879963195855&FORM=IRPRST&ck=733C3F62E8891EC1D9ECD99852D9CB5E&selectedIndex=0",
      "title": "coffee Free Photo Download | FreeImages",
      "size": "1599 x 1066 · jpeg",
      "source": "https://www.freeimages.com/photo/coffee-1317648",
      "domain": "freeimages.com",
      "original": "https://images.freeimages.com/images/large-previews/bc0/coffee-1317648.jpg",
      "description": "coffee freeimages",
      "position": 1
    },
    {
      "thumbnail": "https://th.bing.com/th/id/OIP.ceGbTJIrj6ajM2XlsyFXAwAAAA?w=238&h=180&c=7&r=0&o=5&pid=1.7",
      "link": "https://www.bing.com/images/search?view=detailV2&ccid=ceGbTJIr&id=A7A5C7AA478FAAAE30E29D2927035138A7ED4F86&thid=OIP.ceGbTJIrj6ajM2XlsyFXAwAAAA&mediaurl=https%3a%2f%2fwww.indigofinance.com.au%2fwp-content%2fuploads%2f2018%2f07%2fistock-157528129.jpg&cdnurl=https%3a%2f%2fth.bing.com%2fth%2fid%2fR.71e19b4c922b8fa6a33365e5b3215703%3frik%3dhk%252ftpzhRAycpnQ%26pid%3dImgRaw%26r%3d0&exph=315&expw=474&q=Coffee&simid=608037970811556740&FORM=IRPRST&ck=03F56E55E9A229798D3F9E0E47710CF6&selectedIndex=1",
      "title": "Coffee Addiction! Can it be tamed? - Indigo Finance",
      "size": "474 x 315 · jpeg",
      "source": "https://www.indigofinance.com.au/coffee-addiction-can-tamed/",
      "domain": "indigofinance.com.au",
      "original": "https://www.indigofinance.com.au/wp-content/uploads/2018/07/istock-157528129.jpg",
      "description": "coffee addiction",
      "position": 2
    },
    {
      "thumbnail": "https://serpapi.com/searches/641c636f47ccf1fac0af6454/images/d8b77d8a83f199ee4f68a702661ae07622e8f07644945d882e253318c4e63e4d.gif",
      "link": "https://www.bing.com/images/search?view=detailV2&ccid=%2fO2T2GEH&id=CDAE3AA25A9EAAC5AAFCFEAE226C4E400B130445&thid=OIP._O2T2GEHLIPfkjeW55c4fAHaEo&mediaurl=https%3a%2f%2fth.bing.com%2fth%2fid%2fR.fced93d861072c83df923796e797387c%3frik%3dRQQTC0BObCKu%252fg%26riu%3dhttp%253a%252f%252fi.slimg.com%252fsc%252fsl%252fphoto%252ff%252ffo%252ffood-coffeeandcoffeebeans-dd.jpg%26ehk%3dd3WNdg6Q8xrjnwbPLfOtvTsmTGRas1ZKp81pa7PKQqw%253d%26risl%3d%26pid%3dImgRaw%26r%3d0&exph=382&expw=610&q=Coffee&simid=607997220165786271&FORM=IRPRST&ck=03A478D663783F2B09856F5B2AAAE68B&selectedIndex=2",
      "title": "America's Best Coffee Shops - SmarterTravel.com",
      "size": "610 x 382 · jpeg",
      "source": "http://www.smartertravel.com/blogs/today-in-travel/america-best-coffee-shops.html?id=10909975",
      "domain": "smartertravel.com",
      "original": "http://i.slimg.com/sc/sl/photo/f/fo/food-coffeeandcoffeebeans-dd.jpg",
      "description": "coffee shops america cafe coffe food cup foods wine facts espresso coffees tweet caffeine un good healthy give con flavors",
      "position": 3
    },
    ... other image results
  ],
  "suggested_searches": [
    {
      "thumbnail": "https://th.bing.com/th?q=Coffee+Maker&w=42&h=42&c=7&rs=1&p=0&o=5&pid=1.7&mkt=en-US&cc=US&setlang=en&adlt=moderate&t=1",
      "link": "https://www.bing.com/images/search?q=Coffee+Maker&FORM=RESTAB",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Coffee+Maker",
      "name": "Coffee Maker"
    },
    {
      "thumbnail": "https://th.bing.com/th?q=Best+Coffee+Beans&w=42&h=42&c=7&rs=1&p=0&o=5&pid=1.7&mkt=en-US&cc=US&setlang=en&adlt=moderate&t=1",
      "link": "https://www.bing.com/images/search?q=Best+Coffee+Beans&FORM=RESTAB",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Best+Coffee+Beans",
      "name": "Best Coffee Beans"
    },
    {
      "thumbnail": "https://th.bing.com/th?q=Coffee+Flavors&w=42&h=42&c=7&rs=1&p=0&o=5&pid=1.7&mkt=en-US&cc=US&setlang=en&adlt=moderate&t=1",
      "link": "https://www.bing.com/images/search?q=Coffee+Flavors&FORM=RESTAB",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Coffee+Flavors",
      "name": "Coffee Flavors"
    },
    ... other suggested searches
  ],
  "refined_searches": [
    {
      "thumbnail": "https://th.bing.com/th?q=Coffee+Cart&w=120&h=120&c=1&rs=1&qlt=90&cb=1&pid=InlineBlock&mkt=en-US&cc=US&setlang=en&adlt=moderate&t=1&mw=247",
      "link": "https://www.bing.com/images/search?q=Coffee+Cart&qft=&FORM=IRTRRL",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Coffee+Cart",
      "name": "Cart"
    },
    {
      "thumbnail": "https://th.bing.com/th?q=Healthy+Coffee&w=120&h=120&c=1&rs=1&qlt=90&cb=1&pid=InlineBlock&mkt=en-US&cc=US&setlang=en&adlt=moderate&t=1&mw=247",
      "link": "https://www.bing.com/images/search?q=Healthy+Coffee&qft=&FORM=IRTRRL",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Healthy+Coffee",
      "name": "Healthy"
    },
    {
      "thumbnail": "https://th.bing.com/th?q=Coffee+Packaging&w=120&h=120&c=1&rs=1&qlt=90&cb=1&pid=InlineBlock&mkt=en-US&cc=US&setlang=en&adlt=moderate&t=1&mw=247",
      "link": "https://www.bing.com/images/search?q=Coffee+Packaging&qft=&FORM=IRTRRL",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Coffee+Packaging",
      "name": "Packaging"
    },
    ... other refined searches
  ],
  "related_searches": [
    {
      "link": "https://www.bing.com/images/search?q=Teeccino+Coffee&qft=&fsm=1&FORM=SHOPSO",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Teeccino+Coffee",
      "name": "Teeccino Coffee"
    },
    {
      "link": "https://www.bing.com/images/search?q=Espresso+Coffees&qft=&fsm=1&FORM=SHOPSO",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Espresso+Coffees",
      "name": "Espresso Coffees"
    },
    {
      "link": "https://www.bing.com/images/search?q=Folgers+Coffee&qft=&fsm=1&FORM=SHOPSO",
      "serpapi_link": "https://serpapi.com/search.json?device=desktop&engine=bing_images&q=Folgers+Coffee",
      "name": "Folgers Coffee"
    }
  ],
  "shopping_results": []
}
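Because `count` is only a suggestion, the offsets computed with `first += count` might not line up exactly with what each page returns, so consecutive pages could overlap. A minimal sketch for dropping repeated images, assuming the `original` image URL is a reasonable identity key (this is an assumption, not something SerpApi documents). The `dedupe_images` function is hypothetical, and the repeated record at position 51 is made up for the demo:

```python
def dedupe_images(images_results: list, key: str = 'original') -> list:
    """Drop results whose `key` value was already seen, preserving order."""
    seen = set()
    unique = []
    for image in images_results:
        value = image.get(key)
        if value is not None and value in seen:
            continue  # same image already collected from an earlier page
        if value is not None:
            seen.add(value)
        unique.append(image)  # results without the key are kept, not guessed at
    return unique


images_results = [
    {'original': 'https://images.freeimages.com/images/large-previews/bc0/coffee-1317648.jpg', 'position': 1},
    {'original': 'https://www.indigofinance.com.au/wp-content/uploads/2018/07/istock-157528129.jpg', 'position': 2},
    # hypothetical repeat of the first image on a later page
    {'original': 'https://images.freeimages.com/images/large-previews/bc0/coffee-1317648.jpg', 'position': 51},
]

print(len(dedupe_images(images_results)))  # → 2
```

To apply it to the collected data, reassign the list: `bing_images_results['images_results'] = dedupe_images(bing_images_results['images_results'])`.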