DEV Community

Mikhail Zub


How to scrape Bing Search with Node.js?

Intro

In this post, I want to explain how to scrape Microsoft Bing search results with Node.js. I will show you two ways to do this: with Puppeteer and with SerpApi.

Preparation

First, we need to create a Node.js project and add the npm package "puppeteer". To do this, open the command line in the project directory and enter:
npm init -y
then:
npm i puppeteer

What will be scraped

Bing search

Process

The SelectorGadget Chrome extension was used to grab CSS selectors.
The GIF below illustrates how different parts of the organic results are selected.
Grab CSS selectors

Code

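const puppeteer = require("puppeteer");

const searchString = "cheerio js";
// encodeURIComponent (unlike encodeURI) also escapes "&", "+", "=" and "#",
// so the query cannot break the rest of the URL.
const encodedString = encodeURIComponent(searchString);

// page.waitForTimeout() is no longer available in recent Puppeteer releases,
// so a plain timer-based pause is used instead.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function getOrganicResults() {
  const browser = await puppeteer.launch({
    headless: false, // opens a visible browser window
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  try {
    const page = await browser.newPage();

    page.setDefaultNavigationTimeout(60000); // 60 seconds
    await page.goto(
      `https://bing.com/search?q=${encodedString}&setmkt=en-WW&setlang=en`
    );
    // Wait for the pagination block (.b_pag) at the bottom of the results.
    await page.waitForSelector(".b_pag");

    // Hover over each result in turn (this scrolls it into view) and pause briefly.
    const resultItems = await page.$$("#b_results > li");
    for (let i = 1; i <= resultItems.length; i++) {
      await page.hover(`#b_results > li:nth-child(${i})`);
      await sleep(1000);
    }
    await page.hover(".b_pag");

    // Runs inside the page; ?. keeps a result with a missing element from throwing.
    const result = await page.evaluate(function () {
      return Array.from(document.querySelectorAll("li.b_algo")).map((el) => ({
        link: el.querySelector("h2 > a")?.getAttribute("href"),
        title: el.querySelector("h2 > a")?.innerText,
        snippet: el.querySelector("p, .b_mText div")?.innerText,
      }));
    });

    console.log(result);
    return result;
  } finally {
    // Close the browser even if a step above fails.
    await browser.close();
  }
}

getOrganicResults().catch(console.error);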

Output

[
  {
    link: 'https://cheerio.js.org/',
    title: 'cheerio - js',
    snippet: 'Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API. ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient.'
  },
  {
    link: 'https://github.com/cheeriojs/cheerio',
    title: 'GitHub - cheeriojs/cheerio: Fast, flexible, and lean ...',
    snippet: "28/07/2017 · Cheerio's selector implementation is nearly identical to jQuery's, so the API is very similar. $( selector, [context], [root] ) selector searches within the context scope which searches within the root scope. selector and context can be a string expression, DOM Element, array of DOM elements, or cheerio object. root is typically the HTML document string."
  },
  ...
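The Puppeteer script above places the search query into the Bing URL by hand. As a minimal sketch, the same URL can be assembled with the built-in `URL` class, which percent-encodes characters such as `&` and `+` that would otherwise change the meaning of the query string. The helper name `buildBingUrl` and its default values are assumptions made for illustration; only the `q`, `setmkt` and `setlang` parameters come from the script above.

```javascript
// Hypothetical helper (not from the original script): builds the Bing search URL.
// The parameter names q, setmkt and setlang are the ones used in the script above.
function buildBingUrl(query, { market = "en-WW", lang = "en" } = {}) {
  const url = new URL("https://bing.com/search");
  // searchParams.set() encodes each value, so "&", "+" or "#" in the query
  // cannot leak into the rest of the URL.
  url.searchParams.set("q", query);
  url.searchParams.set("setmkt", market);
  url.searchParams.set("setlang", lang);
  return url.toString();
}

console.log(buildBingUrl("cheerio js"));
// https://bing.com/search?q=cheerio+js&setmkt=en-WW&setlang=en

console.log(buildBingUrl("c++ & node"));
// https://bing.com/search?q=c%2B%2B+%26+node&setmkt=en-WW&setlang=en
```

The returned string could be passed straight to `page.goto()`. Spaces are written as `+` here, which is the standard form encoding for query strings.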

Using Bing Search Organic Results API

SerpApi has a free plan with 100 searches per month. If you need more searches, there are paid plans.

The difference is that you only need to iterate over ready-made, structured JSON instead of coding everything from scratch and picking the correct selectors, which can be time-consuming.
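To make "iterating over structured JSON" concrete, here is a small sketch that walks an array shaped like the `organic_results` field of a SerpApi response and keeps a few fields. The sample array is shortened sample data, and the helper name `pickFields` is hypothetical; neither is part of the SerpApi package.

```javascript
// Shortened sample data shaped like the `organic_results` array of a response.
const organicResults = [
  { position: 1, title: "cheerio - js", link: "https://cheerio.js.org/" },
  {
    position: 2,
    title: "GitHub - cheeriojs/cheerio: Fast, flexible, and lean ...",
    link: "https://github.com/cheeriojs/cheerio",
  },
];

// Hypothetical helper: keeps only the fields we care about and
// tolerates a missing array or a missing snippet.
function pickFields(results) {
  return (results || []).map(({ position, title, link, snippet }) => ({
    position,
    title,
    link,
    snippet: snippet ?? null,
  }));
}

const rows = pickFields(organicResults);
for (const row of rows) {
  console.log(`${row.position}. ${row.title} -> ${row.link}`);
}
```

No selectors are involved: every result is already an object with named fields, so the loop only decides what to keep.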

First, we need to install the "google-search-results-nodejs" package. To do this, enter:
npm i google-search-results-nodejs

Code

const SerpApi = require("google-search-results-nodejs");
// To get the key, register on serpapi.com
const search = new SerpApi.GoogleSearch("YOUR_SECRET_KEY");

const params = {
  engine: "bing",
  q: "cheerio js",
};

const callback = function (data) {
  console.log("SerpApi results:");
  console.log(data.organic_results);
};

search.json(params, callback);

Output

[
  {
    position: 1,
    title: 'cheerio - js',
    link: 'https://cheerio.js.org/',
    displayed_link: 'https://cheerio.js.org',
    snippet: 'Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure. It does not interpret the result as a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript. This makes Cheerio …',
    sitelinks: { expanded: [Array] }
  },
  {
    position: 2,
    title: 'GitHub - cheeriojs/cheerio: Fast, flexible, and lean ...',
    link: 'https://github.com/cheeriojs/cheerio',
    displayed_link: 'https://github.com/cheeriojs/cheerio',
    snippet: 'Jul 28, 2017 · Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure. It does not interpret the result as a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript. This makes Cheerio much, much faster than other solutions.'
  },
...
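The SerpApi example above receives its data through a callback. If you prefer `async`/`await`, one possible approach is to wrap the callback-style `json(params, callback)` call in a Promise. This is only a sketch: the helper `searchJson` is hypothetical, error handling is left out, and a stand-in object replaces the real `SerpApi.GoogleSearch` instance so the snippet runs without an API key or a network call.

```javascript
// Hypothetical helper (not part of the SerpApi package): wraps a
// callback-style `json(params, callback)` method in a Promise.
// Assumption: the callback is invoked once, with the parsed response.
function searchJson(search, params) {
  return new Promise((resolve) => {
    search.json(params, (data) => resolve(data));
  });
}

// Stand-in for `new SerpApi.GoogleSearch("YOUR_SECRET_KEY")`.
// It answers immediately with a tiny made-up response.
const fakeSearch = {
  json(params, callback) {
    callback({ organic_results: [{ position: 1, title: "cheerio - js" }] });
  },
};

async function main() {
  const data = await searchJson(fakeSearch, { engine: "bing", q: "cheerio js" });
  console.log("SerpApi results:");
  console.log(data.organic_results);
  return data;
}

main();
```

With the real package, `fakeSearch` would be replaced by the `search` object created in the example above.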

Links

Code in the online IDE
SerpApi Playground

Outro

If you'd like to see how to scrape something with Node.js that I haven't written about yet, or you'd like to see a project made with SerpApi, please send me a message.
