Intro
In this post, I want to explain how to scrape Microsoft Bing search results with Node.js. I will show you several ways to do this.
Preparation
First, we need to create a Node.js project and add npm package "Puppeeteer". To do this, in the directory with our project, open the command line and enter:
npm init -y
then:
npm i puppeteer
What will be scraped
Process
SelectorGadget Chrome extension was used to grab CSS selectors.
The Gif below illustrates the approach of selecting different parts of the organic results.
Code
const puppeteer = require("puppeteer");
const searchString = "cheerio js";
const encodedString = encodeURI(searchString);
async function getOrganicResults() {
const browser = await puppeteer.launch({
headless: false,
args: ["--no-sandbox", "--disable-setuid-sandbox"],
});
const page = await browser.newPage();
await page.setDefaultNavigationTimeout(60000);
await page.goto(
`https://bing.com/search?q=${encodedString}&setmkt=en-WW&setlang=en`
);
await page.waitForSelector(".b_pag");
const numberOfResults = await page.$$("#b_results > li");
for (let i = 1; i <= numberOfResults.length; i++) {
await page.hover(`#b_results > li:nth-child(${i})`);
await page.waitForTimeout(1000);
}
await page.hover(".b_pag");
const result = await page.evaluate(function () {
return Array.from(document.querySelectorAll("li.b_algo")).map((el) => ({
link: el.querySelector("h2 > a").getAttribute("href"),
title: el.querySelector("h2 > a").innerText,
snippet: el.querySelector("p, .b_mText div").innerText,
}));
});
await browser.close();
console.log(result);
return result;
}
getOrganicResults();
Output
[
{
link: 'https://cheerio.js.org/',
title: 'cheerio - js',
snippet: 'Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API. ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient.'
},
{
link: 'https://github.com/cheeriojs/cheerio',
title: 'GitHub - cheeriojs/cheerio: Fast, flexible, and lean ...',
snippet: "28/07/2017 · Cheerio's selector implementation is nearly identical to jQuery's, so the API is very similar. $( selector, [context], [root] ) selector searches within the context scope which searches within the root scope. selector and context can be a string expression, DOM Element, array of DOM elements, or cheerio object. root is typically the HTML document string."
},
...
Using Bing Search Organic Results API
SerpApi is a free API with 100 search per month. If you need more searches, there are paid plans.
The difference is that all that needs to be done is just to iterate over a ready made, structured JSON instead of coding everything from scratch, and selecting correct selectors which could be time consuming at times.
First we need to install "google-search-results-nodejs". To do this you need to enter:
npm i google-search-results-nodejs
Code
const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch("YOUR_SECRET_KEY");//To get the key, register on serpapi.com
const params = {
engine: "bing",
q: "cheerio js",
};
const callback = function (data) {
console.log("SerpApi results:");
console.log(data.organic_results);
};
search.json(params, callback);
Output
[
{
position: 1,
title: 'cheerio - js',
link: 'https://cheerio.js.org/',
displayed_link: 'https://cheerio.js.org',
snippet: 'Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure. It does not interpret the result as a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript. This makes Cheerio …',
sitelinks: { expanded: [Array] }
},
{
position: 2,
title: 'GitHub - cheeriojs/cheerio: Fast, flexible, and lean ...',
link: 'https://github.com/cheeriojs/cheerio',
displayed_link: 'https://github.com/cheeriojs/cheerio',
snippet: 'Jul 28, 2017 · Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure. It does not interpret the result as a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript. This makes Cheerio much, much faster than other solutions.'
},
...
Links
Code in the online IDE • SerpApi Playground
Outro
If you want to see how to scrape something using Node.js that I didn't write about yet or you want to see some project made with SerpApi, please write me a message.
Top comments (0)