
Chetan


How to scrape g2 using Python, Selenium and Bose Framework


Introduction

In this article, you will learn how to scrape g2.com using Bose Framework.

Scraping g2.com is also an excellent way to do competitor analysis.

Bose Framework is a Selenium-based bot development framework that provides a comprehensive set of tools aimed at making bot development easy for developers.

To make this easy, I have prepared a script that scrapes g2.com. This article walks you through the steps of using it.

Installation

  1. Clone Starter Template
git clone https://github.com/omkarcloud/g2-scraper
cd g2-scraper
  2. Install dependencies
python -m pip install -r requirements.txt

Usage

  • In extract_product_links.py, specify your Task.product_url
  • Run the project
python main.py

The script will start running and print progress updates to the console. When the scraper finishes, it generates a JSON file named pending.json in the output directory. This file contains the product links.
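To check what the first run produced, a minimal sketch along these lines could load the file. It assumes the file sits at output/pending.json relative to the project root and only that it is valid JSON; the internal layout is not documented here, so the sketch does not rely on it. The helper name `load_pending` is hypothetical and not part of the script.

```python
import json
from pathlib import Path


def load_pending(path="output/pending.json"):
    """Return the parsed contents of the pending file, or None if it is missing.

    The default path is an assumption based on the output directory
    mentioned in this article.
    """
    file = Path(path)
    if not file.exists():
        return None
    with file.open(encoding="utf-8") as fh:
        return json.load(fh)


def count_entries(data):
    """Count top-level entries for a JSON array or object; anything else counts as 1."""
    if isinstance(data, (list, dict)):
        return len(data)
    return 1
```

Calling `count_entries(load_pending())` after the first run gives a quick sanity check that links were collected before you move on to the second step.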

If Cloudflare detects the bot, the script will recognize this and prompt you to press the "Enter" key in the console once you have solved the Cloudflare captcha.
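The pause-and-resume behaviour described above can be sketched roughly as follows. This is not the script's actual implementation: the function names are hypothetical, and the page titles used to spot a challenge are an assumption about what Cloudflare typically shows. The only Selenium feature it relies on is the driver's `title` attribute.

```python
# Assumed challenge-page titles; adjust if Cloudflare shows something else.
CHALLENGE_TITLE_MARKERS = ("just a moment", "attention required")


def looks_like_cloudflare_challenge(title):
    """Heuristic: does the page title resemble a Cloudflare challenge page?"""
    lowered = (title or "").lower()
    return any(marker in lowered for marker in CHALLENGE_TITLE_MARKERS)


def wait_for_manual_solve(driver, prompt=input):
    """Block until the challenge page is gone.

    `driver` is any object exposing a `title` attribute, such as a
    Selenium WebDriver. `prompt` defaults to input(), so the user presses
    Enter after solving the captcha in the browser window.
    Returns True if a pause was needed, False otherwise.
    """
    paused = False
    while looks_like_cloudflare_challenge(driver.title):
        paused = True
        prompt("Cloudflare challenge detected. Solve it, then press Enter: ")
    return paused
```

Passing `prompt` as a parameter keeps the function easy to test without a real console, and the loop re-checks the title in case Enter was pressed before the captcha was actually solved.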

Additionally, you don't have to configure the Selenium driver, as the script automatically downloads the appropriate driver for your Chrome browser version.

  • In main.py, change the task variable to src.extract_product_links
  • Rerun the project
python main.py
  • After scraping, the products will be extracted and stored in the output/finished.csv and output/finished.json files.
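Once the second run has finished, something like the sketch below could be used to inspect the CSV output. The column names are not listed in this article, so the code discovers them from the header row instead of assuming any. The helper name `read_finished_rows` is hypothetical, and the default path is taken from the file name above.

```python
import csv


def read_finished_rows(path="output/finished.csv"):
    """Return (column_names, rows) from the results CSV.

    Each row is a dict keyed by whatever column names the header contains.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows
```

Calling `read_finished_rows()` from the project root returns the header and every product row, which is usually enough to decide which columns matter for your competitor analysis.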
