DEV Community

Decentraliseur

My First Repo: Scraping API + Node.js SDK with Captcha Bypass

Hi devs,

I've just created my very first GitHub repo, and I'd like to get your opinion on it.

The project is a Node.js SDK for web scraping that connects to an API I maintain and aims to be very easy to use. (I plan to port the SDK to Python in the future.)

Ease of use is one of the two key points of the project, along with pricing.

I've used many web scraping APIs in the past, and I found them costly, with a terrible developer experience.

ScrapingAPI is meant to simplify the whole process of extracting data from the web: captcha bypass, requests, and clean data extraction.

The API isn't live yet, but I'd be glad if you could give me some feedback on what could be added, improved, or removed from the README.

Here is the GitHub repo: https://github.com/scrapingapi/scraper

Many thanks in advance

Top comments (1)

Decentraliseur

I've just updated it with simple examples of scraping Google search results and Amazon product info.

Google search for bitcoin: get current price and results

import Scraper, { $ } from 'scrapingapi';
const page = new Scraper('API_KEY');

// Scrape Google search results for "bitcoin"
page.get("https://www.google.com/search?q=bitcoin", { device: "desktop" }, {
    // Extract the current bitcoin price                  
    price: $("#search .obcontainer .card-section > div:eq(1)").filter("price"),
    // For each Google search result
    results: $("h2:contains('Web results') + div").each({
        // We retrieve the URL
        url: $("a[href]").attr("href").filter("url"),
        // ... And the title text
        title: $("h3")
    })
}).then( data => {

    console.log("Here are the results:", data);

});

You get this response:

{
    "url": "https://www.google.com/search?q=bitcoin",
    "status": 200,
    "time": 2.930,
    "data": {
        "price": {
            "amount": 49805.02,
            "currency": "EUR"
        },
        "results": [{
            "url": "https://bitcoin.org/",
            "title": "Bitcoin - Open source P2P money"
        }, {
            "url": "https://coinmarketcap.com/currencies/bitcoin/",
            "title": "Bitcoin price today, BTC to USD live, marketcap and chart"
        }, {
            "url": "https://www.bitcoin.com/",
            "title": "Bitcoin.com | Buy BTC, ETH & BCH | Wallet, news, markets ..."
        }, {
            "url": "https://en.wikipedia.org/wiki/Bitcoin",
            "title": "Bitcoin - Wikipedia"
        }]
    }
}
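
As a rough sketch of how a response shaped like the one above might be consumed, the snippet below walks `data.results` and prints one line per result. It works on a hard-coded, trimmed copy of the sample response instead of a live call, and `formatResults` is a hypothetical helper of mine, not part of the SDK.

```javascript
// Hard-coded copy of the sample response shown above (trimmed to two results).
const response = {
    url: "https://www.google.com/search?q=bitcoin",
    status: 200,
    time: 2.930,
    data: {
        price: { amount: 49805.02, currency: "EUR" },
        results: [
            { url: "https://bitcoin.org/", title: "Bitcoin - Open source P2P money" },
            { url: "https://en.wikipedia.org/wiki/Bitcoin", title: "Bitcoin - Wikipedia" }
        ]
    }
};

// Hypothetical helper (not part of the SDK): builds one "n. title <url>" line per result.
function formatResults(res) {
    // Treat anything other than 200 as a failure (an assumption about the API).
    if (res.status !== 200) {
        throw new Error(`Unexpected status ${res.status} for ${res.url}`);
    }
    return res.data.results.map((r, i) => `${i + 1}. ${r.title} <${r.url}>`);
}

const { amount, currency } = response.data.price;
console.log(`Bitcoin price: ${amount} ${currency}`);
console.log(formatResults(response).join("\n"));
```

Checking `status` before touching `data` is only a guess at how failures would be reported, since the API isn't live yet.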

Amazon product page

(I didn't know which product to pick, so I took a random but cool T-shirt.)

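// Reuses `page` and `$` from the Google example above
page.get("https://www.amazon.com/dp/B08L76BSZ5", { device: 'mobile', withHeaders: true }, {

    // Product title
    title: $("#title"),
    // First displayed price, as text
    price: $("#corePrice_feature_div .a-offscreen:first"),
    // URL of the main product image
    image: $("#main-image").attr("src"),
    reviews: {
        // Average rating text
        rating: $(".cr-widget-Acr [data-hook='average-stars-rating-text']")
    }

});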

Response:

{
    "url": "https://www.amazon.com/dp/B08L76BSZ5",
    "status": 200,
    "time": 5.329,
    "data": {
        "title": "sportbull Unisex 3D Printed Graphics Novelty Casual Short Sleeve T-Shirts Tees",
        "price": "$9.99",
        "image": "https://m.media-amazon.com/images/I/71c3pFtZywL._AC_AC_SY350_QL65_.jpg",
        "reviews": {
            "rating": "4.4 out of 5"
        }
    }
}
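
In this example, `price` and `rating` come back as display strings, unlike the Google example where `filter("price")` returned an amount and a currency. If numbers are needed, they can be parsed on the client side. Here is a minimal sketch, assuming US-style formatting like the sample values; `parsePrice` and `parseRating` are hypothetical helpers, not part of the SDK.

```javascript
// Hypothetical helpers (not part of the SDK) that turn the display strings
// from the sample Amazon response into numbers.

// "$9.99" -> 9.99. Thousands separators (commas) are dropped first.
// Returns null when the text contains no number.
function parsePrice(text) {
    const match = text.replace(/,/g, "").match(/\d+(?:\.\d+)?/);
    return match ? Number(match[0]) : null;
}

// "4.4 out of 5" -> { value: 4.4, max: 5 }.
// Returns null when the text does not follow that pattern.
function parseRating(text) {
    const match = text.match(/^(\d+(?:\.\d+)?) out of (\d+)/);
    return match ? { value: Number(match[1]), max: Number(match[2]) } : null;
}

console.log(parsePrice("$9.99"));
console.log(parseRating("4.4 out of 5"));
```

Returning `null` instead of throwing keeps one badly formatted field from failing a whole batch of scraped pages, but that is a design choice, not something the SDK prescribes.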

Next steps

I'm currently writing a price comparison API, so that I (and soon you) can test the scraping API in real-world usage.
I hope the test version will be live this weekend. It will be totally free, of course: I don't want to charge for an unstable service.

Peace