Decentraliseur

Posted on Mar 27, 2022 • Edited on Mar 31, 2022

My First Repo: Scraping API + Node.js SDK with Captcha Bypass

#showdev #typescript #github #node

Hi devs,

I've just created my very first GitHub repo, and I'd like your opinion on it.

The project is a Node.js SDK for web scraping that connects to an API I maintain and aims to be extremely easy to use. (I plan to port this SDK to Python in the future.)

Ease of use is one of the project's key points, along with pricing.

I've used many web scraping APIs in the past and found them costly, with a terrible developer experience.

ScrapingAPI aims to simplify the whole process of extracting data from the web: captcha bypass, requests, and clean data extraction.
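Since the SDK is a client connected to an API, one way to picture that split is a thin wrapper that only forwards the target URL, the request options and the extraction schema to the API. The sketch below is hypothetical: the `ScraperClient` name, the endpoint URL, the `X-Api-Key` header and the JSON payload shape are assumptions. Only the `get(url, options, schema)` call shape and the `device`/`withHeaders` options are taken from the examples in this post. The HTTP transport is injected so it can be swapped for `fetch` or a test double.

```typescript
// Hypothetical sketch of a thin SDK client that forwards scraping jobs to a
// remote API. Endpoint, header name and payload shape are assumptions.

type Device = "desktop" | "mobile";

interface RequestOptions {
    device?: Device;
    withHeaders?: boolean;
}

// Minimal fetch-like signature, so the HTTP transport can be injected.
type FetchLike = (
    url: string,
    init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class ScraperClient {
    private readonly apiKey: string;
    private readonly fetchFn: FetchLike;
    private readonly endpoint: string;

    // The default endpoint is a placeholder, not a real URL of the project.
    constructor(apiKey: string, fetchFn: FetchLike, endpoint = "https://api.example.com/scrape") {
        this.apiKey = apiKey;
        this.fetchFn = fetchFn;
        this.endpoint = endpoint;
    }

    // Serializes the job: target URL, request options and extraction schema.
    buildPayload(url: string, options: RequestOptions, schema: unknown): string {
        return JSON.stringify({ url, options, schema });
    }

    // Sends the job to the API and returns the parsed JSON response.
    async get(url: string, options: RequestOptions, schema: unknown): Promise<unknown> {
        const response = await this.fetchFn(this.endpoint, {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "X-Api-Key": this.apiKey, // hypothetical auth header
            },
            body: this.buildPayload(url, options, schema),
        });
        if (!response.ok) {
            throw new Error(`Scraping API responded with HTTP ${response.status}`);
        }
        return response.json();
    }
}
```

With Node.js 18 or later, the built-in `fetch` could be passed as the transport, for example `new ScraperClient("API_KEY", (url, init) => fetch(url, init))`.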

The API isn't live yet, but I would be glad to get your feedback on what could be added to, improved in, or removed from the README.

Here is the GitHub repo: https://github.com/scrapingapi/scraper

Many thanks in advance

Top comments (1)

Decentraliseur • Mar 31 '22

I've just updated it with simple examples of scraping Google search results and Amazon product info.

Google search for bitcoin: get the current price and the results

import Scraper, { $ } from 'scrapingapi';
const page = new Scraper('API_KEY');

// Scrape Google search results for "bitcoin"
page.get("https://www.google.com/search?q=bitcoin", { device: "desktop" }, {
    // Extract the current bitcoin price                  
    price: $("#search .obcontainer .card-section > div:eq(1)").filter("price"),
    // For each Google search result
    results: $("h2:contains('Web results') + div").each({
        // We retrieve the URL
        url: $("a[href]").attr("href").filter("url"),
        // ... And the title text
        title: $("h3")
    })
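}).then((data) => {
    console.log("Here are the results:", data);
}).catch((error) => {
    // Handle a rejected request (e.g. network or API errors)
    console.error("Scraping failed:", error);
});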

You get a response like this:

{
    "url": "https://www.google.com/search?q=bitcoin",
    "status": 200,
    "time": 2.930,
    "data": {
        "price": {
            "amount": 49805.02,
            "currency": "EUR"
        },
        "results": [{
            "url": "https://bitcoin.org/",
            "title": "Bitcoin - Open source P2P money"
        }, {
            "url": "https://coinmarketcap.com/currencies/bitcoin/",
            "title": "Bitcoin price today, BTC to USD live, marketcap and chart"
        }, {
            "url": "https://www.bitcoin.com/",
            "title": "Bitcoin.com | Buy BTC, ETH & BCH | Wallet, news, markets ..."
        }, {
            "url": "https://en.wikipedia.org/wiki/Bitcoin",
            "title": "Bitcoin - Wikipedia"
        }]
    }
}
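The `$` chains above read like a declarative description of what to extract rather than code that runs in the page. How the real SDK encodes them isn't shown here; as a hypothetical sketch, a builder could simply record each selector and its chained steps and serialize them to JSON for the API. The `Selector` class and the field names `selector`, `attr`, `filter` and `each` are assumptions, and this `$` is a local stand-in, not the one exported by `scrapingapi`.

```typescript
// Hypothetical stand-in for a `$` helper: it scrapes nothing, it only records
// a selector and its chained steps so the schema can be serialized to JSON.
// The field names (selector, attr, filter, each) are assumptions.

type SchemaInput = { [key: string]: Selector | SchemaInput };

interface SelectorSpec {
    selector: string;
    attr?: string;      // read this attribute instead of the text content
    filter?: string;    // name of a post-processing filter, e.g. "price" or "url"
    each?: SchemaInput; // sub-schema applied to every matched element
}

class Selector {
    private readonly spec: SelectorSpec;

    constructor(selector: string) {
        this.spec = { selector };
    }

    attr(name: string): this {
        this.spec.attr = name;
        return this;
    }

    filter(name: string): this {
        this.spec.filter = name;
        return this;
    }

    each(schema: SchemaInput): this {
        this.spec.each = schema;
        return this;
    }

    // Called automatically by JSON.stringify, including for nested selectors.
    toJSON(): SelectorSpec {
        return this.spec;
    }
}

const $ = (selector: string): Selector => new Selector(selector);

// The Google schema from the example above, serialized as it could be sent to the API.
const googleSchema: SchemaInput = {
    price: $("#search .obcontainer .card-section > div:eq(1)").filter("price"),
    results: $("h2:contains('Web results') + div").each({
        url: $("a[href]").attr("href").filter("url"),
        title: $("h3"),
    }),
};

console.log(JSON.stringify(googleSchema, null, 2));
```

Implementing `toJSON` lets `JSON.stringify` serialize the selectors nested inside `.each(...)` without a separate conversion pass.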

Amazon product page

(I didn't know which product to pick, so I took a random but cool t-shirt.)

page.get("https://www.amazon.com/dp/B08L76BSZ5", { device: 'mobile', withHeaders: true }, {

    title: $("#title"),
    price: $("#corePrice_feature_div .a-offscreen:first"),
    image: $("#main-image").attr("src"),
    reviews: {
        rating: $(".cr-widget-Acr [data-hook='average-stars-rating-text']")
    }

}).then((data) => console.log(data));

Response:

{
    "url": "https://www.amazon.com/dp/B08L76BSZ5",
    "status": 200,
    "time": 5.329,
    "data": {
        "title": "sportbull Unisex 3D Printed Graphics Novelty Casual Short Sleeve T-Shirts Tees",
        "price": "$9.99",
        "image": "https://m.media-amazon.com/images/I/71c3pFtZywL._AC_AC_SY350_QL65_.jpg",
        "reviews": {
            "rating": "4.4 out of 5"
        }
    }
}
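In the Google example, `.filter("price")` returns a structured `{ amount, currency }` object, while the Amazon price, extracted without a filter, comes back as the raw string `"$9.99"`. The SDK's actual filter logic isn't shown in this post; as a hypothetical sketch, a price filter could detect the currency and normalize thousands and decimal separators like this. The `parsePrice` name, the three-symbol table and the assumption that `$` means USD are all illustrative.

```typescript
// Hypothetical sketch of a "price" filter that turns a raw price string into
// the { amount, currency } shape from the Google response. Only three
// symbols are mapped, and "$" is assumed to mean USD.

interface Price {
    amount: number;
    currency: string;
}

const CURRENCY_SYMBOLS: Record<string, string> = { "$": "USD", "€": "EUR", "£": "GBP" };

function parsePrice(raw: string): Price | null {
    // Currency: a known symbol first, otherwise a three-letter code such as "EUR".
    const symbol = Object.keys(CURRENCY_SYMBOLS).find((s) => raw.includes(s));
    const currency = symbol !== undefined ? CURRENCY_SYMBOLS[symbol] : raw.match(/\b[A-Z]{3}\b/)?.[0];
    if (currency === undefined) return null;

    // Keep digits and separators only: "49 805,02 €" -> "49805,02".
    const numeric = raw.replace(/[^\d.,]/g, "");
    if (!/\d/.test(numeric)) return null;

    // A final "." or "," followed by one or two digits is the decimal separator;
    // every other separator is a thousands separator and is dropped.
    const match = numeric.match(/^(.*?)[.,](\d{1,2})$/);
    const integerPart = (match?.[1] ?? numeric).replace(/[.,]/g, "");
    const decimals = match?.[2] ?? "0";
    return { amount: Number(`${integerPart}.${decimals}`), currency };
}

console.log(parsePrice("$9.99")); // { amount: 9.99, currency: 'USD' }
```

Treating only a trailing one- or two-digit group as decimals handles both `1,299.99` and `49 805,02`, but an ambiguous string such as `1,299` is always read as 1299.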

Next steps

I'm currently building a price comparison API so that I (and soon you) can test the scraping API in real-world usage.
I hope the test version will be live this weekend. It will be totally free, of course: I don't want to charge for an unstable service.
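A price comparison service on top of the scraping API would largely come down to comparing structured prices like the `{ amount, currency }` object in the Google response. As a hypothetical sketch (the `Offer` type, `cheapestOffer` function and shop names are made up for illustration), picking the cheapest offer in one currency could look like this; offers in other currencies are skipped rather than converted, since no exchange-rate source is assumed.

```typescript
// Hypothetical sketch: pick the cheapest offer among prices scraped from
// several shops. Offers in another currency are ignored, not converted.

interface Offer {
    shop: string;
    amount: number;
    currency: string;
}

function cheapestOffer(offers: Offer[], currency: string): Offer | undefined {
    return offers
        .filter((offer) => offer.currency === currency)
        .reduce<Offer | undefined>(
            (best, offer) => (best === undefined || offer.amount < best.amount ? offer : best),
            undefined
        );
}

const offers: Offer[] = [
    { shop: "shop-a", amount: 12.5, currency: "USD" },
    { shop: "shop-b", amount: 9.99, currency: "USD" },
    { shop: "shop-c", amount: 8.0, currency: "EUR" },
];

console.log(cheapestOffer(offers, "USD")?.shop); // shop-b
```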

Peace

DEV Community
