<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Scraper.AI</title>
    <description>The latest articles on DEV Community by Scraper.AI (@scraper_ai).</description>
    <link>https://dev.to/scraper_ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F442783%2Ff2096af2-13a9-45a6-939a-5a09ba2380e0.jpg</url>
      <title>DEV Community: Scraper.AI</title>
      <link>https://dev.to/scraper_ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/scraper_ai"/>
    <language>en</language>
    <item>
      <title>Checking the availability of NVIDIA and AMD Graphic Cards and CPUs</title>
      <dc:creator>Scraper.AI</dc:creator>
      <pubDate>Fri, 08 Jan 2021 11:58:17 +0000</pubDate>
      <link>https://dev.to/scraper_ai/checking-the-availability-of-nvidia-and-amd-graphic-cards-and-cpus-5na</link>
      <guid>https://dev.to/scraper_ai/checking-the-availability-of-nvidia-and-amd-graphic-cards-and-cpus-5na</guid>
<description>&lt;p&gt;The new NVIDIA RTX 3070, RTX 3080 and RTX 3090 graphics cards have just been released, but buying one is going to be an adventure of its own! Currently there is almost no stock, and when stock does appear, scalpers who monitor it will grab it before you can and make your life difficult!&lt;/p&gt;

&lt;p&gt;But what if you could play scalper yourself, and only needed 5 minutes to do so? Well, let me show you a way to easily monitor these websites yourself and get notified when the stock changes!&lt;/p&gt;

&lt;p&gt;Let's start off by loading the &lt;a href="https://www.nvidia.com/de-de/shop/geforce/gpu/?page=1&amp;amp;limit=9&amp;amp;locale=de-de&amp;amp;category=GPU&amp;amp;gpu=RTX%203080"&gt;NVIDIA website for the Founders Edition card&lt;/a&gt; (I used the German website).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Vdg8VxIG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Vdg8VxIG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/1.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Scraper.AI we can state which data we want to extract (the price and details) and also set an interval for how frequently to extract it. So open up the extension and select &lt;strong&gt;"Monitor Data"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--StVopDLc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--StVopDLc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/2.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main panel will now open up and you will be able to use &lt;strong&gt;Single&lt;/strong&gt; mode to select non-repeating content such as the price, image and title of the graphics card.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c_Rhzc-g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c_Rhzc-g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/3.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once that is done, just specify the interval at which you wish to extract the data and select &lt;strong&gt;Finish&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OL4BdvYi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OL4BdvYi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/4.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your data is now being extracted and the dashboard will show up with your extracted record!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yVzuXFrU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yVzuXFrU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/5.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the interval specified in the extension (which you can also see in the &lt;strong&gt;Schedule&lt;/strong&gt; section), your data will now be updated. A final thing to do, however, is to make sure that we get notified of changes! So open up the &lt;strong&gt;Notifications&lt;/strong&gt; section and fill in your email.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--oDEI5a8m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--oDEI5a8m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/_posts/monitoring/checking-availability-of-nvidia-rtx-3080-cards/6.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next time the data is updated, you will be notified of any changes by email!&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>startup</category>
      <category>monitoring</category>
      <category>node</category>
    </item>
    <item>
      <title>The 11 best free web scraping tools that can use proxies compared</title>
      <dc:creator>Scraper.AI</dc:creator>
      <pubDate>Fri, 30 Oct 2020 06:35:29 +0000</pubDate>
      <link>https://dev.to/scraper_ai/the-11-best-free-web-scraping-tools-that-can-use-proxies-compared-43bm</link>
      <guid>https://dev.to/scraper_ai/the-11-best-free-web-scraping-tools-that-can-use-proxies-compared-43bm</guid>
      <description>&lt;h1&gt;
  
  
  Scraper.AI
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://scraper.ai"&gt;https://scraper.ai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scraper.AI is a new player on the market offering a wide variety of features, like scraping websites with multiple pages, scrollable pages, authenticated pages and many more. Next to this you're also future-proofed, as they offer an API for extracting pages yourself.&lt;/p&gt;

&lt;p&gt;Not that technical? No problem, with their unique visual extractor you can extract any data you want without programming knowledge!&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Many features&lt;/li&gt;
&lt;li&gt;Intuitive UI&lt;/li&gt;
&lt;li&gt;Easy to learn, no extensive tutorials needed to get started&lt;/li&gt;
&lt;li&gt;Uses many proxies to give consistent results&lt;/li&gt;
&lt;li&gt;Fast&lt;/li&gt;
&lt;li&gt;Free plan available, cheap compared to others&lt;/li&gt;
&lt;li&gt;It's a SaaS, no need to keep your browser open for a long time&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;It's a general-purpose solution, not targeted at a specific niche&lt;/li&gt;
&lt;li&gt;A rather new player on the market&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Octoparse
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://www.octoparse.com"&gt;https://www.octoparse.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A free, simple and powerful web scraping tool. Automate data extraction from websites in a few clicks, without coding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Focuses more on niche scraping&lt;/li&gt;
&lt;li&gt;Fair pricing&lt;/li&gt;
&lt;li&gt;Consistent results&lt;/li&gt;
&lt;li&gt;It's a SaaS, no need to keep your browser open for a long time&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Steep learning curve&lt;/li&gt;
&lt;li&gt;Doesn't offer API scraping&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Scrapy
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://github.com/scrapy/scrapy"&gt;https://github.com/scrapy/scrapy&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Most popular Python library for scraping&lt;/li&gt;
&lt;li&gt;Open-source&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You still need to run your own servers&lt;/li&gt;
&lt;li&gt;Only for scraping&lt;/li&gt;
&lt;li&gt;Still need programmers to implement it&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Puppeteer
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://github.com/puppeteer/puppeteer"&gt;https://github.com/puppeteer/puppeteer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.&lt;/p&gt;
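
&lt;p&gt;As a quick illustration (not taken from this article's original code), here is a minimal sketch of how a scrape through a proxy typically looks with Puppeteer; the proxy address and the &lt;code&gt;.price&lt;/code&gt; selector are placeholders you would replace with your own values.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const puppeteer = require('puppeteer');

async function scrape() {
  const browser = await puppeteer.launch({
    // Placeholder proxy address; replace it with a proxy you actually have access to.
    args: ['--proxy-server=http://my-proxy.example.com:8080'],
  });
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });
  // '.price' is an assumed selector; point it at the element you care about.
  const price = await page.$eval('.price', function (el) { return el.textContent; });
  console.log(price);
  await browser.close();
}

scrape();
&lt;/code&gt;&lt;/pre&gt;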

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Most popular Node.js library for scraping&lt;/li&gt;
&lt;li&gt;Battle tested&lt;/li&gt;
&lt;li&gt;Open-Source&lt;/li&gt;
&lt;li&gt;Reliable&lt;/li&gt;
&lt;li&gt;Direct implementation for proxies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Requires good knowledge of timeouts, scrape processing, ...&lt;/li&gt;
&lt;li&gt;You still need to run your own servers&lt;/li&gt;
&lt;li&gt;Only for scraping&lt;/li&gt;
&lt;li&gt;Still need programmers to implement it&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Playwright
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://github.com/microsoft/playwright"&gt;https://github.com/microsoft/playwright&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Playwright is a Node.js library to automate Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast.&lt;/p&gt;
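
&lt;p&gt;For comparison, a minimal sketch of the same kind of scrape with Playwright; swapping &lt;code&gt;chromium&lt;/code&gt; for &lt;code&gt;firefox&lt;/code&gt; or &lt;code&gt;webkit&lt;/code&gt; targets the other browsers with the same code. The URL and selector are placeholders.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Minimal Playwright sketch: launch Chromium, open a page, read one element.
const { chromium } = require('playwright');

async function scrape() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // 'h1' is an assumed selector; replace it with the element you want.
  const heading = await page.textContent('h1');
  console.log(heading);
  await browser.close();
}

scrape();
&lt;/code&gt;&lt;/pre&gt;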

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Competitor to Puppeteer&lt;/li&gt;
&lt;li&gt;Open-Source&lt;/li&gt;
&lt;li&gt;Reliable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Harder to use than Puppeteer&lt;/li&gt;
&lt;li&gt;Requires a lot of tweaking per browser&lt;/li&gt;
&lt;li&gt;Newer than Puppeteer&lt;/li&gt;
&lt;li&gt;You still need to run your own servers&lt;/li&gt;
&lt;li&gt;Only for scraping&lt;/li&gt;
&lt;li&gt;Still need programmers to implement it&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Cheerio
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://github.com/cheeriojs/cheerio"&gt;https://github.com/cheeriojs/cheerio&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure. It does not interpret the result as a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript. If your use case requires any of this functionality, you should consider projects like PhantomJS or JSDom.&lt;/p&gt;
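
&lt;p&gt;A minimal sketch of how Cheerio is usually combined with a plain HTTP request: fetch the HTML first, then parse and query it. The URL and the &lt;code&gt;.product-title&lt;/code&gt; selector are placeholders.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const https = require('https');
const cheerio = require('cheerio');

// Placeholder URL; replace it with the page you want to scrape.
https.get('https://example.com', function (res) {
  let html = '';
  res.on('data', function (chunk) { html += chunk; });
  res.on('end', function () {
    const $ = cheerio.load(html);
    // '.product-title' is an assumed class on the target page.
    $('.product-title').each(function (i, el) {
      console.log($(el).text());
    });
  });
});
&lt;/code&gt;&lt;/pre&gt;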

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;HTML parser&lt;/li&gt;
&lt;li&gt;Famous open-source Node.js library&lt;/li&gt;
&lt;li&gt;Good functions for extracting data from HTML&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Not really a scraper on its own; you need to fetch or render a page (for example with Puppeteer) and then extract the data&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  BeautifulSoup
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/"&gt;https://www.crummy.com/software/BeautifulSoup/bs4/doc/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;HTML parser&lt;/li&gt;
&lt;li&gt;Famous open-source Python library&lt;/li&gt;
&lt;li&gt;Good functions for extracting data from HTML&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Not really a scraper on its own; you need to fetch or render a page first and then extract the data&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Scraper API
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://www.scraperapi.com/"&gt;https://www.scraperapi.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scraper API handles proxies, browsers, and CAPTCHAs, so you can get the HTML from any web page with a simple API call!&lt;/p&gt;
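
&lt;p&gt;A minimal sketch of that "simple API call", assuming the query-parameter style of endpoint; the API key is a placeholder and the exact parameters may differ from their current docs.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const http = require('http');

// Placeholder API key; the query parameters follow the usual api_key/url style.
const params = new URLSearchParams({
  api_key: 'YOUR_API_KEY',
  url: 'https://example.com',
});

http.get('http://api.scraperapi.com/?' + params.toString(), function (res) {
  let html = '';
  res.on('data', function (chunk) { html += chunk; });
  res.on('end', function () { console.log(html); });
});
&lt;/code&gt;&lt;/pre&gt;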

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reliable results&lt;/li&gt;
&lt;li&gt;Many proxies available&lt;/li&gt;
&lt;li&gt;Good at its single feature: rendering a webpage using its API&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Programming knowledge required&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Selenium
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://www.selenium.dev/"&gt;https://www.selenium.dev/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Selenium automates browsers. That's it! What you do with that power is entirely up to you. Primarily it is for automating web applications for testing purposes, but it is certainly not limited to just that. Boring web-based administration tasks can (and should) be automated as well.&lt;/p&gt;
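
&lt;p&gt;Since Selenium ships bindings for many languages, here is a minimal sketch using the Node.js &lt;code&gt;selenium-webdriver&lt;/code&gt; bindings; the URL and selector are placeholders, and a local Chrome plus matching chromedriver install is assumed.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const { Builder, By } = require('selenium-webdriver');

async function scrape() {
  // Assumes Chrome and a matching chromedriver are installed locally.
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example.com');
    // 'h1' is an assumed selector; replace it with the element you want.
    const heading = await driver.findElement(By.css('h1')).getText();
    console.log(heading);
  } finally {
    await driver.quit();
  }
}

scrape();
&lt;/code&gt;&lt;/pre&gt;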

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Works well&lt;/li&gt;
&lt;li&gt;Battle-proven&lt;/li&gt;
&lt;li&gt;Open-source&lt;/li&gt;
&lt;li&gt;Available for many programming languages&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Older technology&lt;/li&gt;
&lt;li&gt;Can be a pain to set up&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Mozenda
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://www.mozenda.com/"&gt;https://www.mozenda.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A larger web data extraction platform that's often used by enterprise customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Works well&lt;/li&gt;
&lt;li&gt;Battle-proven&lt;/li&gt;
&lt;li&gt;Focuses on enterprises&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Expensive&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Kimura
&lt;/h1&gt;

&lt;p&gt;Website: &lt;a href="https://github.com/vifreefly/kimuraframework"&gt;https://github.com/vifreefly/kimuraframework&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kimurai is a modern web scraping framework written in Ruby which works out of the box with headless Chromium/Firefox, PhantomJS, or simple HTTP requests, and allows you to scrape and interact with JavaScript-rendered websites.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ruby-based (handy if you already use Ruby)&lt;/li&gt;
&lt;li&gt;Open-source&lt;/li&gt;
&lt;li&gt;Well-documented setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Disadvantages
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No longer frequently updated&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>productivity</category>
      <category>showdev</category>
      <category>startup</category>
    </item>
    <item>
      <title>Tweet Cryptocurrency and Bitcoin prices every hour</title>
      <dc:creator>Scraper.AI</dc:creator>
      <pubDate>Thu, 27 Aug 2020 14:17:05 +0000</pubDate>
      <link>https://dev.to/scraper_ai/tweet-cryptocurrency-and-bitcoin-prices-every-hour-1fj9</link>
      <guid>https://dev.to/scraper_ai/tweet-cryptocurrency-and-bitcoin-prices-every-hour-1fj9</guid>
<description>&lt;p&gt;Get started within 30 minutes&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--brrseAJ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/tracking-crypto.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--brrseAJ9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/tracking-crypto.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s always interesting to get an update on the different cryptocurrency prices. Sadly, there are not that many services out there that do this for us out of the box, so let’s create one!&lt;/p&gt;

&lt;h1&gt;
  
  
  Prerequisites
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Scraper.AI account&lt;/li&gt;
&lt;li&gt;AWS Account&lt;/li&gt;
&lt;li&gt;IFTTT account&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Flow
&lt;/h1&gt;

&lt;p&gt;Once the prerequisites are met, we can set up a flow that scrapes the cryptocurrency prices from &lt;a href="https://coinmarketcap.com/"&gt;https://coinmarketcap.com/&lt;/a&gt; every hour and sends a tweet once they are received. To do this, we need to create a flow that looks like this:&lt;/p&gt;

&lt;p&gt;Scraper.AI -&amp;gt; Data Processor -&amp;gt; Twitter&lt;/p&gt;

&lt;p&gt;The easiest way to do this is to have a serverless function that is executed by the Scraper.AI webhook, manipulates the data, and sends it on to Twitter. To send to Twitter, we use the IFTTT service, which has a connector for posting a tweet. In components we thus get something like this:&lt;/p&gt;

&lt;p&gt;Scraper.AI -&amp;gt; AWS Lambda -&amp;gt; IFTTT -&amp;gt; Twitter&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Setting up AWS Lambda
&lt;/h3&gt;

&lt;p&gt;The first thing we want to do is set up our AWS Lambda function. We want a function that can be called through an HTTP endpoint, so we have to add an API Gateway trigger to it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ilPdBH3S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ilPdBH3S--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-22.png" alt="Creating the function"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FK_ReMZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-25.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FK_ReMZL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-25.png" alt="Adding our API Gateway Trigger - Step 1"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1bnmtKHP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1bnmtKHP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-26.png" alt="Adding our API Gateway Trigger - Step 2"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--47W9rWW2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--47W9rWW2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-27.png" alt="Adding our API Gateway Trigger - Step 3"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fMGQVd9t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fMGQVd9t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-28.png" alt="Adding our API Gateway Trigger - Step 4"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1sZgcUS_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-30.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1sZgcUS_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-30.png" alt="Adding our API Gateway Trigger - Step 5"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once we have our boilerplate and trigger configured, we can add a &lt;code&gt;package.json&lt;/code&gt; and &lt;code&gt;index.js&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: you will have to create these files locally, install the NPM modules with npm install, archive everything into a .zip file and upload it to the Lambda portal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UofBZHVq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UofBZHVq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-24.png" alt="Creating our Function - Package.json"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0E2gImLI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0E2gImLI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-31.png" alt="Creating our Function - index.js"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was the hardest part! The rest is just click-click-click 😉 so let’s continue!&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up IFTTT
&lt;/h3&gt;

&lt;p&gt;IFTTT (If This Then That) is an easy service to use. Here we create an applet that is triggered by a webhook URL (which our AWS function calls; we will grab that URL at the end of this section) and posts a tweet to Twitter.&lt;/p&gt;

&lt;p&gt;Let’s configure IFTTT&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Bz-YSC8X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Bz-YSC8X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-9.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3KH1qDws--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3KH1qDws--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-10.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Qv31LSi1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Qv31LSi1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-11.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EZ8Vu304--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EZ8Vu304--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-12.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eC7um57C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-13.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eC7um57C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-13.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xUbM5kRQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xUbM5kRQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-14.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--h3f278Zw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--h3f278Zw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-15.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wmwDLG11--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wmwDLG11--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-16.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1SS7J-ab--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1SS7J-ab--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-33.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ABe6fX6Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ABe6fX6Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-18.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that Twitter has been connected, the last thing we need to do is get the link for our webhook.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: you will have to enter this URL in the AWS Lambda function!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WgIjrQW2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WgIjrQW2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-19.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tfHnEwX_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tfHnEwX_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-20.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H36FJHdI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H36FJHdI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-21.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up Scraper.AI
&lt;/h3&gt;

&lt;p&gt;Last but not least, we need to monitor &lt;a href="https://coinmarketcap.com/"&gt;https://coinmarketcap.com/&lt;/a&gt; for changes. Luckily we can use &lt;a href="https://scraper.ai"&gt;https://scraper.ai&lt;/a&gt; here!&lt;br&gt;
Navigate to &lt;a href="https://coinmarketcap.com/"&gt;https://coinmarketcap.com/&lt;/a&gt; and select the name, price and volume properties&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8Vd_zaVb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8Vd_zaVb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-34.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then continue until the website is scraped and displayed in Scraper.AI.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: configure the scheduling to run hourly! This ensures that you tweet the newest prices every time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hu-Pc8FA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hu-Pc8FA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-35.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, go to Notifications and enter the AWS Lambda endpoint you copied earlier into the “Webhook” field.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sW3833VT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sW3833VT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/08/image-36.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Congratulations! Tweets should now start appearing in your Twitter account.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: to test this easily, you can “Scrape Manually” on the dashboard!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An example of this can be seen below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jqy0bCJg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/2020/08/image-32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jqy0bCJg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/2020/08/image-32.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>aws</category>
      <category>serverless</category>
      <category>node</category>
    </item>
    <item>
      <title>11 usages for web scraping</title>
      <dc:creator>Scraper.AI</dc:creator>
      <pubDate>Wed, 29 Jul 2020 05:58:26 +0000</pubDate>
      <link>https://dev.to/scraper_ai/11-usages-for-web-scraping-1f0j</link>
      <guid>https://dev.to/scraper_ai/11-usages-for-web-scraping-1f0j</guid>
      <description>&lt;p&gt;and why you should start scraping the web now&lt;/p&gt;

&lt;p&gt;Many people ask themselves how to improve in the areas covered below. In a lot of cases web scraping plays a major role in that, and it's actually rather easy to get started.&lt;/p&gt;

&lt;p&gt;With web scraping you can extract data from any website, and as some may say, "data is the new gold". There is a wealth of important data to be gathered from websites on which you can base great business decisions.&lt;/p&gt;

&lt;p&gt;Below are our 11 uses for web scraping and why you should start scraping now.&lt;/p&gt;

&lt;h1&gt;
  
  
  Build your new product
&lt;/h1&gt;

&lt;p&gt;No-code tools are on the rise. They allow you to create certain flows, analyze data, create stunning websites and more without any knowledge of coding. You don't have to be a technical founder anymore to start a business. But what many people don't know about web scraping is that you can leverage it to feed data into your new application. For example, want to create &lt;a href="https://blog.scraper.ai/build-your-first-crypto-app-using-amazon-honeycode/"&gt;a mobile application that shows the latest cryptocurrency prices&lt;/a&gt;? No problem, it only takes an hour.&lt;/p&gt;

&lt;h1&gt;
  
  
  SEO
&lt;/h1&gt;

&lt;p&gt;Organic growth is, next to direct traffic, the biggest source of traffic coming to your website. Sadly it's also the hardest one to optimize for. It requires a lot of persistence, monitoring and analyzing to get ranked among the top. Even when you're at the top you still have to optimize for click-throughs on your links. Tough business! Luckily this is where web scraping is very powerful. It can help decrease your workload and automate some tasks for you. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Help you &lt;a href="https://blog.scraper.ai/build-your-first-crypto-app-using-amazon-honeycode/"&gt;track the ranking of your website for your specific keywords&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Identify and monitor keywords of competitors by getting a daily list&lt;/li&gt;
&lt;li&gt;Analyze the top-ranking keywords and alternatives for you, and a lot more&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Dataset creation
&lt;/h1&gt;

&lt;p&gt;One of the big tasks of a data scientist, computer vision specialist and others that need data is the creation of a trustworthy and well-composed dataset. Web scraping can take any website listing and compose it into a usable data stream that you can build on top of. If you have a great tool like Scraper.AI you can also &lt;a href="https://blog.scraper.ai/how-to-create-a-hourly-updating-covid-19-dataset-under-5-minutes/"&gt;monitor that data&lt;/a&gt; and keep your dataset up-to-date in a world of fast-changing data.&lt;/p&gt;

&lt;h1&gt;
  
  
  Competitor tracking
&lt;/h1&gt;

&lt;p&gt;Knowing the competition is one of the most valuable practices there is. It helps you connect with your audience, but also keeps you on top of your sector. Having a web scraper and being able to extract the prices from a competitor's webshop ensures that you can be the cheapest out there.&lt;/p&gt;

&lt;p&gt;Early-stage startups can, for example, monitor newcomers in their space and see what approach they're taking. It might mean that they've found product-market fit.&lt;/p&gt;

&lt;h1&gt;
  
  
  Discovery
&lt;/h1&gt;

&lt;p&gt;Where it all starts. In the list below, "starts with" is replaced with a fancy arrow (➜):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starting a business ➜ Discovering an idea&lt;/li&gt;
&lt;li&gt;Getting started with SEO ➜ Discovering keywords&lt;/li&gt;
&lt;li&gt;Knowing your competition ➜ Discovering your competitors&lt;/li&gt;
&lt;li&gt;Investing ➜ Discovering what to invest in&lt;/li&gt;
&lt;li&gt;Buying a property ➜ Discovering a property&lt;/li&gt;
&lt;li&gt;Making money on stocks ➜ Discovering what stocks to buy&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This list could go on for a long while; an action almost always starts with the discovery of something. Discovering means getting the data that justifies the decision you're going to take to get to that action.&lt;/p&gt;

&lt;p&gt;You might want to &lt;a href="https://www.youtube.com/watch?v=a6YWbFoaruI"&gt;watch stock websites&lt;/a&gt; to learn about price changes in your favourite stock.&lt;/p&gt;

&lt;p&gt;Web scraping can get you these latest stocks, properties, and much more.&lt;/p&gt;

&lt;h1&gt;
  
  
  Product monitoring
&lt;/h1&gt;

&lt;p&gt;You're a vendor, dropshipper, Amazon seller or anyone else that sells a product. A major part of the job is making sure that your product fulfills certain demands. You want to watch for reviews, correct pricing, advertising, ... it's a lot of work.&lt;/p&gt;

&lt;p&gt;Scraping reviews ensures that you can sustain your high rating, act on low ratings and make sure that they get resolved correctly. Watching competitors' prices, analyzing your advertising metrics in one dashboard: these can all be automated with web scraping. In essence, you're building your own personalized software product.&lt;/p&gt;

&lt;h1&gt;
  
  
  Marketing automation
&lt;/h1&gt;

&lt;p&gt;Finding potential influencers becomes a lot easier: you can go to Instagram, Facebook, Quora, ... and get a list of comments or profiles with the most views, likes or watches. The only thing you have to do is open up the scraper, select the names and you're ready to go. Most websites don't show emails anymore for privacy reasons, but you can private message the users or use other tools to find their emails.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aL5EDOx---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aL5EDOx---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-26.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Lead generation
&lt;/h1&gt;

&lt;p&gt;A great way to get interesting prospects for your business is generating leads. As soon as you have leads you can set up a sales pipeline and convert these leads into customers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting the leads
&lt;/h2&gt;

&lt;p&gt;But getting these leads is not always straightforward. You first need to find your audience. Luckily there are already great tools out there like LinkedIn search, the Yellow Pages, Google Maps, AngelList, Product Hunt, ... They all have one thing in common: they show you a list of leads. Great! Because this means we can extract the data from that list and convert it into something usable for us, an Excel file, a CSV, ... anything we need to get these leads converted and expand our business.&lt;/p&gt;

&lt;h1&gt;
  
  
  Investment optimization
&lt;/h1&gt;

&lt;p&gt;As a venture capital firm you might be interested in startups that were backed by other VCs. You could just go to their pages and spend an hour or so per day compiling that list, or you can scrape and monitor them: extract the data every day and get notified when a new startup gets added.&lt;/p&gt;

&lt;p&gt;By also combining this with dataset creation you can dig into the history of a startup and minimize the risk you take when investing in it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NcnzTK6U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/2020/07/image-25.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NcnzTK6U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/2020/07/image-25.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Finance
&lt;/h1&gt;

&lt;p&gt;Stocks, crypto, personal finances and more are things everyone has touched at least once. But they involve a lot of manual labour.&lt;/p&gt;

&lt;p&gt;Cryptocurrency or stock investors might find themselves looking at listing sites hourly to get the latest prices and volumes. Monitoring these can come in handy, and web scrapers can get you this information in no time, leaving you with more time to do the analysis. Some services also offer direct API endpoints, making integration as easy as possible.&lt;/p&gt;

&lt;p&gt;Your personal finances are important; you can scrape them and get notified when a bill comes through, a salary gets deposited and more. Become aware of any event as it happens.&lt;/p&gt;

&lt;h1&gt;
  
  
  Real estate monitoring
&lt;/h1&gt;

&lt;p&gt;Buying a property can be painful: there are plenty of listing sites, aggregators, agents and more, and they all have their own schedules. And when a property gets listed, it's a race to get there as the first potential buyer.&lt;/p&gt;

&lt;p&gt;Automate this process and let a scraper run every hour to get a list of new additions or updates, making sure that you're among the first to get there. Among the first, I say, because there are others doing exactly the same!&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;Web scraping is not new and has proven to be valuable. It can automate time-consuming tasks and leave you with more time to focus on the exact problem you're trying to solve, whether that is analyzing data or buying a property. Extracting data from the web has never been this easy and should be a must-do to get the most out of your business.&lt;/p&gt;

&lt;h1&gt;
  
  
  Resources
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.scraper.ai/track-your-seo-rankings-with-these-simple-tricks/"&gt;Track your SEO rankings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://neilpatel.com/blog/automate-seo-with-content-scrapers/"&gt;Automate SEO with content scrapers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.scraper.ai/how-to-create-a-hourly-updating-covid-19-dataset-under-5-minutes/"&gt;Create an updating dataset under 5 minutes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.scraper.ai/build-your-first-crypto-app-using-amazon-honeycode/"&gt;Make a crypto app using Amazon HoneyCode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Build your first crypto app using Amazon HoneyCode</title>
      <dc:creator>Scraper.AI</dc:creator>
      <pubDate>Wed, 29 Jul 2020 05:53:23 +0000</pubDate>
      <link>https://dev.to/scraper_ai/build-your-first-crypto-app-using-amazon-honeycode-4d4o</link>
      <guid>https://dev.to/scraper_ai/build-your-first-crypto-app-using-amazon-honeycode-4d4o</guid>
<description>&lt;p&gt;Amazon recently announced their new project called Amazon Honeycode, a no-code tool focused on creating web &amp;amp; mobile applications.&lt;/p&gt;

&lt;p&gt;The product takes a data-first perspective: you create a dataset and then visualize it. After importing data, you can manipulate it, create new aggregating columns, rename columns and much more.&lt;/p&gt;

&lt;p&gt;Now, dataset creation just happens to be our strength over at scraper.ai. In this gentle introduction to Amazon HoneyCode we show how to get the data from a crypto website. Afterwards we’ll add the data to the Amazon HoneyCode platform. In the end we’ll have a fully functional web app, ready to be published, showing the latest crypto prices.&lt;/p&gt;

&lt;h1&gt;
  
  
  Get started
&lt;/h1&gt;

&lt;p&gt;To get started, head over to a crypto website, in this example we’ll use &lt;a href="https://coinmarketcap.com"&gt;https://coinmarketcap.com&lt;/a&gt; to get the latest crypto prices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tFxplWNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tFxplWNt--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-18.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s get the data going by opening up the &lt;a href="https://scraper.ai"&gt;https://scraper.ai&lt;/a&gt; extension. After clicking “Select Element” we select the data we’re interested in, in this case we’ll use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coin name (Bitcoin, Ethereum, …)&lt;/li&gt;
&lt;li&gt;Market cap&lt;/li&gt;
&lt;li&gt;Coin Price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;which are also the labels we’re going to give to the fields.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GSFV9Kgp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GSFV9Kgp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-19.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After clicking next twice we’re shown the data we’ve just extracted. As you can see, there is also an API url which we’ll use in a following story.&lt;/p&gt;

&lt;p&gt;For now we’ll keep it simple and “Download as CSV”, so let’s hit that button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N1ONAH0y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N1ONAH0y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-20.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we have all the data required, we can head over to Amazon HoneyCode. Creating an account for their service only takes a minute. Afterwards we’re shown the screen below.&lt;/p&gt;

&lt;p&gt;Take the following steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create Workbook&lt;/li&gt;
&lt;li&gt;Import CSV&lt;/li&gt;
&lt;li&gt;Select the CSV we’ve exported in the step above&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c--xUJGq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c--xUJGq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-21.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s it for the data: it is now imported into their data view and we’re ready to use it for our application.&lt;/p&gt;

&lt;p&gt;Let’s create the application by going to the “Apps” panel in the sidebar and clicking the Plus icon. We’ll choose “Use app wizard”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yG-Nz6_Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yG-Nz6_Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-22.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we select our newly created table “Table1” as the source, all fields will be pre-filled. I suggest using a more sensible name than “Table1”, but to keep this guide easy to follow we’ve gone with the default “Table1” name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--J3BcjH3f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--J3BcjH3f--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-23.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After going through the “App wizard”, the app is opened up immediately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KP67zVEP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KP67zVEP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://blog.scraper.ai/content/images/size/w1000/2020/07/image-24.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To view the app on our mobile devices we have to use the Amazon HoneyCode app viewer (&lt;a href="https://play.google.com/store/apps/details?id=com.amazon.aws.honeycode"&gt;https://play.google.com/store/apps/details?id=com.amazon.aws.honeycode&lt;/a&gt;).&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;We’ve learned how easy it is to import data into Amazon HoneyCode and get an app up and running within minutes. For now the Amazon HoneyCode platform is rather limited and only serves apps through their own viewer app rather than as native apps. This narrows the scope more to enterprise applications or companies with a good distribution channel.&lt;/p&gt;

&lt;p&gt;Amazon HoneyCode has some integrations set up to import data more easily, but for external services it’s quite cumbersome, and I hope they make the process easier. For now it’s limited to accepted partners such as “Google Analytics”, “Marketo” and more.&lt;/p&gt;

&lt;p&gt;In a following guide we’ll show how we can use other AWS services to import data into Amazon HoneyCode on a regular basis. Sadly it becomes rather technical due to the lack of 3rd party integrations available with Amazon HoneyCode.&lt;/p&gt;

</description>
      <category>nocode</category>
      <category>productivity</category>
      <category>showdev</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
