
How to Scrape a Static Website

By Robert Chen. Originally published at Medium ・2 min read

A really quick tutorial

Prerequisites: A working knowledge of React.js is required for this tutorial.

Let’s say you want to pull data from the frontend of a website because there’s no API available. You inspect the page and see that the data is available in the HTML, so how do you gather that information for use in your app? It’s rather simple: we’re going to install two libraries and write fewer than 50 lines of code to demonstrate scraping a website. To keep this tutorial simple, we’ll use https://pokedex.org/ as our example.

1) In terminal:

create-react-app scraping-demo
cd scraping-demo
npm i request-promise
npm i cheerio

2) We’re going to start by using request-promise to fetch the HTML from https://pokedex.org/ and log it to the console.

In App.js:
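
A minimal sketch of this step, not necessarily the code from the original post. It assumes request-promise is imported as `rp` and that calling `rp(url)` resolves with the response body as a string. `logHtml` is a hypothetical helper name, and the fetcher is passed in as a parameter so the function can be tried out without a network call.

```javascript
// Hypothetical helper (not from the original post).
// `fetcher` is any function that takes a URL and returns a promise of the
// response body, for example request-promise's `rp`.
function logHtml(url, fetcher, log = console.log) {
  return fetcher(url)
    .then(html => {
      log(html); // the raw HTML string of the page
      return html;
    })
    .catch(err => {
      log("Request failed: " + err.message);
      return null;
    });
}

// Assumed wiring inside App.js:
//   import rp from "request-promise";
//
//   componentDidMount() {
//     logHtml("https://pokedex.org/", rp);
//   }
```

Calling it from `componentDidMount` means the request fires once, after the component first renders.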

3) Sometimes a CORS error will block your request. To see one for yourself, try fetching pokemon.com:

rp("https://www.pokemon.com/us/pokedex/")

You should see an error like this in the console:

[Image: CORS error in the console]

4) You can get around CORS by using https://cors-anywhere.herokuapp.com. Simply prepend that URL to the URL you want to fetch, like so:

rp("https://cors-anywhere.herokuapp.com/https://www.pokemon.com/us/pokedex/")

Now you should be able to see the HTML from pokemon.com show in your console.
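
If you fetch more than one URL this way, the prefixing can be wrapped in a small function. This is only a sketch: `withCorsProxy` is a hypothetical name, and the default base is the proxy URL used above. Public proxy instances can be rate-limited or restricted, so the base is a parameter that could point at a self-hosted proxy instead.

```javascript
// Hypothetical helper: prefixes a target URL with a CORS proxy base URL.
// The default base is the proxy named in this tutorial.
function withCorsProxy(targetUrl, proxyBase = "https://cors-anywhere.herokuapp.com/") {
  // Ensure a slash separates the proxy base from the target URL.
  const base = proxyBase.endsWith("/") ? proxyBase : proxyBase + "/";
  return base + targetUrl;
}

// Assumed usage with request-promise imported as `rp`:
//   rp(withCorsProxy("https://www.pokemon.com/us/pokedex/"))
```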

5) We don’t need cors-anywhere for rp("https://pokedex.org/"), so let’s continue with that URL.

[Image: console output]

6) Now that we have the HTML, let’s use the cheerio library to grab exactly the data we want from the relevant elements. In this example, we’ll grab the names of all the Pokémon, then display them in a list.

In App.js:
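
A minimal sketch of this step, not necessarily the code from the original post. `extractNames` is a hypothetical helper, and the selector in the usage comment is an assumption about the markup on pokedex.org, so inspect the page to confirm it. The cheerio calls used are `load`, `map`, `text` and `get`.

```javascript
// Hypothetical helper (not from the original post).
// `cheerio` is the cheerio module, passed in so the helper has no hard import.
// `selector` is a CSS selector matching the elements that hold the names.
function extractNames(html, cheerio, selector) {
  const $ = cheerio.load(html); // parse the HTML string into a queryable document
  return $(selector)
    .map((i, el) => $(el).text().trim()) // text content of each matched element
    .get(); // convert the cheerio collection into a plain array
}

// Assumed wiring inside App.js (the selector is a guess, not confirmed):
//   import rp from "request-promise";
//   import * as cheerio from "cheerio";
//
//   state = { names: [] };
//
//   componentDidMount() {
//     rp("https://pokedex.org/")
//       .then(html => {
//         const names = extractNames(html, cheerio, "#monsters-list > li > span");
//         this.setState({ names });
//       })
//       .catch(err => console.error(err));
//   }
//
//   render() {
//     return <ul>{this.state.names.map(name => <li key={name}>{name}</li>)}</ul>;
//   }
```

Keeping the parsing in a plain function, separate from the component, makes it easy to reuse with a different page and selector.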

7) You should see a list of all the Pokémon names displayed on your screen:

[Image: list of Pokémon names]

It’s that simple! You scraped those names from the HTML without having to directly access any backend. Now try scraping the examples on http://toscrape.com/ for practice. Enjoy your new abilities!


I'll be coding live Saturdays 11am-3pm EST, ask questions! https://www.youtube.com/channel/UCHXw3WolW7kEZvExoA3P3VQ
https://www.twitch.tv/gameincode
