DEV Community

TeraCrawler
TeraCrawler

Posted on

Lead Generation Through Web Scraping

Web scraping is one of the best ways to generate leads for your business quite easily.

Today we will look at a practical example of how to get leads from online business directories.

Here are 57 of them that you can scrape leads from

That's a lot. So in this article lets learn how to scrape one of them, Yellow pages so that you can use the same techniques to scrape data from the others. Here we are imagining a scenario where we are looking to generate a list of Dentists that we can target.

BeautifulSoup will help us extract information, and we will retrieve crucial pieces of information from Yellow Pages.

To start with, this is the boilerplate code we need to get the Yellowpages.com search results page and set up BeautifulSoup to help us use CSS selectors to query the page for meaningful data.

We are also passing the user agent headers to simulate a browser call, so we don't get blocked.

Now let's analyse the Yellow pages search results. This is how it looks.

And when we inspect the page, we find that each of the items HTML is encapsulated in a tag with the class v-card

We could just use this to break the HTML document into these cards which contain individual item information like this.

And when you run it.

You can tell that the code is isolating the v-cards HTML

On further inspection, you can see that the name of the place always has the class business-name. So let's try and retrieve that.

That will get us the names.

Bingo!

Now let's get the other data pieces.

And when run.

Produces all the info we need including ratings, reviews, phone, address etc

If you want to use this in production and want to scale to thousands of links, then you will find that you will get IP blocked easily by Yellow Pages. In this scenario, using a rotating proxy service to rotate IPs is almost a must. You can use a service like Proxies API to route your calls through a pool of millions of residential proxies.

If you want to scale the crawling speed and don't want to set up you own infrastructure, you can use our Cloud base crawler crawltohell.com to easily crawl thousands of URLs at high speed from our network of crawlers.

Top comments (0)