
Neha Setia Nagpal

Let's go headless

Are you trying to scrape dynamic data that is rendered by JavaScript or loaded through Ajax, and is therefore missing from the raw HTML response the server delivers?

Consider integrating a headless browser into your scraping logic. Headless? Wait, what? But why?

Let me explain...

Websites have moved on from bare-bones static HTML and CSS to interactive, responsive pages where JavaScript runs in the browser and injects HTML and CSS on demand.

Regular HTML scrapers only see the markup the server sends. They cannot render a page full of Ajax and JavaScript elements the way a real user's browser does. Enter the headless browser...

A headless browser is a browser with no graphical user interface (GUI). In an automated task it behaves much like a human user's browser: it executes the page's JavaScript, so data embedded in JavaScript elements can be scraped, but without the extra overhead of loading and processing the visual side of a website. Headless browsers are controlled programmatically, from scripts or command-line tools.

There are many web scraping tools that can be used for headless browsing. The most popular ones are Puppeteer, Selenium, and Playwright. Let's talk about Puppeteer...

Puppeteer is a Node.js library for controlling headless Chrome and Firefox. It supports downloading data, using proxies, and more. The library is maintained by the Chrome DevTools team and has an active open-source community, which is why Puppeteer has become one of the most popular options. For this reason, @zytedata has been working on the Zyte SmartProxy Puppeteer library, which works together with Smart Proxy Manager to crawl JavaScript-heavy websites with ease.
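To make the idea concrete, here is a minimal sketch of scraping JavaScript-rendered text with plain Puppeteer. The function name `scrapeRenderedText`, the URL, and the selector are all made up for illustration, and the `puppeteer` module is passed in as an argument (an assumption of this sketch) rather than imported at the top. Treat it as a starting point, not a finished scraper.

```javascript
// Sketch: read text that only exists after the page's JavaScript has run.
// `puppeteer` is the module returned by require('puppeteer'); it is passed in
// so this function does not hard-code the dependency.
async function scrapeRenderedText(puppeteer, url, selector) {
  // Start a browser with no visible window.
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so Ajax-loaded content has arrived.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Block until the element we care about is present in the rendered DOM.
    await page.waitForSelector(selector);
    // The callback runs inside the page and returns the element's text.
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    // Always shut the browser down, even if something above throws.
    await browser.close();
  }
}
```

With Puppeteer installed (`npm install puppeteer`), it could be called as `scrapeRenderedText(require('puppeteer'), 'https://example.com', 'h1')`, which returns a promise for the text of the first matching element.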

The Zyte SmartProxy Puppeteer library is a client library built on top of Puppeteer (a high-level API to control headless Chrome) and written to work seamlessly with Smart Proxy Manager.

With the Zyte SmartProxy Puppeteer library, there is:

  • No need to manage an additional headless-proxy tool running in the background.
  • No need to run a separate piece of software to connect Puppeteer with Smart Proxy Manager.
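For context, this is roughly what wiring a proxy into plain Puppeteer by hand can look like. It is a generic sketch using Chromium's `--proxy-server` flag and Puppeteer's `page.authenticate`, not the Zyte SmartProxy Puppeteer API. The helper names, the proxy address, and the credentials are hypothetical, and whether a given proxy service accepts this kind of setup is something to check in its own documentation.

```javascript
// Sketch: route a Puppeteer-controlled browser through an HTTP proxy.
// Not the Zyte library's API; just the generic Puppeteer/Chromium approach.

// Build launch options that point the browser at a proxy.
function proxyLaunchOptions(proxyUrl) {
  return { headless: true, args: [`--proxy-server=${proxyUrl}`] };
}

// Open a page behind the proxy. `puppeteer` is the module from
// require('puppeteer'); `credentials` is optional: { username, password }.
async function openPageViaProxy(puppeteer, proxyUrl, credentials) {
  const browser = await puppeteer.launch(proxyLaunchOptions(proxyUrl));
  const page = await browser.newPage();
  if (credentials) {
    // Supplies credentials when the proxy asks for HTTP authentication.
    await page.authenticate(credentials);
  }
  // The caller is responsible for calling browser.close() when done.
  return { browser, page };
}
```

A call might look like `openPageViaProxy(require('puppeteer'), 'http://proxy.example:8011', { username: 'API_KEY', password: '' })`, where every value shown is a placeholder.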

Resources:

The Zyte SmartProxy Puppeteer library is open-source and available here:

To use Smart Proxy Manager, sign up for a 14-day free trial and follow this hands-on tutorial to integrate Smart Proxy Manager with Puppeteer.
