Recently I have been playing around with Puppeteer in an attempt to build a simple back-end service. Puppeteer is an awesome tool for emulating browser behavior, which makes web scraping possible. It is backed by Google, and JavaScript has stronger ties to the HTML document (that is, the crawler's target) than Python does, though Python is a friendly language to get your feet wet.
As the title suggests, I would like to show you how to debug Puppeteer the way we would use IPython or Jupyter notebooks.
I really like debug-driven learning, which gives you an overview of a particular object or function. For JavaScript, the browser console meets that need, but when we run or debug a Puppeteer-based script, it is not obvious how to access Puppeteer's API from the browser's console.
After messing around with it, I discovered the node --inspect option, which adds a Node.js logo to DevTools when you run your script with headless mode turned off.
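If you are not sure whether the script was really started with --inspect, a small check like the one below can help. It is only a sketch: inspectorStatus is a helper name I made up, and it relies on inspector.url() from Node's built-in inspector module, which returns undefined when no inspector is active.

```javascript
const inspector = require("inspector");

// Hypothetical helper (not part of Puppeteer): report whether this
// Node process has an active inspector, i.e. was started with --inspect.
function inspectorStatus() {
  const url = inspector.url(); // undefined when no inspector is active
  return url
    ? `Inspector listening at ${url}`
    : "Inspector inactive: start the script with node --inspect";
}

console.log(inspectorStatus());
```

Calling it at the top of the script tells you right away whether the Node.js logo can be expected to show up in DevTools.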
Therefore, if you call console.log(page) in the script (page is the object representing the page in the browser), run it with the --inspect option, and click the Node.js logo in DevTools, you will see the page object in the console and can in turn access its API. For me, that is enough to learn Puppeteer step by step.
Basically, a minimal script looks like this:
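const puppeteer = require("puppeteer");

(async () => {
  // Headless mode is turned off so the browser window and its DevTools are available.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // When the script is run with node --inspect, this object appears in the
  // DevTools console opened via the Node.js logo, where its API can be explored.
  console.log(page);
  // browser.close() is not called, so the browser stays open while you explore.
})();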
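One possible extension, which is my own assumption rather than anything Puppeteer provides: a top-level const in a Node script is scoped to the module, so typing page in the DevTools console will not find it. Attaching the object to globalThis makes it reachable by name. The helper name exposeForInspector is hypothetical, and a plain stand-in object is used here in place of a real Puppeteer page so the snippet runs on its own.

```javascript
// Hypothetical helper (not part of Puppeteer): publish objects on globalThis
// so the DevTools console attached via node --inspect can reach them by name.
function exposeForInspector(objects) {
  for (const [name, value] of Object.entries(objects)) {
    globalThis[name] = value;
  }
  // Return the exposed names so the caller can log what is available.
  return Object.keys(objects);
}

// In a real script this would be exposeForInspector({ browser, page })
// right after browser.newPage(). A stand-in object plays the role of page here.
const exposedNames = exposeForInspector({
  page: { url: () => "about:blank" },
});
console.log("Available in the console:", exposedNames);
```

With that in place, typing page.url() in the Node DevTools console should work directly, without having to expand the logged object first.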
Thanks for reading!