Playwright is a high-level API for controlling and automating headless Chromium, Firefox, and WebKit. It can be thought of as an extended Puppeteer, as it supports more browser types for automating the testing and scraping of modern web apps. The Playwright API is available in JavaScript & TypeScript, Python, C#, and Java. In this article, we are going to show how to set up a proxy in Playwright for all the supported browsers.
Configuring a proxy in Playwright
Playwright can be considered Puppeteer's successor with a similar API, so many developers prefer it for single-page application data extraction and anti-scraping avoidance while automating their data mining tasks. On the other hand, it configures proxy parameters differently than Puppeteer. Before June 2020, it was a real problem to make a proxy work across all the browsers, but luckily the API has since been unified to pass proxy options via the browser's launch
method. Let's try it out for all the browsers:
Launch proxy option
It's possible to pass proxy settings via the proxy property of the options object for the browserType.launch method:
const playwright = require('playwright');

// Proxy server to route all browser traffic through
const launchOptions = {
    proxy: {
        server: '222.165.235.2:80'
    }
};

(async () => {
    for (const browserType of ['chromium', 'firefox', 'webkit']) {
        const browser = await playwright[browserType].launch(launchOptions);
        const context = await browser.newContext();
        const page = await context.newPage();
        // httpbin.org/ip echoes the IP address the request came from
        await page.goto('https://httpbin.org/ip');
        console.log(await page.textContent("*"));
        await browser.close();
    }
})();
As a result, you'll observe similar output for each browser:
{
"origin": "222.165.235.2"
}
{
"origin": "222.165.235.2"
}
{
"origin": "222.165.235.2"
}
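If your proxy requires authentication, the same proxy launch option also accepts username and password fields. Below is a minimal sketch; the proxy address and credentials are placeholders you would replace with your own:

const playwright = require('playwright');

// Placeholder proxy address and credentials - replace with your own
const launchOptions = {
    proxy: {
        server: 'http://222.165.235.2:80',
        username: 'proxy-user',
        password: 'proxy-password'
    }
};

(async () => {
    const browser = await playwright.chromium.launch(launchOptions);
    const context = await browser.newContext();
    const page = await context.newPage();
    await page.goto('https://httpbin.org/ip');
    console.log(await page.textContent('*'));
    await browser.close();
})();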
Under the hood, each browser has its own way of receiving proxy settings; Firefox, for example, requires a profile configuration to set a browser proxy. Playwright hides these differences behind the unified proxy option shown above.
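For the curious, here is roughly what that looks like if you configure Firefox's own preferences yourself through Playwright's firefoxUserPrefs launch option. This sketch is purely illustrative (the preference names are Firefox internals and may change between versions); with the unified proxy option you don't need it:

const playwright = require('playwright');

(async () => {
    // Illustrative only: set Firefox's own proxy preferences directly.
    // network.proxy.type = 1 means "manual proxy configuration".
    const browser = await playwright.firefox.launch({
        firefoxUserPrefs: {
            'network.proxy.type': 1,
            'network.proxy.http': '222.165.235.2',
            'network.proxy.http_port': 80,
            'network.proxy.ssl': '222.165.235.2',
            'network.proxy.ssl_port': 80
        }
    });
    const page = await browser.newPage();
    await page.goto('https://httpbin.org/ip');
    console.log(await page.textContent('*'));
    await browser.close();
})();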
Command line arguments (only for Chromium)
It's also possible to pass proxy settings via command-line arguments, just as we do with Puppeteer. Below you can find an example of Chromium proxy options:
const playwright = require('playwright');

// Pass the proxy directly to Chromium via a command-line switch
const launchOptions = {
    args: [ '--proxy-server=http://222.165.235.2:80' ]
};

(async () => {
    for (const browserType of ['chromium']) {
        const browser = await playwright[browserType].launch(launchOptions);
        const context = await browser.newContext();
        const page = await context.newPage();
        await page.goto('https://httpbin.org/ip');
        console.log(await page.textContent("*"));
        await browser.close();
    }
})();
Other browsers also allow you to set proxy parameters in their own native ways, but the behavior may differ between operating systems and browser versions.
How to specify proxy settings for a separate page or request
The methods above set proxy settings for the whole browser session, not for an individual page or request. In our previous article, we shared how to set up your own rotating proxy server and route each request through it separately.
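Depending on your Playwright version, you may also be able to assign a proxy per browser context via browser.newContext, which isolates pages behind different proxies without restarting the browser. A rough sketch is below; note that some browser/version combinations require a global proxy to be set at launch before per-context proxies take effect, and the proxy addresses are placeholders:

const playwright = require('playwright');

(async () => {
    // Some Playwright/browser combinations require a launch-time proxy
    // before per-context proxies are honored; when every context overrides
    // it, the launch value acts only as a placeholder.
    const browser = await playwright.chromium.launch({
        proxy: { server: 'http://per-context' }
    });

    // Each context gets its own proxy server (placeholder addresses)
    const contextA = await browser.newContext({
        proxy: { server: 'http://222.165.235.2:80' }
    });
    const contextB = await browser.newContext({
        proxy: { server: 'http://111.222.33.1:80' }
    });

    for (const context of [contextA, contextB]) {
        const page = await context.newPage();
        await page.goto('https://httpbin.org/ip');
        console.log(await page.textContent('*'));
    }

    await browser.close();
})();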
Reducing the complexity
In order to simplify your web scraper and have more time for the data mining tasks themselves, you might want to get rid of the infrastructure hassle and just focus on what you really want to achieve (extracting the data).
ScrapingAnt API provides the ability to scrape the target page with only one API call. All proxy rotation and cloud headless Chrome rendering are already handled on the API side. You can check out how simple it is with the ScrapingAnt JavaScript client:
const ScrapingAntClient = require('@scrapingant/scrapingant-client');
const client = new ScrapingAntClient({ apiKey: '<YOUR-SCRAPINGANT-API-KEY>' });
// Check the proxy address
client.scrape('https://httpbin.org/ip')
.then(res => console.log(res))
.catch(err => console.error(err.message));
With ScrapingAnt API, you can forget about headless browser infrastructure and maintenance. You can use it for free; follow here to sign in and get your API token.