Today I will talk about the difference in the User Agent string when running Puppeteer in headless and headful mode.
For those not familiar with it, Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. You can go to https://pptr.dev/ for more details.
Running Puppeteer in headless mode means you control the Chrome or Chromium browser without displaying the browser UI. In contrast, headful mode displays the browser UI, which is useful for debugging.
As described at https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent, the User Agent string is a characteristic string that lets network protocol peers identify the application type, operating system, software vendor, or software version of the requesting user agent.
A web browser sends the User-Agent request header whenever we browse a web page. Here is a sample of my User Agent.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36
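To make the structure of that string a bit more concrete, here is a minimal sketch that splits a User Agent string into its product/version tokens. The function name parseProductTokens is my own for illustration, and the regular expression is a simplification that ignores the parenthesised parts; it is not a full parser for the header.

```javascript
// Hypothetical helper: extract "product/version" pairs from a User Agent string.
// Simplified on purpose: the parenthesised comments such as "(X11; Linux x86_64)" are skipped.
function parseProductTokens(userAgent) {
  const tokens = [];
  const pattern = /([A-Za-z]+)\/([\d.]+)/g;
  for (const match of userAgent.matchAll(pattern)) {
    tokens.push({ product: match[1], version: match[2] });
  }
  return tokens;
}

const sample = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36';
console.log(parseProductTokens(sample));
```

For the sample above this yields four tokens: Mozilla, AppleWebKit, Chrome and Safari, each with its version.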
Preparation
Install Puppeteer with this command.
npm i puppeteer
The code
OK, now let's write some code to show the User Agent string when running Puppeteer in headless mode.
File puppeteer_headless.js
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  console.log(await browser.userAgent());
  await browser.close();
})();
Run it.
node puppeteer_headless.js
On my machine it displays the following.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36
Notice the substring HeadlessChrome in there.
OK, now let's write some code to show the User Agent string when running Puppeteer in headful mode.
File puppeteer_headful.js
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  console.log(await browser.userAgent());
  await browser.close();
})();
Run with
node puppeteer_headful.js
On my machine it displays the following.
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.0 Safari/537.36
Now we can see that this User Agent string looks like the User Agent string of a normal web browser.
Why is this interesting? Suppose you want to scrape a website using Puppeteer in headless mode, and the target website protects itself by inspecting the User Agent string (blocking HeadlessChrome). Your scraping activity might then be blocked.
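As a rough illustration of the kind of check a site could run, the sketch below tests a User Agent string for the HeadlessChrome token. The function name looksLikeHeadlessChrome is hypothetical, and this is only an assumption about how such a filter might look; real bot protection usually combines many more signals than this one header.

```javascript
// Hypothetical server-side check: does the User Agent contain the HeadlessChrome product token?
function looksLikeHeadlessChrome(userAgent) {
  return /\bHeadlessChrome\//.test(userAgent || '');
}

// The two strings printed by the headless and headful scripts earlier in this post.
const headlessUserAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36';
const headfulUserAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.0 Safari/537.36';

console.log(looksLikeHeadlessChrome(headlessUserAgent)); // true
console.log(looksLikeHeadlessChrome(headfulUserAgent));  // false
```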
How to set User Agent on headless Chrome
We can still set the User Agent string in Puppeteer's headless mode; it overrides the default headless Chrome User Agent string.
Here is the code sample.
File puppeteer_set_user_agent.js
const puppeteer = require('puppeteer');
(async () => {
  // launch headless Chrome (the default mode)
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // set the User Agent (overrides the default headless User Agent)
  await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36');
  // go to the Google home page
  await page.goto('https://google.com');
  // read the User Agent from inside the page context
  const userAgent = await page.evaluate(() => navigator.userAgent);
  // if everything is correct, userAgent contains no 'HeadlessChrome' substring
  console.log(userAgent);
  await browser.close();
})();
It displays the User Agent that we set before browsing to the Google web page.
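A hard-coded string can drift out of date as the bundled Chromium is upgraded. One alternative, sketched here as an assumption rather than anything Puppeteer prescribes, is to read the browser's default User Agent and replace the HeadlessChrome token with Chrome. The helper names toHeadfulUserAgent and newPageWithHeadfulUserAgent are mine; the Puppeteer calls used (browser.userAgent(), browser.newPage() and page.setUserAgent()) are the same ones that appear in the scripts in this post.

```javascript
// Hypothetical helper: turn the headless default into a headful-looking string.
function toHeadfulUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Hypothetical helper: open a page whose User Agent no longer mentions HeadlessChrome.
// `browser` is expected to be the object returned by puppeteer.launch().
async function newPageWithHeadfulUserAgent(browser) {
  const defaultUserAgent = await browser.userAgent();
  const page = await browser.newPage();
  await page.setUserAgent(toHeadfulUserAgent(defaultUserAgent));
  return page;
}

// Quick check of the string rewrite using the headless User Agent shown earlier.
const headlessDefault = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/79.0.3945.0 Safari/537.36';
console.log(toHeadfulUserAgent(headlessDefault));
```

In puppeteer_set_user_agent.js, a single call such as const page = await newPageWithHeadfulUserAgent(browser); could stand in for the separate newPage() and setUserAgent() calls.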
Thank you and I hope you enjoy it.
Top comments (12)
Hi everyone,
I am using the Puppeteer library in Node.js for runtime PDF file generation. It works fine on my local system, but when I deploy my app on a cPanel-based CentOS server, it throws an error. Any solution would be appreciated.
There must be other altered behaviours too. Some tests were not working in headless mode after developing them with the browser displayed.
ic ic, thanks for the info
Hi I wanted to know how to change the cdc variable to go undetected from the message of "chrome is controlled by an automation software". No idea if the site detected...
Hi Rudra, thanks for the question. Actually I still have no idea about it as well. But any use case for you to hide that thing?
I found this link help.applitools.com/hc/en-us/artic... that maybe related to it?
Thank you.
That's perfect dude! Thank you!
Hello sir,
could you help me please?
I would like to load a random user agent for each page launch.
How do I do that?
Example:
page.setUserAgent('/utils/referers.txt');
thank you
I would have a variable called userAgents that is an array of user agent strings then do something like
await page.setUserAgent(userAgents[Math.floor(Math.random()*userAgents.length)]);
I logged in just to add a like to this comment
thank you :)
Nice article, thanks