Puppeteer, a Node.js library developed by Google, offers a powerful API for controlling headless or full browsers via the DevTools Protocol. One standout feature of Puppeteer is its ability to intercept and manipulate network requests, letting developers customize requests, mock responses, and manage data flow during web scraping or automation tasks.
Understanding Request Interception
Request interception in Puppeteer enables developers to observe, modify, or block outgoing HTTP requests and incoming responses. This feature proves invaluable when optimizing page loading, simulating various network conditions, or managing dynamic content loading.
Enabling Request Interception
To activate request interception in Puppeteer, follow these steps:
- Enable request interception on the page using page.setRequestInterception(true).
- Listen for the 'request' event via page.on('request'). It is emitted for each network request the page makes, and every intercepted request must then be continued, aborted, or responded to.
- Capture responses via page.on('response').
await page.setRequestInterception(true);
page.on('request', (request) => {
// Your custom logic here
request.continue();
});
page.on('response', (response) => {
// Your response handling logic here
});
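The response listener fires for every resource the page loads, so it usually helps to filter before reading a body. The following is a minimal sketch of that idea, not a definitive implementation: the helper names and the URL fragment passed in are hypothetical, while response.url(), response.status(), response.headers(), and response.json() are the Puppeteer calls it relies on.

```javascript
// Hypothetical helper: decide whether a response is a successful JSON API call.
// `fragment` is an assumed URL substring chosen by the caller.
function isJsonApiResponse(response, fragment) {
  // Puppeteer reports header names in lower case.
  const contentType = response.headers()["content-type"] || "";
  return (
    response.url().includes(fragment) &&
    response.status() >= 200 &&
    response.status() < 300 &&
    contentType.includes("application/json")
  );
}

// Hypothetical wiring: collect the parsed bodies of matching responses.
function collectJsonResponses(page, fragment) {
  const collected = [];
  page.on("response", async (response) => {
    if (!isJsonApiResponse(response, fragment)) return;
    try {
      collected.push(await response.json());
    } catch (err) {
      // A body can be unavailable or empty, so parsing may fail.
      console.warn(`Could not read ${response.url()}: ${err.message}`);
    }
  });
  return collected;
}
```

Calling collectJsonResponses(page, "/api/") before page.goto() would return an array that fills up as matching responses arrive.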
Modifying Requests
Request interception facilitates modification of outgoing requests' properties, such as setting custom headers, altering request methods, or adjusting the request payload.
page.on('request', (request) => {
const headers = request.headers();
headers['Authorization'] = 'Bearer YOUR_TOKEN';
request.continue({ headers });
});
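In practice you may want to add the token only to some requests and avoid mutating the original headers object. The sketch below shows one way to do that. The function names, the apiOrigin parameter, and the token value are hypothetical; request.url(), request.headers(), and request.continue() are the Puppeteer calls used.

```javascript
// Hypothetical helper: return a copy of the headers with a bearer token added.
// Puppeteer reports header names in lower case, so the key is lower case too;
// this avoids ending up with both "authorization" and "Authorization".
function withAuthHeader(headers, token) {
  return { ...headers, authorization: `Bearer ${token}` };
}

// Hypothetical wiring: only requests whose URL starts with `apiOrigin`
// receive the token; everything else continues unchanged.
function addAuthToApiRequests(page, apiOrigin, token) {
  page.on("request", (request) => {
    if (request.url().startsWith(apiOrigin)) {
      request.continue({ headers: withAuthHeader(request.headers(), token) });
    } else {
      request.continue();
    }
  });
}
```

Restricting the header to one origin keeps the token from being sent to third-party hosts such as analytics or CDN domains.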
Blocking Requests
Another powerful aspect of request interception is the ability to block specific requests based on certain conditions.
page.on('request', (request) => {
if (request.url().includes('blocked-resource')) {
request.abort();
} else {
request.continue();
}
});
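A common variation is to block by resource type rather than by URL, which can speed up scraping when images, fonts, and stylesheets are not needed. This is a minimal sketch under that assumption: the function name and the chosen set of types are illustrative, and request.resourceType() is the Puppeteer call that reports the type.

```javascript
// Assumed set of resource types to skip; adjust for the page being scraped.
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

// Hypothetical wiring: abort requests for blocked types, continue the rest.
function blockHeavyResources(page, blockedTypes = BLOCKED_TYPES) {
  page.on("request", (request) => {
    if (blockedTypes.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });
}
```

As with the other handlers, page.setRequestInterception(true) has to be awaited before the handler is attached for abort() and continue() to work.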
Real-world Examples
Let's explore practical use cases for request interception in Puppeteer:
- Dynamic Content Loading
page.on('request', async (request) => {
if (request.url().includes('dynamic-content')) {
await request.continue();
await page.waitForSelector('.loaded-element');
} else {
request.continue();
}
});
- API Mocking
page.on('request', (request) => {
if (request.url().includes('mock-api')) {
request.respond({
status: 200,
contentType: 'application/json',
body: JSON.stringify({ mockData: true }),
});
} else {
request.continue();
}
});
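When several endpoints need mocking, a small route table can keep the handler readable. The sketch below is one possible arrangement, with hypothetical route fragments, payloads, and function names; request.respond() is used with the same status, contentType, and body options as in the example above.

```javascript
// Hypothetical route table: URL fragment -> JSON payload to return.
const MOCK_ROUTES = {
  "/mock-api/user": { id: 1, name: "Test User" },
  "/mock-api/items": { items: [] },
};

// Returns the options for request.respond(), or null when no route matches.
function findMockResponse(url, routes = MOCK_ROUTES) {
  const match = Object.keys(routes).find((fragment) => url.includes(fragment));
  if (match === undefined) return null;
  return {
    status: 200,
    contentType: "application/json",
    body: JSON.stringify(routes[match]),
  };
}

// Hypothetical wiring: answer mocked routes locally, let the rest through.
function mockApi(page, routes = MOCK_ROUTES) {
  page.on("request", (request) => {
    const mock = findMockResponse(request.url(), routes);
    if (mock) {
      request.respond(mock);
    } else {
      request.continue();
    }
  });
}
```

Keeping findMockResponse separate from the handler makes the matching logic easy to check without launching a browser.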
Note: Keep in mind that Puppeteer's page.on("request") only captures requests made using the page object. XHR and fetch requests made within the page's context are captured, but requests initiated outside the context of the page might not be intercepted.
A Practical Implementation
Now let's apply this to the IRCTC website: we listen for a specific API response, reuse the headers of the request that produced it, and send our own request with a different body.
const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth")();
puppeteer.use(pluginStealth);
const scrape = async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: null,
});
const page = await browser.newPage();
await page.goto("https://www.irctc.co.in/nget/train-search", {
waitUntil: "networkidle0",
});
await page.type("#destination > span > input", "MAS");
await page.keyboard.press("ArrowDown");
await page.keyboard.press("Enter");
await page.type("#origin > span > input", "KRR");
await page.keyboard.press("ArrowDown");
let headers;
page.on("response", async (response) => {
if (
response
.request()
.url()
.includes(
"https://www.irctc.co.in/eticketing/protected/mapps1/altAvlEnq/TC"
)
) {
headers = response.request().headers();
const apiRes = await fetch(
"https://www.irctc.co.in/eticketing/protected/mapps1/altAvlEnq/TC",
{
headers,
body: '{"concessionBooking":false,"srcStn":"MAS","destStn":"MMCT","jrnyClass":"","jrnyDate":"20240225","quotaCode":"GN","currentBooking":"false","flexiFlag":false,"handicapFlag":false,"ticketType":"E","loyaltyRedemptionBooking":false,"ftBooking":false}',
method: "POST",
credentials: "omit",
}
);
console.log(await apiRes.json());
}
});
await page.keyboard.press("Enter");
await page.click("[label='Find Trains']");
};
scrape();
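In the above code, we register a response listener and then submit the search form with the stations MAS and KRR. When the response from the availability API arrives, we reuse the headers of its request in our own fetch call, whose body asks for MAS as the source station and MMCT as the destination. The API therefore responds according to the body we sent rather than the values typed into the form, and we can access that data directly.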
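The hard-coded body string in the script above can also be built from an object, which makes it easier to change the stations or the date. This is a small sketch: buildAvailabilityBody is a hypothetical helper name, the field names and default values are copied from the body in the script, and nothing here is a documented IRCTC API.

```javascript
// Hypothetical helper: build the JSON body used in the fetch call.
// Field names and defaults are copied from the request body in the script above.
function buildAvailabilityBody({ srcStn, destStn, jrnyDate }) {
  return JSON.stringify({
    concessionBooking: false,
    srcStn,
    destStn,
    jrnyClass: "",
    jrnyDate, // same YYYYMMDD form as in the script, e.g. "20240225"
    quotaCode: "GN",
    currentBooking: "false",
    flexiFlag: false,
    handicapFlag: false,
    ticketType: "E",
    loyaltyRedemptionBooking: false,
    ftBooking: false,
  });
}

// Reproduces the body used in the script above.
const body = buildAvailabilityBody({
  srcStn: "MAS",
  destStn: "MMCT",
  jrnyDate: "20240225",
});
```

The resulting string could then be passed as the body option of the fetch call in place of the literal.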
Note: The above code may fail at times because IRCTC occasionally asks for a login. In such cases, wait for a while and try again.
Conclusion
Delving into Puppeteer's request interception unlocks a realm of possibilities for web automation and testing. With the ability to tweak headers, intercept and block requests, or simulate diverse network conditions, you have the tools to orchestrate a symphony of digital interactions.
So, dive in, explore, and let your creativity soar. Whether you're a seasoned developer or new to web automation, Puppeteer's request interception offers endless opportunities for innovation. Happy coding!