DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’» is a community of 967,611 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

Create account Log in
Cover image for How to make POST, PUT and DELETE requests using Puppeteer?
Oleg Kulyk
Oleg Kulyk

Posted on • Originally published at scrapingant.com

How to make POST, PUT and DELETE requests using Puppeteer?

Making POST, PUT, and DELETE requests is a crucial web scraping and web testing technique. Still, this functionality is not included in Puppeteer's API as a separate function.

Let's check out the workaround for this situation and create a helper function to fix this out.

Why do we need POST while web scraping?

POST request is one of several available HTTP request types. By design, this method is used to send data to a web server for processing and possible storage.

POST requests usage is one of the main ways to send form data during login or registration. It is also one of the ways to send any data to the web server.

Earlier, one of the main implementation patterns of login or registration was to send the form data with the required authorization parameters via POST request and get the protected content as a response to this request (along with cookies to avoid re-entering the authentication and authorization data).

Nowadays SPAs (Single Page Applications) also use POST requests to send data to API, but such requests usually return only necessary data to update web page and not the whole page.

Thus, many sites use POST requests for client-server communication and this requires the ability of sending POST requests while web scraping.

Unfortunately, Puppeteer developers haven't introduced the native way of making requests other than GET, but it's not a big deal for us to create a workaround.

Interception of the initial request

The idea behind our approach is quite simple - we need to change the request type while opening the page, so we can send POST data along with opening a page.

To do that, we have to intercept the request using page.on('request') handler.

We're going to use HTTPBin which can help us with our solution testing.

Let's check out the simple JS snippet which just opens HTTPBin's POST endpoint:

const puppeteer = require('puppeteer');
const TARGET_URL = 'https://httpbin.org/post';

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(TARGET_URL);
    console.log(await page.content());
})();
Enter fullscreen mode Exit fullscreen mode

The result is, definitely, not what we're trying to achieve:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html><head><title>405 Method Not Allowed</title>
</head><body><h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>
</body></html>
Enter fullscreen mode Exit fullscreen mode

So, let's add a request interception:

const puppeteer = require('puppeteer');
const TARGET_URL = 'https://httpbin.org/post';
const POST_JSON = { hello: 'I like ScrapingAnt' };

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.once('request', request => {
        request.continue({ method: 'POST', postData: JSON.stringify(POST_JSON), headers: request.headers });
    });
    await page.goto(TARGET_URL);
    console.log(await page.content());
})();
Enter fullscreen mode Exit fullscreen mode

This time our POST request has been successfully executed:

<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "args": {}, 
  "data": "{\"hello\":\"I like ScrapingAnt\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US", 
    "Content-Length": "30", 
    "Host": "httpbin.org", 
    "Sec-Fetch-Dest": "document", 
    "Sec-Fetch-Mode": "navigate", 
    "Sec-Fetch-Site": "none", 
    "Sec-Fetch-User": "?1", 
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/93.0.4577.0 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-61757548-7608d72817d01768524a3298"
  }, 
  "json": {
    "hello": "I like ScrapingAnt"
  }, 
  "origin": "7.133.7.133", 
  "url": "https://httpbin.org/post"
}
</pre></body></html>
Enter fullscreen mode Exit fullscreen mode

Unfortunately, this code is only a proof-of-concept, not a complete solution, because any request made by the browser will be converted to a POST request.

Let's improve our code and make it production-ready.

Puppeteer's extension

The next big step of our HTTP requests Puppeteer journey is to create something reusable to avoid code duplication while making non-GET requests.

To achieve that, let's create a function gotoExtended:

async function gotoExtended(page, request) {
    const { url, method, headers, postData } = request;

    if (method !== 'GET' || postData || headers) {
        let wasCalled = false;
        await page.setRequestInterception(true);
        const interceptRequestHandler = async (request) => {
            try {
                if (wasCalled) {
                    return await request.continue();
                }

                wasCalled = true;
                const requestParams = {};

                if (method !== 'GET') requestParams.method = method;
                if (postData) requestParams.postData = postData;
                if (headers) requestParams.headers = headers;
                await request.continue(requestParams);
                await page.setRequestInterception(false);
            } catch (error) {
                log.debug('Error while request interception', { error });
            }
        };

        await page.on('request', interceptRequestHandler);
    }

    return page.goto(url);
}
Enter fullscreen mode Exit fullscreen mode

The usage of this function is straightforward and simple:

const puppeteer = require('puppeteer');
const TARGET_URL = 'https://httpbin.org/post';
const POST_JSON = { hello: 'I like ScrapingAnt' };
const headers = { header_1: 'custom_header' };

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await gotoExtended(page, { url: TARGET_URL, method: 'POST', postData: JSON.stringify(POST_JSON), headers });
    console.log(await page.content());
})();
Enter fullscreen mode Exit fullscreen mode

It is also possible to use this helper function with any available HTTP method and custom headers.

Conclusion

Having the ability to send any request type using Puppeteer allows web scraping specialists to extend their capabilities and improve their data crawlers' performance by skipping unnecessary steps like form filling.

As usual, we recommend you extend your web scraping knowledge using our articles:

Happy Web Scraping, and don't forget to cover your code with unit-tests πŸ‘·

Top comments (0)

Need a better mental model for async/await?

Check out this classic DEV post on the subject.

β­οΈπŸŽ€ JavaScript Visualized: Promises & Async/Await

async await