DEV Community



Scraping the product detail page in a separate process is not optimal for me

Hi, I'm using symfony/panther to scrape the entire catalog of an online store, where product cards are displayed 30 per page. The catch is that for each product I also have to scrape its detail page to obtain data that is not present on the catalog page. My initial idea was to scrape each catalog page, store the products it lists, and then, for each of them, send a message (with symfony/messenger) containing the id of the product stored in the local database and the URL of its detail page, so that a separate worker can process the detail page and update the record in my local database.
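The dispatch step described above could be sketched roughly like this; the message class, property names, and `$detailUrl` variable are assumptions for illustration, not code from my project:

```php
<?php
// Hypothetical message carrying what the detail-page worker needs.

namespace App\Message;

final class ScrapeProductDetail
{
    public function __construct(
        public readonly int $productId,
        public readonly string $detailUrl,
    ) {
    }
}
```

Then, in the catalog scraper, after storing each product: `$bus->dispatch(new ScrapeProductDetail($product->getId(), $detailUrl));` where `$bus` is a `Symfony\Component\Messenger\MessageBusInterface`.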
I currently have more than 20,000 products registered in the local database and the same number of messages in the queue waiting to be processed (one message per product). My problem is that, after some analysis, I think this approach wastes time on every message, because a Panther client (with ChromeDriver) would have to be initialized for each one.
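One way I was considering to avoid re-initializing the client per message: since Messenger handlers are services, the same handler instance processes every message within one `messenger:consume` worker run, so the client could be created lazily once and reused. A minimal sketch, assuming a hypothetical `ScrapeProductDetail` message like the one above (the extraction and database update are elided):

```php
<?php

namespace App\MessageHandler;

use App\Message\ScrapeProductDetail;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;
use Symfony\Component\Panther\Client;

#[AsMessageHandler]
final class ScrapeProductDetailHandler
{
    private ?Client $client = null;

    public function __invoke(ScrapeProductDetail $message): void
    {
        // The handler instance survives across messages in one worker
        // process, so ChromeDriver is only started on the first message.
        $this->client ??= Client::createChromeClient();

        $crawler = $this->client->request('GET', $message->detailUrl);

        // ... extract the extra fields from $crawler and update the
        // local record for $message->productId here ...
    }
}
```

I'm not sure whether keeping one browser session alive for thousands of requests is a good idea, though.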
What other options do I have?
