DEV Community

Discussion on: How To Scrape Amazon at Scale With Python Scrapy, And Never Get Banned

Ian Kerins

You need to create an Item Pipeline like this in your pipelines.py file.

# pipelines.py 

import mysql.connector

class SaveMySQLPipeline:

    def __init__(self):
        self.conn = mysql.connector.connect(
            host = 'localhost',
            user = 'root',
            password = '*******',
            database = 'dbname'
        )

        ## Create cursor, used to execute commands
        self.cur = self.conn.cursor()

        ## Create products table if none exists
        self.cur.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id int NOT NULL auto_increment, 
            asin text,
            title text,
            image VARCHAR(255),
            PRIMARY KEY (id)
        )
        """)



    def process_item(self, item, spider):

        ## Insert the item into the products table
        self.cur.execute(""" insert into products (asin, title, image) values (%s,%s,%s)""", (
            item["asin"],
            item["Title"],
            item["MainImage"]
        ))

        ## Commit the insert to the database
        self.conn.commit()

        ## Return the item so any later pipelines still receive it
        return item


    def close_spider(self, spider):

        ## Close cursor & connection to database 
        self.cur.close()
        self.conn.close()

And then enable it in your settings.py file.

# settings.py

ITEM_PIPELINES = {
   'tutorial.pipelines.TutorialPipeline': 300
   'tutorial.pipelines.SaveMySQLPipeline': 350,
}

smaug

Firstly, thank you for the answer.

'tutorial.pipelines.TutorialPipeline': 300 <<< I guess a comma is needed here.

Secondly, the code above does not scrape the price information. What could be the reason for this?

For example, can we scrape the three different prices of the following product and save them to MySQL?

amazon.com/dp/B07KSJLQCD

List Price: $25.00
Price: $17.45
Lightning deal (if any)

MS -> My Telegram t.me/smesut

Ian Kerins

To scrape those extra pricing details, you will need to find the selectors for them and add those fields to the item.

When I open that page, I don't see those fields, probably because Amazon only shows them depending on the geography you are in.
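Whatever selectors end up matching the prices, the scraped text will look like "$17.45" or may be missing entirely when a price is not shown. As a rough sketch (the helper name `parse_price` is hypothetical, and it assumes US-style formatting with a dot as the decimal separator), the text could be cleaned like this before it goes into the item:

```python
import re
from decimal import Decimal


def parse_price(text):
    """Turn scraped price text such as '$17.45' into a Decimal.

    Returns None when the text is missing or contains no number,
    e.g. when a Lightning deal is not shown on the page.
    """
    if not text:
        return None
    # Assumption: prices use ',' for thousands and '.' for decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group(0).replace(",", ""))
```

Returning None for a missing price lets the spider fill every price field on every item, even when the page only shows some of them.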

So if you create new selectors for the prices you want and add them to the item, you can then update the MySQL storage pipeline to store that data as well.
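A minimal sketch of what that pipeline update might look like. The item keys `ListPrice`, `Price` and `DealPrice`, the matching column names, and the helper `build_insert` are all assumptions made for illustration; use whatever names you give the new fields in your own item.

```python
# Maps table columns to item keys. The three price entries are
# hypothetical: the original item only has asin, Title and MainImage.
COLUMN_TO_ITEM_KEY = {
    "asin": "asin",
    "title": "Title",
    "image": "MainImage",
    "list_price": "ListPrice",
    "price": "Price",
    "deal_price": "DealPrice",
}


def build_insert(item, table="products"):
    """Build a parameterised INSERT statement and its values for one item.

    Fields missing from the item become None, which MySQL stores as NULL.
    """
    columns = list(COLUMN_TO_ITEM_KEY)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = tuple(item.get(key) for key in COLUMN_TO_ITEM_KEY.values())
    return sql, params
```

Inside `process_item` this would be used as `sql, params = build_insert(item)` followed by `self.cur.execute(sql, params)`. The table also needs the three extra columns, and `CREATE TABLE IF NOT EXISTS` will not add them to a table that already exists, so an existing table has to be altered or recreated first.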