loading...
Cover image for Python Scrapy Installation And Example Crawler

Python Scrapy Installation And Example Crawler

hakan profile image Hakan Torun ・1 min read

Scrapy is an open source framework written in Python that allows you to extract data from structural content such as html and xml. It is able to scraping and crawling on websites especially fast enough. First of all, you should install python packet manager, pip.

Installing with using pip:

pip install scrapy

Starting a new project:

scrapy startproject tutorial

When a scrapy project is created, a file / directory structure will be created as follows.

tutorial/
    scrapy.cfg            # deploy configuration file

    tutorial/             # project's Python module, you'll import your code from here
        __init__.py

        items.py          # project items definition file

        pipelines.py      # project pipelines file

        settings.py       # project settings file

        spiders/          # a directory where you'll later put your spiders
            __init__.py

Example spider:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
            }

Scrapy spiders can scan and extract data on one or more addresses. With Scrapy selectors, the desired fields can be selected and filtered. Scrapy is supported by xpath scrapy in selectors.

For crawling:

scrapy crawl quotes

For writing output to json file:

scrapy crawl quotes -o qutoes.json

Discussion

pic
Editor guide