Over the past three months, I have worked on various open source projects, including my own project Repo Context Packager, as well as Math Worksheet Generator and Open Web Calendar. This month, I want to challenge myself by working on a larger and more widely used project: Scrapy, a Python framework for web crawling.
Why I Chose Scrapy
Working on several issues in Open Web Calendar gave me experience with a Python project that has a comprehensive test suite and continuous integration. I wanted to push myself further, so I looked for a larger Python project to work on. I also wanted a project that I would use on a regular basis. One of the topics I'm interested in is web crawling, and I sometimes need to extract data from online sources for statistical analysis. I searched for "open source web scraper" and found Scrapy, which satisfies all of these criteria: it has a large user base, plenty of open issues to work on, and a well-organized codebase.
My Work Plan
I will begin by carefully reading Scrapy’s official documentation and contribution guidelines to understand its core concepts, project structure, and coding standards. Next, I will install Scrapy locally and experiment with building a few small crawling projects, which will help me better understand how Scrapy components work together in practice. After gaining familiarity with both the documentation and actual usage, I will start exploring the issues on Scrapy’s GitHub repository. Since there are hundreds of open issues, I should be able to pick a few that interest me. Eventually, I plan to submit pull requests following Scrapy’s contribution process and improve my work based on feedback from the maintainers.
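To make the experimentation step concrete, here is a minimal sketch of the kind of toy crawler I have in mind. It is modeled on the example from Scrapy's own tutorial and targets quotes.toscrape.com, a site built for scraping practice; the spider name and the output fields are my own choices for illustration, not anything prescribed by Scrapy.

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    # A small practice spider: scrape quotes and authors, follow pagination.
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Each quote on the page lives in a <div class="quote"> block.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "Next" pagination link, if one exists.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```

With Scrapy installed, saving this as quotes_spider.py and running `scrapy runspider quotes_spider.py -o quotes.json` should crawl the site and write the scraped items to a JSON file, which is exactly the kind of end-to-end exercise I want to try before touching the codebase itself.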
Expected Outcomes
Through contributing to Scrapy, I expect to gain a deeper understanding of web crawling and data extraction by learning how professional programmers design crawlers to handle it efficiently. Another important outcome is being able to directly improve Scrapy itself. By fixing bugs, enhancing features, or improving documentation, I can contribute to a tool that is widely used by developers around the world, which is both meaningful and motivating. Finally, I aim to become a long-term Scrapy user. After becoming comfortable with the framework, I plan to use it for real data extraction tasks related to my own research and statistical analysis, making it a core tool in my future projects.