In my previous contribution to Scrapy, I worked on code cleanup for test configuration as a starting point to familiarize myself with the project structure and development workflow. That work helped me understand how Scrapy organizes its test setup and how changes are validated through its testing framework.
This time, I moved a step closer to Scrapy’s core data extraction logic by working on a refactoring task related to how spiders are configured. Compared to test-related cleanup, this task required a deeper understanding of how Scrapy’s internal components interact during crawling and configuration.
Digging into Interrelated Issues
The main goal of the issue was to deprecate an old attribute previously used to configure spiders. This attribute has been replaced by a list of settings, which provides a more consistent and unified configuration flow across the project.
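To make the idea concrete, here is a minimal sketch of the general deprecation pattern in plain Python. It is not Scrapy's actual code: the class `ExampleSpider`, the attribute `legacy_option`, and the setting name `OPTION_LIST` are all hypothetical names I made up for illustration, and a plain dict stands in for a settings object.

```python
import warnings


class ExampleSpider:
    """Hypothetical stand-in for a spider class; not Scrapy's real API."""

    # Hypothetical deprecated attribute; None means "not set".
    legacy_option = None

    def __init__(self, settings=None):
        # A plain dict stands in for a settings object here.
        self.settings = dict(settings or {})

    def get_options(self):
        """Return options from the settings list, plus the legacy value if set."""
        options = list(self.settings.get("OPTION_LIST", []))
        if self.legacy_option is not None:
            warnings.warn(
                "ExampleSpider.legacy_option is deprecated; "
                "use the OPTION_LIST setting instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Keep old-style spiders working during the deprecation period.
            if self.legacy_option not in options:
                options.append(self.legacy_option)
        return options


class OldStyleSpider(ExampleSpider):
    # A spider that still relies on the deprecated attribute.
    legacy_option = "old-value"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_options = OldStyleSpider().get_options()
    new_options = ExampleSpider({"OPTION_LIST": ["a", "b"]}).get_options()

# True if the old-style spider triggered a deprecation warning.
old_warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
```

The point of the sketch is that old code keeps working but gets a warning, while new code reads everything from one place.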
To work effectively on this task, I had to explore earlier issues and pull requests that introduced or discussed the change. Reading through the discussions and code changes helped me understand the motivation behind the refactor, the design decisions that were already made, and the constraints I needed to respect.
This experience showed me a common pattern in large projects: changes are often interrelated. Features and refactors are frequently split across multiple issues and pull requests, so contributors must spend significant time studying the project's history. While this steepens the initial learning curve, it also helps keep changes consistent and well thought out.
Working on the Issue
After studying the issue and related discussions, I realized that the refactor involved more than just modifying class definitions. I also needed to update existing tests so that they validated the new settings list instead of relying on the deprecated attribute.
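As a rough illustration of what such a test update can look like, the sketch below checks two things: the new settings list works without any warning, and the old attribute still works but emits a `DeprecationWarning`. The helper `resolve_options`, the `legacy_option` parameter, and the `OPTION_LIST` key are hypothetical names used only for this example, and it uses the standard `unittest` module rather than Scrapy's own test setup.

```python
import unittest
import warnings


def resolve_options(settings, legacy_option=None):
    """Hypothetical helper: read options from settings, warn on the old value."""
    options = list(settings.get("OPTION_LIST", []))
    if legacy_option is not None:
        warnings.warn(
            "legacy_option is deprecated; use OPTION_LIST.",
            DeprecationWarning,
            stacklevel=2,
        )
        options.append(legacy_option)
    return options


class ResolveOptionsTest(unittest.TestCase):
    def test_settings_list_is_used_without_warning(self):
        # The new configuration path should not emit any warning.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            self.assertEqual(resolve_options({"OPTION_LIST": ["a"]}), ["a"])
        self.assertEqual(caught, [])

    def test_legacy_value_still_works_but_warns(self):
        # The deprecated path keeps working and warns the user.
        with self.assertWarns(DeprecationWarning):
            options = resolve_options({}, legacy_option="old")
        self.assertEqual(options, ["old"])


def run_tests():
    """Run the two tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ResolveOptionsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` runs both cases. Testing the deprecated path explicitly is a design choice: it documents that the old behavior is still supported for now.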
To make my approach clear, I created a draft pull request outlining my plan and progress. This made my work transparent and allowed others to follow my reasoning early on.
During implementation, I relied on tools like git grep to locate all relevant references to the deprecated attribute. This ensured that the refactor was thorough and consistent across the codebase. After completing the changes, I updated the pull request summary to explain what was changed, making it ready for review.
After receiving the maintainer's comment, I further explained the rationale for the changes and the logical flow of the updated code. The maintainer later suggested that it would be better to tackle another issue first, so the pull request remains open at this stage. Although it has not been merged, the work I've done and the discussion around it may offer some insight to other contributors, or even to my future self.
Reflection on My Work on Scrapy
Through this contribution, I’ve become more comfortable working with large and complex codebases. I’ve learned that progress in such projects is incremental; as the saying goes, Rome wasn’t built in a day. Understanding comes gradually through repeated exposure, patience, and continuous effort.
This experience also reinforced the importance of refactoring and cleanup. These maintenance tasks are essential for long-term project sustainability, especially as project requirements evolve and programming languages introduce new features or deprecate old ones. Without careful maintenance, projects risk being overwhelmed by technical debt or falling victim to problems like the second-system effect.
Looking back at my initial expectations, I’m glad that I was able to improve Scrapy’s code readability and maintainability. At the same time, I now better understand that fully grasping a project of this size is a long-term process. I plan to continue contributing to Scrapy and using it in the future to deepen my understanding further.
Review of My Open Source Journey
Since September this year, I’ve been actively involved in the open source community. My journey started with working on my own projects and collaborating with my classmate, then gradually expanded to contributing to public open source repositories on GitHub.
I began with smaller contributions such as fixing bugs, adding features, and writing tests, which helped me understand contribution workflows and project structures. Over time, I took on more involved tasks like deprecating older Python versions, optimizing tests, and eventually refactoring code in larger projects.
Working on open source has consistently challenged me to step outside my comfort zone. By progressing from small fixes to more complex refactors, I can clearly see my growth in both programming and problem-solving skills.
Open source projects are a valuable learning resource, especially for beginners. They provide opportunities to learn from experienced developers across diverse domains. I plan to continue my open source journey, sharpen my technical skills, and build meaningful connections within the community. I look forward to contributing to impactful projects in the future!