A while back, I was working on a data project. Nothing crazy. I just needed to pull product prices from a handful of e-commerce sites every day and dump them into a spreadsheet. Simple enough, right?
Wrong.
Getting the first spider up and running took me way longer than it should have. And once I finally got it working, I quickly realized that was just the easy part. The moment I needed to scrape a site that required login, or one that loaded everything with JavaScript, or to figure out how to store my data in an actual database instead of just a CSV file, I was lost again. Each new challenge felt like starting over from scratch.
Over time I figured all of it out, but the process was slow and frustrating. Too much time spent hunting for answers that should have been easy to find. That's the gap I wanted to fill. So I sat down and started writing, chapter by chapter, everything I wished someone had taught me from the beginning.
What Is The Scrapy Handbook?
The Scrapy Handbook is a free, open-source, 45-chapter guide that takes you from knowing absolutely nothing about web scraping to building and deploying production-ready scrapers with confidence. It's not a collection of random tips. It's a structured, end-to-end journey.
I started writing it in February 2025, and it took about a year to get right. Every chapter went through multiple rounds of revision. Code examples were tested. Explanations were rewritten until they actually made sense without needing a PhD to understand them.
The whole thing lives on GitHub, and it's completely free to read.
Who Is This For?
If you've ever thought "I want to scrape a website but I have no idea where to start," this handbook is for you.
If you already know the basics but get stuck the moment things get complicated (JavaScript sites, databases, proxies, deployment), this handbook is also for you.
The handbook is written so that a complete beginner can follow along from chapter one. But it also goes deep enough that someone with experience will still find value in the later chapters.
What's Inside?
The handbook is split into nine parts, and each part builds on the one before it. Here's the journey you'll take.
Part I is where everything begins. You'll learn what web scraping actually is, why it matters, and how websites work under the hood. Then you'll set up your environment, create your very first Scrapy spider, and get comfortable with CSS selectors and XPath. By the time you finish Part I, you'll have working scrapers and a solid foundation to build on.
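To give you a tiny preview of the selector work in Part I, here's the same piece of data pulled out with both CSS and XPath. The HTML snippet is made up for illustration:

from scrapy.selector import Selector

html = '<ul><li class="price">$19.99</li><li class="price">$24.50</li></ul>'
sel = Selector(text=html)

# The same values, extracted two different ways
print(sel.css("li.price::text").getall())                  # ['$19.99', '$24.50']
print(sel.xpath('//li[@class="price"]/text()').getall())   # ['$19.99', '$24.50']

Part I spends a lot of time on exactly this kind of back-and-forth, because being fluent in both selector styles pays off constantly.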
Part II is all about data. Extracting it cleanly, structuring it with Scrapy Items and ItemLoaders, cleaning it properly, running it through pipelines, and exporting it in formats you can actually use. A lot of people skip this stuff and end up with messy, unusable data later. This part makes sure that doesn't happen to you.
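As a rough sketch of what that looks like in practice (the field names and the cleaning rule here are just illustrative, not lifted from the handbook):

import scrapy
from itemloaders.processors import MapCompose, TakeFirst
from scrapy.exceptions import DropItem

def clean_price(value):
    # Strip the currency symbol and surrounding whitespace
    return value.replace("$", "").strip()

class ProductItem(scrapy.Item):
    name = scrapy.Field(output_processor=TakeFirst())
    price = scrapy.Field(
        input_processor=MapCompose(clean_price),
        output_processor=TakeFirst(),
    )

class ValidationPipeline:
    # Pipelines see every item on its way out; this one drops incomplete ones
    def process_item(self, item, spider):
        if not item.get("price"):
            raise DropItem("Missing price")
        return item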
Part III is where things get interesting. Forms, login pages, JavaScript-rendered websites, media file downloads, sitemaps, error handling, and performance optimization. These are the scenarios that trip up most beginners, and this part walks through each one step by step.
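For example, logging in with Scrapy usually comes down to a FormRequest like the one below. The URLs, field names, and success check are placeholders, because every site is different:

import scrapy

class LoginSpider(scrapy.Spider):
    name = "login_demo"
    start_urls = ["https://example.com/login"]  # placeholder URL

    def parse(self, response):
        # from_response re-submits the page's own form, including hidden
        # fields such as CSRF tokens
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"username": "me", "password": "secret"},
            callback=self.after_login,
        )

    def after_login(self, response):
        if "Welcome" in response.text:  # naive success check, just for the sketch
            yield {"status": "logged in"}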
Part IV covers databases. SQLite for getting started, PostgreSQL for production, ORM with SQLAlchemy, and MongoDB for when your data doesn't fit neatly into tables. You'll learn how to connect all of this directly to your Scrapy pipelines.
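To make that concrete, here's the general shape of a minimal SQLite pipeline. The table and column names are invented for the example, and a production version would batch commits and handle errors:

import sqlite3

class SQLitePipeline:
    # Scrapy calls these hooks automatically when the spider starts and stops
    def open_spider(self, spider):
        self.conn = sqlite3.connect("products.db")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL)"
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        self.conn.execute(
            "INSERT INTO products (name, price) VALUES (?, ?)",
            (item.get("name"), item.get("price")),
        )
        return item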
Part V is about scaling. What happens when you need to scrape millions of pages? This part covers distributed crawling with Scrapy-Redis, scaling strategies, cost analysis, resource optimization, and the important conversation about balancing speed with ethics.
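If you've never seen Scrapy-Redis, the core idea fits into a few settings: point every worker at the same Redis instance and they share one request queue and one duplicate filter. Something along these lines (double-check the exact names against the scrapy-redis docs for your version):

# settings.py
# Shared scheduler: every worker pulls requests from the same Redis queue
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Shared duplicate filter, so two workers never fetch the same URL
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue in Redis between runs instead of clearing it on shutdown
SCHEDULER_PERSIST = True

# Every worker points at the same Redis instance
REDIS_URL = "redis://localhost:6379"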
Part VI is deployment. How do you take a spider that works on your laptop and make it run reliably on a server? VPS setup, production hardening, monitoring, logging, and scheduling with cron. Real, practical stuff.
Part VII dives into the internals. Spider middlewares, downloader middlewares, Scrapy extensions, and the signals system. If you want to customize Scrapy at a deeper level, this is where you need to be.
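As a small taste, a downloader middleware is just a class with a few hook methods. This sketch rotates the User-Agent header on every outgoing request (the user agent strings are trimmed placeholders):

import random

class RandomUserAgentMiddleware:
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",          # placeholder strings
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...",
    ]

    def process_request(self, request, spider):
        # Runs for every request before it reaches the downloader
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        return None  # returning None lets the request continue through the chain

You switch a middleware like this on through the DOWNLOADER_MIDDLEWARES setting in settings.py, and that kind of wiring is what Part VII walks through.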
Part VIII covers the professional side of scraping. Proxies, IP rotation, anti-bot techniques, testing your spiders, async programming, debugging, profiling, and even building APIs with your scraped data.
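To pick just one of those topics, per-request proxies in Scrapy are surprisingly simple, because the built-in HttpProxyMiddleware honors a proxy key in the request meta. The proxy addresses below are placeholders:

import random
import scrapy

class ProxiedSpider(scrapy.Spider):
    name = "proxied"
    start_urls = ["https://example.com/"]  # placeholder URL
    PROXIES = [
        "http://proxy1.example.com:8000",  # placeholder proxies; use your provider's
        "http://proxy2.example.com:8000",
    ]

    def start_requests(self):
        for url in self.start_urls:
            # The built-in HttpProxyMiddleware reads the 'proxy' meta key
            yield scrapy.Request(url, meta={"proxy": random.choice(self.PROXIES)})

    def parse(self, response):
        yield {"url": response.url, "status": response.status}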
Part IX wraps everything up with the bigger picture. Legal and ethical considerations, the future of web scraping, and a roadmap for where to go next in your journey.
A Quick Taste
Want to see what Scrapy looks like in action? Here's all it takes to get started:
pip install scrapy
scrapy startproject myproject
cd myproject
scrapy genspider example example.com
scrapy crawl example
That's it. Five lines, and you have a spider running. The handbook takes you from this exact starting point all the way to distributed, production-grade systems.
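The genspider command generates a skeleton roughly like the one below; fill in parse and you're extracting data. The one-line body here is my addition, not part of the generated file:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    def parse(self, response):
        # The generated file leaves this empty; here it grabs the page title
        yield {"title": response.css("title::text").get()}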
One Thing I Want to Be Honest About
Web scraping is one of those fields where things break. Not because of anything you did wrong, but because the websites you're scraping change their layout, update their code, or add new protections overnight. A selector that works perfectly today might return nothing tomorrow.
I wrote this in the handbook's README, and I meant it. If you find something in the handbook that no longer works, don't get frustrated. That's not a failure. That's literally the nature of web scraping. It's a skill you develop over time, and the handbook gives you the tools and the mindset to handle it.
If you do find something outdated, feel free to open an issue on the GitHub repo. Or better yet, fix it yourself and submit a pull request. Reviewing and merging contributions is a priority for me.
Why Open Source?
I could have turned this into a paid course or a book on Amazon. Honestly, the thought crossed my mind. But I kept coming back to the same feeling I had when I was starting out, that frustration of not having a single, clear place to learn this stuff. I didn't want anyone else to go through that.
So it's free. It's on GitHub. Anyone can read it, contribute to it, and benefit from it.
What's Next?
The handbook is a living document. I'm still adding to it, refining explanations, and updating examples as Scrapy and the scraping landscape evolve. If there's a topic you think is missing or an explanation that could be clearer, I genuinely want to hear from you.
Go check it out, start from chapter one, and let me know what you think. I'd love to hear your feedback.
👉 The Scrapy Handbook on GitHub
Happy scraping! 🕷️