I installed Scrapy globally on my laptop. Then I started a new project that needed a different Scrapy version. I upgraded Scrapy and my old project broke. Dependencies conflicted. Everything was a mess.
I had to reinstall Python to fix it. Two days wasted.
Then I learned about virtual environments. Each project gets its own isolated Python environment. No more conflicts, no more breaking old projects, no more dependency hell.
Let me show you why virtual environments are essential and how to use them properly.
What Is a Virtual Environment?
A virtual environment is an isolated Python installation for your project.
Think of it like:
- A separate apartment for each project
- Each has its own furniture (packages)
- Changes in one don't affect others
Without a virtual environment:
System Python
├── scrapy==2.11
├── requests==2.31
└── All projects share these
If you upgrade for one project, ALL projects are affected!
With virtual environments:
System Python
│
├── project1/venv/
│   ├── scrapy==2.11
│   └── requests==2.31
│
├── project2/venv/
│   ├── scrapy==2.8
│   └── requests==2.28
│
└── project3/venv/
    ├── scrapy==3.0
    └── requests==2.32
Each project is independent!
Why Virtual Environments Are Essential
Reason 1: Avoid Dependency Conflicts
Scenario:
- Project A needs scrapy==2.8
- Project B needs scrapy==2.11
- Without venv: IMPOSSIBLE
- With venv: No problem!
Reason 2: Clean System Python
Keep your system Python clean. Don't pollute it with hundreds of packages.
# BAD: Installing globally
pip install scrapy # Goes to system Python
# GOOD: Installing in venv
source venv/bin/activate
pip install scrapy # Goes to venv only
Reason 3: Reproducible Environments
Share exact dependencies with team:
pip freeze > requirements.txt
Anyone can recreate your exact environment:
pip install -r requirements.txt
Reason 4: Easy Cleanup
Delete project? Delete venv folder. Done. No traces left.
rm -rf venv/ # Clean removal
Reason 5: Different Python Versions
Different projects can use different Python versions:
# Project 1: Python 3.9
python3.9 -m venv venv
# Project 2: Python 3.11
python3.11 -m venv venv
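Each venv remembers the interpreter it was created with; you can check without activating it (Linux/Mac path shown):
venv/bin/python --version  # prints the Python version this venv was built from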
Creating Virtual Environments
Method 1: Using venv (Built-in)
Python 3.3+ includes venv module:
# Create virtual environment
python3 -m venv venv
This creates a venv/ folder with isolated Python.
What's inside venv/:
venv/
├── bin/ # Executables (python, pip, scrapy)
├── lib/ # Installed packages
├── include/ # C headers
└── pyvenv.cfg # Configuration
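You can peek inside to confirm the isolated tooling is there. Exact contents vary by platform (on Windows the executables live in Scripts\ instead of bin/), and console scripts like scrapy only appear after you install the package:
ls venv/bin
# activate, pip, python, plus console scripts of installed packages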
Method 2: Using virtualenv (Third-party)
More features than built-in venv:
# Install virtualenv
pip install virtualenv
# Create virtual environment
virtualenv venv
Method 3: Using conda
For data science projects:
conda create -n myproject python=3.11
My recommendation: Use built-in venv for simplicity.
Activating Virtual Environments
Linux/Mac
source venv/bin/activate
Your prompt changes:
(venv) user@computer:~/project$
The (venv) shows you're inside the virtual environment.
Windows
venv\Scripts\activate
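That command is for cmd.exe. In PowerShell the activation script has a different name, and you may need to allow local scripts to run first:
# PowerShell
venv\Scripts\Activate.ps1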
Verify Activation
which python
# Output: /home/user/project/venv/bin/python
which pip
# Output: /home/user/project/venv/bin/pip
Both point to venv, not system!
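If which isn't available (for example on Windows), asking Python directly works everywhere; inside an activated venv, sys.prefix points at the venv folder:
python -c "import sys; print(sys.prefix)"
# /home/user/project/venv  (path will vary)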
Installing Scrapy in Virtual Environment
Step-by-Step
# 1. Create project directory
mkdir my_scraper
cd my_scraper
# 2. Create virtual environment
python3 -m venv venv
# 3. Activate it
source venv/bin/activate
# 4. Upgrade pip (optional but recommended)
pip install --upgrade pip
# 5. Install Scrapy
pip install scrapy
# 6. Verify installation
scrapy version
Output (your version may differ):
Scrapy 2.11.0
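To double-check that the package landed in the venv rather than in the system site-packages, pip show reports where it was installed:
pip show scrapy
# Look for the Location: line, e.g. .../venv/lib/python3.11/site-packages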
Install Additional Packages
pip install scrapy-playwright
pip install scrapy-selenium
pip install pandas
pip install psycopg2-binary
All go into your venv, not system!
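A quick pip list confirms the venv contains only what you installed (plus pip itself and whatever dependencies were pulled in automatically):
pip list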
Creating a Scrapy Project in Virtual Environment
Complete Workflow
# 1. Create and activate venv
mkdir ecommerce_scraper
cd ecommerce_scraper
python3 -m venv venv
source venv/bin/activate
# 2. Install Scrapy
pip install scrapy
# 3. Create Scrapy project
scrapy startproject ecommerce .
# 4. Create a spider (run this from the project root, where scrapy.cfg lives)
scrapy genspider products example.com
# 5. Install additional packages
pip install scrapy-playwright
# 6. Save dependencies
pip freeze > requirements.txt
Your directory structure:
ecommerce_scraper/
├── venv/                  # Virtual environment
├── scrapy.cfg
├── ecommerce/
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders/
│       ├── __init__.py
│       └── products.py
└── requirements.txt       # Dependencies list
Requirements.txt: Sharing Dependencies
Creating requirements.txt
pip freeze > requirements.txt
Example requirements.txt:
scrapy==2.11.0
scrapy-playwright==0.0.34
playwright==1.40.0
requests==2.31.0
pandas==2.1.4
psycopg2-binary==2.9.9
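pip freeze pins exact versions, which is ideal for reproducibility. Some teams prefer a hand-written requirements file with looser constraints instead; a sketch of that alternative (version ranges chosen for illustration) looks like this:
scrapy>=2.11,<3.0
scrapy-playwright>=0.0.34
pandas>=2.1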
Using requirements.txt
Someone else setting up your project:
# Clone project
git clone https://github.com/user/project.git
cd project
# Create venv
python3 -m venv venv
source venv/bin/activate
# Install all dependencies at once
pip install -r requirements.txt
Boom! Exact same environment.
Deactivating Virtual Environment
When done working:
deactivate
Prompt returns to normal:
user@computer:~/project$
Now you're back in system Python.
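You can verify with the same check as before; the path now points at the system interpreter (the exact path depends on your OS):
which python3
# /usr/bin/python3 or similar system path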
Multiple Virtual Environments
Different projects, different venvs:
# Project 1
cd ~/projects/scraper1
source venv/bin/activate
python --version # Python 3.9.18
# Deactivate
deactivate
# Project 2
cd ~/projects/scraper2
source venv/bin/activate
python --version # Python 3.11.7
# They don't interfere!
Virtual Environment Best Practices
Practice 1: One venv per project
# GOOD
project1/venv/
project2/venv/
project3/venv/
# BAD (don't share venv)
shared_venv/
├── project1/
├── project2/
└── project3/
Practice 2: Name it "venv"
A standard name is easy to remember and easy to add to .gitignore.
python3 -m venv venv # Standard
# Not: python3 -m venv env
# Not: python3 -m venv virtualenv
Practice 3: Add venv to .gitignore
Never commit venv to git!
# .gitignore
venv/
*.pyc
__pycache__/
.scrapy/
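To confirm git really ignores the folder, git check-ignore shows which rule matched:
git check-ignore -v venv/
# prints the .gitignore rule that matches venv/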
Practice 4: Always activate before working
# Start work session
cd project
source venv/bin/activate
# Do work
scrapy crawl myspider
# End session
deactivate
Practice 5: Keep requirements.txt updated
After installing new packages:
pip install new-package
pip freeze > requirements.txt
git add requirements.txt
git commit -m "Add new-package"
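To spot drift between what's installed and what's committed, a quick diff helps (bash syntax; empty output means the two match):
diff <(pip freeze) requirements.txt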
Common Workflows
Workflow 1: Starting New Project
# Create project
mkdir my_scraper && cd my_scraper
# Setup venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
# Install Scrapy
pip install scrapy
# Create project
scrapy startproject myproject .
# Save dependencies
pip freeze > requirements.txt
# Git
git init
echo "venv/" > .gitignore
git add .
git commit -m "Initial commit"
Workflow 2: Working on Existing Project
# Clone project
git clone https://github.com/user/project.git
cd project
# Setup venv
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers if needed
playwright install
# Run spider
scrapy crawl myspider
Workflow 3: Updating Dependencies
# Activate venv
source venv/bin/activate
# Update package
pip install --upgrade scrapy
# Update requirements.txt
pip freeze > requirements.txt
# Test everything still works
scrapy crawl myspider
# Commit changes
git add requirements.txt
git commit -m "Update Scrapy to 2.11.0"
Troubleshooting Virtual Environments
Issue 1: "python not found" after activation
Problem: Wrong Python version or venv corrupted.
Solution:
deactivate
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Issue 2: Packages not found
Problem: Forgot to activate venv.
Check:
which python
# Should show: /path/to/project/venv/bin/python
# Not: /usr/bin/python
Fix:
source venv/bin/activate
Issue 3: Different pip
Problem: Using system pip instead of venv pip.
Check:
which pip
# Should show: /path/to/project/venv/bin/pip
Fix:
source venv/bin/activate
Or use python -m pip:
python -m pip install scrapy
Issue 4: Venv too large
Problem: Venv takes up lots of space.
Check size:
du -sh venv
Common causes:
- Heavy packages with compiled extensions (pandas, numpy, lxml)
- Many transitive dependencies you no longer need
Note that pip's download cache lives outside the venv (usually ~/.cache/pip), so purging it frees disk space in general but won't shrink the venv itself:
pip cache purge
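To see which packages actually account for the size, measure site-packages directly (the glob assumes the standard Linux/Mac venv layout):
du -sh venv/lib/python*/site-packages/* | sort -h | tail -n 10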
Advanced: Shell Aliases
Make venv management easier with aliases.
Add to ~/.bashrc or ~/.zshrc
# Create and activate venv
alias venv-create='python3 -m venv venv'
alias venv-activate='source venv/bin/activate'
# Common venv commands
alias venv-install='pip install -r requirements.txt'
alias venv-save='pip freeze > requirements.txt'
alias venv-clean='deactivate && rm -rf venv'
# Quick project setup
venv-new() {
    python3 -m venv venv
    source venv/bin/activate
    pip install --upgrade pip
    pip install scrapy
    pip freeze > requirements.txt
}
Usage
# Create new project
cd myproject
venv-new
# Work on existing project
cd existing-project
venv-activate
Virtual Environment with Docker
For ultimate isolation, use Docker with virtual environments.
Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Create venv
RUN python -m venv /app/venv
# Put the venv's bin directory first on PATH (same effect as activating it)
ENV PATH="/app/venv/bin:$PATH"
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy project
COPY . .
CMD ["scrapy", "crawl", "myspider"]
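Building and running it is the standard Docker workflow (the image name myscraper is arbitrary):
docker build -t myscraper .
docker run --rm myscraper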
Benefits:
- Isolated OS environment
- Consistent across all systems
- Easy deployment
Real-World Project Structure
Complete Setup
my_scraper/
├── venv/                  # Virtual environment (not in git)
├── .gitignore             # Ignore venv, cache, etc.
├── README.md              # Project documentation
├── requirements.txt       # Python dependencies
├── scrapy.cfg             # Scrapy config
├── myproject/
│   ├── __init__.py
│   ├── settings.py
│   ├── items.py
│   ├── pipelines.py
│   ├── middlewares.py
│   └── spiders/
│       ├── __init__.py
│       ├── products.py
│       └── categories.py
├── scripts/
│   ├── run_spider.sh      # Helper scripts
│   └── deploy.sh
├── data/                  # Output data
└── logs/                  # Log files
README.md
# My Scraper Project
## Setup
1. Create virtual environment:
python3 -m venv venv
2. Activate virtual environment:
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
3. Install dependencies:
pip install -r requirements.txt
4. Install Playwright browsers (if needed):
playwright install
## Running
# Activate venv
source venv/bin/activate
# Run spider
scrapy crawl products
# Or use the helper script
./scripts/run_spider.sh products
## Development
Update dependencies:
pip install new-package
pip freeze > requirements.txt
Virtual Environment in Production
On Server
# Deploy to server
ssh user@server
cd /opt/scrapers
# Clone project
git clone https://github.com/user/project.git myproject
cd myproject
# Setup venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run with nohup
nohup scrapy crawl myspider > spider.log 2>&1 &
Systemd Service (Better)
# /etc/systemd/system/myspider.service
[Unit]
Description=My Scrapy Spider
After=network.target
[Service]
Type=simple
User=scrapy
WorkingDirectory=/opt/scrapers/myproject
Environment="PATH=/opt/scrapers/myproject/venv/bin"
ExecStart=/opt/scrapers/myproject/venv/bin/scrapy crawl myspider
Restart=always
[Install]
WantedBy=multi-user.target
Uses venv automatically!
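After placing the unit file, reload systemd and enable the service (requires root or sudo):
sudo systemctl daemon-reload
sudo systemctl enable --now myspider
sudo systemctl status myspider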
Comparing Virtual Environment Tools
venv (Built-in)
Pros:
- Built into Python 3.3+
- No installation needed
- Simple and fast
- Official
Cons:
- Basic features only
- Doesn't manage Python versions for you (the version must already be installed)
Best for: Most Scrapy projects
virtualenv
Pros:
- More features than venv
- Faster
- Works with Python 2
Cons:
- Need to install it
- Extra dependency
Best for: Complex projects, Python 2 support
conda
Pros:
- Manages Python versions
- Non-Python dependencies (C libraries)
- Great for data science
Cons:
- Large download
- Slower
- Overkill for Scrapy
Best for: Data science + scraping projects
pipenv
Pros:
- Manages venv + requirements.txt automatically
- Lockfile for reproducibility
- Modern workflow
Cons:
- Slower
- More complex
- Extra dependency
Best for: Teams wanting strict dependency management
poetry
Pros:
- Modern dependency management
- Great for packaging
- Lockfile
Cons:
- Learning curve
- Different workflow
- Extra complexity
Best for: Publishing packages, large projects
My recommendation for Scrapy: Use built-in venv. Simple, fast, works everywhere.
Summary
Why virtual environments:
- Isolate project dependencies
- Avoid conflicts
- Clean system Python
- Reproducible environments
- Easy cleanup
Basic commands:
# Create
python3 -m venv venv
# Activate
source venv/bin/activate
# Install
pip install scrapy
# Save dependencies
pip freeze > requirements.txt
# Deactivate
deactivate
# Remove
rm -rf venv
Best practices:
- One venv per project
- Name it "venv"
- Add to .gitignore
- Keep requirements.txt updated
- Always activate before work
- Deactivate when done
Remember:
- Never commit venv to git
- Always use requirements.txt
- Activate venv before running spider
- Update requirements.txt after changes
- Document setup in README
Virtual environments are not optional. They're essential for professional Python development. Use them for every Scrapy project!
Happy scraping! 🕷️