Muhammad Ikramullah Khan

Virtual Environments: The Right Way to Run Scrapy Projects

I installed Scrapy globally on my laptop. Then I started a new project that needed a different Scrapy version. I upgraded Scrapy and my old project broke. Dependencies conflicted. Everything was a mess.

I had to reinstall Python to fix it. Two days wasted.

Then I learned about virtual environments. Each project gets its own isolated Python environment. No more conflicts, no more breaking old projects, no more dependency hell.

Let me show you why virtual environments are essential and how to use them properly.


What Is a Virtual Environment?

A virtual environment is an isolated Python installation for your project.

Think of it like:

  • A separate apartment for each project
  • Each has its own furniture (packages)
  • Changes in one don't affect others

Without virtual environments:

System Python
├── scrapy==2.11
├── requests==2.31
└── All projects share these

If you upgrade a package for one project, ALL projects are affected!

With virtual environments:

System Python
│
├── project1/venv/
│   ├── scrapy==2.11
│   └── requests==2.31
│
├── project2/venv/
│   ├── scrapy==2.8
│   └── requests==2.28
│
└── project3/venv/
    ├── scrapy==2.12
    └── requests==2.32

Each project is independent!


Why Virtual Environments Are Essential

Reason 1: Avoid Dependency Conflicts

Scenario:

  • Project A needs scrapy==2.8
  • Project B needs scrapy==2.11
  • Without venv: IMPOSSIBLE
  • With venv: No problem!

Reason 2: Clean System Python

Keep your system Python clean. Don't pollute it with hundreds of packages.

# BAD: Installing globally
pip install scrapy  # Goes to system Python

# GOOD: Installing in venv
source venv/bin/activate
pip install scrapy  # Goes to venv only

Reason 3: Reproducible Environments

Share exact dependencies with your team:

pip freeze > requirements.txt

Anyone can recreate your exact environment:

pip install -r requirements.txt

Reason 4: Easy Cleanup

Delete project? Delete venv folder. Done. No traces left.

rm -rf venv/  # Clean removal

Reason 5: Different Python Versions

Different projects can use different Python versions:

# Project 1: Python 3.9
python3.9 -m venv venv

# Project 2: Python 3.11
python3.11 -m venv venv
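
Each venv's python is a copy of (or symlink to) the interpreter that created it, which you can confirm without even activating:

venv/bin/python --version  # e.g. Python 3.9.18 for project 1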

Creating Virtual Environments

Method 1: Using venv (Built-in)

Python 3.3+ includes the venv module:

# Create virtual environment
python3 -m venv venv

This creates a venv/ folder containing an isolated Python installation.

What's inside venv/:

venv/
├── bin/          # Executables (python, pip, scrapy)
├── lib/          # Installed packages
├── include/      # C headers
└── pyvenv.cfg    # Configuration
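
The pyvenv.cfg file is what marks the folder as a virtual environment. A minimal sketch of its contents (paths and version numbers will differ on your machine, and newer Python versions add a few extra keys):

home = /usr/bin
include-system-site-packages = false
version = 3.11.7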

Method 2: Using virtualenv (Third-party)

virtualenv offers more features than the built-in venv:

# Install virtualenv
pip install virtualenv

# Create virtual environment
virtualenv venv
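
One example of those extra features: virtualenv accepts a -p/--python flag to target a specific interpreter, so a single virtualenv install can create environments for several Python versions (assuming python3.9 is installed):

# Create a venv backed by a specific interpreter
virtualenv -p python3.9 venv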

Method 3: Using conda

For data science projects:

conda create -n myproject python=3.11
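
Unlike venv, conda environments live in a central location rather than inside the project folder, and you activate them by name:

conda activate myproject
# ... work ...
conda deactivate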

My recommendation: Use built-in venv for simplicity.


Activating Virtual Environments

Linux/Mac

source venv/bin/activate

Your prompt changes:

(venv) user@computer:~/project$

The (venv) shows you're inside the virtual environment.

Windows

venv\Scripts\activate
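
That command is for cmd.exe. In PowerShell the script name differs, and you may first need to allow local scripts (a per-user setting; check your organization's policy):

# PowerShell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
venv\Scripts\Activate.ps1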

Verify Activation

which python
# Output: /home/user/project/venv/bin/python

which pip
# Output: /home/user/project/venv/bin/pip

Both point to venv, not system!
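
A shell-independent check that also works on Windows (where the which command may be missing) is to ask Python itself:

python -c "import sys; print(sys.prefix)"
# Inside a venv this prints the venv path, e.g. /home/user/project/venv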


Installing Scrapy in Virtual Environment

Step-by-Step

# 1. Create project directory
mkdir my_scraper
cd my_scraper

# 2. Create virtual environment
python3 -m venv venv

# 3. Activate it
source venv/bin/activate

# 4. Upgrade pip (optional but recommended)
pip install --upgrade pip

# 5. Install Scrapy
pip install scrapy

# 6. Verify installation
scrapy version

Output:

Scrapy 2.11.0

Install Additional Packages

pip install scrapy-playwright
pip install scrapy-selenium
pip install pandas
pip install psycopg2-binary

All of them go into your venv, not the system Python!
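
If you ever doubt where a package landed, pip show prints its install location; the Location line should point inside your venv:

pip show scrapy
# Look for a line like:
# Location: /home/user/project/venv/lib/python3.11/site-packages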


Creating a Scrapy Project in Virtual Environment

Complete Workflow

# 1. Create and activate venv
mkdir ecommerce_scraper
cd ecommerce_scraper
python3 -m venv venv
source venv/bin/activate

# 2. Install Scrapy
pip install scrapy

# 3. Create Scrapy project
scrapy startproject ecommerce .

# 4. Create a spider (run this from the project root)
scrapy genspider products example.com

# 5. Install additional packages
pip install scrapy-playwright

# 6. Save dependencies
pip freeze > requirements.txt

Your directory structure:

ecommerce_scraper/
├── venv/                   # Virtual environment
├── scrapy.cfg
├── ecommerce/
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders/
│       ├── __init__.py
│       └── products.py
└── requirements.txt        # Dependencies list

Requirements.txt: Sharing Dependencies

Creating requirements.txt

pip freeze > requirements.txt

Example requirements.txt:

scrapy==2.11.0
scrapy-playwright==0.0.34
playwright==1.40.0
requests==2.31.0
pandas==2.1.4
psycopg2-binary==2.9.9

Using requirements.txt

Someone else setting up your project:

# Clone project
git clone https://github.com/user/project.git
cd project

# Create venv
python3 -m venv venv
source venv/bin/activate

# Install all dependencies at once
pip install -r requirements.txt

Boom! Exact same environment.


Deactivating Virtual Environment

When done working:

deactivate

Prompt returns to normal:

user@computer:~/project$

Now you're back in system Python.


Multiple Virtual Environments

Different projects, different venvs:

# Project 1
cd ~/projects/scraper1
source venv/bin/activate
python --version  # Python 3.9.18

# Deactivate
deactivate

# Project 2
cd ~/projects/scraper2
source venv/bin/activate
python --version  # Python 3.11.7

# They don't interfere!

Virtual Environment Best Practices

Practice 1: One venv per project

# GOOD
project1/venv/
project2/venv/
project3/venv/

# BAD (don't share venv)
shared_venv/
├── project1/
├── project2/
└── project3/

Practice 2: Name it "venv"

A standard name is easy to remember and easy to .gitignore.

python3 -m venv venv  # Standard
# Not: python3 -m venv env
# Not: python3 -m venv virtualenv

Practice 3: Add venv to .gitignore

Never commit venv to git!

# .gitignore
venv/
*.pyc
__pycache__/
.scrapy/

Practice 4: Always activate before working

# Start work session
cd project
source venv/bin/activate

# Do work
scrapy crawl myspider

# End session
deactivate

Practice 5: Keep requirements.txt updated

After installing new packages:

pip install new-package
pip freeze > requirements.txt
git add requirements.txt
git commit -m "Add new-package"

Common Workflows

Workflow 1: Starting New Project

# Create project
mkdir my_scraper && cd my_scraper

# Setup venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install Scrapy
pip install scrapy

# Create project
scrapy startproject myproject .

# Save dependencies
pip freeze > requirements.txt

# Git
git init
echo "venv/" > .gitignore
git add .
git commit -m "Initial commit"

Workflow 2: Working on Existing Project

# Clone project
git clone https://github.com/user/project.git
cd project

# Setup venv
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers if needed
playwright install

# Run spider
scrapy crawl myspider

Workflow 3: Updating Dependencies

# Activate venv
source venv/bin/activate

# Update package
pip install --upgrade scrapy

# Update requirements.txt
pip freeze > requirements.txt

# Test everything still works
scrapy crawl myspider

# Commit changes
git add requirements.txt
git commit -m "Update Scrapy to 2.11.0"

Troubleshooting Virtual Environments

Issue 1: "python not found" after activation

Problem: Wrong Python version or venv corrupted.

Solution:

deactivate
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Issue 2: Packages not found

Problem: Forgot to activate venv.

Check:

which python
# Should show: /path/to/project/venv/bin/python
# Not: /usr/bin/python

Fix:

source venv/bin/activate

Issue 3: Different pip

Problem: Using system pip instead of venv pip.

Check:

which pip
# Should show: /path/to/project/venv/bin/pip

Fix:

source venv/bin/activate

Or use python -m pip:

python -m pip install scrapy

Issue 4: Venv too large

Problem: Venv takes up lots of space.

Check size:

du -sh venv

Common causes:

  • Heavy packages (Playwright browsers, pandas, compiled database drivers)
  • pip's download cache; it lives outside the venv (e.g. ~/.cache/pip on Linux) but still eats disk

Clean pip cache:

pip cache purge
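
Before purging, you can check where the cache lives and how big it is (needs a reasonably recent pip, 20.1+):

pip cache dir   # cache location
pip cache info  # location plus size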

Advanced: Shell Aliases

Make venv management easier with aliases.

Add to ~/.bashrc or ~/.zshrc

# Create and activate venv
alias venv-create='python3 -m venv venv'
alias venv-activate='source venv/bin/activate'

# Common venv commands
alias venv-install='pip install -r requirements.txt'
alias venv-save='pip freeze > requirements.txt'
alias venv-clean='deactivate && rm -rf venv'

# Quick project setup
venv-new() {
    python3 -m venv venv
    source venv/bin/activate
    pip install --upgrade pip
    pip install scrapy
    pip freeze > requirements.txt
}

Usage

# Create new project
cd myproject
venv-new

# Work on existing project
cd existing-project
venv-activate

Virtual Environment with Docker

For ultimate isolation, use Docker with virtual environments.

Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Create venv
RUN python -m venv /app/venv

# "Activate" the venv by putting its bin directory first on PATH
ENV PATH="/app/venv/bin:$PATH"

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy project
COPY . .

CMD ["scrapy", "crawl", "myspider"]
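
Building and running it, assuming the Dockerfile sits in the project root (the image name my-scraper is just a placeholder):

docker build -t my-scraper .
docker run --rm my-scraper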

Benefits:

  • Isolated OS environment
  • Consistent across all systems
  • Easy deployment

Real-World Project Structure

Complete Setup

my_scraper/
├── venv/                      # Virtual environment (not in git)
├── .gitignore                 # Ignore venv, cache, etc
├── README.md                  # Project documentation
├── requirements.txt           # Python dependencies
├── scrapy.cfg                 # Scrapy config
├── myproject/
│   ├── __init__.py
│   ├── settings.py
│   ├── items.py
│   ├── pipelines.py
│   ├── middlewares.py
│   └── spiders/
│       ├── __init__.py
│       ├── products.py
│       └── categories.py
├── scripts/
│   ├── run_spider.sh          # Helper scripts
│   └── deploy.sh
├── data/                      # Output data
└── logs/                      # Log files

README.md

# My Scraper Project

## Setup

1. Create virtual environment:

   python3 -m venv venv

2. Activate virtual environment:

   source venv/bin/activate  # Linux/Mac
   venv\Scripts\activate     # Windows

3. Install dependencies:

   pip install -r requirements.txt

4. Install Playwright browsers (if needed):

   playwright install

## Running

# Activate venv
source venv/bin/activate

# Run spider
scrapy crawl products

# Or use a script
./scripts/run_spider.sh products

## Development

Update dependencies:

pip install new-package
pip freeze > requirements.txt

Virtual Environment in Production

On Server

# Deploy to server
ssh user@server
cd /opt/scrapers

# Clone project
git clone https://github.com/user/project.git myproject
cd myproject

# Setup venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run with nohup
nohup scrapy crawl myspider > spider.log 2>&1 &

Systemd Service (Better)

# /etc/systemd/system/myspider.service
[Unit]
Description=My Scrapy Spider
After=network.target

[Service]
Type=simple
User=scrapy
WorkingDirectory=/opt/scrapers/myproject
Environment="PATH=/opt/scrapers/myproject/venv/bin:/usr/bin:/bin"
ExecStart=/opt/scrapers/myproject/venv/bin/scrapy crawl myspider
Restart=always

[Install]
WantedBy=multi-user.target

Uses venv automatically!
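
Enable and manage it with the usual systemd commands:

sudo systemctl daemon-reload
sudo systemctl enable --now myspider
journalctl -u myspider -f  # follow the logs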


Comparing Virtual Environment Tools

venv (Built-in)

Pros:

  • Built into Python 3.3+
  • No installation needed
  • Simple and fast
  • Official

Cons:

  • Basic features only
  • Tied to the interpreter that creates it; you need each Python version installed separately

Best for: Most Scrapy projects

virtualenv

Pros:

  • More features than built-in venv
  • Faster environment creation
  • Older releases supported Python 2

Cons:

  • Need to install it
  • Extra dependency

Best for: Complex setups, legacy Python support

conda

Pros:

  • Manages Python versions
  • Non-Python dependencies (C libraries)
  • Great for data science

Cons:

  • Large download
  • Slower
  • Overkill for Scrapy

Best for: Data science + scraping projects

pipenv

Pros:

  • Manages venv + requirements.txt automatically
  • Lockfile for reproducibility
  • Modern workflow

Cons:

  • Slower
  • More complex
  • Extra dependency

Best for: Teams wanting strict dependency management

poetry

Pros:

  • Modern dependency management
  • Great for packaging
  • Lockfile

Cons:

  • Learning curve
  • Different workflow
  • Extra complexity

Best for: Publishing packages, large projects

My recommendation for Scrapy: Use built-in venv. Simple, fast, works everywhere.


Summary

Why virtual environments:

  • Isolate project dependencies
  • Avoid conflicts
  • Clean system Python
  • Reproducible environments
  • Easy cleanup

Basic commands:

# Create
python3 -m venv venv

# Activate
source venv/bin/activate

# Install
pip install scrapy

# Save dependencies
pip freeze > requirements.txt

# Deactivate
deactivate

# Remove
rm -rf venv

Best practices:

  • One venv per project
  • Name it "venv"
  • Add to .gitignore
  • Keep requirements.txt updated
  • Always activate before work
  • Deactivate when done

Remember:

  • Never commit venv to git
  • Always use requirements.txt
  • Activate venv before running spider
  • Update requirements.txt after changes
  • Document setup in README

Virtual environments are not optional. They're essential for professional Python development. Use them for every Scrapy project!

Happy scraping! 🕷️
