Muhammad Ikramullah Khan

Virtual Environments: The Right Way to Run Scrapy Projects

I installed Scrapy globally on my laptop. Then I started a new project that needed a different Scrapy version. I upgraded Scrapy and my old project broke. Dependencies conflicted. Everything was a mess.

I had to reinstall Python to fix it. Two days wasted.

Then I learned about virtual environments. Each project gets its own isolated Python environment. No more conflicts, no more breaking old projects, no more dependency hell.

Let me show you why virtual environments are essential and how to use them properly.


What Is a Virtual Environment?

A virtual environment is an isolated Python installation for your project.

Think of it like:

  • A separate apartment for each project
  • Each has its own furniture (packages)
  • Changes in one don't affect others

Without virtual environments:

System Python
├── scrapy==2.11
├── requests==2.31
└── All projects share these

If you upgrade a package for one project, ALL projects are affected!

With virtual environments:

System Python
│
├── project1/venv/
│   ├── scrapy==2.11
│   └── requests==2.31
│
├── project2/venv/
│   ├── scrapy==2.8
│   └── requests==2.28
│
└── project3/venv/
    ├── scrapy==2.12
    └── requests==2.32

Each project is independent!


Why Virtual Environments Are Essential

Reason 1: Avoid Dependency Conflicts

Scenario:

  • Project A needs scrapy==2.8
  • Project B needs scrapy==2.11
  • Without venv: IMPOSSIBLE
  • With venv: No problem!

Reason 2: Clean System Python

Keep your system Python clean. Don't pollute it with hundreds of packages.

# BAD: Installing globally
pip install scrapy  # Goes to system Python

# GOOD: Installing in venv
source venv/bin/activate
pip install scrapy  # Goes to venv only

Reason 3: Reproducible Environments

Share exact dependencies with your team:

pip freeze > requirements.txt

Anyone can recreate your exact environment:

pip install -r requirements.txt

Reason 4: Easy Cleanup

Delete project? Delete venv folder. Done. No traces left.

rm -rf venv/  # Clean removal

Reason 5: Different Python Versions

Different projects can use different Python versions:

# Project 1: Python 3.9
python3.9 -m venv venv

# Project 2: Python 3.11
python3.11 -m venv venv
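
Each venv's python is a copy of (or symlink to) the interpreter that created it, which you can confirm without even activating:

venv/bin/python --version  # e.g. Python 3.9.18 for project 1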

Creating Virtual Environments

Method 1: Using venv (Built-in)

Python 3.3+ includes the venv module:

# Create virtual environment
python3 -m venv venv

This creates a venv/ folder containing an isolated Python installation.

What's inside venv/:

venv/
├── bin/          # Executables (python, pip, scrapy)
├── lib/          # Installed packages
├── include/      # C headers
└── pyvenv.cfg    # Configuration
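
The pyvenv.cfg file is what marks the folder as a virtual environment. A minimal sketch of its contents (paths and version numbers will differ on your machine, and newer Python versions add a few extra keys):

home = /usr/bin
include-system-site-packages = false
version = 3.11.7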

Method 2: Using virtualenv (Third-party)

virtualenv offers more features than the built-in venv:

# Install virtualenv
pip install virtualenv

# Create virtual environment
virtualenv venv
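
One example of those extra features: virtualenv accepts a -p/--python flag to target a specific interpreter, so a single virtualenv install can create environments for several Python versions (assuming python3.9 is installed):

# Create a venv backed by a specific interpreter
virtualenv -p python3.9 venv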

Method 3: Using conda

For data science projects:

conda create -n myproject python=3.11
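
Unlike venv, conda environments live in a central location rather than inside the project folder, and you activate them by name:

conda activate myproject
# ... work ...
conda deactivate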

My recommendation: Use built-in venv for simplicity.


Activating Virtual Environments

Linux/Mac

source venv/bin/activate

Your prompt changes:

(venv) user@computer:~/project$

The (venv) shows you're inside the virtual environment.

Windows

venv\Scripts\activate
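
That command is for cmd.exe. In PowerShell the script name differs, and you may first need to allow local scripts (a per-user setting; check your organization's policy):

# PowerShell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
venv\Scripts\Activate.ps1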

Verify Activation

which python
# Output: /home/user/project/venv/bin/python

which pip
# Output: /home/user/project/venv/bin/pip

Both point to venv, not system!
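
A shell-independent check that also works on Windows (where the which command may be missing) is to ask Python itself:

python -c "import sys; print(sys.prefix)"
# Inside a venv this prints the venv path, e.g. /home/user/project/venv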


Installing Scrapy in Virtual Environment

Step-by-Step

# 1. Create project directory
mkdir my_scraper
cd my_scraper

# 2. Create virtual environment
python3 -m venv venv

# 3. Activate it
source venv/bin/activate

# 4. Upgrade pip (optional but recommended)
pip install --upgrade pip

# 5. Install Scrapy
pip install scrapy

# 6. Verify installation
scrapy version

Output:

Scrapy 2.11.0

Install Additional Packages

pip install scrapy-playwright
pip install scrapy-selenium
pip install pandas
pip install psycopg2-binary

All of them go into your venv, not the system Python!
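
If you ever doubt where a package landed, pip show prints its install location; the Location line should point inside your venv:

pip show scrapy
# Look for a line like:
# Location: /home/user/project/venv/lib/python3.11/site-packages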


Creating a Scrapy Project in Virtual Environment

Complete Workflow

# 1. Create and activate venv
mkdir ecommerce_scraper
cd ecommerce_scraper
python3 -m venv venv
source venv/bin/activate

# 2. Install Scrapy
pip install scrapy

# 3. Create Scrapy project
scrapy startproject ecommerce .

# 4. Create a spider (run this from the project root)
scrapy genspider products example.com

# 5. Install additional packages
pip install scrapy-playwright

# 6. Save dependencies
pip freeze > requirements.txt

Your directory structure:

ecommerce_scraper/
├── venv/                   # Virtual environment
├── scrapy.cfg
├── ecommerce/
│   ├── __init__.py
│   ├── items.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders/
│       ├── __init__.py
│       └── products.py
└── requirements.txt        # Dependencies list

Requirements.txt: Sharing Dependencies

Creating requirements.txt

pip freeze > requirements.txt

Example requirements.txt:

scrapy==2.11.0
scrapy-playwright==0.0.34
playwright==1.40.0
requests==2.31.0
pandas==2.1.4
psycopg2-binary==2.9.9

Using requirements.txt

Someone else setting up your project:

# Clone project
git clone https://github.com/user/project.git
cd project

# Create venv
python3 -m venv venv
source venv/bin/activate

# Install all dependencies at once
pip install -r requirements.txt

Boom! Exact same environment.


Deactivating Virtual Environment

When done working:

deactivate

Prompt returns to normal:

user@computer:~/project$

Now you're back in system Python.


Multiple Virtual Environments

Different projects, different venvs:

# Project 1
cd ~/projects/scraper1
source venv/bin/activate
python --version  # Python 3.9.18

# Deactivate
deactivate

# Project 2
cd ~/projects/scraper2
source venv/bin/activate
python --version  # Python 3.11.7

# They don't interfere!

Virtual Environment Best Practices

Practice 1: One venv per project

# GOOD
project1/venv/
project2/venv/
project3/venv/

# BAD (don't share venv)
shared_venv/
├── project1/
├── project2/
└── project3/

Practice 2: Name it "venv"

A standard name is easy to remember and easy to .gitignore.

python3 -m venv venv  # Standard
# Not: python3 -m venv env
# Not: python3 -m venv virtualenv

Practice 3: Add venv to .gitignore

Never commit venv to git!

# .gitignore
venv/
*.pyc
__pycache__/
.scrapy/

Practice 4: Always activate before working

# Start work session
cd project
source venv/bin/activate

# Do work
scrapy crawl myspider

# End session
deactivate

Practice 5: Keep requirements.txt updated

After installing new packages:

pip install new-package
pip freeze > requirements.txt
git add requirements.txt
git commit -m "Add new-package"

Common Workflows

Workflow 1: Starting New Project

# Create project
mkdir my_scraper && cd my_scraper

# Setup venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# Install Scrapy
pip install scrapy

# Create project
scrapy startproject myproject .

# Save dependencies
pip freeze > requirements.txt

# Git
git init
echo "venv/" > .gitignore
git add .
git commit -m "Initial commit"

Workflow 2: Working on Existing Project

# Clone project
git clone https://github.com/user/project.git
cd project

# Setup venv
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers if needed
playwright install

# Run spider
scrapy crawl myspider

Workflow 3: Updating Dependencies

# Activate venv
source venv/bin/activate

# Update package
pip install --upgrade scrapy

# Update requirements.txt
pip freeze > requirements.txt

# Test everything still works
scrapy crawl myspider

# Commit changes
git add requirements.txt
git commit -m "Update Scrapy to 2.11.0"

Troubleshooting Virtual Environments

Issue 1: "python not found" after activation

Problem: Wrong Python version or venv corrupted.

Solution:

deactivate
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Issue 2: Packages not found

Problem: Forgot to activate venv.

Check:

which python
# Should show: /path/to/project/venv/bin/python
# Not: /usr/bin/python

Fix:

source venv/bin/activate

Issue 3: Different pip

Problem: Using system pip instead of venv pip.

Check:

which pip
# Should show: /path/to/project/venv/bin/pip

Fix:

source venv/bin/activate

Or use python -m pip:

python -m pip install scrapy

Issue 4: Venv too large

Problem: Venv takes up lots of space.

Check size:

du -sh venv

Common causes:

  • Heavy packages (Playwright browsers, pandas, compiled database drivers)
  • pip's download cache; it lives outside the venv (e.g. ~/.cache/pip on Linux) but still eats disk

Clean pip cache:

pip cache purge
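
Before purging, you can check where the cache lives and how big it is (needs a reasonably recent pip, 20.1+):

pip cache dir   # cache location
pip cache info  # location plus size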

Advanced: Shell Aliases

Make venv management easier with aliases.

Add to ~/.bashrc or ~/.zshrc

# Create and activate venv
alias venv-create='python3 -m venv venv'
alias venv-activate='source venv/bin/activate'

# Common venv commands
alias venv-install='pip install -r requirements.txt'
alias venv-save='pip freeze > requirements.txt'
alias venv-clean='deactivate && rm -rf venv'

# Quick project setup
venv-new() {
    python3 -m venv venv
    source venv/bin/activate
    pip install --upgrade pip
    pip install scrapy
    pip freeze > requirements.txt
}

Usage

# Create new project
cd myproject
venv-new

# Work on existing project
cd existing-project
venv-activate

Virtual Environment with Docker

For ultimate isolation, use Docker with virtual environments.

Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Create venv
RUN python -m venv /app/venv

# "Activate" the venv by putting its bin directory first on PATH
ENV PATH="/app/venv/bin:$PATH"

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy project
COPY . .

CMD ["scrapy", "crawl", "myspider"]
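
Building and running it, assuming the Dockerfile sits in the project root (the image name my-scraper is just a placeholder):

docker build -t my-scraper .
docker run --rm my-scraper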

Benefits:

  • Isolated OS environment
  • Consistent across all systems
  • Easy deployment

Real-World Project Structure

Complete Setup

my_scraper/
├── venv/                      # Virtual environment (not in git)
├── .gitignore                 # Ignore venv, cache, etc
├── README.md                  # Project documentation
├── requirements.txt           # Python dependencies
├── scrapy.cfg                 # Scrapy config
├── myproject/
│   ├── __init__.py
│   ├── settings.py
│   ├── items.py
│   ├── pipelines.py
│   ├── middlewares.py
│   └── spiders/
│       ├── __init__.py
│       ├── products.py
│       └── categories.py
├── scripts/
│   ├── run_spider.sh          # Helper scripts
│   └── deploy.sh
├── data/                      # Output data
└── logs/                      # Log files

README.md

# My Scraper Project

## Setup

1. Create virtual environment:

   python3 -m venv venv

2. Activate virtual environment:

   source venv/bin/activate  # Linux/Mac
   venv\Scripts\activate     # Windows

3. Install dependencies:

   pip install -r requirements.txt

4. Install Playwright browsers (if needed):

   playwright install

## Running

# Activate venv
source venv/bin/activate

# Run spider
scrapy crawl products

# Or use a script
./scripts/run_spider.sh products

## Development

Update dependencies:

pip install new-package
pip freeze > requirements.txt

Virtual Environment in Production

On Server

# Deploy to server
ssh user@server
cd /opt/scrapers

# Clone project
git clone https://github.com/user/project.git myproject
cd myproject

# Setup venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run with nohup
nohup scrapy crawl myspider > spider.log 2>&1 &

Systemd Service (Better)

# /etc/systemd/system/myspider.service
[Unit]
Description=My Scrapy Spider
After=network.target

[Service]
Type=simple
User=scrapy
WorkingDirectory=/opt/scrapers/myproject
Environment="PATH=/opt/scrapers/myproject/venv/bin:/usr/bin:/bin"
ExecStart=/opt/scrapers/myproject/venv/bin/scrapy crawl myspider
Restart=always

[Install]
WantedBy=multi-user.target

Uses venv automatically!
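
Enable and manage it with the usual systemd commands:

sudo systemctl daemon-reload
sudo systemctl enable --now myspider
journalctl -u myspider -f  # follow the logs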


Comparing Virtual Environment Tools

venv (Built-in)

Pros:

  • Built into Python 3.3+
  • No installation needed
  • Simple and fast
  • Official

Cons:

  • Basic features only
  • Tied to the interpreter that creates it; you need each Python version installed separately

Best for: Most Scrapy projects

virtualenv

Pros:

  • More features than built-in venv
  • Faster environment creation
  • Older releases supported Python 2

Cons:

  • Need to install it
  • Extra dependency

Best for: Complex setups, legacy Python support

conda

Pros:

  • Manages Python versions
  • Non-Python dependencies (C libraries)
  • Great for data science

Cons:

  • Large download
  • Slower
  • Overkill for Scrapy

Best for: Data science + scraping projects

pipenv

Pros:

  • Manages venv + requirements.txt automatically
  • Lockfile for reproducibility
  • Modern workflow

Cons:

  • Slower
  • More complex
  • Extra dependency

Best for: Teams wanting strict dependency management

poetry

Pros:

  • Modern dependency management
  • Great for packaging
  • Lockfile

Cons:

  • Learning curve
  • Different workflow
  • Extra complexity

Best for: Publishing packages, large projects

My recommendation for Scrapy: Use built-in venv. Simple, fast, works everywhere.


Summary

Why virtual environments:

  • Isolate project dependencies
  • Avoid conflicts
  • Clean system Python
  • Reproducible environments
  • Easy cleanup

Basic commands:

# Create
python3 -m venv venv

# Activate
source venv/bin/activate

# Install
pip install scrapy

# Save dependencies
pip freeze > requirements.txt

# Deactivate
deactivate

# Remove
rm -rf venv

Best practices:

  • One venv per project
  • Name it "venv"
  • Add to .gitignore
  • Keep requirements.txt updated
  • Always activate before work
  • Deactivate when done

Remember:

  • Never commit venv to git
  • Always use requirements.txt
  • Activate venv before running spider
  • Update requirements.txt after changes
  • Document setup in README

Virtual environments are not optional. They're essential for professional Python development. Use them for every Scrapy project!

Happy scraping! 🕷️
