Virtual Environments: Why Your Projects Keep Breaking Each Other

Akhilesh

You install NumPy for Project A. Works great.

Six months later you start Project B. It needs a newer version of NumPy with features that did not exist before. You upgrade. Project B works.

You go back to Project A. It breaks. The new NumPy version changed something your old code depended on.

You downgrade NumPy. Project A works again. Project B breaks.

You go in circles. Neither project is stable. Your global Python installation is a mess of conflicting requirements.

This is the dependency hell every Python developer hits eventually. Virtual environments are the solution. One isolated environment per project. Each with its own packages and its own versions. No conflicts possible.


What a Virtual Environment Actually Is

A virtual environment is a folder. Inside that folder lives a self-contained Python setup: a copy of (or symlink to) the Python interpreter, its own pip, and every package you install while the environment is active.

When you activate an environment, your terminal uses the Python and pip inside that folder instead of the global ones. Install something while active and it goes into the folder. Deactivate and you are back to the global installation. The folder's packages are gone from view.

Nothing is shared between environments. NumPy 1.20 in one environment and NumPy 1.26 in another coexist perfectly because they live in different folders.


Creating and Using a Virtual Environment

python -m venv venv

This creates a folder called venv in your current directory, with its own interpreter, its own pip, and an empty site-packages directory waiting for installs.
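
Roughly, on Mac and Linux the folder looks like this (the exact layout varies by platform and Python version; on Windows, bin/ is called Scripts/):

venv/
├── bin/                  # Scripts/ on Windows
│   ├── activate          # the activation script used below
│   ├── pip
│   └── python            # copy of, or symlink to, the base interpreter
├── lib/
│   └── python3.11/
│       └── site-packages/   # installed packages land here
└── pyvenv.cfg            # records which base Python created the env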

Activate it:

# Mac and Linux
source venv/bin/activate

# Windows Command Prompt
venv\Scripts\activate.bat

# Windows PowerShell
venv\Scripts\Activate.ps1

Your terminal prompt changes to show the environment name:

(venv) user@machine:~/my_project$

The (venv) prefix tells you the environment is active. Every pip install now goes into this environment, not the global Python.

pip install pandas numpy matplotlib scikit-learn

These install only into the venv folder. No other project is affected.
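
If you are ever unsure where pip is about to install, two commands settle it (the paths shown are machine-specific):

which python   # should print a path inside venv/, e.g. ~/my_project/venv/bin/python
pip -V         # prints pip's version plus the site-packages path it installs into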

Deactivate when done:

deactivate

The (venv) prefix disappears. You are back to global Python.
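
To see the isolation from the opening story directly, build two environments side by side (the version numbers are illustrative; very old NumPy releases only install on correspondingly old Pythons, so substitute any two versions your interpreter supports):

python -m venv env_a
source env_a/bin/activate
pip install "numpy==1.20.3"
python -c "import numpy; print(numpy.__version__)"   # 1.20.3
deactivate

python -m venv env_b
source env_b/bin/activate
pip install "numpy==1.26.3"
python -c "import numpy; print(numpy.__version__)"   # 1.26.3
deactivate

Both versions live on the same machine at the same time. Neither project breaks the other.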


requirements.txt: The File That Recreates Everything

Once you have installed your packages, save them:

pip freeze > requirements.txt

Open requirements.txt and it looks something like this:

joblib==1.3.2
matplotlib==3.8.2
numpy==1.26.3
pandas==2.1.4
pillow==10.2.0
scikit-learn==1.3.2
scipy==1.12.0
seaborn==0.13.1

Every package and its exact version. On a new machine, or when a collaborator clones your project, they recreate the exact same environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Three commands. Identical environment. Every package at the exact version you developed with.

This is reproducibility. A year from now, you or anyone else can clone your project, run those three commands, and have everything working exactly as it did today.
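
After recreating an environment, pip check is a cheap sanity test that the pinned versions are mutually compatible:

pip check   # prints "No broken requirements found." when everything is consistent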


What Goes in .gitignore

The venv folder itself should never go into Git. It is large (often hundreds of megabytes once data science packages are installed), platform-specific, and completely regenerable from requirements.txt.

Your .gitignore should include:

venv/
.venv/
env/
ENV/

What goes into Git: the requirements.txt file. That is all you need to recreate the environment.

git add requirements.txt
git commit -m "Add requirements.txt with project dependencies"
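
If you committed the venv folder before adding the ignore rule, Git keeps tracking it; untrack it without deleting the local folder:

git rm -r --cached venv/
git commit -m "Stop tracking venv folder"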

Conda: The Alternative Worth Knowing

venv is built into Python and handles Python packages. Conda handles both Python packages and non-Python dependencies (C libraries, CUDA, R packages, etc.).

If you install Anaconda or Miniconda (the lightweight version):

conda create -n ml_env python=3.11
conda activate ml_env
conda install numpy pandas scikit-learn
pip install plotly streamlit

Conda environments work the same way as venv: isolated, per-project, reproducible. The advantage is that Conda can install things like CUDA dependencies and compiled libraries that pip sometimes cannot handle cleanly.
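
Two housekeeping commands worth knowing:

conda env list     # lists every environment; * marks the active one
conda deactivate   # leave the active environment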

Save a Conda environment:

conda env export > environment.yml

Recreate it:

conda env create -f environment.yml

For data science and AI work, Conda handles GPU dependencies (CUDA, cuDNN) more reliably than pure pip. When you reach the deep learning phase and need specific CUDA versions, Conda is the tool that makes installation clean.
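
As a taste, a CUDA-enabled PyTorch install through conda looks roughly like this; the exact channels and version pins change between releases, so check pytorch.org for the current command:

conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia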

For now, venv is simpler and sufficient for everything through Phase 6 of this series.


The Standard Project Structure

Every project you build from here should start the same way:

mkdir new_project
cd new_project
git init
python -m venv venv
source venv/bin/activate
pip install <your packages>
pip freeze > requirements.txt
echo "venv/" > .gitignore
echo "__pycache__/" >> .gitignore
echo "*.pyc" >> .gitignore
echo ".env" >> .gitignore
git add .
git commit -m "Initial project setup with virtual environment"

Thirteen commands. Every new project. Five minutes. Then you have a clean, isolated, version-controlled starting point.

Structure the folders too:

my_project/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
├── src/
├── tests/
├── venv/
├── .gitignore
├── requirements.txt
└── README.md

data/raw/ for original downloaded data, never modified. data/processed/ for cleaned versions. notebooks/ for Jupyter notebooks. src/ for Python modules and scripts. tests/ for test files. This structure is recognizable to every data scientist and AI engineer who reviews your projects.
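
A hypothetical helper in src/ that respects the raw/processed split might look like this (the file name and function names are illustrative, not a fixed convention):

# src/data_io.py
from pathlib import Path

import pandas as pd

RAW = Path("data/raw")              # original downloads, never modified
PROCESSED = Path("data/processed")  # cleaned outputs, always regenerable

def load_raw(name: str) -> pd.DataFrame:
    """Read an untouched CSV from data/raw/."""
    return pd.read_csv(RAW / name)

def save_processed(df: pd.DataFrame, name: str) -> None:
    """Write a cleaned DataFrame to data/processed/."""
    PROCESSED.mkdir(parents=True, exist_ok=True)
    df.to_csv(PROCESSED / name, index=False)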


Managing Multiple Python Versions

Sometimes you need Python 3.9 for one project and Python 3.11 for another. pyenv handles this.

# Install pyenv (Mac/Linux)
curl https://pyenv.run | bash

# Install a specific Python version
pyenv install 3.11.7
pyenv install 3.9.18

# Set Python version for a specific directory
cd my_old_project
pyenv local 3.9.18

cd my_new_project
pyenv local 3.11.7

pyenv local creates a .python-version file in the folder. Every time you cd into that folder, pyenv automatically switches to that Python version. Combined with venv, you have complete control over both Python version and package versions per project.
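
The combination looks like this (version numbers match the examples above):

cd my_old_project
pyenv local 3.9.18      # writes .python-version
python -m venv venv     # the venv is built from Python 3.9.18
source venv/bin/activate
python --version        # Python 3.9.18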

Windows users can use pyenv-win: same concept, different installer.


The .env File: Environment Variables

API keys, database passwords, and configuration values should never be hardcoded in your Python files and never committed to Git.

Store them in a .env file:

DATABASE_URL=postgresql://user:password@localhost:5432/mydb
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
DEBUG=True
MODEL_PATH=/data/models/best_model.pkl

Load them in Python with python-dotenv. Install it first:

pip install python-dotenv

Then, in your code:

from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
db_url  = os.getenv("DATABASE_URL")

print(f"API key loaded: {'yes' if api_key else 'no'}")

Add .env to your .gitignore immediately. It should never, ever appear in your Git history. If you accidentally commit an API key, rotate it immediately and consider it compromised.

Commit a .env.example file instead:

DATABASE_URL=your_database_url_here
OPENAI_API_KEY=your_api_key_here
ANTHROPIC_API_KEY=your_api_key_here

This tells collaborators what variables they need to set without exposing actual values.
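
A fail-fast check on top of the earlier snippet catches a missing .env before it causes a confusing error somewhere downstream (the message text is just a suggestion):

import os

from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError(
        "OPENAI_API_KEY is not set. Copy .env.example to .env and fill it in."
    )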


Quick Reference

python -m venv venv              # create environment
source venv/bin/activate         # activate (Mac/Linux)
venv\Scripts\activate            # activate (Windows)
deactivate                       # deactivate
pip install package              # install into active env
pip freeze > requirements.txt    # save dependencies
pip install -r requirements.txt  # install from file
pip list                         # list installed packages
pip show pandas                  # details about one package
pip uninstall pandas             # remove a package

Resources Worth Reading

Claudio Jolowicz, a well-respected Python developer, wrote the "Hypermodern Python" series on his blog covering virtual environments, dependency management, and project structure as professional Python developers actually use them. It is more advanced than this post, but it is the right next step once the basics click. Search "Hypermodern Python Claudio Jolowicz."

The Real Python tutorial called "Python Virtual Environments: A Primer" at realpython.com goes deep on how environments work under the hood, not just how to use them. Understanding why they work the way they do makes the occasional weird behavior (like forgetting to activate) immediately diagnosable. Search "Real Python virtual environments primer."


Try This

Create three separate project folders: phase6_ml, phase7_dl, and phase8_nlp.

For each one:

1. Initialize a Git repository.
2. Create a virtual environment and activate it.
3. Install the packages appropriate for that phase: scikit-learn and pandas for ml, torch and torchvision for dl, transformers and tokenizers for nlp.
4. Save a requirements.txt.
5. Create a proper .gitignore.
6. Create a basic README.md.
7. Make an initial commit.

Verify isolation: activate the phase6_ml environment and confirm import torch fails. Activate phase7_dl and confirm import torch works while import sklearn fails. Each environment has exactly what it needs and nothing else.
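
One way to run the check from the parent directory (paths assume the layout above; on Windows use venv\Scripts\activate):

cd phase6_ml && source venv/bin/activate
python -c "import torch"     # should fail with ModuleNotFoundError
python -c "import sklearn"   # should succeed silently
deactivate && cd ..

cd phase7_dl && source venv/bin/activate
python -c "import torch"     # should succeed
python -c "import sklearn"   # should fail with ModuleNotFoundError
deactivate && cd ..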

These three repositories will hold your Phase 6, 7, and 8 projects. They are ready.


Phase 5 Is Now Actually Complete

Post 49 closes the tools phase. Git, GitHub, Jupyter, Colab, and virtual environments. The infrastructure is in place.

This is the last time you set up before doing. From here forward, every post builds something that runs, predicts, classifies, generates, or deploys.

Phase 6 starts next post. Machine learning from scratch, beginning with the most important question in the field: what does it actually mean for a machine to learn?
