If you’re trying to run PySpark on Windows with Python 3.13, you’ll quickly run into errors like:
AttributeError: module 'socketserver' has no attribute 'UnixStreamServer'
This can be frustrating—especially when your code is perfectly fine.
In this post, I’ll walk you through a complete, working setup for PySpark on Windows by:
- Installing Python 3.11 alongside Python 3.13
- Creating a clean virtual environment
- Installing a compatible PySpark version
- Optionally fixing Windows-specific Spark warnings using winutils.exe
This setup is stable, beginner-friendly, and recommended for learning and local development.
Why PySpark Fails with Python 3.13
The problem isn’t your code—it’s compatibility.
- PySpark does not yet support Python 3.13
- PySpark 4.x has known issues on Windows
- Some internal APIs were removed in Python 3.13 that PySpark still relies on
✅ The correct combination on Windows is:
- Python 3.11
- PySpark 3.5.x
- Java 8 or 11
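Before going further, it helps to fail fast if a script is ever launched with the wrong interpreter. Here is a minimal guard you can paste at the top of any Spark script (the version cutoff reflects this guide's recommendation, not an official PySpark check):

```python
import sys

# Abort early on interpreters newer than 3.11, per the setup in this guide
if sys.version_info >= (3, 12):
    raise SystemExit(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "this project targets Python 3.11 for PySpark 3.5.x on Windows."
    )
```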
Step 1: Install Python 3.11 (Side-by-Side)
Do not uninstall Python 3.13. Instead, install Python 3.11 alongside it.
- Download Python 3.11 (64-bit) from the official python.org downloads page
- Run the installer:
  - Check “Add Python to PATH”
  - Click Customize installation
  - Enable “Install for all users”
- Finish the installation
Verify it worked:
py -3.11 --version
You should see:
Python 3.11.x
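You can also confirm exactly which binary the launcher resolves, straight from the interpreter itself:

```python
# Run with: py -3.11 -c "import sys; print(sys.executable)"
import sys
print(sys.executable)  # should point into your Python311 install folder
```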
Step 2: Allow Virtual Environment Activation in PowerShell
By default, PowerShell's execution policy on Windows blocks running scripts, which prevents virtual environments from activating.
Run this once:
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
Press Y to confirm.
This change is safe and only applies to your user account.
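To confirm the change took effect, check the policy for your account:
Get-ExecutionPolicy -Scope CurrentUser
It should print RemoteSigned.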
Step 3: Create a Python 3.11 Virtual Environment
Navigate to your project directory:
cd C:\Users\User\Desktop\Training\Week5
Remove any old virtual environment (skip this step if the folder doesn't exist):
Remove-Item -Recurse -Force venv
Create a new one using Python 3.11:
py -3.11 -m venv venv
Activate it:
venv\Scripts\activate
You should now see:
(venv)
Confirm the Python version:
python --version
Expected output:
Python 3.11.x
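For an extra sanity check, you can prove the environment is really active from inside Python:

```python
import sys

# Inside a venv, sys.prefix points at the venv while sys.base_prefix
# points at the underlying Python 3.11 install
print(sys.prefix != sys.base_prefix)  # True when the venv is active
```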
Step 4: Install the Correct PySpark Version
Do not install the latest PySpark blindly.
❌ Avoid
pip install pyspark
✅ Install the Windows-safe version
pip install pyspark==3.5.1
Verify:
pip show pyspark
You should see:
Version: 3.5.1
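You can also verify from inside Python, which catches cases where pip and python point at different environments:

```python
# If this prints 3.5.1, the venv's interpreter sees the pinned install
import pyspark
print(pyspark.__version__)
```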
Step 5: Test Your Spark Setup
Create a file called Lab1.py:
```python
from pyspark.sql import SparkSession

# Start a local Spark session
spark = SparkSession.builder.appName("Test").getOrCreate()

# Build a one-column DataFrame with the values 0..9 and print it
df = spark.range(10)
df.show()

spark.stop()
```
Run it:
python Lab1.py
If you see numbers from 0 to 9, Spark is running successfully 🎉
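If you want a slightly stronger smoke test than df.show(), a small transformation exercises more of the engine (same session pattern as Lab1.py, with a hypothetical app name):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("SmokeTest").getOrCreate()

# Filter the even numbers out of 0..9 and sum them: 0 + 2 + 4 + 6 + 8 = 20
df = spark.range(10)
total = df.filter(F.col("id") % 2 == 0).agg(F.sum("id")).first()[0]
print(total)  # 20

spark.stop()
```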
Common Mistakes to Avoid
- Running Python explicitly from 3.13:
  C:\...\Python313\python.exe Lab1.py
- Installing PySpark 4.x on Windows
- Using Python 3.12 or newer with Spark
- Forgetting to activate the virtual environment
Golden Rule
When (venv) is active, always use python, never a full Python path.
Optional: Fix winutils.exe Warnings on Windows
You may see warnings like:
Did not find winutils.exe
HADOOP_HOME and hadoop.home.dir are unset
Spark works fine without winutils, but adding it removes these warnings.
Which winutils Version to Use
- Hadoop version: 3.3.6
- Compatible with Spark 3.5.x
- Community-built binaries for this version are widely available (e.g., the cdarlint/winutils repository on GitHub)
Setup winutils
- Create this folder:
  C:\hadoop\bin\
- Place winutils.exe inside it:
  C:\hadoop\bin\winutils.exe
- Set environment variables (note: %PATH% expansion is cmd syntax, so run these two commands in Command Prompt, or substitute $env:PATH in PowerShell):
  setx HADOOP_HOME C:\hadoop
  setx PATH "%PATH%;C:\hadoop\bin"
- Restart PowerShell and verify:
winutils.exe
If usage info prints, it’s working.
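If you prefer not to change machine-wide settings, the same variables can be set per-process from Python before the session starts. This works because Spark's JVM is launched after these lines run. A sketch, assuming the C:\hadoop layout from above:

```python
import os

# Point Hadoop/Spark at winutils for this process only
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Test").getOrCreate()
```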
Final Working Setup
| Component | Version |
|---|---|
| Python | 3.11.x |
| PySpark | 3.5.1 |
| Hadoop (winutils) | 3.3.6 |
| OS | Windows |
| Virtual Environment | Enabled |
Conclusion
Setting up PySpark on Windows requires careful version alignment, but once configured correctly, it works reliably.
By:
- Keeping Python 3.13 installed
- Using Python 3.11 in a virtual environment
- Pinning PySpark to 3.5.1
- Optionally configuring winutils
you now have a stable Spark development environment on Windows.
Happy coding 🚀