I've dockerized 20+ scraping projects. Every time, I hit the same problems:
- Playwright browsers bloating the image to 2GB+
- Chrome crashing with 'out of memory' in containers
- Different behavior between local and production
- Slow builds when changing one line of code
Here's the Dockerfile I now use for every project. It took months of pain to get right.
## The Dockerfile
```dockerfile
# Stage 1: Dependencies (cached layer)
FROM python:3.12-slim AS deps
WORKDIR /app

# System deps for Playwright/Chromium
RUN apt-get update && apt-get install -y --no-install-recommends \
    libnss3 libatk1.0-0 libatk-bridge2.0-0 libdrm2 \
    libxkbcommon0 libxcomposite1 libxdamage1 libxrandr2 \
    libgbm1 libasound2 libpango-1.0-0 libcairo2 \
    && rm -rf /var/lib/apt/lists/*

# requirements.txt must include playwright for the install step below
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install ONLY Chromium (not all browsers).
# PLAYWRIGHT_BROWSERS_PATH puts the browser in a shared location;
# the default (~/.cache) would be invisible to the non-root user below.
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
RUN playwright install chromium && chmod -R a+rX /ms-playwright

# Stage 2: App
FROM deps AS app
WORKDIR /app
COPY . .

# Non-root user (important for security)
RUN useradd -m scraper
USER scraper
CMD ["python", "scrape.py"]
```
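To build and smoke-test the image locally (the `my-scraper` tag is an arbitrary placeholder, not from the article):

```shell
# Build the image, then run the scraper once, mounting the data dir.
# --shm-size matters for Chrome; more on that below.
docker build -t my-scraper .
docker run --rm -v "$(pwd)/data:/app/data" --shm-size=256m my-scraper
```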
Image size: ~650MB (vs 2.1GB with the default Playwright image)
## Why This Works
### 1. Multi-Stage Build

The `deps` stage is cached. When you change your Python code, Docker only rebuilds the `app` stage: 5 seconds instead of 5 minutes.
### 2. Chromium Only

`playwright install chromium` fetches just Chromium (~350MB). A bare `playwright install` pulls Chromium + Firefox + WebKit (~1.2GB), and you almost never need all three. (The `--with-deps` flag can also apt-install Chromium's system libraries for you, but the Dockerfile above lists them explicitly so the layer stays minimal and auditable.)
### 3. Slim Base Image

`python:3.12-slim` is ~130MB vs ~1GB for `python:3.12`. The manual `apt-get` installs only the exact libraries Chromium needs.
### 4. Non-Root User

Running Chrome as root in Docker works, but it's a security risk. The `scraper` user limits the blast radius: if the browser or your scraping code is compromised, the attacker lands in an unprivileged account instead of root.
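One way to see (and exploit) the caching from point 1 in CI: build the `deps` stage as its own tagged image first. The tag names here are arbitrary placeholders:

```shell
# Build and cache just the dependency stage
docker build --target deps -t scraper-deps .

# The full build now reuses every deps layer;
# only COPY . . onward re-runs after a code change.
docker build -t scraper .
```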
## The docker-compose.yml
```yaml
services:
  scraper:
    build: .
    restart: unless-stopped
    environment:
      - PYTHONUNBUFFERED=1
    deploy:
      resources:
        limits:
          memory: 1G      # Prevent Chrome from eating all RAM
          cpus: '1.0'
    volumes:
      - ./data:/app/data  # Persist scraped data
    shm_size: '256m'      # CRITICAL: Chrome needs this
```
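The article doesn't show `scrape.py` itself, so here's a minimal sketch that fits this setup. The target URL, the `results.json` filename, and the `OUTPUT_DIR` override are my own placeholders, not from the original:

```python
# scrape.py: minimal sketch matching the Dockerfile/compose above.
import json
import os
from pathlib import Path


def output_path() -> Path:
    """Resolve the results file inside the volume-mounted ./data directory."""
    out_dir = Path(os.environ.get("OUTPUT_DIR", "data"))
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / "results.json"


def main() -> None:
    # Imported here so the module can be inspected without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # --disable-dev-shm-usage is a belt-and-braces fallback to shm_size
        browser = p.chromium.launch(args=["--disable-dev-shm-usage"])
        page = browser.new_page()
        page.goto("https://example.com")  # placeholder target
        result = {"title": page.title()}
        browser.close()

    output_path().write_text(json.dumps(result, indent=2))


# The Dockerfile's CMD runs this file directly, so in the real script
# invoke main() under the usual `if __name__ == "__main__":` guard.
```

Writing into `output_path()` means results land in `./data` on the host via the compose volume.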
## The `shm_size` Trick
Chrome in Docker crashes with "out of memory" even when the container has 4GB of RAM. The real issue is `/dev/shm`: Docker defaults it to 64MB, and Chrome needs more.
Three fixes:

```yaml
# Option 1: Increase shm_size (in docker-compose.yml)
shm_size: '256m'
```

```python
# Option 2: Disable shm usage in Chrome (in your Python code)
browser = playwright.chromium.launch(
    args=['--disable-dev-shm-usage']
)
```

```yaml
# Option 3: Mount the host's /dev/shm
volumes:
  - /dev/shm:/dev/shm
```

I use Option 1; it's the cleanest solution.
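You can check the shared-memory size yourself. A quick sketch, assuming Docker is installed and the compose file above is in place:

```shell
# Inspect /dev/shm in a throwaway container: the Size column shows
# Docker's default shared-memory allocation (typically 64M).
docker run --rm python:3.12-slim df -h /dev/shm

# With shm_size: '256m' set, the same check in your service reports 256M.
docker compose run --rm scraper df -h /dev/shm
```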
## Cron Scheduling
For daily scrapes, add a cron service:
```yaml
scheduler:
  image: mcuadros/ofelia:latest
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  labels:
    ofelia.job-local.scrape.schedule: "0 8 * * *"
    ofelia.job-local.scrape.command: "docker compose run --rm scraper"
```
Or simply use crontab on the host:

```shell
0 8 * * * cd /app && docker compose run --rm scraper >> /var/log/scraper.log 2>&1
```
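If a scrape can run long, it's worth guarding against overlapping runs. A variant of the crontab line using `flock` (the lock-file path is an arbitrary choice of mine):

```shell
# flock -n skips this run if the previous one still holds the lock
0 8 * * * cd /app && flock -n /tmp/scraper.lock docker compose run --rm scraper >> /var/log/scraper.log 2>&1
```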
## Common Mistakes I Made (So You Don't Have To)
### 1. Installing chromium-browser via apt

```dockerfile
# BAD: version mismatch with Playwright
RUN apt-get install -y chromium-browser

# GOOD: Playwright downloads the exact version it needs
RUN playwright install chromium
```
### 2. Missing --no-cache-dir

```dockerfile
# BAD: pip's download cache bloats the image by 200MB+
RUN pip install -r requirements.txt

# GOOD
RUN pip install --no-cache-dir -r requirements.txt
```
### 3. COPY . . Before requirements

```dockerfile
# BAD: any code change invalidates the pip layer cache
COPY . .
RUN pip install --no-cache-dir -r requirements.txt

# GOOD: the requirements layer stays cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```
### 4. Running as Root

```dockerfile
# BAD: the scraper (and Chrome) run as root
CMD ["python", "scrape.py"]

# GOOD: drop privileges before the entrypoint
RUN useradd -m scraper
USER scraper
CMD ["python", "scrape.py"]
```
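Related to mistake 3: `COPY . .` also drags in everything sitting in your build context. A `.dockerignore` sketch keeps junk out of the image and the cache; these entries are suggestions, adjust to your repo:

```
.git
__pycache__/
*.pyc
data/
.env
venv/
```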
## Performance Numbers
| Metric | Before | After |
|---|---|---|
| Image size | 2.1 GB | 650 MB |
| Build time (cold) | 8 min | 3 min |
| Build time (code change) | 8 min | 5 sec |
| Memory usage | 1.5 GB | 800 MB |
| Chrome crashes | ~2/day | 0 |
Full starter template: python-web-scraping-starter
More scraping tools: awesome-web-scraping-2026
What does your Docker setup for scraping look like? Any tricks I'm missing? 👇
More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs