masoomjethwa

Posted on Sep 21

Setting Up a Scalable JupyterHub Classroom on Debian 12 LTS with DockerSpawner

#devops #education #docker #jupyter

Hey Dev.to community! If you're an educator, data scientist, or sysadmin looking to set up a multi-user Jupyter environment for teaching or collaboration, you've come to the right place. Today, we're diving into a complete, automated setup for JupyterHub on Debian 12 LTS using Docker and DockerSpawner. This configuration is perfect for classrooms: it provides isolated containers per user, resource limits to prevent overloads, dummy users for testing, benchmarking tools, and even a shared notebook to simulate student workloads.

By the end of this article, you'll have a turnkey script to deploy everything in minutes. We'll cover why this setup rocks for education, the step-by-step automation, testing tips, and extensions for production. Let's automate everything—no manual tinkering required!

Why JupyterHub with DockerSpawner for Classrooms?

JupyterHub is a multi-user hub that spawns, manages, and proxies a separate Jupyter notebook server for each user. Pairing it with DockerSpawner takes it to the next level:

Isolation: Each student gets their own Docker container, preventing one user's heavy computation from crashing others.
Scalability: Easy to add resource limits (CPU/RAM) and persistent storage.
Simplicity: Use DummyAuthenticator for quick testing with 20 dummy users (e.g., student01 to student20).
Benchmarking: Built-in tools to monitor system load during simulated classroom sessions.
Persistence: Per-user volumes for saving work, plus a shared folder for common resources like benchmark notebooks.

This setup runs on Debian 12 LTS (stable and secure) and uses jupyter/minimal-notebook as the base image. It's great for teaching data science, ML, or Python basics without worrying about shared environments.

Prerequisites:

A Debian 12 LTS server (VM, cloud instance) with sudo access.
Minimum specs: 16GB RAM, 4-core CPU, 512GB disk for 20 users.
Internet for package installs.

The Automated Setup Script

Here's the magic: a single Bash script that handles dependencies, config, deployment, and monitoring. Copy-paste it into a file (e.g., setup_jupyterhub.sh), make it executable (chmod +x setup_jupyterhub.sh), and run as root (sudo ./setup_jupyterhub.sh).

#!/bin/bash

set -e

echo "🚀 [1/8] Updating system and installing dependencies..."
apt update && apt upgrade -y
apt install -y python3 python3-pip git curl \
               docker.io docker-compose \
               htop iotop iftop sysstat nload stress-ng

echo "🔧 [2/8] Enabling and starting Docker..."
systemctl enable docker
systemctl start docker
usermod -aG docker ${SUDO_USER:-$USER}

echo "📁 [3/8] Creating JupyterHub directory..."
mkdir -p /opt/jupyterhub-dockerspawner/shared
cd /opt/jupyterhub-dockerspawner

echo "🧪 [4/8] Creating benchmark notebook in shared folder..."
cat > shared/benchmark_notebook.ipynb <<EOF
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Benchmark Notebook\\n",
    "This simulates plotting, pandas, numpy, and compute work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\\n",
    "import numpy as np\\n",
    "import matplotlib.pyplot as plt\\n",
    "\\n",
    "df = pd.DataFrame(np.random.randn(1000, 4), columns=list('ABCD'))\\n",
    "df = df.cumsum()\\n",
    "df.plot()\\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute load\\n",
    "for _ in range(10000):\\n",
    "    np.linalg.inv(np.random.rand(10, 10))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.x"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
EOF

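echo "📄 [5/8] Creating Hub Dockerfile and docker-compose.yml..."
# The stock Hub image does not ship DockerSpawner, so build a small image
# that adds it. Installing with pip on the host would not reach the Hub
# container (and Debian 12 blocks system-wide pip installs by default).
cat > Dockerfile <<'EOF'
FROM jupyterhub/jupyterhub:latest
RUN python3 -m pip install --no-cache-dir dockerspawner
EOF

cat > docker-compose.yml <<'EOF'
version: '3.5'
services:
  jupyterhub:
    build: .
    container_name: jupyterhub
    restart: always
    ports:
      - "8000:8000"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py
      - ./shared:/srv/jupyterhub/shared
# Give the default network a fixed name so DockerSpawner can attach
# user containers to the same network as the Hub.
networks:
  default:
    name: jupyterhub-network
EOF

echo "⚙️ [6/8] Generating JupyterHub config..."
cat > jupyterhub_config.py <<'EOF'
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.JupyterHub.bind_url = 'http://0.0.0.0:8000'

# Let user containers reach the Hub API by its container name
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.hub_connect_ip = 'jupyterhub'

# DockerSpawner config
c.DockerSpawner.image = 'jupyter/minimal-notebook:latest'
c.DockerSpawner.network_name = 'jupyterhub-network'
c.DockerSpawner.remove = True
c.DockerSpawner.debug = True

# Mount per-user volume and shared folder.
# Bind-mount sources are resolved on the Docker host, not inside the Hub container.
c.DockerSpawner.volumes = {
    'jupyterhub-user-{username}': '/home/jovyan/work',
    '/opt/jupyterhub-dockerspawner/shared': {'bind': '/home/jovyan/shared', 'mode': 'ro'}
}

# Resource limits per user (adjust as needed)
c.Spawner.cpu_limit = 0.5
c.Spawner.mem_limit = '1G'

# Authentication (Dummy for testing; built into JupyterHub 1.0+)
c.JupyterHub.authenticator_class = 'dummy'
c.DummyAuthenticator.password = 'pass123'
# JupyterHub 5+ denies all logins unless users are explicitly allowed
c.Authenticator.allow_all = True

# Default to JupyterLab interface
c.Spawner.default_url = '/lab'
EOF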

echo "📈 [7/8] Creating system monitoring script..."
cat > monitor.sh <<'EOF'
#!/bin/bash
mkdir -p logs
echo "Starting CPU and memory log (every 5s)..."
vmstat 5 > logs/vmstat.log &
echo "Starting disk I/O log..."
iostat -xm 5 > logs/iostat.log &
echo "Starting network log..."
iftop -t -s 60 -L 50 > logs/iftop.log &
echo "Use 'tail -f logs/vmstat.log' to monitor in real time."
EOF
chmod +x monitor.sh

echo "📡 [8/8] Starting JupyterHub..."
docker-compose up -d

IP=$(hostname -I | awk '{print $1}')
echo "✅ Setup complete! Access JupyterHub at: http://$IP:8000"
echo "Login as any of: student01 to student20 | Password: pass123"
echo "To monitor system: ./monitor.sh"
echo "Check containers: docker ps"

This script:

Installs deps like Docker, Compose, and monitoring tools (htop, stress-ng).
Sets up directories and configs.
Deploys JupyterHub via Docker Compose.
Creates a benchmark notebook for load testing.
Adds a monitoring script for logs.

After running, access at http://<your-server-ip>:8000. Login with studentXX (01-20) and password pass123.

Understanding the Key Components

DockerSpawner Magic

DockerSpawner spawns a fresh container for each user login. Config highlights:

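Image: jupyter/minimal-notebook:latest, a lightweight image with Python and JupyterLab but no data science libraries. pandas, numpy, and matplotlib are not included, so the benchmark notebook needs an image that adds them (see the custom image under Advanced Customizations).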
Volumes: Per-user (jupyterhub-user-{username}) for persistent /home/jovyan/work; shared folder as read-only.
Limits: 0.5 CPU and 1GB RAM per user – tweak in jupyterhub_config.py for your hardware.
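
To make the limits concrete, here is a rough sketch of how those two values translate into the raw numbers a container runtime works with. It assumes JupyterHub's documented memory suffixes (K, M, G, T as powers of 1024) and a 100 ms CFS scheduling period with a proportional quota. The helper names are hypothetical, and this is an illustration rather than DockerSpawner's actual code.

```python
"""Illustrative only: how '1G' and 0.5 map onto raw container limits."""

# Suffixes as powers of 1024, following JupyterHub's memory limit syntax.
UNIT_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Assumed CFS period in microseconds (100 ms is Docker's default).
CFS_PERIOD_US = 100_000


def parse_mem_limit(value):
    """Convert a value such as '1G', '512M' or 2048 into bytes."""
    if isinstance(value, (int, float)):
        return int(value)
    text = value.strip().upper()
    suffix = text[-1]
    if suffix in UNIT_SUFFIXES:
        return int(float(text[:-1]) * UNIT_SUFFIXES[suffix])
    return int(text)  # plain byte count given as a string


def cpu_quota_us(cpu_limit, period_us=CFS_PERIOD_US):
    """Return the CPU time (in microseconds) allowed per scheduling period."""
    return int(cpu_limit * period_us)


if __name__ == "__main__":
    print(parse_mem_limit("1G"))  # bytes available to one student container
    print(cpu_quota_us(0.5))      # microseconds of CPU per 100 ms period
```

In other words, a student with cpu_limit = 0.5 can still burst on any core, but only for half of each scheduling period.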

Authentication and Users

We use DummyAuthenticator for simplicity: any username works with the shared password, so no accounts need to be created in advance. The names student01 to student20 are just a convention for a 20-seat class. For real classes, swap to OAuth (Google/Microsoft) or LDAP.
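
If you want the student names as an explicit list (for a seating sheet, or to restrict logins), a small helper along these lines could generate them, along with a random replacement for pass123. Both function names are hypothetical; the only library call is Python's standard secrets.token_urlsafe.

```python
"""Generate classroom usernames and a shared test password."""
import secrets


def classroom_usernames(count=20, prefix="student"):
    """Return names like student01 ... student20 (zero-padded to two digits)."""
    return [f"{prefix}{number:02d}" for number in range(1, count + 1)]


def make_shared_password(num_bytes=12):
    """Return a random URL-safe password to use instead of 'pass123'."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    for name in classroom_usernames():
        print(name)
    print("shared password:", make_shared_password())
```

One possible use: assign set(classroom_usernames()) to c.Authenticator.allowed_users in jupyterhub_config.py so that only those 20 names can log in. That is a design choice, not something the script above does.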

Benchmarking Notebook

The shared benchmark_notebook.ipynb simulates student work:

Generates and plots random data with pandas/numpy/matplotlib.
Runs CPU-intensive matrix inversions.

Run it across users to test load!

Monitoring Tools

htop/iotop/iftop: Real-time views.
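monitor.sh: Logs CPU, memory, and disk I/O every 5s in the background (vmstat, iostat), plus a one-off 60-second network capture (iftop). Run it with sudo, since iftop needs root to capture packets.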
docker stats: Per-container metrics.
stress-ng: For artificial load (e.g., stress-ng --cpu 4 --timeout 60s).
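
Once a session is over, the vmstat log can be boiled down to a couple of headline numbers. The sketch below is one way to do that, assuming the default vmstat layout with a header row that names the free and id columns; the function name is hypothetical.

```python
"""Summarize a vmstat log: lowest idle CPU and lowest free memory seen."""


def summarize_vmstat(text):
    """Return (min_idle_percent, min_free_kb, samples), or None if no data."""
    columns = None
    idle_values, free_values = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "free" in fields and "id" in fields:
            columns = fields  # header row naming each column
            continue
        if columns is None or not fields[0].isdigit():
            continue  # banner line such as "procs ---memory--- ..."
        if len(fields) < len(columns):
            continue  # truncated line, e.g. the log was cut mid-write
        row = dict(zip(columns, fields))
        idle_values.append(int(row["id"]))
        free_values.append(int(row["free"]))
    if not idle_values:
        return None
    return min(idle_values), min(free_values), len(idle_values)
```

To use it, read logs/vmstat.log into a string and pass it to summarize_vmstat. Keep in mind that the first vmstat sample reports averages since boot, so it says little about the classroom session itself.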

Testing and Benchmarking Your Setup

SSH in and run the script.
Start monitoring: ./monitor.sh.
Login as multiple students via browser.
Open /home/jovyan/shared/benchmark_notebook.ipynb and execute.
Watch resources: Expect spikes but no crashes thanks to limits.
Verify isolation: docker ps shows user-specific containers.
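
For the isolation check in the last step, a short script can turn the docker ps output into a list of active students. This is a sketch that assumes DockerSpawner's default container naming (jupyter-<username>); the function names are hypothetical, and the docker ps --format flag is standard Docker CLI.

```python
"""List the students who currently have a running notebook container."""
import subprocess

# Assumption: DockerSpawner's default container naming, "jupyter-<username>".
CONTAINER_PREFIX = "jupyter-"


def active_students(docker_names_output, prefix=CONTAINER_PREFIX):
    """Extract usernames from the output of `docker ps --format '{{.Names}}'`."""
    students = []
    for name in docker_names_output.split():
        if name.startswith(prefix):
            students.append(name[len(prefix):])
    return sorted(students)


def running_container_names():
    """Ask the local Docker daemon for the names of running containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the server, active_students(running_container_names()) should return one entry per logged-in student. The Hub's own container (jupyterhub) is skipped because it lacks the hyphenated prefix.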

Pro Tip: For 20 users, monitor for bottlenecks. If RAM hits limits, scale your server or adjust caps.

Security: Firewall and Best Practices

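Don't leave ports wide open! Set up UFW. Note that ports published by Docker (8000 here) are opened through Docker's own iptables rules and stay reachable regardless of UFW, so UFW mainly protects the host's other services: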

apt install -y ufw
ufw allow 22/tcp   # SSH
ufw allow 8000/tcp # JupyterHub
ufw enable
ufw status

Disable root SSH.
Add HTTPS with Let's Encrypt for production.
Regularly update: apt update && apt upgrade.

Resource Estimation Table

Resource	Recommendation for 20 Light Users
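RAM	16GB+ (limits are caps, not reservations; 20 users at the full 1GB cap would need 20GB plus overhead)
CPU	4+ cores (0.5-core cap per user; 10 cores if all 20 are busy at once)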
Disk	512GB+ (for notebooks/volumes)
Network	100Mbps+ for multi-user access
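
The arithmetic behind the table can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the function name is hypothetical, and the 2GB overhead for the OS, Docker, and the Hub is an assumption you should replace with your own measurement.

```python
"""Back-of-the-envelope capacity check for a classroom host."""


def estimate_capacity(users, mem_limit_gb, cpu_limit, host_ram_gb, host_cores,
                      overhead_gb=2.0):
    """Compare worst-case demand (every user at their cap) with host resources.

    overhead_gb is an assumed allowance for the OS, Docker and the Hub itself.
    """
    worst_ram_gb = users * mem_limit_gb + overhead_gb
    worst_cores = users * cpu_limit
    return {
        "worst_ram_gb": worst_ram_gb,
        "worst_cores": worst_cores,
        "ram_oversubscription": worst_ram_gb / host_ram_gb,
        "cpu_oversubscription": worst_cores / host_cores,
    }


if __name__ == "__main__":
    # 20 students, 1GB and 0.5 CPU each, on the 16GB / 4-core minimum spec
    for key, value in estimate_capacity(20, 1.0, 0.5, 16, 4).items():
        print(f"{key}: {value}")
```

An oversubscription ratio above 1 is fine for light users who rarely hit their caps at the same moment, but it is the number to watch when the whole class runs the benchmark at once.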

Troubleshooting Common Issues

Docker Fails: Check systemctl status docker.
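Login Errors: Verify the password in jupyterhub_config.py, then apply changes with docker-compose restart (run from /opt/jupyterhub-dockerspawner).
Resource Overload: If the whole host is saturated, lower the per-user caps or add RAM/swap; if individual kernels keep dying, raise mem_limit.
Container Logs: docker logs jupyterhub.
Persistence Issues: Ensure volumes are mounted correctly. docker volume ls should list a jupyterhub-user-<username> volume per student, and only files under /home/jovyan/work survive a container restart.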
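
A quick way to tell "the Hub is down" apart from "my login is wrong" is to ask the Hub's REST API for its version, which needs no credentials. The sketch below assumes the Hub answers at /hub/api/ with a small JSON document containing a version field; the function names are hypothetical.

```python
"""Check whether the Hub's REST API answers, and report its version."""
import json
import urllib.error
import urllib.request


def parse_hub_version(body):
    """Extract the version string from the JSON returned by /hub/api/."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # not JSON, e.g. an error page from a proxy
    if not isinstance(payload, dict):
        return None
    return payload.get("version")


def fetch_hub_version(base_url, timeout=5.0):
    """Return the Hub version, or None if the Hub cannot be reached."""
    url = base_url.rstrip("/") + "/hub/api/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_hub_version(response.read())
    except (urllib.error.URLError, OSError):
        return None
```

For example, fetch_hub_version("http://localhost:8000") run on the server returns a version string when the Hub is healthy and None otherwise, in which case docker logs jupyterhub is the next place to look.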

Advanced Customizations

Ready to level up?

Custom Image: Build one with pre-installed libs (e.g., scikit-learn):

  FROM jupyter/minimal-notebook:latest
  RUN pip install pandas numpy matplotlib scikit-learn

Update c.DockerSpawner.image to your built image.

OAuth: Install the oauthenticator package in the Hub's environment and configure it for Google.
External Storage: Mount a disk to /opt/jupyterhub-dockerspawner/userdata.
Logging: Set c.JupyterHub.log_level = 'DEBUG' for verbose Hub logs (logins, spawns, and errors).
Kubernetes Scaling: Migrate to the Zero to JupyterHub Helm chart for larger classes.

If you need help with these, drop a comment!

Conclusion

This Docker-powered JupyterHub setup turns your Debian server into a robust classroom tool in minutes. It's isolated per user, resource-capped, and ready for benchmarking, which makes it a good fit for educators automating "everything." Try it out, and let me know how it goes in the comments. What's your favorite Jupyter trick?

Thanks for reading! If this helped, give it a ❤️ or unicorn. Follow for more sysadmin and data science tips.
