In this article, we set up a multi‑user JupyterHub on a Kubernetes home lab and make it practical for day‑to‑day work. We’ll install the chart with Helm (wrapped in Just recipes), enable user profiles and custom images, connect notebooks to in‑cluster services like PostgreSQL, and manage API keys directly from notebooks using Vault with a tiny Python helper. The end result is a self‑hosted notebook platform with single sign‑on, sensible defaults, and a clean developer experience.
If you’ve followed the earlier posts in this series, you already have a k3s cluster, Keycloak for OIDC, Vault, and (optionally) Longhorn running. We’ll build on top of that foundation here.
Repository: https://github.com/buun-ch/buun-stack
What JupyterHub is
JupyterHub is a multi‑user gateway for Jupyter. On Kubernetes, the hub spawns one pod per user, mounts a persistent volume for their files, and proxies traffic to each user server. Authentication is pluggable; in this setup we authenticate with Keycloak over OIDC. This model gives each person an isolated, reproducible environment while keeping administration centralized.
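For orientation, here's roughly what that model amounts to in JupyterHub configuration terms. This is only an illustrative sketch; the Helm chart generates the real configuration, and every hostname, realm, and client ID below is a placeholder:

# Illustrative jupyterhub_config.py fragment (placeholders throughout)
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'
c.JupyterHub.authenticator_class = 'generic-oauth'

# One pod per user, each with a persistent home volume
c.KubeSpawner.storage_pvc_ensure = True
c.KubeSpawner.pvc_name_template = 'claim-{username}'

# Keycloak over OIDC
c.GenericOAuthenticator.client_id = 'jupyterhub'
c.GenericOAuthenticator.oauth_callback_url = 'https://jupyter.example.com/hub/oauth_callback'
c.GenericOAuthenticator.authorize_url = 'https://keycloak.example.com/realms/home/protocol/openid-connect/auth'
c.GenericOAuthenticator.token_url = 'https://keycloak.example.com/realms/home/protocol/openid-connect/token'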
The benefits for a small team or home lab are straightforward: there’s one place to sign in and manage access; users choose an environment that fits their work; and you keep data local with predictable performance and cost.
Installing JupyterHub with Helm
This repository ships a set of Just recipes that automate the JupyterHub installation.
Clone the repo and enter the workspace:
git clone https://github.com/buun-ch/buun-stack
cd buun-stack
Make sure you have a .env.local file with your Keycloak and Vault settings (see README.md), then run:
# Interactive install (prompts for host, optional NFS, Vault integration)
just jupyterhub::install
During installation you’ll be asked to confirm:
- JupyterHub host (FQDN) used for OAuth callbacks
- Whether to enable NFS PV (requires Longhorn); if yes, supply NFS IP and path
- Whether to enable Vault integration for notebook secrets
If you publish JupyterHub through a Cloudflare Tunnel, add a public hostname entry.
For example:
- Subdomain: jupyter
- Domain: example.com
- Service: https://localhost:443
- Advanced options: disable TLS verification
Open the URL in your browser, sign in with Keycloak, and start a server to verify everything works.
Customization
Profiles let users pick an image and resource shape when starting their server. They’re defined in the Helm values and applied by KubeSpawner. By default, only the official “Jupyter Notebook Data Science Stack” is available, so the image puller can finish quickly. You can also enable a custom “Buun‑stack” image that includes additional libraries and the Vault integration used below.
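Under the hood, these profiles map to KubeSpawner's profile_list. As a rough sketch (the chart expresses this as singleuser.profileList in the Helm values; the Buun‑stack image name, tag, and resource numbers below are illustrative assumptions):

# Sketch of profiles as KubeSpawner sees them; names and limits are examples
c.KubeSpawner.profile_list = [
    {
        'display_name': 'Data Science Stack',
        'default': True,
        'kubespawner_override': {'image': 'quay.io/jupyter/datascience-notebook:latest'},
    },
    {
        'display_name': 'Buun-stack (Python 3.12)',
        'kubespawner_override': {
            'image': 'localhost:30500/buunstack:python-3.12-28',  # hypothetical image name
            'cpu_limit': 2,
            'mem_limit': '4G',
        },
    },
]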
To turn on the Buun‑stack profile, enable it and optionally the CUDA variant in .env.local:
# Enable profiles (export or put these in .env.local)
JUPYTER_PROFILE_BUUN_STACK_ENABLED=true
# Optional GPU profile
JUPYTER_PROFILE_BUUN_STACK_CUDA_ENABLED=true
# Optional: turn off the default datascience image
JUPYTER_PROFILE_DATASCIENCE_ENABLED=false
# Configure image registry and tag in .env.local as needed
IMAGE_REGISTRY=localhost:30500
JUPYTER_PYTHON_KERNEL_TAG=python-3.12-28
Then build and push the images:
# Build and push kernel images
just jupyterhub::build-kernel-images
just jupyterhub::push-kernel-images
The Buun‑stack Dockerfile bundles the Python components needed for Vault along with common data/ML packages, so users can start coding right away.
Optional: NFS‑backed storage with Longhorn
If you work with large or shared datasets, enable an NFS‑backed PersistentVolume via Longhorn and mount it into user servers. This keeps data local to your environment, reduces egress, and makes backups straightforward. You can preconfigure NFS settings and let the installer provision the PV/PVC:
export JUPYTERHUB_NFS_PV_ENABLED=true
export JUPYTER_NFS_IP=192.168.10.1
export JUPYTER_NFS_PATH=/volume1/drive1/jupyter
just jupyterhub::install
Service Integration
Because JupyterHub runs inside the cluster, notebooks can reach services over Kubernetes DNS without port‑forwarding. For example, a PostgreSQL URL might be postgresql://user:password@postgres-cluster-rw.postgres:5432/mydb. The environment variables POSTGRES_HOST and POSTGRES_PORT are injected into the user server, so your notebook code can construct these URLs at runtime:
import os
pg_host = os.getenv('POSTGRES_HOST')
pg_port = os.getenv('POSTGRES_PORT')
pg_url = f'postgresql://user:password@{pg_host}:{pg_port}/mydb'
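From there, any Postgres client works as usual. As a quick sanity check with SQLAlchemy and pandas (assuming both, plus a Postgres driver such as psycopg2, are present in your image):

import pandas as pd
from sqlalchemy import create_engine

# pg_url comes from the snippet above
engine = create_engine(pg_url)
print(pd.read_sql('SELECT version()', engine).iloc[0, 0])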
With DuckDB, you can attach Postgres and move data in a few lines of SQL, which is handy for ad‑hoc imports and quick queries.
>>> import duckdb
>>> def setup_duckdb_postgres():
...     con = duckdb.connect()
...     con.execute('INSTALL postgres')
...     con.execute('LOAD postgres')
...     con.execute(f"ATTACH '{pg_url}' AS pg (TYPE POSTGRES)")
...     return con
>>> con = setup_duckdb_postgres()
>>> data_dir = '/home/jovyan/data'  # example path; point this at your datasets
>>> con.execute(f"""
...     CREATE OR REPLACE TABLE pg.athlete_events AS
...     SELECT * FROM read_csv_auto('{data_dir}/athlete_events.csv')
... """)
>>> con.execute("""
...     SELECT Sport, COUNT(*) AS count
...     FROM pg.athlete_events
...     GROUP BY Sport
...     ORDER BY count DESC
...     LIMIT 5
... """).df()
        Sport  count
0   Athletics  38624
1  Gymnastics  26707
2    Swimming  23195
3    Shooting  11448
4     Cycling  10859
Keeping JupyterHub and its data services in the same cluster keeps latency low and configuration clean. The same pattern extends to other internal services: object storage, analytics APIs, or anything else you've deployed in‑cluster.
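For instance, an S3‑compatible store such as MinIO is reachable the same way. The service DNS name and credentials below are illustrative assumptions, not part of this stack's defaults:

import boto3

# Hypothetical in-cluster MinIO endpoint with example credentials
s3 = boto3.client(
    's3',
    endpoint_url='http://minio.minio.svc.cluster.local:9000',
    aws_access_key_id='minioadmin',
    aws_secret_access_key='minioadmin',
)
print([b['Name'] for b in s3.list_buckets()['Buckets']])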
Secrets with Vault
Cloud notebooks like Colab provide a simple way to fetch secrets in code. On Google Colab, you can store and retrieve secrets with a built‑in helper:
# Google Colab (example)
from google.colab import userdata
openai_api_key = userdata.get('OPENAI_API_KEY')
With plain Jupyter, the common pattern is to paste secrets at runtime using getpass, which is manual and error‑prone:
# Plain Jupyter (manual paste)
from getpass import getpass
openai_api_key = getpass('OpenAI API key: ')
We recreate the Colab‑style ergonomics in a self‑hosted way using Vault and a small Python class named SecretStore, included in the Buun‑stack image. When your server starts, JupyterHub creates a per‑user policy and token in Vault, then injects NOTEBOOK_VAULT_TOKEN and VAULT_ADDR as environment variables. SecretStore uses them under the hood and renews the token as needed during long sessions.
Here’s what it looks like with this stack:
from buunstack import SecretStore
secrets = SecretStore()
secrets.put('api-keys', openai='sk-...')
openai_api_key = secrets.get('api-keys', field='openai')
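The fetched key is then passed to SDKs as usual. For example, with the openai package (assuming it's installed in your kernel image):

from openai import OpenAI

client = OpenAI(api_key=openai_api_key)
reply = client.chat.completions.create(
    model='gpt-4o-mini',  # any model you have access to
    messages=[{'role': 'user', 'content': 'Hello from JupyterHub!'}],
)
print(reply.choices[0].message.content)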
There’s no copy‑paste of tokens into cells, and each user has an isolated namespace in Vault with a full audit trail. To use this, enable Vault during install and choose one of the Buun‑stack kernels.
You can enable Vault before installation by adding to .env.local:
JUPYTERHUB_VAULT_INTEGRATION_ENABLED=true
Run the installer:
just jupyterhub::install
Or set up after installation:
just jupyterhub::setup-vault-jwt-auth
Implementation details (how SecretStore works)
Under the hood, the integration mirrors what you’d expect from a managed notebook platform, but with components you control:
- Admin token supply: the hub uses a renewable Vault admin token stored at a well‑known path in Vault and fetched into the Hub via an ExternalSecret. A tiny sidecar container renews that token automatically at TTL/2 intervals so it never expires during normal operation.
- Pre‑spawn user isolation: when a user starts a server, a pre‑spawn hook creates a user‑specific Vault policy and an orphan token bound to that policy. Orphan tokens aren't limited by a parent token's policy, which avoids inheritance issues. The token is injected into the notebook container as NOTEBOOK_VAULT_TOKEN along with VAULT_ADDR (see the sketch after this list).
- Per‑user namespaces: each policy constrains access to that user's own secret namespace. Vault's audit logs capture every access.
- In‑notebook helper: SecretStore (backed by the buunstack Python package) reads the injected token and calls Vault. Before each operation it checks whether the token is valid and renewable; if the TTL is low it renews the token so long‑running sessions keep working.
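To make the pre‑spawn step concrete, here's a minimal sketch of such a hook using the hvac client. The policy path, environment variable names, and TTL are illustrative assumptions, not the repo's exact code (see docs/jupyterhub.md for that):

import os
import hvac

def pre_spawn_hook(spawner):
    """Create a per-user Vault policy and an orphan token, then inject them."""
    user = spawner.user.name
    client = hvac.Client(
        url=os.environ['VAULT_ADDR'],
        token=os.environ['VAULT_ADMIN_TOKEN'],  # hypothetical env var name
    )

    # Policy confined to this user's own KV v2 namespace (illustrative path)
    policy = f'''
    path "secret/data/users/{user}/*" {{
      capabilities = ["create", "read", "update", "delete", "list"]
    }}
    '''
    client.sys.create_or_update_policy(name=f'notebook-{user}', policy=policy)

    # Orphan token: not tied to the admin token's policies or lifetime
    token = client.auth.token.create_orphan(
        policies=[f'notebook-{user}'], ttl='8h', renewable=True,
    )
    spawner.environment['NOTEBOOK_VAULT_TOKEN'] = token['auth']['client_token']
    spawner.environment['VAULT_ADDR'] = os.environ['VAULT_ADDR']

c.KubeSpawner.pre_spawn_hook = pre_spawn_hook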
User‑token renewal is implemented inside SecretStore._ensure_authenticated(): it checks the current token with lookup_self, renews it when the TTL is low and the token is renewable, and raises an error if the token is no longer valid so the user can restart their server. Admin‑token renewal is handled separately by the Hub sidecar and does not involve notebook code.
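A hedged sketch of that check with hvac (the TTL threshold and structure here are illustrative; the real method lives in the buunstack package):

import hvac

def _ensure_authenticated(client: hvac.Client, min_ttl: int = 300) -> None:
    """Renew the notebook token when its TTL runs low; fail if it's invalid."""
    try:
        info = client.auth.token.lookup_self()['data']
    except hvac.exceptions.Forbidden as exc:
        raise RuntimeError('Vault token is no longer valid; restart your server.') from exc
    if info['ttl'] < min_ttl and info.get('renewable'):
        client.auth.token.renew_self()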
This design keeps the user experience simple (set/get in code) while providing strong boundaries between users and durable sessions without manual secret pasting. For deeper operational details, including policies, ExternalSecret setup, user‑policy scope, orphan tokens, and renewal behavior, see docs/jupyterhub.md.
Advantages of self‑hosting JupyterHub
Running JupyterHub on your own Kubernetes cluster gives you control over the entire notebook experience while keeping data close to where it’s produced and used. You decide which images and libraries are available, how resources are allocated, and how authentication and secrets are handled.
- Control and customization: curate images and profiles that match how your team works; adjust spawner settings, resources, and storage to your needs.
- Data locality and performance: keep data on your network for lower latency and simpler compliance; tune storage, CPU/MEM, and even GPUs.
- Team productivity: preinstalled tools reduce setup time; each user gets an isolated, reproducible server; services inside the cluster are reachable by simple DNS names.
- Operations and security: SSO via Keycloak, per‑user isolation, Vault‑backed secrets with an audit trail, and backups/monitoring you own. For steady workloads, costs are predictable and often lower than equivalent cloud notebooks.
Wrap‑up
We built a practical, secure, multi‑user JupyterHub on Kubernetes and showed how to use it day to day:
- Installed with Helm using Just recipes, including optional NFS storage and Vault integration
- Enabled profiles and custom kernel images (Buun‑stack) for consistent environments
- Connected notebooks to in‑cluster services (e.g., PostgreSQL) using Kubernetes DNS and env vars
- Managed secrets directly from notebooks with Vault via the SecretStore helper
The result is a self‑hosted notebook platform with single sign‑on, in‑cluster integrations, and safe, ergonomic secret management—ideal for small teams, learning environments, and home labs.
Resources
- Repository: https://github.com/buun-ch/buun-stack
- Documentation: docs/jupyterhub.md
- Previous articles: