Abstract
This report documents an exploratory attempt to install and run Unsloth (including Unsloth Studio) on an NVIDIA Jetson AGX Orin 64 GB using a Docker-based workflow with dustynv/l4t-ml:r36.4.0 as the base image.
The process successfully validated GPU-accelerated PyTorch and Unsloth’s core Python package on Jetson, but exposed substantial friction and incompatibilities in getting Unsloth Studio’s full stack (Studio backend, frontend, Triton/TorchInductor/TorchAo dependencies, and custom virtual environment) to run reliably on this ARM-based edge platform.
The goal of this write-up is to provide a precise technical account so that other practitioners (and the Unsloth team) can (a) reproduce or avoid the same pitfalls, and (b) better assess the current suitability of Unsloth Studio for Jetson-class devices.
1. Hardware and Software Environment
The experiments were conducted on the following platform:
- Device: NVIDIA Jetson AGX Orin Developer Kit (64 GB)
- OS: Ubuntu 22.04.5 LTS, aarch64
- JetPack / L4T: JetPack 6.2.2, L4T 36.5.0
- CUDA: 12.6 (nvcc 12.6.68)
- cuDNN: 9.3.0
- TensorRT: 10.3.0
- Docker: Engine with the NVIDIA Container Runtime enabled (`--runtime=nvidia`)
- Base ML image: `dustynv/l4t-ml:r36.4.0` (from Jetson Containers), which provides:
  - PyTorch compiled for Jetson (aarch64) with CUDA and TensorRT integration
  - JupyterLab and common ML tooling
Host-side persistent storage for this project was centralized under:
```text
~/unsloth/
  build/    # Dockerfile and build context
  work/     # notebooks, datasets, outputs
  cache/    # general cache inside the container
  hf/       # Hugging Face cache
  jupyter/  # Jupyter config
  ssh/      # SSH keys/config (optional)
```
This layout was bind-mounted into the container to ensure persistence across container rebuilds.
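Before the first `docker run`, it is worth verifying that every host directory in this layout exists, because Docker creates missing bind-mount sources as root-owned directories. The helper below is a hypothetical convenience script, not part of the original workflow; the directory names mirror the `~/unsloth` layout above.

```python
from pathlib import Path

def ensure_layout(root: Path,
                  subdirs=("build", "work", "cache", "hf", "jupyter", "ssh")) -> list:
    """Create any missing subdirectories under root; return the names created."""
    created = []
    for name in subdirs:
        d = root / name
        if not d.exists():
            d.mkdir(parents=True)
            created.append(name)
    return created
```

Running it once against `~/unsloth` makes the container rebuilds idempotent with respect to the persistent storage.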
2. Docker Image Construction
2.1 Base Dockerfile
The starting point was a custom image layered on top of dustynv/l4t-ml:r36.4.0:
```dockerfile
FROM dustynv/l4t-ml:r36.4.0

ENV DEBIAN_FRONTEND=noninteractive \
    PIP_NO_CACHE_DIR=1 \
    PYTHONUNBUFFERED=1 \
    SHELL=/bin/bash \
    JUPYTER_PORT=8888 \
    STUDIO_PORT=8000 \
    WORKSPACE=/workspace \
    HF_HOME=/workspace/.cache/huggingface \
    TRANSFORMERS_CACHE=/workspace/.cache/huggingface \
    HUGGINGFACE_HUB_CACHE=/workspace/.cache/huggingface

USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
        curl git wget ca-certificates build-essential pkg-config \
        python3-pip python3-dev python3-venv \
        openssh-server sudo nano htop tmux \
        libopenblas-dev libssl-dev libffi-dev \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /var/run/sshd /workspace/work /workspace/.cache/huggingface /root/.jupyter

RUN python3 -m pip install --upgrade pip setuptools wheel

# Remove Jetson-specific custom pip indexes to avoid transient outages
RUN python3 -m pip config unset global.index-url || true && \
    python3 -m pip config unset global.extra-index-url || true

# Generic Python dependencies via PyPI
RUN PIP_INDEX_URL=https://pypi.org/simple python3 -m pip install \
        fastapi "uvicorn[standard]" gradio \
        accelerate transformers peft trl datasets sentencepiece protobuf safetensors \
        huggingface_hub

# Install Unsloth (core + zoo) from GitHub/PyPI; note that unsloth-zoo lives
# in its own repository, not in unslothai/unsloth
RUN PIP_INDEX_URL=https://pypi.org/simple python3 -m pip install \
        "unsloth @ git+https://github.com/unslothai/unsloth.git" \
        "unsloth_zoo @ git+https://github.com/unslothai/unsloth-zoo.git" || true

# Optionally attempt bitsandbytes (may be fragile on Jetson)
RUN PIP_INDEX_URL=https://pypi.org/simple python3 -m pip install bitsandbytes || true

WORKDIR /workspace
EXPOSE 8000 8888 22
CMD ["/bin/bash"]
```
Key design choices:
- Reuse NVIDIA’s `l4t-ml` stack instead of installing PyTorch/TensorRT manually, since it is tuned for Jetson.
- Explicitly unset custom Jetson pip indexes before installing Unsloth, to avoid failures due to unavailable Jetson-specific mirrors while installing generic packages (e.g. `fastapi`).
- Install Unsloth via GitHub (or PyPI) rather than using the x86-oriented Docker image `unsloth/unsloth`.
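The index-fallback idea behind these design choices can be sketched as a small helper that retries a failed install against the public PyPI index. This is an illustrative sketch, not code from the experiment; the `runner` parameter is injected only so the helper can be exercised without actually installing anything.

```python
import subprocess
import sys

def pip_install(package: str, fallback_index: str = "https://pypi.org/simple",
                runner=subprocess.run) -> bool:
    """Try pip with whatever index is configured; on failure, retry via PyPI."""
    base = [sys.executable, "-m", "pip", "install", package]
    if runner(base).returncode == 0:
        return True
    # Retry, forcing the public index in case a custom mirror is unavailable.
    return runner(base + ["--index-url", fallback_index]).returncode == 0
```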
The image was built with:
```bash
cd ~/unsloth/build
sudo docker build --no-cache -t local/unsloth-studio:jetson-l4tml-r36.4.0 .
```
3. Container Runtime and GPU Validation
A persistent container was created with host networking and bind mounts:
```bash
sudo docker run -d \
  --name unsloth-studio \
  --restart unless-stopped \
  --runtime nvidia \
  --network host \
  --shm-size=16g \
  -e HF_HOME=/workspace/.cache/huggingface \
  -e TRANSFORMERS_CACHE=/workspace/.cache/huggingface \
  -e HUGGINGFACE_HUB_CACHE=/workspace/.cache/huggingface \
  -v ~/unsloth/work:/workspace/work \
  -v ~/unsloth/cache:/workspace/.cache \
  -v ~/unsloth/hf:/root/.cache/huggingface \
  -v ~/unsloth/jupyter:/root/.jupyter \
  -v ~/unsloth/ssh:/root/.ssh \
  local/unsloth-studio:jetson-l4tml-r36.4.0 \
  tail -f /dev/null
```
Inside the container, GPU support was verified with:
```bash
python3 -c "import torch;
print(torch.__version__);
print(torch.cuda.is_available());
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no cuda')"
```
This confirmed:
- `torch` version 2.6.0 (from the `l4t-ml` stack),
- CUDA available,
- device name reported as “Orin”.
Thus, the base ML environment inside the container was correctly accelerated on Jetson.
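To make later failure reports easier to correlate with the exact stack, the validated environment can be snapshotted with stdlib tooling. This is a hedged convenience sketch (the package names are examples); `importlib.metadata` reads installed distribution metadata without importing the packages themselves.

```python
from importlib import metadata

def snapshot(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return versions

print(snapshot(["torch", "transformers", "unsloth"]))
```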
4. Installing Unsloth Core
Within the container:
```bash
python3 -c "import unsloth; print('unsloth ok')"
unsloth --help
```
The CLI output showed the main Unsloth commands:
- `train`, `inference`, `export`, `list-checkpoints`
- `studio` (subcommand group)
However, importing Unsloth triggered a warning stacktrace related to Triton, TorchInductor, and TorchAo:
- `ImportError: cannot import name 'AttrsDescriptor' from triton.compiler.compiler`
- Errors inside `torch._inductor.runtime.hints` and `torchao.quantization`
This indicates that parts of the current Unsloth stack assume a Triton/TorchInductor/TorchAo configuration aligned with x86_64 desktop/server builds of PyTorch, which is not trivially compatible with the Jetson-specific PyTorch build shipped in `l4t-ml`.
Despite these warnings, the CLI remained usable for basic commands, and GPU acceleration for standard PyTorch operations was intact.
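Given that the warnings come from optional compiler stacks, a defensive pattern is to probe for those modules before enabling dependent features. The sketch below uses only the stdlib; the module names are taken from the warnings above. Note that locating a module does not guarantee its import succeeds (the `AttrsDescriptor` failure happened inside Triton's own import), so first use should still be wrapped in `try`/`except`.

```python
import importlib.util

def stack_available(module_name: str) -> bool:
    """True if the module can be located, without fully importing it."""
    return importlib.util.find_spec(module_name) is not None

# Names taken from the import warnings observed on Jetson
optional_stacks = {name: stack_available(name) for name in ("triton", "torchao")}
```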
5. Attempting to Enable Unsloth Studio
5.1 CLI-Level Status
The `unsloth studio` subcommand was present:

```bash
unsloth studio --help
```

showed options such as `--host`, `--port`, and `--frontend`, plus the subcommands `stop`, `update`, and `reset-password`.

Attempting to start Studio directly:

```bash
unsloth studio --host 0.0.0.0 --port 8000
```

returned:

```text
Studio not set up. Run install.sh first.
```
This implies that Studio expects an auxiliary installation step that sets up its environment (frontend, backend, and venv).
5.2 Running unsloth studio setup
Unsloth documentation describes a developer mode where Studio is installed via `uv` and a dedicated virtual environment.
Following this pattern, the command:

```bash
unsloth studio setup
```

produced:

- successful installation of `nvm`, Node LTS, and `bun`,
- a successful build of the frontend (“frontend built”),
- but then:

```text
python venv not found at /root/.unsloth/studio/unsloth_studio
Run install.sh first to create the environment:
curl -fsSL https://unsloth.ai/install.sh | sh
```
Thus, the CLI expects a virtual environment under /root/.unsloth/studio/unsloth_studio that appears to be normally created by the official install.sh script.
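The CLI's “is Studio set up?” test appears to reduce to the presence of this venv. The sketch below reconstructs that check from the error message; the real CLI logic may differ, and `pyvenv.cfg` is used here as the standard marker that a directory is a venv.

```python
from pathlib import Path

def studio_venv_ready(home: Path) -> bool:
    """Check for the venv layout the CLI error message points at."""
    venv_dir = home / ".unsloth" / "studio" / "unsloth_studio"
    return (venv_dir / "pyvenv.cfg").is_file()
```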
5.3 Manual Creation of the Studio Virtual Environment
Rather than relying on install.sh (which is tuned for other platforms and may interfere with the Jetson-specific PyTorch/Triton stack), a manual venv was created:
```bash
cd /root/.unsloth/studio
uv venv unsloth_studio --python 3.10
source /root/.unsloth/studio/unsloth_studio/bin/activate
uv pip install --index-url https://pypi.org/simple unsloth
```
This installed Unsloth (and a complete stack of dependencies) into the venv, including:
- `torch`, `torchao`, `triton`
- `transformers`, `accelerate`, `peft`, `trl`
- `bitsandbytes`
- `unsloth`, `unsloth-zoo`
Within the venv, `unsloth studio -H 0.0.0.0 -p 8000` still failed due to missing backend dependencies (`structlog`), which were then installed.
However, repeated attempts to start Studio continued to reveal issues:
- `ModuleNotFoundError: No module named 'structlog'` (due to pip confusion between global and venv environments)
- Friction in adding pip to the venv (pip not present, or not found via `python -m pip`)
- A recurring tension between the `uv`-managed environment and the classical `pip` expectations coming from Studio’s backend modules.
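The global-vs-venv confusion above can be diagnosed from inside the interpreter itself. A minimal sketch using only the stdlib:

```python
import sys
import sysconfig

def in_virtualenv() -> bool:
    """A venv sets sys.prefix different from sys.base_prefix."""
    return sys.prefix != sys.base_prefix

def install_target() -> str:
    """Directory where `python -m pip install` would place pure-Python packages."""
    return sysconfig.get_paths()["purelib"]
```

Running both checks with the Studio venv's interpreter quickly shows whether `pip` commands are actually landing there or in the container's global site-packages.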
Ultimately, even after installing the necessary Python packages, the CLI still treated Studio as “not set up” and insisted on running the global install.sh script.
6. Failure Modes and Root Causes
The main failure modes observed were:
- **Triton / TorchInductor / TorchAo incompatibilities**
  - Errors when importing Unsloth related to `AttrsDescriptor` in Triton and TorchInductor.
  - These components are not officially supported or tuned for the Jetson-specific PyTorch build, causing runtime import and registration issues.
- **Studio’s tight coupling to its own venv and installer**
  - Studio expects a very particular environment layout under `~/.unsloth/studio/unsloth_studio` created by `install.sh`.
  - Deviating from the installer (e.g., a manual or `uv`-only installation) leads to missing venv markers, which the CLI interprets as “Studio not set up.”
- **Tooling friction on Jetson (`uv` + venv + pip)**
  - The combination of `uv`-managed environments with a system Python and a Docker base image that already has a global `pip` led to situations where:
    - the venv had no `pip` initially,
    - `python -m ensurepip` installed pip globally rather than into the venv,
    - the actual `pip` used to install backend dependencies was the global one, leaving the venv incomplete.
- **Mismatch with the Jetson Containers philosophy**
  - Jetson Containers and `l4t-ml` are built around NVIDIA’s optimized PyTorch/TensorRT stacks, while Unsloth Studio’s modern pipeline assumes desktop/server-class Triton and TorchInductor configurations.
  - This mismatch is non-trivial to reconcile in a maintainable way.
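For the third failure mode, the key detail is which interpreter runs `ensurepip`: invoking it through the venv's own `python` binary puts pip inside the venv rather than the system site-packages. A hedged sketch of that workaround (the `runner` parameter exists only for testability):

```python
import subprocess

def bootstrap_pip(venv_python: str, runner=subprocess.run) -> bool:
    """Run ensurepip with the venv's own interpreter so pip lands in that
    venv, not in the global site-packages of the base image."""
    return runner([venv_python, "-m", "ensurepip", "--upgrade"]).returncode == 0
```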
7. Practical Outcomes
Despite the failure to get Unsloth Studio fully operational, the following outcomes were achieved:
- A validated GPU-accelerated Unsloth core environment on Jetson:
  - `unsloth` CLI installed and usable.
  - PyTorch 2.6.0 with CUDA on Orin working correctly.
- A reusable Docker-based ML devbox (`local/unsloth-studio:jetson-l4tml-r36.4.0`) with:
  - a clear persistent directory layout (`~/unsloth`),
  - host networking and shared volumes suitable for integration with other Jetson Containers (e.g., llama.cpp, vLLM, NanoLLM, llama-factory).
- Empirical evidence that, as of this experiment, Unsloth Studio is not yet a drop-in web UI solution for Jetson AGX Orin, due to:
  - Triton/TorchInductor/TorchAo assumptions, and
  - strong coupling to the `install.sh`-managed environment.
8. Recommendations for Jetson Practitioners
For current Jetson AGX Orin users:
- **Use Unsloth core selectively**
  - Unsloth’s Python API and CLI can still be valuable for fine-tuning/export workflows that do not rely heavily on Triton/TorchInductor-specific optimizations.
  - Prefer the Jetson-optimized PyTorch from `l4t-ml` and be cautious with features that depend on TorchInductor/Triton.
- **Rely on Jetson Containers for serving and fine-tuning**
  - For serving and fine-tuning large models on Jetson, the containers in the Jetson Containers ecosystem (llama.cpp, vLLM, MLC, TensorRT-LLM, NanoLLM, llama-factory) are significantly more mature and better integrated with JetPack and L4T.
- **Treat Unsloth Studio on Jetson as experimental**
  - Until there is first-class ARM/Jetson support (or a documented variant of `install.sh` and Studio’s backend explicitly targeting Jetson), Studio should be considered an experimental integration on this hardware.
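The first of these recommendations can be made concrete with a small guard that keeps `torch.compile` off unless explicitly requested. The `TORCH_COMPILE_DISABLE` environment variable is honored by recent PyTorch releases, but treat the exact name as an assumption to verify against the installed version:

```python
import os

# Ask Dynamo/Inductor to stay disabled; must be set before torch is imported.
os.environ.setdefault("TORCH_COMPILE_DISABLE", "1")

def maybe_compile(fn, enabled: bool = False):
    """Return fn wrapped by torch.compile only when explicitly enabled;
    otherwise return the eager function untouched (torch is never imported)."""
    if not enabled:
        return fn
    import torch
    return torch.compile(fn)
```

Wrapping hot functions through a gate like this keeps the eager path working on Jetson while leaving compilation available on platforms where Triton is healthy.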
9. Suggestions for the Unsloth Team
Based on this experience, the following changes would materially improve the viability of Unsloth Studio on Jetson and similar edge platforms:
- **Documented “headless / no-Triton” mode**
  - A configuration profile that can disable or bypass TorchInductor/Triton/TorchAo, relying purely on standard PyTorch kernels when running on unsupported architectures such as Jetson.
- **Explicit ARM/Jetson support statement and checks**
  - Clear statements in the documentation regarding ARM/aarch64 support status, with runtime checks that either:
    - enable a safe, reduced feature set, or
    - fail fast with a clear, actionable message.
- **Studio installation mode for preexisting Python stacks**
  - A variant of `install.sh` or `studio setup` that:
    - can attach to an existing PyTorch environment (e.g., Jetson’s `l4t-ml`), and
    - creates only the additional Studio-specific venv/backend/frontend without attempting to reconfigure PyTorch or Triton.
- **Minimal dependency profile for the Studio backend**
  - A smaller “core backend” dependency set for Studio that avoids complex quantization stacks and heavy compiler integrations when running in constrained or embedded environments.
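The runtime check proposed in the second suggestion could look roughly like the following: detect the architecture once and either degrade or fail fast. This is an illustrative sketch of the proposed behavior, not existing Unsloth code:

```python
import platform

def check_platform(strict: bool = False) -> str:
    """Return 'full' on x86_64, otherwise 'reduced' (or raise when strict)."""
    machine = platform.machine()
    if machine in ("x86_64", "AMD64"):
        return "full"
    if strict:
        raise RuntimeError(
            f"Unsupported architecture {machine!r}: run with the reduced "
            "feature set, or install on x86_64 for the full stack."
        )
    return "reduced"  # safe subset for aarch64/Jetson and other platforms
```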
10. Conclusion
The experiment demonstrates that:
- Installing Unsloth core on Jetson AGX Orin via a Dockerized `l4t-ml` base image is feasible, and the resulting environment is usable for GPU-accelerated LLM workflows.
- However, enabling Unsloth Studio (the full web UI for training and serving) on Jetson currently encounters significant hurdles due to the interaction between Triton/TorchInductor, TorchAo, `uv`-managed venvs, and the assumptions baked into `install.sh`.
From a practical standpoint, Jetson users are better served today by combining Unsloth core (where useful) with the existing Jetson Containers ecosystem, while treating Unsloth Studio as an experimental component on this hardware.
From a community and engineering perspective, this experiment highlights concrete areas where incremental changes and documentation from the Unsloth team could unlock a powerful edge deployment story on Jetson-class devices.