Abstract
This document describes how to create, tune, and manage a large swap file on an NVIDIA Jetson AGX Orin 64 GB running Ubuntu 22.04.5 LTS aarch64. The configuration targets large language models (LLMs) running alongside CUDA, cuDNN, and TensorRT, and uses a fast NVMe SSD as the backing store for the swap file.
The implementation was validated using a 50 GB swap file configuration alongside existing zram layers. The procedure successfully extended the usable memory capacity, allowing for the deployment of larger models without triggering immediate Out-Of-Memory (OOM) errors, provided the storage-to-RAM paging latency is acceptable.
This tutorial serves as a technical reference for advanced Jetson and Linux users. It provides a reproducible method for extending virtual memory on edge AI hardware to support demanding 34B–70B parameter models.
1. Hardware and Software Environment
The target environment is an NVIDIA Jetson AGX Orin Developer Kit equipped with 64 GB of unified memory. The system runs Ubuntu 22.04.5 LTS on an aarch64 kernel (5.15.185-tegra). The installation includes JetPack 6.2.2, providing the necessary software stack for AI inference, including CUDA 12.6, cuDNN 9.3.0, and TensorRT 10.3.0.
The primary storage for the swap file is the NVMe SSD, which serves as the root filesystem. This choice is critical for minimizing the performance penalty during memory paging operations.
| Component | Detail |
|---|---|
| Hardware | NVIDIA Jetson AGX Orin Developer Kit 64 GB |
| OS | Ubuntu 22.04.5 LTS aarch64 |
| Kernel | 5.15.185-tegra |
| RAM | 64 GB unified memory |
| JetPack | 6.2.2+b24 (nvidia-jetpack) |
| CUDA | 12.6 (nvcc 12.6.68) |
| cuDNN | 9.3.0 |
| TensorRT | 10.3.0.30-1+cuda12.5 |
Table 1 — Jetson AGX Orin environment for swap configuration
2. Swap Location Strategy
Effective swap placement is determined by the throughput and endurance of the underlying storage media. On the Jetson AGX Orin, the system utilizes eMMC for the boot partition and an NVMe SSD for the primary root filesystem.
| Storage | Approx Speed | Recommendation |
|---|---|---|
| NVMe SSD | ~2000 MB/s | Best — primary location for swap |
| eMMC | ~400 MB/s | Secondary fallback; higher wear risk |
| USB Drive | ~100 MB/s | Not recommended due to high latency |
Table 2 — Recommended swap backing storage on Jetson AGX Orin
For this configuration, the swap file is placed directly on the NVMe-backed root filesystem (/) at /swapfile. This ensures the highest possible I/O performance for paging operations.
3. Step-by-Step Swap File Creation
The following steps outline the allocation and initialization of a 50 GB swap file.
3.1 Check Devices and Free Space
Before allocation, verify the available space on the target partition. The lsblk command confirms the mount points, while df -h verifies the capacity.
# List block devices and mount points
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,ROTA
# Check free space on the root filesystem
df -h /
The current configuration shows approximately 636 GB of available space on /dev/nvme0n1p1, which is more than sufficient for a 50 GB allocation.
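The free-space check can also be scripted so that allocation is refused when the filesystem would be left too full. The following is a minimal sketch: the helper names `has_room_for_swap` and `avail_gib_on` and the 20 GiB reserve are illustrative assumptions, not part of the original procedure, and `df --output` assumes GNU coreutils.

```shell
# Hypothetical helper: succeed only if at least `reserve_gib` GiB would
# remain free after creating a swap file of `swap_gib` GiB.
has_room_for_swap() {
    avail_gib=$1; swap_gib=$2; reserve_gib=$3
    [ "$avail_gib" -ge $((swap_gib + reserve_gib)) ]
}

# Hypothetical helper: available space on a mount point in whole GiB (GNU df).
avail_gib_on() {
    df --output=avail -BG "$1" | tail -n 1 | tr -dc '0-9'
}

# Example (the 20 GiB reserve is an arbitrary assumption):
#   has_room_for_swap "$(avail_gib_on /)" 50 20 && echo "enough space for swap"
```

With the roughly 636 GB reported above, the check passes with a wide margin.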
3.2 Create the Swap File
The fallocate utility is used to pre-allocate the file space efficiently.
# Allocate 50 GB for the swap file on the root filesystem
sudo fallocate -l 50G /swapfile
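The swapon(8) manual notes that, depending on the filesystem, a file preallocated with fallocate may be treated as containing holes, and that writing zeros with dd is the most portable way to create a swap file. A hedged sketch of a fallback is shown below; the helper name `allocate_file` is hypothetical, sizes are given in MiB to keep the example small, and `status=none` assumes GNU dd.

```shell
# Hypothetical helper: allocate `size_mib` MiB at `path`, preferring
# fallocate and falling back to writing zeros with dd if fallocate fails.
# Run as root when the target is /swapfile.
allocate_file() {
    path=$1; size_mib=$2
    fallocate -l "${size_mib}M" "$path" 2>/dev/null ||
        dd if=/dev/zero of="$path" bs=1M count="$size_mib" status=none
}

# Example for a 50 GiB swap file (51200 MiB), from a root shell:
#   allocate_file /swapfile 51200
```

The dd path is much slower than fallocate because every block is actually written, so it is only worth using when swapon rejects the preallocated file.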
3.3 Secure and Format the Swap File
The swap file can contain any data the kernel pages out of memory, so it must be readable and writable by root only to prevent sensitive data from leaking to other users.
# Restrict permissions to root read/write only
sudo chmod 600 /swapfile
# Format the file as swap space
sudo mkswap /swapfile
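Before enabling the file, the permissions can be confirmed rather than assumed; swapon prints an "insecure permissions" warning when the mode is looser than 0600. A small sketch follows, where `is_mode_600` is a hypothetical helper and `stat -c` assumes GNU coreutils.

```shell
# Hypothetical helper: succeed only if `path` has octal mode 600 (GNU stat).
is_mode_600() {
    [ "$(stat -c '%a' "$1")" = "600" ]
}

# Example:
#   is_mode_600 /swapfile || echo "fix with: sudo chmod 600 /swapfile"
# Ownership can be inspected the same way:
#   stat -c '%a %U:%G' /swapfile
```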
3.4 Enable the Swap File
Once formatted, the swap file must be activated in the running kernel.
# Enable the swap file
sudo swapon /swapfile
# Verify active swap devices
swapon --show
# Confirm memory and swap totals
free -h
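For scripted checks, the same totals can be read from /proc/swaps, which lists one swap area per line with its size in KiB. The sketch below sums the Size column; `total_swap_gib` is a hypothetical helper name and reads its input from stdin so it can be tried on sample data.

```shell
# Hypothetical helper: sum the Size column (KiB) of /proc/swaps-formatted
# input on stdin and print the total in GiB with one decimal place.
total_swap_gib() {
    awk 'NR > 1 { kib += $3 } END { printf "%.1f\n", kib / 1048576 }'
}

# Example:
#   total_swap_gib < /proc/swaps
```

On this system the total should cover both the zram devices and the new /swapfile entry.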
4. Making Swap Persistent Across Reboots
To ensure the swap file is automatically re-enabled upon system restart, an entry must be added to the /etc/fstab configuration file.
# Append the swap file definition to /etc/fstab
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Verify the entry exists
grep swap /etc/fstab
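Running the append command more than once leaves duplicate entries in /etc/fstab. One way to avoid that is to append only when the exact line is missing, as in the sketch below; `ensure_line` is a hypothetical helper, not a standard utility.

```shell
# Hypothetical helper: append `line` to `file` only if that exact line
# is not already present (whole-line, fixed-string match).
ensure_line() {
    file=$1; line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, from a root shell:
#   ensure_line /etc/fstab '/swapfile none swap sw 0 0'
```

Because the match is on the whole line, an existing entry with different spacing or options would not be detected, so the grep check above is still worth running afterwards.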
5. Tuning Swappiness and zram for LLM Workloads
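LLM inference performs best when its working set stays in physical RAM, so the kernel is tuned to swap reluctantly. Note that swappiness applies to all swap areas alike; the order in which zram and the disk-backed swap file are used is determined by their swap priority.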
5.1 Adjust Swappiness and Cache Pressure
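Lowering vm.swappiness makes the kernel prefer reclaiming page cache over swapping out process memory, so pages are written to swap only under real memory pressure. Lowering vm.vfs_cache_pressure below its default of 100 makes the kernel retain directory and inode caches longer.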
# Apply settings immediately
sudo sysctl vm.swappiness=10
sudo sysctl vm.vfs_cache_pressure=50
# Persist the settings across reboots
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
echo 'vm.vfs_cache_pressure=50' | sudo tee -a /etc/sysctl.conf
# Reload sysctl configuration
sudo sysctl -p
| Swappiness | Behavior Description |
|---|---|
| 0 | Swap only when absolutely out of RAM |
| 10 | Recommended for LLM workloads |
| 60 | Typical Linux default |
| 100 | Very aggressive swapping |
Table 3 — Swappiness values and behavior for Jetson LLM use
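Appending to /etc/sysctl.conf with tee -a adds a new copy of each line every time the commands are run. An alternative, sketched here, is to write both settings to a dedicated drop-in file under /etc/sysctl.d, which `sysctl --system` loads. The helper name `render_swap_sysctl` and the file name `99-swap-tuning.conf` are illustrative assumptions.

```shell
# Hypothetical helper: print a sysctl fragment for the two settings
# used in this section (arguments: swappiness, vfs_cache_pressure).
render_swap_sysctl() {
    printf 'vm.swappiness=%s\nvm.vfs_cache_pressure=%s\n' "$1" "$2"
}

# Example (the drop-in file name is an arbitrary choice):
#   render_swap_sysctl 10 50 | sudo tee /etc/sysctl.d/99-swap-tuning.conf
#   sudo sysctl --system
#   cat /proc/sys/vm/swappiness
```

Since tee overwrites the drop-in file rather than appending, re-running the commands leaves exactly one copy of each setting.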
6. Relationship Between zram and /swapfile
The Jetson system utilizes a tiered memory architecture. The zram-config service provides several compressed RAM-based swap devices (zram0 through zram11). The hierarchy of memory allocation is as follows:
- Physical RAM (64 GB unified memory)
- zram (Compressed swap in RAM, ~31 GB total)
- NVMe swap file (50 GB on /swapfile)
This tiered approach allows the kernel to handle small, compressible allocations within the highly efficient zram layer before resorting to the higher-latency NVMe disk-backed swap.
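The kernel fills swap areas in order of priority, highest first, so the tiering described here holds only if the zram devices carry a higher priority than /swapfile. Whether that is the case on a given system is something to verify rather than assume. The sketch below lists swap areas by priority; `swaps_by_priority` is a hypothetical helper that reads /proc/swaps-formatted input from stdin.

```shell
# Hypothetical helper: print "priority name" for each swap area in
# /proc/swaps-formatted input, highest priority first.
swaps_by_priority() {
    awk 'NR > 1 { print $5, $1 }' | sort -k1,1nr
}

# Example:
#   swaps_by_priority < /proc/swaps
```

If /swapfile appears above the zram devices, its priority can be lowered explicitly, for example with the `pri=` option in its /etc/fstab entry.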
7. Removing or Reconfiguring the Swap File
If disk space needs to be reclaimed, the swap file can be decommissioned following these steps:
# Disable the swap file usage
sudo swapoff /swapfile
# Remove the entry from /etc/fstab
sudo sed -i '/\/swapfile/d' /etc/fstab
# Delete the physical file
sudo rm /swapfile
# Optional: re-apply /etc/sysctl.conf (swapoff has already updated the swap state)
sudo sysctl -p
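swapoff has to move every page still held in the swap file back into RAM or into another swap area, so it can stall or fail on a system under memory pressure. A cautious sketch is to compare the file's used space with available memory first. The helper names `swap_used_kib` and `mem_available_kib` are hypothetical, and the comparison is conservative because it ignores space available in zram.

```shell
# Hypothetical helper: used KiB of one swap area, read from
# /proc/swaps-formatted input on stdin.
swap_used_kib() {
    awk -v p="$1" 'NR > 1 && $1 == p { print $4 }'
}

# Hypothetical helper: MemAvailable in KiB, read from
# /proc/meminfo-formatted input on stdin.
mem_available_kib() {
    awk '$1 == "MemAvailable:" { print $2 }'
}

# Example: only disable the swap file if its contents fit in available RAM.
#   used=$(swap_used_kib /swapfile < /proc/swaps)
#   avail=$(mem_available_kib < /proc/meminfo)
#   [ "${used:-0}" -le "$avail" ] && sudo swapoff /swapfile
```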
8. Practical Outcomes
- Increased Capacity: Successfully established a 50 GB swap area on NVMe, expanding the total virtual memory capacity.
- Stability: Provided a critical safety margin for running 70B parameter models (e.g., Q4_K_M) that may exceed the 64 GB physical RAM limit during peak usage.
- Optimized Hierarchy: Integrated the new disk-backed swap into the existing zram architecture without disrupting the compressed RAM layer.
- Persistence: Achieved a fully automated configuration that survives system reboots via the /etc/fstab entry.
9. Conclusions
Configuring a large, NVMe-backed swap file is a highly effective strategy for maximizing the utility of the NVIDIA Jetson AGX Orin 64 GB for large-scale AI workloads. By following the documented procedure of using fallocate, setting strict chmod 600 permissions, and tuning swappiness to 10, users can achieve a stable environment capable of handling models that exceed physical memory boundaries.
While the performance penalty of disk-based swapping is unavoidable, the use of high-speed NVMe storage and a tiered zram approach minimizes the impact on inference latency, making it a viable solution for non-interactive or batch processing of 34B–70B parameter models.