Sergio Andres Usma

Enabling Maximum Performance Mode on NVIDIA Jetson AGX Orin 64 GB

Abstract

This document explains how to configure an NVIDIA Jetson AGX Orin 64 GB Developer Kit running Ubuntu 22.04.5 LTS and JetPack 6.2.2 to operate in maximum performance mode for AI workloads, especially LLM inference. It describes how to select the MAXN power mode, lock system clocks at their highest frequencies, and verify that the configuration is correctly applied with built-in NVIDIA tools and simple benchmarks. The tutorial targets users who want reproducible, high-throughput inference on a Jetson AGX Orin while retaining awareness of thermal and power constraints.

It documents the practical impact of enabling MAXN and jetson_clocks, showing how GPU frequency and token generation throughput can increase roughly threefold compared to default settings. The guide also covers how to persist these settings using a systemd service so that the device consistently boots into a high-performance state suitable for heavy AI workloads. Where relevant, it notes expected frequency values and normal operating temperatures for the Jetson AGX Orin platform.

The purpose of this tutorial is to serve as a reusable reference for configuring maximum performance on Jetson-based AI systems, integrated into a larger workflow that includes swap configuration and tool installation for LLM workloads. Readers with basic Linux and Jetson familiarity can follow step-by-step commands to prepare the device, validate the configuration, and understand when to switch between performance and power-saving modes.


1. Hardware and Software Environment

The reference system is an NVIDIA Jetson AGX Orin Developer Kit 64 GB running Ubuntu 22.04.5 LTS (aarch64) with JetPack 6.2.2, CUDA 12.6, cuDNN 9.3.0, OpenCV 4.8.0, and TensorRT 10.3.0.30 installed. The CPU is a 12-core Arm Cortex-A78AE (ARMv8.2) with a maximum clock of about 2.2 GHz, and the board exposes NVIDIA’s nvpmodel and jetson_clocks tools for power and clock management.

According to NVIDIA’s specifications, the Jetson AGX Orin 64 GB configuration can achieve up to 275 TOPS when configured in MAXN mode with clocks locked to their maximum frequencies. This tutorial assumes shell access with sudo privileges and that NVIDIA JetPack components are correctly installed from the nvidia-jetpack meta-package.


2. Why Maximum Performance Matters for AI

By default, without MAXN mode and jetson_clocks, the Jetson AGX Orin keeps GPU frequencies around 600 MHz to maintain thermal and power safety margins. Under these conservative defaults, a 7B LLM typically reaches only about 8 tokens per second during inference, which limits interactivity and throughput.

When MAXN and jetson_clocks are enabled, the GPU can run at approximately 1300 MHz, and end-to-end LLM inference throughput can increase to roughly 18–25 tokens per second on the same 7B model. This represents about a 3x performance improvement and makes interactive LLM usage and larger batch workloads more practical on the device.


3. Inspecting and Selecting Power Modes

Before changing anything, check the current power mode:

sudo nvpmodel -q

The command prints the active power mode, and the integer at the bottom of the output is the current mode ID (for example, 0 for MAXN). This lets you confirm whether the system already runs in MAXN or a more restrictive power profile.
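This check can be scripted. The sketch below assumes the output format described above (mode ID as the last line); the sample string stands in for a live `nvpmodel -q` call on the device.

```shell
# Sample output in the shape nvpmodel -q prints; replace with:
#   sudo nvpmodel -q
sample='NV Power Mode: MAXN
0'

# The mode ID is the integer on the last line of the output.
mode_id=$(echo "$sample" | tail -n 1)
if [ "$mode_id" = "0" ]; then
  echo "MAXN is active"
else
  echo "Not in MAXN (current mode ID: $mode_id)"
fi
```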

To see all available power modes for the Jetson AGX Orin 64 GB under JetPack 6.2, run:

sudo nvpmodel -q --verbose | grep -A1 "MODE_NAME"

On this platform, the mode table typically looks like:

Mode ID   Name       TDP                CPU cores active   GPU max freq
0         MAXN       No limit (~60 W)   12                 1300 MHz
1         MODE_50W   50 W               12                 1100 MHz
2         MODE_30W   30 W               8                  854 MHz
3         MODE_15W   15 W               4                  612 MHz

Use mode ID 0 (MAXN) for the high-performance configuration described here.


4. Enabling MAXN Mode and Locking Clocks

To switch the Jetson into MAXN mode, run:

sudo nvpmodel -m 0

The selected mode is persisted by nvpmodel (the mode definitions themselves live in /etc/nvpmodel.conf), so it survives reboots until you select a different mode. After this step, the Jetson operates under the highest power budget supported by its cooling solution, which is ideal for compute-heavy AI tasks.

Next, lock all clocks (CPU, GPU, and memory bus) to their maximum frequencies:

sudo jetson_clocks

This command is temporary and resets after each reboot, so it must be re-applied or automated to persist. Once applied, the system stops using dynamic frequency scaling and instead pins frequencies to their highest supported values for maximum compute performance.


5. Verifying Power Mode and Clock Frequencies

To confirm that MAXN is active and clocks are locked, run:

# Confirm power mode
sudo nvpmodel -q

# Check GPU and CPU frequencies
sudo jetson_clocks --show

The jetson_clocks --show output should include lines similar to:

CPU Cluster Switching: Disabled
cpu0: Online=1 Governor=schedutil MinFreq=729600 MaxFreq=2201600 CurrentFreq=2201600 ...
GPU MinFreq=306000000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=3199000000

For a correct configuration, CurrentFreq should match MaxFreq for CPU, GPU, and EMC entries, indicating that frequencies are pinned at their maximums. If you see lower current frequencies, reapply jetson_clocks or investigate thermal throttling conditions.
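The CurrentFreq-vs-MaxFreq comparison can be automated. Below is a minimal sketch that scans jetson_clocks --show style lines and fails if any entry is not pinned; the sample text stands in for piping in live output (e.g. `sudo jetson_clocks --show | check_locked`).

```shell
# Fail (non-zero exit) if any line's CurrentFreq differs from its MaxFreq.
check_locked() {
  awk '
    /MaxFreq/ && /CurrentFreq/ {
      max = ""; cur = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^MaxFreq=/)     { split($i, a, "="); max = a[2] }
        if ($i ~ /^CurrentFreq=/) { split($i, a, "="); cur = a[2] }
      }
      if (max != cur) { print "NOT LOCKED: " $0; bad = 1 }
    }
    END { exit bad }
  '
}

# Sample lines in the format shown above; on the device, pipe in the real output.
sample='GPU MinFreq=306000000 MaxFreq=1300500000 CurrentFreq=1300500000
EMC MinFreq=204000000 MaxFreq=3199000000 CurrentFreq=3199000000'

echo "$sample" | check_locked && echo "All clocks locked"
```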


6. Making jetson_clocks Persistent with systemd

To ensure jetson_clocks runs automatically at boot, first try enabling the built-in service:

sudo systemctl enable jetson_clocks 2>/dev/null || echo "Service not found, creating..."

On some JetPack versions the jetson_clocks service may not exist, in which case you can create a custom systemd unit file. The following commands define such a service, reload systemd, and enable it:

sudo tee /etc/systemd/system/jetson_clocks.service > /dev/null << 'EOF'
[Unit]
Description=Lock Jetson clocks at maximum frequency
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/jetson_clocks
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable jetson_clocks
sudo systemctl start jetson_clocks
sudo systemctl status jetson_clocks

Afterward, every boot should automatically apply jetson_clocks, and systemctl status jetson_clocks should report the service as active. This eliminates the need to manually run the command after each restart while keeping the configuration transparent and reversible via systemd.


7. Quick Performance Benchmark with LLM Inference

Once MAXN and jetson_clocks are active, you can validate real-world AI performance using an LLM benchmark. If you have an Ollama container running (as configured in a later phase of your workflow), execute:

# Benchmark: time to generate 100 tokens with a 3B model
docker exec ollama ollama run llama3.2 \
  --verbose \
  "Write a 100-word story about a robot" 2>&1 | grep -E "eval rate|tokens/s"

In MAXN mode on the Jetson AGX Orin, a llama3.2 3B Q4_K_M model is expected to reach around 25–40 tokens per second, significantly higher than default power modes. If observed throughput is substantially lower, recheck power mode, clock locking, and ensure the system is not thermally throttling or swapping heavily.
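The throughput number can be pulled out of the benchmark output and compared against the expected range. The "eval rate" line format below is what recent ollama --verbose builds print; treat the exact layout as an assumption and adjust the pattern to your version.

```shell
# Sample ollama --verbose tail; on the device, capture the real benchmark output.
sample_log='total duration:       3.9s
eval count:           118 token(s)
eval rate:            30.25 tokens/s'

# The rate is the third whitespace-separated field of the "eval rate" line.
rate=$(echo "$sample_log" | awk '/^eval rate/ {print $3}')
echo "Measured: $rate tokens/s"

# Flag results well below the MAXN expectation (~25 tokens/s for a 3B model).
awk -v r="$rate" 'BEGIN { exit !(r >= 25) }' \
  && echo "Within MAXN expectations" \
  || echo "Low throughput: recheck nvpmodel and jetson_clocks"
```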


8. Monitoring Power, Temperature, and Thermal Safety

While models are running, monitor system health from a second terminal:

# Option A: tegrastats (every second)
tegrastats --interval 1000

# Option B: jtop (interactive dashboard)
jtop

In tegrastats, key fields include the GPU temperature (shown as gpu@XX.XC, which should stay below about 85 °C under sustained load), the GPU power rail reading in milliwatts (named VDD_GPU_SOC on the AGX Orin module), and the board and junction temperatures (tj@ and related fields). MAXN mode is designed to work within the active cooling capabilities of the AGX Orin module, but blocked vents or poor airflow can still cause throttling.
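For unattended runs, the GPU temperature can be extracted from a tegrastats line and checked against the throttle threshold. The gpu@NN.NC field name is an assumption (it varies slightly across JetPack releases), and the sample line stands in for live output.

```shell
# One sample tegrastats line; on the device, read lines from `tegrastats` itself.
line='RAM 14336/62840MB GR3D_FREQ 99% gpu@72.5C tj@74C VDD_GPU_SOC 14800mW'

# Pull out "gpu@72.5C" and strip the prefix and unit to get the number.
temp=$(echo "$line" | grep -oE 'gpu@[0-9.]+C' | sed -e 's/gpu@//' -e 's/C$//')

# Warn once the GPU approaches the ~85 °C throttle point.
awk -v t="$temp" 'BEGIN { print (t < 85) ? "OK: GPU at " t " C" : "WARN: near throttle at " t " C" }'
```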

Typical thermal ranges under continuous AI workloads are:

Component   Normal     Throttle starts   Emergency
GPU         50–75 °C   ~85 °C            ~95 °C
CPU         45–70 °C   ~85 °C            ~95 °C
Board       40–60 °C   -                 -

If tegrastats or jtop shows clock frequencies dropping below their pinned maximums under load, the system is thermally throttling; improve ventilation or reduce workload intensity until it stabilizes.


9. Choosing Performance vs Power-Saving Modes

Depending on your workload, you may want to switch between MAXN and more efficient modes. Common scenarios include:

Situation                               Recommended mode   Command
LLM inference (7B–70B)                  MAXN (0)           sudo nvpmodel -m 0 && sudo jetson_clocks
Vision / video processing               MAXN (0)           same as above
Compiling code (e.g., LLM frameworks)   MAXN (0)           same as above
Idle / light development                MODE_30W (2)       sudo nvpmodel -m 2
Background low-power tasks              MODE_15W (3)       sudo nvpmodel -m 3

Switching modes only changes the power envelope, while jetson_clocks controls frequency locking; together they give fine-grained control over performance versus efficiency. You can integrate these commands into your own scripts to toggle modes depending on job type or time of day.
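The table above can be wrapped in a small helper for scripts or cron jobs. This is a sketch: run() goes through sudo on the device, while DRY_RUN=1 only prints the commands, which is how it is exercised here.

```shell
# Execute through sudo, or just print the command when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "sudo $*"; else sudo "$@"; fi; }

set_profile() {
  case "$1" in
    perf) run nvpmodel -m 0 && run jetson_clocks ;;   # MAXN + locked clocks
    idle) run nvpmodel -m 2 ;;                        # MODE_30W
    low)  run nvpmodel -m 3 ;;                        # MODE_15W
    *)    echo "usage: set_profile {perf|idle|low}" >&2; return 1 ;;
  esac
}

# Dry-run for illustration; drop DRY_RUN=1 on the Jetson itself.
DRY_RUN=1 set_profile perf
```

Remember that the perf path must re-run jetson_clocks because frequency locking does not persist across reboots on its own.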


10. Practical Outcomes

  • MAXN mode and jetson_clocks are enabled on the Jetson AGX Orin 64 GB, with GPU, CPU, and EMC frequencies pinned at their maximums for AI workloads.

  • A systemd service (built-in or custom) ensures jetson_clocks runs at boot so performance is consistent across reboots.

  • Simple LLM benchmarks confirm real-world throughput improvements (on the order of 3x in tokens per second) compared to default power modes.

  • Continuous monitoring with tegrastats or jtop provides visibility into temperature, power draw, and potential thermal throttling.

  • Clear commands exist to switch between high-performance and power-saving modes depending on workload requirements.

11. Conclusion

Configuring the Jetson AGX Orin 64 GB into MAXN mode with locked clocks is a necessary step to realize the board’s full 275 TOPS potential for LLM inference and other GPU-intensive workloads. The combination of nvpmodel for power profiles and jetson_clocks for frequency locking provides deterministic performance while staying within the cooling design limits of the developer kit.

With the steps in this tutorial, you can reproducibly enable, verify, and persist maximum performance settings, then validate them using practical AI benchmarks and runtime telemetry tools. In a larger workflow, this configuration forms the foundation for subsequent tasks such as creating swap space for very large models and installing build tools for optimized inference frameworks.
