DEV Community

Ajeet Singh Raina

Install CUDA on NVIDIA Jetson Nano

Hardware Prerequisites

  • Jetson Nano
  • A 5V 4A power supply
  • 64GB SD card

Software

  1. Preparing Your Jetson Nano: Flashing the Jetson SD Card Image
  • Unzip the SD card image.
  • Insert the SD card into your system.
  • Open the Etcher tool and select the SD card as the target to flash the image to.

Verifying the device

sudo lshw -C system
pico2                       
    description: Computer
    product: NVIDIA Jetson Nano Developer Kit
    serial: 1422919082257
    width: 64 bits
    capabilities: smp cp15_barrier setend swp

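The same check can be scripted. This is a minimal sketch that assumes the lshw output format shown above; is_jetson_nano is a hypothetical helper name, not part of lshw or any NVIDIA tool.

```shell
# Hypothetical helper: succeeds if the lshw text on stdin reports a Jetson Nano.
# Assumes the "product:" line looks like the sample output above.
is_jetson_nano() {
    grep -q 'product:.*Jetson Nano'
}

# Usage on the device:
#   sudo lshw -C system | is_jetson_nano && echo "Jetson Nano detected"
```

Matching only on the product line keeps the check independent of the hostname and serial number, which differ from board to board.
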
DeviceQuery

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
ajeetraina@ajeetraina-desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery$ sudo make
/usr/local/cuda-10.2/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda-10.2/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/aarch64/linux/release
cp deviceQuery ../../bin/aarch64/linux/release
ajeetraina@ajeetraina-desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ls
Makefile  NsightEclipse.xml  deviceQuery  deviceQuery.cpp  deviceQuery.o  readme.txt
ajeetraina@ajeetraina-desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3956 MBytes (4148387840 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
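
If you want to check the deviceQuery result from a script rather than by eye, something like the following may help. It is a sketch that assumes the line formats shown in the run above; cuda_capability and device_query_passed are hypothetical helper names.

```shell
# Hypothetical helpers for checking deviceQuery output (read from stdin).
# They assume the line formats shown in the run above.

# Print the compute capability, e.g. "5.3".
cuda_capability() {
    awk -F: '/CUDA Capability Major\/Minor version number/ {
        gsub(/[ \t]/, "", $2)   # strip the padding around the value
        print $2
    }'
}

# Succeed only if the sample reported "Result = PASS".
device_query_passed() {
    grep -q '^Result = PASS'
}

# Usage, from the deviceQuery directory:
#   ./deviceQuery > dq.log
#   device_query_passed < dq.log && cuda_capability < dq.log
```

On the Jetson Nano in this post the capability line reads 5.3, which corresponds to the compute_53 / sm_53 target in the nvcc command line above.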

Installing CUDA 11.3

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/sbsa/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda-repo-ubuntu1804-11-3-local_11.3.1-465.19.01-1_arm64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-3-local_11.3.1-465.19.01-1_arm64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-3-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

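The installer file name above encodes both the CUDA version and the driver version. The sketch below builds that name from its parts, which can be handy if you script the download. The naming pattern is inferred only from the file name used in this post, so treat it as an assumption; cuda_repo_deb is a hypothetical helper.

```shell
# Hypothetical helper: build the local-installer .deb name from its parts.
# The pattern is inferred from the single file name used above and is an
# assumption; other releases may not follow it.
cuda_repo_deb() {
    cuda_version=$1      # e.g. 11.3.1
    driver_version=$2    # e.g. 465.19.01
    # "11.3.1" -> "11-3" for the package-name part
    major_minor=$(printf '%s\n' "$cuda_version" | cut -d. -f1,2 | tr . -)
    printf 'cuda-repo-ubuntu1804-%s-local_%s-%s-1_arm64.deb\n' \
        "$major_minor" "$cuda_version" "$driver_version"
}

cuda_repo_deb 11.3.1 465.19.01
# prints cuda-repo-ubuntu1804-11-3-local_11.3.1-465.19.01-1_arm64.deb
```

Always confirm that the resulting file actually exists on NVIDIA's download server before relying on a generated name.
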
If you hit a dependency issue while running the last command, apply the following fix:

sudo apt-get -o dpkg::Options::="--force-overwrite" install --fix-broken
Setting up libcublas-11-3 (11.5.1.109-1) ...
Setting up nvidia-utils-465 (465.19.01-0ubuntu1) ...
Setting up nvidia-dkms-465 (465.19.01-0ubuntu1) ...
update-initramfs: deferring update (trigger activated)

A modprobe blacklist file has been created at /etc/modprobe.d to prevent Nouveau
from loading. This can be reverted by deleting the following file:
/etc/modprobe.d/nvidia-graphics-drivers.conf

A new initrd image has also been created. To revert, please regenerate your
initrd by running the following command after deleting the modprobe.d file:
`/usr/sbin/initramfs -u`

*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can   ***
*** be loaded.                                                            ***
*****************************************************************************

INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
Loading new nvidia-465.19.01 DKMS files...
It is likely that 4.9.253-tegra belongs to a chroot's host
Building for 4.15.0-166-generic and 4.9.253-tegra
Building for architecture arm64
Building initial module for 4.15.0-166-generic

Verifying

 dpkg -l | grep cuda 
ii  cuda                                       11.3.1-1                                         arm64        CUDA meta-package
ii  cuda-11-3                                  11.3.1-1                                         arm64        CUDA 11.3 meta-package
ii  cuda-command-line-tools-10-2               10.2.460-1                                       arm64        CUDA command-line tools
ii  cuda-command-line-tools-11-3               11.3.1-1                                         arm64        CUDA command-line tools
ii  cuda-compiler-10-2                         10.2.460-1                                       arm64        CUDA compiler
ii  cuda-compiler-11-3                         11.3.1-1                                         arm64        CUDA compiler
ii  cuda-cudart-10-2                           10.2.300-1                                       arm64        CUDA Runtime native Libraries
ii  cuda-cudart-11-0                           11.0.194-1                                       arm64        CUDA Runtime native Libraries
ii  cuda-cudart-11-3                           11.3.109-1                                       arm64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-10-2                       10.2.300-1                                       arm64        CUDA Runtime native dev links, headers
ii  cuda-cudart-dev-11-3                       11.3.109-1                                       arm64        CUDA Runtime native dev links, headers
ii  cuda-cuobjdump-10-2                        10.2.300-1                                       arm64        CUDA cuobjdump
ii  cuda-cuobjdump-11-3                        11.3.58-1                                        arm64        CUDA cuobjdump
ii  cuda-cupti-10-2                            10.2.300-1                                       arm64        CUDA profiling tools runtime libs.
ii  cuda-cupti-11-3                            11.3.111-1                                       arm64        CUDA profiling tools runtime libs.
ii  cuda-cupti-dev-10-2                        10.2.300-1                                       arm64        CUDA profiling tools interface.
ii  cuda-cupti-dev-11-3                        11.3.111-1                                       arm64        CUDA profiling tools interface.
ii  cuda-cuxxfilt-11-3                         11.3.58-1                                        arm64        CUDA cuxxfilt
ii  cuda-documentation-10-2                    10.2.300-1                                       arm64        CUDA documentation
ii  cuda-documentation-11-3                    11.3.111-1                                       arm64        CUDA documentation
ii  cuda-driver-dev-10-2                       10.2.300-1                                       arm64        CUDA Driver native dev stub library
ii  cuda-driver-dev-11-3                       11.3.109-1                                       arm64        CUDA Driver native dev stub library
ii  cuda-drivers                               465.19.01-1                                      arm64        CUDA Driver meta-package, branch-agnostic
ii  cuda-drivers-465                           465.19.01-1                                      arm64        CUDA Driver meta-package, branch-specific
ii  cuda-gdb-10-2                              10.2.300-1                                       arm64        CUDA-GDB
ii  cuda-gdb-11-3                              11.3.109-1                                       arm64        CUDA-GDB
ii  cuda-libraries-10-2                        10.2.460-1                                       arm64        CUDA Libraries 10.2 meta-package
ii  cuda-libraries-11-3                        11.3.1-1                                         arm64        CUDA Libraries 11.3 meta-package
ii  cuda-libraries-dev-10-2                    10.2.460-1                                       arm64        CUDA Libraries 10.2 development meta-package
ii  cuda-libraries-dev-11-3                    11.3.1-1                                         arm64        CUDA Libraries 11.3 development meta-package
ii  cuda-memcheck-10-2                         10.2.300-1                                       arm64        CUDA-MEMCHECK
ii  cuda-nsight-compute-11-3                   11.3.1-1                                         arm64        NVIDIA Nsight Compute
ii  cuda-nsight-systems-11-3                   11.3.1-1                                         arm64        NVIDIA Nsight Systems
ii  cuda-nvcc-10-2                             10.2.300-1                                       arm64        CUDA nvcc
ii  cuda-nvcc-11-3                             11.3.109-1                                       arm64        CUDA nvcc
ii  cuda-nvdisasm-10-2                         10.2.300-1                                       arm64        CUDA disassembler
ii  cuda-nvdisasm-11-3                         11.3.58-1                                        arm64        CUDA disassembler
ii  cuda-nvgraph-10-2                          10.2.300-1                                       arm64        NVGRAPH native runtime libraries
ii  cuda-nvgraph-dev-10-2                      10.2.300-1                                       arm64        NVGRAPH native dev links, headers
ii  cuda-nvml-dev-10-2                         10.2.300-1                                       arm64        NVML native dev links, headers
ii  cuda-nvml-dev-11-3                         11.3.58-1                                        arm64        NVML native dev links, headers
ii  cuda-nvprof-10-2                           10.2.300-1                                       arm64        CUDA Profiler tools
ii  cuda-nvprof-11-3                           11.3.111-1                                       arm64        CUDA Profiler tools
ii  cuda-nvprune-10-2                          10.2.300-1                                       arm64        CUDA nvprune
ii  cuda-nvprune-11-3                          11.3.58-1                                        arm64        CUDA nvprune
ii  cuda-nvrtc-10-2                            10.2.300-1                                       arm64        NVRTC native runtime libraries
ii  cuda-nvrtc-11-3                            11.3.109-1                                       arm64        NVRTC native runtime libraries
ii  cuda-nvrtc-dev-10-2                        10.2.300-1                                       arm64        NVRTC native dev links, headers
ii  cuda-nvrtc-dev-11-3                        11.3.109-1                                       arm64        NVRTC native dev links, headers
ii  cuda-nvtx-10-2                             10.2.300-1                                       arm64        NVIDIA Tools Extension
ii  cuda-nvtx-11-3                             11.3.109-1                                       arm64        NVIDIA Tools Extension
ii  cuda-repo-ubuntu1804-11-0-local            11.0.2-450.51.05-1                               arm64        cuda repository configuration files
ii  cuda-repo-ubuntu1804-11-3-local            11.3.1-465.19.01-1                               arm64        cuda repository configuration files
ii  cuda-runtime-11-3                          11.3.1-1                                         arm64        CUDA Runtime 11.3 meta-package
ii  cuda-samples-10-2                          10.2.300-1                                       arm64        CUDA example applications
ii  cuda-samples-11-3                          11.3.58-1                                        arm64        CUDA example applications
ii  cuda-sanitizer-11-3                        11.3.111-1                                       arm64        CUDA Sanitizer
ii  cuda-thrust-11-3                           11.3.109-1                                       arm64        CUDA Thrust
ii  cuda-toolkit-10-2                          10.2.460-1                                       arm64        CUDA Toolkit 10.2 meta-package
rc  cuda-toolkit-11-0                          11.0.2-1                                         arm64        CUDA Toolkit 11.0 meta-package
ii  cuda-toolkit-11-3                          11.3.1-1                                         arm64        CUDA Toolkit 11.3 meta-package
ii  cuda-toolkit-11-3-config-common            11.3.109-1                                       all          Common config package for CUDA Toolkit 11.3.
ii  cuda-toolkit-11-config-common              11.3.109-1                                       all          Common config package for CUDA Toolkit 11.
ii  cuda-toolkit-config-common                 11.3.109-1                                       all          Common config package for CUDA Toolkit.
ii  cuda-tools-10-2                            10.2.460-1                                       arm64        CUDA Tools meta-package
ii  cuda-tools-11-3                            11.3.1-1                                         arm64        CUDA Tools meta-package
ii  cuda-visual-tools-10-2                     10.2.460-1                                       arm64        CUDA visual tools
rc  cuda-visual-tools-11-0                     11.0.2-1                                         arm64        CUDA visual tools
ii  cuda-visual-tools-11-3                     11.3.1-1                                         arm64        CUDA visual tools
ii  graphsurgeon-tf                            8.0.1-1+cuda10.2                                 arm64        GraphSurgeon for TensorRT package
ii  libcudnn8                                  8.2.1.32-1+cuda10.2                              arm64        cuDNN runtime libraries
ii  libcudnn8-dev                              8.2.1.32-1+cuda10.2                              arm64        cuDNN development libraries and headers
ii  libcudnn8-samples                          8.2.1.32-1+cuda10.2                              arm64        cuDNN documents and samples
ii  libnvinfer-bin                             8.0.1-1+cuda10.2                                 arm64        TensorRT binaries
ii  libnvinfer-dev                             8.0.1-1+cuda10.2                                 arm64        TensorRT development libraries and headers
ii  libnvinfer-doc                             8.0.1-1+cuda10.2                                 all          TensorRT documentation
ii  libnvinfer-plugin-dev                      8.0.1-1+cuda10.2                                 arm64        TensorRT plugin libraries
ii  libnvinfer-plugin8                         8.0.1-1+cuda10.2                                 arm64        TensorRT plugin libraries
ii  libnvinfer-samples                         8.0.1-1+cuda10.2                                 all          TensorRT samples
ii  libnvinfer8                                8.0.1-1+cuda10.2                                 arm64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                       8.0.1-1+cuda10.2                                 arm64        TensorRT ONNX libraries
ii  libnvonnxparsers8                          8.0.1-1+cuda10.2                                 arm64        TensorRT ONNX libraries
ii  libnvparsers-dev                           8.0.1-1+cuda10.2                                 arm64        TensorRT parsers libraries
ii  libnvparsers8                              8.0.1-1+cuda10.2                                 arm64        TensorRT parsers libraries
ii  nvidia-container-csv-cuda                  10.2.460-1                                       arm64        Jetpack CUDA CSV file
ii  nvidia-container-csv-cudnn                 8.2.1.32-1+cuda10.2                              arm64        Jetpack CUDNN CSV file
ii  nvidia-container-csv-tensorrt              8.0.1.6-1+cuda10.2                               arm64        Jetpack TensorRT CSV file
ii  nvidia-l4t-cuda                            32.6.1-20210916211029                            arm64        NVIDIA CUDA Package
ii  python3-libnvinfer                         8.0.1-1+cuda10.2                                 arm64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                     8.0.1-1+cuda10.2                                 arm64        Python 3 development package for TensorRT
ii  tensorrt                                   8.0.1.6-1+cuda10.2                               arm64        Meta package of TensorRT
ii  uff-converter-tf                           8.0.1-1+cuda10.2                                 arm64        UFF converter for TensorRT package
ajeetraina@ajeetraina-desktop:~$ 

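The full listing is long, so it can help to filter it down to the toolkit meta-packages. The sketch below assumes the dpkg -l column layout shown above; installed_cuda_toolkits is a hypothetical helper name.

```shell
# Hypothetical helper: from `dpkg -l` text on stdin, print the versioned CUDA
# toolkit meta-packages that are fully installed (status "ii"). Entries marked
# "rc" are removed packages whose config files remain, so they are skipped.
installed_cuda_toolkits() {
    awk '$1 == "ii" && $2 ~ /^cuda-toolkit-[0-9]+-[0-9]+$/ { print $2, $3 }'
}

# Usage on the device:
#   dpkg -l | installed_cuda_toolkits
```

Applied to the listing above, this would report cuda-toolkit-10-2 and cuda-toolkit-11-3 and skip cuda-toolkit-11-0, which is in the rc state.
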
Setting up PATH variable

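Tab-completing cd /usr/local/cuda shows the installed toolkit directories. Pick the 11.3 one and add its bin directory to PATH:

ajeetraina@ajeetraina-desktop:~$ cd /usr/local/cuda
cuda/      cuda-10/   cuda-10.2/ cuda-11/   cuda-11.0/ cuda-11.3/ 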
ajeetraina@ajeetraina-desktop:~$ export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
ajeetraina@ajeetraina-desktop:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:10_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
ajeetraina@ajeetraina-desktop:~$ 

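The ${PATH:+:${PATH}} expansion in the export command is easy to misread, so here is a small sketch of what it does, plus a way to pull the release number out of the nvcc output. Both prepend_dir and nvcc_release are hypothetical helpers, and the sed pattern assumes the "release 11.3, V11.3.109" line format shown above.

```shell
# Hypothetical helper: prepend a directory to a colon-separated path list,
# using the same ${VAR:+:${VAR}} expansion as the export command. The ":old"
# suffix is added only when the old value is non-empty, so an empty variable
# does not leave a trailing ":" (an empty PATH entry means the current
# directory).
prepend_dir() {
    dir=$1
    old=$2
    printf '%s\n' "${dir}${old:+:${old}}"
}

# Hypothetical helper: print the release number from `nvcc --version` text on
# stdin, assuming the "release 11.3, V11.3.109" line format.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

prepend_dir /usr/local/cuda-11.3/bin /usr/bin:/bin
# prints /usr/local/cuda-11.3/bin:/usr/bin:/bin
prepend_dir /usr/local/cuda-11.3/bin ""
# prints /usr/local/cuda-11.3/bin
```

On the device, nvcc --version | nvcc_release should print 11.3 once the PATH change is in effect. The export only lasts for the current shell session; adding the same line to ~/.bashrc is one common way to make it persistent.
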
Top comments (3)

Yanko Alexandrov

Still a useful reference, Ajeet - the deviceQuery verification step especially. The Jetson Nano CUDA 10.2 + TensorRT 8 stack described here is battle-tested.

For anyone on a newer Jetson Orin Nano: the process is broadly similar via JetPack, but with a few key differences:

  • CUDA is bundled in JetPack (no manual wget needed)
  • The dpkg::Options force-overwrite fix is still sometimes needed when mixing JetPack versions
  • nvcc --version path is the same sanity check
  • On Orin Nano, also run tegrastats to baseline thermal/power before loading heavy CUDA workloads

We run OpenClaw on the Jetson Orin Nano and the CUDA stack enables local inference (llama.cpp with CUDA backend). The jump from the original Nano (128 CUDA cores, 4GB) to Orin Nano (1024 CUDA cores, 8GB) is genuinely dramatic.

Saw your NanoClaw article too - great to see OpenClaw in the Docker ecosystem!

ByerRA

Thanks, this is exactly what I was looking for. I'm in the process of creating a custom image for my Jetson Nano cluster, and I like to pick what gets loaded instead of using JetPack, where you can't really decide what will or won't be installed.

Anton DS

Ajeet, you're saving my days, thank you. This is what I was looking for; I finally succeeded in installing CUDA on my Jetson Nano.