I started gocudrv with one constraint:
I wanted Go code to call CUDA without making every build depend on CUDA headers, a C compiler, or cgo.
That means loading the NVIDIA driver at runtime instead of linking against CUDA at build time.
Why avoid cgo?
cgo is the normal way to call C from Go, and often the right tool. But it also makes builds heavier.
A package that uses cgo needs:
- a C compiler
- platform-specific toolchains
- cross-compilers for cross-platform builds
For this project, that was exactly the setup I wanted to avoid.
The goal was a normal Go build:
```shell
CGO_ENABLED=0 go build ./...
```
The binary still requires an NVIDIA driver on the machine where it runs.
It just does not require the CUDA toolkit on the machine where it is built.
Why the Driver API?
CUDA exposes two major APIs:
- the higher-level Runtime API
- the lower-level Driver API
gocudrv uses the Driver API because it is exposed directly by the NVIDIA driver itself:
- `libcuda.so.1` on Linux/WSL
- `nvcuda.dll` on Windows
That allows the program to:
- load the driver dynamically at startup
- bind only the symbols it needs
- fail gracefully if the driver is missing
The Driver API is also backward compatible, which makes it a better fit for a thin binding layer.
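To make the "fail gracefully" point concrete, here is a small sketch of choosing the driver library name per platform and reporting a clear error when there is no known library. The library names come from above; the `driverLibName` helper itself is hypothetical:

```go
package main

import "fmt"

// driverLibName maps an operating system name to the shared library the
// NVIDIA driver exposes the Driver API under. Returns an error for
// platforms with no known CUDA driver library.
func driverLibName(goos string) (string, error) {
	switch goos {
	case "linux":
		return "libcuda.so.1", nil
	case "windows":
		return "nvcuda.dll", nil
	default:
		return "", fmt.Errorf("no known CUDA driver library for %s", goos)
	}
}

func main() {
	for _, goos := range []string{"linux", "windows", "darwin"} {
		name, err := driverLibName(goos)
		if err != nil {
			fmt.Println(goos+":", err)
			continue
		}
		fmt.Println(goos+":", name)
	}
}
```

In a real binding this decision would feed directly into the dynamic-load step, so a missing driver surfaces as an ordinary Go error rather than a crash at startup.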
Where purego fits
gocudrv uses purego to open shared libraries and bind native functions without cgo.
At the top level, initialization looks pretty ordinary:
```go
package main

import (
	"fmt"
	"log"

	"github.com/eitamring/gocudrv/cuda"
)

func main() {
	if err := cuda.Init(); err != nil {
		log.Fatal(err)
	}
	v, err := cuda.DriverVersion()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("CUDA driver: %d.%d\n", v/1000, (v%1000)/10)
}
```
Underneath that small API, the package:
- locates the driver library
- binds functions like `cuInit` and `cuDriverGetVersion`
- calls `cuInit(0)`
- maps CUDA result codes into Go errors
What this does not buy
Skipping cgo does not remove the C boundary.
It just makes the boundary more manual.
The library still has to define:
- function signatures
- pointer types
- struct layouts
- alignment and padding
exactly as the CUDA ABI expects them.
If a native function expects a pointer to a struct, the Go side must pass memory with the exact same layout. The compiler will not rescue a bad binding.
That tradeoff is worth it for this project, but it is still a tradeoff.
Next steps
Loading the driver is only the first step.
A machine may have:
- zero GPUs
- one GPU
- several GPUs
The next step is handling devices, contexts, memory, and eventually streams and async execution cleanly from Go.
The project is still very early, but the vector-add example already works.