DEV Community

Eitamos Ring

Calling CUDA from Go without cgo

I started gocudrv with one constraint:

I wanted Go code to call CUDA without making every build depend on CUDA headers, a C compiler, or cgo.

That means loading the NVIDIA driver at runtime instead of linking against CUDA at build time.

Why avoid cgo?

cgo is the normal way to call C from Go, and often the right tool. But it also makes builds heavier.

A package that uses cgo needs:

  • a C compiler on every machine that builds it
  • a matching C toolchain for each target platform
  • a cross-compiler for every cross-platform build

For this project, that was exactly the setup I wanted to avoid.

The goal was a normal Go build:

CGO_ENABLED=0 go build ./...

The binary still requires an NVIDIA driver on the machine where it runs.

It just does not require the CUDA toolkit on the machine where it is built.

Why the Driver API?

CUDA exposes two major APIs:

  • the higher-level Runtime API
  • the lower-level Driver API

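gocudrv uses the Driver API because the NVIDIA driver itself exports it, whereas the Runtime API lives in a separate library (libcudart) that ships with the CUDA toolkit. The driver library is: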

  • libcuda.so.1 on Linux/WSL
  • nvcuda.dll on Windows

That allows the program to:

  • load the driver dynamically at startup
  • bind only the symbols it needs
  • fail gracefully if the driver is missing

The Driver API is also backward compatible: a program written against an older version of the API keeps working on newer drivers. That makes it a good fit for a thin binding layer.
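As a rough sketch of the first step, the per-platform library names listed above can be chosen from the GOOS value. The function name `driverLibraryName` and the `ErrUnsupportedPlatform` sentinel are made up for illustration and are not gocudrv's API; the point is only that an unsupported platform yields an ordinary Go error rather than a crash.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// ErrUnsupportedPlatform is a hypothetical sentinel error for this sketch.
var ErrUnsupportedPlatform = errors.New("no known CUDA driver library for this platform")

// driverLibraryName returns the driver library file name for a GOOS value.
// The file names come from the article; the function itself is illustrative.
func driverLibraryName(goos string) (string, error) {
	switch goos {
	case "linux": // also covers WSL, where Go reports GOOS=linux
		return "libcuda.so.1", nil
	case "windows":
		return "nvcuda.dll", nil
	default:
		return "", fmt.Errorf("%w: %s", ErrUnsupportedPlatform, goos)
	}
}

func main() {
	name, err := driverLibraryName(runtime.GOOS)
	if err != nil {
		// Report the problem and return, so a caller could fall back to a CPU path.
		fmt.Println("CUDA unavailable:", err)
		return
	}
	fmt.Println("would load:", name)
}
```

A caller can test for the sentinel with `errors.Is(err, ErrUnsupportedPlatform)` and decide whether a missing driver is fatal.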

Where purego fits

gocudrv uses purego to open shared libraries and bind native functions without cgo.

At the top level, initialization looks pretty ordinary:

package main

import (
    "fmt"
    "log"

    "github.com/eitamring/gocudrv/cuda"
)

func main() {
    if err := cuda.Init(); err != nil {
        log.Fatal(err)
    }

    v, err := cuda.DriverVersion()
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("CUDA driver: %d.%d\n", v/1000, (v%1000)/10)
}
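The arithmetic in the Printf call works because the driver reports its version as a single integer, 1000 × major + 10 × minor. A small worked example, using a hypothetical helper name `splitDriverVersion` and the sample value 12040:

```go
package main

import "fmt"

// splitDriverVersion is a hypothetical helper: it splits the integer
// reported by cuDriverGetVersion (1000*major + 10*minor) into its parts.
func splitDriverVersion(v int) (major, minor int) {
	return v / 1000, (v % 1000) / 10
}

func main() {
	major, minor := splitDriverVersion(12040)
	fmt.Printf("CUDA driver: %d.%d\n", major, minor) // prints "CUDA driver: 12.4"
}
```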

Underneath that small API, the package:

  • locates the driver library
  • binds functions like cuInit and cuDriverGetVersion
  • calls cuInit(0)
  • maps CUDA result codes into Go errors
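The last step, turning result codes into Go errors, might look something like the sketch below. The `Result` type, its constant names, and `check` are assumptions made for illustration, not gocudrv's actual definitions; the numeric values are taken from the CUresult enum in cuda.h.

```go
package main

import "fmt"

// Result mirrors the driver's CUresult status code (illustrative type).
type Result int32

// A few values from the CUresult enum.
const (
	Success             Result = 0
	ErrorInvalidValue   Result = 1
	ErrorOutOfMemory    Result = 2
	ErrorNotInitialized Result = 3
	ErrorNoDevice       Result = 100
	ErrorInvalidDevice  Result = 101
)

var resultNames = map[Result]string{
	ErrorInvalidValue:   "CUDA_ERROR_INVALID_VALUE",
	ErrorOutOfMemory:    "CUDA_ERROR_OUT_OF_MEMORY",
	ErrorNotInitialized: "CUDA_ERROR_NOT_INITIALIZED",
	ErrorNoDevice:       "CUDA_ERROR_NO_DEVICE",
	ErrorInvalidDevice:  "CUDA_ERROR_INVALID_DEVICE",
}

// Error makes Result satisfy the error interface.
func (r Result) Error() string {
	if name, ok := resultNames[r]; ok {
		return fmt.Sprintf("cuda: %s (%d)", name, int32(r))
	}
	return fmt.Sprintf("cuda: unknown error (%d)", int32(r))
}

// check converts a raw status code into a Go error; success becomes nil.
func check(code Result) error {
	if code == Success {
		return nil
	}
	return code
}

func main() {
	fmt.Println(check(Success))
	fmt.Println(check(ErrorNoDevice))
}
```

Returning an explicit nil for success matters here: wrapping `Success` in the error interface would produce a non-nil error value.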

What this does not buy

Skipping cgo does not remove the C boundary.

It just makes the boundary more manual.

The library still has to define:

  • function signatures
  • pointer types
  • struct layouts
  • alignment and padding

exactly as the CUDA ABI expects them.

If a native function expects a pointer to a struct, the Go side must pass memory with the exact same layout. The compiler will not rescue a bad binding.
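One way to catch layout drift early is to assert field offsets and sizes with the unsafe package. The sketch below uses a made-up C struct, not a real CUDA type, and the expected numbers assume a 64-bit target where uint64 is 8-byte aligned.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sample mirrors a hypothetical C struct (not a real CUDA type):
//
//	struct sample { uint8_t flag; uint64_t value; uint32_t count; };
//
// On common 64-bit ABIs the C compiler pads 7 bytes after flag and
// 4 bytes after count, giving a 24-byte struct.
type sample struct {
	Flag  uint8
	Value uint64
	Count uint32
}

// layout reports the offsets and size the Go compiler chose for sample.
func layout() (valueOff, countOff, size uintptr) {
	var s sample
	return unsafe.Offsetof(s.Value), unsafe.Offsetof(s.Count), unsafe.Sizeof(s)
}

// checkLayout compares the Go layout with the expected 64-bit C layout.
func checkLayout() error {
	valueOff, countOff, size := layout()
	if valueOff != 8 || countOff != 16 || size != 24 {
		return fmt.Errorf("sample layout mismatch: Value@%d Count@%d size %d",
			valueOff, countOff, size)
	}
	return nil
}

func main() {
	if err := checkLayout(); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("sample layout matches the expected 64-bit C layout")
}
```

Running a check like this in a test or at package init turns a silent memory-corruption bug into an immediate, readable failure.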

That tradeoff is worth it for this project, but it is still a tradeoff.

Next steps

Loading the driver is only the first step.

A machine may have:

  • zero GPUs
  • one GPU
  • several GPUs

The next step is handling devices, contexts, memory, and eventually streams and async execution cleanly from Go.

The project is still very early, but the vector-add example already works.

gocudrv on GitHub