
Myoungho Shin


Profiling GPU (CUDA) — Introducing GPU Flight

Last year, I took a GPU programming course at Johns Hopkins University as part of my graduate studies, where I learned CUDA programming. For my final project, I built a lightweight GPU monitoring and profiling tool focused on CUDA.

I enjoyed the process so much that I decided to continue developing it beyond the course.

In this post, I’d like to briefly introduce the project:

GPU Flight — a 100% open-source GPU observability tool

GitHub: https://github.com/gpu-flight/gpufl-client


Why I Started GPU Flight

When profiling a CUDA application, you typically:

  • Install profiling tools such as Nsight
  • Or integrate CUPTI into your application by hand, which tends to complicate the code and make it harder to maintain
  • Deal with additional complexity in cloud or containerized environments

This workflow can be inconvenient — especially in production systems.

I wanted something lighter.

Something that works more like a flight recorder for GPUs.

So I built GPU Flight.

Instead of requiring heavy tooling at runtime, GPU Flight writes structured profiling logs directly on the host machine. A separate component, gpufl-agent, watches these log files and forwards new records to a backend service or other destinations.
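
To make the idea concrete, here is a minimal Python sketch of that log-following step. It is not the actual gpufl-agent code. It assumes the log is newline-delimited JSON (one record per line), and the function name `forward_new_records` and the `sink` callback are hypothetical names used only for illustration.

```python
import json
from typing import Callable


def forward_new_records(path: str, offset: int, sink: Callable[[dict], None]) -> int:
    """Pass every complete JSON line written after `offset` to `sink`.

    Returns the byte offset to resume from on the next call.
    Assumption: the log is newline-delimited JSON, one record per line.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only up to the last newline, so a record that is still
    # being written is left in place and picked up on the next call.
    end = data.rfind(b"\n") + 1

    for line in data[:end].splitlines():
        line = line.strip()
        if line:
            sink(json.loads(line))

    return offset + end
```

Calling this in a loop with the returned offset gives a simple "tail and forward" behavior. A real forwarder would also need to handle log rotation and persist the offset across restarts, which this sketch leaves out.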

This makes GPU observability more flexible and easier to integrate into distributed systems.


What is GPU Flight?

GPU Flight is designed to be lightweight and modular.

  • If you only need monitoring, the overhead is minimal.
  • Enabling deeper profiling provides more detailed metrics.

The goal is to expose useful GPU metrics so you can clearly understand:

  • How the GPU manages resources
  • How your program utilizes GPU resources
  • Where performance bottlenecks occur

Project Structure

GPU Flight currently consists of several components:

1️⃣ gpufl-client

https://github.com/gpu-flight/gpufl-client

The client library that users embed into their applications for monitoring and profiling.


2️⃣ gpufl-agent

https://github.com/gpu-flight/gpufl-agent

Despite the name, this is not an AI agent 🙂

It tracks log files and forwards profiling data to the configured destination.


3️⃣ gpufl-desktop

https://github.com/gpu-flight/gpufl-desktop

Originally, I planned to build a desktop viewer.

Due to time constraints, I’m currently focusing on a web-based frontend.

Some repositories are still private because they are not yet production-ready. I plan to open them once the core functionality stabilizes.


What Metrics Does GPU Flight Support?

GPU Flight captures observability at multiple layers.

1️⃣ System & GPU Monitoring (NVML)

  • Host memory usage
  • GPU memory usage (used/free/total)
  • GPU utilization
  • Memory utilization
  • Temperature
  • Power consumption
  • Clock speeds (GFX / SM / Memory)
  • PCIe RX/TX bandwidth
  • Power and thermal throttling flags

Example JSON snippet:

{
  "type": "system_sample",
  "util_gpu": 57,
  "temp_c": 39,
  "power_mw": 54415,
  "clk_sm": 1740
}
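
As a rough illustration of how a record like this could be consumed, the Python sketch below parses the sample above and prints a one-line summary. The field names come from the snippet. The units are partly assumed: `power_mw` is read as milliwatts and `temp_c` as degrees Celsius based on their names, and `clk_sm` is assumed to be in MHz. The `summarize` helper is a hypothetical name, not part of GPU Flight.

```python
import json

# The sample record shown above, as a single log line.
SAMPLE = (
    '{"type": "system_sample", "util_gpu": 57, "temp_c": 39, '
    '"power_mw": 54415, "clk_sm": 1740}'
)


def summarize(line: str) -> str:
    """Turn one system_sample record into a human-readable summary."""
    rec = json.loads(line)
    if rec.get("type") != "system_sample":
        raise ValueError(f"unexpected record type: {rec.get('type')!r}")

    watts = rec["power_mw"] / 1000.0  # assumed milliwatts -> watts
    return (
        f"GPU {rec['util_gpu']}% | {rec['temp_c']} C | "
        f"{watts:.1f} W | SM {rec['clk_sm']} MHz"  # MHz is an assumption
    )


print(summarize(SAMPLE))  # GPU 57% | 39 C | 54.4 W | SM 1740 MHz
```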

2️⃣ CUDA Device Capabilities

Static architectural information:

  • Compute capability
  • L2 cache size
  • Shared memory per block
  • Registers per block
  • SM count
  • Warp size

3️⃣ CUDA API & Kernel Events (CUPTI)

  • API enter/exit timestamps
  • Kernel execution start/end timestamps
  • Grid/block dimensions
  • Shared memory usage
  • Register usage
  • Occupancy
  • Correlation IDs
  • Memory copy events (HtoD, DtoH)
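
Correlation IDs are what tie a host-side API call to the GPU work it launched. The Python sketch below shows one way such records could be joined to derive a kernel's duration and the delay between the API call and the kernel start. The `ApiCall` and `KernelRun` classes and their field names are hypothetical, not GPU Flight's actual schema, and the sketch assumes both sets of timestamps are in nanoseconds on a shared clock.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ApiCall:
    correlation_id: int
    name: str
    enter_ns: int
    exit_ns: int


@dataclass
class KernelRun:
    correlation_id: int
    name: str
    start_ns: int
    end_ns: int


def join_by_correlation(
    api_calls: Iterable[ApiCall], kernels: Iterable[KernelRun]
) -> List[Dict[str, object]]:
    """Match each kernel to the API call sharing its correlation ID."""
    by_id = {c.correlation_id: c for c in api_calls}
    rows = []
    for k in kernels:
        call = by_id.get(k.correlation_id)
        if call is None:
            continue  # no matching API record; skip rather than guess
        rows.append(
            {
                "kernel": k.name,
                "api": call.name,
                # Time from entering the API call until the kernel starts.
                "launch_delay_ns": k.start_ns - call.enter_ns,
                # Time the kernel spent executing on the GPU.
                "duration_ns": k.end_ns - k.start_ns,
            }
        )
    return rows
```

For example, an API call entered at 1000 ns whose kernel runs from 1800 ns to 4800 ns would yield a launch delay of 800 ns and a duration of 3000 ns.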

Python Support

GPU Flight is also being extended to support Python applications that use CUDA (e.g., PyTorch).

Example:

https://github.com/gpu-flight/gpufl-client/blob/main/example/python/03_pytorch_benchmark.py

This allows profiling GPU-heavy ML workloads without deeply modifying existing code.


What’s Next?

In the next post, I’ll walk through a minimal CUDA example and show how to:

  • Integrate gpufl-client
  • Run a kernel
  • Inspect generated profiling logs
  • Interpret stall reasons and metrics

Thanks for reading. This is just the beginning.
