Last year, I took a GPU programming course at Johns Hopkins University as part of my graduate studies, where I learned CUDA programming. For my final project, I built a lightweight GPU monitoring and profiling tool focused on CUDA.
I enjoyed the process so much that I decided to continue developing it beyond the course.
In this post, I’d like to briefly introduce the project:
GPU Flight — a 100% open-source GPU observability tool
GitHub: https://github.com/gpu-flight/gpufl-client
Why I Started GPU Flight
When profiling a CUDA application, you typically:
- Install profiling tools such as Nsight
- Or manually integrate CUPTI into your application, which often makes the code complex and difficult to manage
- Deal with additional complexity in cloud or containerized environments
This workflow can be inconvenient — especially in production systems.
I wanted something lighter.
Something that works more like a flight recorder for GPUs.
So I built GPU Flight.
Instead of requiring heavy tooling at runtime, GPU Flight writes structured profiling logs directly on the host machine. A separate component (GPUFL Agent) crawls these log files and forwards them to a backend service or other destinations.
This makes GPU observability more flexible and easier to integrate into distributed systems.
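To make the flow concrete, here is a rough sketch of the agent-side crawling idea — not the actual GPUFL Agent code; the file path, record shape, and offset handling are all hypothetical — assuming the client writes newline-delimited JSON records:

```python
import json


def read_new_records(log_path, offset):
    """Read newline-delimited JSON records appended since `offset`.

    Returns (records, new_offset) so a crawler can persist the offset
    and resume after a restart without re-forwarding old records.
    """
    records = []
    with open(log_path, "r") as f:
        f.seek(offset)
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
        new_offset = f.tell()
    return records, new_offset
```

In a real agent this would run periodically, batching the returned records and forwarding them to the configured backend.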
What is GPU Flight?
GPU Flight is designed to be lightweight and modular.
- If you only need monitoring, the overhead is minimal.
- Enabling deeper profiling provides more detailed metrics.
The goal is to expose useful GPU metrics so you can clearly understand:
- How the GPU manages resources
- How your program utilizes GPU resources
- Where performance bottlenecks occur
Project Structure
GPU Flight currently consists of several components:
1️⃣ gpufl-client
https://github.com/gpu-flight/gpufl-client
The client library that users embed into their applications for monitoring and profiling.
2️⃣ gpufl-agent
https://github.com/gpu-flight/gpufl-agent
Despite the name, this is not an AI agent 🙂
It tracks log files and forwards profiling data to the configured destination.
3️⃣ gpufl-desktop
https://github.com/gpu-flight/gpufl-desktop
Originally, I planned to build a desktop viewer.
Due to time constraints, I’m currently focusing on a web-based frontend.
Some repositories are still private because they are not yet production-ready. I plan to open them once the core functionality stabilizes.
What Metrics Does GPU Flight Support?
GPU Flight captures observability data at multiple layers.
1️⃣ System & GPU Monitoring (NVML)
- Host memory usage
- GPU memory usage (used/free/total)
- GPU utilization
- Memory utilization
- Temperature
- Power consumption
- Clock speeds (Graphics / SM / Memory)
- PCIe RX/TX bandwidth
- Power and thermal throttling flags
Example JSON snippet:
{
  "type": "system_sample",
  "util_gpu": 57,
  "temp_c": 39,
  "power_mw": 54415,
  "clk_sm": 1740
}
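The units follow NVML conventions (milliwatts for power, degrees Celsius for temperature, MHz for clocks). A small consumer might normalize a sample like this — field names are taken from the snippet above; the function itself is just an illustrative sketch:

```python
import json


def summarize_sample(line):
    """Convert a raw system_sample JSON line into human-friendly units."""
    s = json.loads(line)
    return {
        "gpu_util_pct": s["util_gpu"],
        "temp_c": s["temp_c"],
        "power_w": s["power_mw"] / 1000.0,  # NVML reports power in milliwatts
        "sm_clock_mhz": s["clk_sm"],
    }


sample = ('{"type": "system_sample", "util_gpu": 57, '
          '"temp_c": 39, "power_mw": 54415, "clk_sm": 1740}')
print(summarize_sample(sample))
```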
2️⃣ CUDA Device Capabilities
Static architectural information:
- Compute capability
- L2 cache size
- Shared memory per block
- Registers per block
- SM count
- Warp size
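These static values are the inputs to occupancy estimates. As a simplified sketch — ignoring register and shared-memory limits, which also cap occupancy in practice — the warp-slot occupancy of an SM can be approximated from the device report's warp size and warp capacity (the default of 48 warps per SM is just a common value used for illustration):

```python
def warp_occupancy(threads_per_block, blocks_per_sm,
                   warp_size=32, max_warps_per_sm=48):
    """Fraction of an SM's warp slots filled by resident blocks.

    warp_size and max_warps_per_sm would come from the device
    capability report; the defaults here are illustrative.
    """
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    active_warps = warps_per_block * blocks_per_sm
    return min(1.0, active_warps / max_warps_per_sm)
```

For example, 256-thread blocks occupy 8 warps each, so 6 resident blocks fill all 48 warp slots.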
3️⃣ CUDA API & Kernel Events (CUPTI)
- API enter/exit timestamps
- Kernel execution start/end timestamps
- Grid/block dimensions
- Shared memory usage
- Register usage
- Occupancy
- Correlation IDs
- Memory copy events (HtoD, DtoH)
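Correlation IDs are what tie a runtime API call (such as a kernel launch) to the device-side kernel execution it triggered. A minimal sketch of that pairing — the record fields below are illustrative, not GPU Flight's exact log schema:

```python
def pair_by_correlation(events):
    """Match API records to kernel records sharing a correlation_id,
    and compute each kernel's device-side duration."""
    api = {e["correlation_id"]: e for e in events if e["type"] == "api"}
    pairs = []
    for e in events:
        if e["type"] != "kernel":
            continue
        launch = api.get(e["correlation_id"])
        pairs.append({
            "api_name": launch["name"] if launch else None,
            "kernel": e["name"],
            "duration_ns": e["end_ns"] - e["start_ns"],
        })
    return pairs


events = [
    {"type": "api", "name": "cudaLaunchKernel", "correlation_id": 7},
    {"type": "kernel", "name": "vectorAdd", "correlation_id": 7,
     "start_ns": 1000, "end_ns": 45000},
]
```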
Python Support
GPU Flight is also being extended to support Python applications that use CUDA (e.g., PyTorch).
Example:
https://github.com/gpu-flight/gpufl-client/blob/main/example/python/03_pytorch_benchmark.py
This allows profiling GPU-heavy ML workloads without deeply modifying existing code.
What’s Next?
In the next post, I’ll walk through a minimal CUDA example and show how to:
- Integrate gpufl-client
- Run a kernel
- Inspect generated profiling logs
- Interpret stall reasons and metrics
Thanks for reading — this is just the beginning.