In previous chapters, we learned how GPU engines execute commands and how drivers feed commands to the hardware through different submission methods (Ringbuffer, Execlists, GuC). However, GPU execution is highly asynchronous. When you call the execbuffer IOCTL, the CPU simply throws the task into a queue and returns immediately.
How do you know when a task has completed? How do you ensure that "Task B" starts only after "Task A" finishes? How do you get content rendered by an NVIDIA discrete GPU to display on an Intel integrated GPU (cross-driver synchronization)? This is the core subject of this lecture: the art of synchronization in i915.
1. The Asynchronous Nature of GPU Execution
In the Linux graphics stack, the CPU and GPU are decoupled parallel worlds:
- CPU: Responsible for logic control, building command sequences, and managing memory allocation.
- GPU: Responsible for heavy parallel computation.
To avoid having the CPU wait idly for the GPU to finish (which would severely degrade system responsiveness), all submission operations follow the "Fire and Forget" principle. The driver needs a mechanism to track these "in-flight" tasks. This mechanism is known in i915 as the i915_request.
2. i915_request: The Container for Asynchronous Tasks
The i915_request (often abbreviated as rq in code) is the most important tracking unit in i915. Every task sequence submitted to the hardware is wrapped in an rq.
2.1 Lifecycle
- Create: Calls i915_request_create(). A unique seqno (sequence number) is allocated for the request at this point.
- Await: If the current task depends on other tasks (e.g., needing to wait for a previous rendering task to finish before beginning a display write), the driver calls i915_request_await_dma_fence().
- Emit: The driver writes the actual operational instructions into the Ringbuffer and appends a special "Breadcrumb" instruction at the end.
- Add: Calls i915_request_add(). The task is formally handed over to the scheduler and waits for hardware execution.
- Signal: When the hardware executes the "Breadcrumb" instruction, it writes its seqno to a specific location in memory (HWSP) and triggers a hardware interrupt.
- Retire: The driver processes the interrupt, discovers the task is complete, wakes up waiting processes, and releases related resources.
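The lifecycle steps above can be pictured as a small state machine. The following is only a user-space toy model under stated assumptions: every name in it (toy_request, toy_hwsp, the toy_* functions) is hypothetical and is not part of i915, and a single global variable stands in for the HWSP slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the request lifecycle; none of these names exist in i915. */
enum toy_state { TOY_CREATED, TOY_QUEUED, TOY_SIGNALED, TOY_RETIRED };

struct toy_request {
    uint32_t seqno;            /* allocated at creation time */
    unsigned int pending_deps; /* dependencies not yet signaled */
    enum toy_state state;
};

uint32_t toy_next_seqno = 1;   /* hypothetical per-timeline counter */
uint32_t toy_hwsp;             /* stand-in for the HWSP slot */

/* Create: allocate a sequence number. */
void toy_request_create(struct toy_request *rq)
{
    rq->seqno = toy_next_seqno++;
    rq->pending_deps = 0;
    rq->state = TOY_CREATED;
}

/* Await: record one more dependency that must signal first. */
void toy_request_await(struct toy_request *rq)
{
    rq->pending_deps++;
}

/* Called when one of the awaited dependencies has signaled. */
void toy_dependency_signaled(struct toy_request *rq)
{
    if (rq->pending_deps > 0)
        rq->pending_deps--;
}

/* Add: hand the request over to the (imaginary) scheduler. */
void toy_request_add(struct toy_request *rq)
{
    rq->state = TOY_QUEUED;
}

/* Signal: the "hardware" runs the request and writes the breadcrumb. */
bool toy_hw_execute(struct toy_request *rq)
{
    if (rq->state != TOY_QUEUED || rq->pending_deps > 0)
        return false;          /* not queued yet, or still blocked */
    toy_hwsp = rq->seqno;      /* breadcrumb write */
    rq->state = TOY_SIGNALED;
    return true;
}

/* Retire: only a signaled request may be retired. */
bool toy_request_retire(struct toy_request *rq)
{
    if (rq->state != TOY_SIGNALED)
        return false;
    rq->state = TOY_RETIRED;
    return true;
}
```

In this model a request with an outstanding dependency is refused by toy_hw_execute() until toy_dependency_signaled() has been called for it, which is the ordering guarantee the Await step exists to provide.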
2.2 The Breadcrumbs Mechanism
How does the hardware tell the software how far it has progressed? i915 uses the "Breadcrumbs" technique:
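- The driver inserts a store instruction such as MI_STORE_DWORD_IMM at the end of the Ringbuffer task (the exact command varies by engine and hardware generation).
- This instruction causes the GPU to write the current request's seqno into the Hardware Status Page (HWSP).
- The software simply needs to monitor the HWSP value. If HWSP_value >= rq->seqno, the task has completed. Because seqno is a 32-bit counter that can wrap around, the driver performs this comparison in a wraparound-safe way.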
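The completion check can be sketched in a few lines of C. This is a minimal user-space illustration, not the driver's code: request_completed() and its hwsp parameter are hypothetical names, and the signed-difference comparison is the usual idiom for comparing 32-bit sequence numbers that may wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if seq1 is at or after seq2, tolerating 32-bit wraparound.
 * The unsigned subtraction wraps by definition; interpreting the
 * result as signed tells us which side of seq2 we are on. (The
 * conversion to int32_t is implementation-defined in ISO C but
 * behaves as two's complement on mainstream compilers.)
 */
bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * hwsp points at the status-page slot the breadcrumb writes to
 * (hypothetical parameter for this sketch). volatile forces a
 * fresh read, since the "GPU" updates the value behind our back.
 */
bool request_completed(const volatile uint32_t *hwsp, uint32_t rq_seqno)
{
    return seqno_passed(*hwsp, rq_seqno);
}
```

A plain >= would give the wrong answer just after the counter wraps: with the HWSP at 2 and a request seqno of 0xFFFFFFF0, the request finished long ago, yet 2 >= 0xFFFFFFF0 is false. The signed difference handles that case correctly.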
3. The Bridge Across Boundaries: dma_fence
While the i915_request is powerful, it is private to the i915 driver. To achieve cross-driver synchronization (e.g., using an Intel GPU for rendering and then handing off to a DisplayLink USB graphics card for display), a common synchronization language in the Linux kernel is needed: dma_fence.
3.1 What is a dma_fence?
dma_fence is a generic synchronization object defined by the kernel. It has only two states: Pending and Signaled.
In i915_request.h, you can see that the first member of struct i915_request is struct dma_fence fence. This means every i915 request is inherently also a dma_fence.
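Embedding one struct inside another is what lets generic code hold a fence pointer while the driver recovers its own request from it. The sketch below shows the pattern with hypothetical stand-in types (demo_fence, demo_request) and a locally defined macro in the style of the kernel's container_of; it does not reproduce the real struct layouts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the generic fence object. */
struct demo_fence {
    int signaled;
};

/* Hypothetical stand-in for a driver request that embeds a fence. */
struct demo_request {
    struct demo_fence fence;   /* first member, as described in the text */
    uint32_t seqno;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic code hands us a fence; the driver gets its request back. */
struct demo_request *to_demo_request(struct demo_fence *fence)
{
    return demo_container_of(fence, struct demo_request, fence);
}
```

Because the fence is the first member, its address coincides with the address of the request, so the conversion costs nothing at run time. The macro form still works if the member is later moved elsewhere in the struct.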
3.2 Cross-Process and Cross-Driver Synchronization
- Sync File (Android Synchronization Model): Passes synchronization signals between processes via a file descriptor (FD). An FD can be obtained using I915_EXEC_FENCE_OUT in execbuffer.
- DRM Syncobj (Vulkan/Modern Approach): A more flexible synchronization container that can hold multiple fences, commonly used for Vulkan Timeline Semaphores.
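When a user-space application passes the FD of a sync_file to another process, the receiver can wait on the FD directly or hand it back to a driver as an input fence (for example, via I915_EXEC_FENCE_IN in execbuffer), and the kernel resolves the underlying dma_fence dependency.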
4. Internal Dependency Management: sw_fence
In addition to the generic dma_fence, i915 internally uses a more lightweight i915_sw_fence. It is primarily used for internal driver pipeline orchestration.
For example, when executing execbuffer, the driver might first need to wait for memory page-in. This internal wait is implemented via a sw_fence, ensuring that hardware requests are never dispatched to the scheduler before all software prerequisites are met.
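The idea of holding a request back until every software prerequisite is met can be modeled as a counting gate. The sketch below is a simplified, single-threaded illustration with hypothetical names (sw_gate and its functions); it is not the i915_sw_fence API, which additionally has to cope with concurrency and callbacks.

```c
#include <stdbool.h>

/* Hypothetical counting gate: fires once every prerequisite is done. */
struct sw_gate {
    unsigned int pending;  /* outstanding prerequisites, plus 1 held by the owner */
    bool fired;            /* set once the gate has opened */
};

/* Start with one reference held by the owner, so the gate cannot
 * fire while prerequisites are still being registered. */
void sw_gate_init(struct sw_gate *g)
{
    g->pending = 1;
    g->fired = false;
}

/* Register one more prerequisite (e.g. a page-in that must finish). */
void sw_gate_await(struct sw_gate *g)
{
    g->pending++;
}

/* One prerequisite finished; open the gate when none remain. */
void sw_gate_complete(struct sw_gate *g)
{
    if (g->pending > 0 && --g->pending == 0)
        g->fired = true;   /* here the request would go to the scheduler */
}

/* Owner is done registering prerequisites: drop the initial reference. */
void sw_gate_commit(struct sw_gate *g)
{
    sw_gate_complete(g);
}
```

The initial reference is the design choice worth noting: without it, a prerequisite that completes immediately could open the gate before the remaining ones have been registered.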
Summary
Synchronization is where bugs most easily arise in a driver, and it is also a key point for performance optimization. i915 encapsulates asynchronous tasks through i915_request, communicates with hardware using the "Breadcrumbs" instruction, and achieves seamless integration with the Linux ecosystem via the standard dma_fence interface.

