Part 6: Hardware Engines & Contexts

#linux #graphics

After completing the previous two explorations on initialization and memory management, we have finally arrived at the most dynamic part of the i915 driver: GT (Graphics Technology).

If GEM is the GPU's "warehouse manager," then GT is the GPU's "execution factory." In this lecture, we look at the GPU's hardware engines and at how the driver uses "contexts" to let many independent workloads share them safely.

1. Hardware Engines: Execution Units with Clear Division of Labor

Modern Intel GPUs are not a single processor but a collection of hardware engines with clearly defined roles. Each engine has its own independent Command Streamer (CS) and can work in parallel.

In the i915 source code, each engine instance is identified by enum intel_engine_id (RCS0, BCS0, VCS0, VCS1, and so on), and the instances are grouped into engine classes. The common classes are:

RCS (Render Command Streamer): The most capable engine, responsible for 3D rendering and GPGPU compute (such as OpenCL or Level Zero workloads); it dispatches work to the programmable shader units.
VCS (Video Command Streamer): Also known as BSD (Bitstream Decoder), specifically responsible for video encoding and decoding (e.g., H.264, HEVC, AV1).
BCS (Blitter Command Streamer): Responsible for high-speed memory copying (we mentioned it in Lecture 4 on TTM migration).
VECS (Video Enhancement Command Streamer): Responsible for video post-processing, such as denoising and color enhancement.
CCS (Compute Command Streamer): A dedicated compute engine added in newer Xe-based designs, starting with the Xe-HP generation (for example, DG2-based Arc GPUs). It drops the 3D rendering pipeline and focuses solely on high-performance computing.
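To make the division of labor concrete, here is a small sketch that routes a kind of workload to an engine class. The enum values, the workload categories, and the function names (route_workload, engine_name) are hypothetical and are not meant to match i915's internal or uAPI numbering; the only rule encoded is the one described above, including that compute work can fall back to RCS on platforms without a CCS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical engine classes; values are illustrative only and do not
 * claim to match i915's internal or uAPI numbering. */
enum engine_class { ENG_RCS, ENG_VCS, ENG_BCS, ENG_VECS, ENG_CCS };

/* Hypothetical workload categories used only for this example. */
enum workload { WL_3D_DRAW, WL_VIDEO_CODEC, WL_COPY, WL_VIDEO_POSTPROC, WL_COMPUTE };

static const char *engine_name(enum engine_class c)
{
    switch (c) {
    case ENG_RCS:  return "rcs";
    case ENG_VCS:  return "vcs";
    case ENG_BCS:  return "bcs";
    case ENG_VECS: return "vecs";
    case ENG_CCS:  return "ccs";
    }
    return "unknown";
}

/* Pick the engine class for a workload. Compute work prefers a dedicated
 * CCS when the platform has one and otherwise falls back to RCS, which
 * also runs GPGPU workloads. */
static enum engine_class route_workload(enum workload w, bool has_ccs)
{
    switch (w) {
    case WL_3D_DRAW:        return ENG_RCS;
    case WL_VIDEO_CODEC:    return ENG_VCS;
    case WL_COPY:           return ENG_BCS;
    case WL_VIDEO_POSTPROC: return ENG_VECS;
    case WL_COMPUTE:        return has_ccs ? ENG_CCS : ENG_RCS;
    }
    assert(0 && "unhandled workload");
    return ENG_RCS;
}
```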

The driver abstracts these hardware instances through the intel_engine_cs structure, which contains the engine's state, pending request queues, and various callback functions for communicating with the hardware.
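Below is a heavily simplified, hypothetical stand-in for intel_engine_cs that keeps only the three things the paragraph mentions: identifying state (a name), a queue of pending requests, and a table of callbacks. struct toy_engine, toy_engine_ops, and the fixed-size FIFO are assumptions for illustration; the real structure and its request handling are far more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct intel_engine_cs. */
#define MAX_PENDING 8

struct toy_engine;

struct toy_engine_ops {
    /* Called when a request is handed to this engine. */
    int (*submit_request)(struct toy_engine *engine, unsigned int rq_id);
};

struct toy_engine {
    const char *name;                  /* e.g. "rcs0" */
    unsigned int pending[MAX_PENDING]; /* ring-shaped FIFO of request IDs */
    size_t head, count;
    const struct toy_engine_ops *ops;  /* backend-specific callbacks */
};

/* Default submit hook: append the request to the engine's FIFO. */
static int toy_queue_request(struct toy_engine *engine, unsigned int rq_id)
{
    assert(engine->count <= MAX_PENDING);
    if (engine->count == MAX_PENDING)
        return -1; /* queue full */
    engine->pending[(engine->head + engine->count) % MAX_PENDING] = rq_id;
    engine->count++;
    return 0;
}

/* Pop the oldest pending request, emulating the hardware completing it. */
static int toy_retire_oldest(struct toy_engine *engine, unsigned int *rq_id)
{
    if (engine->count == 0)
        return -1;
    *rq_id = engine->pending[engine->head];
    engine->head = (engine->head + 1) % MAX_PENDING;
    engine->count--;
    return 0;
}

static const struct toy_engine_ops toy_ops = { .submit_request = toy_queue_request };
```

Routing submission through a function pointer mirrors a design choice in the real driver: the same engine object can be driven by different submission back ends without callers needing to know which one is active.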

2. Logical Contexts: The GPU's "Process" Concept

Since the GPU is shared by multiple applications, we cannot allow an error in one application to crash another. To this end, i915 introduces the concept of contexts, which serve as a "process switching" mechanism within the GPU.

In i915, there are two core structures related to contexts, and their hierarchical relationship is crucial:

2.1 i915_gem_context (Software/User Perspective)

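This is the structure a user-space process works with. It is created through a DRM file descriptor (DRM_IOCTL_I915_GEM_CONTEXT_CREATE), is referred to by a context ID, and represents a logically isolated environment.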

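It references the context's VM (a PPGTT address space). The VM is normally private to the context, which provides address space isolation, although user space can choose to share one VM between several contexts.
It records scheduling attributes such as priority, flags such as whether the context may be banned after repeated hangs, and the user-defined engine map (set through I915_CONTEXT_PARAM_ENGINES).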
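A hypothetical model of the fields just listed: a VM pointer with a reference count (so that two contexts can share one address space), a priority, and a small user engine map. The struct and function names are invented for the example and only mirror the concepts of i915_gem_context, not its real layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine classes (illustrative values only). */
enum engine_class { ENG_RCS, ENG_VCS, ENG_BCS, ENG_VECS, ENG_CCS };

struct toy_vm {
    unsigned int id; /* stands in for a PPGTT address space */
    int refcount;    /* number of contexts using this VM */
};

#define MAX_ENGINE_SLOTS 4

/* Hypothetical, simplified user-visible context. */
struct toy_gem_context {
    unsigned int ctx_id;                         /* ID returned to user space */
    struct toy_vm *vm;                           /* address space it runs in */
    int priority;                                /* scheduling priority */
    enum engine_class engines[MAX_ENGINE_SLOTS]; /* user engine map */
    size_t num_engines;
};

static void toy_ctx_init(struct toy_gem_context *ctx, unsigned int id,
                         struct toy_vm *vm, int priority)
{
    assert(vm != NULL);
    ctx->ctx_id = id;
    ctx->vm = vm;
    vm->refcount++; /* a VM may be shared by several contexts */
    ctx->priority = priority;
    ctx->num_engines = 0;
}

/* Append an engine to the context's map; returns its slot index or -1. */
static int toy_ctx_add_engine(struct toy_gem_context *ctx, enum engine_class c)
{
    if (ctx->num_engines == MAX_ENGINE_SLOTS)
        return -1;
    ctx->engines[ctx->num_engines] = c;
    return (int)ctx->num_engines++;
}

/* Two contexts see each other's mappings only if they share a VM. */
static bool toy_ctx_share_vm(const struct toy_gem_context *a,
                             const struct toy_gem_context *b)
{
    return a->vm == b->vm;
}
```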

2.2 intel_context (Hardware/Engine Perspective)

This is an "intermediate layer" representing an instance of i915_gem_context on a specific engine.
For example, a GEM context that submits work to both RCS and VCS is backed by two separate intel_context instances, one per engine. Each one holds the resources the hardware needs for execution: its own Ring Buffer (ce->ring) and its own Context Image (ce->state).
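The one-to-many relationship can be sketched as a GEM context holding one per-engine context for each slot of its engine map, each with its own ring and image. Everything here (toy_intel_context, toy_get_ce, the buffer sizes) is hypothetical; the fields only echo ce->ring and ce->state, and this sketch simply allocates them on first use.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_BYTES  4096
#define IMAGE_BYTES 256
#define MAX_SLOTS   4

/* Hypothetical per-engine context: one per (GEM context, engine) pair. */
struct toy_intel_context {
    int engine_slot;      /* which engine of the map this belongs to */
    unsigned char *ring;  /* commands queued for this engine */
    unsigned char *state; /* saved register state (context image) */
};

struct toy_gem_context {
    struct toy_intel_context *ce[MAX_SLOTS]; /* created on first use */
};

/* Return the per-engine context for a slot, allocating it on first use. */
static struct toy_intel_context *toy_get_ce(struct toy_gem_context *ctx, int slot)
{
    assert(ctx != NULL);
    if (slot < 0 || slot >= MAX_SLOTS)
        return NULL;
    if (!ctx->ce[slot]) {
        struct toy_intel_context *ce = calloc(1, sizeof(*ce));
        if (!ce)
            return NULL;
        ce->engine_slot = slot;
        ce->ring = calloc(RING_BYTES, 1);
        ce->state = calloc(IMAGE_BYTES, 1);
        if (!ce->ring || !ce->state) {
            free(ce->ring);
            free(ce->state);
            free(ce);
            return NULL;
        }
        ctx->ce[slot] = ce;
    }
    return ctx->ce[slot];
}

/* Free every per-engine context owned by the GEM context. */
static void toy_ctx_release(struct toy_gem_context *ctx)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (ctx->ce[i]) {
            free(ctx->ce[i]->ring);
            free(ctx->ce[i]->state);
            free(ctx->ce[i]);
            ctx->ce[i] = NULL;
        }
    }
}
```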

3. Hardware Context Image

When the hardware switches between two contexts, it must save the engine's per-context register state so that it can be restored the next time that context runs. The memory area that holds this saved state is the Context Image (pointed to by ce->state).

LRC (Logical Ring Context): This is the modern context format introduced starting from Gen8 (Broadwell).
Contents: The ring buffer registers (start address, head, tail, and control), the address of the batch buffer being executed, the PPGTT page-table pointers, context-control flags and, for the render engine, a large block of 3D/GPGPU pipeline state.
Switching Process: When the scheduler submits a different context to the engine, the hardware itself writes the outgoing context's registers into that context's Image and then loads the incoming context's registers from its Image. The driver never copies registers by hand; it only tells the engine which context to run next.
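The save/restore step can be emulated in a few lines. This sketch assumes a tiny, made-up register set (ring pointers, batch address, page-table root) in place of the real, platform-specific LRC layout, and performs in software what the hardware does on its own during a switch. toy_regs, toy_lrc, and toy_context_switch are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of per-context registers kept in an LRC-like image. */
struct toy_regs {
    uint32_t ring_start, ring_head, ring_tail;
    uint64_t bb_addr; /* current batch buffer address */
    uint64_t pt_root; /* PPGTT page-table root */
};

struct toy_lrc { /* one context image per per-engine context */
    struct toy_regs saved;
};

struct toy_engine_hw {
    struct toy_regs live;    /* registers currently loaded in the engine */
    struct toy_lrc *current; /* context whose state is loaded, or NULL */
};

/* Emulate a context switch: save the outgoing context's live registers
 * into its image, then load the incoming context's image. */
static void toy_context_switch(struct toy_engine_hw *hw, struct toy_lrc *next)
{
    assert(next != NULL);
    if (hw->current == next)
        return; /* already loaded, nothing to do */
    if (hw->current)
        memcpy(&hw->current->saved, &hw->live, sizeof(hw->live));
    memcpy(&hw->live, &next->saved, sizeof(next->saved));
    hw->current = next;
}
```

Because the page-table root travels with the image in this model, switching contexts also switches address spaces.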

4. Isolation and Security

Through the context mechanism, i915 achieves strong isolation:

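Address Isolation: A context switch also switches the PPGTT page-table root the engine uses, so GPU addresses issued by Context A are translated only through A's page tables and cannot reach memory that is mapped only in Context B's address space (unless the two contexts were deliberately set up to share a VM).
Fault Isolation: If a context submits commands that hang the GPU, the driver can reset just the affected engine (a per-engine reset, as opposed to a full GPU reset). The guilty request is cancelled and the offending context can be banned, while innocent work, whether queued behind it or running on other engines, continues.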
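Both guarantees can be illustrated with a toy model. The first half translates GPU addresses through a per-context page table, so an address mapped in one VM faults in another; the second half models a per-engine reset that cancels only the executing request. All names, the single-level 4 KiB page table, and the use of -EFAULT/-EIO as error codes are assumptions of the sketch, not the driver's actual data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* --- Address isolation (hypothetical single-level page table) --- */
#define TOY_PAGE_SHIFT 12
#define TOY_VM_PAGES   16

struct toy_vm {
    uint64_t pte[TOY_VM_PAGES]; /* physical page base; 0 means "not mapped" */
};

/* Translate a GPU virtual address through one context's VM. Returns 0 and
 * stores the physical address in *phys, or -EFAULT if it is unmapped. */
static int toy_translate(const struct toy_vm *vm, uint64_t gpu_va, uint64_t *phys)
{
    uint64_t idx = gpu_va >> TOY_PAGE_SHIFT;

    if (idx >= TOY_VM_PAGES || vm->pte[idx] == 0)
        return -EFAULT;
    *phys = vm->pte[idx] | (gpu_va & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1));
    return 0;
}

/* --- Fault isolation (hypothetical per-engine reset) --- */
#define TOY_MAX_RQ 8

struct toy_request {
    int ctx_id; /* context that submitted the request */
    int status; /* 0 = fine, -EIO = cancelled by a reset */
};

struct toy_engine_q {
    struct toy_request rq[TOY_MAX_RQ];
    size_t count; /* rq[0] is the request currently executing */
};

/* Reset one hung engine: cancel the executing (guilty) request and keep
 * the innocent requests queued behind it so they can be replayed. Other
 * engines' queues are separate objects and are never touched. */
static struct toy_request toy_engine_reset(struct toy_engine_q *e)
{
    assert(e->count > 0);
    struct toy_request guilty = e->rq[0];

    guilty.status = -EIO;
    for (size_t i = 1; i < e->count; i++)
        e->rq[i - 1] = e->rq[i];
    e->count--;
    return guilty;
}
```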

Summary

The GT architecture runs work in parallel by dividing the hardware into multiple dedicated engines, and keeps tasks apart by giving each one its own per-engine state in an intel_context. Every task submitted to the GPU is essentially a stream of commands running on a specific "engine" under the protective umbrella of a "context."

In the next lecture, we will explore how these commands are passed layer by layer from software and finally reach the hardware engine—this is the evolutionary path from Ringbuffer to GuC.