In the previous lecture, we explored how the i915 driver initializes the GPU and establishes a communication bridge with the hardware. Now, we dive into one of the most critical components of i915 — Video Memory Management.
In modern graphics drivers, everything — textures, vertex data, framebuffers, and GPU command buffers — is abstracted as a block of memory at the lowest level. Managing these massive, variably sized memory blocks that are frequently shared between the CPU and GPU is key to driver performance. This is the responsibility of the GEM (Graphics Execution Manager).
1. What is GEM (Graphics Execution Manager)?
GEM was originally proposed and introduced into the Linux kernel by Intel to address the chaotic memory management in the early DRI (Direct Rendering Infrastructure) architecture.
The core philosophy of GEM is to abstract all GPU memory allocation and management into "objects."
Userspace programs (e.g., OpenGL/Vulkan drivers) request memory allocation from the kernel via IOCTL, and the kernel returns a handle (typically a 32-bit integer). Userspace programs can then use this handle for mapping (mmap), data read/write (pread/pwrite), and submission to the GPU for execution.
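A handle is only a per-file name for a kernel object, and the object lives as long as someone holds a reference to it. The following userspace sketch models that relationship under simplifying assumptions. Every name in it (toy_file, toy_gem_create, toy_gem_lookup, toy_gem_close) is hypothetical. A fixed-size array stands in for the per-file IDR, and a plain int stands in for the kref that the DRM core actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy object: just a reference count and a size. */
struct toy_handle_obj {
    int refcount;
    size_t size;
};

/* Hypothetical per-file handle table. Slot 0 is never used, so a
 * handle of 0 can signal "no object". */
#define TOY_MAX_HANDLES 16
struct toy_file {
    struct toy_handle_obj *objs[TOY_MAX_HANDLES];
};

static int toy_objects_freed; /* counts final releases, for demonstration */

/* Drop one reference; dropping the last one frees the object. */
static void toy_obj_put(struct toy_handle_obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0) {
        free(obj);
        toy_objects_freed++;
    }
}

/* "Create": allocate an object, install it in the table and return a
 * 32-bit handle (0 on failure). The table owns the initial reference. */
static uint32_t toy_gem_create(struct toy_file *file, size_t size)
{
    for (uint32_t h = 1; h < TOY_MAX_HANDLES; h++) {
        if (file->objs[h])
            continue;
        struct toy_handle_obj *obj = malloc(sizeof(*obj));
        if (!obj)
            return 0;
        obj->refcount = 1;
        obj->size = size;
        file->objs[h] = obj;
        return h;
    }
    return 0;
}

/* "Lookup": translate a handle into an object and take a reference,
 * which the caller must later drop with toy_obj_put(). */
static struct toy_handle_obj *toy_gem_lookup(struct toy_file *file, uint32_t h)
{
    if (h == 0 || h >= TOY_MAX_HANDLES || !file->objs[h])
        return NULL;
    file->objs[h]->refcount++;
    return file->objs[h];
}

/* "Close": forget the handle and drop the table's reference. */
static int toy_gem_close(struct toy_file *file, uint32_t h)
{
    if (h == 0 || h >= TOY_MAX_HANDLES || !file->objs[h])
        return -1;
    struct toy_handle_obj *obj = file->objs[h];
    file->objs[h] = NULL;
    toy_obj_put(obj);
    return 0;
}
```

Closing the handle does not free the object while the lookup reference is still held; the object is released only when its last reference is dropped. Keeping names and lifetimes separate is what allows one object to be used safely from several places at once.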
GEM is responsible not only for memory allocation but, more importantly, for managing:
- Lifecycle: Reference counting.
- Synchronization: Preventing data races when the CPU and GPU access the same memory concurrently (via Domain and Fence mechanisms).
- Cache Coherency: Handling consistency between CPU caches and GPU caches.
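The domain idea can be illustrated with a small model. Each object records which domains may currently read it and which single domain, if any, is writing to it, and leaving a write domain costs a cache flush. This is a hypothetical two-domain simplification: real i915 exposes finer-grained domains (such as I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_GTT and I915_GEM_DOMAIN_WC) and uses fences to wait for the GPU, none of which is modeled here.

```c
#include <assert.h>

/* Hypothetical domain bits; real i915 defines more of them. */
enum toy_domain {
    TOY_DOMAIN_CPU = 1u << 0,
    TOY_DOMAIN_GPU = 1u << 1,
};

struct toy_bo {
    unsigned int read_domains; /* domains that may read without a flush */
    unsigned int write_domain; /* at most one writing domain, or 0 */
    int flushes;               /* counts simulated cache flushes */
};

/* Leaving a write domain means pushing that domain's cached writes out. */
static void toy_flush_write_domain(struct toy_bo *bo)
{
    if (bo->write_domain) {
        bo->flushes++;
        bo->write_domain = 0;
    }
}

/* Move the object into `domain` for reading, or for writing if `write`. */
static void toy_set_domain(struct toy_bo *bo, unsigned int domain, int write)
{
    assert(domain == TOY_DOMAIN_CPU || domain == TOY_DOMAIN_GPU);
    if (bo->write_domain != domain)
        toy_flush_write_domain(bo);
    if (write) {
        /* A new writer invalidates every other domain's view. */
        bo->read_domains = domain;
        bo->write_domain = domain;
    } else {
        bo->read_domains |= domain;
    }
}
```

Reads can be shared by several domains at once, but a write always narrows the object down to a single domain, so the next transition knows exactly which caches need flushing.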
2. Core Data Structure: Deep Dive into drm_i915_gem_object
In the i915 driver source, all GEM objects are represented by struct drm_i915_gem_object. Its definition resides in i915/gem/i915_gem_object_types.h. This structure is extremely large, encompassing all the state required for an object's lifecycle.
We can break it down into several core parts:
2.1 Base and Polymorphic Abstraction
```c
struct drm_i915_gem_object {
    union {
        struct drm_gem_object base;
        struct ttm_buffer_object __do_not_access;
    };
    const struct drm_i915_gem_object_ops *ops;
    // ...
};
```
- base: Inherits from the DRM core layer's generic drm_gem_object, which holds basic state such as the reference count, the object size, the number of open userspace handles, and the backing file. The i915 GEM object is therefore a superset of the DRM GEM object.
- ops: A vtable of operations (object-oriented design in C). It defines interfaces such as get_pages(), put_pages(), and shrink(), allowing the same GEM logic to drive different underlying storage backends.
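The ops vtable can be sketched in plain C as a struct of function pointers selected per object. The types and the two backends below are hypothetical stand-ins for drm_i915_gem_object_ops, and the 8 MiB "stolen" limit is an arbitrary assumption chosen only to make the two backends behave differently.

```c
#include <assert.h>
#include <stddef.h>

struct toy_gem_object;

/* Hypothetical, cut-down analogue of drm_i915_gem_object_ops. */
struct toy_gem_ops {
    const char *name;
    int (*get_pages)(struct toy_gem_object *obj);
    void (*put_pages)(struct toy_gem_object *obj);
};

struct toy_gem_object {
    const struct toy_gem_ops *ops; /* selects the storage backend */
    size_t size;
    int has_pages;
};

#define TOY_STOLEN_LIMIT ((size_t)8 << 20) /* assumed 8 MiB carve-out */

/* shmem-like backend: can back objects of any size. */
static int toy_shmem_get_pages(struct toy_gem_object *obj)
{
    obj->has_pages = 1;
    return 0;
}

/* stolen-like backend: limited by a fixed, boot-time reservation. */
static int toy_stolen_get_pages(struct toy_gem_object *obj)
{
    if (obj->size > TOY_STOLEN_LIMIT)
        return -1;
    obj->has_pages = 1;
    return 0;
}

static void toy_put_pages(struct toy_gem_object *obj)
{
    obj->has_pages = 0;
}

static const struct toy_gem_ops toy_shmem_ops = {
    .name = "shmem",
    .get_pages = toy_shmem_get_pages,
    .put_pages = toy_put_pages,
};

static const struct toy_gem_ops toy_stolen_ops = {
    .name = "stolen",
    .get_pages = toy_stolen_get_pages,
    .put_pages = toy_put_pages,
};

/* Generic code: never inspects the backend, only calls through ops. */
static int toy_object_get_pages(struct toy_gem_object *obj)
{
    if (obj->has_pages)
        return 0;
    return obj->ops->get_pages(obj);
}
```

The generic toy_object_get_pages() is identical for both objects; only the ops pointer decides what actually happens, which is the point of routing storage decisions through a vtable.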
2.2 Memory Management (The mm struct)
```c
struct {
    atomic_t pages_pin_count;
    struct intel_memory_region *region;
    struct sg_table *pages;
    struct list_head link; // Used for LRU and Shrinker lists
    // ...
} mm;
```
An object's physical pages are not allocated at creation time. Instead, they are lazily allocated via ops->get_pages() the first time they are actually needed (for example, on CPU access or when binding to a GPU address space). The allocated pages are described by a scatter-gather table (sg_table). pages_pin_count counts the outstanding pins on these pages; while it is nonzero, the kernel's memory reclaim mechanism (the shrinker) cannot release or swap them out.
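The lazy-allocation and pinning rules can be modeled in a few lines. In this sketch all names are hypothetical, a plain array stands in for the sg_table, and an int stands in for the atomic pages_pin_count. Note that unpinning does not free the pages; it only makes them eligible for reclaim later.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy object; pages stays NULL until first needed. */
struct toy_lazy_obj {
    size_t npages;
    int *pages;          /* stand-in for mm.pages (the sg_table) */
    int pin_count;       /* stand-in for mm.pages_pin_count */
    int get_pages_calls; /* how often the backend had to allocate */
};

/* Pin: allocate the backing store on first use, then record the pin. */
static int toy_pin_pages(struct toy_lazy_obj *obj)
{
    if (!obj->pages) {
        obj->pages = calloc(obj->npages, sizeof(*obj->pages));
        if (!obj->pages)
            return -1;
        obj->get_pages_calls++;
    }
    obj->pin_count++;
    return 0;
}

/* Unpin: drop the pin but keep the pages; only a reclaim path may free
 * them, and only once pin_count has reached zero. */
static void toy_unpin_pages(struct toy_lazy_obj *obj)
{
    assert(obj->pin_count > 0);
    obj->pin_count--;
}
```

A second pin reuses the pages from the first one, so the backend is asked for memory only once, no matter how many users pin the object.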
2.3 Cache & Coherency
```c
unsigned int pat_index:6;
unsigned int cache_coherent:2;
unsigned int cache_dirty:1;
```
Intel platforms differ widely: some share an LLC (Last Level Cache) between the CPU and GPU, while discrete parts have their own local memory (LMEM). The driver must therefore track precisely whether an object's contents may be newer in the CPU cache than in memory (cache_dirty). Before handing such an object to a non-coherent consumer, such as the display engine for scanout, the driver must perform a clflush so that the data actually reaches memory.
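The cache_dirty rule boils down to two steps: remember when the CPU cache may hold newer data than RAM, and flush only when a consumer that does not snoop CPU caches is about to read. The sketch below is a hypothetical model of that bookkeeping. In the kernel the flush walks every cache line of the object's pages (for example via drm_clflush_sg()); here it is replaced by a counter.

```c
#include <assert.h>

/* Hypothetical toy object tracking only the dirty bit from the text. */
struct toy_cache_obj {
    unsigned int cache_dirty : 1; /* CPU cache may be newer than RAM */
};

static int toy_clflush_count; /* counts simulated clflush passes */

/* A CPU write through a cached (write-back) mapping. */
static void toy_cpu_write(struct toy_cache_obj *obj)
{
    obj->cache_dirty = 1;
}

/* Hand the object to a consumer. consumer_snoops is 1 if it sees the CPU
 * caches (e.g. via a shared LLC), 0 if it reads RAM directly (e.g. a
 * non-coherent display engine doing scanout). */
static void toy_hand_to_consumer(struct toy_cache_obj *obj, int consumer_snoops)
{
    if (obj->cache_dirty && !consumer_snoops) {
        toy_clflush_count++; /* real code: clflush each cache line */
        obj->cache_dirty = 0;
    }
}
```

A snooping consumer needs no flush, so the dirty bit survives it; the first non-coherent consumer pays for the flush exactly once.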
2.4 Virtual Address Binding (VMA Tracking)
```c
struct {
    struct list_head list;
    struct rb_root tree;
} vma;
```
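A single GEM object can be mapped into multiple GPU virtual address spaces (the global GTT and per-process PPGTTs), with one VMA per binding. The vma structure tracks these bindings: the list lets the driver walk all of them, for example to unbind them quickly when the object is freed or migrated, while the red-black tree allows fast lookup of the VMA for a given address space.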
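The role of the VMA list can be shown with a simplified model: one object, several address spaces, at most one binding per address space, and a single walk that tears them all down. The names are hypothetical, the lookup is a linear scan where i915 uses obj->vma.tree, and real VMAs are keyed by more than the address space alone (for example, by GGTT view).

```c
#include <assert.h>
#include <stdlib.h>

struct toy_vm { int id; };  /* stand-in for a GPU address space */

struct toy_vma {
    struct toy_vm *vm;       /* which address space */
    unsigned long gpu_addr;  /* where the object is mapped there */
    struct toy_vma *next;    /* stand-in for obj->vma.list */
};

struct toy_vma_obj {
    struct toy_vma *vmas;    /* every binding of this object */
};

/* Find the binding for vm, creating it if needed. */
static struct toy_vma *toy_vma_instance(struct toy_vma_obj *obj,
                                        struct toy_vm *vm, unsigned long addr)
{
    for (struct toy_vma *v = obj->vmas; v; v = v->next)
        if (v->vm == vm)
            return v;   /* existing binding is reused, not remapped */
    struct toy_vma *nv = malloc(sizeof(*nv));
    if (!nv)
        return NULL;
    nv->vm = vm;
    nv->gpu_addr = addr;
    nv->next = obj->vmas;
    obj->vmas = nv;
    return nv;
}

/* On free or migration: walk the list and tear down every binding.
 * Returns how many bindings were removed. */
static int toy_unbind_all(struct toy_vma_obj *obj)
{
    int n = 0;
    while (obj->vmas) {
        struct toy_vma *v = obj->vmas;
        obj->vmas = v->next;
        free(v);
        n++;
    }
    return n;
}
```

The lookup answers "is this object already mapped in this address space?", while the list answers "where is this object mapped at all?"; the two structures in the vma member serve exactly those two questions.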
3. Backend Memory Abstraction: A Diverse Array of Physical Foundations
To adapt to different application scenarios, i915 implements a highly modular object system. Although the upper layers always see a drm_i915_gem_object, its actual physical storage medium (Backend) comes in several implementations, differentiated by the ops pointer:
3.1 shmem (Shared Memory)
This is the most common and widely used backend.
- Implementation: i915_gem_shmem_ops (i915/gem/i915_gem_shmem.c).
- Characteristics: Objects are backed by the system page cache through shmem (tmpfs) files, i.e., anonymous shared memory.
- Advantages: When system memory is under pressure, the Linux kernel can gracefully write these pages out to the swap partition. When the object is needed again, the driver reacquires its pages through ops->get_pages() (or through a CPU page fault, for mmap access), which swaps them back in. Most ordinary textures and buffers use this backend.
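A shrinker pass over shmem objects can be sketched as a walk over an LRU-style list that skips anything still pinned. Everything here is a hypothetical simplification: the real i915 shrinker also deals with locking, objects still busy on the GPU, and purgeable objects, and it writes pages to swap through shmem rather than flipping a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy shmem object on a shrinker list. */
struct toy_shmem_obj {
    size_t npages;
    int pin_count;
    int resident;                /* 1: pages in RAM, 0: written to swap */
    int swap_ins;                /* how often pages were brought back */
    struct toy_shmem_obj *next;  /* stand-in for mm.link on an LRU list */
};

/* Evict unpinned, resident objects until `target` pages are reclaimed.
 * Returns the number of pages reclaimed. */
static size_t toy_shrink(struct toy_shmem_obj *head, size_t target)
{
    size_t freed = 0;
    for (struct toy_shmem_obj *o = head; o && freed < target; o = o->next) {
        if (!o->resident || o->pin_count > 0)
            continue;    /* pinned pages must stay in memory */
        o->resident = 0; /* real code would write the pages to swap */
        freed += o->npages;
    }
    return freed;
}

/* Next use: a swapped-out object is read back before it is pinned. */
static void toy_shmem_pin(struct toy_shmem_obj *o)
{
    if (!o->resident) {
        o->resident = 1;
        o->swap_ins++;
    }
    o->pin_count++;
}
```

The pinned object in the middle of the list survives the pass untouched, which is exactly the guarantee that pages_pin_count provides.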
3.2 phys (Physical Contiguous Memory)
- Implementation: i915_gem_phys.c.
- Characteristics: Allocates physically contiguous system memory.
- Use Cases: Typically used for older graphics hardware that does not support scatter-gather addressing, or for specific hardware pipelines that strictly require physically contiguous addresses (for example, the cursor plane on some old display engines, which must be programmed with a physical address).
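Why scatter-gather support matters can be seen by counting segments. If runs of physically consecutive pages are merged, a phys object always collapses to a single segment, while pages gathered from shmem usually do not. The helper below is a hypothetical illustration; it ignores the maximum segment length that real sg_table construction also enforces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the page frame numbers (PFNs) backing an
 * object, count how many scatter-gather segments are needed when runs of
 * physically consecutive pages are merged into one segment. */
static size_t toy_count_sg_segments(const unsigned long *pfns, size_t n)
{
    assert(pfns != NULL || n == 0);
    if (n == 0)
        return 0;
    size_t segments = 1;
    for (size_t i = 1; i < n; i++)
        if (pfns[i] != pfns[i - 1] + 1)
            segments++; /* a gap in physical memory starts a new segment */
    return segments;
}
```

Hardware that can only take a single base address needs this count to be exactly 1, which only a physically contiguous allocation can guarantee.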
3.3 stolen (Stolen Memory)
- Implementation: i915_gem_object_stolen_ops (i915/gem/i915_gem_stolen.c).
- What is Stolen Memory?: During system boot, the BIOS "steals" (reserves) a fixed-size block of memory from the total system RAM, dedicating it exclusively to the GPU. This memory is invisible to the Linux system.
- Use Cases: This memory is typically used for allocating framebuffers, cursor images, and early boot splash screens. Because the physical address of this memory is fixed early in hardware power-up, it is well suited for direct scanout by the display engine.
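Because stolen memory is a fixed carve-out that the kernel's page allocator never manages, the driver has to sub-allocate it itself; i915 does this with the DRM core's drm_mm range allocator. The sketch below substitutes a much simpler first-fit bitmap allocator purely for illustration. The block granularity, the 16-block size and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_STOLEN_BLOCKS 16 /* hypothetical carve-out size, in blocks */

/* Toy first-fit allocator over a fixed carve-out (not i915's drm_mm). */
struct toy_stolen {
    bool used[TOY_STOLEN_BLOCKS];
};

/* Reserve n contiguous blocks; returns the first block index or -1. */
static int toy_stolen_alloc(struct toy_stolen *s, int n)
{
    if (n <= 0)
        return -1;
    for (int start = 0; start + n <= TOY_STOLEN_BLOCKS; start++) {
        int len = 0;
        while (len < n && !s->used[start + len])
            len++;
        if (len == n) {
            for (int i = 0; i < n; i++)
                s->used[start + i] = true;
            return start;
        }
    }
    return -1; /* the carve-out was sized at boot and cannot grow */
}

static void toy_stolen_free(struct toy_stolen *s, int start, int n)
{
    assert(start >= 0 && n > 0 && start + n <= TOY_STOLEN_BLOCKS);
    for (int i = 0; i < n; i++)
        s->used[start + i] = false;
}
```

Unlike shmem, there is no fallback when the carve-out is exhausted: an allocation either fits in the reserved range or fails.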
3.4 Modern Expansion: TTM and Local Memory
With the advent of Intel's high-performance discrete GPUs (such as DG2, which has dedicated VRAM/local memory), management based purely on shmem is no longer sufficient. i915 therefore adopted the TTM memory manager to manage this local video memory, which sits on the other side of the PCIe bus. This is why a ttm_buffer_object shares a union with base inside drm_i915_gem_object, a design we will explore in detail in the next lecture.
Summary
The emergence of GEM standardized chaotic video memory management into a rigorous object-oriented model. Through drm_i915_gem_object, the i915 driver not only hides the differences between various underlying storage media (shmem, phys, stolen) but also elegantly handles thorny issues like lazy allocation, cache coherency, and multi-space mapping.
Once you understand the GEM Object, you understand the "soul" of GPU data. But before these souls can truly run on hardware, they need to be placed in the correct GPU address space — that is the realm we will explore next.

