Part 5: GPU Address Spaces: GTT and PPGTT

#architecture #linux #systems #tutorial

In the previous lecture, we explored how TTM moves data between different types of physical memory (system memory and dedicated video memory). However, GPU hardware itself does not directly access physical addresses. Like the CPU, it has its own "virtual memory" system.

Today, we will delve into the most fundamental addressing model in i915 and see how data is "assembled" from the GPU's perspective.

1. Why Does a GPU Need Virtual Addresses?

In early graphics systems, GPUs often used physical addresses directly for DMA (Direct Memory Access) transfers. But this introduced a series of problems:

Memory Fragmentation: After a system has been running for a while, it becomes difficult to find large, contiguous blocks of physical memory. Graphics rendering, however, often requires contiguous address space (for example, a large texture).
Security: Without address isolation, a malicious application could craft commands that make the GPU read or overwrite the memory of other applications, or even of the kernel itself.

To solve these problems, Intel GPUs introduced a mechanism similar to the CPU's MMU (Memory Management Unit), called the GTT (Graphics Translation Table). Through the GTT, the driver can map scattered physical pages into a contiguous virtual address space from the GPU's perspective.
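To make the idea concrete, here is a minimal sketch in C of a single-level translation table. It is a toy model, not i915 code: every name (toy_gtt, toy_gtt_map, toy_gtt_translate) and the 4 KiB page size are assumptions chosen for illustration. It shows three scattered physical pages appearing as one contiguous range of GPU virtual addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u                    /* assume 4 KiB pages */
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_NUM_PTES   16u                    /* tiny table, illustration only */
#define TOY_INVALID    UINT64_MAX             /* marks an unmapped slot */

/* Hypothetical single-level table: index = GPU virtual page number. */
struct toy_gtt {
    uint64_t pte[TOY_NUM_PTES];               /* physical page address per slot */
};

static void toy_gtt_init(struct toy_gtt *gtt)
{
    for (size_t i = 0; i < TOY_NUM_PTES; i++)
        gtt->pte[i] = TOY_INVALID;
}

/* Map n scattered physical pages so they look contiguous from first_vpn on. */
static void toy_gtt_map(struct toy_gtt *gtt, size_t first_vpn,
                        const uint64_t *phys_pages, size_t n)
{
    assert(first_vpn + n <= TOY_NUM_PTES);
    for (size_t i = 0; i < n; i++)
        gtt->pte[first_vpn + i] = phys_pages[i];
}

/* Translate a GPU virtual address; TOY_INVALID means the page is unmapped. */
static uint64_t toy_gtt_translate(const struct toy_gtt *gtt, uint64_t gpu_va)
{
    uint64_t vpn = gpu_va >> TOY_PAGE_SHIFT;
    uint64_t offset = gpu_va & (TOY_PAGE_SIZE - 1);

    if (vpn >= TOY_NUM_PTES || gtt->pte[vpn] == TOY_INVALID)
        return TOY_INVALID;
    return gtt->pte[vpn] + offset;
}
```

Mapping the physical pages 0x7000, 0x2000 and 0x9000 at virtual page 4 makes the GPU see one linear range from 0x4000 to 0x6fff, even though the backing pages are neither adjacent nor in order.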

2. The Evolution from GGTT to PPGTT

In the i915 codebase, an address space is abstracted as struct i915_address_space. Two structures embed it as their base: struct i915_ggtt for the global table and struct i915_ppgtt for the per-process tables.

2.1 The Globally Shared Village: GGTT (Global GTT)

GGTT is the oldest addressing method. As the name implies, "Global" means all processes and GPU hardware engines see the same address space.

In the code, the GGTT is represented by struct i915_ggtt; in recent kernels most of its management code lives in gt/intel_ggtt.c (older kernels kept it in i915_gem_gtt.c).
Primary Use: Because it is globally visible, the GGTT is now mainly reserved for low-level structures that must be reachable by global hardware units, such as: the Display Engine (a framebuffer being scanned out to the screen must be mapped in the GGTT), some older hardware command rings (Ringbuffers), and early contexts.

2.2 The Isolated Private Territory: PPGTT (Per-Process GTT)

With the evolution of modern operating systems, the GGTT exposed obvious security and isolation shortcomings. PPGTT first appeared with Gen6 (Sandy Bridge), initially as a single "aliasing" PPGTT shared by all contexts. Later generations made it truly per-process, and Gen8 (Broadwell) extended it to true 48-bit addressing.
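As an illustration of what 48-bit addressing means for the page-table walk, the sketch below splits a virtual address into four 9-bit table indices plus a 12-bit page offset (4 x 9 + 12 = 48). The struct and function names are hypothetical, and the layout assumes 4 KiB pages with four table levels; treat it as a model of the arithmetic rather than as driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4 levels x 9 index bits + 12 offset bits = 48 bits. */
struct toy_va_parts {
    unsigned pml4e;   /* bits 47..39: index into the top-level table */
    unsigned pdpe;    /* bits 38..30: index into the page-directory-pointer table */
    unsigned pde;     /* bits 29..21: index into the page directory */
    unsigned pte;     /* bits 20..12: index into the page table */
    unsigned offset;  /* bits 11..0:  byte offset inside the 4 KiB page */
};

static struct toy_va_parts toy_split_va(uint64_t va)
{
    struct toy_va_parts p;

    assert(va < (1ULL << 48));                /* must fit in 48 bits */
    p.pml4e  = (unsigned)((va >> 39) & 0x1ff);
    p.pdpe   = (unsigned)((va >> 30) & 0x1ff);
    p.pde    = (unsigned)((va >> 21) & 0x1ff);
    p.pte    = (unsigned)((va >> 12) & 0x1ff);
    p.offset = (unsigned)(va & 0xfff);
    return p;
}
```

Each 9-bit index selects one of 512 entries, so a full walk touches four tables before it reaches the page itself.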

Isolation: On hardware with full PPGTT support, each GEM context that a process creates after opening a DRM node such as /dev/dri/renderD128 gets, by default, its own i915_address_space instance (i.e., a PPGTT). The setup code lives in i915_gem_context.c.
Context switching: When the GPU executes rendering commands for Process A, the hardware's page table register points to Process A's PPGTT; when switching to Process B, the page table switches as well. It is as if each application lives in a parallel universe: Process A's commands have no virtual address that resolves to Process B's buffers.
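The page-table switch can be modelled in a few lines of C. This is a sketch under stated assumptions: toy_ppgtt, toy_engine and both functions are invented names, the table has a single level, and a zero entry stands for "not mapped". The point is only that the same GPU virtual address resolves to different physical memory depending on which table is loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u
#define TOY_NUM_PTES   8u
#define TOY_FAULT      UINT64_MAX

/* Hypothetical per-process table: one physical page address per virtual page. */
struct toy_ppgtt {
    uint64_t pte[TOY_NUM_PTES];
};

/* Hypothetical engine state: "root" plays the role of the page-table register. */
struct toy_engine {
    const struct toy_ppgtt *root;
};

/* Context switch: point the engine at the next process's page tables. */
static void toy_switch_context(struct toy_engine *engine,
                               const struct toy_ppgtt *next)
{
    engine->root = next;
}

/* Translate through whichever table is currently loaded. */
static uint64_t toy_engine_translate(const struct toy_engine *engine,
                                     uint64_t gpu_va)
{
    uint64_t vpn = gpu_va >> TOY_PAGE_SHIFT;

    assert(engine->root != NULL);
    if (vpn >= TOY_NUM_PTES || engine->root->pte[vpn] == 0)
        return TOY_FAULT;                     /* 0 means "not mapped" here */
    return engine->root->pte[vpn] + (gpu_va & ((1u << TOY_PAGE_SHIFT) - 1));
}
```

With Process A's table mapping virtual page 1 to 0x50000 and Process B's table mapping it to 0x90000, address 0x1008 reads from 0x50008 while A is running and from 0x90008 after the switch to B.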

3. The Bridge Between Objects and Spaces: VMA

Having understood the memory pool (GEM Object) and the address space, the next question is: How is a GEM Object placed into an address space?

In i915, this bridge is the VMA (Virtual Memory Area), and its core structure, struct i915_vma, is defined in i915_vma_types.h:

struct i915_vma {
    // ...
    struct i915_address_space *vm;    // Points to the address space where this VMA resides (GGTT or a PPGTT)
    struct drm_i915_gem_object *obj;  // Points to the underlying physical data (GEM Object)
    struct sg_table *pages;           // Scatter-gather table of physical pages
    // ...
};

A GEM Object can exist during its lifetime without a VMA (physical memory allocated but not yet used by the GPU), or it can have multiple VMAs (for instance, mapped simultaneously into Process A's PPGTT and Process B's PPGTT, where their virtual addresses can even be completely different).
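The one-object, many-VMAs relationship can be sketched as a lookup keyed on the (object, address space) pair. Everything below is a simplified model with hypothetical names (toy_object, toy_vm, toy_vma_instance) and a fixed-size array; the real driver tracks VMAs with its own data structures and also distinguishes different views of the same object, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_VMAS 4u

struct toy_object;                            /* forward declaration */

struct toy_vm {
    int id;                                   /* stands in for a GGTT or PPGTT */
};

struct toy_vma {
    struct toy_vm     *vm;                    /* address space it lives in */
    struct toy_object *obj;                   /* object it maps */
    uint64_t           start;                 /* virtual address inside vm */
};

struct toy_object {
    size_t         size;
    struct toy_vma vmas[TOY_MAX_VMAS];        /* zero or more mappings */
    size_t         nr_vmas;
};

/* Return the VMA of obj in vm, creating it on first use; NULL if full. */
static struct toy_vma *toy_vma_instance(struct toy_object *obj,
                                        struct toy_vm *vm)
{
    struct toy_vma *vma;

    for (size_t i = 0; i < obj->nr_vmas; i++)
        if (obj->vmas[i].vm == vm)
            return &obj->vmas[i];

    if (obj->nr_vmas == TOY_MAX_VMAS)
        return NULL;

    vma = &obj->vmas[obj->nr_vmas++];
    vma->vm = vm;
    vma->obj = obj;
    vma->start = 0;                           /* no address chosen yet */
    return vma;
}
```

A freshly created object has nr_vmas equal to 0. Asking for its VMA in two different address spaces yields two distinct toy_vma entries that point to the same object and may carry unrelated start addresses, while asking twice for the same address space returns the same entry.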

3.1 The Bind Process

When the GPU is about to execute a command that references a GEM Object, the driver must ensure the object has been mapped into the correct address space.
The work happens in i915_vma.c: pinning a VMA (i915_vma_pin() and its variants) first reserves an address range and then calls i915_vma_bind() to write the mapping:

Allocate Virtual Space: Use the kernel's drm_mm allocator to carve out a free virtual address range within this address space. This step belongs to the pin path and runs before i915_vma_bind() itself.
Populate Page Tables: The driver writes the physical pages of the GEM Object (local memory physical pages for LMEM, or DMA-mapped bus addresses for SMEM) one by one into the page table structure of this address space (e.g., filling PDE/PTE entries).
Flush TLB: Notify the GPU hardware that the page tables have been updated so that any stale cached translations are invalidated.
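The three steps above can be sketched as a toy model in C. None of this is the driver's code: toy_vm, toy_vm_alloc_range and toy_map_object are hypothetical names, a simple first-fit scan over a boolean array stands in for drm_mm, a flat array stands in for the page tables, and a counter stands in for the hardware invalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_PAGES 16u
#define TOY_NO_SPACE  SIZE_MAX

struct toy_vm {
    bool     used[TOY_NUM_PAGES];   /* stand-in for the drm_mm range allocator */
    uint64_t pte[TOY_NUM_PAGES];    /* stand-in for the page-table entries */
    unsigned tlb_invalidations;     /* counts "notify the hardware" events */
};

/* Step 1: first-fit search for n free, contiguous virtual pages. */
static size_t toy_vm_alloc_range(struct toy_vm *vm, size_t n)
{
    size_t run = 0;

    assert(n > 0);
    for (size_t i = 0; i < TOY_NUM_PAGES; i++) {
        run = vm->used[i] ? 0 : run + 1;
        if (run == n) {
            size_t first = i + 1 - n;

            for (size_t j = first; j <= i; j++)
                vm->used[j] = true;
            return first;
        }
    }
    return TOY_NO_SPACE;
}

/* Steps 2 and 3: write one PTE per backing page, then invalidate. */
static size_t toy_map_object(struct toy_vm *vm, const uint64_t *pages, size_t n)
{
    size_t first = toy_vm_alloc_range(vm, n);

    if (first == TOY_NO_SPACE)
        return TOY_NO_SPACE;        /* caller would have to evict something */
    for (size_t i = 0; i < n; i++)
        vm->pte[first + i] = pages[i];
    vm->tlb_invalidations++;
    return first;
}
```

The function returns the first virtual page of the new mapping, or TOY_NO_SPACE when the address space has no free range of the requested length, which is the situation that triggers eviction in a real driver.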

3.2 The Unbind Process

When memory pressure requires eviction (which we mentioned in Lecture 4) or when a process exits, i915_vma_unbind() is called:

Clear the hardware page table entries, pointing them to an invalid or default Scratch Page (to prevent hardware hangs due to out-of-bounds access).
Release the occupied virtual address range back to the drm_mm allocator.
Only once every VMA of the object has been unbound can its backing pages be safely moved or freed.
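The teardown order can be modelled the same way. The sketch below is an assumption-laden toy, not the body of i915_vma_unbind(): the structures, the scratch address, and the bind_count field are all invented for illustration. It redirects the entries to a scratch page, hands the range back to the allocator, and only then reports the object as safe to move.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_PAGES    16u
#define TOY_SCRATCH_ADDR 0x1000ULL  /* made-up address of a harmless scratch page */

struct toy_object {
    unsigned bind_count;            /* how many address spaces still map it */
};

struct toy_vm {
    bool     used[TOY_NUM_PAGES];   /* stand-in for the range allocator */
    uint64_t pte[TOY_NUM_PAGES];    /* stand-in for the page-table entries */
};

struct toy_vma {
    struct toy_vm     *vm;
    struct toy_object *obj;
    size_t             first;       /* first virtual page of the mapping */
    size_t             count;       /* number of pages mapped */
    bool               bound;
};

/* Tear down one mapping; calling it on an unbound VMA is a no-op. */
static void toy_vma_unbind(struct toy_vma *vma)
{
    if (!vma->bound)
        return;
    assert(vma->first + vma->count <= TOY_NUM_PAGES);
    for (size_t i = 0; i < vma->count; i++) {
        vma->vm->pte[vma->first + i] = TOY_SCRATCH_ADDR;  /* clear the entry */
        vma->vm->used[vma->first + i] = false;            /* release the range */
    }
    vma->bound = false;
    assert(vma->obj->bind_count > 0);
    vma->obj->bind_count--;
}

/* The backing pages may move only when nothing maps them any more. */
static bool toy_object_can_move(const struct toy_object *obj)
{
    return obj->bind_count == 0;
}
```

Pointing stale entries at a scratch page rather than leaving them dangling means a late or out-of-bounds GPU access lands on harmless memory instead of on pages that may already belong to someone else.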

Summary

The GPU's virtual memory system is the foundation of security and efficiency in modern graphics drivers. By separating the physical layer (GEM Object) from the virtual address layer (Address Space), i915 uses i915_vma to achieve extremely flexible management:

GGTT supports global operations like display;
PPGTT gives each process an independent, vast, and secure private rendering domain.

This concludes the Video Memory Management Core portion of the series. We have learned what data is (GEM), where data resides (TTM), and how data is addressed (GTT/VMA).

In the next lecture, we will enter the "brain" of the driver, GT (Graphics Technology) Execution and Scheduling, to explore how user rendering commands are queued and ultimately fed into the hardware engines.