Deleon Karen

Part 11: Extreme Power Control: RPM, RC6, and RPS

In previous chapters, we explored how i915 manages video memory, schedules tasks, and lights up the display. However, for modern GPUs, "how to run fast" is merely the baseline; "how to save power" is the real technical barrier.

Whether in a thin-and-light laptop or a power-constrained data center, the GPU is a notorious power hog. If left unchecked, it will not only drain the battery but also cause severe thermal throttling. To squeeze out every drop of energy efficiency, the i915 driver has built an extremely sophisticated power control system within the kernel.

Today, we will focus on the three core pillars of i915 power management: RPM (Runtime PM), RC6, and RPS.

1. Putting the Device to Sleep as a Whole: Runtime PM (RPM)

Imagine this scenario: your laptop screen is on, displaying a static, plain-text article. At this moment, apart from the display controller periodically reading the framebuffer, all of the GPU's compute engines are essentially idle. In a more extreme case, if the machine also has a discrete GPU (for example, one sitting in an external dock) and no program is currently using it, should it still be running at full power?

Runtime Power Management (RPM) is designed to solve this problem. It is a standard power management framework provided by the Linux kernel, and its core implementation in the i915 driver resides in intel_runtime_pm.c.

1.1 Core Mechanism: Wakeref (Wake Reference)

i915's RPM management relies on a mechanism called Wakeref (Wake Reference Counting).

  • When the driver needs to access GPU hardware (e.g., writing instructions to registers, handling interrupts, or when userspace initiates a rendering request), it must first acquire a Wakeref by calling intel_runtime_pm_get(&i915->runtime_pm), which returns a wakeref cookie.
  • If this is the first Wakeref, the driver triggers a device wake-up (waking from PCI D3hot/D3cold sleep state to D0 full-power state).
  • Once the operation is complete, the driver releases the reference by passing that cookie back to intel_runtime_pm_put().
  • Once the Wakeref count drops to zero, the driver considers the GPU idle and allows the device to enter deep sleep, typically after a short autosuspend delay.

This mechanism is extremely strict. Assertions like assert_rpm_wakelock_held() are often seen in the code, forcing developers to "hold a permit" before touching the hardware. Otherwise, reading or writing registers on silicon that has already been powered down returns garbage at best and can hang the entire system at worst.
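To make the counting rule concrete, here is a minimal userspace sketch of the idea. It is not the i915 implementation: the toy_* names, the struct layout, and the "refuse the write" behaviour are all assumptions made for illustration, and the real driver suspends after a delay rather than immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the runtime PM state (hypothetical, not struct intel_runtime_pm). */
struct toy_rpm {
    int wakeref_count;  /* outstanding wake references */
    bool powered;       /* true while the model device is in D0 */
    int resume_count;   /* how many times the device had to be woken */
};

/* Take a reference; only the first one wakes the device. */
static void toy_rpm_get(struct toy_rpm *rpm)
{
    if (rpm->wakeref_count++ == 0) {
        rpm->powered = true;
        rpm->resume_count++;
    }
}

/* Drop a reference; the last one lets the device sleep again.
 * Simplification: this model suspends immediately. */
static void toy_rpm_put(struct toy_rpm *rpm)
{
    assert(rpm->wakeref_count > 0);  /* unbalanced put is a bug */
    if (--rpm->wakeref_count == 0)
        rpm->powered = false;
}

/* Loose analogue of assert_rpm_wakelock_held(): is hardware access legal? */
static bool toy_rpm_wakelock_held(const struct toy_rpm *rpm)
{
    return rpm->wakeref_count > 0 && rpm->powered;
}

/* A guarded "register write" that refuses to touch a powered-down device. */
static bool toy_mmio_write(struct toy_rpm *rpm, unsigned int *reg, unsigned int val)
{
    if (!toy_rpm_wakelock_held(rpm))
        return false;
    *reg = val;
    return true;
}
```

Nested get/put pairs are cheap in this model: only the outermost pair changes the power state, which is why hot paths can take a reference without worrying about who else already holds one.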

2. Dynamic Sleep of the Render Engine: RC6 (Render C-States)

RPM controls the life and death of the entire GPU device, but its granularity is too coarse. As long as the screen is refreshing, the device cannot fully enter RPM sleep. At this point, we need finer-grained control: RC6.

2.1 What is RC6?

In the CPU world, there are C-States (e.g., C0 is running, C6 is deep sleep). Intel GPUs introduced a similar concept called Render C-States (RC). The most critical among them is RC6.

When RC6 is enabled, the GPU's power management hardware (or, on newer platforms, the GuC firmware) constantly monitors how busy the engines are (like the render engine RCS and the video engine VCS). If it finds they have been idle for longer than a configured threshold, the hardware automatically cuts off their clock and even power, while other parts (like the display output) remain operational.

2.2 The Driver's Role

Entering and exiting RC6 is primarily handled automatically by the hardware, but in gt/intel_rc6.c, the i915 driver is responsible for:

  1. Initialization and Enabling: Configuring the hardware sleep thresholds and policies in intel_rc6_enable().
  2. Status Monitoring: Reading hardware counters via intel_rc6_residency_ns() to obtain the accumulated time the GPU has spent in the RC6 (sleep) state. Comparing two readings against the elapsed wall-clock time gives the proportion of the recent past spent in RC6. This data is often used to evaluate whether the driver's power-saving optimizations are effective.
  3. Deeper Sleep: Besides RC6, some older hardware generations also support deeper RC6p and RC6pp states. Entering these states saves even more power but comes with greater wake-up latency.
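The residency figure is easiest to interpret as a percentage over an interval. The sketch below shows that arithmetic only; the struct and function names are hypothetical, and the two samples are assumed to come from a cumulative nanosecond counter paired with a monotonic clock reading taken at the same moment.

```c
#include <assert.h>
#include <stdint.h>

/* One observation: cumulative RC6 time and wall-clock time, both in ns. */
struct rc6_sample {
    uint64_t residency_ns;
    uint64_t wall_ns;
};

/* Percentage (0..100) of the interval between two samples spent in RC6. */
static unsigned int rc6_residency_percent(struct rc6_sample before,
                                          struct rc6_sample after)
{
    assert(after.wall_ns >= before.wall_ns);
    assert(after.residency_ns >= before.residency_ns);

    uint64_t wall = after.wall_ns - before.wall_ns;
    uint64_t rc6 = after.residency_ns - before.residency_ns;

    if (wall == 0)
        return 0;       /* no time elapsed, nothing to report */
    if (rc6 > wall)
        rc6 = wall;     /* clamp small skew between counter and clock */
    return (unsigned int)(rc6 * 100 / wall);
}
```

For example, if the counter advanced by 750 ms during a 1 s window, the function reports 75. A value close to 100 on an idle desktop is what you would hope to see; a value stuck near 0 suggests something is keeping the GPU awake.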

3. Dynamic Frequency Scaling (GPU Turbo): RPS (Render P-States)

If RC6 is about making the GPU "sleep" smartly, then RPS (Render P-States) is about making the GPU "work" smartly. It is the equivalent of dynamic frequency scaling in the CPU world (cpufreq / Turbo Boost).

3.1 Three Anchors of P-States

In gt/intel_rps.c, you will often see three terms that define the boundaries of GPU frequency:

  • RP0: The maximum turbo frequency supported by the hardware (Max Turbo Frequency), offering the highest performance but consuming the most power.
  • RP1: The nominal frequency, an intermediate operating point between RPn and RP0 that is usually described as the efficient frequency the GPU can sustain under good thermal conditions.
  • RPn: The minimum frequency supported by the hardware (Minimum Frequency), offering the lowest performance but consuming the least power.

3.2 Load-Based Frequency Scaling

By default, the GPU sits at RPn when idle. When the workload increases, hardware counters trigger an interrupt (Up Threshold), notifying the driver that compute power is insufficient. The driver's intel_rps_set() then intervenes, stepping up the GPU frequency; the reverse happens when load decreases.
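The stepping behaviour can be sketched as a clamp between the two outer anchors. This is a deliberately simplified model: the toy_* names, the fixed step size, and the MHz values in the usage note are assumptions, and the real driver picks its steps with more elaborate logic.

```c
#include <assert.h>

/* Toy frequency state in MHz (hypothetical, not struct intel_rps). */
struct toy_rps {
    int min_freq;  /* RPn: lowest allowed frequency */
    int max_freq;  /* RP0: highest allowed frequency */
    int cur_freq;  /* current operating point */
};

/* The two events the hardware counters can raise in this model. */
enum toy_rps_event { TOY_RPS_UP, TOY_RPS_DOWN };

static int toy_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* React to an up/down threshold event by moving one step and
 * clamping the result to [RPn, RP0]. Returns the new frequency. */
static int toy_rps_handle_event(struct toy_rps *rps, enum toy_rps_event ev, int step)
{
    assert(step > 0);
    assert(rps->min_freq <= rps->max_freq);

    int target = rps->cur_freq + (ev == TOY_RPS_UP ? step : -step);

    rps->cur_freq = toy_clamp(target, rps->min_freq, rps->max_freq);
    return rps->cur_freq;
}
```

With made-up limits of 300 MHz and 1150 MHz and a 400 MHz step, three consecutive "up" events take the model from 300 to 700, 1100, and then 1150, where the clamp stops it. The same clamp keeps "down" events from going below RPn.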

3.3 Sacrificing Power for Latency: Wait Boost

i915 features an extremely interesting mechanism called RPS Boost.
In i915_gem_wait.c, when userspace (like the X server or a game) urgently needs a rendering result and makes a system call that waits on a dma_fence, the driver realizes: "The user is anxiously waiting for this frame."

At this moment, the driver calls intel_rps_boost(), bypassing the normal load ramp-up curve and requesting the boost frequency (by default the maximum, RP0) so that the current task completes as quickly as possible and the user sees less latency. Once the task is finished, the frequency is allowed to drop back down. This is a classic strategy of trading a brief spike in power consumption for better responsiveness.
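A rough model of the boost idea is shown below. Treat it as a sketch under stated assumptions rather than how intel_rps_boost() works internally: the toy_* names are invented, the boost target is simply set to RP0, and restoring the saved frequency when the last waiter leaves is a simplification of the real ramp-down.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy boost state in MHz (hypothetical names throughout). */
struct toy_rps_boost {
    int boost_freq;  /* frequency to jump to while someone waits (RP0 here) */
    int cur_freq;    /* current operating point */
    int saved_freq;  /* frequency in use before the first waiter arrived */
    int waiters;     /* number of active boosted waits */
};

/* Called when a caller starts waiting on a request.
 * Returns true if a boost reference was taken and must be released later. */
static bool toy_wait_begin(struct toy_rps_boost *rps, bool request_completed)
{
    if (request_completed)
        return false;  /* nothing to wait for, so no boost */

    if (rps->waiters++ == 0) {
        rps->saved_freq = rps->cur_freq;
        if (rps->cur_freq < rps->boost_freq)
            rps->cur_freq = rps->boost_freq;  /* skip the ramp-up curve */
    }
    return true;
}

/* Called when a boosted wait finishes; the last waiter ends the boost. */
static void toy_wait_end(struct toy_rps_boost *rps)
{
    assert(rps->waiters > 0);
    if (--rps->waiters == 0)
        rps->cur_freq = rps->saved_freq;
}
```

Counting waiters keeps overlapping waits from ending the boost early: the frequency only returns to its earlier value after the last waiter has finished.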

Summary

i915's power control is a relay race from macro to micro:

  1. RPM controls the big picture, decisively cutting power when the entire device is idle.
  2. RC6 seizes every opportunity, letting the render engine sneak a nap in the millisecond gaps when the display is on but the image is static.
  3. RPS manages the rhythm, dynamically adjusting frequency based on load during operation, and instantly delivering a "shot in the arm" when the user demands it.

It is the tight coordination of these three that enables Intel GPUs to achieve excellent battery life across a range of devices.

In the next part, we will explore the darkest yet most life-saving part of the GPU driver: how i915 pulls the GPU back from the brink when it truly "crashes" (Hang Detection and Reset).