vinish kumar

Posted on Jan 24

Understanding and Exploring GPUs: Architecture, Stages to Render a Game, and Rendering Pipeline Stages.

#gpu #opengl #gamedev #beginners

In this article, we’ll cover the architecture of a GPU and how it works, the stages that work in parallel to render a game, the stages of the rendering pipeline, and the difference between integrated and dedicated graphics cards.

Table of Contents:

GPU Chip Architecture Overview
Overview on how GPU Chip performs
Stages of Rendering a Game
GPU vs CPU Comparison
Rendering Pipeline Stages (OpenGL)
Integrated Vs Dedicated Graphics card

Let’s see how graphics cards (GPU chips) work:

Let’s explore the computational architecture to see how the GPU processes mountains of data, and why GPUs are ideal for running video game graphics, Bitcoin mining, neural networks, and AI.

GPU GA102 Architecture (RTX 3090 Ti):
At the center of the graphics card is the printed circuit board, or PCB, with all the various components mounted on it.

When we open up the GPU, we find a large chip, or die, named GA102, built from 28.3 billion transistors. The majority of the chip’s area is taken up by the processing cores, which have a hierarchical organization.

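This chip is divided into 7 Graphics Processing Clusters (GPCs), and within each processing cluster are 12 Streaming Multiprocessors (SMs).

Next, each of these Streaming Multiprocessors contains 4 warps and 1 Ray-tracing core. (Strictly speaking, a warp is a group of 32 threads that execute together; the hardware units called “warps” in overviews like this one are the SM’s 4 processing blocks, each of which schedules and executes warps.)

Each warp, in turn, contains 32 CUDA cores (also called shading cores) and 1 Tensor core.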

So, across the entire GPU there are 10,752 CUDA cores, 336 Tensor cores, and 84 Ray-tracing cores.
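These totals follow directly from multiplying out the hierarchy. The small Python sketch below does that arithmetic. The constants come from the GA102 description above. The `active_sms` parameter is a name made up for this example: it shows how the totals would shrink if some SMs were disabled.

```python
# Core counts for the GA102 hierarchy described above (full chip, as in the RTX 3090 Ti).
GPCS = 7               # Graphics Processing Clusters per chip
SMS_PER_GPC = 12       # Streaming Multiprocessors per GPC
WARPS_PER_SM = 4       # processing blocks ("warps") per SM
CUDA_PER_WARP = 32     # CUDA cores per processing block
TENSOR_PER_WARP = 1    # Tensor cores per processing block
RT_PER_SM = 1          # Ray-tracing cores per SM


def core_counts(active_sms: int) -> dict:
    """Total cores for a chip with `active_sms` working Streaming Multiprocessors."""
    return {
        "cuda": active_sms * WARPS_PER_SM * CUDA_PER_WARP,
        "tensor": active_sms * WARPS_PER_SM * TENSOR_PER_WARP,
        "rt": active_sms * RT_PER_SM,
    }


full_chip_sms = GPCS * SMS_PER_GPC
print(full_chip_sms, core_counts(full_chip_sms))
# prints: 84 {'cuda': 10752, 'tensor': 336, 'rt': 84}
```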

These three types of cores execute all the calculations of the GPU and each has a different function.

CUDA Cores:
A CUDA core can be thought of as a simple binary calculator with an addition button, a multiply button, and a few others. CUDA cores are the ones used most when running a video game.

Tensor Cores:
Tensor cores are matrix multiplication and addition calculators, used for geometric transformations and for working with neural networks and AI.

Ray-tracing Cores:
These are the largest but the fewest and are used to execute ray tracing algorithms.

Interestingly, the RTX 3080, 3080 Ti, 3090, and 3090 Ti graphics cards all use the same GA102 chip design.

So why do they have different prices and release dates?

During the manufacturing process, patterning errors, dust particles, or other manufacturing defects sometimes damage part of the circuit. Engineers find the defective region of the chip and permanently isolate and deactivate the nearby circuitry.

These chips are then tested and categorized according to the number of defects. The RTX 3090 Ti uses a flawless GA102 chip with 0 defective streaming multiprocessors, so all 10,752 CUDA cores work properly.

Not all NVIDIA RTX series GPUs use the same chip, though. NVIDIA uses different chips across each lineup to cater to different performance levels. Here’s a breakdown of the primary chips, using the RTX 40 series as an example:

-AD107: Used in entry-level models like the RTX 4050 and some versions of the RTX 4060. This chip is designed for low-power and efficiency-oriented performance.

-AD106: Typically found in mid-range models like the RTX 4060 Ti and some configurations of the RTX 4060, providing a step up in power.

-AD104: Found in upper mid-range models like the RTX 4070 and RTX 4070 Ti, designed to offer a balance between performance and efficiency.

-AD103: Used in higher-end models like the RTX 4080, offering substantial improvements in power and features.

-AD102: NVIDIA’s top-end chip in the RTX 40 series, powering the flagship RTX 4090, which is designed for the highest performance tier in gaming and productivity.

Around the edges of the GPU chip, we find 12 graphics memory controllers, the NVLink controllers, and the PCIe interface.

NVLink: a high-speed communications protocol that allows graphics cards to connect directly to each other, enabling faster data transfer and improved system performance.

PCIe interface: a standardized interface that enables high-speed data transfer between electronic components, such as a computer’s motherboard and its expansion cards.

Note: All the examples above are based on the GA102 chip used in the RTX 30 series. Performance and architecture details change from chip to chip and from one generation to the next.

Of all the calculations done in the background while playing a video game, most of the essential ones belong to the rendering pipeline stages, which process the game’s data to visually represent the game’s world to the player.

This pipeline systematically transforms 3D models, textures, lighting, and effects through sequential stages, ultimately creating dynamic, immersive visuals.

While we are playing a game on our computer, the graphics card performs numerous complex calculations in the background to render everything in the game’s visuals in real time.

The Stages it goes through to render a game:

Vertex Processing:

Vertex processing is the first stage of the rendering pipeline, where vertices are processed by a series of shaders. Let’s say a 3D model consists of vertices (points in 3D space). Each vertex has attributes like position, color, and texture coordinates. The GPU processes these vertices using vertex shaders, transforming their positions from 3D space toward 2D screen space via matrix operations (the model, view, and projection matrices). This stage can also include per-vertex lighting calculations, for example evaluating the Phong lighting model at each vertex.

(Phong shading is a technique built on the Phong reflection model. The model describes the way a surface reflects light as a combination of ambient light, the diffuse reflection of rough surfaces, and the specular reflection of shiny surfaces. Phong shading evaluates this model at every pixel using interpolated surface normals, which produces smooth, realistic highlights.)
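Here is a minimal pure-Python sketch of how one vertex is transformed from object space to pixel coordinates. It makes three assumptions, all based on OpenGL conventions:

- The camera looks down the -z axis.
- The perspective matrix is laid out like the one `gluPerspective` builds.
- The window origin is the bottom-left corner.

The helper names are hypothetical. A real vertex shader does only the matrix multiply, in GLSL on the GPU. The perspective divide and viewport mapping then happen in fixed-function hardware.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Model or view matrix that moves points by (tx, ty, tz)."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective projection (same layout as gluPerspective)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ]


def to_screen(vertex, model, view, proj, width, height):
    """Object space -> clip space -> normalized device coords -> pixel coords."""
    mvp = mat_mul(proj, mat_mul(view, model))   # combined model-view-projection
    x, y, z, w = mat_vec(mvp, [*vertex, 1.0])   # clip space (vertex shader output)
    nx, ny, nz = x / w, y / w, z / w            # perspective divide -> NDC in [-1, 1]
    sx = (nx + 1) * 0.5 * width                 # viewport transform; origin is the
    sy = (ny + 1) * 0.5 * height                # bottom-left corner, as in OpenGL
    return sx, sy, nz


model = translation(0.0, 0.0, 0.0)      # object sits at the world origin
view = translation(0.0, 0.0, -5.0)      # like a camera at z = +5 looking down -z
proj = perspective(90.0, 1.0, 0.1, 100.0)
print(to_screen((1.0, 0.0, 0.0), model, view, proj, 800, 800))
```

The vertex is 1 unit right of center at a distance of 5 with a 90° field of view, so it should land at about 1/5 of the 400-pixel half-width past the center. That puts it at roughly x = 480.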

What exactly are Shaders?

-Shaders are small programs that tell your graphics card how to draw an object or model on your screen.

-Think of a shader as an artist for each pixel or part of a 3D model: it decides things like color, light, shadow, and texture.

-This article covers only the basic concepts of shaders. Advanced topics will be explored in upcoming articles.

For example, a shader can make a ball look shiny, rough, or transparent by controlling how light interacts with it. Shaders work behind the scenes in games and animations to make objects look more realistic or to give them cool effects like glowing or blurring.

Vertex shaders, geometry shaders, and fragment shaders are different types of shaders that work together in stages to build and color what you see on the screen.

Rasterization:
Put simply, rasterization is the process of converting a vector-based image or object into a raster, or bitmap, format.

The input to this stage is the transformed vertices: vertices that have been moved to their final positions in clip space and then mapped onto the screen, so that the object can be drawn. The GPU converts the 3D primitives (like triangles formed by vertices) into 2D pixels, or fragments, on the screen. This process determines which pixels on the screen are covered by which part of each triangle.

For example, let’s say we have a 3D triangle that has been projected into screen space so that it is visible. Figuring out which pixels it covers is called rasterization.
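A common way to implement this coverage test uses edge functions: a pixel center is inside a triangle if it lies on the inner side of all three edges. The sketch below is a simplified, single-threaded illustration of that idea. Real GPUs test many pixels in parallel and apply a tie-breaking "top-left" rule on shared edges. This sketch omits that rule and simply counts pixels exactly on an edge as inside.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive if P is to the left of the edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, width, height):
    """Return the (x, y) pixels whose centers lie inside a screen-space triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    # Only visit pixels inside the triangle's bounding box, clipped to the screen.
    min_x, max_x = max(int(min(x0, x1, x2)), 0), min(int(max(x0, x1, x2)) + 1, width)
    min_y, max_y = max(int(min(y0, y1, y2)), 0), min(int(max(y0, y1, y2)) + 1, height)
    covered = []
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Inside when all three tests agree with the triangle's winding.
            if area > 0 and w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.append((x, y))
            elif area < 0 and w0 <= 0 and w1 <= 0 and w2 <= 0:
                covered.append((x, y))
    return covered


pixels = rasterize([(0, 0), (4, 0), (0, 4)], 8, 8)
print(len(pixels))  # prints 10
```

Checking the winding sign lets the same function handle triangles listed clockwise or counter-clockwise.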

Fragment (Pixel) Shading:
The input here is the pixels (fragments) produced by rasterization, the step where vector data is converted into a raster image made up of pixels. The GPU calculates the color of each pixel using fragment shaders. These shaders take into account textures, lighting, shadows, reflections, and other effects.

Operations like texture sampling (getting color from a texture) and applying lighting models are performed here.

Texture Mapping:

The input is texture data (images that wrap around 3D objects). The GPU retrieves the texture coordinates for each pixel and uses them to fetch the corresponding color from the texture map. It applies filtering (like bilinear or trilinear filtering) to smooth out textures at different distances.

Bilinear filtering: a process that smooths out a texture by taking a distance-weighted average of the four texels (i.e. pixels of the texture) closest to the point being sampled, which reduces blockiness.

Trilinear filtering: takes this a step further by also blending the results from the two closest mipmap levels (different resolutions of the same texture). This gives a smoother transition between texture scales, especially when viewing an object at varying distances.
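Below is a small sketch of both filters on a grayscale texture stored as a list of rows. It makes these assumptions:

- Texel centers sit at half-integer positions, as in OpenGL.
- Lookups outside the texture are clamped to the edge.
- The mipmap blend factor `t` is simply passed in. Real GPUs derive the level from how fast the texture coordinates change across the screen.

```python
import math


def bilinear(texture, u, v):
    """Sample a grayscale texture (list of rows) at UV coordinates in [0, 1]."""
    h, w = len(texture), len(texture[0])
    # Move into texel space, shifted so texel centers fall on integer positions.
    x, y = u * w - 0.5, v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional distance between neighboring texel centers

    def texel(ix, iy):
        # Clamp-to-edge addressing for neighbors that fall outside the texture.
        return texture[min(max(iy, 0), h - 1)][min(max(ix, 0), w - 1)]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def trilinear(mip_a, mip_b, u, v, t):
    """Blend bilinear samples from two adjacent mipmap levels (t in [0, 1])."""
    return bilinear(mip_a, u, v) * (1 - t) + bilinear(mip_b, u, v) * t


level0 = [[0, 100], [100, 200]]  # 2x2 texture
level1 = [[100]]                 # its 1x1 mipmap (average of level0)
print(bilinear(level0, 0.5, 0.5))  # prints 100.0
```

Sampling the exact center of the 2x2 texture weights all four texels equally. Sampling the center of a texel returns that texel unchanged.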

Lighting and Shading:

Scene lighting information and material properties are the key factors in how light interacts with a scene and how that light is perceived.

The GPU performs lighting calculations, such as determining how light interacts with surfaces (diffuse, specular, and ambient lighting). This may include advanced effects like bump mapping (to simulate surface roughness), normal mapping (to simulate surface detail), and global illumination (for realistic light behavior).

Lighting:

Lighting refers to the simulation of light sources and how they illuminate objects in a scene. It plays an important role in creating depth, mood, and realism. The GPU performs these calculations to determine how light interacts with everything in the space.

There are different types of lights:

-Ambient light (Simulates indirect light that fills the environment uniformly),

-Directional light (Represents a distant light source like the sun),

-Point light (Emits light in all directions from a single point, like a bulb),

-Spot light (Emits light in a cone shape, often used for focused illumination),

-Area light (Emits light from a surface area, producing softer shadows).

There are also different lighting models we can use:

-Phong Lighting Model: Includes ambient, diffuse and specular components.

-Blinn-Phong Model: An optimized version of Phong with better specular highlights.

-PBR (Physically Based Rendering): Simulates real-world lighting behavior for greater realism.
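To show how the Phong and Blinn-Phong models combine their terms, here is a minimal sketch for a single point on a surface. It makes three simplifying choices:

- Light is treated as one scalar intensity rather than an RGB color.
- The coefficients `ka`, `kd`, `ks` and `shininess` are arbitrary example values.
- All direction vectors point away from the surface.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v) if length > 0 else tuple(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(i, n):
    """Reflect incident direction i about unit normal n (same formula as GLSL reflect())."""
    d = dot(n, i)
    return tuple(ic - 2 * d * nc for ic, nc in zip(i, n))


def light_intensity(normal, to_light, to_viewer, ka=0.1, kd=0.7, ks=0.2,
                    shininess=32, blinn=False):
    """Scalar intensity = ambient + diffuse + specular (Phong or Blinn-Phong)."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    ambient = ka
    n_dot_l = dot(n, l)
    diffuse = kd * max(n_dot_l, 0.0)
    specular = 0.0
    if n_dot_l > 0:  # no highlight on the side facing away from the light
        if blinn:
            h = normalize(tuple(a + b for a, b in zip(l, v)))  # half-vector
            specular = ks * max(dot(n, h), 0.0) ** shininess
        else:
            r = reflect(tuple(-c for c in l), n)  # mirrored light ray
            specular = ks * max(dot(r, v), 0.0) ** shininess
    return ambient + diffuse + specular


head_on = light_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1))
angled = light_intensity((0, 0, 1), (1, 0, 1), (0, 0, 1))
angled_blinn = light_intensity((0, 0, 1), (1, 0, 1), (0, 0, 1), blinn=True)
print(head_on, angled, angled_blinn)
```

With the same `shininess`, Blinn-Phong's half-vector produces a wider highlight than classic Phong. That is why the angled Blinn-Phong result comes out brighter here.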

Shading:

Shading is the process of determining the surface color and appearance of an object based on lighting, material properties, and viewer position.

As with lighting, there are different types of shading techniques:

Flat Shading, Gouraud Shading, Phong Shading, Physically Based Shading.

These shading techniques determine how the interaction between light and surface is computed and visualized. They work together with the different types of light to produce the final appearance of a surface in a scene.
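The difference between flat, Gouraud, and Phong shading comes down to *what* gets interpolated across a polygon. The sketch below compares them at the midpoint of one edge, using only a diffuse (Lambert) term. Two inputs are simplified for illustration:

- The two vertex normals and the light direction are made-up example values.
- The face normal is taken as the average of the vertex normals. In practice it usually comes from a cross product of the triangle's edges.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(n, l):
    """Diffuse intensity for unit normal n and unit light direction l."""
    return max(sum(a * b for a, b in zip(n, l)), 0.0)


def lerp(a, b, t):
    """Linear interpolation between two vectors."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


n0 = normalize((-1.0, 0.0, 1.0))            # normal at vertex 0
n1 = normalize((1.0, 0.0, 1.0))             # normal at vertex 1
face_normal = normalize(lerp(n0, n1, 0.5))  # simplified face normal for this sketch
light = (0.0, 0.0, 1.0)                     # light shining straight at the surface

t = 0.5  # midpoint of the edge
flat = lambert(face_normal, light)                                # one value per polygon
gouraud = lambert(n0, light) * (1 - t) + lambert(n1, light) * t   # interpolate colors
phong_shaded = lambert(normalize(lerp(n0, n1, t)), light)         # interpolate normals
print(f"flat={flat:.3f} gouraud={gouraud:.3f} phong={phong_shaded:.3f}")
# prints: flat=1.000 gouraud=0.707 phong=1.000
```

Each technique handles the bright spot in the middle differently:

- Flat shading uses a single value over the whole face, including at the vertices.
- Gouraud shading only knows the lighting computed at the vertices, so it misses the brighter spot in the middle.
- Phong shading recovers that spot by interpolating the normals and lighting each pixel.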

Z-Buffering (Depth Testing):

A Z-buffer (depth buffer) is a data buffer used in computer graphics to represent the depth of objects in 3D space from a particular perspective. It is stored like a grayscale image the size of the screen, where each value represents the distance from the camera. With OpenGL’s default settings, these values range from 0 (closest) to 1 (farthest).

The GPU uses the Z-buffer to ensure that closer objects are rendered in front of farther ones. It compares the depth of each new pixel with the value already stored in the Z-buffer and overwrites the stored pixel only if the new one is closer.
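Here is a minimal sketch of that depth test. Each incoming fragment is an `(x, y, depth, color)` tuple, a format made up for this example. The comparison mirrors OpenGL's default `glDepthFunc(GL_LESS)`, with the buffer cleared to the far value 1.0.

```python
def draw_with_depth_test(width, height, fragments):
    """Resolve visibility for (x, y, depth, color) fragments using a Z-buffer.

    Depth uses OpenGL's default range: 0.0 is nearest, 1.0 is farthest.
    A fragment wins only if it is closer than what is already stored.
    """
    depth_buffer = [[1.0] * width for _ in range(height)]   # cleared to "far"
    color_buffer = [[None] * width for _ in range(height)]  # None = background
    for x, y, depth, color in fragments:
        if depth < depth_buffer[y][x]:   # depth test
            depth_buffer[y][x] = depth   # depth write
            color_buffer[y][x] = color   # color write
    return color_buffer, depth_buffer


fragments = [
    (0, 0, 0.8, "far wall"),
    (0, 0, 0.3, "player"),
    (0, 0, 0.5, "crate"),   # arrives later but is hidden behind the player
    (1, 0, 0.5, "crate"),
]
colors, depths = draw_with_depth_test(2, 1, fragments)
print(colors[0])  # prints ['player', 'crate']
```

Because every fragment is compared against the stored depth, the result does not depend on the order in which objects are drawn. This is what lets the GPU process opaque geometry in any order.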

Anti-Aliasing:

Anti-Aliasing is used to remove the aliasing effect. The aliasing effect refers to the appearance of jagged edges or “jaggies” after the rasterization process.

The GPU smooths out the jagged edges (aliasing) of objects using Anti-Aliasing techniques like MSAA (Multi-Sample Anti-Aliasing). This involves sampling multiple points in each pixel and averaging the results to produce a smoother image.
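The "sample several points, then average" idea can be sketched by measuring how much of a pixel a shape covers, then blending the shape's color with the background by that fraction. The sketch below uses a regular grid of sub-samples and treats each sample as fully shaded. That makes it closer to supersampling than to true MSAA, which runs the fragment shader once per pixel and tests only coverage per sample. The `inside` callback is a hypothetical stand-in for the triangle test.

```python
def coverage(px, py, inside, samples_per_axis=4):
    """Fraction of pixel (px, py) covered by a shape, using a grid of sub-samples."""
    n = samples_per_axis
    hits = 0
    for i in range(n):
        for j in range(n):
            sx = px + (i + 0.5) / n  # sub-sample positions spread inside the pixel
            sy = py + (j + 0.5) / n
            if inside(sx, sy):
                hits += 1
    return hits / (n * n)


def resolve(px, py, inside, shape_color, background, samples_per_axis=4):
    """Average the samples: blend shape and background colors by coverage."""
    c = coverage(px, py, inside, samples_per_axis)
    return tuple(s * c + b * (1 - c) for s, b in zip(shape_color, background))


def left_of_edge(x, y):
    return x < 10.5  # a vertical edge passing through the middle of pixel column 10


print(coverage(10, 0, left_of_edge, samples_per_axis=1))  # one sample: 0.0
print(resolve(10, 0, left_of_edge, (255, 255, 255), (0, 0, 0)))
```

With one sample, the edge pixel is either fully on or fully off, which produces the jaggies. With 16 samples, it gets a half-strength gray that softens the edge.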

Blending and Transparency:

When there are multiple layers or transparent objects, the GPU blends different objects or pixels together. For transparent objects, it calculates how much of the background should show through by combining the colors of overlapping pixels.

In simple terms, blending combines pixel colors from different objects into a single color. For example, when you look through a colored glass window, the color you see is the glass’s own color mixed with the colors of the objects behind it.
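The most common way to do this is "over" blending, which corresponds to OpenGL's `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)`. The sketch below applies it per color channel. It assumes colors are RGB tuples in [0, 1] and that transparent layers have already been sorted from farthest to nearest. The GPU does not do that sorting for you.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """'Over' blending: result = src * alpha + dst * (1 - alpha), per channel."""
    return tuple(s * src_alpha + d * (1 - src_alpha) for s, d in zip(src_rgb, dst_rgb))


def draw_transparent(background, layers):
    """Composite (rgb, alpha) layers over an opaque background, back to front."""
    color = background
    for rgb, alpha in layers:  # assumed already sorted far-to-near
        color = blend_over(rgb, alpha, color)
    return color


red_wall = (1.0, 0.0, 0.0)
blue_glass = ((0.0, 0.0, 1.0), 0.25)  # mostly see-through blue glass
print(draw_transparent(red_wall, [blue_glass]))  # prints (0.75, 0.0, 0.25)
```

An alpha of 1.0 makes a layer fully opaque, and an alpha of 0.0 leaves the background untouched.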

Shadow Mapping:

Using light source positions and scene geometry, the GPU calculates shadows by determining which parts of the scene are blocked from a light source. It does this by first rendering the scene from the light’s perspective into a shadow map (a depth map of what the light can see), and then using that map to test whether each pixel is in shadow.
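The two passes can be sketched on a tiny scene. The scene setup is entirely hypothetical:

- A directional light shines straight down the -y axis from a height of 10.
- A point's light-space texel is given by its (x, z) position, and its depth is its distance below the light.
- The scene is a 4x4 ground made of points plus one box top.

The small `bias` is the usual trick against surfaces wrongly shadowing themselves ("shadow acne").

```python
LIGHT_HEIGHT = 10.0  # assumed height of a directional light shining straight down (-y)
CELL = 1.0           # world-space size of one shadow-map texel
SIZE = 4             # the shadow map is SIZE x SIZE texels


def light_space(point):
    """Map a world point to (texel u, texel v, depth from the light)."""
    x, y, z = point
    return int(x // CELL), int(z // CELL), LIGHT_HEIGHT - y


def build_shadow_map(points):
    """Pass 1: 'render' the scene from the light, keeping the nearest depth per texel."""
    shadow_map = [[float("inf")] * SIZE for _ in range(SIZE)]
    for p in points:
        u, v, depth = light_space(p)
        if 0 <= u < SIZE and 0 <= v < SIZE:
            shadow_map[v][u] = min(shadow_map[v][u], depth)
    return shadow_map


def in_shadow(point, shadow_map, bias=0.05):
    """Pass 2: shadowed if something closer to the light was recorded in this texel."""
    u, v, depth = light_space(point)
    return depth - bias > shadow_map[v][u]


ground = [(x + 0.5, 0.0, z + 0.5) for x in range(SIZE) for z in range(SIZE)]
box_top = [(1.5, 2.0, 1.5)]  # a box standing on texel (1, 1)
smap = build_shadow_map(ground + box_top)
print(in_shadow((1.5, 0.0, 1.5), smap), in_shadow((3.5, 0.0, 3.5), smap))
# prints: True False
```

Real engines store the shadow map as a GPU depth texture. They also transform points with the light's view and projection matrices instead of this simplified top-down mapping.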

Post-Processing:

Before the final image is displayed, the GPU applies post-processing effects like motion blur, bloom, color grading, and depth of field to enhance the appearance of the final frame. These are additional per-pixel calculations done after the main rendering pipeline.
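Bloom is a good example of a post-processing effect built from simple per-pixel steps:

1. Extract the brightest parts of the finished frame.
2. Blur them.
3. Add them back on top of the frame.

The sketch below works on a single row of brightness values in [0, 1] to keep it short. The threshold, blur radius, and strength are arbitrary example values. Real engines run these steps as shaders over the whole 2D image, often with Gaussian blurs at reduced resolution.

```python
def bright_pass(row, threshold=0.8):
    """Keep only the part of each pixel that is brighter than the threshold."""
    return [max(v - threshold, 0.0) for v in row]


def box_blur(row, radius=1):
    """Average each pixel with its neighbors (edge pixels are clamped)."""
    last = len(row) - 1
    out = []
    for i in range(len(row)):
        window = [row[min(max(j, 0), last)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def bloom(row, threshold=0.8, strength=1.0):
    """Add a blurred copy of the bright areas back onto the image, clamped to 1.0."""
    glow = box_blur(bright_pass(row, threshold))
    return [min(v + strength * g, 1.0) for v, g in zip(row, glow)]


frame_row = [0.0, 0.0, 1.0, 0.0, 0.0]  # one bright light in a dark row
print(bloom(frame_row))
```

The dark pixels next to the bright one pick up some of its glow, while pixels farther away stay dark.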

Physics and Geometry Calculations:

In modern games, the GPU can also handle physics simulations (like fluid dynamics, collisions, and cloth simulations) using frameworks like NVIDIA PhysX, leveraging the GPU’s parallel processing capabilities.

Some advanced techniques include tessellation, where the GPU dynamically increases the geometric detail of a model (e.g. subdividing polygons for smoother surfaces).

(Tessellation: this is a vertex processing stage in the OpenGL rendering pipeline where patches of vertex data are subdivided into smaller primitives. We will explore this in more detail in an upcoming article.)
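As a rough illustration of subdivision, the sketch below splits every triangle into four by connecting its edge midpoints, so each level multiplies the triangle count by 4. This is only an analogy for what the tessellation hardware does. Midpoint splitting alone adds detail but not smoothness. To get a smoother surface, a tessellation evaluation shader would also move the new vertices, for example onto a curved patch, which this sketch leaves out.

```python
def midpoint(a, b):
    """Point halfway between two 3D vertices."""
    return tuple((p + q) / 2 for p, q in zip(a, b))


def subdivide(triangles, levels):
    """Split every triangle into 4 smaller ones, `levels` times."""
    for _ in range(levels):
        refined = []
        for a, b, c in triangles:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
        triangles = refined
    return triangles


base = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
print(len(subdivide(base, 3)))  # prints 64
```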

All these calculations are performed in parallel, and today’s GPUs are optimized to handle trillions of calculations every second. This parallelism allows for real-time rendering of high-quality graphics at high frame rates.

How many calculations do you think your graphics card performs every second while running a video game?

Maybe 100 million? That’s what was required to run Super Mario 64 from 1996.

Half-Life from 1998 required 540 million calculations per second.

World of Warcraft from 2004 required 2.25 billion, and Minecraft from 2011 required 100 billion.

And to render today’s most realistic video games, such as Call of Duty: Modern Warfare III from 2023, the GPU performs 30–40 trillion calculations every second in the background.

Let’s look at the difference between a GPU and a CPU:

GPU: The GPU described above has over 10,000 cores, whereas a typical high-end central processing unit (CPU) chip has only 24 cores.

We might think that the GPU is therefore more powerful than the CPU, but it’s more complicated than that.

For example:

Let’s think of the GPU as a cargo ship. Its cargo capacity is the number of calculations and the amount of data it can process, and the speed of the ship is how quickly those calculations and data are processed.

It’s a trade-off: a massive number of calculations executed at a slower rate.

A giant cargo ship carries only bulk contents and is limited to travelling between ports (like from port A to port B). Similarly, GPUs are a lot less flexible than CPUs and are built to run relatively simple instructions, like basic arithmetic, over huge amounts of data.

A GPU can’t run an operating system or interface with input devices or networks.

CPU: Let’s think of the CPU as a jumbo jet airplane, where the speed of the airplane is how quickly calculations and data are processed.

It carries far fewer calculations, but performs them at a much faster rate. The key difference from a cargo ship is that airplanes are a lot more flexible: they can carry passengers, packages, or containers, and take off and land at any one of tens of thousands of airports. Likewise, CPUs are flexible in that they can run a wide variety of programs and instructions.

Note: Although CPUs technically do have a faster clock speed, it’s more accurate to focus on the CPU’s lower latency. Specifically, CPUs use a more complicated memory hierarchy and branch prediction to reduce this latency.

If you have a smaller amount of data that needs to be evaluated quickly, a CPU will be faster. Furthermore, if you need to run an operating system, support network connections, or handle a wide range of different applications and hardware, then you’ll need a CPU.

The Rendering Pipeline Stages:

The rendering pipeline is a sequence of steps that transforms 3D data, like models and textures, into 2D images displayed on the screen.

The specific stages and their implementation may vary depending on the graphics API used, such as OpenGL, Vulkan, or DirectX, but the overall process follows the same basic principles to render visuals efficiently.

In OpenGL, there are 9 stages:

Vertex Specification

Vertex Shader

Tessellation

Geometry Shader

Vertex Post-Processing

Primitive Assembly

Rasterization

Fragment Shader

Per-Sample Operations

Note: In this article, we have covered the basics of how GPUs work, including the rendering pipeline and its stages. In later articles, we will delve deeper into each stage of the rendering pipeline.

And finally, let’s discuss the difference between integrated and dedicated graphics cards:

Dedicated graphics cards are generally more powerful than integrated graphics. A dedicated graphics card is a separate piece of hardware with its own GPU and memory (VRAM), designed specifically for graphics-intensive workloads.

Integrated graphics, on the other hand, are built into the same chip as the CPU and share system resources like memory (RAM) and power. Because an integrated GPU uses the system’s main memory for graphics tasks, its performance is limited and less memory is left for other applications. Integrated GPUs are cheaper because they don’t require additional hardware, and they consume less power. They are sufficient for basic computing tasks like web browsing, streaming, and light gaming, but they struggle with resource-intensive tasks such as modern AAA gaming or 3D rendering.

Let’s take Intel Iris Xe integrated graphics as an example: it has significantly fewer Execution Units (EUs), roughly Intel’s equivalent of cores, than dedicated GPUs have cores.

These EUs are responsible for processing graphical data, but their architecture is optimized for lower power consumption and sharing resources with the CPU rather than raw power.

The Intel Iris Xe has up to 96 Execution Units (EUs) in higher-end configurations (e.g., 11th Gen Intel processors). The EUs are specialized cores that handle tasks like rendering, shading and computation.

By comparison, a dedicated GPU (e.g. an NVIDIA RTX 4050) has thousands of CUDA cores, designed for parallel processing and for handling complex graphical tasks at much higher speeds.

Let’s see how integrated graphics handle memory: instead of dedicated VRAM, integrated GPUs like the Intel Iris Xe use system RAM.

For example, if your system has 16GB of RAM, the integrated GPU might allocate 2GB for graphics. This limits the GPU’s performance because system RAM is slower and not optimized for graphics data.

Integrated GPUs prioritize power saving and lightweight tasks, making them ideal for laptops and Ultrabooks.

To learn more about Intel Iris Xe graphics, you can check Intel’s official site, which has detailed information on this chip:

Intel Iris Xe GPU Architecture

Conclusion:

I hope this article has given you a clear understanding of the topics we have discussed. These basics serve as a stepping stone for diving deeper into the intricate concepts of graphics rendering.

Finally, thank you for reading and for spending your valuable time on my article.

Much of the explanation on GPU architecture and the GPU overview was inspired by the excellent content from Branch Education’s YouTube channel. This has greatly contributed to my understanding of GPU architecture and functionality.

For a more detailed and in-depth explanation of how GPUs work, I highly recommend checking out their channel:
How Graphics cards work?

Contact: Vinish Kumar