<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: vinish kumar</title>
    <description>The latest articles on DEV Community by vinish kumar (@vinishkumar_).</description>
    <link>https://dev.to/vinishkumar_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2757379%2Fe993ea62-9ffe-4132-8c2c-2b3cc3464747.jpg</url>
      <title>DEV Community: vinish kumar</title>
      <link>https://dev.to/vinishkumar_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vinishkumar_"/>
    <language>en</language>
    <item>
      <title>Understanding OpenGL Rendering Pipeline Stages</title>
      <dc:creator>vinish kumar</dc:creator>
      <pubDate>Sun, 26 Jan 2025 02:17:00 +0000</pubDate>
      <link>https://dev.to/vinishkumar_/understanding-opengl-rendering-pipeline-stages-5ajo</link>
      <guid>https://dev.to/vinishkumar_/understanding-opengl-rendering-pipeline-stages-5ajo</guid>
      <description>&lt;p&gt;As we discussed before about How GPU works and the processes involved when playing a game, what are the process goes through to render. In this article lets dive into OpenGL Rendering pipeline stages that goes through to rendering an object in a 3D/2D space.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5uzcwd6auos53mmmz4hp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5uzcwd6auos53mmmz4hp.jpg" alt="Image description" width="275" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First let's understand what is OpenGL?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenGL is a cross-platform API that allows developers to create 2D and 3D graphics applications. It's used to interact with a graphics processing unit (GPU) to achieve hardware-accelerated rendering. At its most basic, it's a set of functions that developers can call to handle graphics rendering. Specifically, OpenGL enables us to access the GPU and instruct it to perform rendering tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What OpenGL Is (and isn't):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's clear up some common misconceptions about OpenGL. A lot of people refer to OpenGL as a library, an engine, or a framework - but it's none of those things. At its core, OpenGL is a specification, similar to how the C++ language is defined. It does not include any code or implementation itself but outlines functions, parameters, and expected outputs.&lt;br&gt;
For instance, the specification defines that a function should exist, accept certain parameters, and return specific values. The actual implementation of OpenGL comes from GPU manufacturers, such as NVIDIA, AMD, or Intel. If you're wondering where to download OpenGL, it's included in your GPU drivers provided by the manufacturer. Each vendor creates its own implementation of OpenGL based on the specification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Use OpenGL?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenGL has been a popular choice for rendering graphics due to its cross-platform support and extensive functionality. While newer APIs like Vulkan and DirectX exist, OpenGL remains relevant for many applications, especially those that prioritize simplicity and widespread support.&lt;/p&gt;

&lt;p&gt;The Stages:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xgiu066qjd4mgnwhuz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xgiu066qjd4mgnwhuz7.png" alt="Image description" width="271" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex Specification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The vertex specification stage is the first step in the OpenGL rendering pipeline, where vertex data is defined and sent to the GPU for processing. This stage is crucial because it lays the foundation for all subsequent steps in the rendering pipeline. It involves preparing and organizing the data that represents the geometric structure of the objects you want to render.&lt;br&gt;
Technically, each vertex retrieved from the vertex arrays (defined via VAOs, VBOs, and IBOs) is acted upon by a vertex shader. The vertex data is stored in arrays, often structured as matrices, and contains all the necessary information for rendering. Each vertex may include:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· Position&lt;br&gt;
· Color&lt;br&gt;
· Normal&lt;br&gt;
· Texture Coordinates&lt;br&gt;
· Custom Attributes&lt;/p&gt;

&lt;p&gt;Note: Attributes are small data sets that describe an object to be rendered.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F994vbxvv648r6982wcie.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F994vbxvv648r6982wcie.png" alt="Image description" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Buffers and Their Roles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex Buffer Objects (VBOs):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VBOs store vertex data in GPU memory, allowing efficient data transfer and access during rendering. They ensure that the data is readily available for the GPU.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex Array Objects (VAOs):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VAOs act as containers that store the state of vertex attribute configurations. In simple terms, they act as a container that remembers which VBOs (attributes) to use and how to read them. This organization simplifies switching between different sets of vertex data without reconfiguring attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Element Buffer Objects (EBOs) / Index Buffer Objects (IBOs):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;EBOs or IBOs store indices for indexed drawing. They allow the reuse of vertex data for multiple primitives, reducing memory usage and improving performance. For example, when defining a shape whose triangles share corners (such as a quad or a 3D mesh), instead of duplicating those vertices for each triangle, IBOs can reference the existing vertex data efficiently.&lt;/p&gt;

&lt;p&gt;Practical Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukytd7m96sipzob3be9v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fukytd7m96sipzob3be9v.png" alt="Image description" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma93h6i6kef2gtzgvi1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma93h6i6kef2gtzgvi1q.png" alt="Image description" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex Processing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vertex processing is the stage of the graphics pipeline where the individual vertices of a 3D/2D model are transformed and prepared for rendering. As explained in vertex specification, each vertex retrieved from the vertex arrays (VAOs) is processed by a vertex shader. Vertex processing includes multiple stages, some fixed-function and others programmable, depending on the graphics API (e.g., OpenGL, DirectX, Vulkan). In OpenGL these are the &lt;strong&gt;vertex shader, tessellation, &amp;amp; geometry shader&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex Shader:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The vertex shader performs basic processing of each individual vertex. It's a programmable stage in the graphics rendering pipeline. Vertex shaders receive attribute inputs from vertex specification and process each incoming vertex to produce a single outgoing vertex.&lt;br&gt;
In simple terms, the vertex shader's primary role is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Transformation:&lt;/strong&gt; Transforming a vertex's position from the object's local space to clip space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attribute Handling:&lt;/strong&gt; Passing vertex-specific attributes (e.g., color, texture coordinates, normals) to subsequent pipeline stages.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftiigbeqxqqbmmvq67k4a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftiigbeqxqqbmmvq67k4a.png" alt="Image description" width="700" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vertex shaders can have user-defined outputs, but they must also output a special value representing the final position of the vertex. This clip-space position is crucial for rendering and is mandatory, as vertex shaders are not optional in the pipeline.&lt;/p&gt;

&lt;p&gt;Key Features of Vertex Shaders:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Programmability:&lt;/strong&gt;&lt;br&gt;
Developers can write custom vertex shaders to implement specific transformations or effects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel Processing:&lt;/strong&gt;&lt;br&gt;
Each vertex is processed independently, making vertex shaders highly parallelizable on modern GPUs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexible Outputs:&lt;/strong&gt;&lt;br&gt;
Apart from clip-space positions, vertex shaders can output other interpolated data, such as texture coordinates or lighting calculations, for use in later stages.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Tessellation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tessellation is a process in the graphics pipeline that subdivides polygons into smaller, finer primitives, such as triangles. This allows for the creation of smoother surfaces, detailed geometry, or adaptive level-of-detail rendering without requiring high-detail models directly from artists.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcty2vrwm53gz7mffves2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcty2vrwm53gz7mffves2.gif" alt="Image description" width="500" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsplxzdca4yfy2zw8iz2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsplxzdca4yfy2zw8iz2w.png" alt="Image description" width="692" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tessellation in OpenGL consists of two programmable shader stages and a fixed-function tessellator:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tessellation Control Shader (TCS):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· Determines the amount of tessellation to apply to a primitive.&lt;br&gt;
· Ensures connectivity between adjacent tessellated primitives.&lt;br&gt;
· Dynamically adjusts the level of detail based on factors like distance from the camera or application-specific criteria.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Tessellation Evaluation Shader (TES):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· Computes the final positions and attributes of the subdivided primitives.&lt;br&gt;
· Applies interpolation and other user-defined operations to generate smooth and detailed geometry.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Applications of Tessellation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Terrain Rendering:&lt;/strong&gt;&lt;br&gt;
· Generate highly detailed terrains from a coarse height map&lt;br&gt;
&lt;strong&gt;Character Models:&lt;/strong&gt;&lt;br&gt;
· Adding fine details to character models, like wrinkles or muscle definition&lt;br&gt;
&lt;strong&gt;Adaptive Detail:&lt;/strong&gt;&lt;br&gt;
· Automatically adjusting the level of detail based on camera proximity to optimize performance&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Geometry Shader:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The geometry shader is a programmable and optional stage in the graphics pipeline that processes entire primitives (points, lines, or triangles) as input. Unlike vertex shaders, which operate on individual vertices, the geometry shader works on entire primitives and can:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generate New Primitives:&lt;/strong&gt;&lt;br&gt;
· For example, creating additional triangles for tessellation or adding detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modify Existing Primitives:&lt;/strong&gt;&lt;br&gt;
· Changing the shape or attributes of input primitives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discard Primitives Entirely:&lt;/strong&gt;&lt;br&gt;
· Eliminating unnecessary primitives to optimize rendering.&lt;/p&gt;

&lt;p&gt;The input primitives for geometry shaders are the output primitives from a subset of the Primitive Assembly process. So, if you send a triangle strip as a single primitive, what the geometry shaders will see is a series of triangles.&lt;/p&gt;

&lt;p&gt;However, there are a number of input primitive types that are defined specifically for geometry shaders. These adjacency primitives give geometry shaders a larger view of the geometry: they provide access to the vertices of primitives adjacent to the current one.&lt;/p&gt;

&lt;p&gt;The output of a geometry shader is zero or more simple primitives, much like the output of primitive assembly. The geometry shader is able to remove primitives, or tessellate them by outputting many primitives for a single input. The geometry shader can also tinker with the vertex values themselves, either doing some of the work of the vertex shader or interpolating values when tessellating. Geometry shaders can even convert primitives to different types: input point primitives can become triangles, or lines can become points.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex post-processing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The vertex post-processing step is primarily responsible for preparing the processed vertex data for the rasterization process. The key operations in vertex post-processing are:&lt;/p&gt;

&lt;p&gt;Perspective Division:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· After the vertex shader processes a vertex, it outputs the position in clip space, represented in homogeneous coordinates (x, y, z, w).&lt;br&gt;
· To transform clip space coordinates into normalized device coordinates (NDC), the GPU performs perspective division&lt;br&gt;
· This ensures that vertices closer to the camera appear larger, creating the effect of perspective.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Viewport Transformation:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· NDC ranges from [-1, 1] along all axes. These coordinates need to be mapped to window space or screen space (pixel coordinates) based on the viewport settings.&lt;br&gt;
· This transformation involves scaling and translating the coordinates to fit the current viewport.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Clipping:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· Clipping removes vertices or primitives that fall outside the view frustum (the visible region of the 3D scene defined by the camera's perspective or orthographic projection).&lt;/p&gt;

&lt;p&gt;· Clipping ensures that only the visible portions of the geometry are processed further. If a primitive partially intersects the view frustum, the GPU may generate new vertices at the intersection points (clipping the primitives).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Face Culling:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· Triangle primitives can be culled (i.e., discarded without rendering) based on the triangle's orientation in window space. Back-face culling is an optional operation that removes faces of geometry not visible to the camera (e.g., the back sides of triangles).&lt;br&gt;
· This helps improve performance by not rendering unnecessary geometry.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Depth Range Mapping:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Depth range mapping in OpenGL is a crucial concept that deals with how depth values are mapped from normalized device coordinates (NDC) to the depth buffer. It ensures proper depth testing and determines visibility within a 3D scene.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· In NDC, after vertex processing and perspective division, the depth values range between -1 (near clip plane) and 1 (far clip plane).&lt;br&gt;
· OpenGL requires depth values to be clamped between 0 and 1 for storage in the depth buffer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Primitive Assembly:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Primitive assembly is a critical stage in the graphics pipeline that occurs just before rasterization. During this stage, the GPU takes the processed vertices and assembles them into primitives, such as points, lines, or triangles, based on the rendering mode specified in the drawing command.&lt;/p&gt;

&lt;p&gt;Why Primitive Assembly is Important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Defines Shape Connectivity:&lt;/strong&gt;&lt;br&gt;
· Determines how vertices are grouped into meaningful shapes.&lt;br&gt;
&lt;strong&gt;Handles Optimizations:&lt;/strong&gt;&lt;br&gt;
· Reuses vertices with indexed drawing, reducing redundant calculations.&lt;br&gt;
· Eliminates unnecessary primitives through clipping and culling.&lt;br&gt;
&lt;strong&gt;Prepares for Rasterization:&lt;/strong&gt;&lt;br&gt;
· Outputs fully assembled and valid primitives, ready for fragment generation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Primitive assembly ensures the integrity and connectivity of geometry before it moves to the rasterization stage, where it's converted into fragments for pixel shading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rasterization:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Rasterization is responsible for converting geometric primitives into fragments that can be processed further to produce the final image on the screen. A fragment is a set of state that is used to compute the final data for a pixel in the output framebuffer. The state for a fragment includes its position in screen-space, the sample coverage if multisampling is enabled, and a list of arbitrary data that was output from the previous vertex or geometry shader.&lt;/p&gt;

&lt;p&gt;In essence, it bridges the gap between the vector-based representation of objects in 3D space and the grid of pixels in 2D screen space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fragment Shader:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The fragment shader is a programmable stage in the rendering pipeline. Fragments are the data elements produced during the rasterization stage, when primitives are converted into pixel-sized chunks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv618bacqhn7xe0u3zsaa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv618bacqhn7xe0u3zsaa.png" alt="Image description" width="451" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqpemcvac0t2p50dmdg5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqpemcvac0t2p50dmdg5.png" alt="Image description" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each fragment corresponds to a pixel on the screen but includes additional data such as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· Interpolated attributes (e.g. colors, texture coordinates, normals) from the vertices of the primitives.&lt;br&gt;
· Depth (z-coordinates in window space).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Fragment shaders are not able to set the stencil data for a fragment, but they do have control over the color and depth values.&lt;/p&gt;

&lt;p&gt;This shader stage is optional. If you render without a fragment shader, the depth values of the fragments get their usual values, but the values of all the colors a fragment could have are undefined. Rendering without a fragment shader is useful when rendering only a primitive's default depth information to the depth buffer.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Per-Sample Operations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Per-sample operations are the final set of stages in the graphics pipeline that occur after the fragment shader has executed and before the rendered results are written to the framebuffer. These operations are essential for refining the final appearance of the image and ensuring it adheres to rendering requirements like depth, stencil, blending and anti-aliasing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are Samples:&lt;/strong&gt;&lt;br&gt;
A sample represents a point within a pixel that is used to calculate the final color of that pixel.&lt;/p&gt;

&lt;p&gt;For multisample anti-aliasing (MSAA), each pixel can have multiple samples to capture more detail about overlapping primitives. Each sample may have its own depth, stencil, and color data.&lt;/p&gt;

&lt;p&gt;Steps in Per-Sample Operations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pixel Ownership Test:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· Determines whether a pixel belongs to the current window or render target.&lt;br&gt;
· If a pixel fails this test (e.g., due to being outside the window or overlapping an obscured area), it is discarded, and its contents are undefined. This test always passes when using a Framebuffer Object.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Scissor Test:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· When enabled, the test fails if the fragment's pixel lies outside of a specified rectangle of the screen (the scissor box).&lt;br&gt;
· Only fragments inside the scissor box are processed further.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Stencil Test:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· When enabled, compares the fragment's stencil value with a reference value using a stencil function.&lt;br&gt;
· Depending on the result, the fragment can be discarded, or the stencil buffer can be updated.&lt;br&gt;
· Useful for masking areas of the screen or implementing effects like outlining or decals.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Depth Test (Z-Test):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;· When enabled, compares the fragment's depth value (output by the fragment shader or modified during rasterization) with the existing value in the depth buffer.&lt;br&gt;
· If the test fails, the fragment is discarded.&lt;br&gt;
· Ensures correct occlusion and depth-based ordering of primitives.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After this, color blending happens. For each fragment color value, there is a specific blending operation between it and the color already in the framebuffer at that location.&lt;br&gt;
Logical Operations may also take place in lieu of blending, which perform bitwise operations between the fragment colors and framebuffer colors.&lt;/p&gt;




&lt;p&gt;This article provides an overview of the OpenGL Rendering Pipeline. It emphasizes the importance of each stage, the flexibility of OpenGL as a specification, and its relevance in graphics programming.&lt;/p&gt;




&lt;p&gt;For a more detailed look at the code, I've created a GitHub repository where you can access and explore the entire basic project. You can find the repository here: &lt;a href="https://github.com/vinishkumar101/Understanding-OpenGL-Rendering-Pipeline-Stages.git" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can explore OpenGL further here: &lt;a href="https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview" rel="noopener noreferrer"&gt;OpenGL&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Finally, I thank everyone reading for spending your valuable time on my article.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>gamedev</category>
      <category>opengl</category>
    </item>
    <item>
      <title>Understanding and Exploring GPUs: Architecture, Stages to Render a Game, and Rendering Pipeline Stages.</title>
      <dc:creator>vinish kumar</dc:creator>
      <pubDate>Fri, 24 Jan 2025 16:33:00 +0000</pubDate>
      <link>https://dev.to/vinishkumar_/understanding-and-exploring-gpus-architecture-stages-to-render-a-game-and-rendering-pipeline-4i2a</link>
      <guid>https://dev.to/vinishkumar_/understanding-and-exploring-gpus-architecture-stages-to-render-a-game-and-rendering-pipeline-4i2a</guid>
      <description>&lt;p&gt;In this article, we’ll cover the architecture of an GPU and how it works, the stages that works parallelly to render a game, and we’ll be seeing, the stages in the Rendering Pipeline, and difference between Integrated and Dedicated Graphics cards.&lt;/p&gt;

&lt;p&gt;Table of Contents:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;GPU Chip Architecture Overview&lt;br&gt;
Overview on how GPU Chip performs&lt;br&gt;
Stages of Rendering a Game&lt;br&gt;
GPU vs CPU Comparison&lt;br&gt;
Rendering Pipeline Stages (OpenGL)&lt;br&gt;
Integrated Vs Dedicated Graphics card&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Let’s see how a graphics card (GPU chip) works:&lt;/p&gt;

&lt;p&gt;Let’s explore the computational architecture and see how GPUs process mountains of data, and why they are ideal for running video game graphics, Bitcoin mining, neural networks, and AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPU GA102 Architecture (RTX 3090 Ti):&lt;/strong&gt;&lt;br&gt;
The center of the graphics card contains the printed circuit board, or PCB, with all the various components mounted on it.&lt;/p&gt;

&lt;p&gt;When we open the GPU, we find a large chip, or die, named GA102, built from 28.3 billion transistors. The majority of the chip's area is taken up by the processing cores, which have a hierarchical organization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fetuiqp2ldolfwvdtzvpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fetuiqp2ldolfwvdtzvpt.png" alt="Image description" width="800" height="965"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This chip is divided into 7 Graphics Processing Clusters, or GPCs, and within each processing cluster are 12 Streaming Multiprocessors, or SMs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3tzpx3shtd8gmao6emd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi3tzpx3shtd8gmao6emd.png" alt="Image description" width="800" height="637"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And next, each of these Streaming Multiprocessors (SMs) contains 4 warps and 1 ray-tracing core.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkdtt93i5w64mlw1b5ht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkdtt93i5w64mlw1b5ht.png" alt="Image description" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then, each warp contains 32 CUDA cores, or shading cores, and 1 Tensor core.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5tbxo4zdyv62m3z9drb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5tbxo4zdyv62m3z9drb.png" alt="Image description" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, across the entire GPU there are 10,752 CUDA cores, 336 Tensor cores, 84 Ray tracing cores.&lt;/p&gt;

&lt;p&gt;These three types of cores execute all the calculations of the GPU and each has a different function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CUDA Cores:&lt;/strong&gt;&lt;br&gt;
Each CUDA core can be thought of as a simple binary calculator with an addition button, a multiply button, and a few others; they are used the most when running a video game.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tensor Cores:&lt;/strong&gt;&lt;br&gt;
These are matrix multiplication and addition calculators, used for geometric transformations and for working with neural networks and AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ray-tracing Cores:&lt;/strong&gt;&lt;br&gt;
These are the largest but fewest cores, used to execute ray-tracing algorithms.&lt;/p&gt;

&lt;p&gt;Interestingly, the 3080, 3090, 3080 Ti, and 3090 Ti graphics cards all use the same GA102 chip design.&lt;/p&gt;

&lt;p&gt;So why do they have different prices and release dates?&lt;/p&gt;

&lt;p&gt;During the manufacturing process, patterning errors, dust particles, or other manufacturing defects can damage parts of the circuit. Engineers find the defective regions of the chip and permanently isolate and deactivate the nearby circuitry.&lt;/p&gt;

&lt;p&gt;These chips are tested and categorized according to the number of defects; the 3090 Ti graphics card gets a flawless GA102 chip with 0 defective streaming multiprocessors and all 10,752 CUDA cores working properly.&lt;/p&gt;

&lt;p&gt;The NVIDIA RTX series GPUs do not all use the same chip. NVIDIA uses different chips across the lineup to cater to different performance levels. Here’s a breakdown of the primary chips:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Example on 40 series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AD107: Used in entry-level models like the RTX 4050 and some versions of the RTX 4060. This chip is designed for low-power, efficiency-oriented performance.&lt;/li&gt;
&lt;li&gt;AD106: Typically found in mid-range models like the RTX 4060 Ti and some configurations of the RTX 4060, providing a step up in power.&lt;/li&gt;
&lt;li&gt;AD104: Found in upper mid-range models like the RTX 4070 and RTX 4070 Ti, designed to offer a balance between performance and efficiency.&lt;/li&gt;
&lt;li&gt;AD103: Used in higher-end models like the RTX 4080, offering substantial improvements in power and features.&lt;/li&gt;
&lt;li&gt;AD102: NVIDIA’s top-end chip in the RTX 40 series, powering the flagship RTX 4090, which is designed for the highest performance tier in gaming and productivity.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Around the edges of the GPU chip are the other sections: 12 graphics memory controllers, the NVLink controllers, and the PCIe interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NVLink:&lt;/strong&gt; A high-speed communication protocol that allows graphics cards to connect directly to each other, enabling faster data transfer and improved system performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PCIe interface:&lt;/strong&gt; A standardized interface that enables high-speed data transfer between electronic components, such as a computer’s motherboard and expansion cards.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: All the examples above are based on the GA102 chip used in the RTX 30 series; the performance and the architecture will differ depending on the chip used.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Of all the calculations done in the background while playing a video game, most of the essential ones happen in the rendering pipeline stages, which process the game’s world into the visuals presented to the player.&lt;/p&gt;

&lt;p&gt;This pipeline systematically transforms 3D models, textures, lighting, and effects through sequential stages, ultimately creating dynamic, immersive visuals.&lt;/p&gt;




&lt;p&gt;While we play a game on our computer, the graphics card performs numerous complex calculations in the background to render everything in the game’s visuals in real time.&lt;/p&gt;

&lt;p&gt;These are the stages it goes through to render a game:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex Processing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vertex processing is the first stage in the rendering pipeline of computer graphics, where vertices are processed by a series of shaders. Let’s say a 3D model consists of vertices (points in 3D space). Each vertex has attributes like position, color, and texture coordinates. The GPU processes these vertices using vertex shaders, transforming their positions from 3D space to 2D screen space via matrix operations (the model, view, and projection matrices). This stage can also include lighting calculations, such as Phong shading.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uqdn6i7rqlhr122iilu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uqdn6i7rqlhr122iilu.png" alt="Image description" width="600" height="338"&gt;&lt;/a&gt;&lt;/p&gt;
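&lt;p&gt;As a hedged sketch of the matrix step described above, the following Python (not OpenGL itself, and with an identity matrix standing in for a real model-view-projection matrix) shows how a vertex position travels from 3D space to 2D pixel coordinates:&lt;/p&gt;

```python
# Minimal sketch of vertex processing: projecting a 3D vertex through an
# MVP matrix into pixel coordinates. The identity matrix below is a
# made-up placeholder, not a real camera setup.

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_screen(vertex, mvp, width, height):
    """Transform a 3D vertex to 2D pixel coordinates."""
    x, y, z, w = mat_vec(mvp, vertex + [1.0])   # clip space
    ndc_x, ndc_y = x / w, y / w                  # perspective divide -> NDC
    px = (ndc_x + 1.0) * 0.5 * width             # viewport transform
    py = (1.0 - ndc_y) * 0.5 * height            # flip Y for screen space
    return px, py

# Identity MVP: the vertex maps straight into NDC, then to pixels.
identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
print(to_screen([0.0, 0.0, 0.0], identity, 800, 600))  # centre: (400.0, 300.0)
```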

&lt;blockquote&gt;
&lt;p&gt;(Phong shading is a lighting technique in computer graphics that interpolates surface normals across a polygon to produce smooth results. It describes the way a surface reflects light as a combination of the diffuse reflection of rough surfaces and the specular reflection of shiny surfaces.)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftkg3wo99ryrptkv6fuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftkg3wo99ryrptkv6fuh.png" alt="Image description" width="786" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What exactly are Shaders?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;-Shaders are small programs that tell your graphics card how to draw a thing or model on your screen.&lt;/p&gt;

&lt;p&gt;-Think of one as an artist for each pixel or part of a 3D model: it decides things like color, light, shadow, and texture.&lt;/p&gt;

&lt;p&gt;-This article covers only the basic concepts of shaders. Advanced topics will be explored in upcoming articles.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, a shader can make a ball look shiny, rough, or transparent by controlling how light interacts with it. Shaders work behind the scenes in games and animations to make objects look more realistic or to give them cool effects like glowing or blurring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8lly7uyxa6y9s1ygzlk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8lly7uyxa6y9s1ygzlk.jpg" alt="Image description" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vertex shaders, geometry shaders, and fragment shaders are different types of shaders that work together in stages to help build and color what you see on the screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rasterization:&lt;/strong&gt;&lt;br&gt;
Put simply, rasterization is the process of converting a vector-based image or object into a raster (bitmap) format.&lt;/p&gt;

&lt;p&gt;It starts from the transformed vertices (the vertices that have been moved to their final positions in clip space, i.e. camera space, so the object can be rendered on the screen). The GPU then converts the 3D primitives (like the triangles formed by vertices) into 2D pixels (or fragments) on the screen. This process determines which pixels on the screen correspond to which part of each triangle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6fo8ym124d77chvmpc9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6fo8ym124d77chvmpc9.png" alt="Image description" width="513" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4uw02irfqzj5wz7uqlc7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4uw02irfqzj5wz7uqlc7.png" alt="Image description" width="400" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, in simple terms: say we have a 3D triangle projected into screen space so that it is visible. The process of figuring out which pixels it covers is called rasterization.&lt;/p&gt;
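&lt;p&gt;The coverage test described above can be sketched in Python (a simplified, made-up example using edge functions; real GPUs do this in fixed-function hardware):&lt;/p&gt;

```python
# Toy rasterizer: decide which pixel centres fall inside a 2D triangle
# using edge functions. The triangle and screen size are made up.

def edge(a, b, p):
    """Signed area test: > 0 when p is to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, width, height):
    """Return the set of pixels covered by a counter-clockwise triangle."""
    covered = set()
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel centre
            if (edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0
                    and edge(v2, v0, p) >= 0):
                covered.add((x, y))
    return covered

pixels = rasterize((0, 0), (8, 0), (0, 8), 8, 8)
print(len(pixels))  # 36 pixel centres fall inside this triangle
```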

&lt;p&gt;&lt;strong&gt;Fragment (Pixel) Shading:&lt;/strong&gt;&lt;br&gt;
The fragments produced by rasterization (where the vector data was converted into a raster image made up of pixels) are now colored: the GPU calculates the color of each pixel using fragment shaders. These shaders take into account textures, lighting, shadows, reflections, and other effects.&lt;/p&gt;

&lt;p&gt;Operations like texture sampling (getting color from a texture) and applying lighting models are performed here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47w3ymy4fi74tc3sc7ze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47w3ymy4fi74tc3sc7ze.png" alt="Image description" width="800" height="155"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Texture Mapping:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Textures are images that wrap around 3D objects. The GPU retrieves texture coordinates for each pixel and uses them to fetch the corresponding color from the texture map. It applies filtering (like bilinear or trilinear filtering) to smooth out the textures at different distances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55a99w1moud0ppp2qll8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55a99w1moud0ppp2qll8.jpg" alt="Image description" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bilinear filtering is a process that smooths out a texture by calculating the average color of the four closest texels (i.e. pixels of the texture) to a given pixel on the screen, which also reduces blockiness.&lt;/p&gt;

&lt;p&gt;Trilinear filtering takes this a step further by also averaging the color values from the two closest mipmap levels (different resolutions of the same texture), resulting in a smoother transition between texture scales, especially when viewing an object at different distances.&lt;/p&gt;
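&lt;p&gt;As an illustrative sketch (with a made-up 2x2 greyscale texture), bilinear filtering boils down to two horizontal blends followed by one vertical blend:&lt;/p&gt;

```python
# Bilinear filtering sketch: average the four closest texels,
# weighted by the sample's distance to each. The texture is made up.

def bilinear(texture, u, v):
    """Sample texture (rows of texel values) at continuous coords (u, v)."""
    x0, y0 = int(u), int(v)
    x1 = min(x0 + 1, len(texture[0]) - 1)  # clamp at the texture edge
    y1 = min(y0 + 1, len(texture) - 1)
    fx, fy = u - x0, v - y0
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx  # blend top row
    bot = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx  # blend bottom row
    return top * (1 - fy) + bot * fy                         # blend vertically

tex = [[0.0, 1.0],
       [1.0, 0.0]]
print(bilinear(tex, 0.5, 0.5))  # exactly between all four texels -> 0.5
```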

&lt;p&gt;&lt;strong&gt;Lighting and Shading:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scene lighting information and material properties are the important factors in how light interacts with a scene, and how that light is perceived.&lt;/p&gt;

&lt;p&gt;The GPU performs lighting calculations, such as determining how light interacts with surfaces (diffuse, specular, and ambient lighting). This may include advanced effects like bump mapping (for roughness), normal mapping (to simulate surface detail), and global illumination (for realistic light behavior).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lighting:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lighting refers to the simulation of light sources and how they illuminate objects in a scene. It plays an important role in creating depth, mood, and realism. The GPU performs lighting calculations, determining how light interacts with everything in a space.&lt;/p&gt;

&lt;p&gt;The different types of lights are:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;-Ambient light (Simulates indirect light that fills the environment uniformly),&lt;/p&gt;

&lt;p&gt;-Directional light (Represents a distant light source like the sun),&lt;/p&gt;

&lt;p&gt;-Point light (Emits light in all directions from a single point, like a bulb),&lt;/p&gt;

&lt;p&gt;-Spot light (Emits light in a cone shape, often used for focused illumination),&lt;/p&gt;

&lt;p&gt;-Area light (Emits light from a surface area, producing softer shadows).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafdt3rdx6qs1yitndu51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafdt3rdx6qs1yitndu51.png" alt="Image description" width="750" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are also different lighting models we can use:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;-Phong Lighting Model: Includes ambient, diffuse and specular components.&lt;/p&gt;

&lt;p&gt;-Blinn-Phong Model: An optimized version of Phong with better specular highlights.&lt;/p&gt;

&lt;p&gt;-PBR (Physically Based Rendering): Simulates real-world lighting behavior for greater realism.&lt;/p&gt;
&lt;/blockquote&gt;
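&lt;p&gt;As a rough sketch of the Phong model listed above (all vectors and coefficients here are made-up example values, not from any real scene):&lt;/p&gt;

```python
# Phong lighting sketch: the final intensity is the sum of ambient,
# diffuse and specular terms.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = sum(x * x for x in v) ** 0.5
    return [x / length for x in v]

def phong(normal, light_dir, view_dir, ka=0.1, kd=0.7, ks=0.2, shininess=32):
    """Per-pixel intensity from ambient, diffuse and specular components."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    diffuse = max(dot(n, l), 0.0)
    # Reflect the light direction about the normal: r = 2(n.l)n - l
    r = [2 * dot(n, l) * ni - li for ni, li in zip(n, l)]
    specular = max(dot(r, v), 0.0) ** shininess if diffuse > 0 else 0.0
    return ka + kd * diffuse + ks * specular

# Light and viewer directly above a surface facing up: every term is maximal.
print(phong([0, 1, 0], [0, 1, 0], [0, 1, 0]))  # ~1.0 (0.1 + 0.7 + 0.2)
```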

&lt;p&gt;&lt;strong&gt;Shading:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Shading is the process of determining the surface color and appearance of an object based on lighting, material properties, and viewer position.&lt;/p&gt;

&lt;p&gt;As with lighting, there are different shading techniques:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flat Shading, Gouraud Shading, Phong Shading, Physically Based Shading.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These shading techniques determine how the interaction between light and surface is computed and visualized. They work along with the different types of light to produce the final appearance of a surface in a scene.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Z-Buffering (Depth Testing):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A Z-buffer is a data buffer used in computer graphics to represent the depth of objects in 3D space from a particular perspective. Depth is stored per pixel, where each value represents the distance from the camera, with 0 being the closest.&lt;/p&gt;

&lt;p&gt;The GPU uses the Z-buffer to ensure that closer objects are rendered in front of farther ones. It compares the depth of each new pixel with the value already stored in the Z-buffer and determines whether to overwrite the pixel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4ilaobtl3gqqh1h0085.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4ilaobtl3gqqh1h0085.png" alt="Image description" width="660" height="209"&gt;&lt;/a&gt;&lt;/p&gt;
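&lt;p&gt;The depth comparison can be sketched as follows (a toy example with made-up depths; on real hardware the Z-buffer lives in GPU memory and the test happens in fixed-function units):&lt;/p&gt;

```python
# Depth-test sketch: a pixel is written only when its depth is smaller
# (closer, with 0 nearest) than the value already stored.

def depth_test(zbuffer, framebuffer, x, y, depth, color):
    """Write color at (x, y) only if this fragment is the closest so far."""
    if depth < zbuffer[y][x]:
        zbuffer[y][x] = depth
        framebuffer[y][x] = color
        return True
    return False

# A 1x2 screen; the Z-buffer starts at the far plane (depth 1.0).
zbuf = [[1.0, 1.0]]
fbuf = [["black", "black"]]
depth_test(zbuf, fbuf, 0, 0, 0.8, "red")    # red wall, fairly far: written
depth_test(zbuf, fbuf, 0, 0, 0.3, "blue")   # blue box in front: overwrites
depth_test(zbuf, fbuf, 0, 0, 0.6, "green")  # behind the box: rejected
print(fbuf[0][0])  # prints blue
```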

&lt;p&gt;&lt;strong&gt;Anti-Aliasing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anti-Aliasing is used to remove the aliasing effect. The aliasing effect refers to the appearance of jagged edges or “jaggies” after the rasterization process.&lt;/p&gt;

&lt;p&gt;The GPU smooths out the jagged edges (aliasing) of objects using Anti-Aliasing techniques like MSAA (Multi-Sample Anti-Aliasing). This involves sampling multiple points in each pixel and averaging the results to produce a smoother image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k4fo3ya34y3rvac882o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2k4fo3ya34y3rvac882o.jpg" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;
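&lt;p&gt;The multi-sample idea can be sketched as below (a simplification: this averages coverage like supersampling, whereas real MSAA shades each pixel only once; the shape and sample positions are made up):&lt;/p&gt;

```python
# Anti-aliasing sketch: take several sample points inside each pixel
# and average how many the shape covers, softening jagged edges.

def coverage(inside, x, y, samples):
    """Fraction of sub-pixel samples that the shape covers."""
    hits = sum(1 for sx, sy in samples if inside(x + sx, y + sy))
    return hits / len(samples)

# Four sample positions inside the pixel.
SAMPLES = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

# An edge at x = 0.5: everything to the left belongs to the shape.
inside_shape = lambda px, py: px < 0.5
print(coverage(inside_shape, 0, 0, SAMPLES))  # 0.5: a smoothed edge pixel
```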

&lt;p&gt;&lt;strong&gt;Blending and Transparency:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When a scene has multiple layers or transparent objects, the GPU blends the overlapping objects or pixels together. For transparent objects, it calculates how much of the background should show through by combining the colors of overlapping pixels.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a07cgg2b6b5h57h6dr9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a07cgg2b6b5h57h6dr9.png" alt="Image description" width="600" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In simple terms it combines pixel colors from different objects to create a single color. For example, a colored glass window is transparent because the glass has its own color, but the resulting color also contains the colors of the objects behind it.&lt;/p&gt;
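&lt;p&gt;The blend described above is commonly the "source over" equation; here is a minimal sketch with made-up colors, matching the colored-glass example:&lt;/p&gt;

```python
# Alpha blending sketch: out = src * alpha + dst * (1 - alpha),
# applied per color channel (RGB tuples here are made-up values).

def blend(src, dst, alpha):
    """Blend a transparent source color over a destination color."""
    return tuple(s * alpha + d * (1 - alpha) for s, d in zip(src, dst))

red_glass = (1.0, 0.0, 0.0)   # the colored glass window
white_wall = (1.0, 1.0, 1.0)  # the object behind it
print(blend(red_glass, white_wall, 0.5))  # (1.0, 0.5, 0.5): a pinkish result
```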

&lt;p&gt;&lt;strong&gt;Shadow Mapping:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using the light-source positions and scene geometry, the GPU calculates shadows by determining which parts of the scene are blocked from a light source. It does this by rendering the scene from the perspective of the light and using the resulting shadow map to test whether each pixel is in shadow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbri4f69g7o783j5crh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbri4f69g7o783j5crh8.png" alt="Image description" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;
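&lt;p&gt;The two-pass test can be sketched as follows (a toy example with made-up depths; the small bias is a common trick to stop surfaces shadowing themselves):&lt;/p&gt;

```python
# Shadow-mapping sketch: the shadow map stores, per light-space texel,
# the depth of the closest surface seen from the light. A pixel is in
# shadow when something closer to the light was recorded for its texel.

def in_shadow(shadow_map, texel, pixel_depth, bias=0.005):
    """True when a closer occluder was recorded for this texel."""
    return pixel_depth - bias > shadow_map[texel]

# Depths seen from the light: texel 0 has an occluder at depth 0.4.
shadow_map = {0: 0.4, 1: 1.0}

print(in_shadow(shadow_map, 0, 0.9))  # floor behind the occluder: True
print(in_shadow(shadow_map, 0, 0.4))  # the occluder itself: False (bias)
print(in_shadow(shadow_map, 1, 0.9))  # nothing blocks this texel: False
```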

&lt;p&gt;&lt;strong&gt;Post-Processing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before the final image is displayed, the GPU applies post-processing effects like motion blur, bloom, color grading, and depth of field to enhance the visual appearance of the final frame. These are additional pixel calculations done after the main rendering pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Physics and Geometry Calculations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In modern games, the GPU can also handle physics simulations (like fluid dynamics, collisions, and cloth simulation) using frameworks like NVIDIA PhysX, leveraging the parallel processing capabilities of the GPU.&lt;/p&gt;

&lt;p&gt;Some advanced techniques include tessellation, where the GPU dynamically increases the geometric detail of a model (e.g. subdividing polygons for smoother surfaces).&lt;/p&gt;

&lt;p&gt;(&lt;strong&gt;Tessellation:&lt;/strong&gt; This is a vertex-processing stage in the OpenGL rendering pipeline where patches of vertex data are subdivided into smaller primitives. We will explore this in more detail in an upcoming article.)&lt;/p&gt;

&lt;p&gt;All these calculations are performed in parallel on today’s GPUs, which are optimized for handling millions to trillions of calculations simultaneously. This parallelism allows real-time rendering of high-quality graphics at high frame rates.&lt;/p&gt;

&lt;p&gt;How many calculations do you think that your graphic card performs every second while running a video game?&lt;/p&gt;

&lt;p&gt;Maybe 100 million? That’s what was required to run Super Mario 64 from 1996.&lt;/p&gt;

&lt;p&gt;And 540 million calculations are needed to run Half-Life from 1998.&lt;/p&gt;

&lt;p&gt;And 2.25 billion calculations to run World of Warcraft from 2004, and 100 billion to run Minecraft from 2011.&lt;/p&gt;

&lt;p&gt;And to render the most realistic video games today, such as Call of Duty: Modern Warfare III from 2023, the GPU performs 30–40 trillion calculations in the background.&lt;/p&gt;




&lt;p&gt;Let’s look at the difference between the GPU and the CPU:&lt;/p&gt;

&lt;p&gt;GPU: The GPU has over 10,000 cores; by comparison, a high-end Central Processing Unit (CPU) has only around 24 cores.&lt;/p&gt;

&lt;p&gt;We might think that the GPU is more powerful than the CPU; however, it’s more complicated than that.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Think of the GPU as a cargo ship: the cargo capacity is the number of calculations and the amount of data that can be processed, and the speed of the ship is the rate at which those calculations are carried out.&lt;/p&gt;

&lt;p&gt;It’s a trade-off: a massive number of calculations, executed at a slower rate.&lt;/p&gt;

&lt;p&gt;A giant cargo ship carries only bulk contents and is limited to travelling between ports (from port A to port B). Similarly, GPUs are a lot less flexible than CPUs and can only run simple instructions like basic arithmetic.&lt;/p&gt;

&lt;p&gt;A GPU can’t run an operating system or interface with input devices or networks.&lt;/p&gt;

&lt;p&gt;CPU: Think of the CPU as a jumbo jet: the speed of the airplane is the rate at which calculations and data are processed.&lt;/p&gt;

&lt;p&gt;It performs fewer calculations, but at a much faster rate. The key difference from a cargo ship is that airplanes are a lot more flexible: they can carry passengers, packages, or containers, and they can take off and land at any one of tens of thousands of airports. Likewise, CPUs are flexible in that they can run a wide variety of programs and instructions (multiprocessing).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Although CPUs technically do have a faster clock speed, it’s more accurate to focus on the CPU’s shorter latency. Specifically, CPUs use a more complicated memory hierarchy and branch prediction to reduce this latency.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you have a small amount of data that needs to be evaluated quickly, a CPU will be faster. Furthermore, if you need to run an operating system or support network connections and a wide range of different applications and hardware, you’ll need a CPU.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Rendering Pipeline Stages:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The rendering pipeline is a sequence of steps that transforms 3D data, like models and textures, into 2D images displayed on the screen.&lt;/p&gt;

&lt;p&gt;The specific stages and their implementation may vary depending on the graphics API used, such as OpenGL, Vulkan, or DirectX, but the overall process follows the same basic principles to render visuals efficiently.&lt;/p&gt;

&lt;p&gt;There are 9 stages:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Vertex Specification&lt;/li&gt;
&lt;li&gt;Vertex Shader&lt;/li&gt;
&lt;li&gt;Tessellation&lt;/li&gt;
&lt;li&gt;Geometry Shader&lt;/li&gt;
&lt;li&gt;Vertex Post-Processing&lt;/li&gt;
&lt;li&gt;Primitive Assembly&lt;/li&gt;
&lt;li&gt;Rasterization&lt;/li&gt;
&lt;li&gt;Fragment Shader&lt;/li&gt;
&lt;li&gt;Per Sample Operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: In this article, we have covered the basics of how GPUs work, including the rendering pipeline and its stages. In later articles, we will delve deeper into the concepts of each pipeline stage.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;And finally, let’s discuss the difference between integrated and dedicated graphics cards:&lt;/p&gt;

&lt;p&gt;We all know that a dedicated graphics card is more powerful than integrated graphics: dedicated graphics cards are separate hardware components with their own GPU and memory (VRAM), designed specifically for graphics-intensive workloads.&lt;/p&gt;

&lt;p&gt;Integrated graphics, by contrast, are built into the same chip as the CPU, sharing system resources like memory (RAM) and power. Because integrated GPUs use the system’s main memory for graphics tasks, performance is limited and the memory available for other applications is reduced. Integrated GPUs are cheaper because they don’t require additional hardware, and they consume less power. They are sufficient for basic computing tasks like web browsing, streaming, and light gaming, but they struggle with resource-intensive tasks such as modern AAA gaming or 3D rendering.&lt;/p&gt;

&lt;p&gt;Take the Intel Iris Xe integrated graphics as an example: it has a significantly lower number of Execution Units (EUs) (roughly the equivalent of cores in a dedicated GPU) compared to dedicated GPUs.&lt;/p&gt;

&lt;p&gt;These EUs are responsible for processing graphical data, but their architecture is optimized for lower power consumption and sharing resources with the CPU rather than for raw power.&lt;/p&gt;

&lt;p&gt;The Intel Iris Xe has up to 96 Execution Units (EUs) in higher-end configurations (e.g., 11th Gen Intel processors). The EUs are specialized cores that handle tasks like rendering, shading, and computation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkrxljs3klhb2myvv443.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkrxljs3klhb2myvv443.png" alt="Image description" width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Compared to that, a dedicated GPU (e.g. the NVIDIA RTX 4050) has thousands of CUDA cores designed for parallel processing, handling complex graphical tasks at much higher speeds.&lt;/p&gt;

&lt;p&gt;Let’s see how integrated graphics work: instead of dedicated VRAM, integrated GPUs like the Intel Iris Xe use system RAM.&lt;/p&gt;

&lt;p&gt;For example, if your system has 16GB of RAM, the integrated GPU might allocate 2GB for graphics. This limits the GPU’s performance because system RAM is slower and not optimized for graphics data.&lt;/p&gt;

&lt;p&gt;Integrated GPUs prioritize power saving and lightweight tasks, making them ideal for laptops and ultrabooks.&lt;/p&gt;

&lt;p&gt;For a deeper understanding of the Intel Iris Xe graphics, you can check Intel’s official site, which has all the information on this chip.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/intel-iris-xe-gpu-architecture.html?source=post_page-----bb3bf964edfd--------------------------------" rel="noopener noreferrer"&gt;Intel Iris Xe GPU Architecture&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I hope this article has given you a clear understanding of the topics we have discussed. These basics serve as a stepping stone for diving deeper into the intricate concepts of graphics rendering.&lt;/p&gt;

&lt;p&gt;Finally, I thank everyone reading for spending your valuable time on my article.&lt;/p&gt;

&lt;p&gt;Much of the explanation on GPU architecture and the GPU overview was inspired by the excellent content from Branch Education’s YouTube channel. This has greatly contributed to my understanding of GPU architecture and functionality.&lt;/p&gt;

&lt;p&gt;For a more detailed and in-depth explanation of how GPUs work, I highly recommend checking out their channel&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=h9Z4oGN89MU" rel="noopener noreferrer"&gt;How Graphics cards work?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contact: &lt;a href="//www.linkedin.com/in/vinish-kumar-s"&gt;Vinish Kumar&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>opengl</category>
      <category>gamedev</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
