Ben Lyon

GPU Compute - Parallelism in Action

What is GPU computing?

Today, we want speed. We want things done quickly, and we want them done well. In terms of computing, you're probably thinking that the latest AMD or Intel CPU is just the ticket to that fast lane. Not so fast there (ha, get it?) - while a powerful, multi-core CPU is certainly part of a faster experience, it's only part of the answer.

A major component of the speed benefits we enjoy today comes in the form of parallelism. Parallelism is the method by which tasks are assigned to several processor cores, with each task broken into several similar sub-tasks that can be processed independently. The final result is assembled once each core has done its work. The work can be divided in multiple ways, depending on the data being processed.

GPU vs CPU architecture

Parallel Compute Methods

There are four types of parallel compute processes.

Bit-level parallelism is where the processor's fixed word size determines how much data a single instruction can handle. Take, for example, the 16-bit Intel 8086. For a 32-bit chunk of data to be processed, the CPU must operate over the first 16 bits, then operate over the second 16 bits, and then combine and offload the result to where it needs to go. That's three operations for a single piece of data. This level of parallelism can be increased either by pairing two 8086s or by moving to a 32-bit processor, which can operate over the entire piece of data in a single go.
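To make the 8086 example concrete, here's a minimal Python sketch (the function name and structure are my own, not from any real instruction set) of adding two 32-bit values using only 16-bit-wide operations, propagating the carry between the two halves:

```python
def add32_with_16bit_ops(a, b):
    """Add two 32-bit values using only 16-bit operations,
    the way a 16-bit CPU like the 8086 would have to."""
    lo = (a & 0xFFFF) + (b & 0xFFFF)       # first pass: low 16 bits
    carry = lo >> 16                       # overflow from the low half
    hi = (a >> 16) + (b >> 16) + carry     # second pass: high 16 bits, plus carry
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)  # recombine the halves
```

A 32-bit processor collapses all of this into a single add instruction.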

Instruction-level parallelism is the simultaneous execution of independent instructions within a sequence. For example, consider the following instructions.

1. e = a + b
2. f = c + d
3. m = e * f

Instruction 3 depends on the results of instructions 1 and 2, but 1 and 2 are completely independent of each other and can be run simultaneously. If each instruction takes one unit of time, we can run 1 and 2 together in one unit of time and instruction 3 in another, giving us an instruction-level parallelism of 3 instructions per 2 cycles.
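A CPU's scheduler exploits this automatically in hardware, but the same idea can be sketched in Python (a hypothetical illustration of mine, not how CPUs actually dispatch) by submitting the two independent instructions to separate workers:

```python
from concurrent.futures import ThreadPoolExecutor

def compute_m(a, b, c, d):
    with ThreadPoolExecutor(max_workers=2) as pool:
        e_future = pool.submit(lambda: a + b)  # instruction 1
        f_future = pool.submit(lambda: c + d)  # instruction 2: independent of 1
        e = e_future.result()
        f = f_future.result()
    return e * f                               # instruction 3: needs both results
```

Instructions 1 and 2 are in flight at the same time; instruction 3 only runs once both futures have resolved.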

Data-level parallelism is the use of multiple processors across a single set of data. If we have an array of n elements, the time complexity for a single processor core to run over it is O(n) (linear); if the data is divided evenly across p cores, the work per core drops toward O(n/p). Still linear, but less bad.
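As a sketch (the names and chunking scheme are mine), here's an array sum where each worker runs the same operation over its own even share of the data. Note that in CPython, threads won't actually speed up pure-Python arithmetic because of the GIL; a process pool (or a GPU) is what buys the real O(n/p) behaviour - the snippet only illustrates how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    # Split the data into one roughly-equal chunk per worker.
    chunk = max(1, len(data) // workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Every worker applies the *same* task (sum) to a *different* slice.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))
```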

Finally, task-level parallelism is where code is run across multiple processors, but unlike data-level parallelism, it distributes different tasks across those processors over the same set of data. (Data-level parallelism, by contrast, applies the same task to different portions of the data.) This level is closely related to pipelining, in which a single stream of data runs through a series of separate tasks, each of which can execute independently of the others.
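To contrast with the data-level sketch above, here's a hypothetical illustration of task-level parallelism: two different tasks - finding the minimum and the maximum - run concurrently over the same data.

```python
from concurrent.futures import ThreadPoolExecutor

def min_and_max(data):
    # Two *different* tasks run at once over the *same* data.
    with ThreadPoolExecutor(max_workers=2) as pool:
        lo = pool.submit(min, data)
        hi = pool.submit(max, data)
        return lo.result(), hi.result()
```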

Necessity of Speed

So let's round this back to GPUs. Why would I want a GPU to manage my tasks? Well, what if you could offload a good deal of a task's processing to another processor - one with, say, orders of magnitude more cores than your current CPU?

Screenshot of a Blender render - a program that benefits greatly from GPU rendering.

This is where GPU compute processing comes in. I'll say this now to get it out of the way: your GPU is not going to replace your CPU (for now). When your CPU takes in a program, it passes along the data that can be better handled by the numerous core clusters of the GPU. For example, video games have two main components when it comes to what the user sees and interacts with: the display of the visual environment of the game world, and the user's interaction with that world. To process a game, the CPU assigns the task of image creation to the GPU while it continues working on the scripting/AI elements. GPUs are very good at crunching blocks of data in fast sequence, especially when that data all serves a single purpose. This advantage helps in multiple fields - notably in games, but also in professional 3D rendering applications, large-scale database management, scientific calculation, and medical imaging.

Sources:
Exploring GPU Architecture
Fermi Microarchitecture
General Purpose Graphic Processing Units
