DEV Community

Manish Aradwad

Learn HPC with me: CPU vs GPU

20 Nov 2024

I haven’t been able to spend much time learning HPC over the last few (okay, a lot of) days. The main reason is my laziness, but I am also in the middle of preparing for and giving interviews. Anyway, I finally convinced myself to pick it up again, and I restarted the 4th chapter from the beginning, since I had stopped a few pages into it.

The first sentence of the chapter:

In Chapter 1, Introduction, we saw that CPUs are designed to minimize the latency of instruction execution and that GPUs are designed to maximize the throughput of executing instructions.

made me think deeper about the difference between CPU and GPU.

The general consensus is that GPUs are better than CPUs, but this is true only for speed of execution, and only for problems whose solution can be written as a parallel program. That made me wonder about the reverse direction: can any parallel program, in general, also be written as a sequential program?

When I asked Claude:

Consider a problem statement whose solution can be implemented as a parallel solution. For example, conversion of an image from RGB to grayscale can be one such problem.
My question is “In general, can any parallel solution be written as a sequential solution?” In other words, “Are there problems whose solution is strictly parallel in nature and cannot be solved using sequential instructions?”
Please let me know the correct answer with the proper logic.

it gave me the ultimate answer as:

Church-Turing Thesis states that any computational problem solvable by a parallel algorithm is also solvable by a sequential algorithm. No computational problem exists whose solution is strictly parallel and cannot be sequentially implemented. The differences lie in efficiency, not fundamental solvability.

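which was convincing enough for me. (Strictly speaking, the Church-Turing thesis is a broader claim about what is computable at all; the parallel-versus-sequential part follows because a sequential machine can simulate a parallel one step by step, only more slowly.) Coming back to the original question, “Are CPUs worse than GPUs?”, the answer is obviously no, because there are lots of things CPUs do better than GPUs. Both CPUs and GPUs are equally important in a good gaming rig.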
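
To make the "parallel can always be rewritten as sequential" idea concrete, here is a minimal Python sketch using the RGB-to-grayscale example from the prompt above. The luminance weights and the function names (`to_gray`, `grayscale_sequential`, `grayscale_parallel`) are my own assumptions for illustration, and a thread pool only stands in for real parallel hardware. The point is just that each pixel is independent, so both versions produce the same result.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed luminance weights; the post does not specify a formula.
R_W, G_W, B_W = 0.21, 0.72, 0.07


def to_gray(pixel):
    """Convert one (r, g, b) pixel to a single grayscale value."""
    r, g, b = pixel
    return R_W * r + G_W * g + B_W * b


def grayscale_sequential(pixels):
    """Visit the pixels one after another, as a single thread would."""
    return [to_gray(p) for p in pixels]


def grayscale_parallel(pixels, workers=4):
    """Hand pixels to a pool of workers; no pixel depends on another.

    pool.map returns results in input order, so the output lines up
    with the sequential version.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(to_gray, pixels))


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (10, 20, 30)]
same = grayscale_sequential(pixels) == grayscale_parallel(pixels)
```

Here `same` ends up `True`: the parallel version changes only how the work is scheduled, not what is computed.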

CPUs are designed to minimize the latency of a single instruction, while GPUs are designed to maximize the throughput of instruction execution. Here is how these two goals shape the design philosophy of each type of processor:

  1. Control Unit Design
    — CPU: Complex control unit with sophisticated branch prediction and speculation
    — GPU: Simple control units replicated many times, focusing on parallel execution

  2. Cache and Memory
    — CPU: Large caches to reduce memory latency for individual operations
    — GPU: Smaller caches but higher memory bandwidth for parallel data access

  3. Execution Units
    — CPU: Few but complex ALUs optimized for diverse operations
    — GPU: Many simple ALUs designed for parallel floating-point operations

  4. Pipeline Design
    — CPU: Deep pipelines with out-of-order execution to minimize stalls
    — GPU: Simpler pipelines with in-order execution, compensating with thread-level parallelism

  5. Thread Management
    — CPU: Optimized for few high-performance threads
    — GPU: Massive thread parallelism with hardware thread scheduler

  6. Instruction Handling
    — CPU: Complex instruction decoder, branch prediction, speculative execution
    — GPU: SIMD (Single Instruction Multiple Data) architecture for parallel execution
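
The SIMD point in the list above can be sketched with a toy model in Python. This is not how real hardware is programmed; `scalar_run`, `simd_run`, and the `width` parameter are hypothetical names I made up, and the model counts only the arithmetic instructions issued, ignoring comparisons and memory access. It assumes a SIMD unit issues one instruction for a whole group of lanes and, when lanes disagree on a branch, issues both paths under a mask.

```python
def scalar_run(data):
    """CPU-style: each element follows only its own branch."""
    out, issued = [], 0
    for x in data:
        if x % 2 == 0:
            out.append(x * 2)
        else:
            out.append(x + 1)
        issued += 1  # one arithmetic instruction per element
    return out, issued


def simd_run(data, width=4):
    """SIMD-style: one instruction is issued per group of `width` lanes."""
    out, issued = [], 0
    for start in range(0, len(data), width):
        lanes = data[start:start + width]
        mask = [x % 2 == 0 for x in lanes]  # which lanes take the "then" path
        result = list(lanes)
        if any(mask):
            # "then" path: only lanes with mask == True are updated
            result = [x * 2 if m else r
                      for x, m, r in zip(lanes, mask, result)]
            issued += 1
        if not all(mask):
            # "else" path: only lanes with mask == False are updated
            result = [x + 1 if not m else r
                      for x, m, r in zip(lanes, mask, result)]
            issued += 1
        out.extend(result)
    return out, issued
```

With eight even numbers and `width=4`, the scalar loop issues 8 instructions while the SIMD model issues 2. With mixed odd and even numbers, every group diverges and the SIMD count doubles to 4, which is one reason branch-heavy code suits CPUs better.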

After understanding this, I started wondering whether it’s possible to build something that improves the performance of the whole system. By “system” I mean the full stack of computation:

We start from silicon to make chips, which are combined with the von Neumann architecture to create processors, on which the solution to a problem statement is run as a program written using various programming paradigms.

I want to know how the different AI-hardware startups and companies such as Cerebras, Groq, Apple, Intel, and Graphcore are making changes at different stages of this system to make things faster. Even a programming language like Mojo targets another stage of the system.

Hope I find enough time in the future to understand how these things work, but for now, I think this much wandering is enough.

