CPUs, GPUs, TPUs, DPUs, why?

#cpu #gpu #deeplearning #computerscience

For me (I mean at that time), things were as simple as this: There would be only two types of Processing Units: GPU and CPU. CPUs are the standard and GPUs are exclusively used for everything graphics related (like video games).

But then I slid into the Machine learning field and started hearing my pairs talking about TPU and sometimes about a mysterious DPU. Sometimes they are even struggling deciding which one to use to train their Machine learning models.

Which ones are those two again? And how are they different from the other ones.
I had to find answers to those questions. So I did some googling on the matter and found out MY WHOLE LIFE WAS A LIE.

CPUs, GPUs, TPUs, DPUs, none of those is technically better than the other. But simply each of them is built or optimized for a particular purpose which it excels at, compared to the rest.

So what is the purpose of having a CPU?

First you have to understand that Processing Units have what we call cores and each of those cores can perform only one task at the same time. So the more a PU has cores, the more tasks it is able to perform in parallel (i.e at the same time).

From far, CPUs have always been the best suited for the most logic heavy computer operations you can think of. That's because the high performance cores of a CPU are optimized for sequential (step by step) processing and low-latency branching.
So CPUs are the best option when it comes to complex operations like Database Query Execution, Operating System Management, Compiling Code, Encryption and Decryption, Data Serialization/Deserialization, File System Operations, Real-Time System Operations, Error Detection and Correction, Simulation of Rule-Based Systems, etc., or any task requiring extensive if-else or switch-case branching logic, which must be executed step-by-step.

The annoying thing about sequential processing though is that it can take time, so much time… especially for massive tasks like graphics rendering.
Imagine having to rely on a few CPU’s cores for a high graphics video game like Cyberpunk 2077. It would lag so much that you’d never want to play a video game in your life, lol.
Around millions of CPUs cores would be needed to render the graphics of such a game in real time avoiding any lagging and achieving an ideal User Experience.

“Why don't we add more cores to the CPU then?” You may ask.
Well, like we (implicitly) stated earlier, the cores of a CPU are very large and resource-intensive having quite a complex architecture and making the amount that can fit on a chip very limited (typically 4–32). Scaling such cores would increase power consumption and heat generation, leading to diminishing returns.

But there is good news! Many computer operations like graphics rendering involve tasks that are highly repetitive and can be broken down into many smaller, independent operations that do not require a lot of branching or conditional logic.

If we cannot scale the number of the large cores, How about building tinier cores of simpler architecture so that we could gather thousands of them to complete repetitive tasks in parallel? That would solve many problems like the graphics rendering one. Yes, and we would call that the Graphic Processing Unit.

The need of a Graphic Processing Unit

Enter the Graphics Processing Unit (GPU), a chip built for parallel computing. While CPUs may have up to 24 cores, modern GPUs like NVIDIA's RTX 4080 boast nearly 10,000 cores. Each core can perform a computation simultaneously.
And while they were originally built to solve the graphics rendering problem (calculating lights, shadows, and textures in real-time), the massive amounts of cores of GPUs and their parallelism make them an option for other tasks like
AI and deep learning (performing massive matrix multiplications for training models)

Then Why the TPU?

The Tensor Processing Unit (TPU) is a specialized chip designed for deep learning. Created by Google in 2016, TPUs focus on tensor operations (like matrix multiplication) required for training AI models. Unlike GPUs, TPUs eliminate the need to access shared memory or registers, making them far more efficient for neural network training.

If you’re working with massive datasets or training large models, TPUs can save you significant time and resources.

DPUs: The Data Center Workhorse

The Data Processing Unit (DPU) is the newest member of the computing family, optimized for data-intensive tasks in cloud environments. These chips handle:

Networking: Packet processing, routing, and security.
Data storage: Compression and encryption.

DPUs offload these tasks from CPUs, allowing the CPU to focus on general-purpose computing. While you won’t find a DPU in your laptop, they are essential for modern data centers where efficiency is paramount.

That's it!

If you didn't, now you know the difference between those proccessing units.CPUs remain the backbone of general-purpose computing, excelling in logic-heavy, sequential tasks. GPUs, with their massively parallel architecture, dominate in rendering, machine learning, and any workload requiring high-throughput computations. TPUs are purpose-built for deep learning, offering unmatched efficiency in training and inference for neural networks. DPUs, on the other hand, cater to data-centric operations, offloading tasks like networking and storage from CPUs to boost efficiency in large-scale data centers.

Next article,

on this subject will be for machine learning engineers, where we will discuss how to choose the right processing unit for your AI workload. We'll explore the roles of GPUs, TPUs, and CPUs in training and inference, compare their performance, cost, and scalability, and dive into practical tips for optimizing your machine learning pipelines for maximum efficiency.
Keep in touch!