Nikita Dmitriev


It's Time to Learn about Google TPUs in 2026

Gemini, Veo, and Nano Banana are impressive, but they are just software. Let's talk about the hardware that makes them possible

Prerequisites

A computer only needs two things to function:

  1. Processor (The Brain)
  2. RAM (The Workbench)

When you open your cool AI IDE:

  1. The app data moves from the SSD to RAM
  2. You do something
  3. The Processor fetches instructions from RAM
  4. It executes them and writes the result back to RAM
  5. The Video Card (GPU) reads from RAM to show it on your screen

A computer can work without a GPU (using a terminal), but it cannot work without a CPU & RAM
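To make those steps concrete, here is a toy sketch in Python: "RAM" is just a dictionary, the "processor" fetches instructions from it, executes them, and writes the results back. The instruction format and opcodes are invented purely for illustration.

```python
# A toy "computer": RAM is a dict, the CPU fetches an instruction,
# executes it in its ALU, and writes the result back to RAM (steps 3-4 above).
ram = {
    "instructions": [("ADD", 2, 3), ("MUL", 4, 5)],  # loaded from the "SSD"
    "results": [],
}

def cpu_run(ram):
    for op, a, b in ram["instructions"]:             # fetch from RAM
        result = a + b if op == "ADD" else a * b     # execute in the ALU
        ram["results"].append(result)                # write back to RAM

cpu_run(ram)
print(ram["results"])  # [5, 20]
```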

CPU and RAM

Processing Unit (PU)

An electronic circuit that manipulates data based on instructions. Physically, it is billions of transistors organized into logic gates (AND, OR, NOT)

Key Components:
– ALU (Arithmetic Logic Unit): The calculator. It does the math (addition, multiplication)
– Control Unit: The traffic cop. It tells data where to go
– Registers/Cache: Ultra-fast internal memory (tiny registers plus caches ranging from a few MB to a couple hundred MB) to keep data close to the ALU

Processing Unit inside
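A small, hedged experiment (assuming NumPy is installed) shows why keeping data close to the ALU matters: repeatedly summing an array that fits in cache usually delivers far more bytes per second to the ALU than summing one that has to stream from main RAM. Exact numbers depend entirely on your machine.

```python
import time
import numpy as np

def effective_bandwidth(n_elements, repeats):
    a = np.ones(n_elements, dtype=np.float64)
    a.sum()  # warm-up so the first run doesn't skew the timing
    start = time.perf_counter()
    for _ in range(repeats):
        a.sum()
    elapsed = time.perf_counter() - start
    return a.nbytes * repeats / elapsed / 1e9  # effective GB/s fed to the ALU

# ~800 KB: fits in cache, stays close to the ALU
print(f"cache-resident: {effective_bandwidth(100_000, 2_000):.1f} GB/s")
# ~800 MB allocation: too big for cache, streams from main RAM
print(f"RAM-resident:   {effective_bandwidth(100_000_000, 2):.1f} GB/s")
```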

The Three Types

  1. CPU (Central Processing Unit) — The Generalist

Architecture: Few cores, but very complex and smart
Role: Serial processing. Great for logic, operating systems, and sequential tasks
Motto: "I can do anything, but one thing at a time"

  2. GPU (Graphics Processing Unit) — The Parallelist

Architecture: Thousands of small, simple cores
Role: Parallel processing. Designed for graphics and simple math tasks performed on massive datasets simultaneously
Motto: "I can't run an OS, but I can solve 10,000 easy math problems at once"

  3. ASIC (Application-Specific Integrated Circuit) — The Specialist

Definition: A chip designed for exactly one task. It cannot run Windows or render a video game. It is "hardwired" logic
Role: Maximum efficiency for a specific algorithm
Motto: "I do one thing, but I do it faster and cheaper than anyone else"

CPU vs GPU vs TPU
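A rough way to feel that contrast from Python (assuming NumPy): the element-by-element loop mimics the CPU's one-thing-at-a-time style, while the vectorized line mimics the GPU idea of applying the same simple math to a huge batch at once. NumPy actually runs this on the CPU's SIMD units, so treat it as an analogy, not a GPU benchmark.

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

# "CPU style": one element at a time, lots of control overhead per operation
start = time.perf_counter()
serial = [v * 2.0 + 1.0 for v in x]
print(f"one-at-a-time loop: {time.perf_counter() - start:.3f}s")

# "GPU style": the same simple math applied to the whole array at once
start = time.perf_counter()
parallel = x * 2.0 + 1.0
print(f"whole-array math:   {time.perf_counter() - start:.3f}s")
```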

History

In 2013, Jeff Dean and Jonathan Ross at Google recognized that CPUs and GPUs were structurally inefficient for the coming AI scale. A single metric made the problem clear: three minutes of daily Android Voice Search per user would force Google to double its data center capacity

While GPUs were faster than CPUs, they were still general-purpose devices carrying architectural baggage that made them energy-inefficient for AI

So, they decided to build their own custom silicon (ASIC). 15 months later, the TPU (Tensor Processing Unit) was born

history timeline of TPU

TPU vs GPU

The main bottleneck in AI computing is Memory Access. Moving data is expensive and slow

The Classical Approach (GPU/CPU):

  1. Read Number A from memory into Register
  2. Read Number B from memory into Register
  3. The ALU multiplies A × B
  4. Write result back to memory

The chip spends more time moving data than doing math
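A sketch that makes this visible: a naive matrix multiply where every multiply-accumulate reads its operands from "memory" (plain Python lists here) and writes the partial result straight back. The counters are added purely for illustration, but the ratio is the point: roughly four memory accesses for every single multiplication.

```python
def naive_matmul(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    reads = writes = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a = A[i][k]; reads += 1     # 1. read A from memory
                b = B[k][j]; reads += 1     # 2. read B from memory
                acc = C[i][j]; reads += 1   #    read the running total back
                C[i][j] = acc + a * b       # 3. multiply, 4. write result
                writes += 1
    return C, reads, writes

n = 64
A = [[1.0] * n for _ in range(n)]
B = [[2.0] * n for _ in range(n)]
_, reads, writes = naive_matmul(A, B)
print(reads + writes, "memory accesses for", n ** 3, "multiplications")
```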

The TPU Approach (Systolic Array):

  1. Data loads from memory once
  2. It "flows" into the first row of ALUs
  3. The ALU performs the math and passes the result directly to its neighbor in the next cycle instead of writing intermediate results back to memory
  4. Data moves in a rhythmic wave (like a heart systole) through the entire array

The result is extremely high throughput for matrix multiplication (the core of AI) at drastically lower power consumption. A minimal sketch of this dataflow follows below

CPU vs GPU vs TPU deeply
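And a minimal sketch of the systolic idea, ignoring the cycle-accurate skewing and pipelining of a real TPU: the weights stay parked inside the array, rows of activations flow through it, and partial sums are handed from one ALU to its neighbour instead of round-tripping through memory. Only the inputs enter and the finished outputs leave.

```python
def systolic_matmul(A, B):
    """Weight-stationary dataflow: B is loaded into the array once,
    rows of A flow through, partial sums travel between neighbouring ALUs."""
    n = len(A)
    C = []
    for k in range(n):                   # each row of A enters the array
        psums = [0.0] * n                # partial sums moving down the columns
        for i in range(n):               # ALU row i permanently holds B[i]
            a = A[k][i]                  # activation read from memory once
            for j in range(n):
                psums[j] += a * B[i][j]  # result passed to the ALU below,
                                         # never written back to main memory
        C.append(psums)                  # only the finished row leaves the array
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
# Same n^3 multiplications as the naive version, but memory traffic is only
# the initial load of A and B plus the final write of C: about 3*n*n accesses
# instead of roughly 4*n^3.
```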

Impact

If you want to build an AI startup today, you usually pay NVIDIA. You can buy their chips or rent them from almost any provider

Google's model is cloud-based. You can't buy a TPU to put in your own server. Instead, Google keeps them in its own data centers and rents access exclusively through Google Cloud. This lets Google control the entire stack and avoid paying the "NVIDIA tax"

– In 2024, Apple released a technical paper revealing that "Apple Intelligence" was trained on TPUs, bypassing NVIDIA entirely
– Top AI players like Anthropic (Claude), Midjourney, and Character.ai rely heavily on Google TPUs because they offer better performance-per-dollar for massive Transformer models

Future

Google's success with Gemini, Nano Banana, and Veo speaks for itself. The industry has realized that general-purpose hardware is not sustainable for AGI scale, so now everyone is trying to copy Google's homework:

– Microsoft is building Maia
– Amazon is building Trainium
– Meta is building MTIA

However, it's important to understand that the TPU ecosystem does not guarantee a Google monopoly or the downfall of NVIDIA:

– Google is a competitor. For many tech giants, using Google Cloud means helping a rival perfect its own AI models
– NVIDIA is universal and too deeply entrenched in the industry's infrastructure to be replaced easily

Apple

I wrote the previous part of the article right after the release of Gemini 3. Now Apple has partnered with Google for its AI, which confirms my arguments. Apple realized they could not close this gap in time, so they are letting Google handle AI operations while they focus on selling hardware

Many say this is a defeat for Apple. But I would say this is a win-win situation. Apple integrates the current best AI into its devices, and Google generates revenue. The deal is valued at $1 billion, suggesting it may be a temporary solution or a bridge while Apple continues training its own models. Notably, Apple has not ended its partnership with OpenAI, but you never know...

apple and google partnership

watch the stock
