Nikita Dmitriev


It's Time to Learn about Google TPUs in 2026

Gemini, Veo, and Nano Banana are impressive, but they are just software. Let's talk about the hardware that makes them possible

Prerequisites

A computer only needs two things to function:

  1. Processor (The Brain)
  2. RAM (The Workbench)

When you open your cool AI IDE:

  1. The app data moves from the SSD to RAM
  2. You do something
  3. The Processor fetches instructions from RAM
  4. It executes them and writes the result back to RAM
  5. The Video Card (GPU) reads from RAM to show it on your screen

A computer can work without a GPU (using a terminal), but it cannot work without a CPU & RAM
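To make those steps concrete, here is a toy sketch in Python: "RAM" is just a dictionary, the "processor" fetches instructions from it, executes them, and writes the results back. The instruction format and opcodes are invented purely for illustration.

```python
# A toy "computer": RAM is a dict, the CPU fetches an instruction,
# executes it in its ALU, and writes the result back to RAM (steps 3-4 above).
ram = {
    "instructions": [("ADD", 2, 3), ("MUL", 4, 5)],  # loaded from the "SSD"
    "results": [],
}

def cpu_run(ram):
    for op, a, b in ram["instructions"]:             # fetch from RAM
        result = a + b if op == "ADD" else a * b     # execute in the ALU
        ram["results"].append(result)                # write back to RAM

cpu_run(ram)
print(ram["results"])  # [5, 20]
```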

CPU and RAM

Processing Unit (PU)

An electronic circuit that manipulates data based on instructions. Physically, it is billions of transistors organized into logic gates (AND, OR, NOT)

Key Components:
– ALU (Arithmetic Logic Unit): The calculator. It does the math (addition, multiplication)
– Control Unit: The traffic cop. It tells data where to go
– Registers/Cache: Ultra-fast internal memory (tiny registers plus caches ranging from a few MB to a couple hundred MB) to keep data close to the ALU

Processing Unit inside
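A small, hedged experiment (assuming NumPy is installed) shows why keeping data close to the ALU matters: repeatedly summing an array that fits in cache usually delivers far more bytes per second to the ALU than summing one that has to stream from main RAM. Exact numbers depend entirely on your machine.

```python
import time
import numpy as np

def effective_bandwidth(n_elements, repeats):
    a = np.ones(n_elements, dtype=np.float64)
    a.sum()  # warm-up so the first run doesn't skew the timing
    start = time.perf_counter()
    for _ in range(repeats):
        a.sum()
    elapsed = time.perf_counter() - start
    return a.nbytes * repeats / elapsed / 1e9  # effective GB/s fed to the ALU

# ~800 KB: fits in cache, stays close to the ALU
print(f"cache-resident: {effective_bandwidth(100_000, 2_000):.1f} GB/s")
# ~800 MB allocation: too big for cache, streams from main RAM
print(f"RAM-resident:   {effective_bandwidth(100_000_000, 2):.1f} GB/s")
```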

The Three Types

  1. CPU (Central Processing Unit) — The Generalist

Architecture: Few cores, but very complex and smart
Role: Serial processing. Great for logic, operating systems, and sequential tasks
Motto: "I can do anything, but one thing at a time"

  2. GPU (Graphics Processing Unit) — The Parallelist

Architecture: Thousands of small, simple cores
Role: Parallel processing. Designed for graphics and simple math tasks performed on massive datasets simultaneously
Motto: "I can't run an OS, but I can solve 10,000 easy math problems at once"

  3. ASIC (Application-Specific Integrated Circuit) — The Specialist

Definition: A chip designed for exactly one task. It cannot run Windows or render a video game. It is "hardwired" logic
Role: Maximum efficiency for a specific algorithm
Motto: "I do one thing, but I do it faster and cheaper than anyone else"

CPU vs GPU vs TPU
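A rough way to feel that contrast from Python (assuming NumPy): the element-by-element loop mimics the CPU's one-thing-at-a-time style, while the vectorized line mimics the GPU idea of applying the same simple math to a huge batch at once. NumPy actually runs this on the CPU's SIMD units, so treat it as an analogy, not a GPU benchmark.

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

# "CPU style": one element at a time, lots of control overhead per operation
start = time.perf_counter()
serial = [v * 2.0 + 1.0 for v in x]
print(f"one-at-a-time loop: {time.perf_counter() - start:.3f}s")

# "GPU style": the same simple math applied to the whole array at once
start = time.perf_counter()
parallel = x * 2.0 + 1.0
print(f"whole-array math:   {time.perf_counter() - start:.3f}s")
```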

History

In 2013, Jeff Dean and Jonathan Ross at Google recognized that CPUs and GPUs were structurally inefficient for the coming AI scale. A single metric made the problem clear: three minutes of daily Android Voice Search per user would force Google to double its data center capacity

While GPUs were faster than CPUs, they were still general-purpose devices carrying architectural baggage that made them energy-inefficient for AI

So, they decided to build their own custom silicon (ASIC). 15 months later, the TPU (Tensor Processing Unit) was born

history timeline of TPU

TPU vs GPU

The main bottleneck in AI computing is Memory Access. Moving data is expensive and slow

The Classical Approach (GPU/CPU):

  1. Read Number A from memory into Register
  2. Read Number B from memory into Register
  3. The ALU multiplies A × B
  4. Write result back to memory

The chip spends more time moving data than doing math
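A sketch that makes this visible: a naive matrix multiply where every multiply-accumulate reads its operands from "memory" (plain Python lists here) and writes the partial result straight back. The counters are added purely for illustration, but the ratio is the point: roughly four memory accesses for every single multiplication.

```python
def naive_matmul(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    reads = writes = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a = A[i][k]; reads += 1     # 1. read A from memory
                b = B[k][j]; reads += 1     # 2. read B from memory
                acc = C[i][j]; reads += 1   #    read the running total back
                C[i][j] = acc + a * b       # 3. multiply, 4. write result
                writes += 1
    return C, reads, writes

n = 64
A = [[1.0] * n for _ in range(n)]
B = [[2.0] * n for _ in range(n)]
_, reads, writes = naive_matmul(A, B)
print(reads + writes, "memory accesses for", n ** 3, "multiplications")
```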

The TPU Approach (Systolic Array):

  1. Data loads from memory once
  2. It "flows" into the first row of ALUs
  3. The ALU performs the math and passes the result directly to its neighbor in the next cycle instead of writing intermediate results back to memory
  4. Data moves in a rhythmic wave (like a heart systole) through the entire array

The result is extremely high throughput for matrix multiplication (the core of AI) at drastically lower power consumption. A minimal sketch of this dataflow follows below

CPU vs GPU vs TPU deeply
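And a minimal sketch of the systolic idea, ignoring the cycle-accurate skewing and pipelining of a real TPU: the weights stay parked inside the array, rows of activations flow through it, and partial sums are handed from one ALU to its neighbour instead of round-tripping through memory. Only the inputs enter and the finished outputs leave.

```python
def systolic_matmul(A, B):
    """Weight-stationary dataflow: B is loaded into the array once,
    rows of A flow through, partial sums travel between neighbouring ALUs."""
    n = len(A)
    C = []
    for k in range(n):                   # each row of A enters the array
        psums = [0.0] * n                # partial sums moving down the columns
        for i in range(n):               # ALU row i permanently holds B[i]
            a = A[k][i]                  # activation read from memory once
            for j in range(n):
                psums[j] += a * B[i][j]  # result passed to the ALU below,
                                         # never written back to main memory
        C.append(psums)                  # only the finished row leaves the array
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19.0, 22.0], [43.0, 50.0]]
# Same n^3 multiplications as the naive version, but memory traffic is only
# the initial load of A and B plus the final write of C: about 3*n*n accesses
# instead of roughly 4*n^3.
```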

Impact

If you want to build an AI startup today, you usually pay NVIDIA. You can buy their chips or rent them from almost any provider

Google's model is cloud-based. You can't buy a TPU to put in your own server. Instead, Google keeps them in its own data centers and rents access exclusively through Google Cloud. This lets Google control the entire stack and avoid paying the "NVIDIA tax"

– In 2024, Apple released a technical paper revealing that "Apple Intelligence" was trained on TPUs, bypassing NVIDIA entirely
– Top AI players like Anthropic (Claude), Midjourney, and Character.ai rely heavily on Google TPUs because they offer better performance-per-dollar for massive Transformer models

Future

Google's success with Gemini, Nano Banana, and Veo speaks for itself. The industry has realized that general-purpose hardware is not sustainable for AGI scale, so now everyone is trying to copy Google's homework:

– Microsoft is building Maia
– Amazon is building Trainium
– Meta is building MTIA

However, it's important to understand that the TPU ecosystem does not guarantee a Google monopoly or the downfall of NVIDIA:

– Google is a competitor. For many tech giants, using Google Cloud means helping a rival perfect its own AI models
– NVIDIA is universal and too deeply entrenched in the industry's infrastructure to be replaced easily

Apple

I wrote the previous part of the article right after the release of Gemini 3. Now Apple has partnered with Google for its AI, which confirms my arguments. Apple realized they could not close this gap in time, so they are letting Google handle AI operations while they focus on selling hardware

Many say this is a defeat for Apple. But I would say this is a win-win situation. Apple integrates the current best AI into its devices, and Google generates revenue. The deal is valued at $1 billion, suggesting it may be a temporary solution or a bridge while Apple continues training its own models. Notably, Apple has not ended its partnership with OpenAI, but you never know...

apple and google partnership

watch the stock
