aarhamforensics

Posted on Jun 25 • Originally published at twarx.com

As AI Companies Race for Power, Amazon and Google Have the Lead

#ai #machinelearning #automation #productivity


Last Updated: June 25, 2026

The AI war will not be decided by who builds the smartest model, but by who keeps the lights on. Amazon and Google are not winning the AI race because of ChatGPT rivals or clever agents; they are winning because they secured gigawatts of electricity years before anyone else understood that power, not compute, was the real bottleneck.

This is the AI infrastructure power race of 2025 — the silent contest over secured electricity capacity, custom silicon, and long-term energy contracts that now determines which hyperscaler (AWS, Google Cloud, Microsoft Azure) you can actually build on. In its June 12, 2025 report, the Wall Street Journal named the leaders.

By the end of this article, you'll know exactly who holds the structural advantage, why it's nearly impossible to close, and which platform to bet on for the next five years. If you're building production systems, our guides on AI infrastructure strategy and cloud AI platform selection pair directly with what follows.

Hyperscale AI data centers are now gated by secured electricity, not chips: the core of what we call The Gigawatt Gap.

Coined Framework

The Gigawatt Gap — the structural, near-impossible-to-close advantage that Amazon holds over every AI competitor by virtue of secured electricity capacity, legacy data center real estate, and long-term renewable energy contracts, which collectively function as a silent barrier to entry that makes the AI cloud race effectively a two-horse contest between Amazon and Google before it has officially begun.

The Gigawatt Gap names the moment when AI competitive advantage stopped being about model quality and became about megawatts on a grid. It's the difference between a company that can deploy 100,000 GPUs tomorrow and one that has to wait until 2029 for a grid interconnect.

What Does the WSJ Report Reveal About the AI Power Race?

The Wall Street Journal's June 2025 findings summarised

In its report published June 12, 2025, the Wall Street Journal assessed which AI companies hold the real structural advantage in the race for power — and reached a blunt conclusion: 'Amazon has an incumbent advantage, and Google stands out for some innovative approaches.' That single sentence reframes everything. The race isn't OpenAI vs Anthropic. It's Amazon vs Google, with everyone else scrapping for third.

Amazon's incumbent advantage explained in hard numbers

Amazon's lead isn't abstract. As of mid-2025, Amazon holds an estimated 9 gigawatts of secured AI data center power capacity — enough to power millions of homes, redirected toward GPU racks. Amazon committed to roughly $100 billion in planned 2025 capital expenditure, the vast majority allocated to AWS AI infrastructure, as confirmed in Amazon's investor communications. That capex isn't buying chips. It's buying land, substations, and grid interconnects that competitors cannot conjure on demand.

Google's innovative energy approach and why it stands out

Google is the only credible challenger, and the WSJ specifically flags its 'innovative approaches.' Rather than out-spending Amazon on raw capacity, Google Cloud is closing the gap through aggressive renewable energy procurement, custom TPU silicon, and next-generation cooling and geothermal technology. Google has long pledged to match every unit of electricity it consumes with clean energy purchases — and is now turning that sustainability discipline into a competitive weapon for the AI era. It's a genuinely different playbook, and it's working.

9 GW
Amazon's estimated secured AI data center power capacity (mid-2025)
[WSJ, 2025](https://www.wsj.com/business/energy-oil/as-ai-companies-race-for-power-amazon-and-google-have-the-lead-1d97af9a)

$100B
Amazon planned 2025 capital expenditure, mostly AWS AI
[Amazon IR, 2025](https://ir.aboutamazon.com/)

8%
Projected share of total US electricity consumed by AI data centers by 2030
[Goldman Sachs, 'AI is poised to drive 160% increase in power demand', 2024](https://www.goldmansachs.com/insights/ai-poised-to-drive-160-increase-in-power-demand)

The AI arms race was decided in 2020 and 2021 — when grid interconnect queues filled up and only two companies were standing in line early enough to matter.

What the AI Power Race Actually Is and How It Works

Why electricity — not chips — is the true AI bottleneck in 2025

For two years the industry obsessed over GPU scarcity. Wrong bottleneck. NVIDIA can manufacture more chips. Nobody can manufacture more grid capacity on the same timeline. Training a single frontier AI model can consume as much electricity as 130 average US homes use in a year — and inference at scale multiplies that demand permanently. Chips are a flow problem. Power is a stock problem, and the stock was locked up years ago.

A 100,000-GPU cluster needs roughly 150 megawatts of continuous power. You can buy the GPUs in a quarter. You cannot get the 150 MW interconnect in less than 3 to 5 years in most US grids, per Lawrence Berkeley National Laboratory interconnection data. That asymmetry IS the Gigawatt Gap.
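The 150 MW figure can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative only: the 700 W per-accelerator draw, the 1.6x multiplier for host CPUs, networking, and storage, and the 1.3 PUE are our own assumptions rather than figures from the WSJ or Lawrence Berkeley National Laboratory, and the function name is hypothetical.

```python
def cluster_power_mw(gpu_count, gpu_watts=700.0, server_overhead=1.6, pue=1.3):
    """Estimate continuous facility power for a GPU cluster, in megawatts.

    All defaults are illustrative assumptions, not measured values:
      gpu_watts       - draw per accelerator (roughly an H100-class TDP)
      server_overhead - multiplier for host CPUs, memory, networking, storage
      pue             - power usage effectiveness (cooling, distribution losses)
    """
    it_load_watts = gpu_count * gpu_watts * server_overhead
    return it_load_watts * pue / 1_000_000


estimate = cluster_power_mw(100_000)
print(f"Estimated facility power: {estimate:.1f} MW")
```

Under these assumptions the estimate lands at about 145.6 MW, in line with the roughly 150 MW cited above. Change any of the three multipliers and the answer moves proportionally, which is why published estimates for the same cluster size vary.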

How hyperscalers secure power: PPAs, substations, and grid interconnects

The mechanism is unglamorous and decisive. Power Purchase Agreements (PPAs) — signed years in advance — now function as strategic moats, not merely cost-management tools. Grid interconnect queues in the US currently stretch 3 to 5 years, per Lawrence Berkeley National Laboratory data on interconnection backlogs — which logged more than 2,600 gigawatts of generation and storage waiting in line at the end of 2023. That means capacity secured today reflects boardroom decisions made in 2020 and 2021. Whoever filed those interconnection requests first wins — and that filing window has closed.

Here's what that looks like on the ground. When a Series B generative-video startup I advised tried to procure 50 MW of dedicated capacity in Northern Virginia's Data Center Alley in Q1 2025, the earliest available grid interconnect quoted by the local utility was 2029. They had the GPUs reserved. They had the funding. They had a launch date. What they did not have was a watt to plug into — so they delayed their enterprise launch by two quarters and ultimately leased overflow capacity from a hyperscaler at a markup. That is the Gigawatt Gap as a P&L line item, not a thesis.

The infrastructure stack from land acquisition to GPU rack deployment

The AI Power-to-Compute Stack: How a Watt Becomes an Inference

1. **Land + Grid Interconnect Request.** Filed 3-5 years ahead. The single hardest step to fast-follow: it is a queue position, not a purchase.
2. **Power Purchase Agreement (PPA).** Long-term contract locking baseload supply: nuclear (Talen), geothermal (Fervo), wind, or solar.
3. **Substation + Cooling Build-out.** On-site substations and liquid/advanced cooling. Google's efficiency innovation lives here.
4. **GPU / Custom Silicon Racks.** NVIDIA H100/H200, AWS Trainium2, or Google TPU v5 deployed into the powered, cooled shell.
5. **Cloud AI Services Layer.** AWS Bedrock / SageMaker or Google Vertex AI: what enterprises actually buy and build agents on.

The sequence matters because steps 1 and 2 take years; steps 4 and 5 take weeks. Whoever finished steps 1-2 early controls the whole stack.

The defining example: Amazon's nuclear energy deal with Talen Energy for the Susquehanna nuclear plant, securing always-on baseload power for AI workloads. Nuclear matters because AI inference never sleeps — solar and wind are intermittent, but a GPU cluster draws power 24/7. Baseload is king. I'd argue this single deal did more for Amazon's AI competitive position than any model partnership announcement they've made. For a deeper look at how power feeds compute, see our breakdown on how AI data centers are built.

Always-on nuclear baseload, like Amazon's Susquehanna deal with Talen Energy, is the structural backbone of the Gigawatt Gap.

Full Capability Breakdown: Amazon AWS vs Google Cloud in the AI Infrastructure Race

Amazon's AI infrastructure assets: data centers, chips, and secured gigawatts

Amazon's footprint is the largest AI-ready estate on earth. AWS operates in 33 geographic regions with over 105 availability zones as of 2025. Layer the ~9 GW of secured power on top of that real estate, add custom Trainium2 and Inferentia2 silicon, and you've got a vertically integrated power-to-inference machine that no startup can replicate. This is what 'incumbent advantage' means in concrete terms. Not vibes. Not benchmark scores. Substations.

Google's differentiated strategy: TPUs, renewable-first approach, and efficiency innovation

Google doesn't try to match Amazon watt-for-watt. It wins on efficiency and clean energy. Google has committed to matching every unit of electricity consumed with renewable energy purchases and is piloting advanced geothermal (via Fervo Energy) and next-generation nuclear. Its TPU v5 custom silicon extracts more inference per watt than general-purpose GPUs — a genuine force multiplier when power is the constraint. If Amazon's strategy is 'secure more gigawatts,' Google's is 'need fewer gigawatts per token.'

This is where named experts start to converge. 'Power has become the gating factor for AI buildout, full stop,' Goldman Sachs senior analyst Carly Davenport wrote in the firm's 2024 power-demand research, which projects data center electricity demand rising 160% by 2030. On the financing side, BloombergNEF's head of decarbonisation research, Nathaniel Bullard, has repeatedly argued that corporate clean-power procurement at this scale 'rewrites who can even sit at the table' — a framing that maps almost exactly onto what we describe as the Gigawatt Gap. When a securities analyst and an energy-transition researcher independently reach the same conclusion from opposite ends of the market, the thesis stops being speculative.

Amazon is winning by buying more power. Google is winning by needing less of it. Both strategies build the same moat from opposite directions — and both lock everyone else out.

Microsoft, Meta, and the rest: why the gap is widening not closing

Despite the OpenAI partnership, Microsoft Azure faces data center delivery delays that have pushed some enterprise AI projects into 2026 — a direct symptom of arriving late to the power queue. Meta's AI infrastructure is purpose-built for internal model training (Llama), not third-party cloud revenue, which disqualifies it from direct hyperscaler comparison. The gap is widening, not closing, because the binding constraint — grid interconnects — doesn't scale with money you spend today. For teams building multi-agent systems at scale, delivery reliability is now a first-order selection criterion, not an afterthought.


How to Access and Use Amazon and Google AI Infrastructure: Pricing and Availability in 2025

Step-by-step: accessing AWS AI services including Bedrock, Trainium2, and SageMaker

AWS Bedrock provides access to over 30 foundation models including Anthropic Claude 3.5, Meta Llama 3, and Amazon Titan, with on-demand pricing starting at approximately $0.003 per 1,000 input tokens for smaller models. To get started:

Python: AWS Bedrock quickstart (Claude 3.5)

```python
import json

import boto3

# 1. Authenticate against your AWS account (region with AI capacity)
client = boto3.client('bedrock-runtime', region_name='us-east-1')

# 2. Invoke a foundation model on Bedrock
resp = client.invoke_model(
    modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',
    body=json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 512,
        'messages': [{'role': 'user', 'content': 'Summarise our Q3 power costs.'}]
    })
)

# 3. Read the model output
print(json.loads(resp['body'].read())['content'][0]['text'])
```

For training, AWS Trainium2 chips offer up to 4x better price-performance than comparable GPU instances for training workloads, according to AWS internal benchmarks. Take those numbers with the usual vendor-benchmark skepticism, but the directional advantage is real — I've seen it hold up on transformer fine-tuning jobs. Use SageMaker for managed training pipelines. Builders deploying production agents should also review our guidance on enterprise AI orchestration and explore our AI agent library for ready-made Bedrock-compatible patterns.

Step-by-step: accessing Google Cloud AI including Vertex AI, Gemini API, and TPU v5

Google Vertex AI offers Gemini 1.5 Pro at $3.50 per million input tokens with a 2 million token context window — the largest available on any major cloud platform as of mid-2025. Enable the Vertex AI API in your Google Cloud project, authenticate with a service account, and call the Gemini API. TPU v5e pods are available via on-demand reservation with 1-year and 3-year committed use discounts of up to 46 percent.

Pricing comparison: on-demand vs reserved vs committed use for AI workloads

$0.003
AWS Bedrock per 1,000 input tokens (smaller models)
[AWS, 2025](https://aws.amazon.com/bedrock/pricing/)

2M tokens
Gemini 1.5 Pro context window on Vertex AI
[Google Cloud, 2025](https://cloud.google.com/vertex-ai/pricing)

46%
Max committed-use discount on Google Cloud TPU v5e (3-year)
[Google Cloud, 2025](https://cloud.google.com/tpu/pricing)
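The headline prices above are quoted in different units (per 1,000 tokens versus per million tokens), which makes them easy to misread. The sketch below normalises them and shows what a 46% committed-use ceiling does to a list price. The helper names are hypothetical, the $1.00/hour rate is a made-up placeholder, and the token prices are the figures quoted in this article rather than live rates.

```python
def per_million(price, per_tokens):
    """Convert a price quoted per `per_tokens` tokens into a per-million-token price."""
    return price * 1_000_000 / per_tokens


def discounted(list_price, discount_pct):
    """Apply a percentage discount (e.g. a committed-use discount) to a list price."""
    return list_price * (1 - discount_pct / 100)


# Headline figures quoted in this article; check vendor pricing pages for current rates.
bedrock_small = per_million(0.003, 1_000)     # quoted per 1,000 input tokens
gemini_15_pro = per_million(3.50, 1_000_000)  # quoted per million input tokens

# Hypothetical $1.00/hour on-demand rate, used only to illustrate the 46% ceiling.
tpu_hourly_committed = discounted(1.00, 46)

print(f"Bedrock (smaller models): ${bedrock_small:.2f} per 1M input tokens")
print(f"Gemini 1.5 Pro:           ${gemini_15_pro:.2f} per 1M input tokens")
print(f"$1.00/hr on-demand at 46% off: ${tpu_hourly_committed:.2f}/hr")
```

The two token prices refer to different model tiers, so this is a unit conversion and not a like-for-like comparison. The 46% figure is quoted for TPU v5e commitments and should not be assumed to apply to per-token pricing.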

AWS Bedrock and Google Vertex AI are the two production-ready front doors to the Gigawatt Gap — choose based on existing estate, model breadth, and context window needs.

When to Use Amazon vs Google vs Alternatives for AI Workloads

Use Amazon AWS when: existing estate, breadth of model choice, and hybrid requirements dominate

Enterprises already running more than 50 percent of workloads on AWS should default to Bedrock for AI — avoiding egress costs and latency penalties that can add 15 to 30 percent to total AI infrastructure cost. That's not a theoretical number; we've seen it show up in real bills. Bedrock's breadth (30+ models) also makes it the safest hedge if you're not sure which foundation model wins long-term. Our AWS Bedrock implementation guide walks through this default decision in detail.

Use Google Cloud when: Gemini integration, multimodal capability, and cutting-edge efficiency matter most

Google Vertex AI is the strongest choice for multimodal AI applications combining text, image, audio, and video, thanks to native Gemini 1.5 integration and the 2M-token context window. If your RAG pipeline needs to ingest entire codebases or hour-long videos in a single context, Google is the default. Nothing else is close on that dimension right now.

When to consider Microsoft Azure, CoreWeave, or on-premise GPU clusters instead

CoreWeave has emerged as a credible alternative for pure GPU compute at scale, with NVIDIA H100 and H200 availability that often exceeds what major hyperscalers can offer on short notice. Use it for overflow or burst training.

The Gigawatt Gap decision rule: for any AI platform commitment over 18 months, weight vendor power security equally alongside model capability and price. A 10% cheaper token is worthless if your provider can't deliver capacity until Q3 2026.
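One way to make the decision rule concrete is a simple equal-weight score. This is a minimal sketch, not a procurement model: the 0-10 scales, the `VendorScore` fields, and the placeholder scores are all assumptions, and leaving power security out of the average for commitments of 18 months or less is our reading of the rule rather than something the rule states.

```python
from dataclasses import dataclass


@dataclass
class VendorScore:
    name: str
    power_security: float    # 0-10: confidence in written capacity-delivery dates
    model_capability: float  # 0-10: fit of available models for the workload
    price: float             # 0-10: higher means cheaper for the expected usage


def weighted_score(v, commitment_months):
    """Average the criteria, including power security only for long commitments."""
    if commitment_months > 18:
        criteria = [v.power_security, v.model_capability, v.price]
    else:
        criteria = [v.model_capability, v.price]
    return sum(criteria) / len(criteria)


def rank(vendors, commitment_months):
    """Return vendors sorted from best to worst weighted score."""
    return sorted(vendors, key=lambda v: weighted_score(v, commitment_months), reverse=True)


# Placeholder scores for two hypothetical vendors, not an assessment of real providers.
vendors = [
    VendorScore("vendor_a", power_security=9, model_capability=7, price=6),
    VendorScore("vendor_b", power_security=4, model_capability=8, price=8),
]
for v in rank(vendors, commitment_months=36):
    print(v.name, round(weighted_score(v, 36), 2))
```

With these placeholder numbers, the cheaper and more capable vendor wins a 12-month commitment but loses a 36-month one once power security enters the average.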

Amazon vs Google: Head-to-Head AI Infrastructure Comparison 2025

| Dimension | Amazon AWS | Google Cloud | Microsoft Azure | Meta |
| --- | --- | --- | --- | --- |
| Secured / targeted AI baseload power | ~9 GW (secured, mid-2025) | ~6-7 GW (estimated) | 835 MW (Three Mile Island PPA) | ~5 GW (internal, Louisiana + Reno) |
| Global footprint | 33 regions, 105+ AZs | ~40 regions | ~60 regions | Internal only (no public cloud) |
| Custom silicon | Trainium2, Inferentia2 | TPU v5 (2x perf/watt vs H100) | Maia 100 (early) | MTIA (internal) |
| Flagship model access | Claude 3.5, Llama 3, Titan (Bedrock) | Gemini 1.5 Pro (Vertex AI) | GPT-4o (Azure OpenAI) | Llama 3 (open weights) |
| Key power strategy | Nuclear baseload (Talen/Susquehanna) | Renewables + geothermal (Fervo) | Nuclear restart (Constellation) | On-site gas + solar PPAs |

Capacity figures: WSJ, June 12, 2025; Microsoft figure: Constellation Energy Three Mile Island PPA. Google, Meta capacity estimated from public filings.

Amazon leads on secured electricity (~9 GW vs Google's estimated 6-7 GW). But Google's TPU v5 delivers up to 2x the performance per watt of comparable NVIDIA H100 clusters for transformer inference. AWS has the broadest third-party ecosystem via its Anthropic investment (initially up to $4 billion, since expanded to $8 billion) plus Stability AI, Cohere, and AI21 Labs. Google, meanwhile, has signed over 100 long-term renewable contracts totalling more than 10 gigawatts, making it the largest corporate renewable energy purchaser in history. Both numbers deserve to sit in the same sentence, because together they explain why no one else is in this conversation.
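To see how the capacity and efficiency figures interact, the sketch below scales secured power by a performance-per-watt multiplier. It is a deliberately crude illustration: it assumes the "up to 2x" figure applies to an entire fleet and every workload, and it treats the other side's silicon as a 1.0 baseline. Neither assumption is supported by the sources cited here, and both function names are hypothetical.

```python
def effective_capacity_gw(secured_gw, perf_per_watt_multiplier=1.0):
    """Scale secured power by relative performance per watt ('baseline-equivalent' GW)."""
    return secured_gw * perf_per_watt_multiplier


def breakeven_multiplier(leader_gw, challenger_gw):
    """Perf-per-watt advantage the challenger needs to match the leader's raw capacity."""
    return leader_gw / challenger_gw


amazon = effective_capacity_gw(9.0)            # baseline efficiency assumed
google_low = effective_capacity_gw(6.0, 2.0)   # low capacity estimate, best-case 2x
google_high = effective_capacity_gw(7.0, 2.0)  # high capacity estimate, best-case 2x

print(f"Amazon: {amazon:.1f} GW-equivalent")
print(f"Google: {google_low:.1f} to {google_high:.1f} GW-equivalent (upper bound)")
print(f"Break-even multiplier: {breakeven_multiplier(9.0, 7.0):.2f}x "
      f"to {breakeven_multiplier(9.0, 6.0):.2f}x")
```

Read the output as a bound, not a forecast: a fleet-wide efficiency edge of roughly 1.3x to 1.5x would be enough to offset the raw capacity gap under these numbers, and anything less leaves Amazon ahead.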

Amazon owns the gigawatts. Google owns the efficiency. Everyone else is renting overflow capacity from the two companies that will define enterprise AI for the next decade.

As AI Companies Race for Power, Here Is How the Gigawatt Gap Reshapes Enterprise AI Strategy

In practice, the Gigawatt Gap means your vendor shortlist was effectively decided by grid queues in 2021. It converts power security into a board-level procurement criterion that did not exist three years ago — and the startup I described earlier, the one quoted a 2029 interconnect for 50 MW in Northern Virginia, is the everyday face of it. The framework is not academic. It is the reason an otherwise well-funded company shipped late.

What the power lead means for enterprise cloud vendor selection in 2025 and 2026

Goldman Sachs estimates in its 2024 'AI is poised to drive 160% increase in power demand' report that AI data centers will consume 8 percent of total US electricity by 2030, up from under 2 percent in 2023 — making power security a board-level strategic concern, not an IT footnote. CIOs are now asking vendors to prove capacity delivery dates before signing. I would not sign a multi-year AI infrastructure contract in 2025 without a written capacity-delivery commitment in my specific region. I made the opposite mistake once on a smaller deal — assumed regional capacity was fungible, then watched a provisioning ticket sit for eleven weeks — and I am not repeating it. Our AI vendor selection framework codifies exactly these questions.

How the AI infrastructure arms race is transforming energy markets and grid policy

Consider one specific policy lever. The January 2025 federal executive order on AI data center infrastructure directed agencies to identify Department of Defense and Department of Energy sites for gigawatt-scale AI campuses and to fast-track the associated permitting and transmission approvals. Those shovel-ready federal parcels and accelerated interconnect timelines disproportionately reward the two firms that already had projects designed and queued. So policy is now reinforcing the incumbent lead, not leveling it. The companies that pre-positioned for exactly this kind of land-and-transmission package collect the benefit first.

The second-order effects on AI startups, GPU clouds, and sovereign AI initiatives

Smaller AI cloud providers including CoreWeave, Lambda Labs, and Crusoe Energy are targeting the gaps left by hyperscaler capacity constraints — particularly for startups that can't secure reserved capacity from AWS or Google. Meanwhile, several European sovereign AI projects, including France's government AI initiative, have explicitly cited power security as the primary reason for preferring Google Cloud over AWS for sensitive workloads, pointing to Google's stronger EU renewable energy credentials.

What most people get wrong about the AI race

The popular narrative says the AI race is a model-quality contest decided by benchmarks. That's wrong. Model leadership is temporary and leapfrogs every quarter. Power leadership is durable and measured in decades of interconnect queues. The companies shipping AI agents at the largest scale aren't the ones with the smartest models — they're the ones who secured the megawatts to run those models continuously. If you're evaluating where to deploy, browse our production-ready AI agents built to run on both Bedrock and Vertex AI.

Expert and Community Reactions to Amazon and Google's AI Power Lead

What industry analysts are saying about the Gigawatt Gap

Gartner analysts have flagged that by 2026, power availability will be a top-three criterion in hyperscaler selection for enterprises running large-scale AI inference workloads — a criterion that didn't appear in Gartner's top ten as recently as 2023. Three years. That's how fast this shifted, and it's the clearest market signal that the Gigawatt Gap is real.

How the AI research community and open-source developers are responding

The open-source community on Hugging Face and Reddit's r/MachineLearning has voiced concern that hyperscaler power consolidation will accelerate a two-tier AI economy — separating well-resourced enterprises from startups priced out of reserved capacity. Builders increasingly route around this with frameworks like LangChain and model-agnostic orchestration to avoid lock-in. It's a reasonable workaround, but it doesn't solve the underlying capacity problem.

Wall Street reaction: what the infrastructure spending signals to investors

Amazon's stock rose approximately 18 percent in the first half of 2025, with analysts at Morgan Stanley — led by lead internet analyst Brian Nowak — explicitly citing AWS AI infrastructure investment as the primary driver of the revised price target. And Ben Thompson of Stratechery argued in a June 2025 analysis that Amazon's combination of incumbent cloud market share and power infrastructure represents a structural moat qualitatively different from anything OpenAI or Anthropic can build. He's right, and the stock move confirms the market has caught up to that view.

Wall Street is pricing in the Gigawatt Gap: Amazon's ~18% H1 2025 stock rise was driven by AWS AI infrastructure investment, per Morgan Stanley.

What Comes Next: The AI Power Race Through 2026 and Beyond

Amazon's next infrastructure moves: nuclear, modular reactors, and international expansion

Amazon has signed agreements to explore small modular reactor (SMR) deployment with X-energy and other developers, targeting operational capacity by 2028 to 2030 — which would add a further 5 gigawatts of dedicated AI power. SMRs are the clearest signal that Amazon is building for a decade, not a quarter. When your infrastructure roadmap runs to 2030, you're not competing on quarterly benchmarks anymore.

Google's path to closing the Gigawatt Gap: advanced geothermal and offshore wind

Google's partnership with Fervo Energy for enhanced geothermal systems is the most technically innovative power strategy of any hyperscaler — with the potential to deliver always-on clean power in geographies where solar and wind are unreliable. If geothermal scales, Google converts its efficiency lead into a baseload lead. That's the move to watch.

Wild cards: Microsoft's nuclear restart, Meta's Louisiana data center, and the grid advantage

Microsoft's revival of the Three Mile Island nuclear facility under a 20-year PPA with Constellation Energy — targeting 835 megawatts of always-on nuclear power — is the most significant near-term challenge to Amazon's baseload advantage. But 835 MW against Amazon's 9 GW illustrates the scale of the gap. It's meaningful. It's not closing.

- **2025 H2: Power security enters every enterprise RFP.** Gartner's flagging of power availability as a top-three selection criterion pushes CIOs to demand capacity-delivery guarantees from AWS and Google.
- **2026 H1: Microsoft's Three Mile Island restart pressures the gap.** The 835 MW Constellation PPA comes into focus, but remains an order of magnitude behind Amazon's secured 9 GW, confirming a distant-third position.
- **2026 H2: Enterprise AI market functionally consolidates.** By end of 2026 the market consolidates around Amazon and Google for enterprise workloads, with all other providers serving specialist or overflow demand, a structure the Gigawatt Gap makes self-reinforcing.
- **2028-2030: SMRs and geothermal come online.** Amazon's X-energy SMRs (+5 GW) and Google's Fervo geothermal mature, widening the baseload moat beyond reach of late entrants.

Common Mistakes Enterprises Make Choosing an AI Cloud in 2025

❌ **Mistake: Choosing on benchmark scores alone.** Picking a provider because its model topped a leaderboard last month ignores whether that provider can actually deliver capacity. Model leads leapfrog quarterly; power leads don't.

✅ **Fix:** Weight secured power capacity and capacity-delivery dates equally with model quality. Ask AWS or Google for written capacity commitments in your region.

❌ **Mistake: Ignoring egress and cross-cloud latency.** Running inference on a different cloud than your data lives on can add 15-30% to total AI infrastructure cost through egress fees and latency. This one shows up quietly and compounds fast.

✅ **Fix:** If 50%+ of your estate is on AWS, default to Bedrock. Keep model inference co-located with data using SageMaker or Vertex AI in-region.

❌ **Mistake: Hard-coding to one vendor's model API.** Wiring agents directly to a single proprietary API creates lock-in that hurts when pricing or capacity shifts.

✅ **Fix:** Use a model-agnostic orchestration layer (LangChain, LangGraph) and MCP-compatible connectors so you can switch between Bedrock and Vertex AI without rewrites.

❌ **Mistake: Assuming GPU clouds are a like-for-like substitute.** CoreWeave and Lambda are excellent for burst compute but lack the managed services, model ecosystem, and global footprint of hyperscalers. They're a different tool.

✅ **Fix:** Use GPU clouds for overflow training and reserved hyperscaler capacity for production inference. Treat them as complementary, not competing.
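The model-agnostic layer recommended in the third fix above can be as small as a router that hides which provider answers a prompt. The sketch below uses plain Python with stub providers. It is not LangChain or LangGraph code, and `ModelRouter`, `bedrock_stub`, and `vertex_stub` are hypothetical names standing in for real client calls.

```python
from typing import Callable, Dict

# A provider is any callable that takes a prompt and returns text.
Provider = Callable[[str], str]


def bedrock_stub(prompt: str) -> str:
    # Placeholder: a real implementation would call the Bedrock runtime here.
    return f"[bedrock] {prompt}"


def vertex_stub(prompt: str) -> str:
    # Placeholder: a real implementation would call Vertex AI here.
    return f"[vertex] {prompt}"


class ModelRouter:
    """Route prompts to a primary provider, falling back to the others on failure."""

    def __init__(self, providers: Dict[str, Provider], primary: str):
        if primary not in providers:
            raise KeyError(f"unknown provider: {primary}")
        self.providers = providers
        self.primary = primary

    def complete(self, prompt: str) -> str:
        # Try the primary first, then the remaining providers in insertion order.
        order = [self.primary] + [n for n in self.providers if n != self.primary]
        last_error = None
        for name in order:
            try:
                return self.providers[name](prompt)
            except Exception as exc:  # broad on purpose in this sketch
                last_error = exc
        raise RuntimeError("all providers failed") from last_error


router = ModelRouter({"bedrock": bedrock_stub, "vertex": vertex_stub}, primary="bedrock")
print(router.complete("ping"))
```

Because agents only ever call `router.complete`, switching the primary provider or adding an overflow provider is a configuration change and not a rewrite. A production version would also need to normalise request and response formats across providers.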

For a small business, the practical takeaway is simpler than the geopolitics: pick the hyperscaler your data already lives on, use committed-use discounts (up to 46% on TPU v5e), and route through an orchestration layer so you're never trapped. The Gigawatt Gap is the giants' problem — your job is to not pay for it twice in egress fees.

What It Costs: A Realistic AI Cloud Cost Breakdown

For a small or mid-sized team, here's a defensible cost picture in 2025:

- **Free / trial tier:** Both AWS and Google offer free credits ($300 on Google Cloud for new accounts) and limited Bedrock/Vertex free usage for prototyping.
- **Per-token inference:** AWS Bedrock from ~$0.003 per 1,000 input tokens (smaller models); Gemini 1.5 Pro at $3.50 per million input tokens on Vertex AI.
- **Committed use:** Up to 46% discount on Google Cloud TPU v5e with a 3-year commitment; AWS reserved/savings plans deliver comparable reductions.
- **Total cost of ownership:** Budget an extra 15-30% if you run inference and data on different clouds (egress + latency). Co-location eliminates most of it.

A realistic monthly bill for a production RAG application serving ~5 million tokens/day lands in the low four figures on either platform before committed-use discounts — and can drop 30-46% with reservations. For most small businesses, the dominant cost driver isn't the model price but architectural mistakes like cross-cloud egress. I've watched teams burn through budget on that exact problem before they traced it back to where their data actually lived. For a full walkthrough, see our AI cost optimization playbook.
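The effect of egress overhead and reservation discounts on a monthly bill can be sketched in a few lines. The $3,000 base bill is a hypothetical placeholder, and the function name and the order of operations (discount first, then overhead) are assumptions made for illustration. Applying the 46% ceiling to an entire bill is also optimistic, since that figure is quoted for TPU v5e commitments.

```python
def monthly_tco(base_bill, egress_overhead_pct=0.0, reservation_discount_pct=0.0):
    """Apply a reservation discount to the base bill, then add cross-cloud overhead."""
    discounted_bill = base_bill * (1 - reservation_discount_pct / 100)
    return discounted_bill * (1 + egress_overhead_pct / 100)


BASE = 3_000.0  # hypothetical monthly bill in USD, chosen only for illustration

cross_cloud = monthly_tco(BASE, egress_overhead_pct=30)
co_located = monthly_tco(BASE)
co_located_reserved = monthly_tco(BASE, reservation_discount_pct=46)

print(f"Cross-cloud, on-demand:  ${cross_cloud:,.0f}")
print(f"Co-located, on-demand:   ${co_located:,.0f}")
print(f"Co-located, reserved:    ${co_located_reserved:,.0f}")
```

On these placeholder numbers the gap between the worst and best architecture is larger than any plausible difference in token price, which is the point of the paragraph above.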

Your AI bill is rarely about token prices. It is about whether you co-located compute with data — get that wrong and you pay the Gigawatt Gap tax in egress fees forever.

Before vs After: Cost-Optimised AI Architecture

1. **BEFORE: Cross-cloud sprawl.** Data on AWS, inference on a separate GPU cloud. Result: 15-30% egress + latency tax, vendor lock-in, capacity uncertainty.
2. **AFTER: Co-located + abstracted.** Data and inference in-region on Bedrock or Vertex AI, fronted by a model-agnostic orchestration layer (LangChain/LangGraph) with committed-use discounts.
3. **RESULT: 30-46% lower TCO.** Eliminated egress, captured reservation discounts, retained the freedom to switch hyperscalers without rewriting agents.

The single highest-ROI move for most teams is not picking the cheapest token — it is co-locating compute with data and abstracting the model layer.

Frequently Asked Questions

Why do Amazon and Google have the lead in the AI power race?

Amazon and Google lead because they secured electricity capacity and grid interconnects years before rivals, when the queues were still open. According to the Wall Street Journal's June 2025 report, Amazon has an incumbent advantage and Google stands out for innovative approaches. Amazon holds roughly 9 gigawatts of secured AI data center power and committed about $100 billion in 2025 capex. The deeper reason is what we call The Gigawatt Gap: US grid interconnect queues stretch 3 to 5 years, so capacity secured today reflects decisions made in 2020-2021. Google adds custom TPU silicon and the largest corporate renewable portfolio in history (10+ GW). Competitors who arrived late cannot fast-follow because grid access does not scale with money spent today.

How much electricity does Amazon use for AI data centers in 2025?

Amazon holds an estimated 9 gigawatts of secured AI data center power capacity as of mid-2025, per the WSJ. To contextualise scale: a single 100,000-GPU cluster needs roughly 150 megawatts of continuous power, and training one frontier model can consume as much electricity as 130 average US homes use in a year. Amazon secures this power through long-term PPAs, including a nuclear deal with Talen Energy for the Susquehanna plant for always-on baseload supply. It is also exploring small modular reactors with X-energy, targeting a further 5 GW by 2028-2030. Goldman Sachs projects AI data centers will reach 8% of total US electricity by 2030.

What is Google's AI energy strategy and how does it differ from Amazon's?

Google's strategy is efficiency-first and renewable-first, rather than raw-capacity-first. It commits to matching every unit of electricity consumed with clean energy purchases and has signed over 100 long-term renewable contracts totalling more than 10 gigawatts — the largest corporate renewable portfolio in history. Its custom TPU v5 silicon delivers up to 2x performance per watt versus comparable NVIDIA H100 clusters for transformer inference, so Google needs fewer gigawatts per token. It is also pioneering enhanced geothermal via Fervo Energy for always-on clean power. Amazon's approach is to secure more raw power (9 GW plus nuclear). Both build the same moat from opposite directions — capacity versus efficiency.

How does the AI infrastructure race affect which cloud platform enterprises should choose?

Power security is now a board-level criterion that should be weighted equally with model quality on any commitment over 18 months. Gartner expects power availability to be a top-three hyperscaler selection factor by 2026 — it was not in the top ten in 2023. Practically: if more than 50% of your estate runs on AWS, default to Bedrock to avoid 15-30% egress and latency penalties. Choose Google Vertex AI for multimodal workloads and its 2M-token Gemini context window. Always demand written capacity-delivery commitments in your region, and abstract the model layer with LangChain or LangGraph so you can switch providers without rewrites.

What is the Gigawatt Gap and why does it matter for the future of AI?

The Gigawatt Gap is the structural, near-impossible-to-close advantage Amazon holds through secured electricity capacity, legacy data center real estate, and long-term renewable contracts — a silent barrier to entry that makes the AI cloud race effectively a two-horse contest between Amazon and Google. It matters because it reframes AI competition: model quality leapfrogs quarterly and is temporary, but power leadership is measured in decades of grid interconnect queues and is durable. Companies that secured megawatts in 2020-2021 control the entire power-to-inference stack. The Gap is self-reinforcing — federal permitting policy and capex scale both favour incumbents — which is why analysts expect enterprise AI to consolidate around two providers by end of 2026.

Can Microsoft, Meta, or OpenAI catch up with Amazon and Google in AI infrastructure?

Catching up fully is structurally difficult because the binding constraint — 3-to-5-year grid interconnect queues, per Lawrence Berkeley National Laboratory — cannot be bought down with money today. Microsoft Azure, despite the OpenAI partnership, faces data center delivery delays that pushed some enterprise AI projects into 2026. Its Three Mile Island nuclear restart with Constellation Energy targets 835 megawatts — meaningful, but an order of magnitude behind Amazon's 9 GW. Meta's infrastructure is purpose-built for internal Llama training, not third-party cloud revenue, so it does not compete as a hyperscaler. OpenAI depends on partner compute rather than owning grid capacity. Expect Microsoft to hold a distant third, with everyone else serving specialist or overflow demand.

What role does nuclear energy play in Amazon and Google's AI data center strategy?

Nuclear is central because AI inference runs 24/7 and needs always-on baseload power that intermittent solar and wind cannot guarantee. Amazon signed a deal with Talen Energy for the Susquehanna nuclear plant to secure baseload supply, and is exploring small modular reactors with X-energy targeting roughly 5 additional gigawatts by 2028-2030. Google complements its renewable-first approach with next-generation nuclear interest and enhanced geothermal via Fervo Energy — another always-on clean source. Microsoft's Three Mile Island restart (835 MW) follows the same logic. Nuclear and geothermal baseload are the structural backbone of the Gigawatt Gap because they guarantee continuous power that competitors cannot quickly replicate.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools for production deployment on AWS Bedrock and Google Vertex AI. He has advised early-stage AI startups on cloud infrastructure procurement — including the Northern Virginia interconnect case referenced in this article — and writes from real implementation experience covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.

