We Built a Data Gravity Calculator for AI Infrastructure Placement — Here's the Methodology

#llm #infrastructure #machinelearning #devops

Most AI infrastructure decisions get made on hourly GPU rates. That's the wrong input variable.

Where your data lives determines what your AI costs. A 50TB dataset sitting in S3 doesn't move to CoreWeave for free — and the cost of moving it can exceed the compute savings before you've run a single training job.

We built the AI Gravity & Placement Engine to make that friction calculable before the architecture is committed.

What It Does

The engine calculates Token TCO (total cost per 1M tokens) for running Llama 3 70B at BF16 precision across six infrastructure options:

AWS (p5.48xlarge — 8x H100)
GCP (A3-High — 8x H100)
CoreWeave HGX (bare-metal InfiniBand)
Lambda H100
Nutanix AHV (H100, 36-mo CapEx amortized)
Cisco UCS M7 (H100, 36-mo CapEx amortized)

All providers are normalized to cost-per-GPU-hour at the 8-GPU BF16 configuration. On-prem providers use 36-month CapEx amortization plus a configurable OpEx Adder (default 20%) for power, cooling, and maintenance.
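As a minimal sketch of that on-prem normalization: the function below amortizes a purchase price over 36 months of GPU-hours and applies the OpEx Adder as a multiplier. The 730-hour month, the multiplicative treatment of OpEx, and the $400,000 CapEx figure are illustrative assumptions, not values taken from the engine.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a calendar month


def onprem_rate_per_gpu_hour(capex_usd, gpus=8, months=36, opex_adder=0.20):
    """Amortize CapEx over the GPU-hours available in `months`,
    then add OpEx (power, cooling, maintenance) as a fraction on top."""
    gpu_hours = gpus * months * HOURS_PER_MONTH
    amortized = capex_usd / gpu_hours
    return amortized * (1 + opex_adder)


# Hypothetical 8x H100 node price, for illustration only.
rate = onprem_rate_per_gpu_hour(capex_usd=400_000)
print(f"${rate:.2f} per GPU-hour")
```

Raising `opex_adder` toward 0.35 scales the rate linearly, which is why the adder is worth exposing as a control rather than hard-coding.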

Why BF16 — Not INT4

At BF16, Llama 3 70B needs roughly 140GB of VRAM for the model weights alone (about 70 billion parameters at 2 bytes each). That exceeds the 80GB on a single H100, so every provider is forced into a multi-GPU configuration, which reveals which platforms have the high-speed interconnects (InfiniBand, NVLink, or equivalent) needed to bridge those GPUs without introducing latency penalties.

An INT4-quantized copy of the same model needs only about 35GB for weights and fits on a single 48GB GPU, which hides those interconnect differences. BF16 tells you what the architecture actually costs at production fidelity, and which providers can handle it without fabric limitations.
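The weight-memory arithmetic behind that comparison can be sketched as below. It counts weights only, using a round 70 billion parameters; KV cache, activations, and framework overhead are deliberately left out, so real deployments need more headroom than this shows.

```python
import math

# Bytes stored per parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion, precision):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(params_billion, precision, gpu_vram_gb):
    """Smallest GPU count whose combined VRAM holds the weights."""
    return math.ceil(weight_memory_gb(params_billion, precision) / gpu_vram_gb)


print(weight_memory_gb(70, "bf16"), min_gpus_for_weights(70, "bf16", 80))
print(weight_memory_gb(70, "int4"), min_gpus_for_weights(70, "int4", 48))
```

At BF16 the weights alone span at least two 80GB GPUs, so the interconnect is always in the critical path. At INT4 it never is.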

The Data Gravity Score

This is the differentiator. The Gravity Score (G) measures egress cost as a fraction of monthly compute cost:

G = (Dataset Size in GB × Egress Rate) ÷ Monthly Compute Cost

G > 0.5: Egress exceeds 50% of compute cost. The data is too heavy to move economically. Verdict: Stay Put or Full Repatriation.
G < 0.1: Data is effectively weightless. Cheapest compute wins. Verdict: Hybrid Burst.
Between 0.1 and 0.5: The architectural decision space — where provider selection actually matters.

At 50TB with AWS egress at $0.09/GB, the Gravity Score against AWS compute lands around 0.196 (19.6%). GCP's higher egress rate ($0.12/GB) pushes its score to 0.342 (34.2%) on the same dataset. CoreWeave's near-zero egress ($0.01/GB) drops its score to 0.014 (1.4%), making the data effectively weightless even though CoreWeave is the most expensive provider per GPU-hour.
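The scores above can be reproduced with a short sketch of the formula. Three inputs are assumptions here because the formula itself does not state them: 50TB is taken as 50,000 GB, monthly compute is taken as 8 GPUs running 730 hours, and the band labels are shorthand for the three ranges listed earlier.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month
GPUS = 8               # the 8-GPU BF16 configuration


def gravity_score(dataset_gb, egress_per_gb, rate_per_gpu_hr):
    """G = egress cost of the dataset / monthly compute cost."""
    monthly_compute = rate_per_gpu_hr * GPUS * HOURS_PER_MONTH
    return dataset_gb * egress_per_gb / monthly_compute


def band(g):
    """Map a score onto the three ranges described above."""
    if g > 0.5:
        return "too heavy to move"
    if g < 0.1:
        return "effectively weightless"
    return "decision space"


# (unit rate $/GPU-hr, egress $/GB) from the provider table
providers = {"AWS": (3.93, 0.09), "GCP": (3.00, 0.12), "CoreWeave": (6.16, 0.01)}

for name, (rate, egress) in providers.items():
    g = gravity_score(50_000, egress, rate)
    print(f"{name}: G = {g:.3f} ({band(g)})")
```

Note that the denominator matters as much as the egress rate: a provider with expensive compute scores lower on the same dataset simply because the egress bill is a smaller share of the total.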

Provider Table (April 2026, Normalized)

Provider	Unit Rate ($/GPU-hr)	Egress/GB	Note
AWS (p5.48xlarge)	$3.93	$0.09	On-demand US-East-1
GCP (A3-High)	$3.00	$0.12	Post-2025 price reduction
CoreWeave HGX	$6.16	$0.01	Bare-metal InfiniBand
Lambda H100	$2.99	$0.00*	*Bandwidth caps apply
Nutanix AHV	$2.15	$0.00	36-mo amort + 20% OpEx
Cisco UCS M7	$2.45	$0.00	36-mo amort + 20% OpEx

The Placement Verdict

The output is not a table. It's a verdict:

Stay Put — data gravity makes migration economically irrational
Hybrid Burst — keep data on-prem, burst compute to cloud for training
Full Repatriation — steady-state 24/7 inference favors CapEx ownership

Each verdict includes reasoning against your specific inputs and an Architect Tip — the Day 2 operational consideration the cost comparison alone doesn't surface.

For example, with a 50TB dataset at a steady-state 100% duty cycle, the verdict is Full Repatriation to Nutanix AHV at $125.56 per 1M tokens versus $274.51 on AWS. The Architect Tip: configure Nutanix Metro Availability on Cisco UCS to match cloud-native SLA expectations without the hyperscaler dependency.
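One reading that reproduces both figures is sketched below: monthly compute plus the egress charge for the dataset, divided by monthly token volume. The 730-hour month, the inclusion of egress in the numerator, and especially the throughput of 100M tokens per month are assumptions chosen because they match the published numbers, not a statement of how the engine computes Token TCO internally.

```python
def token_tco(rate_per_gpu_hr, egress_per_gb, dataset_gb=50_000,
              gpus=8, hours_per_month=730, monthly_mtokens=100):
    """Cost per 1M tokens under the stated assumptions.

    monthly_mtokens is a hypothetical throughput in millions of
    tokens per month; it is not a measured value.
    """
    compute = rate_per_gpu_hr * gpus * hours_per_month
    egress = dataset_gb * egress_per_gb
    return (compute + egress) / monthly_mtokens


aws = token_tco(3.93, 0.09)      # AWS pays egress on the 50TB dataset
nutanix = token_tco(2.15, 0.00)  # on-prem has no egress charge
print(f"AWS ${aws:.2f} vs Nutanix ${nutanix:.2f} per 1M tokens")
```

Under this reading, egress accounts for $45 of the AWS figure, so the gap between the two providers is wider than the unit rates alone would suggest.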

Additional Controls

OpEx Adder — adjustable from 20% to 35% for older facilities or full staff allocation
Sovereign Mode — excludes all public cloud providers, constrains verdict to Nutanix and Cisco only
Duty Cycle — model burst training (20–40%) vs steady-state inference (100%)

Below 70% duty cycle, on-prem CapEx begins losing its cost advantage versus elastic cloud pricing. The engine identifies that crossover dynamically.
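A simplified view of that crossover, assuming on-prem cost is fixed whether or not the GPUs are busy while cloud cost scales with hours actually used. Egress is ignored and the function names are illustrative. Under those assumptions the break-even duty cycle is just the ratio of the two unit rates.

```python
def effective_onprem_rate(rate_at_full_use, duty_cycle):
    """On-prem cost per *utilized* GPU-hour: idle hours are still paid for."""
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    return rate_at_full_use / duty_cycle


def crossover_duty_cycle(onprem_rate, cloud_rate):
    """Duty cycle at which on-prem and cloud cost the same per utilized hour."""
    return onprem_rate / cloud_rate


# Unit rates ($/GPU-hr) from the provider table.
cloud = {"AWS": 3.93, "GCP": 3.00, "CoreWeave": 6.16, "Lambda": 2.99}
nutanix = 2.15

for name, rate in cloud.items():
    d = crossover_duty_cycle(nutanix, rate)
    print(f"Nutanix beats {name} above a {d:.0%} duty cycle")
```

Against the two cheapest clouds in the table (GCP and Lambda) the break-even lands near 72%, in line with the roughly 70% threshold. Against AWS on-demand it falls to about 55%.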