
David Aronchick

Originally published at distributedthoughts.org

Multi-Everything: Why Your Data Strategy Is Harder Than Your Cloud Strategy

An Uber engineer gave a great talk at Kubecon I have wanted to write about: “...we end up having to think about use cases that can either reside entirely within one cloud provider so that I can put training and serving together, or I need to think about the use cases where it makes sense to actually pull the data from one provider to another, in order to facilitate being able to leverage that compute. It doesn’t make it quite as seamless as it could be, and you have to be purposeful in how you think about what workloads you’re going to be converging together.”

Two sentences, and they explain why multicloud is complicated. Multicloud works! But it's not just "spread stuff everywhere and balance."

And when you're thinking about it for yourself, put this in context. Uber has dedicated platform engineering teams, specialized GPU infrastructure groups, and the budget to build custom observability solutions. They still struggle with this. If you're not dedicating just as many resources to these challenges, the reality of the complexity is going to hit you even harder.
The Conversation We've Been Having
For the past decade, the infrastructure industry focused relentlessly on making compute portable. Kubernetes runs anywhere. Containers abstract away the underlying machine. We've built elaborate systems to ensure that a workload running in AWS can, in theory, run identically in GCP or Azure.

And it worked! The compute layer genuinely is more portable than it was in 2015.

But what Uber was talking about here reveals an immutable truth. Compute can move PRETTY READILY between clouds because the commodity layer is portable. But when it comes to running in multiple locations, it's not the containers you need to worry about; it's the data.
Why Data Gravity Wins
Data gravity, the phenomenon where data accumulates mass and attracts applications toward it, isn't a new concept. Dave McCrory coined the term back in 2010, and there have been many other great pieces covering it. But the implications have become dramatically more severe as AI workloads have grown.

Uber's engineering teams maintain a data lake on one cloud provider. They run inference workloads on a different provider. Training happens somewhere else entirely. Each choice was rational in isolation; optimize for the best GPU availability here, the best storage economics there.

The result? "You have to be purposeful in how you think about what workloads you're going to be converging together."

That's a diplomatic way of describing a constraint that dominates every architectural decision. When considering whether a use case can leverage GPUs efficiently, the first question isn't "do we have the compute?" It's "where does the data live, and what does it cost to move it?"

This isn't a Kubernetes problem. It's not even really a cloud provider problem. It's physics meeting economics.

Moving a petabyte of training data from one cloud to another isn't JUST a technical challenge, it's a business calculation. Egress fees alone can run into six figures. AWS charges $0.09 per GB for the first 10 TB transferred to the internet. Do the math on a 50 TB training dataset and you're looking at roughly $4,500 just in network transfer, before you've stored anything, processed anything, or extracted a single insight.
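
To make that back-of-envelope math concrete, here's a minimal sketch of the calculation in Python. The tier boundaries and per-GB rates below mirror AWS's published internet egress pricing at the time of writing, but treat them as illustrative placeholders and check current pricing for your region (the tiered rates land a little under the flat $0.09/GB figure).

```python
# Rough egress-cost estimator. Rates are illustrative (AWS's published
# data-transfer-out-to-internet tiers as of this writing); verify current
# pricing for your region before trusting any of these numbers.
TIERS = [                   # (tier size in GB, price per GB)
    (10_000, 0.090),        # first 10 TB
    (40_000, 0.085),        # next 40 TB
    (100_000, 0.070),       # next 100 TB
    (float("inf"), 0.050),  # everything beyond 150 TB
]

def egress_cost(gb: float) -> float:
    """Estimate the network cost of moving `gb` of data out to the internet."""
    cost, remaining = 0.0, gb
    for tier_gb, price_per_gb in TIERS:
        chunk = min(remaining, tier_gb)
        cost += chunk * price_per_gb
        remaining -= chunk
        if remaining <= 0:
            break
    return cost

for tb in (1, 50, 1000):  # 1 TB, the 50 TB example above, a full petabyte
    print(f"{tb:>5} TB -> ${egress_cost(tb * 1_000):,.0f}")
```

And that's a one-time copy. A pipeline that syncs that much data every month turns the same number into a recurring budget line.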

Add latency considerations for real-time inference. Add compliance requirements that may prohibit certain data from crossing certain boundaries. Add the simple fact that a model optimized with TensorRT for a specific NVIDIA GPU configuration doesn't just "run" on different hardware.

The container is portable. Everything the container needs is not.
You Don't Have Uber's Teams
Uber's response to these challenges involves building custom metrics APIs to abstract away GPU vendor differences. Their teams revamped their entire observability stack when they discovered that cAdvisor-based GPU metrics didn't support newer models. They're actively working on making GPU capacity more fungible across clusters.

They have the engineering headcount to do this. Meanwhile, data quality issues are consistently cited as a primary cause of AI project failure, and when your data lives in multiple places with different access patterns, different compliance requirements, and different cost structures, maintaining quality becomes exponentially harder.

Uber has spent 15+ years building dedicated teams to solve this problem. Most organizations are asking their existing platform engineers to figure it out alongside everything else they're already doing.
The Cluster Was the Wrong Abstraction
Uber made an observation that cuts to the heart of how we've been thinking about infrastructure: "We ended up with a Kubernetes infrastructure focused on batch and a Kubernetes infrastructure focused on microservices... The distinction has been that the hardware was segregated at the cluster level."

Their teams built dedicated GPU clusters for GPU workloads. Dedicated CPU clusters for CPU workloads. The cluster became the organizing principle for hardware allocation.

This made sense when clusters were the primary unit of deployment. But it created silos. GPU capacity sat isolated from CPU capacity. When GPU clusters were underutilized, that capacity couldn't easily flow to other workloads. When CPU-bound services accidentally landed on GPU nodes, expensive hardware sat wasted running authentication checks.
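
There's no quick fix for the silo problem, but one small mitigation for that last failure mode is making expensive hardware opt-in at the scheduler level. Here's a hypothetical sketch using the official Kubernetes Python client to taint GPU nodes so that only pods that explicitly tolerate the taint can land on them; the label selector and taint key are assumptions for illustration, not Uber's setup.

```python
# Hypothetical sketch: taint GPU nodes so general-purpose pods can't land
# on them by accident. Assumes the official `kubernetes` Python client and
# that GPU nodes carry an `accelerator=nvidia` label (illustrative, not a
# standard convention).
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running in-cluster
v1 = client.CoreV1Api()

GPU_TAINT = {"key": "dedicated-gpu", "value": "true", "effect": "NoSchedule"}

for node in v1.list_node(label_selector="accelerator=nvidia").items:
    existing = node.spec.taints or []
    if any(t.key == GPU_TAINT["key"] for t in existing):
        continue  # already tainted
    merged = [{"key": t.key, "value": t.value, "effect": t.effect} for t in existing]
    merged.append(GPU_TAINT)
    v1.patch_node(node.metadata.name, {"spec": {"taints": merged}})
    print(f"Tainted {node.metadata.name}: only pods tolerating {GPU_TAINT['key']} will schedule there")
```

GPU workloads then carry a matching toleration (plus an explicit GPU resource request), and the expensive nodes stop absorbing stray authentication services.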

"We've been over-indexing on a Kubernetes cluster as an abstraction for hardware," Uber noted, "rather than leveraging a lot of what we can do internally from Kubernetes itself."

The cluster was supposed to abstract away infrastructure complexity. Instead, it became another boundary; another wall between resources that could, in principle, be fungible but in practice are not.

If this is happening at Uber, with their dedicated platform teams and infrastructure budgets, what does your cluster architecture actually look like?
GPUs Make Everything Harder
The challenges with multicloud compute get significantly worse when GPUs enter the picture. CPUs are MOSTLY fungible (the entire stack may not be, but it's fairly close). An x86-compatible chip in AWS behaves essentially the same as an x86-compatible chip in Azure (MOSTLY). You can pack multiple workloads onto a single CPU. Failovers are straightforward.

None of this applies to GPUs.

"GPU workloads aren't quite as fungible as CPU workloads. I can't as easily just dynamically pack eight workloads onto one GPU now, where I could have just squeezed things onto a single CPU."

Choosing the right GPU for training versus inference is already complex enough when you're working with a single provider. Training requires massive compute throughput and high memory bandwidth. Inference optimizes for latency and cost-per-query. The hardware choices are fundamentally different.

Now multiply that complexity across providers. Different cloud GPU offerings have different availability, different pricing models, different networking characteristics. An H100 on AWS isn't quite the same as an H100 on GCP when you factor in interconnect speeds, memory configurations, and the software stack surrounding it.

And worse, the disaster recovery math changes too. With CPUs, you might provision 20% overhead for failover capacity. With GPUs, given their cost, their scarcity, and the fact that models are often optimized for specific hardware configurations, that overhead becomes genuinely painful to justify.
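
A hedged back-of-envelope comparison makes the pain obvious. The hourly rates below are placeholder assumptions, not quotes from any provider; the 20% overhead figure is the one from the paragraph above.

```python
# Back-of-envelope failover-overhead math. Hourly rates are placeholder
# assumptions, not vendor pricing; swap in your real numbers.
HOURS_PER_MONTH = 730

def idle_failover_cost(nodes: int, hourly_rate: float, overhead: float = 0.20) -> float:
    """Monthly cost of keeping `overhead` spare capacity warm for failover."""
    return nodes * overhead * hourly_rate * HOURS_PER_MONTH

cpu_fleet = idle_failover_cost(nodes=100, hourly_rate=0.40)   # assumed ~$0.40/hr general-purpose node
gpu_fleet = idle_failover_cost(nodes=100, hourly_rate=40.00)  # assumed ~$40/hr 8-GPU node

print(f"CPU failover overhead: ${cpu_fleet:,.0f}/month")  # ≈ $5,840
print(f"GPU failover overhead: ${gpu_fleet:,.0f}/month")  # ≈ $584,000
```

Same 20%, two very different conversations with finance.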

If a workload was optimized for this GPU, with this memory configuration, using this specific NVIDIA architecture, then moving it isn't just a scheduling decision; it's potentially a retraining decision.
Observability in a Multi-Vendor World
One thing that PARTICULARLY stood out for me was observability. In Uber's case, they are exploring building their own metrics API to abstract away GPU vendor differences.

Why? Because they use NVIDIA hardware but are also evaluating AMD. Each vendor exposes different metrics. Teams built dashboards around low-level cAdvisor metrics that don't even support newer GPU models. When they tried to migrate to updated metrics, they discovered the entire organization had built dependencies on the old metric set.

"You're going to end up with a mix of a variety of different metrics and with nuances about what each of them means." They're now trying to build "metrics almost as an API" => a platform-level abstraction that can source data from vendor-specific implementations without requiring every team to understand GPU model differences.

This is the kind of problem that doesn't show up in multicloud architecture diagrams. It's the accumulated weight of real decisions made by real teams trying to get actual work done.

And again: Uber has dedicated teams building custom solutions for this. What's your plan?
The Real Problem: Multi-Everything
What Uber's experience reveals is that "multicloud" was always the wrong frame for this conversation.

The challenge isn't running compute across multiple cloud providers. Kubernetes solved that. The challenge is that modern AI workloads exist in a multi-everything environment:

Multi-region: Data generated in Europe may have different residency requirements than data generated in Asia. Training might happen in a region with GPU availability. Inference might need to happen close to users.

Multi-provider: Not just AWS vs. GCP vs. Azure, but also on-premises data centers that still hold sensitive datasets, edge locations that generate real-time data, and specialized AI clouds that offer unique hardware.

Multi-compliance-zone: Regulatory boundaries don't align with cloud provider boundaries. GDPR, HIPAA, financial regulations, and industry-specific requirements create a patchwork of constraints that have nothing to do with where your Kubernetes clusters run. Some EU member states have enacted additional residency requirements beyond GDPR for specific sectors like healthcare and public services.

Multi-format: Data lakes, data warehouses, streaming platforms, feature stores, vector databases. Each optimized for different access patterns. Each with its own replication and consistency guarantees.

Over 80% of enterprises with multicloud environments experience interoperability and connectivity problems. The compute layer was the easiest part of this puzzle to solve. We've been celebrating that victory while the harder problems compound.
What Comes Next
Uber's agentic AI workflows are still experimental, representing a minority of GPU usage. But they noted that if they "unlock some agentic workflow and put it everywhere," it would represent "a considerable increase in what they need to support with GPUs."

That's the trajectory the entire industry is on. More AI workloads. More models. More demand for training and inference capacity. And every one of those workloads will inherit all the multi-everything constraints that already make enterprise architecture so complex.

The industry spent a decade making compute portable. The next decade's problem is fundamentally different: making data accessible without necessarily making it mobile.

That's not a Kubernetes upgrade. It's not a new cloud service. It's a rethinking of how we architect systems when the data—not the compute—is the constraint that matters.

Uber's engineers are living in that future right now, with dedicated teams and substantial budgets to figure it out. The rest of us need to start thinking about how we'll solve the same problems with a fraction of the resources.

Because the data isn't going to move itself. And frankly, given the economics, you probably don't want it to.

Want to learn how intelligent data pipelines can reduce your AI costs? Check out Expanso. Or don't. Who am I to tell you what to do.

NOTE: I'm currently writing a book based on what I've seen about the real-world challenges of data preparation for machine learning, focusing on operational, compliance, and cost concerns. [I'd love to hear your thoughts](https://github.com/aronchick/Project-Zen-and-the-Art-of-Data-Maintenance?ref=distributedthoughts.org)!


Originally published at Distributed Thoughts.
