
How Beam runs GPUs anywhere

What do you do when you need to serve up a completely custom, 7+ billion parameter model with sub-10-second cold start times, without writing a Dockerfile or managing scaling policies yourself? It sounds impossible, but Beam's serverless GPU platform provides performant, scalable AI infrastructure with minimal configuration. Your code already does the AI inference in a function. Just add a decorator to get that function running somewhere in the cloud with whatever GPU you specify. It turns on when you need it, and it turns off when you don't. That can save you orders of magnitude in cost over running a persistent GPU in the cloud.
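To make that concrete, here is a rough sketch of the decorator-driven workflow, assuming an endpoint-style decorator along the lines of Beam's Python SDK; the decorator arguments, GPU identifier, and model path below are illustrative placeholders rather than the exact API.

```python
# Illustrative sketch only: the decorator name, its arguments, and the model
# path are placeholders approximating Beam's SDK, not a verbatim example.
from beam import Image, endpoint

@endpoint(
    gpu="A10G",                      # the GPU type this function should run on
    cpu=4,
    memory="32Gi",
    image=Image(python_packages=["torch", "transformers"]),
)
def generate(prompt: str) -> str:
    # Your existing inference code stays an ordinary Python function; the
    # decorator is what schedules it onto a serverless GPU when it is called.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="your-org/your-7b-model")
    return pipe(prompt)[0]["generated_text"]
```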

Tigris tiger watching a beam from a ground satellite. Image generated with Flux [dev] from Black Forest Labs on fal.ai.

Fast iteration loops for hyper productive developers

Beam was founded to reduce cost and iteration time when developing AI/ML
applications. Using $bigCloud and Docker directly is too slow for iterative
development: waiting for a new image to build and redeploy to a dev environment
just takes too long, let alone the time it takes to set up and manage your own
GPU cluster. Beam hot reloads your code onto a live inference server for
testing and provides single-command deployments with beam deploy. They also
integrate with workflow tools like ComfyUI, so development is even easier.

AI workloads are at the bleeding edge for basically every part of the stack. For
the best results you need top-of-the-line hardware, drivers that should work
(but are still being proven), the newest kernel and userland you can get, and
management of things that your sysadmin/SRE teams are not the most familiar with
yet. Much like putting CPU-bound workloads into the cloud, serverless GPUs
offload the driver configuration, hardware replacement, and other parts of how
it all works to the platform so you can focus on doing what you want to.

The big tradeoff with serverless GPUs is between cost and latency. If you want
to use the cheapest GPU available, it's almost certainly not going to be close
to your user or your data. The speed of light is only so fast; sending 50GB to
another continent is going to be slow no matter what you do. Every time you have
to do that you rack up longer cold start times, burn more cloud spend on
expensive GPU minutes, and incur yet more egress fees.
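Some back-of-the-envelope math shows why. The throughput and egress price below are assumptions for illustration, not measurements from any particular cloud:

```python
# Rough cost/latency of pulling a 50GB model across regions on every cold
# start. Bandwidth and egress pricing are assumed values for illustration.
payload_gb = 50
cross_region_gbit_per_s = 5      # assumed sustained throughput between regions
egress_price_per_gb = 0.09       # assumed $/GB, in the ballpark of big-cloud egress

transfer_seconds = payload_gb * 8 / cross_region_gbit_per_s
egress_dollars = payload_gb * egress_price_per_gb

print(f"~{transfer_seconds:.0f} s added to every cold start")   # ~80 s
print(f"~${egress_dollars:.2f} in egress fees per pull")        # ~$4.50
```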

Challenge

The challenge? Beam’s serverless GPU platform ran largely on one cloud, and they
needed a cost-efficient way to more flexibly spread their compute across many
clouds, all while offering a consistent developer experience: <10s cold starts,
as fast as 50ms warm starts, and limitless horizontal scale. Luke Lombardi, CTO
& Co-Founder of Beam, shares, “We’re a distributed GPU cloud: one of the core
things we need is cross region consistency. We don't want to think about it when
we move to new regions. It's critical that all of our storage infrastructure is
just there and the latency is predictable.”

Another notable challenge was strategic: Beam wanted to reduce lock-in to the
major cloud providers so they could run their compute anywhere. And they wanted
to open source much of their codebase so their platform interface could be used
by anyone, anywhere, on any bare metal server. Flexibility was key. They needed
to decouple compute from storage and build a highly flexible storage layer with
cross region replication and global distribution.

Solution

The Beam engineering team knew the architectural importance of separating the
storage layer, and they knew egress costs were a non-starter. Their primary need
was performant data distribution across clouds. However, none of the object
storage solutions they tried made it easy to manage multi-cloud, or even
multi-region, workloads without extensive configuration, coding around file type
restrictions, or accounting for dips in availability and latency. Using Tigris
eliminated this undifferentiated work, and enabled Beam to ship faster.

“Before finding Tigris, we were planning on implementing our own data
distribution and replication for our training and inference workloads. Now we
just point our code to a single endpoint and Tigris handles it all. We've saved
months of work, not to mention the maintenance cost & peace of mind.”
—Luke Lombardi, Co-founder & CTO, Beam
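As a minimal sketch of what "point our code to a single endpoint" looks like in practice: Tigris speaks the S3 API, so existing S3 client code only needs its endpoint swapped. The endpoint URL, bucket, and object key below are illustrative.

```python
# Minimal sketch: Tigris is S3-compatible, so a standard S3 client pointed at
# a single endpoint works from any region or cloud. Endpoint, bucket, and key
# are illustrative; credentials come from the usual AWS_* environment variables.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://fly.storage.tigris.dev",  # one endpoint, every region
)

# Replication and placement happen behind the endpoint, not in application code.
s3.download_file("model-weights", "llama-7b/model.safetensors", "/tmp/model.safetensors")
```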

Choosing Flexibility

Tigris provided a reliable and performant storage layer to complement Beam’s compute layer, enabling them to spread their workloads across many public clouds to take advantage of cost and GPU availability. Luke shared, “We're mostly benefiting from moving our object storage outside of the major cloud providers and having it as a separate managed service. The egress costs are important, but even just the fact that we're not locked into S3 is a good thing for us too.”

As a result of re-platforming and separating their storage and compute, Beam was able to open source their platform, beta9. Now users can self-host Beam and add their own compute with the same CLI experience across any cloud, on-prem, or hybrid architecture. This flexibility is a key difference between Beam and other serverless GPU providers: traditionally serverless has been a high lock-in architecture, but with beta9 and the self-hosted CLI, your workloads are portable.

Designing the Storage Layer to be Object Storage Native

After using another filesystem solution that wasn’t backed by object storage,
Luke said, “When we were switching to another filesystem, we knew it had to be
backed by object storage.” For AI workloads especially, data expands rapidly,
sometimes beyond the available RAM or even the available volume. Beam handles
very large models and training data sets with ease, using Tigris to dynamically
move the data based on access patterns. Since the objects are already in the
region where you need them, it’s quick to load models and hydrate training data
sets to fast-access volumes as needed.
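A hedged sketch of that hydration step, assuming the same S3-compatible client as above; the bucket, prefix, and local path are placeholders:

```python
# Illustrative "hydrate on cold start" helper: copy every object under a model
# prefix from the nearby Tigris region down to a fast local volume before loading.
import os
import boto3

s3 = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")

def hydrate(bucket: str, prefix: str, dest: str) -> None:
    """Download all objects under `prefix` into `dest` on local disk."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):          # skip directory marker objects
                continue
            local_path = os.path.join(dest, os.path.relpath(key, prefix))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, key, local_path)

hydrate("model-weights", "llama-7b/", "/mnt/fast-ssd/llama-7b")
```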

Do One Thing, and Do It Right

When designing their storage layer, Luke didn’t want to worry about
troubleshooting intermittent 500s or hunting down unexpected bottlenecks. Storage should just work ™️ and be consistent across clouds. He said, “I felt much more comfortable knowing that we’re using a product where the team just does object storage all day.”

Want to try it out? Get Started

This article was originally published on December 12, 2024 at tigrisdata.com/blog
