In the age of artificial intelligence and machine learning, there is a constant need for powerful hardware like GPUs and TPUs. Ideally, access to this hardware should be predictable and reliable. Resource availability shouldn’t be a blocker for your projects. If customers want to use a GPU, they should be provided with a GPU! After all, this is supposed to be one of the ideas behind cloud computing: to have resources available on demand. But with a limited supply of hardware, there is a need for a solution more sophisticated than simple “first come, first served.”
Introducing DWS
Dynamic Workload Scheduler (DWS) is Google Cloud's innovative solution designed to optimize the allocation of high-demand, finite resources like GPUs and TPUs, ensuring that customer workloads can access the necessary hardware when needed. It directly addresses the supply and demand imbalance problem. On one hand, Google Cloud has customers asking for GPUs and TPUs to run their workloads. On the other hand, there’s a limited number of hardware resources that can be assigned to the customers. DWS is what balances customer demands against the finite resources of the cloud (which wants to feel infinite).
To the traditional models of on-demand provisioning, Spot instances, and reservations, DWS adds two simple yet powerful provisioning methods:
- Calendar mode
- Flex Start mode
In this article, I’ll explain the benefits of each of these DWS methods and provide practical scenarios for when you might want to use them, helping you choose the best provisioning strategy for your specific workloads. Both methods are still in preview, so you can expect their availability and scope to improve once they enter general availability later this year.
Calendar mode
Let’s start with Calendar mode, which is a bit simpler to understand. DWS Calendar Mode allows you to create future reservations for the hardware you know you will need in advance. Booking rooms in a hotel is a great analogy here. You specify the range of dates, the location, and the type and quantity of hardware you need, and you submit your request. Like a hotel, the system checks resource availability and then books the resources you want to reserve. Once your future reservation is approved, all you need to do is wait for the start date. On that date, Google Cloud creates a reservation for you that you can consume however you want (GCE, GKE, Vertex AI, Vertex AI Workbench, and Batch can all consume reservations).
Once the reservation time runs out, the system will reclaim the resources, so they can be allocated to other customers. Just like in a hotel, you pay for the time you had your reservation, even if you didn’t use it 100% of the time.
Here are some facts about the DWS Calendar Mode reservations:
- The reservation period has a fixed length of 1 to 90 days.
- Currently, GPUs require a 4-day lead time before the reservation can start. TPU reservations can be submitted 24 hours in advance of the desired start time.
- Once your request is accepted, you will have to pay for the full reservation period, even if not used.
- Once the reservation period ends, the resources are reclaimed.
- Reserved resources are physically close to each other to minimize network latency.
- Calendar Mode reservations can be shared with other projects.
- DWS has its own pricing, separate from other provisioning methods (usually cheaper than on-demand pricing).
- No quota is consumed while using resources booked through Calendar Mode reservations.
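To make this more concrete, here is a minimal Python sketch of what a Calendar Mode request can look like when submitted as a future reservation through the Compute Engine API. The project, zone, machine type, dates, and names are placeholders I made up, and the body fields follow the future reservations API as I understand it, so double-check them against the current Calendar Mode documentation.

```python
from googleapiclient import discovery

# Placeholders: swap in your own project, zone, dates and counts.
PROJECT = "my-project"
ZONE = "us-central1-a"

compute = discovery.build("compute", "beta")

# A future reservation for four a3-highgpu-8g machines (8x H100 each),
# active for a two-week window. Calendar Mode windows are 1-90 days long.
future_reservation_body = {
    "name": "dws-calendar-training-run",
    "specificSkuProperties": {
        "totalCount": "4",
        "instanceProperties": {"machineType": "a3-highgpu-8g"},
    },
    "timeWindow": {
        "startTime": "2025-07-01T00:00:00Z",
        "endTime": "2025-07-15T00:00:00Z",
    },
    # Calendar Mode reservations are cleaned up when the window ends,
    # even if they were never consumed.
    "autoDeleteAutoCreatedReservations": True,
}

response = (
    compute.futureReservations()
    .insert(project=PROJECT, zone=ZONE, body=future_reservation_body)
    .execute()
)
print(response)
```

Once the request is accepted and the start date arrives, the resulting reservation can be consumed like any other reservation from GCE, GKE, Vertex AI, Vertex AI Workbench or Batch, with no quota consumed for the reserved capacity.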
So, what are the best scenarios for Calendar mode? If you…
- Know how many resources you need
- Know how long you need them for
- Know when you want to start and finish your project
…then DWS Calendar Mode is the solution for you. Whether it’s an ML training job, an HPC simulation, or an expected spike in the number of inference requests (isn’t Black Friday great?), Calendar Mode has you covered.
So what’s the difference between regular future reservations and Calendar Mode?
You might have seen that in Google Cloud, there are also future reservations that are not related to DWS Calendar Mode. You can think of Calendar Mode reservations as a subset of the more generic future reservations. Every Calendar Mode reservation is a future reservation, but for a future reservation to be a Calendar Mode reservation, it needs to be:
- Configured to auto-delete the reservation on expiry, even if it’s not consumed.
- No longer than 90 days.
- Limited to certain types of resources (see the documentation for an up-to-date list).
Additionally, Calendar Mode comes with a handy assistant that helps you find available capacity.
Flex start mode
With Calendar mode being so great, what more could you need? Well, you don’t always have a schedule you need to keep. Sometimes you want your job finished as soon as possible. At other times, you don’t know how long it will take to complete the work. This is where Flex Start mode comes in. If Calendar mode works like a hotel, you can compare Flex Start mode to a restaurant.
How does it work? You tell DWS that you need hardware, let’s say 10x A4 machines, to run a job that will take at most 6 days. With that knowledge, DWS goes out to the Cloud to get you your 10 A4 machines. After some time (this is where the “flex” part comes from: it’s a flexible process), the system has the 10 A4 machines you need and provides them to you all at once. This “all-or-nothing” approach ensures you receive the full requested capacity simultaneously. This way, you don’t have to worry about paying for 7 unused machines while you wait for the other 3 to be created. You get all 10 at the same time. Once they are delivered to you, they will be yours until the specified time runs out or you’re done with your task. If you release the resources before the time runs out, you pay only for the time you actually used them. Since there is no provisioning notification, make sure your workloads can start automatically as soon as the machines are created.
While Calendar mode was similar to booking rooms in a hotel, Flex Start is more akin to waiting for your order in a restaurant. You wait until your “order” is served and eat until you’re done, or the restaurant closes. If you change your mind before the order is fulfilled, you can cancel your request without any consequences.
To summarize:
- Flex Start mode requests hardware for a specified period of time, from 1 minute to 7 days.
- Requests are fulfilled as soon as possible (shorter requests tend to be fulfilled more quickly).
- You can cancel your request at any time; you only pay for what you used.
- DWS Flex Start pricing offers discounts compared to on-demand provisioning.
- Once the time limit of your request is reached, the resources are reclaimed.
- Resources acquired through Flex Start mode consume the preemptible quota, which is usually a lot higher than on-demand quota.
- Works only for accelerator-optimized machine series and N1 virtual machine (VM) instances with GPUs attached.
- You can't stop, suspend, or recreate the instances you create through Flex Start mode.
Flex Start mode works best if:
- You have a short (< 7 days) need for resources.
- You want your job started as soon as possible.
- You don’t know how long your task will take, and appreciate the flexibility to release resources early and only pay for actual usage.
How to use it?
Flex Start mode works a bit differently in every supported product.
- For Compute Engine, it comes in the form of an all-or-nothing Managed Instance Group (MIG) resize request with the maximum run duration specified (see the sketch after this list).
- For Google Kubernetes Engine (GKE), it’s specified per workload or through a scheduling tool.
- For Cloud Batch, it’s available for jobs running on specific machine types.
- For Vertex AI, specify FLEX_START as your scheduling strategy.
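As an example of the Compute Engine option above, a Flex Start request is an all-or-nothing resize request against an existing MIG, with a maximum run duration attached. Here is a rough Python sketch; the project, zone, MIG, and request names are placeholders, it assumes the MIG already uses a supported accelerator machine type, and the field names mirror the resize requests API as I understand it, so verify them against the Flex Start documentation.

```python
from googleapiclient import discovery

# Placeholders: the MIG is assumed to exist and to use a supported
# accelerator-optimized machine type.
PROJECT = "my-project"
ZONE = "us-central1-a"
MIG_NAME = "a3-training-mig"

compute = discovery.build("compute", "beta")

# Ask DWS for 10 more VMs, delivered all at once, for at most 6 days.
resize_request_body = {
    "name": "flex-start-batch-1",
    "resizeBy": 10,
    # Maximum run duration in seconds; DWS reclaims the VMs afterwards.
    "requestedRunDuration": {"seconds": str(6 * 24 * 60 * 60)},
}

response = (
    compute.instanceGroupManagerResizeRequests()
    .insert(
        project=PROJECT,
        zone=ZONE,
        instanceGroupManager=MIG_NAME,
        body=resize_request_body,
    )
    .execute()
)
print(response)
```

Until the request is fulfilled, you can cancel it at no cost, just like walking out of the restaurant before your order arrives. Once the VMs are delivered, make sure your workload starts on them automatically, since there is no provisioning notification.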
Happy computing!
When it comes to getting your hands on high-demand hardware for your advanced workloads, Google Cloud's Dynamic Workload Scheduler has you covered. With its Calendar and Flex Start modes, you get powerful and flexible solutions that truly fit your needs. By digging into these new provisioning methods, you can count on predictable, reliable, and efficient access to essential resources like GPUs and TPUs. This means your AI, ML, and HPC projects will run smoother than ever. Try booking some powerful machines for your next project now!