Spot VM in Google Cloud

#googlecloud #cloudskills #tutorial #gcp

What are Spot VMs?

Spot VMs are virtual machine (VM) instances that use the Spot provisioning model. They are available at a much lower price than standard VMs, with a discount of 60-91%. However, Compute Engine might preempt Spot VMs at any time if it needs to reclaim those resources for other tasks. When a Spot VM is preempted, Compute Engine either stops it (the default) or deletes it, depending on the termination action you specified for that VM. Spot VMs are excess Compute Engine capacity, so their availability varies with usage. Spot VMs do not have a minimum or maximum runtime.

If your workloads are fault-tolerant and can withstand possible VM preemption, Spot VMs can reduce your Compute Engine costs significantly. For example, batch processing jobs can run on Spot VMs. If some of those VMs stop during processing, the job slows but does not completely stop. Spot VMs complete your batch processing tasks without placing additional load on your existing VMs and without requiring you to pay full price for additional standard VMs.
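
To make the discount range concrete, here is a small arithmetic sketch. The standard price of 0.10 USD per hour is a made-up figure for illustration, not a real quote, and spot_hourly_cost is a hypothetical helper name.

```shell
# spot_hourly_cost STANDARD_PRICE DISCOUNT_PERCENT
# Prints the discounted hourly price with four decimal places.
spot_hourly_cost() {
  awk -v price="$1" -v pct="$2" \
    'BEGIN { printf "%.4f\n", price * (100 - pct) / 100 }'
}

# Hypothetical standard price of 0.10 USD per hour (assumption, not a quote).
spot_hourly_cost 0.10 60   # smallest discount in the range: prints 0.0400
spot_hourly_cost 0.10 91   # largest discount in the range: prints 0.0090
```

Under that assumed price, a Spot VM would cost between roughly 0.009 and 0.04 USD per hour. Actual prices vary by machine type and region.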

Spot VM limitations

Spot VMs function like standard VMs but have the following limitations:

Compute Engine might stop Spot VMs at any time due to system events. The probability that Compute Engine stops Spot VMs for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
Spot VMs are finite Compute Engine resources, so they might not always be available.
Spot VMs can't live migrate to become standard VMs while they are running, and they can't be set to automatically restart when there is a maintenance event.
Due to the preceding limitations, Spot VMs are not covered by any Service Level Agreement and are excluded from the Compute Engine SLA.
The Google Cloud Free Tier credits for Compute Engine do not apply to Spot VMs.

How to create a Spot VM in Google Cloud?

To create a VM with the gcloud tool, use the gcloud beta compute instances create command. To create Spot VMs, you must include the --provisioning-model=SPOT flag. Optionally, you can also specify a termination action for Spot VMs by including the --instance-termination-action flag.

gcloud beta compute instances create my-spot-vm \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE

For --instance-termination-action, specify which action Compute Engine takes when it preempts the VM: either STOP (the default) or DELETE.

Like any other VM, Spot VMs start upon creation. Likewise, if Spot VMs are stopped, you can restart the VMs to resume the RUNNING state. You can stop and restart preempted Spot VMs as many times as you would like, as long as there is capacity. For more information, see VM instance life cycle.
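
As a rough sketch of the restart step, the helper below starts a VM only if it is currently stopped. It assumes the non-beta gcloud compute commands are available, that a stopped VM reports the status TERMINATED, and that the zone is passed explicitly; restart_if_stopped and the zone used in the example are illustrative names, not something from the original setup.

```shell
# restart_if_stopped INSTANCE ZONE
# Starts the VM only when its status is TERMINATED (the status of a stopped VM).
restart_if_stopped() {
  # Ask Compute Engine for the current status; give up if the lookup fails.
  vm_status=$(gcloud compute instances describe "$1" \
    --zone="$2" --format='value(status)') || return 1

  if [ "$vm_status" = "TERMINATED" ]; then
    # The start can still fail if there is no Spot capacity right now.
    gcloud compute instances start "$1" --zone="$2"
  else
    echo "$1 is $vm_status; nothing to do"
  fi
}
```

For example: restart_if_stopped my-spot-vm us-central1-a. If the zone has no spare capacity, the start command fails and you can try again later.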

If Compute Engine stops one or more Spot VMs in an autoscaling managed instance group (MIG) or Google Kubernetes Engine (GKE) cluster, the group restarts the VMs when the resources become available again.

Identifying Spot VMs

To describe a VM from the gcloud tool, use the gcloud beta compute instances describe command:

gcloud beta compute instances describe my-spot-vm

The output for describing Spot VMs includes the provisioningModel: SPOT field, similar to the following:

...
scheduling:
  ...
  provisioningModel: SPOT
  instanceTerminationAction: TERMINATION_ACTION
...

where TERMINATION_ACTION indicates which action to take when Compute Engine preempts the VM, either STOP or DELETE. If the instanceTerminationAction field is missing, the default behavior is STOP.
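
If you only need the provisioning model rather than the full description, a --format projection can pull out that single field. This is a minimal sketch that assumes the non-beta describe command exposes scheduling.provisioningModel; provisioning_model and is_spot_vm are hypothetical helper names, and the comparison ignores case so it does not depend on how the value is capitalized.

```shell
# provisioning_model INSTANCE ZONE
# Prints only the scheduling.provisioningModel field of the VM.
provisioning_model() {
  gcloud compute instances describe "$1" --zone="$2" \
    --format='value(scheduling.provisioningModel)'
}

# is_spot_vm INSTANCE ZONE
# Succeeds (exit status 0) when the VM uses the Spot provisioning model.
is_spot_vm() {
  model=$(provisioning_model "$1" "$2" | tr '[:lower:]' '[:upper:]')
  [ "$model" = "SPOT" ]
}
```

For example: is_spot_vm my-spot-vm us-central1-a && echo "Spot VM". The zone here is an assumed value.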

Detect preemption of Spot VMs

To determine whether Compute Engine preempted your Spot VMs, list the preemption operations with the gcloud tool:

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted"
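
The command above runs from outside the VM. As a complementary sketch, a process on the VM itself can ask the Compute Engine metadata server, whose instance/preempted value reads TRUE or FALSE. This only works from inside a Compute Engine VM, and is_preempted is a hypothetical helper name.

```shell
# is_preempted
# Succeeds (exit status 0) when the metadata server reports the VM as preempted.
# Returns 2 if the metadata server cannot be reached (for example, off GCE).
is_preempted() {
  preempted=$(curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted") \
    || return 2
  [ "$preempted" = "TRUE" ]
}
```

A worker loop could call is_preempted between units of work and checkpoint its progress as soon as it succeeds.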

A few things to consider while using Spot VMs

Pick smaller machine shapes. Resources for Spot VMs come out of excess and backup Google Cloud capacity. It's often easier to get lots of capacity for Spot VMs with smaller machine types than larger ones. You might also get more spare capacity by using a custom machine type that is in between the predefined types. For example, there's likely more capacity for a custom machine type with 48 vCPUs than for n1-standard-64 machines.
Run large clusters of Spot VMs during off-peak times. The load on Google Cloud data centers varies with location and time of day, but is generally lowest on nights and weekends. As such, nights and weekends are the best times to run large clusters of Spot VMs.
Design your applications to be fault and preemption tolerant. It's important to be prepared for the fact that there are changes in preemption patterns at different points in time. For example, if a zone suffers a partial outage, large numbers of Spot VMs could be preempted to make room for standard VMs that need to be moved as part of the recovery. In that small window of time, the preemption rate would look very different than on any other day. If your application assumes that preemptions are always done in small groups, you might not be prepared for such an event. You can test your application's behavior under a preemption event by stopping the VM.
Retry creating Spot VMs that have been preempted. If your Spot VMs have been preempted, try creating new Spot VMs once or twice before falling back to standard VMs. Depending on your requirements, it might be a good idea to combine standard VMs and Spot VMs in your clusters to ensure that work proceeds at an adequate pace.
Use shutdown scripts. Manage shutdown and preemption notices with a shutdown script that can save a job's progress so that it can pick up where it left off, rather than start over from scratch.
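
The retry-then-fall-back advice above can be sketched as a small helper. It assumes a gcloud release in which the non-beta create command accepts --provisioning-model, and that the earlier Spot VM was deleted on preemption so its name is free again. create_with_fallback, RETRY_DELAY_SECONDS, and the 30-second default delay are all hypothetical choices for illustration.

```shell
# create_with_fallback INSTANCE ZONE [ATTEMPTS]
# Tries to create a Spot VM up to ATTEMPTS times (default 2), then falls
# back to a standard VM with the same name.
create_with_fallback() {
  cwf_attempts=${3:-2}
  cwf_try=1
  while [ "$cwf_try" -le "$cwf_attempts" ]; do
    if gcloud compute instances create "$1" --zone="$2" \
        --provisioning-model=SPOT --instance-termination-action=DELETE; then
      return 0
    fi
    echo "Spot attempt $cwf_try of $cwf_attempts failed" >&2
    cwf_try=$((cwf_try + 1))
    # Pause before the next attempt; the delay is an arbitrary default.
    sleep "${RETRY_DELAY_SECONDS:-30}"
  done

  echo "Falling back to a standard VM" >&2
  gcloud compute instances create "$1" --zone="$2" --provisioning-model=STANDARD
}
```

For example: create_with_fallback my-spot-vm us-central1-a 2. A real setup would probably add a machine type, an image, and a shutdown script to both create commands.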
