<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Google Cloud</title>
    <description>The latest articles on DEV Community by Google Cloud (@googlecloud).</description>
    <link>https://dev.to/googlecloud</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F809%2Fc7814399-cf4a-4dc9-9f12-d0a97ed21bf6.png</url>
      <title>DEV Community: Google Cloud</title>
      <link>https://dev.to/googlecloud</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/googlecloud"/>
    <language>en</language>
    <item>
      <title>Inference on GKE Private Clusters</title>
      <dc:creator>Maciej Strzelczyk</dc:creator>
      <pubDate>Thu, 12 Mar 2026 12:52:00 +0000</pubDate>
      <link>https://dev.to/googlecloud/inference-on-gke-private-clusters-35i8</link>
      <guid>https://dev.to/googlecloud/inference-on-gke-private-clusters-35i8</guid>
      <description>&lt;h2&gt;
  
  
  Setting up an inference service without access to the Internet
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/serve-with-gke-inference-gateway?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Deploying an inference service&lt;/a&gt; on your &lt;a href="https://cloud.google.com/kubernetes-engine?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GKE&lt;/a&gt; cluster in 2026 is a fairly simple task. With a short Deployment definition making use of a &lt;a href="https://docs.vllm.ai/en/latest/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; image (&lt;a href="https://docs.cloud.google.com/tpu/docs/intro-to-tpu?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;TPU&lt;/a&gt; or &lt;a href="https://cloud.google.com/gpu?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GPU&lt;/a&gt;) and a Service definition, you have the basic setup ready to go! vLLM grabs the model of your choosing from &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; during its startup. It’s all nicely automated. However, this setup requires your GKE nodes to have access to the Internet. What should you do when there’s no Internet connection? I will discuss the options in this article, but first, let’s start with a short analysis of how and why you may want to have no Internet connection for your nodes.&lt;/p&gt;
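&lt;p&gt;As a minimal sketch of such a setup (the model name, labels, and resource amounts are illustrative placeholders, not a production-ready configuration), the Deployment might look like this:&lt;/p&gt;

```yaml
# Hypothetical minimal vLLM Deployment sketch; model name and
# resource amounts are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        # vLLM downloads this model from Hugging Face at startup,
        # which requires Internet access from the node.
        args: ["--model", "google/gemma-3-4b-it"]
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: "1"
```

A matching Service exposing port 8000 completes the basic setup.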

&lt;h2&gt;
  
  
  GKE Private Nodes
&lt;/h2&gt;

&lt;p&gt;One situation where your vLLM pod might not be able to download a model from the Internet is when you decide to use &lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/legacy/network-isolation?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GKE Private Cluster&lt;/a&gt;. When you choose this option, the nodes in your cluster are assigned only a private IP from your VPC network. With only a private IP address, it’s impossible to reach them from outside of your network, but they also lose the default way to communicate with the outside world. This feature is great for increasing the security of your system, but it has obvious drawbacks, like this lack of connectivity to the world.&lt;/p&gt;

&lt;p&gt;One easy solution to the private nodes situation is to configure &lt;a href="https://docs.cloud.google.com/nat/docs/overview?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud NAT&lt;/a&gt; for the region your cluster is in. That will create a way for the nodes and pods running on them to access the Internet, while keeping them protected from any attempt to establish new connections from outside of the network. However, if your pods must remain unable to connect to the Internet, you need another way to get the model for vLLM to run.&lt;/p&gt;
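&lt;p&gt;Configuring Cloud NAT boils down to creating a Cloud Router and a NAT config on it; a sketch with placeholder names (network, region, and router name are assumptions to adjust for your setup):&lt;/p&gt;

```bash
# Placeholder names; substitute your own VPC network, region and router name.
gcloud compute routers create nat-router \
    --network=my-vpc --region=us-central1

gcloud compute routers nats create nat-config \
    --router=nat-router --region=us-central1 \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
```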

&lt;h2&gt;
  
  
  Providing images to the pods
&lt;/h2&gt;

&lt;p&gt;Another problem you might encounter when using a Private Cluster without access to the Internet is that your nodes won’t have access to the default source of Docker images: &lt;a href="https://hub.docker.com/" rel="noopener noreferrer"&gt;Docker Hub&lt;/a&gt;. The simple &lt;code&gt;vllm/vllm-openai:latest&lt;/code&gt; image specification will not work. You will need to copy the images you want to use to the &lt;a href="https://docs.cloud.google.com/artifact-registry/docs/overview?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Artifact Registry&lt;/a&gt;—this way GKE Nodes will be able to download the images and run them. This also gives you additional control over your environment: you decide exactly which image versions are downloaded and made available to cluster users.&lt;/p&gt;
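&lt;p&gt;One way to perform the copy is to pull, retag, and push the image from a machine that has both Internet and registry access; the project, repository, and region names below are placeholders:&lt;/p&gt;

```bash
# Placeholder project/repository names; in practice, pin a concrete
# version tag rather than :latest for reproducibility.
docker pull vllm/vllm-openai:latest
docker tag vllm/vllm-openai:latest \
    us-central1-docker.pkg.dev/my-project/my-repo/vllm-openai:latest
docker push us-central1-docker.pkg.dev/my-project/my-repo/vllm-openai:latest
```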

&lt;h2&gt;
  
  
  Providing the LLM
&lt;/h2&gt;

&lt;p&gt;vLLM can run a model stored in a local directory if you pass it as the &lt;code&gt;--model&lt;/code&gt; argument value. To make use of this ability in your private GKE cluster, you will have to somehow provide the model to vLLM through a mounted directory. The easiest way to do this is through &lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GCS FUSE&lt;/a&gt;, which allows you to simply mount a &lt;a href="https://docs.cloud.google.com/storage/docs/buckets?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GCS bucket&lt;/a&gt; as a folder in your Pod. You just need to remember that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The GKE Cluster must have the &lt;code&gt;GcsFuseCsiDriver&lt;/code&gt; add-on enabled.
&lt;/li&gt;
&lt;li&gt;You should use &lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/workload-identity?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Workload Identity&lt;/a&gt; and a dedicated &lt;a href="https://docs.cloud.google.com/iam/docs/service-account-overview?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;service account&lt;/a&gt; to allow the pod to access the bucket. The &lt;code&gt;roles/storage.objectViewer&lt;/code&gt; role should work just fine for read-only access.
&lt;/li&gt;
&lt;li&gt;It’s important to host the model in the same region as the nodes of your cluster to ensure the fastest transfers.&lt;/li&gt;
&lt;/ol&gt;
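&lt;p&gt;With those prerequisites in place, a Pod mounting the bucket could be sketched as follows (the bucket name, Kubernetes service account, and model path are placeholders):&lt;/p&gt;

```yaml
# Hypothetical Pod sketch; bucket, service account and model path
# are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-gcs
  annotations:
    gke-gcsfuse/volumes: "true"   # enables the GCS FUSE sidecar
spec:
  serviceAccountName: vllm-ksa    # bound to an IAM service account via Workload Identity
  containers:
  - name: vllm
    image: vllm/vllm-openai:latest
    args: ["--model", "/models/my-model"]
    volumeMounts:
    - name: model-volume
      mountPath: /models
      readOnly: true
  volumes:
  - name: model-volume
    csi:
      driver: gcsfuse.csi.storage.gke.io
      readOnly: true
      volumeAttributes:
        bucketName: my-model-bucket
```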

&lt;p&gt;Serving LLMs from a mounted directory speeds up the startup process of your inference service, as it doesn’t have to download the model each time a new pod is started.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Alternative to mounting a GCS bucket: persistent disks&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;An alternative to mounting a bucket is to use a zonal or regional &lt;a href="https://docs.cloud.google.com/compute/docs/disks/persistent-disks?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;persistent disk&lt;/a&gt; or &lt;a href="https://docs.cloud.google.com/compute/docs/disks/hyperdisks?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;hyperdisk&lt;/a&gt;. A single disk can be mounted by multiple pods at once if using read-only mode. Creating a disk to store a model is a bit more time-consuming than using a GCS bucket, but might provide better performance (depending on the disk type) and be cheaper, as GCS and disk billing are structured differently.&lt;/p&gt;

&lt;p&gt;To create a disk storing a model, you will need a temporary &lt;a href="https://docs.cloud.google.com/compute/docs/instances?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Compute Instance&lt;/a&gt;, where you will mount, format and fill the disk with data (&lt;a href="https://huggingface.co/docs/huggingface_hub/guides/cli" rel="noopener noreferrer"&gt;&lt;code&gt;hf download&lt;/code&gt;&lt;/a&gt; works just fine for this). Once the disk is ready, the VM can be deleted and the disk attached to the vLLM pods.&lt;/p&gt;
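&lt;p&gt;Once the disk exists, it can be exposed to pods read-only through a PersistentVolume/PersistentVolumeClaim pair; the disk path, size, and names below are placeholders:&lt;/p&gt;

```yaml
# Hypothetical PV/PVC sketch for a pre-populated disk; names, zone
# and size are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-disk-pv
spec:
  storageClassName: ""
  capacity:
    storage: 100Gi
  accessModes:
  - ReadOnlyMany        # allows many pods to mount the same disk
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/my-project/zones/us-central1-a/disks/model-disk
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-disk-pvc
spec:
  storageClassName: ""
  volumeName: model-disk-pv
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 100Gi
```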

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Using GKE without Internet access can be a good practice, providing you with additional security and control. As you can see, the additional work required to get your inference service running in this case is not negligible, but it is also not a deal-breaker. It’s up to you to decide if it’s a configuration you would like to use in your setup. Storing a model in a GCS bucket or on a persistent disk is also a good way to cut down the startup time of your services, especially with larger models.&lt;/p&gt;

&lt;p&gt;The ecosystem of AI is changing at a rapid pace and it’s important to stay up to date with all the latest news. Follow the official &lt;a href="https://cloud.google.com/blog?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud blog&lt;/a&gt;, &lt;a href="https://developers.googleblog.com/?utm_campaign=CDR_0x73f0e2c4_default_b491386531&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Developers blog&lt;/a&gt; and &lt;a href="https://www.youtube.com/googlecloudplatform" rel="noopener noreferrer"&gt;Google Cloud Tech YouTube channel&lt;/a&gt; to not miss any updates!&lt;/p&gt;

</description>
      <category>gke</category>
      <category>gcp</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AI deployment: to host or not to host?</title>
      <dc:creator>Maciej Strzelczyk</dc:creator>
      <pubDate>Tue, 10 Mar 2026 23:28:46 +0000</pubDate>
      <link>https://dev.to/googlecloud/ai-deployment-to-host-or-not-to-host-4p2</link>
      <guid>https://dev.to/googlecloud/ai-deployment-to-host-or-not-to-host-4p2</guid>
      <description>&lt;p&gt;So you’ve built your AI application prototype. You used your own local GPU to run the AI model, or just used the &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;free AI Studio tier&lt;/a&gt; to power your clever program. The app is ready, the world is ready, time to deploy your production instance! In the case of traditional, non-AI powered apps and services, the choice of deployment platform is based on personal preference: what you are familiar with, how much control over fine details you want to have, and so on. Cost is usually not the most important factor, as for a new service that is just starting to gain a userbase, the first usage bills won’t be that high anyway. The situation is different when it comes to running services that make use of AI. Here, you need to make two separate decisions. The first is how to deploy your application; this is the same as for a vanilla non-AI app. The second is how you are going to provision the AI capabilities. This decision will most likely be responsible for a big chunk of your bill and it shouldn’t be made without proper consideration. In this article, I will try to help you make the right decision for your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Serverless vs hosted inference service
&lt;/h2&gt;

&lt;p&gt;There are two ways of provisioning AI for a production-grade application: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless&lt;/strong&gt; - where you pay for the tokens your application sends and receives. This is sometimes called Model as a Service (MaaS). In Google Cloud, this approach is available in &lt;a href="https://cloud.google.com/vertex-ai?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt; and &lt;a href="https://ai.google.dev/gemini-api/docs?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google AI Studio (Gemini API)&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosted&lt;/strong&gt; - where you pay for the time you use the infrastructure running an LLM. In Google Cloud, this model is available through multiple services like: &lt;a href="https://cloud.google.com/products/compute?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Compute&lt;/a&gt; (through certain &lt;a href="https://docs.cloud.google.com/compute/docs/accelerator-optimized-machines?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;machine types&lt;/a&gt;), &lt;a href="https://cloud.google.com/vertex-ai?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt;, &lt;a href="https://cloud.google.com/kubernetes-engine?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GKE&lt;/a&gt; or &lt;a href="https://docs.cloud.google.com/run/docs/ai/overview?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on your situation, you may not have a choice between the two, because only one is feasible. For example, if you have to use one of the &lt;a href="https://ai.google.dev/gemini-api/docs/gemini-3?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini models&lt;/a&gt;, there’s no way to host it yourself and the MaaS (pay per token) approach is the only one available. Similarly, if you have to use a custom model that is not available as a service, you just have to go down the hosted path.&lt;/p&gt;

&lt;p&gt;In cases where you do have a choice between the two paths, you need to understand how they will affect your budget.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Serverless (pay per token)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Paying only for the tokens your application uses is a fair and easy to understand setup. It works exactly like any other paid service on Google Cloud - you pay for what you use. &lt;/p&gt;

&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;It scales to zero when you don’t use the AI
&lt;/li&gt;
&lt;li&gt;You don’t have to worry about scaling
&lt;/li&gt;
&lt;li&gt;Configuration and maintenance are extremely simple&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Less predictable for your budget
&lt;/li&gt;
&lt;li&gt;You may hit service quotas, either when your application experiences a rush hour or when you reach a total monthly usage cap
&lt;/li&gt;
&lt;li&gt;In case your application is hacked, your bill might skyrocket
&lt;/li&gt;
&lt;li&gt;Once your application gets popular, the bill will grow with your active userbase&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Hosted (pay per second)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Hosting an LLM on infrastructure that you pay for is extremely predictable cost-wise. As long as you know how long you are going to hold on to that GPU or &lt;a href="https://cloud.google.com/tpu?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;TPU&lt;/a&gt; accelerated instance, you know exactly how much you are going to pay.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Extremely predictable cost
&lt;/li&gt;
&lt;li&gt;Many ways to lower your bill: &lt;a href="https://docs.cloud.google.com/docs/cuds?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;CUDs&lt;/a&gt;, &lt;a href="https://cloud.google.com/solutions/spot-vms?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Spot Instances&lt;/a&gt;, choosing a cheaper zone or choosing the right &lt;a href="https://docs.cloud.google.com/compute/docs/accelerator-optimized-machines?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;instance and/or accelerator type&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;No quota on how many tokens your application consumes
&lt;/li&gt;
&lt;li&gt;Full control over hardware and software inference configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Big initial cost
&lt;/li&gt;
&lt;li&gt;Doesn’t scale as smoothly as serverless
&lt;/li&gt;
&lt;li&gt;Configuration and maintenance are more complicated&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A couple of considerations
&lt;/h2&gt;

&lt;p&gt;To help you out a bit further, here are some questions you should ask yourself before deciding on one of the deployment options.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much traffic do I expect?
&lt;/h3&gt;

&lt;p&gt;With low traffic, the choice is almost obvious - serverless is cheaper and easier. However, as your usage grows, the number of tokens consumed will add up to a considerable amount. In such a case, using a self-hosted solution might save you from unexpected bills at the end of the month.&lt;/p&gt;

&lt;h3&gt;
  
  
  Am I legally bound to keep user data in a certain region?
&lt;/h3&gt;

&lt;p&gt;In some cases, like with medical or financial data, you might be required by local regulations or your own contracts to ensure your user data doesn’t leave a certain location, or is not sent to a service you don’t control. In such a situation, self-hosting an AI model may be the only possible solution, no matter the cost effectiveness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Am I likely to hit the hourly/monthly quota?
&lt;/h3&gt;

&lt;p&gt;All API services have usage quotas, and AI services are no exception. If you expect your application may reach these quotas, it’s a big hint that you should consider self-hosting your model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mixed approach
&lt;/h2&gt;

&lt;p&gt;It is also worth noting that you don’t have to limit your architecture to using only one AI model with one deployment option. Imagine your application offers multiple AI-powered features - some of them might be simple enough for a small model to handle, while others require the full power of Gemini. It is perfectly fine to have, for example, &lt;a href="https://ai.google.dev/gemma/docs/core?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemma 3&lt;/a&gt; running on a VM, handling the easier tasks, while you delegate the harder or bigger tasks to the Gemini API.&lt;/p&gt;
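&lt;p&gt;A sketch of such routing logic, where the task names, endpoint URLs, and the complexity heuristic are all illustrative assumptions rather than real APIs:&lt;/p&gt;

```python
# Hypothetical task router: simple tasks go to a self-hosted Gemma
# endpoint, everything else to the Gemini API. All names and URLs
# below are illustrative placeholders.

SIMPLE_TASKS = {"summarize", "classify", "extract_keywords"}

GEMMA_ENDPOINT = "http://gemma-vm.internal:8000/v1/chat/completions"
GEMINI_ENDPOINT = "https://generativelanguage.googleapis.com/v1beta"

def pick_endpoint(task: str) -> str:
    """Route a task to the cheaper self-hosted model when it qualifies."""
    return GEMMA_ENDPOINT if task in SIMPLE_TASKS else GEMINI_ENDPOINT
```

Keeping this decision in one place makes it cheap to change the split between self-hosted and serverless later.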

&lt;h2&gt;
  
  
  This is not an irrevocable decision
&lt;/h2&gt;

&lt;p&gt;Even after careful consideration, the decision might still not be a simple one, especially if you’re starting with a new idea and simply don’t know how popular it’ll get. Luckily, with a well-architected application, it is not that difficult to prepare for changing the AI API endpoint. It’s reasonable to start with a serverless solution, where you will often make great use of the fact that no traffic = zero cost. Once your application takes off and the Vertex AI or AI Studio bill reaches levels comparable to running a self-hosted model, you should reevaluate your situation and perhaps switch to the more predictable approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep up!
&lt;/h2&gt;

&lt;p&gt;The ecosystem of AI is changing at a rapid pace and it’s important to stay up to date with all the latest news. Follow the official &lt;a href="https://cloud.google.com/blog?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud blog&lt;/a&gt;, &lt;a href="https://developers.googleblog.com/?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Developers blog&lt;/a&gt; and &lt;a href="https://www.youtube.com/@googlecloudtech" rel="noopener noreferrer"&gt;Google Cloud Tech YouTube channel&lt;/a&gt; to not miss any updates!&lt;/p&gt;

&lt;p&gt;P.S. Did you know that Google Cloud now offers &lt;a href="https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/?utm_campaign=CDR_0x73f0e2c4_default_b485824284&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Developer Knowledge API and MCP server&lt;/a&gt; that can give your AI Agents access to always up-to-date knowledge straight from the official Google Cloud, Firebase and Android documentation?!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gcp</category>
      <category>vertexai</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Making Sure Your Prompt Will Be There For You When You Need It</title>
      <dc:creator>Shawn Jones</dc:creator>
      <pubDate>Tue, 10 Mar 2026 17:44:00 +0000</pubDate>
      <link>https://dev.to/googlecloud/making-sure-your-prompt-will-be-there-for-you-when-you-need-it-lk7</link>
      <guid>https://dev.to/googlecloud/making-sure-your-prompt-will-be-there-for-you-when-you-need-it-lk7</guid>
      <description>&lt;p&gt;At Google, our team (Google Cloud Samples) &lt;a href="https://adamross.dev/p/prompting-for-production/" rel="noopener noreferrer"&gt;uses Gemini to produce thousands of samples&lt;/a&gt; in batches. In doing so, we've learned that the biggest hurdle isn't the AI, it's our own expectations about these tools. As developers, we are wired for &lt;a href="https://en.wikipedia.org/wiki/Deterministic_system" rel="noopener noreferrer"&gt;deterministic&lt;/a&gt; systems: we call a function and it produces the same result for the same input every time. This predictability allows for standard unit tests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/ai/llms" rel="noopener noreferrer"&gt;Large Language Models&lt;/a&gt; (LLMs) however, are &lt;a href="https://medium.com/@raj-srivastava/the-great-llm-debate-are-they-probabilistic-or-stochastic-3d1cd975994b" rel="noopener noreferrer"&gt;probabilistic&lt;/a&gt; and stochastic. They don't store facts; they store the likelihood of patterns and use a "sophisticated roll of the dice" to choose the next token. This is why the same prompt can yield a “&lt;a href="https://www.wsj.com/tech/ai/how-the-sparkles-emoji-became-the-symbol-of-our-ai-future-e7786eef" rel="noopener noreferrer"&gt;sparkly&lt;/a&gt;” ✨ success one minute and a hallucination 🤪 the next. You aren't just testing code anymore; you are forecasting the weather of your system. To move to production, we must build containment structures (like quality gates and evaluators) that make the unpredictability manageable.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLMs Can Make Mistakes
&lt;/h2&gt;

&lt;p&gt;Trying to make samples in large batches is different from asking for a single sample from a tool like &lt;a href="https://github.com/google-gemini/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;. When producing many samples at once, we see more mistakes because the statistics catch up with us. A small percentage of bad samples becomes a large absolute number as the total sample count grows, not unlike defect rates in manufacturing. Here are some examples of mistakes.&lt;/p&gt;

&lt;p&gt;Sometimes we detect code with syntax issues, like the &lt;code&gt;def def&lt;/code&gt; snippet below. Python uses only one &lt;code&gt;def&lt;/code&gt; keyword to specify the start of a &lt;a href="https://docs.python.org/3/tutorial/controlflow.html#defining-functions" rel="noopener noreferrer"&gt;function definition&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_secret_with_expiration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;secret_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
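&lt;p&gt;A minimal syntax gate for generated Python can be sketched with the standard library’s &lt;code&gt;ast&lt;/code&gt; module; the snippets here are illustrative stand-ins for pipeline output:&lt;/p&gt;

```python
import ast

def passes_syntax_gate(source: str) -> bool:
    """Return True if a generated Python snippet parses cleanly."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# The doubled keyword from the sample above fails the gate;
# the corrected version passes.
bad = "def def create_secret_with_expiration(project_id: str):\n    pass\n"
good = "def create_secret_with_expiration(project_id: str):\n    pass\n"
```

A gate like this lets the pipeline regenerate a sample automatically instead of surfacing it for manual review.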



&lt;p&gt;Syntax issues like this can be detected with linting or other build tools. If we detect them in our pipeline, we can just regenerate the sample. Other times the issues are more subtle, like how the JSDoc below is 7 lines away from the function it is documenting, separated from it by a &lt;code&gt;'use strict'&lt;/code&gt; directive, imports, and an object instantiation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/**
 * Get secret metadata.
 *
 * @param projectId Google Cloud Project ID (such as 'example-project-id')
 * @param secretId ID of the secret to retrieve (such as 'my-secret-id')
 */&lt;/span&gt;
&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;use strict&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;SecretManagerServiceClient&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@google-cloud/secret-manager&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@grpc/grpc-js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SecretManagerServiceClient&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getSecretMetadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secretId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or other times the docstring is incorrect, like how the docstring below is missing parameters used by the function it documents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_secret_with_notifications&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;secret_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create Secret with Pub/Sub Notifications. Creates a new secret resource
    configured to send notifications to Pub/Sub topics. This enables external
    systems to react to secret lifecycle events.

    Args:
        project_id: The Google Cloud project ID. for example,
            &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;example-project-id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
        location: The location of the resource. for example, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Issues don’t always show up directly in code, either. We have Gemini generating build artifacts, like &lt;code&gt;package.json&lt;/code&gt;. In the case below, it was so eager to include the &lt;a href="https://grpc.io/" rel="noopener noreferrer"&gt;gRPC&lt;/a&gt; package that it listed the package three times under different names, including one (&lt;code&gt;grpc&lt;/code&gt;) that has been deprecated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"example"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"private"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Google Cloud Platform Code Samples 🎒"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@google-cloud/secret-manager"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@grpc/grpc-js"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@grpc/grpc-js"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"^1.10.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"grpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"latest"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"test"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node --test"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have other, more subtle issues as well. Sometimes the code is correct, but not saved with the correct filename or in the correct folder structure. Issues like these lead to more manual evaluation and testing. By iterating on prompts with evaluation, we have improved our results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt Templates as Functional Interfaces
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdgujtput47i31wr7kmw.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffdgujtput47i31wr7kmw.webp" alt="The LLM alone is not the function. The input data, the prompt template, and the LLM together create your response." width="515" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quality responses are guided by the three elements shown above: the input data, a &lt;strong&gt;prompt template&lt;/strong&gt;, and the LLM itself. As part of &lt;a href="https://adamross.dev/p/prompting-for-production/" rel="noopener noreferrer"&gt;prompting for production&lt;/a&gt;, we’re evaluating prompt templates, like those created with the &lt;a href="https://github.com/google/dotprompt" rel="noopener noreferrer"&gt;dotprompt&lt;/a&gt; format. Below is a very simple example of a prompt template in dotprompt. Using a prompt template, we can reuse the same prompt text over and over with different inputs. Prompt templates give us a functional interface for interacting with the LLM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;
&lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;need&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
    &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
&lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="s"&gt;Generate code that satisfies the need of {{ need }} using language {{ language }}.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By using templates, we can run the same logic across hundreds of different inputs to see where the "weather" changes.&lt;/p&gt;
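&lt;p&gt;A minimal sketch of that "same logic, different inputs" idea, as plain text substitution. The &lt;code&gt;renderPrompt&lt;/code&gt; helper below is hypothetical – it is not the dotprompt API – and only illustrates how one template fans out across many inputs.&lt;/p&gt;

```javascript
// Hypothetical helper: substitute {{ placeholders }} in a mustache-style
// template body, like the dotprompt example above. Not the dotprompt API.
function renderPrompt(template, input) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, key) => {
    if (!(key in input)) throw new Error(`Missing input: ${key}`);
    return input[key];
  });
}

const template =
  "Generate code that satisfies the need of {{ need }} using language {{ language }}.";

// The same template behaves like a function: same text, different inputs.
console.log(renderPrompt(template, { need: "list log entries", language: "JavaScript" }));
console.log(renderPrompt(template, { need: "read a secret", language: "Go" }));
```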

&lt;p&gt;We've found that a successful workflow follows these phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build a Foundation with Ground Truth
&lt;/li&gt;
&lt;li&gt;Find Your Candidate Prompt (Vibe Check)
&lt;/li&gt;
&lt;li&gt;Run Statistical Trials – Because Unit Tests Alone Don’t Work&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
&lt;strong&gt;Phase 1: Build a Foundation with Ground Truth&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In the prompt template world, the template is only part of the picture. We need the input values as well. We also need the matching expected output values. You may say “&lt;em&gt;But this sounds like unit testing!&lt;/em&gt;” and you would be right; it is a similar idea. The amount of testing data you need depends on what question you want to answer. If your question boils down to “&lt;em&gt;Is the prompt template bad?&lt;/em&gt;” then 5-10 records of input/output test data are enough. This will help you eliminate a bad prompt template quickly. If your question is more “&lt;em&gt;Will my prompt template work well?&lt;/em&gt;” then you need &lt;a href="https://developers.openai.com/api/docs/guides/supervised-fine-tuning" rel="noopener noreferrer"&gt;50&lt;/a&gt; - &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/eval-python-sdk/evaluation-dataset" rel="noopener noreferrer"&gt;100&lt;/a&gt; records. The more edge cases you can insert into your test data, the better.&lt;/p&gt;

&lt;p&gt;Fortunately, we have a golden set of samples we can use as known good testing data. We continue to iterate on our test data while also adding more samples to it.&lt;/p&gt;
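&lt;p&gt;As an illustration, a ground-truth record simply pairs template inputs with a known-good expected output. The records below are invented examples, not from our real golden set.&lt;/p&gt;

```javascript
// Illustrative ground-truth records: each pairs the template's inputs with
// a known-good expected output, much like a unit-test fixture.
const groundTruth = [
  {
    input: { need: "print a greeting", language: "JavaScript" },
    expected: 'console.log("Hello, world!");',
  },
  {
    input: { need: "print a greeting", language: "Python" },
    expected: 'print("Hello, world!")',
  },
  // 5-10 records can rule out a bad template; 50-100 build real confidence.
];

// Sanity-check that every record is complete before any evaluation run.
const complete = groundTruth.every((r) => r.input && r.expected);
console.log(`records: ${groundTruth.length}, complete: ${complete}`);
```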

&lt;h3&gt;
  
  
&lt;strong&gt;Phase 2: Find Your Candidate Prompt (Vibe Check)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before you share a prompt with your team, start experimenting by using a tool like &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt; to develop some handmade prompts. Try them with different inputs and outputs. Build an intuition for what works and what doesn’t. Use Gemini to help in your evaluation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36v5coavg9h05tkugqca.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36v5coavg9h05tkugqca.webp" alt="AI Studio can be a useful tool for developing prompts." width="800" height="655"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI Studio’s playground can be very helpful at this stage, including generating structured outputs that help plan the output schema in our dotprompt file. When you feel good about your results, you have anecdotal evidence that your prompt template might work, but not statistical evidence.&lt;/p&gt;

&lt;h3&gt;
  
  
&lt;strong&gt;Phase 3: Run Statistical Trials – Because Unit Tests Alone Don’t Work&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Does your candidate prompt template work with many different inputs? This is where things get more complex and we move from familiar deterministic unit testing to probabilistic testing. Because the LLM can answer differently each time, we need to run multiple trials for each input/output test record. But how many is enough? In &lt;a href="https://doi.org/10.18653/v1/2025.aisd-main.6" rel="noopener noreferrer"&gt;recent academic work&lt;/a&gt;, my previous team ran as many as 128 trials per input/output pair for better statistical relevance, but this gets expensive fast. To balance cost, time, and effort, the community consensus is either &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/configure-judge-model" rel="noopener noreferrer"&gt;four&lt;/a&gt; or &lt;a href="https://arxiv.org/html/2502.06233v1" rel="noopener noreferrer"&gt;five&lt;/a&gt; trials &lt;em&gt;per input/output test record&lt;/em&gt;. The argument for five over four is that an odd number can “break ties.”&lt;/p&gt;

&lt;p&gt;But how do you know whether your prompt’s output is working well? Use a deterministic metric. In the case of code samples, we build the code, lint it, and apply other static analysis tools, all of which provide deterministic review and feedback. Finally, once we have something that passes those quality gates, we perform manual testing and human review. With this many quality gates and a large number of samples, we can begin to rely on the &lt;a href="https://www.probabilitycourse.com/chapter7/7_1_1_law_of_large_numbers.php" rel="noopener noreferrer"&gt;Law of Large Numbers&lt;/a&gt; to determine whether a prompt template is working, without worrying about running four or five trials per sample.&lt;/p&gt;
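&lt;p&gt;The trials-plus-deterministic-gates loop can be sketched as follows. Here &lt;code&gt;generateCode&lt;/code&gt; stands in for the real LLM call and &lt;code&gt;passesGates&lt;/code&gt; for our build/lint/static-analysis checks; both are hypothetical placeholders.&lt;/p&gt;

```javascript
// Run N trials per ground-truth record and score with a deterministic gate.
// generateCode and passesGates are hypothetical stand-ins for the LLM call
// and the build/lint/static-analysis quality gates.
function evaluateRecord(generateCode, passesGates, record, trials = 5) {
  let passes = 0;
  for (let i = 0; i < trials; i++) {
    // Each trial may return a different response from the model.
    const output = generateCode(record.input);
    if (passesGates(output, record.expected)) passes++;
  }
  return passes / trials; // pass rate for this input/output record
}

// Demo with a fake "model" that fails every third call.
let call = 0;
const fakeModel = () => (++call % 3 === 0 ? "bad output" : "good output");
const gate = (output) => output === "good output";

const passRate = evaluateRecord(fakeModel, gate, { input: {}, expected: "good output" }, 6);
console.log(`pass rate: ${passRate}`); // 4 of 6 trials pass
```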

&lt;h2&gt;
  
  
  Embracing Statistical Techniques For The Best Performance
&lt;/h2&gt;

&lt;p&gt;Beyond prompt templates, we can evaluate other parts of our workflow. The scenarios below show how we can vary one element of the workflow (change) while holding the others constant (freeze). Each scenario starts with the question we want to answer, then lists which elements to change and which to freeze.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How well does my new prompt template work?

&lt;ol&gt;
&lt;li&gt;Change: prompt template
&lt;/li&gt;
&lt;li&gt;Freeze: model, hyperparameters, ground truth input and output
&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;How does a different model or model version affect the results?

&lt;ol&gt;
&lt;li&gt;Change: model
&lt;/li&gt;
&lt;li&gt;Freeze: hyperparameters, ground truth input and output, prompt template
&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Is a new input value a useful addition to the ground truth?

&lt;ol&gt;
&lt;li&gt;Change: input value
&lt;/li&gt;
&lt;li&gt;Freeze: model, hyperparameters, ground truth output, prompt template
&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Is a new output value a useful addition to the ground truth?

&lt;ol&gt;
&lt;li&gt;Change: output value
&lt;/li&gt;
&lt;li&gt;Freeze: model, hyperparameters, ground truth input, prompt template
&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;How does changing the &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values" rel="noopener noreferrer"&gt;hyperparameter&lt;/a&gt; values affect the results?

&lt;ol&gt;
&lt;li&gt;Change: hyperparameter value
&lt;/li&gt;
&lt;li&gt;Freeze: model, ground truth input and output, prompt template&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;Say a new model version is released and we have results from testing the previous model. We can keep the hyperparameters, ground truth, and prompt template the same as before. Then we change the model in the dotprompt file and rerun our evaluation. Now we have data to decide if we want to use the new model version. Likewise, we can alter the other items in the list above to answer other questions.&lt;/p&gt;
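&lt;p&gt;The change-one, freeze-the-rest discipline can be sketched as an experiment that overrides exactly one element of a frozen baseline. The field names and model versions below are illustrative only.&lt;/p&gt;

```javascript
// A frozen baseline configuration; names and values are illustrative.
const baseline = {
  model: "gemini-3-flash-preview",
  hyperparameters: { temperature: 1.0 },
  promptTemplate: "v1",
  groundTruth: "golden-set-2026-01",
};

// Build an experiment that changes exactly one element of the baseline,
// so any difference in results can be attributed to that element.
function makeExperiment(baseline, change) {
  if (Object.keys(change).length !== 1) {
    throw new Error("Change exactly one element per experiment");
  }
  return { ...baseline, ...change };
}

// Question 2: how does a different model version affect the results?
const experiment = makeExperiment(baseline, { model: "gemini-3.1-flash" });
console.log(experiment.model, experiment.promptTemplate); // template stays frozen
```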

&lt;p&gt;We might be able to sidestep the statistical testing by forcing Gemini to behave more deterministically. We could set its hyperparameters to their most deterministic values – &lt;em&gt;temperature&lt;/em&gt; at 0, &lt;em&gt;top-k&lt;/em&gt; at 1, &lt;em&gt;top-p&lt;/em&gt; at 0 – or use the same &lt;em&gt;seed&lt;/em&gt; value every time. This creates its own issues, and it does not rid us of the need for testing. What if a given prompt’s deterministic response is incorrect every time? How do we automatically correct things for which there are no deterministic tools? We want some degree of creativity and stochasticity in the model’s responses, and we want the option of running the generation again with a chance of getting a better response. We embrace this power, but we also need to be more statistics-minded about our testing to make sure our prompts are there for us when we need them.&lt;/p&gt;
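&lt;p&gt;For illustration, such settings might look like the following in a dotprompt front matter. Parameter names and supported ranges vary by model and provider, and &lt;em&gt;seed&lt;/em&gt; support in particular depends on the model, so treat this as a sketch and check your model’s documentation.&lt;/p&gt;

```yaml
---
model: gemini-3-flash-preview
config:
  # Most deterministic settings: these narrow, but do not remove,
  # the need for statistical testing.
  temperature: 0
  topK: 1
  topP: 0
  seed: 42   # may not be supported by every model/provider
---
```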

&lt;h2&gt;
  
  
  Join the Conversation
&lt;/h2&gt;

&lt;p&gt;I’m curious about what others are doing to help evaluate their prompts and prompt templates.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are you just starting out? How do you do your vibe checks? How do you test before shipping?
&lt;/li&gt;
&lt;li&gt;Have you been evaluating prompts for a while? How many times do you evaluate a prompt template before putting it into production? How do you keep time and cost down?
&lt;/li&gt;
&lt;li&gt;What recommendations do you follow when testing prompts? Do you have sources to share? Can we do this better?
&lt;/li&gt;
&lt;li&gt;What workflows have you found to work?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please share in the comments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Read More
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A paper on the “Budget 5”: &lt;a href="https://arxiv.org/html/2502.06233v1" rel="noopener noreferrer"&gt;Confidence Improves Self-Consistency in LLMs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/eval-python-sdk/evaluation-dataset#best-practices" rel="noopener noreferrer"&gt;Vertex AI’s advice on evaluation datasets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic’s &lt;a href="https://www.anthropic.com/engineering/writing-tools-for-agents" rel="noopener noreferrer"&gt;&lt;em&gt;Writing Effective tools for AI agents&lt;/em&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Stanford’s and UC Santa Barbara’s &lt;a href="https://web.stanford.edu/~jurafsky/pubs/2020.emnlp-main.745.pdf" rel="noopener noreferrer"&gt;&lt;em&gt;With Little Power Comes Great Responsibility&lt;/em&gt;&lt;/a&gt; about how many NLP studies are underpowered in terms of statistical testing
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/7-technical-takeaways-from-using-gemini-to-generate-code-samples-at-scale" rel="noopener noreferrer"&gt;7 Technical Takeaways from Using Gemini to Generate Code Samples at Scale&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://adamross.dev/p/prompting-for-production/" rel="noopener noreferrer"&gt;How My Team Aligns on Prompting for Production&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to &lt;a href="https://dev.to/sigje"&gt;Jennifer Davis&lt;/a&gt;, &lt;a href="https://adamross.dev/" rel="noopener noreferrer"&gt;Adam Ross&lt;/a&gt;, &lt;a href="https://nim.emuxo.com/" rel="noopener noreferrer"&gt;Nim Jayawardena&lt;/a&gt;, and &lt;a href="https://glasnt.com/" rel="noopener noreferrer"&gt;Katie McLaughlin&lt;/a&gt; for feedback on this post.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>softwareengineering</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>How My Team Aligns on Prompting for Production</title>
      <dc:creator>Adam Ross</dc:creator>
      <pubDate>Tue, 17 Feb 2026 16:53:46 +0000</pubDate>
      <link>https://dev.to/googlecloud/how-my-team-aligns-on-prompting-for-production-1lpf</link>
      <guid>https://dev.to/googlecloud/how-my-team-aligns-on-prompting-for-production-1lpf</guid>
      <description>&lt;p&gt;My team at Google is &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/7-technical-takeaways-from-using-gemini-to-generate-code-samples-at-scale" rel="noopener noreferrer"&gt;automating sample code generation and maintenance&lt;/a&gt;. Part of that is using Generative AI to produce and assess instructional code. This introduces a challenge: How do we trust the system to meet our specific standards, when core components are non-deterministic?&lt;/p&gt;

&lt;p&gt;Establishing trust requires isolating and understanding each &lt;a href="https://cloud.google.com/ai/llms" rel="noopener noreferrer"&gt;large language model (LLM)&lt;/a&gt; request. We need to know exactly what goes into the model, and a guarantee of what comes out.&lt;/p&gt;

&lt;p&gt;This challenge isn't different from other feature development. To succeed, we realized we had to stop treating prompting like chatting or guessing and start treating it like coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Natural Language is a Fuzzy Programming Language
&lt;/h2&gt;

&lt;p&gt;Prompting an LLM is effectively "&lt;a href="https://ai.google.dev/gemini-api/docs/prompting-strategies" rel="noopener noreferrer"&gt;natural language programming&lt;/a&gt;": we are programming in English. The problem is that English is not the greatest language for programming. It is ambiguous. It is subjective. It is open to interpretation.&lt;/p&gt;

&lt;p&gt;In C++, a missing semicolon breaks the build. In English, a missing comma changes the objective entirely: &lt;a href="https://en.wikipedia.org/wiki/Vocative_case" rel="noopener noreferrer"&gt;&lt;em&gt;"I don't know, John"&lt;/em&gt; becomes &lt;em&gt;"I don't know John"&lt;/em&gt;&lt;/a&gt;. In a programming language, syntax is binary; it works or it doesn't. In English, the difference between "Ensure variables are immutable" and "Make sure variables never change" might yield different results based on the model’s training data.&lt;/p&gt;

&lt;p&gt;When you combine the fuzziness of human language with the "black box" probabilistic processing of an LLM, you face a difficult question: &lt;em&gt;What is the weather going to be today in the land of AI?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To answer that, you have to make the intentions behind your prompts explicit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Be Efficient with Brains (Pair &amp;amp; Review)
&lt;/h2&gt;

&lt;p&gt;Writing a prompt is an exploratory process of finding words that trigger the best response. However, a single writer is limited by their own understanding. This is risky with LLMs, which are suited for ambiguous problems but require strict guardrails.&lt;/p&gt;

&lt;p&gt;Relying on one person to do this creates blind spots. We found that prompt quality benefits from &lt;a href="https://martinfowler.com/articles/on-pair-programming.html" rel="noopener noreferrer"&gt;pairing&lt;/a&gt;. A diversity of thought helps create a more complete definition of any problem. What one engineer considers a clear instruction, another might see as a loophole. Pairing covers the gaps that a single brain might miss.&lt;/p&gt;

&lt;p&gt;Furthermore, you should review every prompt. This isn't just checking for typos; it’s a logic check. Does this prompt align with business requirements? &lt;strong&gt;We’ve found that prompt reviews often uncover disagreements about the requirements themselves, forcing us to align as a team before we ship.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Document "The Why" and manage change
&lt;/h2&gt;

&lt;p&gt;Because English is fuzzy, the intent behind a specific word choice isn't always obvious. Why did we use the passive voice here? Why did we specify "immutable"?&lt;/p&gt;

&lt;p&gt;Even well-structured prompts can eventually obscure the original business requirements. As optimizations blur into the general text, we must take every opportunity to document "the why":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Documentation:&lt;/strong&gt; Avoid relying on prompts as canonical business requirements. Our LLM requests are a combination of system instructions, user input, context, and deterministic post-processing; the prompt alone is not enough to onboard a developer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Comments:&lt;/strong&gt; Comment on complex prompts just as you would complex code. Spotlight specific constraints or even punctuation to explain the problem they solve. The model is a moving target, so any unintentional changes can make troubleshooting hard.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Commit Messages:&lt;/strong&gt; Use commit messages as an opportunity to explain what was wrong with the prompt. (for example, &lt;code&gt;fixed: Missing comma lost John&lt;/code&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Separation of Concerns (Use Dedicated Files)
&lt;/h2&gt;

&lt;p&gt;Writing code and writing prompts require distinct mindsets. One focuses on syntax and execution flow; the other on semantics and intent. &lt;a href="https://medium.com/@rexnino/what-is-separation-of-concerns-in-coding-and-why-is-it-important-731aa8cfa898" rel="noopener noreferrer"&gt;Embedding long English instructions inside code creates a distraction&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We keep prompts in dedicated files to disentangle application logic from the LLM interaction configuration, which requires frequent tuning.&lt;/p&gt;

&lt;p&gt;By treating the prompt as a standalone component, we can prototype and iterate on the LLM behavior independent of the application's control flow. &lt;a href="https://google.github.io/dotprompt/" rel="noopener noreferrer"&gt;Tools like dotprompt&lt;/a&gt; allow us to treat these files as first-class artifacts containing text, model parameters, and schema definitions. This highlights that invoking a model isn't just a function call; it’s an integration with a distinct system that requires its own configuration.&lt;/p&gt;
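&lt;p&gt;To make the "prompt as a standalone component" idea concrete, here is a toy sketch of loading a front-matter prompt file as data rather than embedding the English in code. The parser is deliberately simplistic and hypothetical; real projects would use the dotprompt library itself.&lt;/p&gt;

```javascript
// Toy parser for a front-matter prompt file (illustrative only; use the
// dotprompt library in real projects). The prompt is data the app loads,
// not a string constant buried in application logic.
function parsePromptFile(text) {
  const match = text.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) throw new Error("Not a front-matter prompt file");
  return { frontMatter: match[1], body: match[2].trim() };
}

// Stand-in for reading a .prompt file from disk.
const file = [
  "---",
  "model: gemini-3-flash-preview",
  "---",
  "",
  "Generate code that satisfies the need of {{ need }} using language {{ language }}.",
].join("\n");

const prompt = parsePromptFile(file);
console.log(prompt.frontMatter);
console.log(prompt.body);
```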

&lt;h2&gt;
  
  
  Use Structured Output
&lt;/h2&gt;

&lt;p&gt;To build a reliable tool, you need a bridge between unpredictable LLM output and deterministic computers.&lt;/p&gt;

&lt;p&gt;We rely on &lt;a href="https://genkit.dev/docs/models/#structured-output" rel="noopener noreferrer"&gt;structured output&lt;/a&gt;&lt;sup id="fnref1"&gt;1&lt;/sup&gt; to guide the model to emit JSON according to a schema. Even if we only need a single field, defining a schema provides a guardrail that helps the model output conform to a shape we can validate programmatically. This is critical for code generation, where models often add unwanted preambles, conversational filler, or inconsistent markdown fences.&lt;/p&gt;

&lt;p&gt;If the output doesn't match the schema, we fail fast or retry. This allows us to integrate the LLM output into our process with the same confidence we have in detecting a bad API response.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Magic to Engineering
&lt;/h2&gt;

&lt;p&gt;Moving from one successful prompt to a reliable system requires acknowledging that prompts are code. You need to manage, review, and test them with the same rigor applied to the rest of your stack. While we are still working on better ways to &lt;a href="https://www.statsig.com/perspectives/what-are-non-deterministic-ai-outputs-" rel="noopener noreferrer"&gt;benchmark quality&lt;/a&gt;, treating our prompts as first-class codebase assets is our first step towards building confidence in our AI-assisted automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the conversation
&lt;/h2&gt;

&lt;p&gt;I'm curious how you are handling the fuzzy nature of LLMs in production.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How does your team review and test prompts?
&lt;/li&gt;
&lt;li&gt;Do you treat prompts as configuration, code, or something else entirely?
&lt;/li&gt;
&lt;li&gt;Share your workflow in the comments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Read More
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/googlecloud/the-lumberjack-paradox-from-theory-to-practice-2lb5"&gt;The lumberjack paradox: From theory to practice&lt;/a&gt; by Jennifer Davis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/google-cloud/google-sandwich-manager-and-the-hallucinated-sdk-6bed653e6318" rel="noopener noreferrer"&gt;Google Sandwich Manager, and the hallucinated SDK&lt;/a&gt; by Katie McLaughlin and Brian Dorsey&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Thanks to &lt;a href="https://dev.to/sigje"&gt;Jennifer Davis&lt;/a&gt; &amp;amp; &lt;a href="https://shawnmjones.org/" rel="noopener noreferrer"&gt;Shawn Jones&lt;/a&gt; for review and contributions.&lt;br&gt;
Cover Photo by &lt;a href="https://unsplash.com/@jeton7?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Jeton Bajrami&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/a-group-of-people-rowing-a-boat-in-the-water-d4e2mitxgsE?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;This links to how structured output works in the Genkit framework, which my team is using; it provides a succinct example. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Why your `curl` logic just bit you 🐾</title>
      <dc:creator>Jennifer Davis</dc:creator>
      <pubDate>Mon, 09 Feb 2026 05:01:07 +0000</pubDate>
      <link>https://dev.to/googlecloud/why-your-curl-logic-just-bit-you-5gk1</link>
      <guid>https://dev.to/googlecloud/why-your-curl-logic-just-bit-you-5gk1</guid>
<description>&lt;p&gt;It’s a common strategy to test a new API directly with &lt;a href="https://curl.se/docs/manual.html" rel="noopener noreferrer"&gt;curl&lt;/a&gt;. It feels intuitive, fast, and removes the overhead of a language runtime. For example, if you are testing out the &lt;a href="https://cloud.google.com/logging/docs/reference/rest?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;Google Cloud Logging API&lt;/a&gt;, you might start with a simple request to list logs from a &lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/learn/containers?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;Kubernetes container&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud auth print-access-token&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  https://logging.googleapis.com/v2/entries:list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "resourceNames": ["projects/your-project-id"],
    "filter": "resource.type=\"k8s_container\""
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The JSON returns perfectly. But then, you move that logic into a &lt;a href="https://nodejs.org/en/docs/" rel="noopener noreferrer"&gt;Node.js application&lt;/a&gt; using the official &lt;a href="https://cloud.google.com/nodejs/docs/reference/logging/latest?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;&lt;code&gt;@google-cloud/logging&lt;/code&gt; library&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Logging&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@google-cloud/logging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;readLogsAsync&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Common approach: Initializing the client without explicit auth&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logging&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Logging&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntries&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;resource.type="k8s_container"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;resourceNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;projects/your-project-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nextPageToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of a sudden, the code fails with a cryptic error: &lt;a href="https://cloud.google.com/docs/authentication/troubleshoot-adc?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;&lt;code&gt;Error: Could not load the default credentials&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This happens because of a disconnect between how the &lt;a href="https://cloud.google.com/sdk/gcloud?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;&lt;code&gt;gcloud&lt;/code&gt; CLI&lt;/a&gt; manages session tokens and how the Google Cloud client libraries search for credentials.&lt;/p&gt;

&lt;p&gt;While your &lt;code&gt;curl&lt;/code&gt; command relies on the &lt;a href="https://cloud.google.com/docs/authentication/production?utm_campaign=CDR_0x0d701af0_default_b482487387#obtaining_and_providing_service_account_credentials_manually" rel="noopener noreferrer"&gt;&lt;strong&gt;explicit token&lt;/strong&gt;&lt;/a&gt; you provided via &lt;a href="https://cloud.google.com/sdk/gcloud/reference/auth/print-access-token?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;&lt;code&gt;gcloud auth print-access-token&lt;/code&gt;&lt;/a&gt;, the client libraries are designed to look for &lt;a href="https://cloud.google.com/docs/authentication/application-default-credentials?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;&lt;strong&gt;Application Default Credentials (ADC)&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Running &lt;a href="https://cloud.google.com/sdk/gcloud/reference/auth/login?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;&lt;code&gt;gcloud auth login&lt;/code&gt;&lt;/a&gt; authenticates, but it does not create the specific credential file that the Node.js library requires to run locally.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤫 Authenticate for local development
&lt;/h2&gt;

&lt;p&gt;The most efficient way to solve this is to provide the library with the credentials it is looking for. Run the &lt;a href="https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;&lt;code&gt;gcloud auth application-default login&lt;/code&gt;&lt;/a&gt; command in your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud auth application-default login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a browser window to authorize your account and saves a JSON file to your local configuration folder. Once this is done, your Node.js code will automatically find these credentials—no code changes required.&lt;/p&gt;
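&lt;p&gt;You can check where the libraries will look. The &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; environment variable takes precedence; otherwise, on Linux and macOS, the well-known file lives under your gcloud configuration directory.&lt;br&gt;
&lt;/p&gt;

```shell
# ADC search order: the GOOGLE_APPLICATION_CREDENTIALS env var wins;
# otherwise the libraries fall back to gcloud's well-known file.
ADC_FILE="${GOOGLE_APPLICATION_CREDENTIALS:-$HOME/.config/gcloud/application_default_credentials.json}"
echo "Client libraries will read credentials from: ${ADC_FILE}"
```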

&lt;h2&gt;
  
  
  🏡 Moving to Production
&lt;/h2&gt;

&lt;p&gt;Once you get your local environment running, you need to think about "productionizing" your code. Hardcoded project IDs and lack of &lt;a href="https://cloud.google.com/apis/design/errors?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;robust error handling&lt;/a&gt; are common causes of production outages.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🛡️ Robust Error Handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In production, your app should handle permission issues or network timeouts gracefully.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEntries&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;resource.type="k8s_container"&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;resourceNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;`projects/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;pageSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Permission Denied: Check your Service Account roles.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unexpected Logging Error:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;🎭 Service Accounts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In production, you shouldn't use your personal user credentials. Use a Service Account with the &lt;a href="https://cloud.google.com/logging/docs/access-control?utm_campaign=CDR_0x0d701af0_default_b482487387#logging.viewer" rel="noopener noreferrer"&gt;"Logs Viewer" role&lt;/a&gt;.&lt;/p&gt;
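&lt;p&gt;Provisioning that service account takes two &lt;code&gt;gcloud&lt;/code&gt; commands (the project and account names here are placeholders):&lt;br&gt;
&lt;/p&gt;

```shell
# Placeholder names -- replace with your own project and account.
PROJECT_ID="your-project-id"
SA_NAME="log-reader"

# Create a dedicated service account for the app.
gcloud iam service-accounts create "${SA_NAME}" \
  --project="${PROJECT_ID}" \
  --display-name="Log reader for the Node.js app"

# Grant it only the Logs Viewer role (least privilege).
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/logging.viewer"
```

&lt;p&gt;Granting only &lt;code&gt;roles/logging.viewer&lt;/code&gt; keeps the blast radius small if the credentials ever leak.&lt;/p&gt;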

&lt;ul&gt;
&lt;li&gt;🌍 Environment Variables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid hardcoding project IDs. Use an environment variable to make your code portable across staging and production environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PROJECT_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-project-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;readLogsAsync&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logging&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Logging&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PROJECT_ID&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Structured Logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of just reading logs, think about how you write them. Using structured JSON logs makes them much easier to query later in the Cloud Logging console.&lt;/p&gt;

&lt;p&gt;If you are running on &lt;a href="https://cloud.google.com/kubernetes-engine?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;Google Kubernetes Engine (GKE)&lt;/a&gt;, you don't even need to use the Logging library to write logs. Printing a JSON string to stdout allows GKE to parse your data into searchable fields automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;logStructuredStatus&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logEntry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;INFO&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Container health check successful&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;container_info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;v2.1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;uptime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uptime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;http_stats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;active_connections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="c1"&gt;// GKE picks this up and converts it to a structured log automatically&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logEntry&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Get started
&lt;/h2&gt;

&lt;p&gt;I'll be sharing a more thorough walk-through of the native observability features in GKE soon, including how to automate this setup so you can avoid manually debugging credential errors.&lt;/p&gt;

&lt;p&gt;In the meantime, dive deeper here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📖 &lt;a href="https://cloud.google.com/docs/authentication/application-default-credentials?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;How Application Default Credentials work&lt;/a&gt; – Understand the "magic" behind the lookup.&lt;/li&gt;
&lt;li&gt;🛠️ &lt;a href="https://cloud.google.com/docs/authentication/provide-credentials-adc?utm_campaign=CDR_0x0d701af0_default_b482487387" rel="noopener noreferrer"&gt;Providing credentials to ADC&lt;/a&gt; – Setup guides for every environment.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>googlecloud</category>
      <category>node</category>
      <category>devops</category>
      <category>authentication</category>
    </item>
    <item>
      <title>The lumberjack paradox: From theory to practice</title>
      <dc:creator>Jennifer Davis</dc:creator>
      <pubDate>Wed, 19 Nov 2025 00:42:14 +0000</pubDate>
      <link>https://dev.to/googlecloud/the-lumberjack-paradox-from-theory-to-practice-2lb5</link>
      <guid>https://dev.to/googlecloud/the-lumberjack-paradox-from-theory-to-practice-2lb5</guid>
      <description>&lt;p&gt;&lt;a href="https://www.linkedin.com/posts/sigje_i-have-so-many-thoughts-about-this-interesting-ugcPost-7389735278690742272-viPE" rel="noopener noreferrer"&gt;Previously&lt;/a&gt;, I shared my thoughts on Neal Sample’s "&lt;a href="https://www.linkedin.com/pulse/challenge-all-leaders-how-do-you-create-right-culture-david-reimer-1ydbc/" rel="noopener noreferrer"&gt;lumberjack paradox&lt;/a&gt;" and the urgent need to build the systems thinkers of tomorrow. I argued that leaders must move beyond simple efficiency and focus on &lt;a href="https://www.researchgate.net/publication/227690136_Deliberate_Performance_Accelerating_Expertise_in_Natural_Settings" rel="noopener noreferrer"&gt;re-engineering the experience&lt;/a&gt; (Dr. Gary Klein) and creating context to ensure we don't lose the path to deep expertise.&lt;/p&gt;

&lt;p&gt;But what does "leadership as context creator" look like in practice?&lt;/p&gt;

&lt;p&gt;For us in Cloud DevRel Engineering, it isn't abstract. It comes down to how we manage the most fundamental unit of our developer experience: the code sample.&lt;/p&gt;

&lt;p&gt;As Neal notes, AI will lead to the "industrialization of creativity"—an infinite supply of ideas and code. In this world, the premium shifts to discernment: the ability to distinguish quality from mediocrity.&lt;/p&gt;

&lt;p&gt;But this isn't a choice between the axe (manual craft) and the chainsaw (AI). The modern expert needs both.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If you only have the axe, you are restricted to the problems that fit within manual reach. It is the perfect tool for the campsite, but it cannot clear the forest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;But if you only have the chainsaw, without the judgment to guide it, you are dangerous. You lack the control to distinguish a clean cut from a destructive one.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need the deep expertise of the axe to get the precise, consistent outcomes from the chainsaw.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;From theory to practice: The catalog as ground truth&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In my previous post, I mentioned Dr. Richard Cook's work on "building common ground" and Donella Meadows’ warnings about &lt;a href="https://donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/" rel="noopener noreferrer"&gt;suboptimization&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In Cloud DevRel Engineering, we realized that our code samples are the primary tool for building this common ground. In Dr. Cook’s terms, they form the "&lt;a href="https://queue.acm.org/detail.cfm?id=3380777" rel="noopener noreferrer"&gt;Line of Representation&lt;/a&gt;"—the tangible surface that connects the human "above the line" to the complex system "below the line."&lt;/p&gt;

&lt;p&gt;When a developer (the human) learns a new platform, the sample is their manual for the "axe." When an AI assistant generates a solution, the sample is the training data that guides the "chainsaw."&lt;/p&gt;

&lt;p&gt;When we looked at our systems, we saw suboptimization. By treating samples as low-priority content maintained by individual contributors, we created a fractured reality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We broke the Line of Representation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We saw this failure hit on two fronts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;We break the human judgment loop:&lt;/strong&gt; If samples are inconsistent, developers cannot learn "good" from "bad." We fail to re-engineer the experience (in Dr. Klein's sense) needed to build expertise.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;We poison the AI well:&lt;/strong&gt; AI models ingest our official repositories. If those samples are flawed, the AI learns the flaws, scales them, and feeds them back to the user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We are currently witnessing exactly how this hand-crafted approach fails at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The high cost of "geological strata" in code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Without central standardization, our repositories accumulated "geological strata"—layers of outdated practices—because manual maintenance cannot keep up with language evolution. This makes it hard to know what is correct today.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js' paradigm tax:&lt;/strong&gt; Our Node.js repositories contain a mix of callbacks, raw promises, and async/await. A user learning Pub/Sub sees one era, while a user learning Cloud Storage sees another. The AI sees all of it and treats it all as valid, stripping away the context of "outdated" versus "modern."
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python: The contributor long tail:&lt;/strong&gt; With over 650 contributors, our Python samples suffer from extreme fragmentation. The &lt;strong&gt;total cost of ownership (TCO)&lt;/strong&gt; of manually bringing thousands of older snippets up to modern Python 3.10+ standards is astronomically high, so it simply doesn't happen. This leaves a massive surface area of "technical debt" that the AI happily recycles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Inconsistent quality creates "false best practices"&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When samples are hand-written by federated teams, personal "developer flair" masquerades as industry best practice. Users copy-paste these patterns, inadvertently adopting technical debt.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Java's Framework creep:&lt;/strong&gt; Instead of teaching the core platform, contributors often introduce heavy frameworks for simple tasks. This increases the "time-to-hello-world" and teaches the AI that simple tasks require complex dependencies.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python vs. Go:&lt;/strong&gt; Most Go samples handle errors correctly because the language forces it. Many Python samples show only the "happy path," skipping &lt;strong&gt;critical distributed systems patterns&lt;/strong&gt; like exponential backoff or retry logic. The AI then generates code that looks clean but fails in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The hidden cost of incoherence&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is the "suboptimization" Donella Meadows warned about. It is not enough for individual samples to be correct in isolation; they must function as a cohesive unit.&lt;/p&gt;

&lt;p&gt;For a human developer, shifting between products that use different coding styles creates friction. They have to spend mental energy decoding the "dialect" of a specific product team rather than focusing on the logic.&lt;/p&gt;

&lt;p&gt;For an AI, this lack of cohesion is even more dangerous.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Context Gap:&lt;/strong&gt; When our samples for Cloud Storage look structurally different from our samples for BigQuery, the AI treats them as unrelated entities. It fails to learn the underlying "grammar" of our platform.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Integration Failure:&lt;/strong&gt; When a user asks for a solution that combines these products, the AI struggles to bridge the gap. Lacking a consistent pattern to follow, it often hallucinates a messy, "glue code" solution that is brittle and insecure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By allowing fragmentation, we aren't just impacting the docs; we are training the AI to misunderstand how our platform is supposed to fit together.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Get started&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We cannot view code samples as static documentation. They are the active constraints of our system—the "environment" we design for our users. If we fail to maintain them, we dull the tools that build developer judgment, and we degrade the quality of the AI they trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Recommended Reading&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If you want to dig deeper into the systems thinking concepts behind this post, I recommend starting here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;On the "Line of Representation":&lt;/strong&gt; &lt;a href="https://queue.acm.org/detail.cfm?id=3380777" rel="noopener noreferrer"&gt;Above the Line, Below the Line&lt;/a&gt; by Dr. Richard Cook — The essential framework for understanding why we must care about the representations (like code samples) that sit between us and complex systems.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On System Failure:&lt;/strong&gt; &lt;a href="https://how.complexsystems.fail/" rel="noopener noreferrer"&gt;How Complex Systems Fail&lt;/a&gt; by Dr. Richard Cook — His classic treatise on why failure is never about a single "root cause" but the result of multiple latent factors.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On Suboptimization:&lt;/strong&gt; &lt;a href="https://donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/" rel="noopener noreferrer"&gt;Leverage Points: Places to Intervene in a System&lt;/a&gt; by Donella Meadows — The definitive essay on why optimizing parts often destroys the whole.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On Re-engineering Experience:&lt;/strong&gt; &lt;a href="https://www.researchgate.net/publication/227690136_Deliberate_Performance_Accelerating_Expertise_in_Natural_Settings" rel="noopener noreferrer"&gt;Deliberate Performance&lt;/a&gt; by Dr. Gary Klein — Research on how to build expertise when you can't stop the work to train.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Coming up next
&lt;/h2&gt;

&lt;p&gt;Next in this series, I will share our structural solution: the "Golden Path." This approach moves us away from isolated automation and towards a human-led, AI-scaled system that improves consistency.&lt;/p&gt;

&lt;p&gt;I’ll be focusing more on the strategy in this series, but the execution is its own journey. Using AI to write code is well-known, but relying on it to produce production-ready educational content? &lt;a href="https://dev.to/grayside"&gt;Adam Ross&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/nimjay/" rel="noopener noreferrer"&gt;Nim Jayawardena&lt;/a&gt; have shared the technical reality of our team's shift in their post, &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/7-technical-takeaways-from-using-gemini-to-generate-code-samples-at-scale?e=48754805" rel="noopener noreferrer"&gt;&lt;strong&gt;7 takeaways from generating samples at scale with Gemini&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Until then, ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are you trying to automate away your documentation debt without first defining a standard of quality?
&lt;/li&gt;
&lt;li&gt;Are your samples strong enough to serve as the "ground truth" for the AI models your developers rely on?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href="https://dev.to/glasnt"&gt;Katie McLaughlin&lt;/a&gt;, &lt;a href="https://dev.to/grayside"&gt;Adam Ross&lt;/a&gt;, and &lt;a href="https://www.linkedin.com/in/nimjay/" rel="noopener noreferrer"&gt;Nim Jayawardena&lt;/a&gt; for reviewing early drafts of this post.&lt;/em&gt; &lt;/p&gt;

</description>
      <category>cloud</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to enable Secure Boot for your AI workloads</title>
      <dc:creator>Maciej Strzelczyk</dc:creator>
      <pubDate>Mon, 21 Jul 2025 14:43:09 +0000</pubDate>
      <link>https://dev.to/googlecloud/how-to-enable-secure-boot-for-your-ai-workloads-khm</link>
      <guid>https://dev.to/googlecloud/how-to-enable-secure-boot-for-your-ai-workloads-khm</guid>
      <description>&lt;p&gt;Written in cooperation with &lt;a href="https://www.linkedin.com/in/aroneidelman/" rel="noopener noreferrer"&gt;Aron Eidelman&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As organizations race to deploy powerful GPU-accelerated workloads, they might overlook a foundational step: ensuring the integrity of the system from the very moment it turns on. &lt;/p&gt;

&lt;p&gt;Threat actors, however, have not overlooked this. They increasingly target the boot process with sophisticated malware like bootkits, which seize control before any traditional security software can load and grant them the highest level of privilege to steal data or corrupt your most valuable AI models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; The most foundational security measure for any server is verifying its integrity the moment it powers on. This process, known as Secure Boot, is designed to stop deep-level malware that can hijack a system before its primary defenses are even awake.&lt;/p&gt;

&lt;p&gt;Secure Boot is part of Google Cloud’s &lt;a href="https://cloud.google.com/compute/shielded-vm/docs/shielded-vm?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Shielded VM&lt;/a&gt; offering, which allows you to verify the integrity of your Compute VM instances, including the VMs that handle your AI workloads. It’s the only major cloud offering of its kind that can track changes beyond initial boot out of the box and without requiring the use of separate tools or event-driven rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottom line:&lt;/strong&gt; Organizations don't have to sacrifice security for performance. There is a clear, repeatable process to sign your own GPU drivers, allowing you to lock down your infrastructure's foundation without compromising your AI workloads. &lt;/p&gt;

&lt;p&gt;Google Cloud’s Secure Boot capability can be opted into at no additional charge, and now there’s a new, easier way to set it up for your GPU-accelerated machines.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding the danger of bootkits&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;It’s important to secure your systems from boot-level threats. Bootkits target the boot process, the foundation of an operating system. By compromising the bootloader and other early-stage system components, a bootkit can gain kernel-level control before the operating system and its security measures load. Malware can then operate with the highest privileges, bypassing traditional security software.&lt;/p&gt;

&lt;p&gt;This technique falls under the Persistence and Defense Evasion tactics in the &lt;a href="https://attack.mitre.org/techniques/T1542/003/" rel="noopener noreferrer"&gt;MITRE ATT&amp;amp;CK framework&lt;/a&gt;. Bootkits are difficult to detect and remove due to their low-level operation. They hide by intercepting system calls and manipulating data, persisting across reboots, stealing data, installing malware, and disabling security features. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cisa.gov/news-events/cybersecurity-advisories/aa20-336a" rel="noopener noreferrer"&gt;Bootkits and rootkits&lt;/a&gt; pose a persistent, embedded threat, and have been observed as part of current threat actor trends from &lt;a href="https://cloud.google.com/blog/topics/threat-intelligence/china-nexus-espionage-targets-juniper-routers?e=48754805&amp;amp;utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Threat Intelligence Group&lt;/a&gt;, the &lt;a href="https://www.welivesecurity.com/2023/03/01/blacklotus-uefi-bootkit-myth-confirmed/" rel="noopener noreferrer"&gt;European Union Agency for Cybersecurity&lt;/a&gt; (ENISA), and the U.S. &lt;a href="https://www.cisa.gov/news-events/analysis-reports/ar25-087a" rel="noopener noreferrer"&gt;Cybersecurity and Infrastructure Security Agency&lt;/a&gt; (CISA). Google Cloud always works on improving the security of our solutions by strengthening our products and providing tools you can use yourself. In this article, we would like to demonstrate a new, easier way of setting up Secure Boot for your GPU-accelerated machines.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Limitations of Secure Boot with GPUs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/compute/shielded-vm/docs/shielded-vm?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Shielded VMs&lt;/a&gt; employ a &lt;a href="https://en.wikipedia.org/wiki/Trusted_Platform_Module" rel="noopener noreferrer"&gt;TPM&lt;/a&gt; 2.0-compliant &lt;a href="https://cloud.google.com/vmware-engine/docs/vmware-ecosystem/howto-vtpm?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;virtual Trusted Platform Module&lt;/a&gt; (vTPM) as their root of trust, protected by Google Cloud's virtualization and isolation powered by &lt;a href="https://cloud.google.com/docs/security/titan-hardware-chip?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Titan chips&lt;/a&gt;. While Secure Boot enforces signed software execution, &lt;a href="https://cloud.google.com/docs/security/boot-integrity#measured-boot-process?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Measured Boot&lt;/a&gt; logs boot component measurements to the vTPM for remote attestation and integrity verification. &lt;/p&gt;

&lt;p&gt;Limitations start when you want to use a kernel module that is not part of the official distribution of your operating system. That is especially problematic for AI workloads, which rely on GPUs whose drivers are usually not part of official distributions. If you want to manually install GPU drivers on a system with Secure Boot, the system will refuse to use them because they won’t be properly signed. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to use Secure Boot on GPU-accelerated machines&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are two ways you can tell Google Cloud to trust your signature when it confirms the GPU driver validity with Secure Boot: with an automated script, or manually. &lt;/p&gt;

&lt;p&gt;The script that can help you prepare a Secure Boot compatible image is open-source and is available in our &lt;a href="https://github.com/GoogleCloudPlatform/compute-gpu-installation/tree/main/linux" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. Here’s how you can use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download the newest version of the script:&lt;/span&gt;
curl &lt;span class="nt"&gt;-L&lt;/span&gt; https://storage.googleapis.com/compute-gpu-installation-us/installer/latest/cuda_installer.pyz &lt;span class="nt"&gt;--output&lt;/span&gt; cuda_installer.pyz

&lt;span class="c"&gt;# Make sure you are logged in with gcloud&lt;/span&gt;
gcloud auth login

&lt;span class="c"&gt;# Check available option for the build process&lt;/span&gt;
python3 cuda_installer.pyz build_image &lt;span class="nt"&gt;--help&lt;/span&gt;

&lt;span class="c"&gt;# Use the script to build an image based on Ubuntu 24.04&lt;/span&gt;
&lt;span class="nv"&gt;PROJECT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_project_name
&lt;span class="nv"&gt;ZONE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;zone_you_want_to_use
&lt;span class="nv"&gt;SECURE_BOOT_IMAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;name_of_the_final_image

python3 cuda_installer.pyz build_image &lt;span class="nt"&gt;--project&lt;/span&gt; &lt;span class="nv"&gt;$PROJECT&lt;/span&gt; &lt;span class="nt"&gt;--vm-zone&lt;/span&gt; &lt;span class="nv"&gt;$ZONE&lt;/span&gt; &lt;span class="nt"&gt;--base-image&lt;/span&gt; ubuntu-24 &lt;span class="nv"&gt;$SECURE_BOOT_IMAGE&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script will execute each of the five steps described below for you. Expect it to take up to 30 minutes, as the driver installation itself accounts for most of that time. We’ve also detailed how to use the building script in &lt;a href="https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#self-signing-automated?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;our documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To manually tell Google Cloud to trust your signature, follow these five steps (also available in &lt;a href="https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#self-signing-manual?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;our documentation&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate your own certificate to be used for signing the driver.
&lt;/li&gt;
&lt;li&gt;Create a fresh VM with the OS of your choice (Secure Boot disabled, GPU not required).
&lt;/li&gt;
&lt;li&gt;Install and sign the GPU driver (and optionally CUDA toolkit).
&lt;/li&gt;
&lt;li&gt;Create a new Disk Image based on the machine with a self-signed driver, &lt;a href="https://cloud.google.com/compute/shielded-vm/docs/creating-shielded-images#adding-shielded-image?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;adding your certificate to the list of trusted certificates&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;The new image can now be used with Secure Boot-enabled VMs.&lt;/li&gt;
&lt;/ol&gt;
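The steps above can be sketched as follows. This is a minimal, hedged illustration: the file names, certificate subject, disk name, and module path are placeholder assumptions, and the commands that act on real cloud resources or a build VM are left commented out. Only the certificate-generation step runs locally.

```shell
# Step 1: generate a self-signed certificate and key for driver signing
# (file names and subject are illustrative).
openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
    -subj "/CN=gpu-driver-signing/" \
    -keyout sb.key -outform DER -out sb.der

# Step 3 (run on the build VM): sign the installed NVIDIA kernel module.
# The module path below is an assumption and varies between driver versions.
# sudo "/usr/src/linux-headers-$(uname -r)/scripts/sign-file" sha256 \
#     sb.key sb.der /lib/modules/$(uname -r)/updates/dkms/nvidia.ko

# Step 4: create the disk image, adding the certificate to the trusted
# Secure Boot signature database.
# gcloud compute images create $SECURE_BOOT_IMAGE \
#     --source-disk=BUILD_VM_DISK --source-disk-zone=$ZONE \
#     --signature-database-file=sb.der \
#     --guest-os-features=UEFI_COMPATIBLE
```

The DER-encoded certificate (`sb.der`) is what gets added to the image's trusted signature database, while the private key (`sb.key`) is used only during module signing and should never leave your build environment.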

&lt;p&gt;Whether you used the script or performed the task manually, you’ll want to verify that the process worked. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Start a new GPU accelerated VM using the created image&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To verify that everything worked, create a new VM from the new disk image with the following command, enabling the Secure Boot option:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a new VM with T4 GPU to verify that everything works. Note that here ZONE needs to have T4 GPUs available.&lt;/span&gt;
TEST_INSTANCE_NAME&lt;span class="o"&gt;=&lt;/span&gt;name_of_the_test_instance

gcloud compute instances create &lt;span class="nv"&gt;$TEST_INSTANCE_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--machine-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;n1-standard-4 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--accelerator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1,type&lt;span class="o"&gt;=&lt;/span&gt;nvidia-tesla-t4 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--create-disk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;auto-delete&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;yes&lt;/span&gt;,boot&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;yes&lt;/span&gt;,device-name&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$TEST_INSTANCE_NAME&lt;/span&gt;,image&lt;span class="o"&gt;=&lt;/span&gt;projects/&lt;span class="nv"&gt;$PROJECT&lt;/span&gt;/global/images/&lt;span class="nv"&gt;$SECURE_BOOT_IMAGE&lt;/span&gt;,mode&lt;span class="o"&gt;=&lt;/span&gt;rw,size&lt;span class="o"&gt;=&lt;/span&gt;100,type&lt;span class="o"&gt;=&lt;/span&gt;pd-balanced &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--shielded-secure-boot&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--shielded-vtpm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--shielded-integrity-monitoring&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--maintenance-policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;TERMINATE

&lt;span class="c"&gt;# gcloud compute ssh to run nvidia-smi and see the output&lt;/span&gt;
gcloud compute ssh &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt; &lt;span class="nv"&gt;$TEST_INSTANCE_NAME&lt;/span&gt; &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="s2"&gt;"nvidia-smi"&lt;/span&gt;

&lt;span class="c"&gt;# If you decided to also install CUDA, you can verify it with the following command&lt;/span&gt;
gcloud compute ssh &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt; &lt;span class="nv"&gt;$TEST_INSTANCE_NAME&lt;/span&gt; &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="s2"&gt;"python3 cuda_installer.pyz verify_cuda"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Clean up&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once you have verified that the new image works, there’s no need to keep the verification VM around. You can delete it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instances delete &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$ZONE&lt;/span&gt; &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT&lt;/span&gt; &lt;span class="nv"&gt;$TEST_INSTANCE_NAME&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Enabling Secure Boot&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Now that you have built a Secure Boot compatible base image for your GPU-based workloads, remember to actually enable Secure Boot on your VM instances when you use those images! Secure Boot is disabled by default, so it needs to be explicitly enabled for Compute Engine instances.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When creating new instances&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you create a new instance using Cloud Console, the checkbox to enable Secure Boot can be found in the Security tab of the creation page, under the Shielded VM section.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepx2y5a8j7gta8cn5tqz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepx2y5a8j7gta8cn5tqz.png" alt="Google Compute Instance creation interface with " width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the gcloud enthusiasts, there’s a &lt;code&gt;--shielded-secure-boot&lt;/code&gt; flag available for the &lt;a href="https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--shielded-secure-boot?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;gcloud compute instances create&lt;/a&gt; command.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Updating existing instances&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;You can also enable Secure Boot for instances that already exist; however, make sure that they are running a compatible system. If the driver installed on those machines is not signed with a properly configured key, the driver will not be loaded. To update the Secure Boot configuration for existing VMs, you’ll have to follow the stop, update and restart procedure described in this &lt;a href="https://cloud.google.com/compute/shielded-vm/docs/modifying-shielded-vm?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation page&lt;/a&gt;.&lt;/p&gt;
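The stop, update, restart procedure can be sketched as below. The `echo` prefix makes this a dry run that only prints the gcloud commands rather than executing them; drop it to run them for real. Instance, zone and project names are placeholders.

```shell
# Dry-run sketch of enabling Secure Boot on an existing VM.
# Drop the leading "echo" calls to actually execute the gcloud commands.
enable_secure_boot() {
  local instance="$1" zone="$2" project="$3"
  # 1. Stop the instance (Shielded VM settings can't change while running).
  echo gcloud compute instances stop "$instance" --zone="$zone" --project="$project"
  # 2. Turn on Secure Boot.
  echo gcloud compute instances update "$instance" --zone="$zone" --project="$project" --shielded-secure-boot
  # 3. Start it again; the signed driver should now load.
  echo gcloud compute instances start "$instance" --zone="$zone" --project="$project"
}

enable_secure_boot my-gpu-vm us-central1-a my-project
```

If the VM boots but `nvidia-smi` reports no driver afterwards, the module was most likely not signed with a certificate that the image trusts.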

&lt;h2&gt;
  
  
  &lt;strong&gt;Get started&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Make sure to visit our &lt;a href="https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#self-signing-automated?utm_campaign=CDR_0x73f0e2c4_default_b407730070&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation page&lt;/a&gt; to learn more about the process and follow our &lt;a href="https://github.com/GoogleCloudPlatform/compute-gpu-installation" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; to stay up to date with other GPU automation news.&lt;/p&gt;

</description>
      <category>security</category>
      <category>googlecloud</category>
      <category>nvidia</category>
      <category>gpu</category>
    </item>
    <item>
      <title>Understanding Google Cloud’s Dynamic Workload Scheduler</title>
      <dc:creator>Maciej Strzelczyk</dc:creator>
      <pubDate>Tue, 01 Jul 2025 11:53:04 +0000</pubDate>
      <link>https://dev.to/googlecloud/understanding-google-clouds-dynamic-workload-scheduler-5p</link>
      <guid>https://dev.to/googlecloud/understanding-google-clouds-dynamic-workload-scheduler-5p</guid>
      <description>&lt;p&gt;In the age of artificial intelligence and machine learning, there is a constant need for powerful hardware like &lt;a href="https://cloud.google.com/compute/docs/gpus?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GPUs&lt;/a&gt; and &lt;a href="https://cloud.google.com/tpu?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;TPUs&lt;/a&gt;. Ideally, access to this hardware should be predictable and reliable. Resource availability shouldn’t be a blocker for your projects. If customers want to use a GPU, they should be provided with a GPU! After all, this is supposed to be one of the ideas behind cloud computing: to have resources available on demand. But with a limited supply of hardware, there is a need for a solution more sophisticated than simple “first come, first serve.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing DWS
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Workload Scheduler (DWS)&lt;/strong&gt; is Google Cloud's innovative solution designed to optimize the allocation of high-demand, finite resources like GPUs and TPUs, ensuring that customer workloads can access the necessary hardware when needed. It directly addresses the supply and demand imbalance problem. On one hand, Google Cloud has customers asking for GPUs and TPUs to run their workloads. On the other hand, there’s a limited number of hardware resources that can be assigned to the customers. DWS is what balances customer demands against the finite resources of the cloud (which wants to &lt;em&gt;feel&lt;/em&gt; infinite).&lt;/p&gt;

&lt;p&gt;To the traditional model of on-demand provisioning, &lt;a href="https://cloud.google.com/solutions/spot-vms?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Spot instances&lt;/a&gt; and &lt;a href="https://cloud.google.com/compute/docs/instances/reservations-overview?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;reservations&lt;/a&gt;, DWS adds two simple, yet powerful provisioning methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/dws?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Flex Start mode&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/compute/docs/instances/future-reservations-calendar-mode-overview?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Calendar mode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, I’ll explain the benefits of each of these DWS methods and provide practical scenarios for when you might want to use them, helping you choose the best provisioning strategy for your specific workloads. Both methods are still in preview, so you can expect their availability and scope to improve once they enter general availability later this year.&lt;/p&gt;

&lt;p&gt;If you’d rather watch a video about Dynamic Workload Scheduler — I’ve got you covered:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/uWiO00RVQP4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Calendar mode
&lt;/h2&gt;

&lt;p&gt;Let’s start with &lt;a href="https://cloud.google.com/compute/docs/instances/create-future-reservations-calendar-mode?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Calendar mode&lt;/a&gt;, which is a bit simpler to understand. DWS Calendar Mode allows you to create &lt;a href="https://cloud.google.com/compute/docs/instances/future-reservations-overview?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;future reservations&lt;/a&gt; for the hardware you know you will need in advance. Booking rooms in a hotel is a great analogy here. You specify the &lt;strong&gt;range of dates&lt;/strong&gt;, &lt;strong&gt;location&lt;/strong&gt;, &lt;strong&gt;type&lt;/strong&gt; and &lt;strong&gt;quantity&lt;/strong&gt; of the hardware you need and you submit your request. Much like a hotel, the system checks availability and then books the resources you requested. Once your future reservation is approved, all you need to do is wait for the starting date. 
Google Cloud creates a reservation for you on the start date that you can then consume however you want (&lt;a href="https://cloud.google.com/compute/docs/instances/reservations-consume?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GCE&lt;/a&gt;, &lt;a href="https://cloud.google.com/kubernetes-engine/docs/how-to/consuming-reservations?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GKE&lt;/a&gt;, &lt;a href="https://cloud.google.com/vertex-ai/docs/training/use-reservations?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt;, &lt;a href="https://cloud.google.com/vertex-ai/docs/workbench/instances/reservations?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Vertex AI Workbench&lt;/a&gt; and &lt;a href="https://cloud.google.com/batch/docs/create-run-job-reservation?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Batch&lt;/a&gt; - they can all consume reservations).&lt;/p&gt;

&lt;p&gt;Once the reservation time runs out, the system will reclaim the resources, so they can be allocated to other customers. Just like in a hotel, you pay for the time you had your reservation, even if you didn’t use it 100% of the time.&lt;/p&gt;

&lt;p&gt;Here are some facts about the DWS Calendar Mode reservations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The reservation period has a fixed length of 1 to 90 days.
&lt;/li&gt;
&lt;li&gt;Currently, GPU reservations require a 4-day lead time before they can start, while TPU reservations can be submitted 24 hours in advance of the desired start time.
&lt;/li&gt;
&lt;li&gt;Once your request is accepted, you will have to pay for the full reservation period, even if not used.
&lt;/li&gt;
&lt;li&gt;Once the reservation period ends, the resources are reclaimed.
&lt;/li&gt;
&lt;li&gt;Reserved resources are &lt;a href="https://cloud.google.com/ai-hypercomputer/docs/terminology#dense-deployment?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;physically close to each other&lt;/a&gt; to minimize network latency.
&lt;/li&gt;
&lt;li&gt;Calendar Mode reservations can be &lt;a href="https://cloud.google.com/compute/docs/instances/reservations-shared?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;shared&lt;/a&gt; with other projects.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/products/dws/pricing?e=48754805&amp;amp;utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;DWS has its own pricing&lt;/a&gt;, separate from other provisioning methods. (Usually cheaper than on-demand pricing).
&lt;/li&gt;
&lt;li&gt;No quota is consumed while using resources booked through Calendar Mode reservations.&lt;/li&gt;
&lt;/ul&gt;
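A Calendar Mode request is submitted as a future reservation. The sketch below only prints the command (drop the `echo` to submit it for real); the reservation name, machine type, counts and dates are illustrative, and since the feature is in preview, the exact flag surface may change, so check the current documentation before running it.

```shell
# Dry-run sketch of a Calendar Mode future reservation request.
# All values are placeholders; verify flags against the current docs.
calendar_mode_request() {
  echo gcloud compute future-reservations create ml-training-block \
    --project=my-project --zone=us-central1-a \
    --machine-type=a3-highgpu-8g --total-count=4 \
    --start-time=2025-10-01T00:00:00Z \
    --end-time=2025-10-15T00:00:00Z \
    --auto-delete-auto-created-reservations
}

calendar_mode_request
```

The auto-delete behavior on expiry is what distinguishes a Calendar Mode reservation from a generic future reservation, as described below.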

&lt;p&gt;So, what are the best scenarios for Calendar mode? If you…&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Know how many resources you need
&lt;/li&gt;
&lt;li&gt;Know how long you need them for
&lt;/li&gt;
&lt;li&gt;Know when you want to start and finish your project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…then DWS Calendar Mode is the solution for you. Whether it’s an ML training job, an HPC simulation or an expected spike in inference requests (isn’t Black Friday great?), Calendar Mode has you covered.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;So what’s the difference between regular future reservations and Calendar Mode?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You might have seen that in Google Cloud, there are also &lt;a href="https://cloud.google.com/compute/docs/instances/future-reservations-overview?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;future reservations&lt;/a&gt; that are not related to DWS Calendar Mode. You can think of Calendar Mode reservations as a subset of the more generic future reservations. Every Calendar Mode reservation is a Future Reservation, but for a Future Reservation to be a Calendar Mode reservation, it needs to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configured to auto-delete the reservation on expiry, even if it’s not consumed.
&lt;/li&gt;
&lt;li&gt;No longer than 90 days.
&lt;/li&gt;
&lt;li&gt;Limited to certain types of resources (see &lt;a href="https://cloud.google.com/compute/docs/instances/create-future-reservations-calendar-mode?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; for an up-to-date list)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, Calendar Mode comes with a handy assistant that helps you find available capacity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5lp241c2om9izr80kd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5lp241c2om9izr80kd6.png" alt="Calendar Mode Assistant"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Flex Start mode
&lt;/h2&gt;

&lt;p&gt;With Calendar mode being so great, what more could you need? Well, you don’t always have a schedule you need to keep. Sometimes you want your job finished as soon as possible. At other times, you don’t know how long it will take to complete the work. This is where Flex Start mode comes in. If Calendar mode works similarly to a hotel, you can compare Flex Start mode to a restaurant.&lt;/p&gt;

&lt;p&gt;How does it work? You tell DWS that you need hardware, let’s say &lt;strong&gt;10x A4 machines&lt;/strong&gt;, to run a job that will take &lt;strong&gt;at most 6 days&lt;/strong&gt;. With that knowledge, DWS goes out to the Cloud to get you your 10 A4 machines. After some time (this is where the “flex” part comes from - it’s a flexible process) the system has the 10 A4 machines you need and provides them to you all at once. This 'all-or-nothing' approach ensures you receive the full requested capacity simultaneously, so you don’t have to worry about paying for 7 idle machines while you wait for the remaining 3 to be created. You get all 10 at the same time. Once they are delivered to you, they will be yours until the specified time runs out, or you’re done with your task. If you release the resources before the time runs out, you pay only for the time you actually used them. Since there is no provisioning notification, ensure your workloads can start automatically upon machine creation.&lt;/p&gt;

&lt;p&gt;While Calendar mode was similar to booking rooms in a hotel, Flex Start is more akin to waiting for your order in a restaurant. You wait until your “order” is served and eat until you’re done, or the restaurant closes. If you change your mind before the order is fulfilled, you can cancel your request without any consequences. &lt;/p&gt;

&lt;p&gt;To summarize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flex Start mode requests hardware for specified periods of time from 1 minute to 7 days.
&lt;/li&gt;
&lt;li&gt;Requests are fulfilled as soon as possible. (Shorter requests tend to be fulfilled more quickly.)
&lt;/li&gt;
&lt;li&gt;You can cancel your request at any time; you only pay for what you used.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/products/dws/pricing?e=48754805#how-dws-pricing-works&amp;amp;utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;DWS Flex Start pricing&lt;/a&gt; offers discounts compared to on-demand provisioning.
&lt;/li&gt;
&lt;li&gt;Once the time limit of your request is reached, the resources are reclaimed.
&lt;/li&gt;
&lt;li&gt;Resources acquired through Flex Start mode consume the &lt;a href="https://cloud.google.com/compute/resource-usage#preemptible-quotas?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;preemptible quota&lt;/a&gt;, which is usually a lot higher than on-demand quota.
&lt;/li&gt;
&lt;li&gt;Works only for &lt;a href="https://cloud.google.com/compute/docs/accelerator-optimized-machines?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Accelerator-optimized machine series&lt;/a&gt; and &lt;a href="https://cloud.google.com/compute/docs/gpus/create-gpu-vm-general-purpose?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;N1 virtual machine (VM) instances with GPUs attached&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;You can't stop, suspend, or recreate the instances you create through Flex Start mode.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flex Start mode works best if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You have a short (&amp;lt; 7 days) need for resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You want your job started as soon as possible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You don’t know how long your task will take, and appreciate the flexibility to release resources early and only pay for actual usage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to use it?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Flex Start mode works a bit differently in every supported product.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;Compute Engine&lt;/strong&gt;, it comes in the form of an all-or-nothing Managed Instance Group &lt;a href="https://cloud.google.com/compute/docs/instance-groups/create-resize-requests-mig?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;resize request&lt;/a&gt; with the maximum run duration specified.
&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Google Kubernetes Engine (GKE)&lt;/strong&gt;, it’s specified for a &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/dws?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;workload or through a scheduling tool&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Cloud&lt;/strong&gt; &lt;strong&gt;Batch&lt;/strong&gt;, it’s available for jobs running on &lt;a href="https://cloud.google.com/batch/docs/create-run-job-gpus#select-provisioning-method?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;specific machine types&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Vertex AI&lt;/strong&gt;, specify &lt;code&gt;FLEX_START&lt;/code&gt; as your scheduling strategy.&lt;/li&gt;
&lt;/ul&gt;
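For Compute Engine, the MIG resize request mentioned above can be sketched as follows. The `echo` prefix makes it a dry run; drop it to submit the request for real. The MIG name, counts, duration and location are illustrative assumptions.

```shell
# Dry-run sketch of a DWS Flex Start request on Compute Engine:
# an all-or-nothing MIG resize request with a maximum run duration.
# Drop the leading "echo" to submit it for real; names are placeholders.
flex_start_request() {
  echo gcloud compute instance-groups managed resize-requests create my-a4-mig \
    --resize-request=flex-job-1 \
    --resize-by=10 \
    --requested-run-duration=6d \
    --zone=us-central1-a --project=my-project
}

flex_start_request
```

Once the request is fulfilled, all 10 instances appear in the MIG at once, and they are reclaimed when the requested run duration elapses.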

&lt;h2&gt;
  
  
  Happy computing!
&lt;/h2&gt;

&lt;p&gt;When it comes to getting your hands on high-demand hardware for your advanced workloads, Google Cloud's Dynamic Workload Scheduler has you covered. With its Calendar and Flex Start modes, you get powerful and flexible solutions that truly fit your needs. By digging into these new provisioning methods, you can count on predictable, reliable, and efficient access to essential resources like GPUs and TPUs. This means your AI, ML, and HPC projects will run smoother than ever. &lt;a href="https://console.cloud.google.com/compute/futureReservations/add?utm_campaign=CDR_0x73f0e2c4_default_b423037559&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Try booking some powerful machines for your next project now&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>googlecloud</category>
      <category>tpu</category>
      <category>gpu</category>
    </item>
    <item>
      <title>Developing in the (Google) Cloud</title>
      <dc:creator>Maciej Strzelczyk</dc:creator>
      <pubDate>Thu, 26 Jun 2025 13:46:24 +0000</pubDate>
      <link>https://dev.to/googlecloud/developing-in-the-google-cloud-57c6</link>
      <guid>https://dev.to/googlecloud/developing-in-the-google-cloud-57c6</guid>
      <description>&lt;p&gt;As I entered the office today, it was clear that physical desktop computers are becoming a rarity. Most desks were equipped only with monitors, reflecting a significant shift in how many organizations, including Google, are approaching employee workstations. Historically, developers might have received both a desktop and a laptop. However, the trend is now towards providing only high-tier laptops, with heavy workloads and software development tasks offloaded to virtual workstations hosted in the cloud. This approach offers enhanced control over assets, improved security, and streamlined management for organizations.&lt;/p&gt;

&lt;p&gt;This cloud-centric approach offers substantial benefits for organizations aiming to equip their employees with powerful development environments without the complexities of procuring and maintaining physical desktops. Beyond the immediate advantage of remote work flexibility, where employees can be fully productive with just a laptop and a stable internet connection, cloud-based workstations offer significant scalability. They allow organizations to rapidly provision and de-provision resources as needed, ensuring developers always have access to the optimal computing power, including high-end GPU-accelerated environments that traditional laptops simply cannot provide for demanding industry needs.&lt;/p&gt;

&lt;p&gt;There are two ways your organization can leverage this model using Google Cloud Platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Compute Engine
&lt;/h2&gt;

&lt;p&gt;Google Compute Engine (GCE) provides an Infrastructure as a Service (IaaS) approach to creating virtual workstations through highly configurable virtual machines. This solution offers unparalleled flexibility, granting you complete control over virtually every aspect of your development environment. You can choose your preferred operating system, machine type (including CPU, memory, and specialized hardware), storage solutions, and install any software or tools required. This level of customization makes GCE an excellent choice for a variety of use cases, including:&lt;/p&gt;

&lt;h3&gt;
  
  
  Heavy graphics
&lt;/h3&gt;

&lt;p&gt;Once you create a virtual machine equipped with a powerful GPU, you can work with demanding graphical applications. Designing complicated systems and models, programming games or rendering videos - all this heavy lifting can happen in the datacenter, while your computer only has to handle the decoding of the remote desktop stream. To fully leverage the remote desktop experience of those setups, you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick a GPU that supports &lt;a href="https://cloud.google.com/compute/docs/gpus#gpu-virtual-workstations?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;NVIDIA RTX Virtual Workstations (vWS) for graphics workloads&lt;/a&gt;. That means L4, T4, P4 or P100 accelerators. A new &lt;a href="https://cloud.google.com/blog/products/compute/introducing-g4-vm-with-nvidia-rtx-pro-6000?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;G4 machine type&lt;/a&gt; hosting NVIDIA RTX PRO 6000 Blackwell cards should be available by the end of 2025.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/compute/docs/gpus/install-grid-drivers?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Install RTX-compatible GPU drivers&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Select the remote desktop software you want to use to access the machines. There are many available options like &lt;a href="https://anyware.hp.com/" rel="noopener noreferrer"&gt;HP Anyware&lt;/a&gt;, &lt;a href="https://parsec.app/" rel="noopener noreferrer"&gt;Parsec&lt;/a&gt; or &lt;a href="https://moonlight-stream.org/" rel="noopener noreferrer"&gt;Moonlight&lt;/a&gt;, to name a few.
&lt;/li&gt;
&lt;li&gt;Ensure the Internet connection on your client side is fast and reliable.&lt;/li&gt;
&lt;/ul&gt;
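The first two steps can be sketched with a single instance-creation command. The `-vws` suffix on the accelerator type is what selects the NVIDIA RTX Virtual Workstation variant of the GPU. The `echo` prefix makes this a dry run; the instance name, machine type, disk size and location are illustrative placeholders.

```shell
# Dry-run sketch of creating a virtual-workstation VM with a T4 vWS GPU.
# Drop the leading "echo" to run for real; all names are placeholders.
create_vws() {
  echo gcloud compute instances create my-graphics-ws \
    --zone=us-central1-a --project=my-project \
    --machine-type=n1-standard-8 \
    --accelerator=type=nvidia-tesla-t4-vws,count=1 \
    --boot-disk-size=200GB \
    --maintenance-policy=TERMINATE
}

create_vws
```

After the VM is up, install the RTX-compatible drivers and your chosen remote desktop software as described in the steps above.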

&lt;h3&gt;
  
  
  Computation intensive (like AI)
&lt;/h3&gt;

&lt;p&gt;Google Cloud offers really powerful GPUs that can empower your team to effortlessly tackle many AI challenges. With no need for a high-quality graphical interface, access to machines in this category can even be limited to an SSH tunnel. The developer can run their favourite IDE on their laptop, while executing the code remotely in the cloud. Depending on the GPU you pick, the pricing of such workstations will vary greatly. The good news: with proper configuration, a single machine can easily be shared between multiple developers.&lt;/p&gt;

&lt;h3&gt;
  
  
  General development
&lt;/h3&gt;

&lt;p&gt;Developers who don’t need GPU-powered machines to do their jobs can still benefit from a powerful remote environment. It’s easy to obtain more RAM, CPU and storage than even the best laptops can provide.&lt;/p&gt;

&lt;h3&gt;
  
  
  Considerations
&lt;/h3&gt;

&lt;p&gt;When working with GCE VMs, it is crucial to pay special attention to both the security and cost optimization of these machines. Failing to properly configure these aspects can lead to vulnerabilities or unnecessary expenses. Here are some key considerations (this list is &lt;strong&gt;not exhaustive&lt;/strong&gt;):&lt;/p&gt;

&lt;h4&gt;
  
  
  Security Best Practices
&lt;/h4&gt;

&lt;p&gt;1) &lt;strong&gt;Service Accounts&lt;/strong&gt;: Avoid using the &lt;a href="https://cloud.google.com/compute/docs/access/service-accounts#compute_engine_service_account?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;default compute Service Account&lt;/a&gt;, which comes with an overly permissive Editor role. Instead, create new service accounts with the principle of least privilege, assigning only the minimal required permissions for your workloads. For individual users, consider creating dedicated service accounts.&lt;br&gt;&lt;br&gt;
2) &lt;strong&gt;Network Access&lt;/strong&gt;: Consider disabling external IPs for your VMs. For internet access, configure &lt;a href="https://cloud.google.com/nat/docs/overview?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud NAT&lt;/a&gt;. For secure remote access, leverage &lt;a href="https://cloud.google.com/network-connectivity/docs/vpn/concepts/overview?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud VPN&lt;/a&gt; or &lt;a href="https://cloud.google.com/security/products/iap?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Identity-Aware Proxy (IAP)&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
3) &lt;strong&gt;Firewall Policies&lt;/strong&gt;: Implement stringent &lt;a href="https://cloud.google.com/firewall/docs/firewall-policies-overview?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;firewall policies&lt;/a&gt; to control inbound and outbound traffic, ensuring only necessary ports and protocols are open.&lt;/p&gt;
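
&lt;p&gt;As a minimal sketch of the first and third practices using the gcloud CLI (the project, service account, and rule names below are placeholders, and the role you grant should match your own workloads):&lt;/p&gt;

```shell
# Create a dedicated, least-privilege service account for the dev VMs.
gcloud iam service-accounts create dev-workstation-sa \
    --display-name="Dev workstation service account"

# Grant only the narrow roles the workload actually needs, e.g. log writing.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:dev-workstation-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/logging.logWriter"

# Allow SSH ingress only from IAP's TCP forwarding range.
gcloud compute firewall-rules create allow-ssh-from-iap \
    --direction=INGRESS --action=ALLOW --rules=tcp:22 \
    --source-ranges=35.235.240.0/20
```

&lt;p&gt;The 35.235.240.0/20 range is the pool that Identity-Aware Proxy uses for TCP forwarding, so this rule permits SSH through IAP rather than from the open Internet.&lt;/p&gt;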

&lt;h4&gt;
  
  
  Cost Optimization Strategies
&lt;/h4&gt;

&lt;p&gt;1) &lt;strong&gt;Commitment-based Discounts&lt;/strong&gt;: Take advantage of &lt;a href="https://cloud.google.com/docs/cuds?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Committed Use Discounts (CUDs)&lt;/a&gt; for predictable workloads, which can substantially reduce costs over long-term commitments.&lt;br&gt;&lt;br&gt;
2) &lt;strong&gt;Automated Scheduling&lt;/strong&gt;: Implement &lt;a href="https://cloud.google.com/compute/docs/instances/schedule-instance-start-stop?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;VM instance scheduling&lt;/a&gt; to automatically stop workstations during off-hours (e.g., overnight or weekends), minimizing resource consumption when not in use.&lt;/p&gt;
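
&lt;p&gt;As an illustrative sketch of such a schedule with the gcloud CLI (the policy name, VM name, region, times, and timezone are placeholders):&lt;/p&gt;

```shell
# Define a schedule: start dev VMs at 08:00 and stop them at 19:00 on weekdays.
gcloud compute resource-policies create instance-schedule weekday-schedule \
    --region=europe-west1 \
    --vm-start-schedule="0 8 * * MON-FRI" \
    --vm-stop-schedule="0 19 * * MON-FRI" \
    --timezone="Europe/Warsaw"

# Attach the schedule to an existing VM.
gcloud compute instances add-resource-policies my-dev-vm \
    --zone=europe-west1-b \
    --resource-policies=weekday-schedule
```

&lt;p&gt;A stopped VM still incurs disk and static IP charges, but the (usually dominant) compute cost pauses outside working hours.&lt;/p&gt;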

&lt;h2&gt;
  
  
  Google Cloud Workstations
&lt;/h2&gt;

&lt;p&gt;If all your team needs is the computation power of cloud instances and not a full graphical connection, then &lt;a href="https://cloud.google.com/workstations?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Workstations&lt;/a&gt; might be just for you (&lt;a href="https://www.youtube.com/watch?v=E1cblFqb8nk" rel="noopener noreferrer"&gt;video explainer&lt;/a&gt;). It’s a managed solution that allows you to create virtual workstations that your team can connect to and use for development. Those instances can be based on many different &lt;a href="https://cloud.google.com/workstations/docs/available-machine-types?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;machine types&lt;/a&gt;, including &lt;a href="https://cloud.google.com/workstations/docs/available-gpus?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GPU-accelerated ones&lt;/a&gt;. You can use them through Code OSS (the open-source base of Visual Studio Code), multiple JetBrains IDEs via JetBrains Gateway, or Posit Workbench (with RStudio Pro).&lt;/p&gt;

&lt;p&gt;Workstations allow you to &lt;a href="https://cloud.google.com/workstations/docs/customize-container-images?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;customize the developer environments&lt;/a&gt;, so that each new instance comes with all the necessary tools preinstalled. Users can be allowed to create and destroy their own environments, while you retain the control over the allowed configurations of those environments.&lt;/p&gt;

&lt;p&gt;Despite a higher list price than “raw” Compute Engine instances, managed Workstations can turn out cheaper in practice, as they allow you to &lt;a href="https://cloud.google.com/workstations/docs/create-configuration#define_machine_settings?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;configure&lt;/a&gt; auto-sleep and auto-shutdown settings, so resources are not wasted when the workstations sit idle.&lt;/p&gt;

&lt;p&gt;Cloud Workstations offer a wide variety of &lt;a href="https://cloud.google.com/workstations/docs/customize-development-environment?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;customization options&lt;/a&gt; and &lt;a href="https://cloud.google.com/workstations/docs/set-up-security-best-practices?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;security configurations&lt;/a&gt;. While not as flexible as simple Virtual Machines, the Workstations might be more attractive due to easier management, strict control and out-of-the-box compatibility with popular coding solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  In summary
&lt;/h2&gt;

&lt;p&gt;Google Cloud offers virtual workstation solutions for all kinds of developer needs. Here’s a short summary table, highlighting various applications of GCE and Workstations.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;
&lt;a href="https://cloud.google.com/products/compute?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Compute Engine&lt;/a&gt; (unmanaged VMs)&lt;/th&gt;
&lt;th&gt;&lt;a href="https://cloud.google.com/workstations/docs/overview?utm_campaign=CDR_0x73f0e2c4_default_b427179257&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Workstations&lt;/a&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Graphics-heavy work:&lt;/strong&gt; designing, gaming, game development, video editing&lt;/td&gt;
&lt;td&gt;GPU-accelerated VMs offer great performance when paired with proper virtual workspace software.&lt;/td&gt;
&lt;td&gt;N/A - Cloud Workstations don’t support this kind of work.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;AI and HPC workloads:&lt;/strong&gt; AI training, AI inference, GPU-powered simulations&lt;/td&gt;
&lt;td&gt;GPU-accelerated VMs can make use of every GPU-type available in Google Cloud. Sharing a big VM between multiple developers is a valid approach.&lt;/td&gt;
&lt;td&gt;Cloud Workstations support GPU-accelerated machine types, allowing developers to work on software that requires GPU-acceleration.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;General workloads&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;While regular VMs can work for hosting a workstation for these kinds of applications, it might not be worth the management effort.&lt;/td&gt;
&lt;td&gt;Cloud Workstations work great as a platform for developers who need a remote cloud-based environment to work on their projects. With the majority of management hassle taken care of, you are free to just work on your project.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Embrace the future of development today by exploring the powerful virtual workstation solutions offered by Google Cloud. While Compute Engine provides unbridled flexibility, Cloud Workstations offer streamlined efficiency. Unlock enhanced productivity and simplified asset management for your team. Start your cloud development journey now and discover the perfect environment for your needs.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>remote</category>
      <category>gcp</category>
    </item>
    <item>
      <title>Observability in Action: A Google Cloud Next demo</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Mon, 05 May 2025 18:17:47 +0000</pubDate>
      <link>https://dev.to/googlecloud/observability-in-action-a-google-cloud-next-demo-2fkb</link>
      <guid>https://dev.to/googlecloud/observability-in-action-a-google-cloud-next-demo-2fkb</guid>
      <description>&lt;p&gt;It was only a few weeks ago that over 32,000 cloud practitioners from all over the world came together in Las Vegas to attend &lt;a href="https://cloud.withgoogle.com/next/25" rel="noopener noreferrer"&gt;Google Cloud Next 2025&lt;/a&gt;. Beyond the keynotes, the workshops, and the multiple jam-packed tracks of talks and sessions, an entire expo hall offered attendees the opportunity to observe or play around with more than 500 live demos. Let’s check out one of these demos!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2j0xjh3gevygsas1jd0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2j0xjh3gevygsas1jd0.png" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of the demo
&lt;/h2&gt;

&lt;p&gt;The Observability in Action demo had two main goals: to showcase various ways of interacting with metrics and logs, and to give attendees a bit of an interactive experience. For the interactive part of the demo, we used various oversized physical buttons and pedals that could be used to select answers or confirm inputs.&lt;/p&gt;

&lt;p&gt;The flow of the demo was as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We asked the attendee to type in a prompt to send to an AI model.
&lt;/li&gt;
&lt;li&gt;The prompt was sent in the background to three different models: Gemma 3 on Cloud Run, Gemini 2.0 Flash on Vertex AI, and Gemini 2.0 Flash-Lite on Vertex AI. This generated logs and metrics.
&lt;/li&gt;
&lt;li&gt;The attendee was then given a short quiz about these three models. Each quiz input also generated logs and metrics.
&lt;/li&gt;
&lt;li&gt;At the end of the quiz, we gave the attendee a rundown of their answers, and then flipped over to the Google Cloud Console.
&lt;/li&gt;
&lt;li&gt;In Cloud Monitoring, we showcased the various native metrics that Cloud Run offers, custom metrics implemented using OpenTelemetry, as well as the Cloud Trace functionality.
&lt;/li&gt;
&lt;li&gt;Finally, we turned to BigQuery to showcase how logs can be mirrored to a database for further analysis using Jupyter Notebooks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r633avlam0n78b2rqyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r633avlam0n78b2rqyv.png" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;While the demo frontend runs locally, the backend is deployed as a Cloud Run service. The backend talks to Gemini through the Vertex AI SDK and to Gemma through a separate Cloud Run instance. The persistent state of the demo resides in a Firestore database, and all Cloud Run logs are mirrored to BigQuery using a simple sink.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcn41o28fet53zgsc05wv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcn41o28fet53zgsc05wv.png" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing metrics using Cloud Monitoring
&lt;/h2&gt;

&lt;p&gt;Cloud Monitoring provides visibility into the performance and health of your cloud applications and infrastructure. It collects metrics, events, and metadata from Google Cloud services and other sources, allowing you to visualize this data on dashboards and create alerts for critical issues. This is useful for proactively identifying and resolving problems, optimizing resource utilization, improving uptime, and understanding system behavior, ultimately leading to more reliable and cost-effective applications.&lt;/p&gt;

&lt;p&gt;For services like Cloud Run which we’re using for the backend of this demo, Cloud Monitoring automatically collects a wide array of native metrics without any setup needed. This includes data points such as request latency, count, container CPU and memory usage, and instance counts. This out-of-the-box integration means developers get immediate insights into their serverless application's performance and resource consumption, simplifying troubleshooting and optimization efforts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52tl8uurp6v1a0fu170k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52tl8uurp6v1a0fu170k.png" alt="Image description" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud Trace is a distributed tracing system within Google Cloud that helps you understand request latency across your application and its services. It tracks how long different parts of your application take to process requests, visualizing the entire request flow. This is particularly valuable for identifying performance bottlenecks in microservices architectures by showing where time is spent during a request's lifecycle.&lt;/p&gt;

&lt;p&gt;Here’s a real-life example: in this demo we send a prompt to multiple models. We were sure we had implemented concurrency correctly (so the calls to the three different models should have happened in parallel), yet the latency seemed significantly higher than expected. When we dug into the trace of a call, we quickly realized that we were accidentally making those calls sequentially! These traces were made available to us via OpenTelemetry instrumentation we added to our code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e7t3t9mw6m0r2b40xnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e7t3t9mw6m0r2b40xnz.png" alt="Image description" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Interact with your logs with BigQuery
&lt;/h2&gt;

&lt;p&gt;BigQuery is a serverless enterprise data warehouse that enables super-fast SQL queries on large datasets without infrastructure management. It's built for scalable analytics, supports diverse data types, and integrates machine learning, offering a powerful platform for insights from real-time and historical data.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://cloud.google.com/logging/docs/export/configure_export_v2" rel="noopener noreferrer"&gt;a simple sink&lt;/a&gt;, you can directly stream logs from Cloud Logging into BigQuery, transforming it into a powerful, long-term log analytics platform. This allows you to run complex SQL queries across extensive historical log data, which is invaluable for in-depth security audits, compliance, and identifying subtle operational trends.&lt;/p&gt;
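
&lt;p&gt;As an illustrative sketch (the sink, project, and dataset names below are placeholders), creating such a sink with the gcloud CLI can look like this:&lt;/p&gt;

```shell
# Route Cloud Run logs into a BigQuery dataset via a log sink.
gcloud logging sinks create cloud-run-to-bq \
    bigquery.googleapis.com/projects/my-project/datasets/run_logs \
    --log-filter='resource.type="cloud_run_revision"'
```

&lt;p&gt;The command prints the sink’s writer service account, which then needs write access (for example, the BigQuery Data Editor role) on the target dataset before logs start flowing.&lt;/p&gt;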

&lt;p&gt;Connecting BigQuery to Jupyter Notebooks further enhances log analysis capabilities. This empowers users to leverage Python and data science libraries for advanced data exploration, custom visualizations, and machine learning on log data, facilitating deeper insights and shareable, interactive analysis beyond standard logging tools.&lt;/p&gt;

&lt;p&gt;For this demo, we &lt;a href="https://github.com/GoogleCloudDevRel/next25-observability-in-action/blob/main/Log_Exploration_in_BigQuery.ipynb" rel="noopener noreferrer"&gt;built a Jupyter Notebook&lt;/a&gt; that did analysis on the various interactive quiz events, cross-referenced answers with an external Firestore database, and built tables and charts of the resulting data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh6tfxzgqrc1x0srebfh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh6tfxzgqrc1x0srebfh.png" alt="Image description" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out!
&lt;/h2&gt;

&lt;p&gt;Want to try this demo from home? The source code is &lt;a href="https://github.com/GoogleCloudDevRel/next25-observability-in-action" rel="noopener noreferrer"&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Want to learn more about observability on Google Cloud? Check out these resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/stackdriver/docs" rel="noopener noreferrer"&gt;Documentation: Observability in Google Cloud&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.cloudskillsboost.google/course_templates/864" rel="noopener noreferrer"&gt;Online course: Observability in Google Cloud&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>googlecloud</category>
      <category>observability</category>
      <category>googlecloudnext</category>
      <category>bigquery</category>
    </item>
    <item>
      <title>Getting started with Rust on Google Cloud</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Thu, 27 Mar 2025 04:29:49 +0000</pubDate>
      <link>https://dev.to/googlecloud/getting-started-with-rust-on-google-cloud-4hln</link>
      <guid>https://dev.to/googlecloud/getting-started-with-rust-on-google-cloud-4hln</guid>
      <description>&lt;p&gt;This post will guide you through deploying a simple “Hello, World!” application on &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;. You’ll then extend the application by showing how to integrate with Google Cloud services with experimental &lt;a href="https://github.com/googleapis/google-cloud-rust" rel="noopener noreferrer"&gt;Rust client libraries&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’ll cover the necessary code, Dockerfile configuration, and deployment steps. I’ll also recommend a robust and scalable stack for building web services, especially when combined with Google Cloud’s serverless platform, Cloud Run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AX_eDJ5lRKkKc64Ut" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AX_eDJ5lRKkKc64Ut" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Rust and Axum?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.rust-lang.org/" rel="noopener noreferrer"&gt;Rust&lt;/a&gt; has gained significant traction in backend development, earning the title of &lt;a href="https://survey.stackoverflow.co/2024/technology#2-programming-scripting-and-markup-languages" rel="noopener noreferrer"&gt;most-admired language&lt;/a&gt; in the StackOverflow 2024 Developer Survey. This popularity stems from its core strengths: performance, memory safety, and reliability. Rust’s low-level control and zero-cost abstractions enable &lt;a href="https://nnethercote.github.io/perf-book/title-page.html" rel="noopener noreferrer"&gt;highly performant&lt;/a&gt; applications. Its &lt;a href="https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html" rel="noopener noreferrer"&gt;ownership system&lt;/a&gt; prevents common programming errors like data races and null pointer dereferences. In addition, Rust’s strong &lt;a href="https://doc.rust-lang.org/reference/type-system.html" rel="noopener noreferrer"&gt;type system&lt;/a&gt; and compile-time checks catch errors early in the development process, leading to more reliable software.&lt;/p&gt;

&lt;p&gt;The Rust web framework ecosystem is vibrant and evolving. Popular choices include &lt;a href="https://github.com/tokio-rs/axum" rel="noopener noreferrer"&gt;Axum&lt;/a&gt;, &lt;a href="https://rocket.rs/" rel="noopener noreferrer"&gt;Rocket&lt;/a&gt;, and &lt;a href="https://github.com/actix/actix-web" rel="noopener noreferrer"&gt;Actix&lt;/a&gt;. In this post, I’ll showcase &lt;a href="https://github.com/tokio-rs/axum" rel="noopener noreferrer"&gt;Axum&lt;/a&gt;, but you can apply what you’ve learned here to other Rust web frameworks. Axum’s API is clear and composable, making it easy to build web services. Its modular architecture allows developers to select only the necessary components. Axum is built on &lt;a href="https://tokio.rs/" rel="noopener noreferrer"&gt;Tokio&lt;/a&gt;, a popular asynchronous runtime for Rust, which allows it to handle concurrency and I/O operations efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hello World Application
&lt;/h3&gt;

&lt;p&gt;Let’s start by exploring a basic “Hello, World!” &lt;a href="https://github.com/tokio-rs/axum/tree/main/examples/hello-world" rel="noopener noreferrer"&gt;example&lt;/a&gt; from the official Axum repository. In each section of this blog post, you will enhance the example to leverage Google Cloud capabilities. You can access the final code sample in the &lt;a href="https://github.com/kweinmeister/cloud-rust-example" rel="noopener noreferrer"&gt;cloud-rust-example&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;First, the &lt;a href="https://github.com/tokio-rs/axum/blob/main/examples/hello-world/Cargo.toml" rel="noopener noreferrer"&gt;Cargo.toml&lt;/a&gt; manifest file defines the project’s metadata and dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[package]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"example-hello-world"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;edition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2021"&lt;/span&gt;
&lt;span class="py"&gt;publish&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;axum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"../../axum"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"full"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Within this file, you see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[package]&lt;/code&gt;: Contains basic project information like name, version, and the Rust edition. &lt;code&gt;publish = false&lt;/code&gt; prevents accidental publication.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[dependencies]&lt;/code&gt;: Lists the project’s dependencies — &lt;code&gt;axum&lt;/code&gt; for the web framework and &lt;code&gt;tokio&lt;/code&gt; for asynchronous capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, let’s examine the core application code, &lt;a href="https://github.com/tokio-rs/axum/blob/main/examples/hello-world/src/main.rs" rel="noopener noreferrer"&gt;src/main.rs&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;axum&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="nn"&gt;response&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// build our application with a route&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// run it&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;TcpListener&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"127.0.0.1:3000"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"listening on {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="nf"&gt;.local_addr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="nn"&gt;axum&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Html&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;h1&amp;gt;Hello, World!&amp;lt;/h1&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code sets up a minimal web server using Axum and Tokio. The &lt;code&gt;#[tokio::main]&lt;/code&gt; macro enables asynchronous execution. The &lt;code&gt;main&lt;/code&gt; function creates a &lt;code&gt;Router&lt;/code&gt; to handle requests, defines a single route &lt;code&gt;/&lt;/code&gt; that responds with “Hello, World!”, binds the server to &lt;code&gt;127.0.0.1:3000&lt;/code&gt;, and starts the server. The &lt;code&gt;handler&lt;/code&gt; function generates the HTML response for the root route.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhancements for Cloud Run
&lt;/h3&gt;

&lt;p&gt;The basic example above works well for local development, but let’s make some improvements for deploying to Cloud Run. The official example notably does &lt;em&gt;not&lt;/em&gt; include a Dockerfile, which is required for Cloud Run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Standalone Deployment:&lt;/strong&gt; To make the example standalone and deployable, modify the Cargo.toml file. Change the axum dependency from &lt;code&gt;axum = { path = "../../axum" }&lt;/code&gt; to &lt;code&gt;axum = "0.8"&lt;/code&gt; to use the published version of Axum from &lt;a href="http://crates.io" rel="noopener noreferrer"&gt;crates.io&lt;/a&gt; instead of the local path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Dynamic Port Configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud Run dynamically assigns a port to your application, which is provided through the PORT environment variable. The original example hardcodes the port to 3000. To make our application Cloud Run-compatible, modify the main function to read the PORT environment variable and use it if available, falling back to a default port such as 8080 if the variable is not set.&lt;/p&gt;

&lt;p&gt;The address should also be changed to &lt;code&gt;0.0.0.0&lt;/code&gt; so the server listens on all network interfaces, which is generally preferred for containerized applications.&lt;/p&gt;

&lt;p&gt;Here’s the modified main function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Get the port from the environment, defaulting to 8080&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PORT"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap_or_else&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="s"&gt;"8080"&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0.0.0.0:{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// build our application with a route&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// run it&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;TcpListener&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"listening on {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="nf"&gt;.local_addr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="nn"&gt;axum&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Dockerfile:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To deploy to Cloud Run, you’ll need a Dockerfile. Here’s a simple one that works well for this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; rust:1.85.1&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["./target/release/example-hello-world"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Dockerfile uses the official &lt;a href="https://hub.docker.com/_/rust" rel="noopener noreferrer"&gt;Rust image&lt;/a&gt; as a base, copies the project files, builds the application in release mode, exposes port 8080 (&lt;a href="https://cloud.google.com/run/docs/container-contract#port?utm_campaign=CDR_default_0x80ca756c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;the default port&lt;/a&gt;), and sets the command to run the compiled executable. You can upgrade to the latest Rust image if you’d like.&lt;/p&gt;
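The single-stage image above ships the entire Rust toolchain, so it is large. If image size matters, a multi-stage build is a common refinement. This is only a sketch, reusing the example-hello-world binary name from the Dockerfile above:

```dockerfile
# Build stage: compile with the full Rust toolchain
FROM rust:1.85.1 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

# Runtime stage: copy only the compiled binary into a slim base
FROM debian:bookworm-slim
# ca-certificates is needed for outbound HTTPS calls (e.g. Google APIs)
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/example-hello-world /usr/local/bin/app
EXPOSE 8080
CMD ["/usr/local/bin/app"]
```

Because the rust:1.85.1 base is Debian bookworm, the compiled binary links against a glibc that is also present in debian:bookworm-slim.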

&lt;p&gt;&lt;strong&gt;4. .gcloudignore file:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also add a .gcloudignore file to the project root to exclude unnecessary files (like the target directory containing build artifacts) from the deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.git/
.gitignore
target/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying to Cloud Run
&lt;/h3&gt;

&lt;p&gt;Before deploying, ensure you have the &lt;a href="https://cloud.google.com/sdk/docs/install-sdk?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud SDK&lt;/a&gt; installed and configured, and you have &lt;a href="https://console.cloud.google.com/flows/enableapi?apiid=run.googleapis.com&amp;amp;utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;enabled the Cloud Run API&lt;/a&gt; in your Google Cloud project. You’ll also need to be in the root directory of your Axum project (where the Cargo.toml file is located).&lt;/p&gt;

&lt;p&gt;Before attempting your deployment, you can &lt;a href="https://doc.rust-lang.org/cargo/commands/cargo-check.html" rel="noopener noreferrer"&gt;check&lt;/a&gt; the local package for compilation errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To deploy directly to Cloud Run &lt;a href="https://cloud.google.com/run/docs/deploying-source-code?utm_campaign=CDR_default_0x80ca756c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;from source&lt;/a&gt;, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy cloud-rust-example &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what each part of the command means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gcloud run deploy cloud-rust-example&lt;/code&gt;: This is the base command to deploy a service to Cloud Run. &lt;code&gt;cloud-rust-example&lt;/code&gt; is the name we’re giving to our service. You can choose a different name.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--source .&lt;/code&gt;: This flag tells Cloud Run where to find the source code for your application. The &lt;code&gt;.&lt;/code&gt; indicates the current directory. Cloud Run will use the Dockerfile in this directory to build a container image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--region us-central1&lt;/code&gt;: This specifies the Google Cloud region where your service will be deployed. In this case, we’re using us-central1. You can choose a region closer to your users for lower latency.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--allow-unauthenticated&lt;/code&gt;: This flag makes your deployed service publicly accessible without requiring authentication. This is convenient for initial testing and simple public services. &lt;strong&gt;For production applications, you should remove this flag and implement proper authentication and authorization.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud Run will automatically build and deploy your application. You will be provided with a service URL in the output. Accessing this URL in your browser will display the “Hello, World!” message.&lt;/p&gt;
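You can also verify the deployment from the command line; this sketch assumes the service name and region from the deploy command above:

```shell
# Look up the deployed service URL and request the root route
URL=$(gcloud run services describe cloud-rust-example \
    --region us-central1 --format 'value(status.url)')
curl "$URL/"
```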

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F573%2F0%2AQJmbsamFXPgavNTB" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F573%2F0%2AQJmbsamFXPgavNTB" width="573" height="135"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Hello world output from / route&lt;/em&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  Integrating with Google Cloud Services
&lt;/h3&gt;

&lt;p&gt;Let’s now show how to integrate our application with Google Cloud services. I’ve selected a straightforward scenario that doesn’t require any project configuration to work. You’ll add a new application route &lt;code&gt;/project&lt;/code&gt; that will display information about your project.&lt;/p&gt;

&lt;p&gt;To implement this, you’ll use the &lt;a href="https://github.com/googleapis/google-cloud-rust" rel="noopener noreferrer"&gt;google-cloud-rust&lt;/a&gt; library to interact with the &lt;a href="https://cloud.google.com/resource-manager/docs?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Resource Manager&lt;/a&gt; API and retrieve information about your Google Cloud project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The google-cloud-rust library is currently experimental. APIs may change, and it’s important to stay updated with the latest releases and documentation.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Add Dependencies&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;First, add the Resource Manager v3 API and &lt;a href="https://docs.rs/reqwest/latest/reqwest/" rel="noopener noreferrer"&gt;reqwest&lt;/a&gt; HTTP client to your Cargo.toml file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo add google-cloud-resourcemanager-v3 reqwest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
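After the command runs, the [dependencies] section of Cargo.toml will contain entries along these lines (the version numbers here are illustrative; cargo add pins whatever is current at the time):

```toml
[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
google-cloud-resourcemanager-v3 = "0.2"
reqwest = "0.12"
```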



&lt;h4&gt;
  
  
  &lt;strong&gt;Implement the handler&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;There are four key changes you’ll need to make in src/main.rs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add /project Route:&lt;/strong&gt; A new route &lt;code&gt;/project&lt;/code&gt; will display project information, implemented by &lt;code&gt;project_handler()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/project"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_handler&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.layer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;project_handler function:&lt;/strong&gt; The project handler calls &lt;a href="https://docs.rs/google-cloud-resourcemanager-v3/latest/google_cloud_resourcemanager_v3/client/struct.Projects.html#method.get_project" rel="noopener noreferrer"&gt;get_project()&lt;/a&gt; to fetch project details, then formats the project information into an HTML response. Error handling is included to display any errors that occur during the API call.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;project_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;Extension&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Projects&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Html&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Project ID not initialized"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"projects/{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="nf"&gt;.get_project&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.send&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="nf"&gt;.strip_prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"projects/"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap_or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unknown"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="nf"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"&amp;lt;h1&amp;gt;Project Info&amp;lt;/h1&amp;gt;&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;Name: &amp;lt;code&amp;gt;{}&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;ID: &amp;lt;code&amp;gt;{}&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;Number: &amp;lt;code&amp;gt;{}&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="py"&gt;.display_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;project_number&lt;/span&gt;
            &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;h1&amp;gt;Error getting project info: {}&amp;lt;/h1&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Share client with handler:&lt;/strong&gt; For best performance, any one-time configuration should not reside in the handler. The &lt;a href="https://docs.rs/google-cloud-resourcemanager-v3/latest/google_cloud_resourcemanager_v3/client/struct.Projects.html#" rel="noopener noreferrer"&gt;Projects&lt;/a&gt; client can be initialized in main() and then shared with the handler via Axum’s &lt;a href="https://docs.rs/axum/latest/axum/struct.Extension.html" rel="noopener noreferrer"&gt;Extension&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add helper function for project metadata:&lt;/strong&gt; To find out the project ID the container is running in, you’ll need to access the &lt;a href="https://cloud.google.com/resource-manager/docs?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;metadata key&lt;/a&gt;. That project ID will then be used to call the Resource Manager API to get more &lt;a href="https://docs.rs/google-cloud-resourcemanager-v3/latest/google_cloud_resourcemanager_v3/model/struct.Project.html" rel="noopener noreferrer"&gt;information about the project&lt;/a&gt;, including its display name and creation time. You can use &lt;a href="https://doc.rust-lang.org/std/sync/struct.OnceLock.html" rel="noopener noreferrer"&gt;OnceLock&lt;/a&gt; to initialize the project ID only once.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OnceLock&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;OnceLock&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_project_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to get project ID"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="nf"&gt;.set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to set PROJECT_ID"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;get_project_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GOOGLE_CLOUD_PROJECT"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;reqwest&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://metadata.google.internal/computeMetadata/v1/project/project-id"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;
        &lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Metadata-Flavor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Google"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.send&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="nf"&gt;.status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.is_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="nf"&gt;.text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Metadata server returned error: {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="nf"&gt;.status&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error querying metadata server: {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Set GOOGLE_CLOUD_PROJECT Environment Variable (Locally)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For local testing, you’ll need to set the &lt;code&gt;GOOGLE_CLOUD_PROJECT&lt;/code&gt; environment variable to your Google Cloud project ID. You can do this in your terminal before running the application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-project-id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;your-project-id&lt;/code&gt; with your actual project ID. When deployed, the application falls back to querying the metadata server if this variable isn’t set.&lt;/p&gt;
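For local testing of the /project route, the client also needs credentials. One common approach, assuming the gcloud CLI is installed, is Application Default Credentials:

```shell
# One-time: create local Application Default Credentials (opens a browser)
gcloud auth application-default login

export GOOGLE_CLOUD_PROJECT=your-project-id
cargo run
# In another terminal:
curl http://localhost:8080/project
```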

&lt;h4&gt;
  
  
  &lt;strong&gt;Enable the Resource Manager API&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If you haven’t already, make sure to enable the &lt;a href="https://console.cloud.google.com/apis/api/cloudresourcemanager.googleapis.com/overview?utm_campaign=CDR_default_0xd368824c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Resource Manager API&lt;/a&gt; within your Google Cloud project.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Provide Resource Manager IAM access&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;You will need to grant the &lt;a href="https://cloud.google.com/resource-manager/docs/access-control-proj#permissions?utm_campaign=CDR_default_0xd368824c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;resourcemanager.projects.get&lt;/a&gt; permission, through an appropriate IAM role, to the appropriate &lt;a href="https://cloud.google.com/run/docs/securing/service-identity#types-of-service-accounts?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run service account&lt;/a&gt;. The instructions here use the Compute Engine default service account. If you are running locally, you’ll also need to grant these permissions to your own account.&lt;/p&gt;
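As a sketch, assuming the Compute Engine default service account: roles/browser is one predefined role that includes the resourcemanager.projects.get permission. Substitute your own project ID and project number:

```shell
# Grant a role containing resourcemanager.projects.get to the default compute SA
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/browser"
```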

&lt;p&gt;&lt;strong&gt;Redeploy to Cloud Run&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use the same &lt;code&gt;gcloud run deploy&lt;/code&gt; command as before to redeploy your updated application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy cloud-rust-example &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, when you visit the service URL provided by Cloud Run and navigate to the &lt;code&gt;/project&lt;/code&gt; path, you should see information about your Google Cloud project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F716%2F0%2AojC486ePfJcfZ30r" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F716%2F0%2AojC486ePfJcfZ30r" width="716" height="440"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Project information output from /project route&lt;/em&gt;&lt;/p&gt;



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This guide demonstrates the process of deploying a Rust Axum application on Cloud Run. I started with a basic “Hello, World!” example from the Axum repository, explained its code, and then showed how to enhance it for Cloud Run compatibility by dynamically configuring the port and creating a Dockerfile. By combining Rust and Axum with Cloud Run’s serverless simplicity, you can efficiently build and deploy robust web services. The sample source code is available in the &lt;a href="https://github.com/kweinmeister/cloud-rust-example" rel="noopener noreferrer"&gt;cloud-rust-example&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;For more information about Cloud Run, I recommend the &lt;a href="https://cloud.google.com/run/docs/quickstarts/build-and-deploy/deploy-service-other-languages?utm_campaign=CDR_default_0x80ca756c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;quickstart&lt;/a&gt; for building and deploying a web application in the documentation. Also, check out &lt;a href="https://www.youtube.com/watch?v=rOMroL3mhO4" rel="noopener noreferrer"&gt;this video walkthrough&lt;/a&gt; of running Rust on Cloud Run. Feel free to connect on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt; to continue the discussion!&lt;/p&gt;




</description>
      <category>dockerfiles</category>
      <category>web</category>
      <category>axum</category>
      <category>rust</category>
    </item>
    <item>
      <title>Polish Large Language Model (PLLuM) on Google Cloud</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Mon, 17 Mar 2025 08:44:00 +0000</pubDate>
      <link>https://dev.to/googlecloud/pllum-na-google-cloud-5c6</link>
      <guid>https://dev.to/googlecloud/pllum-na-google-cloud-5c6</guid>
      <description>&lt;p&gt;"Wpadła śliwka w .... Google Cloud" 😉&lt;/p&gt;

&lt;p&gt;Recently, thanks to the Ministry of Digital Affairs, there's been a lot of buzz about the new Polish Large Language Model (PLLuM). I decided to play around with it a bit and show others how to run it on Google Cloud using Vertex AI.&lt;/p&gt;

&lt;p&gt;I invite you to check out &lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/vertex_ai_pytorch_inference_pllum_with_custom_handler.ipynb" rel="noopener noreferrer"&gt;this notebook&lt;/a&gt;, which will guide you through this process step by step.&lt;/p&gt;

&lt;p&gt;Let me know in the comments what applications you see for this new open model.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>locallama</category>
      <category>llm</category>
      <category>googlecloud</category>
    </item>
  </channel>
</rss>
