<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Remigiusz Samborski</title>
    <description>The latest articles on DEV Community by Remigiusz Samborski (@rsamborski).</description>
    <link>https://dev.to/rsamborski</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2829111%2Fd4264501-e5df-440a-af46-f1549d9ecba1.jpg</url>
      <title>DEV Community: Remigiusz Samborski</title>
      <link>https://dev.to/rsamborski</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rsamborski"/>
    <language>en</language>
    <item>
      <title>Secure Gemini CLI for Cloud development</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Fri, 13 Mar 2026 06:01:17 +0000</pubDate>
      <link>https://dev.to/googleai/secure-gemini-cli-for-cloud-development-2mpe</link>
      <guid>https://dev.to/googleai/secure-gemini-cli-for-cloud-development-2mpe</guid>
      <description>&lt;p&gt;AI agents are a double-edged sword. You hear horror stories of autonomous tools deleting production databases or purging entire email inboxes. These risks often lead users to require manual confirmation for every agent operation. This approach keeps you in control but limits the agent's autonomy. You will soon find yourself hand-holding the agent and hindering its true capabilities. You need a way to let the agent run in "yolo mode" without risking your system.&lt;/p&gt;

&lt;p&gt;In this post you will learn how to secure &lt;a href="https://geminicli.com/" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt; so it runs in an isolated environment with limited &lt;a href="https://github.com/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://cloud.google.com/?utm_campaign=CDR_0x87fa8d40_default_b490349988&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud&lt;/a&gt; access, without worrying that it will do too much damage if things go wrong. We will follow the least-privilege pattern: Gemini CLI gets all the permissions it needs to build your project, but can’t access systems it shouldn’t touch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sandbox premise
&lt;/h2&gt;

&lt;p&gt;The solution consists of the following components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens" rel="noopener noreferrer"&gt;GitHub fine-grained personal access tokens&lt;/a&gt; - limit source control risks.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cloud.google.com/iam/docs/service-account-overview?utm_campaign=CDR_0x87fa8d40_default_b490349988&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud service account&lt;/a&gt; - limits cloud risks.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; - limits local system risks.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://geminicli.com/docs/cli/session-management/#session-limits" rel="noopener noreferrer"&gt;Session limits&lt;/a&gt; - avoid surprises with the number of used tokens (especially important when running in &lt;code&gt;--yolo&lt;/code&gt; mode).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Following this approach will protect you from the '&lt;strong&gt;helpful agent curse&lt;/strong&gt;' - a situation in which the agent tries very hard to achieve its task by finding ways around blockers, for example by granting itself more permissions or copying files into the current folder so it can edit them.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub fine-grained personal access tokens
&lt;/h3&gt;

&lt;p&gt;First, let’s limit the agent’s GitHub exposure by leveraging fine-grained tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to GitHub Settings &amp;gt; Developer Settings &amp;gt; Personal access tokens &amp;gt; &lt;a href="https://github.com/settings/personal-access-tokens" rel="noopener noreferrer"&gt;Fine-grained tokens&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Click &lt;em&gt;Generate a new token&lt;/em&gt;.
&lt;/li&gt;
&lt;li&gt;Provide a descriptive &lt;em&gt;name&lt;/em&gt; for your token and consider setting an &lt;em&gt;expiration date&lt;/em&gt; to force rotation on a regular basis.
&lt;/li&gt;
&lt;li&gt;Restrict &lt;em&gt;Repository access&lt;/em&gt; to the specific target repo you are working on.
&lt;/li&gt;
&lt;li&gt;Grant &lt;em&gt;Read and Write&lt;/em&gt; permissions for &lt;em&gt;Contents&lt;/em&gt;.
&lt;/li&gt;
&lt;li&gt;Save the token locally by running &lt;code&gt;export GITHUB_TOKEN="github_pat_..."&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
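
&lt;p&gt;&lt;em&gt;Optional: before handing the token to the agent, you can sanity-check its scope. The snippet below is a sketch; it assumes &lt;code&gt;curl&lt;/code&gt; is installed, &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; is exported, and &lt;code&gt;USER_NAME/TARGET_REPO&lt;/code&gt; is a placeholder for your repository.&lt;/em&gt;&lt;/p&gt;

```shell
# Print the HTTP status GitHub returns for an API path when using the token.
gh_status() {
  curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    "https://api.github.com/$1"
}

# Only run the live checks when a token is actually set.
if [ -n "${GITHUB_TOKEN:-}" ]; then
  echo "auth check:  $(gh_status user)"                        # should be 200 if the token is valid
  echo "target repo: $(gh_status repos/USER_NAME/TARGET_REPO)" # should be 200 for the allowed repo
  # Any repository outside the token's scope should come back as 404.
fi
```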

&lt;h3&gt;
  
  
  Google Cloud Service Account
&lt;/h3&gt;

&lt;p&gt;Create an isolated Service Account (SA) with minimal permissions. This prevents the agent from accessing protected resources and other projects.&lt;/p&gt;

&lt;p&gt;Run these commands after updating &lt;code&gt;YOUR_PROJECT_ID&lt;/code&gt; and the roles below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set your project ID&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLOUDSDK_CORE_PROJECT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"YOUR_PROJECT_ID"&lt;/span&gt;
gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project &lt;span class="nv"&gt;$CLOUDSDK_CORE_PROJECT&lt;/span&gt;

&lt;span class="c"&gt;# Create the Service Account&lt;/span&gt;
gcloud iam service-accounts create gemini-cli-sa &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Isolated account for Gemini CLI"&lt;/span&gt;

&lt;span class="c"&gt;# Grant minimal roles (adjust roles as needed)&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$CLOUDSDK_CORE_PROJECT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
   &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:gemini-cli-sa@&lt;/span&gt;&lt;span class="nv"&gt;$CLOUDSDK_CORE_PROJECT&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/aiplatform.user"&lt;/span&gt;

&lt;span class="c"&gt;# Generate the JSON key file&lt;/span&gt;
gcloud iam service-accounts keys create sa-key.json &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--iam-account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;gemini-cli-sa@&lt;span class="nv"&gt;$CLOUDSDK_CORE_PROJECT&lt;/span&gt;.iam.gserviceaccount.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Hint: you can use the &lt;a href="https://docs.cloud.google.com/iam/docs/roles-permissions?utm_campaign=CDR_0x87fa8d40_default_b490349988&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;IAM roles and permissions index&lt;/a&gt; page to easily find the roles to grant.&lt;/em&gt;&lt;/p&gt;
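
&lt;p&gt;&lt;em&gt;To double-check what the new SA can actually do, you can list the roles bound to it. This is a sketch that assumes &lt;code&gt;gcloud&lt;/code&gt; is installed and &lt;code&gt;CLOUDSDK_CORE_PROJECT&lt;/code&gt; is still set from the previous step.&lt;/em&gt;&lt;/p&gt;

```shell
# Build the IAM member string for the service account created above.
sa_member="serviceAccount:gemini-cli-sa@${CLOUDSDK_CORE_PROJECT:-YOUR_PROJECT_ID}.iam.gserviceaccount.com"

# Guarded so the snippet is a no-op on machines without gcloud or a project set.
if command -v gcloud >/dev/null; then
  if [ -n "${CLOUDSDK_CORE_PROJECT:-}" ]; then
    gcloud projects get-iam-policy "$CLOUDSDK_CORE_PROJECT" \
      --flatten="bindings[].members" \
      --filter="bindings.members:${sa_member}" \
      --format="value(bindings.role)"
  fi
fi
```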

&lt;p&gt;A good practice is to use a dedicated project for each of your AI coding initiatives. This way you can run several agents in parallel. They will build different solutions without worrying about stepping on each other's toes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom Docker Build
&lt;/h3&gt;

&lt;p&gt;Gemini CLI uses a sandbox image to isolate the execution environment. You need to customize this image to install &lt;code&gt;gcloud&lt;/code&gt;, Terraform and &lt;code&gt;vim&lt;/code&gt;, and to set up the Git configuration.&lt;/p&gt;

&lt;h4&gt;
  
  
  Prepare the Dockerfile
&lt;/h4&gt;

&lt;p&gt;Create a &lt;code&gt;.gemini&lt;/code&gt; directory in your project, and inside it, create a &lt;code&gt;sandbox.Dockerfile&lt;/code&gt;. Using this specific file name allows Gemini CLI to automatically detect and build your custom sandbox profile if you’re running it from source, and you can also use it to build the image manually if you’re running a binary installation. Both options are covered below.&lt;/p&gt;

&lt;p&gt;Paste this content in the &lt;code&gt;.gemini/sandbox.Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start from the official Gemini CLI sandbox image with proper version&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; GEMINI_CLI_VERSION 0.33.0&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; us-docker.pkg.dev/gemini-code-dev/gemini-cli/sandbox:${GEMINI_CLI_VERSION}&lt;/span&gt;

&lt;span class="c"&gt;# Switch to root to install system dependencies (gcloud)&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;

&lt;span class="c"&gt;# Install Google Cloud SDK, Git, and prerequisites&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; curl apt-transport-https ca-certificates gnupg git &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main"&lt;/span&gt; | &lt;span class="nb"&gt;tee&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; /etc/apt/sources.list.d/google-cloud-sdk.list &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key &lt;span class="nt"&gt;--keyring&lt;/span&gt; /usr/share/keyrings/cloud.google.gpg add - &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; google-cloud-cli

&lt;span class="c"&gt;# Install Terraform&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; wget lsb-release &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    wget &lt;span class="nt"&gt;-O-&lt;/span&gt; https://apt.releases.hashicorp.com/gpg | gpg &lt;span class="nt"&gt;--dearmor&lt;/span&gt; | &lt;span class="nb"&gt;tee&lt;/span&gt; /usr/share/keyrings/hashicorp-archive-keyring.gpg &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb [arch=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;dpkg &lt;span class="nt"&gt;--print-architecture&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-oP&lt;/span&gt; &lt;span class="s1"&gt;'(?&amp;lt;=UBUNTU_CODENAME=).*'&lt;/span&gt; /etc/os-release &lt;span class="o"&gt;||&lt;/span&gt; lsb_release &lt;span class="nt"&gt;-cs&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; main"&lt;/span&gt; | &lt;span class="nb"&gt;tee&lt;/span&gt; /etc/apt/sources.list.d/hashicorp.list &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; terraform

&lt;span class="c"&gt;# Install vim&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; vim

&lt;span class="c"&gt;# Switch back to the non-root user (the official sandbox image uses 'node' as the default user)&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; node&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /workspace&lt;/span&gt;

&lt;span class="c"&gt;# Configure Git to use the injected GitHub PAT at runtime&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;git config &lt;span class="nt"&gt;--global&lt;/span&gt; credential.helper &lt;span class="s1"&gt;'!f() { echo "username=x-access-token"; echo "password=$GITHUB_TOKEN"; }; f'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Prepare Docker for building images (optional macOS step)
&lt;/h4&gt;

&lt;p&gt;If you haven’t built any Docker images before, run the following commands to prepare your environment with &lt;code&gt;brew&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;docker colima docker-buildx

&lt;span class="c"&gt;# Configure docker-buildx&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.docker/cli-plugins
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-sfn&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;brew &lt;span class="nt"&gt;--prefix&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/opt/docker-buildx/bin/docker-buildx ~/.docker/cli-plugins/docker-buildx

&lt;span class="c"&gt;# Start colima service&lt;/span&gt;
brew services start colima

&lt;span class="c"&gt;# Update DOCKER_HOST (you might want to add this line to .bash_profile):&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DOCKER_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"unix://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HOME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/.colima/default/docker.sock"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Build the image (binary installation)
&lt;/h4&gt;

&lt;p&gt;If you installed Gemini CLI with &lt;code&gt;npm&lt;/code&gt;, &lt;code&gt;brew&lt;/code&gt; or any other binary method, you will need to build the Docker image manually and tag it with the default name that Gemini CLI looks for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get the base name the CLI looks for&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IMAGE_BASE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-docker.pkg.dev/gemini-code-dev/gemini-cli/sandbox"&lt;/span&gt;

&lt;span class="c"&gt;# Get your currently installed Gemini CLI version (e.g., 0.33.0)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_CLI_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gemini &lt;span class="nt"&gt;--version&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Combine them&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IMAGE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IMAGE_BASE_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GEMINI_CLI_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Build your custom sandbox image&lt;/span&gt;
docker build &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--build-arg&lt;/span&gt; &lt;span class="nv"&gt;GEMINI_CLI_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$GEMINI_CLI_VERSION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IMAGE_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; .gemini/sandbox.Dockerfile &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Important: this image will be tagged with the exact version of the Gemini CLI you use. This means it needs to be rebuilt every time you update the CLI. I keep the above code in a shell script to run it after every update.&lt;/em&gt;&lt;/p&gt;
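
&lt;p&gt;&lt;em&gt;As a sketch, such a rebuild script could look like this (the file name &lt;code&gt;rebuild-sandbox.sh&lt;/code&gt; is just a suggestion; it assumes &lt;code&gt;gemini&lt;/code&gt; and &lt;code&gt;docker&lt;/code&gt; are on your PATH and that you run it from the project root):&lt;/em&gt;&lt;/p&gt;

```shell
#!/bin/sh
# rebuild-sandbox.sh - rebuild the custom sandbox image after a Gemini CLI update.
set -eu

# Compose the image tag that Gemini CLI expects for a given version.
image_name() {
  echo "us-docker.pkg.dev/gemini-code-dev/gemini-cli/sandbox:$1"
}

# Guarded so the script is a no-op when the tools or the Dockerfile are missing.
if command -v gemini >/dev/null; then
  if command -v docker >/dev/null; then
    if [ -f .gemini/sandbox.Dockerfile ]; then
      version="$(gemini --version)"
      docker build \
        --build-arg GEMINI_CLI_VERSION="$version" \
        -t "$(image_name "$version")" \
        -f .gemini/sandbox.Dockerfile .
    fi
  fi
fi
```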

&lt;h4&gt;
  
  
  Build the image (source installation)
&lt;/h4&gt;

&lt;p&gt;If you’re running Gemini CLI from source as explained &lt;a href="https://geminicli.com/docs/get-started/installation/#run-from-source-recommended-for-gemini-cli-contributors" rel="noopener noreferrer"&gt;here&lt;/a&gt;, you can trigger the image build automatically each time you start &lt;code&gt;gemini&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First, update the top of your &lt;code&gt;sandbox.Dockerfile&lt;/code&gt; by substituting the &lt;code&gt;FROM&lt;/code&gt; line with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start from the official Gemini CLI sandbox image (source installation)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; gemini-cli-sandbox&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Start Gemini CLI in the sandbox mode
&lt;/h2&gt;

&lt;p&gt;First, set a couple of very important environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export the necessary environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"github_pat_..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-api-key"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLOUDSDK_CORE_PROJECT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"YOUR_PROJECT_ID"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_SANDBOX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker

&lt;span class="c"&gt;# We keep the ENV variables for our dynamic credentials&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SANDBOX_FLAGS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
-e GITHUB_TOKEN=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
-e CLOUDSDK_CORE_PROJECT=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CLOUDSDK_CORE_PROJECT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
-e CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;/sa-key.json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: You can put the above variables in a shell script to speed up starts in the future. Just make sure to update your &lt;code&gt;.gitignore&lt;/code&gt; to keep it and &lt;code&gt;sa-key.json&lt;/code&gt; from getting added to your repository.&lt;/em&gt;&lt;/p&gt;
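
&lt;p&gt;&lt;em&gt;For example, assuming you named your start script &lt;code&gt;start-gemini.sh&lt;/code&gt; (the name is arbitrary):&lt;/em&gt;&lt;/p&gt;

```shell
# Keep the credentials and the start script out of source control.
printf '%s\n' 'sa-key.json' 'start-gemini.sh' >> .gitignore
```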

&lt;p&gt;Now you can start Gemini CLI with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For binary installation&lt;/span&gt;
gemini

&lt;span class="c"&gt;# For source installation&lt;/span&gt;
&lt;span class="nv"&gt;BUILD_SANDBOX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1 gemini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Session limits
&lt;/h3&gt;

&lt;p&gt;To avoid surprises with the number of tokens that Gemini CLI uses in your session, set the &lt;em&gt;Max Session Turns&lt;/em&gt; limit in &lt;code&gt;/settings&lt;/code&gt; or in your &lt;code&gt;~/.gemini/settings.json&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmikj41wnhuz4dt73gky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmikj41wnhuz4dt73gky.png" alt="Max Session Turns" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Congratulations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Congratulations 🚀&lt;/strong&gt; You’re ready to validate your setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation and "Ultimate Tests"
&lt;/h2&gt;

&lt;p&gt;Once the environment is launched within the sandbox, we should verify the security boundaries.&lt;/p&gt;

&lt;p&gt;First, let’s run the &lt;code&gt;/about&lt;/code&gt; command to check that we’re running within a sandbox. You should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvy0utgxku5aajiv8n3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvy0utgxku5aajiv8n3t.png" alt="About with sandbox" width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let’s try to break out from our new sandbox.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub privilege escalation
&lt;/h3&gt;

&lt;p&gt;Try asking Gemini CLI to access a private repo it shouldn’t have access to. Example prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Clone https://github.com/USER_NAME/PRIVATE_REPOSITORY to a new folder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see Gemini CLI try hard and struggle to get access. Mine got quite creative, attempting to reach the repo with &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;gh&lt;/code&gt; and &lt;code&gt;curl&lt;/code&gt;, and even trying to reuse the &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; manually. All of these attempts failed and this error was displayed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlzibm3xkqf561kdu5u2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlzibm3xkqf561kdu5u2.png" alt="GitHub privilege escalation test" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Cloud privilege escalation
&lt;/h3&gt;

&lt;p&gt;Ask the agent to list all compute instances:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;List all my compute instances
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It should fail due to missing permissions on your restricted Service Account. Gemini CLI tries really hard and executes a couple of different commands, including reauthentication, but it ultimately fails:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqkltvcxnsmh7y3ikl7t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqkltvcxnsmh7y3ikl7t.png" alt="Google Cloud privilege escalation test" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Local privilege escalation
&lt;/h3&gt;

&lt;p&gt;Finally, let’s try to access a file from another project folder by prompting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;There are other projects in the folder above the current one. List them and let me know if there is anything that is interesting from hacker's perspective.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I am starting to feel sorry for the poor agent 😉 Once again it can’t complete its task:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbq7t4260zre91hsr7kdq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbq7t4260zre91hsr7kdq.png" alt="Local privilege escalation test" width="800" height="166"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Now that you have validated your sandbox setup, you can run &lt;code&gt;gemini --yolo&lt;/code&gt; with much more confidence and streamline your work as &lt;a href="https://geminicli.com/" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt; delivers your code without hand-holding and pesky &lt;code&gt;Can I execute this command?&lt;/code&gt; prompts.&lt;/p&gt;

&lt;p&gt;I am looking forward to all the creative ideas you’ll bring to life!&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;p&gt;If you find this setup useful, here are some additional steps to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try out the &lt;a href="https://github.com/gemini-cli-extensions/conductor" rel="noopener noreferrer"&gt;Gemini CLI Conductor Extension&lt;/a&gt; - it can significantly help you run autonomous agents effectively. &lt;a href="https://medium.com/google-cloud/trying-out-the-new-conductor-extension-in-gemini-cli-0801f892e2db" rel="noopener noreferrer"&gt;Here is a deep dive&lt;/a&gt; into some of its advantages.
&lt;/li&gt;
&lt;li&gt;Read my &lt;a href="https://medium.com/google-cloud/antigravity-the-ralph-wiggum-style-ee6784a78237" rel="noopener noreferrer"&gt;Antigravity the Ralph Wiggum style&lt;/a&gt; post, which covers sandboxing for &lt;a href="https://antigravity.google/?utm_campaign=CDR_0x87fa8d40_default_b490349988&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Add emojis to this post to help others find it.
&lt;/li&gt;
&lt;li&gt;Share this post with your friends on socials.
&lt;/li&gt;
&lt;li&gt;Connect with me via &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/RemikSamborski" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://bsky.app/profile/rsamborski.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for reading.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>devtool</category>
      <category>coding</category>
    </item>
    <item>
      <title>Agent Factory Recap: Antigravity and Nano Banana Pro with Remik</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Mon, 02 Feb 2026 12:18:03 +0000</pubDate>
      <link>https://dev.to/googleai/agent-factory-recap-antigravity-and-nano-banana-pro-with-remik-55c5</link>
      <guid>https://dev.to/googleai/agent-factory-recap-antigravity-and-nano-banana-pro-with-remik-55c5</guid>
      <description>&lt;p&gt;In Episode #17 of the &lt;a href="https://youtube.com/playlist?list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;feature=shared" rel="noopener noreferrer"&gt;Agent Factory podcast&lt;/a&gt;, we step away from the purely theoretical and get our hands dirty with the latest developer tools from Google. Together with Vlad we take a deep dive into Antigravity and Nano Banana Pro, demonstrating how to build AI agents that bridge the gap between code generation and high-fidelity media.&lt;/p&gt;

&lt;p&gt;This post guides you through the &lt;strong&gt;key ideas&lt;/strong&gt; from our conversation. Use it to quickly recap topics or dive deeper into specific segments with &lt;strong&gt;links and timestamps&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introductions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Antigravity - a new agentic mission control
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=122s" rel="noopener noreferrer"&gt;02:02&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; is Google’s new agent-first development application. It is designed as a multi-window IDE that uses &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/get-started-with-gemini-3" rel="noopener noreferrer"&gt;Gemini 3&lt;/a&gt; under the hood to manage complex, asynchronous coding tasks. Unlike a standard text editor, Antigravity features a dedicated Agent Manager view where developers can interact with agents through planning modes, artifact reviews, and built-in browsers for live UI testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nano Banana Pro and why it’s so special
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=227s" rel="noopener noreferrer"&gt;03:47&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model behind our slide generator is Nano Banana Pro (its technical name is the Gemini 3 Pro Image model). What sets Nano Banana Pro apart is its ability to "think" before it creates. It uses &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/grounding/grounding-with-google-search" rel="noopener noreferrer"&gt;Google Search grounding&lt;/a&gt; to retrieve real-time data, like current weather or live stock charts, and integrates that information into the generated image.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Factory Floor
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Agent architecture
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=401s" rel="noopener noreferrer"&gt;06:41&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We dove into the architecture for building the slide generator agent using a combination of the &lt;a href="https://github.com/google/adk-python" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;, &lt;a href="https://antigravity.google" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt;, and &lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/nano-banana-pro-available-for-enterprise" rel="noopener noreferrer"&gt;Nano Banana Pro&lt;/a&gt;. We also explained how these components interconnect to support the flow from a user prompt to a high-fidelity presentation by leveraging a Model Context Protocol (MCP) server.&lt;/p&gt;
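&lt;p&gt;Conceptually, the flow looks like the stdlib-only sketch below. Every name in it is hypothetical - the real agent uses the ADK and an MCP client rather than plain functions:&lt;/p&gt;

```python
# Hypothetical, stdlib-only sketch of the prompt -> agent -> MCP tool ->
# slide-links flow described above. All names here are illustrative.

def mcp_generate_image(slide_prompt: str) -> str:
    """Stub for an MCP image tool; pretend it returns a Cloud Storage URL."""
    return f"gs://demo-bucket/{abs(hash(slide_prompt)) % 10_000}.png"

def slide_agent(user_prompt: str, num_slides: int = 3) -> list[str]:
    """Plan slides from the user prompt, then call the tool once per slide."""
    plan = [f"{user_prompt} (slide {i + 1})" for i in range(num_slides)]
    return [mcp_generate_image(p) for p in plan]

links = slide_agent("Quarterly results deck")
print(links)
```

&lt;p&gt;The point of the sketch is the separation of concerns: the agent plans, while the MCP server owns image generation and storage.&lt;/p&gt;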

&lt;h3&gt;
  
  
  Agent Starter Pack
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=604s" rel="noopener noreferrer"&gt;10:04&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We started by introducing &lt;a href="https://github.com/GoogleCloudPlatform/agent-starter-pack" rel="noopener noreferrer"&gt;Agent Starter Pack&lt;/a&gt; and using it to spin up a new ADK project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vibe Coding a Slide Generator Agent
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=896s" rel="noopener noreferrer"&gt;14:56&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using Antigravity’s "Always Review" mode, we submitted a prompt to build a completely new slides agent that uses an &lt;a href="https://cloud.google.com/discover/what-is-model-context-protocol?e=48754805" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; to create detailed images.&lt;/p&gt;

&lt;p&gt;We highlighted how Antigravity handles tasks and implementation plan reviews, allowing the developer to modify the agent's proposed steps before it executes them. We also dove into automated testing with Antigravity’s web browser plugin functionality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;p&gt;Timestamp: &lt;a href="https://www.youtube.com/watch?v=XCGbDx7aSks&amp;amp;list=PLIivdWyY5sqLXR1eSkiM5bE6pFlXC-OSs&amp;amp;index=1&amp;amp;t=1945s" rel="noopener noreferrer"&gt;32:25&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The slide generator agent successfully called the MCP tools to generate a series of images stored in a Cloud Storage bucket, presenting the final links directly in the UI. Antigravity reviewed the results and, to our surprise, immediately suggested additional improvements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;We are moving into an era where building an agent isn't just about the LLM; it's about orchestrating tools, planning, and high-fidelity output. In our demo, Antigravity served as the agent manager and Nano Banana Pro as the asset creator.&lt;/p&gt;

&lt;p&gt;We can’t wait to see what our viewers create next with these powerful tools!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources &amp;amp; links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Antigravity → &lt;a href="https://goo.gle/3XMvGwf" rel="noopener noreferrer"&gt;https://goo.gle/3XMvGwf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Nano Banana Pro → &lt;a href="https://goo.gle/3KMmKUG" rel="noopener noreferrer"&gt;https://goo.gle/3KMmKUG&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Agent Starter Pack → &lt;a href="https://goo.gle/4oLfypT" rel="noopener noreferrer"&gt;https://goo.gle/4oLfypT&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Agent Development Kit (Python) → &lt;a href="https://goo.gle/4p0Pszm" rel="noopener noreferrer"&gt;https://goo.gle/4p0Pszm&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  GitHub with demo code → &lt;a href="https://goo.gle/48N83IN" rel="noopener noreferrer"&gt;https://goo.gle/48N83IN&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connect with us
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Remik → &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/RemikSamborski" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://github.com/rsamborski" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Vlad → &lt;a href="https://www.linkedin.com/in/vkolesnikov/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/vladkol" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://github.com/vladkol" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>google</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Antigravity the Ralph Wiggum style</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Mon, 02 Feb 2026 12:05:20 +0000</pubDate>
      <link>https://dev.to/googleai/antigravity-the-ralph-wiggum-style-362o</link>
      <guid>https://dev.to/googleai/antigravity-the-ralph-wiggum-style-362o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s9cchaaum7tfmr9diln.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s9cchaaum7tfmr9diln.jpeg" alt="Antigravity the Ralph Wiggum Style&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://ghuntley.com/ralph/" rel="noopener noreferrer"&gt;Ralph Wiggum trend&lt;/a&gt; has been surfacing across social platforms lately. If you're tracking current tech developments, it’s hard to miss. Named after a persistent and slightly confused second-grader, the Wiggum Loop agentic development boils down to: &lt;strong&gt;Don't stop until the job is done.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In traditional AI coding, the agent performs a task, stops, and waits for you to approve its next step or request changes. In a Wiggum Loop, you give the agent a mission and success criteria (like passing tests), and it keeps looping, fixing its own bugs and refactoring - until it hits the green light.&lt;/p&gt;

&lt;p&gt;The recent excitement around Wiggum Loop agentic development highlights a powerful shift toward autonomous, self-correcting development. I've been using a similar approach effectively with &lt;a href="http://antigravity.google" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; for some time now. In this post, I’ll share my strategy so you can implement truly unsupervised development yourself.&lt;/p&gt;
&lt;h2&gt;
  
  
  Going "Full Wiggum"
&lt;/h2&gt;

&lt;p&gt;To achieve true unsupervised development, we need to move away from the review-driven defaults and let the agent take the wheel. Antigravity is uniquely built for this because it's an agent-first environment capable of acting in both the terminal and the browser.&lt;/p&gt;

&lt;p&gt;To mirror the “Bash loop" persistence of the Ralph Wiggum plugin, configure your &lt;a href="https://antigravity.google/docs/agent-modes-settings" rel="noopener noreferrer"&gt;Antigravity settings&lt;/a&gt; as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Mode:&lt;/strong&gt; Select Agent-driven development. This shifts the agent from a "wait for instructions" assistant to a "goal-oriented" architect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terminal execution policy:&lt;/strong&gt; Set to Always Proceed. This allows the agent to run &lt;code&gt;npm test&lt;/code&gt;, &lt;code&gt;uv run pytest&lt;/code&gt;, and other commands without constantly pausing for approval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review policy:&lt;/strong&gt; Set to Always Proceed. This tells the agent that its implementation plans are pre-approved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JavaScript execution policy:&lt;/strong&gt; Set to Always Proceed. This is essential for agents that need to run scripts or interact with browser environments to verify their work.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hs06xb8565kltigm7v1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hs06xb8565kltigm7v1.png" alt="Antigravity settings&amp;lt;br&amp;gt;
"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;WARNING: THE SANDBOX IS NOT OPTIONAL.&lt;/strong&gt; Running an agent in "Always Proceed" mode is like giving Bart Simpson a slingshot in front of a mirror store. &lt;strong&gt;Only do this in a sandbox environment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is &lt;a href="https://medium.com/google-cloud/using-chrome-remote-desktop-to-run-antigravity-on-a-cloud-workstation-or-just-in-a-container-d00296425a0f" rel="noopener noreferrer"&gt;a great article&lt;/a&gt; from my &lt;a href="https://medium.com/@danistrebel" rel="noopener noreferrer"&gt;colleague&lt;/a&gt; with a step-by-step guide to getting such an environment up and running on a &lt;a href="https://cloud.google.com/workstations?utm_campaign=CDR_0x87fa8d40_default_b478855277&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Workstation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;To see this in practice, I ran the following prompt against Antigravity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a REST API for todos in NodeJS.

When complete:
- All CRUD endpoints are working
- Input validation is in place
- Tests are passing (coverage &amp;gt; 80%)
- README with API docs exists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The screencast below shows how Antigravity handled the task without any interruptions from me (I spent that time on other work rather than hand-holding the agent):&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/HLLCn4KsNcc"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  How does this work?
&lt;/h2&gt;

&lt;p&gt;Antigravity isn't just looping in a vacuum. Because it has native hooks into &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/get-started-with-gemini-3?utm_campaign=CDR_0x87fa8d40_default_b478855277&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini 3 Pro&lt;/a&gt;, it utilizes a massive context window that remembers exactly why a previous command failed. &lt;/p&gt;

&lt;p&gt;It kicks things off by drafting an implementation plan and a task list. In the video, you can watch it tick through these items in real time. It doesn't just plan, though - it actually touches the terminal to initialize the npm project and run tests.&lt;/p&gt;

&lt;p&gt;The loop only closes once every requirement is met and the test suite hits green. It then provides a handy walkthrough so you can easily understand the architecture it just spun up.&lt;/p&gt;

&lt;p&gt;This approach turns development from writing code into verifying outcomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  From vibe-coding to vibe-building
&lt;/h2&gt;

&lt;p&gt;The Ralph Wiggum trend isn't about cutting corners; it's about embracing sheer, stubborn persistence through automation. By letting Antigravity operate autonomously, you transition from a coder to an architect and team lead. You define the standards and environment, while agents manage the iterative grind of writing, testing, and debugging cycles that typically consume a developer's valuable time.&lt;/p&gt;

&lt;p&gt;Are you brave enough to let the agent "Always Proceed"? Visit &lt;a href="https://antigravity.google/download" rel="noopener noreferrer"&gt;Antigravity’s download page&lt;/a&gt; to start experimenting yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://youtube.com/shorts/j5v1HDB15AQ?si=ZkXJQF8GBLt8NTAE" rel="noopener noreferrer"&gt;Billy’s Ralph Wiggum loop with Gemini CLI&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://medium.com/google-cloud/using-chrome-remote-desktop-to-run-antigravity-on-a-cloud-workstation-or-just-in-a-container-d00296425a0f" rel="noopener noreferrer"&gt;Daniel’s Antigravity on Cloud Workstation tutorial&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://medium.com/google-cloud/tutorial-getting-started-with-google-antigravity-b5cc74c103c2" rel="noopener noreferrer"&gt;Romin’s getting started with Google Antigravity tutorial&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Let’s Connect!
&lt;/h2&gt;

&lt;p&gt;I’d love to hear how you’re using Antigravity for your agentic workflows. Are you building Wiggum loops or keeping a tighter leash on your agents?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Connect on &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Follow me on &lt;a href="https://x.com/RemikSamborski" rel="noopener noreferrer"&gt;X&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Catch me on &lt;a href="https://bsky.app/profile/rsamborski.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>coding</category>
      <category>antigravity</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>Serverless AI: EmbeddingGemma with Cloud Run</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Thu, 25 Sep 2025 09:29:59 +0000</pubDate>
      <link>https://dev.to/googleai/serverless-ai-embeddinggemma-with-cloud-run-5ee7</link>
      <guid>https://dev.to/googleai/serverless-ai-embeddinggemma-with-cloud-run-5ee7</guid>
      <description>&lt;p&gt;Building on the &lt;a href="https://dev.to/googlecloud/serverless-ai-qwen3-embeddings-with-cloud-run-4h7b"&gt;previous blog post&lt;/a&gt; about running Qwen3 Embedding models on Cloud Run, this article focuses on the recently released EmbeddingGemma model from the &lt;a href="https://ai.google.dev/gemma/docs?utm_campaign=CDR_0x87fa8d40_platform_b443676501&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemma family&lt;/a&gt;. Discover how to leverage the same powerful serverless techniques to deploy this model on Google Cloud's serverless platform.&lt;/p&gt;

&lt;p&gt;You will learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Containerize the embedding model with Docker and Ollama&lt;/li&gt;
&lt;li&gt;  Deploy the embedding model to Cloud Run with GPUs&lt;/li&gt;
&lt;li&gt;  Test the deployed model from a local machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before we dive into the code, let's briefly discuss the core components that power this serverless AI solution.&lt;/p&gt;

&lt;h3&gt;
  
  
  EmbeddingGemma Model
&lt;/h3&gt;

&lt;p&gt;According to the &lt;a href="https://ai.google.dev/gemma/docs/embeddinggemma?utm_campaign=CDR_0x87fa8d40_platform_b443676501&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;EmbeddingGemma model card&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;“EmbeddingGemma is a 308M parameter multilingual text embedding model based on Gemma 3. It is optimized for use in everyday devices, such as phones, laptops, and tablets. The model produces numerical representations of text to be used for downstream tasks like information retrieval, semantic similarity search, classification, and clustering.”&lt;/p&gt;

&lt;p&gt;Its optimization for efficiency makes EmbeddingGemma an ideal candidate for serverless deployment on Cloud Run, ensuring high performance and cost-effectiveness for your AI applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Run
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x87fa8d40_platform_b443676501&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; is a managed compute platform on Google Cloud that lets you run containerized applications in a serverless environment. Think of it as a middle ground between a simple function-as-a-service (like Cloud Run Functions) and a more customizable GKE cluster. You give it a container image, and it handles all the underlying infrastructure, from provisioning and scaling to managing the runtime.&lt;/p&gt;

&lt;p&gt;The beauty of Cloud Run is that it can automatically scale to zero, meaning when there are no requests, you aren't paying for any resources. When traffic picks up, it quickly scales up to handle the load. This makes it perfect for stateless models that need to be highly available and cost-effective.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;Let's walk through the deployment process step-by-step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prepare the environment
&lt;/h3&gt;

&lt;p&gt;First, let's configure the gcloud CLI environment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: if you do not have gcloud CLI installed please follow instructions &lt;a href="https://cloud.google.com/sdk/docs/install?utm_campaign=CDR_0x87fa8d40_platform_b443676501&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;available here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Step 1 - Set your default project:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project PROJECT_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  Step 2 - Configure Google Cloud CLI to use the &lt;code&gt;europe-west1&lt;/code&gt; region for Cloud Run commands:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;run/region europe-west1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important: at the time of writing, GPUs on Cloud Run are available in several regions. To find the closest supported region, refer to &lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu?utm_campaign=CDR_0x87fa8d40_platform_b443676501&amp;amp;utm_medium=external&amp;amp;utm_source=blog#supported-regions" rel="noopener noreferrer"&gt;this page&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Containerize
&lt;/h3&gt;

&lt;p&gt;Now we will use &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; and &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; to run the EmbeddingGemma model. Create a file named &lt;code&gt;Dockerfile&lt;/code&gt; containing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM ollama/ollama:latest

# Listen on all interfaces, port 8080
ENV OLLAMA_HOST=0.0.0.0:8080

# Store model weight files in /models
ENV OLLAMA_MODELS=/models

# Reduce logging verbosity
ENV OLLAMA_DEBUG=false

# Never unload model weights from the GPU
ENV OLLAMA_KEEP_ALIVE=-1

# Store the model weights in the container image
ENV MODEL=embeddinggemma:latest
RUN ollama serve &amp;amp; sleep 5 &amp;amp;&amp;amp; ollama pull $MODEL

# Start Ollama
ENTRYPOINT ["ollama", "serve"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Build and Deploy
&lt;/h3&gt;

&lt;p&gt;We will now use Cloud Run's source deployments. This allows you to achieve the following with one command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  First, compile the container image from the provided source.&lt;/li&gt;
&lt;li&gt;  Next, upload the resulting container image to an &lt;a href="https://cloud.google.com/artifact-registry/docs?utm_campaign=CDR_0x87fa8d40_platform_b443676501&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Artifact Registry&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;  Then, deploy the container to Cloud Run, ensuring that GPU support is enabled using the &lt;code&gt;--gpu&lt;/code&gt; and &lt;code&gt;--gpu-type&lt;/code&gt; parameters.&lt;/li&gt;
&lt;li&gt;  Finally, redirect all incoming traffic to this newly deployed version.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You just need to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy embedding-gemma &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpu&lt;/span&gt; 8 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="nv"&gt;OLLAMA_NUM_PARALLEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-type&lt;/span&gt; nvidia-l4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-instances&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--memory&lt;/span&gt; 32Gi &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-cpu-throttling&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-gpu-zonal-redundancy&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;600 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--labels&lt;/span&gt; dev-tutorial&lt;span class="o"&gt;=&lt;/span&gt;blog-embedding-gemma
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the following important flags in this command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--concurrency 4&lt;/code&gt; is set to match the value of the environment variable OLLAMA_NUM_PARALLEL.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--gpu 1&lt;/code&gt; with &lt;code&gt;--gpu-type nvidia-l4&lt;/code&gt; assigns 1 NVIDIA L4 GPU to every Cloud Run instance in the service.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--max-instances 1&lt;/code&gt; specifies the maximum number of instances to scale to. It has to be equal to or lower than your project's NVIDIA L4 GPU quota.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--no-allow-unauthenticated&lt;/code&gt; restricts unauthenticated access to the service. By keeping the service private, you can rely on Cloud Run's built-in Identity and Access Management (IAM) authentication for service-to-service communication.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--no-cpu-throttling&lt;/code&gt; is required when enabling a GPU.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--no-gpu-zonal-redundancy&lt;/code&gt; disables GPU zonal redundancy; choose this setting based on your zonal failover requirements and available quota.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test the deployment
&lt;/h3&gt;

&lt;p&gt;Upon successful deployment of the service, you can initiate requests. However, direct API calls will result in an &lt;em&gt;HTTP 401 Unauthorized&lt;/em&gt; response from Cloud Run.&lt;/p&gt;

&lt;p&gt;This behaviour follows Google’s “secure by default” approach. The model is intended for calls from other services, such as a RAG application, and therefore is not open for public access.&lt;/p&gt;
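&lt;p&gt;For service-to-service calls, the caller attaches an IAM identity token to the request. The sketch below only constructs such a request - the URL and token are placeholders, and in production the token would come from the metadata server or a Google auth library:&lt;/p&gt;

```python
import urllib.request

# Sketch of an authenticated service-to-service call to the private
# Cloud Run endpoint. The URL and token below are placeholders; the
# request is only constructed here, never sent.
service_url = "https://embedding-gemma-example.a.run.app/api/embed"  # placeholder
id_token = "PLACEHOLDER_IDENTITY_TOKEN"

req = urllib.request.Request(
    service_url,
    data=b'{"model": "embeddinggemma", "input": "Sample text"}',
    headers={
        "Authorization": f"Bearer {id_token}",  # checked by Cloud Run IAM
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.get_header("Authorization"))  # -> Bearer PLACEHOLDER_IDENTITY_TOKEN
```

&lt;p&gt;The calling service's identity must hold the Cloud Run Invoker role on the deployed service for the token to be accepted.&lt;/p&gt;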

&lt;p&gt;To support local testing of your deployment, the simplest approach is to launch the Cloud Run developer proxy using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services proxy embedding-gemma &lt;span class="nt"&gt;--port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Afterwards, in a second terminal window, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:9090/api/embed &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "embeddinggemma",
  "input": "Sample text"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response will look similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ho53s40av2a1ezjqpxo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ho53s40av2a1ezjqpxo.png" alt="EmbeddingGemma curl command response" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also use Python to call the endpoint. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9090&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddinggemma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sample text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congratulations 🎉 The Cloud Run deployment is up and running!&lt;/p&gt;

&lt;h4&gt;
  
  
  RAG Example
&lt;/h4&gt;

&lt;p&gt;You can use the newly deployed model to build your first &lt;a href="https://en.wikipedia.org/wiki/Retrieval-augmented_generation" rel="noopener noreferrer"&gt;RAG application&lt;/a&gt;. Here’s how to achieve this:&lt;/p&gt;

&lt;h5&gt;
  
  
  Step 1 - Generate Embeddings
&lt;/h5&gt;

&lt;p&gt;Start with required dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ollama chromadb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an &lt;code&gt;example.py&lt;/code&gt; file containing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;

&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Poland is a country located in Central Europe.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The capital and largest city of Poland is Warsaw.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Poland&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s official language is Polish, which is a West Slavic language.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Marie Curie, the pioneering scientist who conducted groundbreaking research on radioactivity, was born in Warsaw, Poland.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Poland is famous for its traditional dish called pierogi, which are filled dumplings.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The Białowieża Forest in Poland is one of the last and largest remaining parts of the immense primeval forest that once stretched across the European Plain.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ollama_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9090&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store each document in a in-memory vector embeddings database
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddinggemma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Step 2 - Retrieve
&lt;/h5&gt;

&lt;p&gt;Next, with the following code you can search the vector database for the most relevant document (add it to your &lt;code&gt;example.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# An example question
&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is Poland&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s official language?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Generate an embedding for the input and retrieve the most relevant document
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddinggemma&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
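To make the retrieval step less of a black box, here is a self-contained sketch of what the vector search is doing: ranking stored embeddings by their similarity to the query embedding. The tiny 3-dimensional vectors and document names are made up for illustration; real embedding models return hundreds of dimensions, and Chroma's default distance metric may differ from plain cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "document embeddings" (hypothetical values, for illustration only)
stored = {
    "doc0": [0.9, 0.1, 0.0],
    "doc1": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.1]  # toy "question embedding"

# The most relevant document is the one whose embedding points in the
# direction closest to the query embedding.
best = max(stored, key=lambda name: cosine_similarity(query, stored[name]))
print(best)  # doc0
```

This is exactly why the same embedding model must be used for indexing and querying: similarity scores are only meaningful between vectors from the same embedding space.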



&lt;h5&gt;
  
  
  Step 3 - Generate Final Answer
&lt;/h5&gt;

&lt;p&gt;In this final step, we will use a locally installed &lt;a href="https://ollama.com/library/gemma3" rel="noopener noreferrer"&gt;Gemma3&lt;/a&gt; model.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: We use Gemma3 in the generation step, but any other model could work here (e.g., Gemini, Qwen3, Llama, etc.). Nevertheless, it is &lt;strong&gt;critical to use the same embedding&lt;/strong&gt; model in Step 1 (Generate Embeddings) and Step 2 (Retrieve).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To install the Gemma3:latest model, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can combine the user’s prompt with the search results and generate the final answer (add this code to &lt;code&gt;example.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Final step - generate a response combining the prompt and data we retrieved in step 2
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Using this data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Respond to this prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python example.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer should look similar to the one below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt: Using this data: Poland's official language is Polish, which is a West Slavic language.. Respond to this prompt: What is Poland's official language?
Poland's official language is Polish. It's a West Slavic language.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have successfully created and run your first RAG application using the EmbeddingGemma model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;At this point, you have successfully established a Cloud Run service running the EmbeddingGemma model, ready to generate embeddings for semantic search or RAG applications. &lt;/p&gt;

&lt;p&gt;This method also allows you to deploy and compare multiple embedding models on Cloud Run (e.g. &lt;a href="https://medium.com/google-cloud/serverless-ai-qwen3-embeddings-with-cloud-run-eb35d7f4037f" rel="noopener noreferrer"&gt;Qwen3 Embedding&lt;/a&gt; or &lt;a href="https://ollama.com/search?c=embedding" rel="noopener noreferrer"&gt;other Ollama-supported models&lt;/a&gt;), enabling you to find the best fit for your specific use case without major code changes. &lt;/p&gt;

&lt;p&gt;Ready to build your own serverless AI applications?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x87fa8d40_platform_b443676501&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Start building on Cloud Run today&lt;/a&gt; and explore its full potential!&lt;/li&gt;
&lt;li&gt;  If you’re interested in learning more about RAG evaluation, &lt;a href="https://medium.com/google-cloud/evaluating-rag-pipelines-d99e007e625f" rel="noopener noreferrer"&gt;this article&lt;/a&gt; is a good starting point.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Thanks for reading
&lt;/h2&gt;

&lt;p&gt;If you found this article helpful, please consider following me here and giving it a clap 👏 to help others discover it.&lt;/p&gt;

&lt;p&gt;I'm always eager to chat with fellow developers and AI enthusiasts, so feel free to connect with me on &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://bsky.app/profile/rsamborski.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloudnative</category>
      <category>googlecloud</category>
      <category>gemma</category>
    </item>
    <item>
      <title>Serverless AI: Qwen3 Embeddings with Cloud Run</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Wed, 20 Aug 2025 10:31:23 +0000</pubDate>
      <link>https://dev.to/googleai/serverless-ai-qwen3-embeddings-with-cloud-run-4h7b</link>
      <guid>https://dev.to/googleai/serverless-ai-qwen3-embeddings-with-cloud-run-4h7b</guid>
      <description>&lt;p&gt;In this blog post I’ll show you the process of deploying the Qwen3 Embedding model to &lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu?utm_campaign=CDR_0x87fa8d40_platform_b438423716&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run with GPUs&lt;/a&gt; for enhanced performance.&lt;/p&gt;

&lt;p&gt;You will learn how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Containerize the embedding model with Docker and Ollama&lt;/li&gt;
&lt;li&gt;  Deploy the embedding model to Cloud Run with GPUs&lt;/li&gt;
&lt;li&gt;  Test the deployed model from a local machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before we jump into the code, a couple of words about the key components of the solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qwen3 Embedding Model
&lt;/h2&gt;

&lt;p&gt;The Qwen3 Embedding series is a set of open-source models for text &lt;a href="https://huggingface.co/collections/Qwen/qwen3-embedding-6841b2055b99c44d9a4c371f" rel="noopener noreferrer"&gt;embedding&lt;/a&gt; and &lt;a href="https://huggingface.co/collections/Qwen/qwen3-reranker-6841b22d0192d7ade9cdefea" rel="noopener noreferrer"&gt;reranking&lt;/a&gt;, built on the Qwen3 Large Language Model (LLM) family. It's designed for retrieval-augmented generation (RAG), a technique that enhances the output of large language models by retrieving relevant information from a knowledge base, and other tasks requiring semantic search. You can learn more about embeddings in &lt;a href="https://www.youtube.com/watch?v=vlcQV4j2kTo" rel="noopener noreferrer"&gt;this video&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Open embedding models such as Qwen3 are the ideal choice when you need greater control, specialization, and security than proprietary, "black-box" APIs can offer. They are particularly well-suited for the following use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Fine-Tuning for Niche Domains📻: by fine-tuning them on specialized data (e.g., legal contracts, medical research, internal company wikis) they can provide more accurate results for semantic search and RAG than a general-purpose model.&lt;/li&gt;
&lt;li&gt;  Data Privacy &amp;amp; Security🔒: open models can be self-hosted or deployed to cloud resources managed by your organization. This ensures compliance with regulations like GDPR and prevents data from ever leaving your control.&lt;/li&gt;
&lt;li&gt;  Cost-Effectiveness at Scale💰: for high-volume tasks, running an optimized open model can be cheaper than paying per-API-call fees to a proprietary service provider.&lt;/li&gt;
&lt;li&gt;  Offline &amp;amp; Edge Deployment🛜: open models can run locally and are perfect for applications that must function without an internet connection, such as on-device search in mobile apps or analysis on remote IoT devices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I chose the &lt;a href="https://huggingface.co/Qwen/Qwen3-Embedding-4B" rel="noopener noreferrer"&gt;Qwen3-Embedding-4B&lt;/a&gt; model due to its growing popularity and suitable size for the Cloud Run environment. However, you can experiment with different sizes (0.6B, 4B, and 8B) depending on your specific use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Run
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x87fa8d40_platform_b438423716&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; is a managed compute platform on Google Cloud that lets you run containerized applications in a serverless environment. Think of it as a middle ground between a simple function-as-a-service (like Cloud Functions) and a more complex GKE cluster. You give it a container image, and it handles all the underlying infrastructure, from provisioning and scaling to managing the runtime.&lt;/p&gt;

&lt;p&gt;The beauty of Cloud Run is that it can automatically scale to zero, meaning when there are no requests, you aren't paying for any resources. When traffic picks up, it quickly scales up to handle the load. This makes it perfect for stateless models that need to be highly available and cost-effective.&lt;/p&gt;

&lt;h1&gt;
  
  
  Deployment
&lt;/h1&gt;

&lt;p&gt;But enough with the intros, let's get our hands dirty with some code 🧑‍💻&lt;/p&gt;

&lt;p&gt;Below are step-by-step instructions on how to get the Qwen3 Embedding model up and running.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prepare the environment
&lt;/h2&gt;

&lt;p&gt;First, we need to configure the gcloud CLI environment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: if you don’t have gcloud CLI installed please follow instructions &lt;a href="https://cloud.google.com/sdk/docs/install?utm_campaign=CDR_0x87fa8d40_platform_b438423716&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;available here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Step 1 - Set your default project:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud config set project PROJECT_ID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;  Step 2 - Configure Google Cloud CLI to use the &lt;em&gt;europe-west1&lt;/em&gt; region for Cloud Run commands:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud config set run/region europe-west1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important: at the time of writing, GPUs on Cloud Run are available in several regions. To find the closest supported region, please refer to &lt;a href="https://cloud.google.com/run/docs/configuring/services/gpu#supported-regions?utm_campaign=CDR_0x87fa8d40_platform_b438423716&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;this page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Containerize
&lt;/h2&gt;

&lt;p&gt;We will use &lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; and &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; to run the Qwen3 Embedding model. Create a file named Dockerfile and put the following code inside it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM ollama/ollama:latest

# Listen on all interfaces, port 8080
ENV OLLAMA_HOST=0.0.0.0:8080

# Store model weight files in /models
ENV OLLAMA_MODELS=/models

# Reduce logging verbosity
ENV OLLAMA_DEBUG=false

# Never unload model weights from the GPU
ENV OLLAMA_KEEP_ALIVE=-1

# Store the model weights in the container image
ENV MODEL=dengcao/Qwen3-Embedding-4B:Q4_K_M
RUN ollama serve &amp;amp; sleep 5 &amp;amp;&amp;amp; ollama pull $MODEL

# Start Ollama
ENTRYPOINT ["ollama", "serve"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Build and deploy
&lt;/h2&gt;

&lt;p&gt;Next, it’s time to leverage the power of Cloud Run’s source deployments. With a single command you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Build the container image from source (note the &lt;code&gt;--source&lt;/code&gt; parameter in the command below)&lt;/li&gt;
&lt;li&gt;  Upload the container image to an &lt;a href="https://cloud.google.com/artifact-registry/docs?utm_campaign=CDR_0x87fa8d40_platform_b438423716&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Artifact Registry&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Deploy the container to Cloud Run with GPUs enabled (note &lt;code&gt;--gpu&lt;/code&gt; and &lt;code&gt;--gpu-type&lt;/code&gt; options)&lt;/li&gt;
&lt;li&gt;  Redirect all traffic to the new deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To do all the above, you just need to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud run deploy ollama-qwen3-embeddings \
  --source . \
  --concurrency 4 \
  --cpu 8 \
  --set-env-vars OLLAMA_NUM_PARALLEL=4 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --max-instances 1 \
  --memory 32Gi \
  --no-allow-unauthenticated \
  --no-cpu-throttling \
  --no-gpu-zonal-redundancy \
  --timeout=600 \
  --labels dev-tutorial=blog-qwen3-embeddings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the following important flags in this command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;--concurrency 4&lt;/code&gt; is set to match the value of the environment variable &lt;code&gt;OLLAMA_NUM_PARALLEL&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;--gpu 1&lt;/code&gt; with &lt;code&gt;--gpu-type nvidia-l4&lt;/code&gt; assigns 1 NVIDIA L4 GPU to every Cloud Run instance in the service.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;--max-instances 1&lt;/code&gt; specifies the maximum number of instances to scale to. It has to be equal to or lower than your project's NVIDIA L4 GPU quota.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;--no-allow-unauthenticated&lt;/code&gt; restricts unauthenticated access to the service. By keeping the service private, you can rely on Cloud Run's built-in Identity and Access Management (IAM) authentication for service-to-service communication.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;--no-cpu-throttling&lt;/code&gt; is required for enabling GPU.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;--no-gpu-zonal-redundancy&lt;/code&gt; turns off GPU zonal redundancy; set this option depending on your zonal failover requirements and available quota.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Test the deployment
&lt;/h2&gt;

&lt;p&gt;Now that you have successfully deployed the service, you can send requests to it. However, if you send a request directly, Cloud Run will respond with &lt;em&gt;HTTP 401 Unauthorized&lt;/em&gt;. This is intentional, because we want our model to be called from other services, such as a RAG application, and not accessible by everyone on the Internet.&lt;/p&gt;
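Because the service is private, a direct call must carry an identity token that Cloud Run's IAM layer can verify. As a hedged sketch (the service URL and token below are placeholders; in practice you would obtain a real token, for example with `gcloud auth print-identity-token`), the authenticated request looks like this:

```python
import json
import urllib.request

def build_embed_request(service_url, id_token, model, text):
    """Build an authenticated POST to the Ollama /api/embed endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{service_url}/api/embed",
        data=body,
        headers={
            # Cloud Run validates this bearer token against the service's IAM policy
            "Authorization": f"Bearer {id_token}",
            "Content-Type": "application/json",
        },
    )

# Placeholder URL and token, for illustration only
req = build_embed_request(
    "https://ollama-qwen3-embeddings-xxxxx.a.run.app",
    "YOUR_ID_TOKEN",
    "dengcao/Qwen3-Embedding-4B:Q4_K_M",
    "Sample text",
)
# urllib.request.urlopen(req) would send it; here we just inspect the request
print(req.get_full_url())
```

In service-to-service scenarios the calling service typically fetches such a token from its runtime environment; for local testing, the developer proxy described next hides all of this.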

&lt;p&gt;The easiest way to test the deployment from a local machine is to spin up the Cloud Run developer proxy by executing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud run services proxy ollama-qwen3-embeddings --port=9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now in a second terminal window run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:9090/api/embed -d '{
  "model": "dengcao/Qwen3-Embedding-4B:Q4_K_M",
  "input": "Sample text"
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a response similar to this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A2000%2Fformat%3Awebp%2F0%2AVTcPgNBuyRAnKmrl" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A2000%2Fformat%3Awebp%2F0%2AVTcPgNBuyRAnKmrl" title="Qwen3 Embedding Model response from Cloud Run" alt="Qwen3 Embedding Model response from Cloud Run" width="1600" height="836"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also call the endpoint from a Python client. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ollama import Client

client = Client(host="http://localhost:9090")

response = client.embed(model="dengcao/Qwen3-Embedding-4B:Q4_K_M", input="Sample text")
print(response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
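One practical note: if you have many documents to embed, you can cut round trips by sending them in batches, since the Ollama embed endpoint accepts a list for `input`. A minimal batching helper (the batch size of 4 is an arbitrary choice for illustration):

```python
def batched(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

texts = [f"document {n}" for n in range(10)]
for batch in batched(texts, 4):
    # Each batch could then be embedded in a single call, e.g.:
    # client.embed(model="dengcao/Qwen3-Embedding-4B:Q4_K_M", input=batch)
    print(len(batch))  # 4, 4, 2
```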



&lt;p&gt;Congratulations 🎉 Your Cloud Run deployment is up and running!&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG Example
&lt;/h3&gt;

&lt;p&gt;You can use the newly deployed model to build your first RAG application. Here’s how to achieve this:&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1 - Generate Embeddings
&lt;/h4&gt;

&lt;p&gt;Install necessary dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install ollama chromadb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an &lt;code&gt;example.py&lt;/code&gt; with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import ollama
import chromadb

documents = [
    "Poland is a country located in Central Europe.",
    "The capital and largest city of Poland is Warsaw.",
    "Poland's official language is Polish, which is a West Slavic language.",
    "Marie Curie, the pioneering scientist who conducted groundbreaking research on radioactivity, was born in Warsaw, Poland.",
    "Poland is famous for its traditional dish called pierogi, which are filled dumplings.",
    "The Białowieża Forest in Poland is one of the last and largest remaining parts of the immense primeval forest that once stretched across the European Plain.",
]

client = chromadb.Client()
collection = client.create_collection(name="docs")

ollama_client = ollama.Client(host="http://localhost:9090")

# Store each document in an in-memory vector embeddings database
for i, d in enumerate(documents):
    response = ollama_client.embed(model="dengcao/Qwen3-Embedding-4B:Q4_K_M", input=d)
    embeddings = response["embeddings"]
    collection.add(ids=[str(i)], embeddings=embeddings, documents=[d])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2 - Retrieve
&lt;/h4&gt;

&lt;p&gt;Next, the following code will search the vector database for the most relevant document (add it to your &lt;code&gt;example.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# An example prompt
prompt = "What is Poland's official language?"

# Generate an embedding for the input and retrieve the most relevant document
response = ollama_client.embed(model="dengcao/Qwen3-Embedding-4B:Q4_K_M", input=prompt)
results = collection.query(query_embeddings=[response["embeddings"][0]], n_results=1)
data = results["documents"][0][0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 3 - Generate final answer
&lt;/h4&gt;

&lt;p&gt;In the generation step we will use a locally installed &lt;a href="https://huggingface.co/Qwen/Qwen3-0.6B" rel="noopener noreferrer"&gt;Qwen3:0.6b&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: we use Qwen3 in the generation step, but any other model could work here (e.g. Gemini, Gemma, Llama, etc.). Nevertheless, it’s &lt;strong&gt;critical to use the same embedding&lt;/strong&gt; model in step 1 (Generate Embeddings) and step 2 (Retrieve).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can install the Qwen3:0.6b model by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull qwen3:0.6b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we’re ready to combine the user’s prompt with the search results to generate the final answer (add to &lt;code&gt;example.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Final step - generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
    model="qwen3:0.6b",
    prompt=f"Using this data: {data}. Respond to this prompt: {prompt}",
)

print(output["response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the code by executing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python example.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see an answer similar to the one below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;think&amp;gt;
Okay, the user is asking what Poland's official language is, and they provided the information that Poland's official language is Polish, which is a West Slavic language. Let me make sure I understand this correctly.

First, I need to confirm if that's the correct information. I know that Poland is a country in Eastern Europe, and its official language is Polish. But wait, what's the source of this information? The user hasn't provided any other data, so I should stick strictly to the given information.

I should state that Poland's official language is Polish, and that it's a West Slavic language. I need to present this clearly and concisely. Maybe mention that it's the official language to emphasize its significance. Also, check if there's any other detail that needs to be included, but since the user provided only this, I can proceed.
&amp;lt;/think&amp;gt;

Poland's official language is **Polish**. This language is a **West Slavic language**.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well done! You have just created and run your first RAG application with Qwen3 Embedding model under the hood.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;At this point, you have established a Cloud Run service running the Qwen3 Embedding model. You can use it to generate embeddings for semantic search or RAG applications.&lt;/p&gt;

&lt;p&gt;Stay tuned for more content around leveraging Qwen3 Embedding in your applications.&lt;/p&gt;

&lt;h1&gt;
  
  
  Thanks for reading
&lt;/h1&gt;

&lt;p&gt;I hope this article inspired you to experiment with open embedding models on &lt;a href="https://cloud.google.com/run/?utm_campaign=CDR_0x87fa8d40_platform_b438423716&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;. If you found this article helpful, please consider following me here and giving it a clap 👏 to help others discover it.&lt;/p&gt;

&lt;p&gt;I'm always eager to chat with fellow developers and AI enthusiasts, so feel free to connect with me on &lt;a href="https://www.linkedin.com/in/remigiusz-samborski/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://bsky.app/profile/rsamborski.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>qwen3</category>
      <category>googlecloud</category>
      <category>cloudrun</category>
      <category>ai</category>
    </item>
    <item>
      <title>Polish Large Language Model na Google Cloud (video)</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Tue, 25 Mar 2025 09:20:00 +0000</pubDate>
      <link>https://dev.to/rsamborski/polish-large-language-model-na-google-cloud-video-4daa</link>
      <guid>https://dev.to/rsamborski/polish-large-language-model-na-google-cloud-video-4daa</guid>
      <description>&lt;p&gt;Kontynuuję moją przygodę z polskim dużym modelem językowym (PLLuM) na Google Cloud! Tym razem oddaję w Wasze ręce nowy film, który krok po kroku pokaże Wam, jak przygotować i uruchomić ten model na Vertex AI.&lt;/p&gt;

&lt;p&gt;Zapraszam do oglądania i komentowania:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/fJcCRTmi8Bo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>polskaai</category>
      <category>aimadeinpoland</category>
      <category>googlecloud</category>
      <category>ai</category>
    </item>
    <item>
      <title>Polish Large Language Model (PLLuM) on Google Cloud</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Mon, 17 Mar 2025 08:44:00 +0000</pubDate>
      <link>https://dev.to/googlecloud/pllum-na-google-cloud-5c6</link>
      <guid>https://dev.to/googlecloud/pllum-na-google-cloud-5c6</guid>
      <description>&lt;p&gt;"Wpadła śliwka w .... Google Cloud" 😉&lt;/p&gt;

&lt;p&gt;Recently, thanks to the Ministry of Digital Affairs, there's been a lot of buzz about the new Polish Large Language Model (PLLuM). I decided to play around with it a bit and show others how to run it on Google Cloud using Vertex AI.&lt;/p&gt;

&lt;p&gt;I invite you to check out &lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/serving/vertex_ai_pytorch_inference_pllum_with_custom_handler.ipynb" rel="noopener noreferrer"&gt;this notebook&lt;/a&gt;, which will guide you through this process step by step.&lt;/p&gt;

&lt;p&gt;Let me know in the comments what applications you see for this new open model.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>locallama</category>
      <category>llm</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Deploying a Gemini-powered Mesop app to Cloud Run</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Mon, 17 Feb 2025 10:47:01 +0000</pubDate>
      <link>https://dev.to/rsamborski/deploying-a-gemini-powered-mesop-app-to-cloud-run-17m5</link>
      <guid>https://dev.to/rsamborski/deploying-a-gemini-powered-mesop-app-to-cloud-run-17m5</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/rsamborski/unleash-the-power-of-gemini-3g77"&gt;the first video&lt;/a&gt; I showed how to create a Gemini-powered Mesop app.&lt;/p&gt;

&lt;p&gt;In the follow-up video I focus on deploying it to Cloud Run. I hope you find it interesting:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/s9Ag_YNdl0M"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you have questions or feedback, feel free to reach out to me.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>python</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>Unleash the power of Gemini! ✨</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Thu, 13 Feb 2025 11:23:56 +0000</pubDate>
      <link>https://dev.to/rsamborski/unleash-the-power-of-gemini-3g77</link>
      <guid>https://dev.to/rsamborski/unleash-the-power-of-gemini-3g77</guid>
      <description>&lt;p&gt;I recently released hands-on video tutorial which shows how to build a Gemini-powered application using the &lt;a href="https://google.github.io/mesop/" rel="noopener noreferrer"&gt;Mesop framework&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Learn the steps to integrate with the Gemini model and get your project ready for deployment. &lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/KUfPiSUJrwE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Code is available at &lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gemini/sample-apps/gemini-mesop-cloudrun" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned for the next video where we'll deploy it to Google Cloud.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>python</category>
    </item>
    <item>
      <title>Great read!</title>
      <dc:creator>Remigiusz Samborski</dc:creator>
      <pubDate>Fri, 07 Feb 2025 09:23:30 +0000</pubDate>
      <link>https://dev.to/rsamborski/great-read-183a</link>
      <guid>https://dev.to/rsamborski/great-read-183a</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/googlecloud" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__org__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F809%2Fc7814399-cf4a-4dc9-9f12-d0a97ed21bf6.png" alt="Google Cloud" width="192" height="192"&gt;
      &lt;div class="ltag__link__user__pic"&gt;
        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F164593%2F5fc8f88c-e999-4d1e-805a-673d4c13d128.jpg" alt="" width="400" height="400"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/googlecloud/leverage-open-models-like-gemma-2-on-gke-with-langchain-29ki" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Leverage open models like Gemma 2 on GKE with LangChain&lt;/h2&gt;
      &lt;h3&gt;Olivier Bourgeois for Google Cloud ・ Feb 6 '25&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#kubernetes&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#langchain&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#googlecloud&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>kubernetes</category>
      <category>ai</category>
      <category>langchain</category>
      <category>googlecloud</category>
    </item>
  </channel>
</rss>
