<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Olivier Bourgeois</title>
    <description>The latest articles on DEV Community by Olivier Bourgeois (@olivi-eh).</description>
    <link>https://dev.to/olivi-eh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F164593%2F5fc8f88c-e999-4d1e-805a-673d4c13d128.jpg</url>
      <title>DEV Community: Olivier Bourgeois</title>
      <link>https://dev.to/olivi-eh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/olivi-eh"/>
    <language>en</language>
    <item>
      <title>Hands-on with Gemma 3 on Google Cloud</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Fri, 05 Dec 2025 16:31:49 +0000</pubDate>
      <link>https://dev.to/googleai/hands-on-with-gemma-3-on-google-cloud-6e7</link>
      <guid>https://dev.to/googleai/hands-on-with-gemma-3-on-google-cloud-6e7</guid>
      <description>&lt;p&gt;The landscape of generative AI is shifting. While proprietary APIs are powerful, there is a growing demand for &lt;strong&gt;open models&lt;/strong&gt;—models where the architecture and weights are publicly available. This shift puts control back in the hands of developers, offering transparency, data privacy, and the ability to fine-tune for specific use cases.&lt;/p&gt;

&lt;p&gt;To help you navigate this landscape, we are releasing &lt;strong&gt;two new hands-on labs&lt;/strong&gt; featuring &lt;a href="https://ai.google.dev/gemma?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;Gemma 3&lt;/a&gt;, Google’s latest family of lightweight, state-of-the-art open models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Gemma?
&lt;/h2&gt;

&lt;p&gt;Built from the same research and technology as Gemini, Gemma models are designed for responsible AI development. Gemma 3 is particularly exciting because it offers multimodal capabilities (text and image) and runs efficiently on smaller hardware footprints while still delivering strong performance.&lt;/p&gt;

&lt;p&gt;But running a model on your laptop is very different from running it in production. You need scale, reliability, and hardware acceleration (GPUs). The question is: &lt;strong&gt;Where should you deploy?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We have prepared two different paths for you, depending on your infrastructure needs: &lt;a href="https://docs.cloud.google.com/run/docs?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; or &lt;a href="https://cloud.google.com/kubernetes-engine/docs?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;Google Kubernetes Engine (GKE)&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Path 1: The Serverless Approach (Cloud Run)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Developers who want an API up and running instantly without managing infrastructure, scaling to zero when not in use.&lt;/p&gt;

&lt;p&gt;If your priority is simplicity and cost-efficiency for stateless workloads, Cloud Run is your answer. It abstracts away server management entirely, and with the recent addition of GPU support on Cloud Run, you can now serve modern LLMs without provisioning a cluster.&lt;/p&gt;
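
&lt;p&gt;To make the "OpenAI-compatible" part concrete, here is a minimal sketch of the request body such a deployment typically accepts. The service URL and model ID below are placeholders, not values from the labs; substitute the ones from your own deployment.&lt;/p&gt;

```python
import json

# Hypothetical values -- substitute the URL and model ID from your own
# Cloud Run deployment of vLLM serving Gemma 3.
SERVICE_URL = "https://gemma-vllm-example-uc.a.run.app/v1/chat/completions"
MODEL_ID = "google/gemma-3-4b-it"

def build_chat_request(prompt):
    """Builds the JSON body for an OpenAI-compatible chat completions call."""
    return json.dumps({
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    })

body = build_chat_request("What is an open model?")
# POST `body` to SERVICE_URL with a Content-Type: application/json header
# (and an identity token if your service requires authentication).
```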


&lt;div class="crayons-card c-embed"&gt;

  
&lt;h3&gt;
  
  
  Start the lab!
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Lab:&lt;/strong&gt; &lt;a href="https://codelabs.developers.google.com/devsite/codelabs/serve-gemma3-with-vllm-on-cloud-run#0?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;Serving Gemma 3 with vLLM on Cloud Run&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containerize &lt;strong&gt;vLLM&lt;/strong&gt; (a high-throughput serving engine).&lt;/li&gt;
&lt;li&gt;Deploy Gemma 3 to &lt;strong&gt;Cloud Run&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Leverage GPU acceleration for fast inference.&lt;/li&gt;
&lt;li&gt;Expose an OpenAI-compatible API endpoint.&lt;/li&gt;
&lt;/ul&gt;


&lt;/div&gt;


&lt;h2&gt;
  
  
  Path 2: The Platform Approach (GKE)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Engineering teams building complex AI platforms, requiring high throughput, custom orchestration, or integration with a broader microservices ecosystem.&lt;/p&gt;

&lt;p&gt;When your application graduates from a prototype to a high-traffic production system, you need the control of Kubernetes. GKE Autopilot gives you that power while still handling the heavy lifting of node management. This path creates a seamless journey from local testing to cloud production.&lt;/p&gt;


&lt;div class="crayons-card c-embed"&gt;

  
&lt;h3&gt;
  
  
  Start the lab!
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Lab:&lt;/strong&gt; &lt;a href="https://codelabs.developers.google.com/codelabs/production-ready-ai-with-gc/5-deploying-agents/deploying-open-models-gke#0?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;Deploying Open Models on GKE&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this lab, you will learn how to:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prototype locally using &lt;strong&gt;&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Containerize your setup and transition to &lt;strong&gt;&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;GKE Autopilot&lt;/a&gt;&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Deploy a scalable inference service using standard Kubernetes manifests.&lt;/li&gt;
&lt;li&gt;Manage resources effectively for production workloads.&lt;/li&gt;
&lt;/ul&gt;


&lt;/div&gt;


&lt;h2&gt;
  
  
  Which Path Will You Choose?
&lt;/h2&gt;

&lt;p&gt;Whether you are looking for the serverless simplicity of Cloud Run or the robust orchestration of GKE, Google Cloud provides the tools to take Gemma 3 from a concept to a deployed application.&lt;/p&gt;

&lt;p&gt;Dive into the labs today and start building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://codelabs.developers.google.com/devsite/codelabs/serve-gemma3-with-vllm-on-cloud-run#0?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;Serving Gemma 3 with vLLM on Cloud Run&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codelabs.developers.google.com/codelabs/production-ready-ai-with-gc/5-deploying-agents/deploying-open-models-gke#0?utm_campaign=CDR_0x5723eddc_default_b459438884&amp;amp;utm_medium=external&amp;amp;utm_source=lab" rel="noopener noreferrer"&gt;Deploying Open Models on GKE&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Share your progress and connect with others on the journey using the hashtag &lt;strong&gt;#ProductionReadyAI&lt;/strong&gt;. Happy learning!&lt;/p&gt;

&lt;p&gt;These labs are part of the &lt;strong&gt;Open Models&lt;/strong&gt; module in our official &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/production-ready-ai-with-google-cloud-learning-path" rel="noopener noreferrer"&gt;Production-Ready AI with Google Cloud&lt;/a&gt; program. Explore the full curriculum for more content that will help you bridge the gap from a promising prototype to a production-grade AI application.&lt;/p&gt;

</description>
      <category>gemma</category>
      <category>ai</category>
      <category>cloud</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Observability in Action: A Google Cloud Next demo</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Mon, 05 May 2025 18:17:47 +0000</pubDate>
      <link>https://dev.to/googlecloud/observability-in-action-a-google-cloud-next-demo-2fkb</link>
      <guid>https://dev.to/googlecloud/observability-in-action-a-google-cloud-next-demo-2fkb</guid>
      <description>&lt;p&gt;It was only a few weeks ago that over 32,000 cloud practitioners from all over the world came together in Las Vegas to attend &lt;a href="https://cloud.withgoogle.com/next/25" rel="noopener noreferrer"&gt;Google Cloud Next 2025&lt;/a&gt;. Beyond the keynotes, the workshops, and the multiple jam-packed tracks of talks and sessions, an entire expo hall offered attendees the opportunity to observe or play around with more than 500 live demos. Let’s check out one of these demos!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2j0xjh3gevygsas1jd0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2j0xjh3gevygsas1jd0.png" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Overview of the demo
&lt;/h2&gt;

&lt;p&gt;The goals of the Observability in Action demo were twofold: to showcase various ways of interacting with metrics and logs, and to give attendees an interactive experience. For the interactive part, we used oversized physical buttons and pedals that attendees could press to select answers or confirm inputs.&lt;/p&gt;

&lt;p&gt;The flow of the demo was as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We asked the attendee to type in a prompt that they wanted sent to an AI model.
&lt;/li&gt;
&lt;li&gt;The prompt was sent in the background to three different models: Gemma 3 on Cloud Run, Gemini 2.0 Flash on Vertex AI, and Gemini 2.0 Flash-Lite on Vertex AI. This generated logs and metrics.
&lt;/li&gt;
&lt;li&gt;The attendee was then given a short quiz about these three models. Each quiz input also generated logs and metrics.
&lt;/li&gt;
&lt;li&gt;At the end of the quiz, we gave the attendee a rundown of their answers, then flipped over to the Google Cloud Console.
&lt;/li&gt;
&lt;li&gt;In Cloud Monitoring, we showcased the native metrics that Cloud Run offers, custom metrics implemented with OpenTelemetry, and the Cloud Trace functionality.
&lt;/li&gt;
&lt;li&gt;Finally, we turned to BigQuery to show how logs can be mirrored to a database for further analysis in Jupyter Notebooks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r633avlam0n78b2rqyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7r633avlam0n78b2rqyv.png" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;While the demo frontend runs locally, the backend is deployed as a Cloud Run service. The backend talks to Gemini through the Vertex AI SDK and to Gemma through a separate Cloud Run instance. The persistent state of the demo resides in a Firestore database, and all Cloud Run logs are mirrored to BigQuery using a simple sink.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcn41o28fet53zgsc05wv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcn41o28fet53zgsc05wv.png" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing metrics using Cloud Monitoring
&lt;/h2&gt;

&lt;p&gt;Cloud Monitoring provides visibility into the performance and health of your cloud applications and infrastructure. It collects metrics, events, and metadata from Google Cloud services and other sources, allowing you to visualize this data on dashboards and create alerts for critical issues. This is useful for proactively identifying and resolving problems, optimizing resource utilization, improving uptime, and understanding system behavior, ultimately leading to more reliable and cost-effective applications.&lt;/p&gt;

&lt;p&gt;For services like Cloud Run, which we’re using for the backend of this demo, Cloud Monitoring automatically collects a wide array of native metrics without any setup. These include request latency, request count, container CPU and memory usage, and instance count. This out-of-the-box integration gives developers immediate insight into their serverless application's performance and resource consumption, simplifying troubleshooting and optimization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52tl8uurp6v1a0fu170k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52tl8uurp6v1a0fu170k.png" alt="Image description" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud Trace is a distributed tracing system within Google Cloud that helps you understand request latency across your application and its services. It tracks how long different parts of your application take to process requests, visualizing the entire request flow. This is particularly valuable for identifying performance bottlenecks in microservices architectures by showing where time is spent during a request's lifecycle.&lt;/p&gt;

&lt;p&gt;Here’s a real-life example: in this demo, we send a prompt to multiple models. We were sure we had implemented concurrency correctly (so the calls to the three different models should’ve happened in parallel), yet the latency was significantly higher than expected. When we dug into the trace of a call, we quickly realized that we were accidentally making those calls sequentially! These traces were made available to us through the OpenTelemetry instrumentation we added to our code.&lt;/p&gt;
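
&lt;p&gt;The bug we found in the trace can be sketched in a few lines of asyncio. The model calls below are stand-ins (simple sleeps), not the real Vertex AI or Cloud Run calls, but the timing difference is the same one we saw in Cloud Trace.&lt;/p&gt;

```python
import asyncio
import time

async def call_model(name, latency=0.1):
    # Stand-in for a real model call; the demo hit Vertex AI and Cloud Run here.
    await asyncio.sleep(latency)
    return name

async def sequential():
    # The accidental version: each call finishes before the next one starts,
    # so total latency is the sum of all three calls.
    return [await call_model(m) for m in ("gemma", "flash", "flash-lite")]

async def concurrent():
    # The intended version: gather fans the calls out so they overlap,
    # and total latency is roughly that of the slowest single call.
    return await asyncio.gather(
        *(call_model(m) for m in ("gemma", "flash", "flash-lite"))
    )

start = time.perf_counter()
asyncio.run(sequential())
seq_elapsed = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent())
conc_elapsed = time.perf_counter() - start
```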

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e7t3t9mw6m0r2b40xnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e7t3t9mw6m0r2b40xnz.png" alt="Image description" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Interact with your logs with BigQuery
&lt;/h2&gt;

&lt;p&gt;BigQuery is a serverless enterprise data warehouse that enables super-fast SQL queries on large datasets without infrastructure management. It's built for scalable analytics, supports diverse data types, and integrates machine learning, offering a powerful platform for insights from real-time and historical data.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://cloud.google.com/logging/docs/export/configure_export_v2" rel="noopener noreferrer"&gt;a simple sink&lt;/a&gt;, you can directly stream logs from Cloud Logging into BigQuery, transforming it into a powerful, long-term log analytics platform. This allows you to run complex SQL queries across extensive historical log data, which is invaluable for in-depth security audits, compliance, and identifying subtle operational trends.&lt;/p&gt;
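
&lt;p&gt;Creating such a sink is a one-line command. The sink name, project, and dataset below are placeholders; the BigQuery dataset must already exist.&lt;/p&gt;

```shell
# Hypothetical names -- substitute your own project and dataset.
gcloud logging sinks create cloud-run-logs-to-bq \
  bigquery.googleapis.com/projects/my-project/datasets/run_logs \
  --log-filter='resource.type="cloud_run_revision"'
```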

&lt;p&gt;Connecting BigQuery to Jupyter Notebooks further enhances log analysis capabilities. This empowers users to leverage Python and data science libraries for advanced data exploration, custom visualizations, and machine learning on log data, facilitating deeper insights and shareable, interactive analysis beyond standard logging tools.&lt;/p&gt;

&lt;p&gt;For this demo, we &lt;a href="https://github.com/GoogleCloudDevRel/next25-observability-in-action/blob/main/Log_Exploration_in_BigQuery.ipynb" rel="noopener noreferrer"&gt;built a Jupyter Notebook&lt;/a&gt; that analyzed the various interactive quiz events, cross-referenced answers with an external Firestore database, and built tables and charts from the resulting data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh6tfxzgqrc1x0srebfh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvh6tfxzgqrc1x0srebfh.png" alt="Image description" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out!
&lt;/h2&gt;

&lt;p&gt;Want to try this demo from home? The source code is &lt;a href="https://github.com/GoogleCloudDevRel/next25-observability-in-action" rel="noopener noreferrer"&gt;available on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Want to learn more about observability on Google Cloud? Check out these resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/stackdriver/docs" rel="noopener noreferrer"&gt;Documentation: Observability in Google Cloud&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.cloudskillsboost.google/course_templates/864" rel="noopener noreferrer"&gt;Online course: Observability in Google Cloud&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>googlecloud</category>
      <category>observability</category>
      <category>googlecloudnext</category>
      <category>bigquery</category>
    </item>
    <item>
      <title>Streamline your LangChain deployments with LangServe</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Fri, 28 Feb 2025 18:31:31 +0000</pubDate>
      <link>https://dev.to/googlecloud/streamline-your-langchain-deployments-with-langserve-1g38</link>
      <guid>https://dev.to/googlecloud/streamline-your-langchain-deployments-with-langserve-1g38</guid>
      <description>&lt;p&gt;Throughout this LangChain series, we've explored the &lt;a href="https://dev.to/googlecloud/simplify-development-of-ai-powered-applications-with-langchain-2pob"&gt;power and flexibility of LangChain&lt;/a&gt;, from deploying it on &lt;a href="https://dev.to/googlecloud/deploy-gemini-powered-langchain-applications-on-gke-42la"&gt;Google Kubernetes Engine (GKE) with Gemini&lt;/a&gt; to &lt;a href="https://dev.to/googlecloud/leverage-open-models-like-gemma-2-on-gke-with-langchain-29ki"&gt;running open models like Gemma&lt;/a&gt;. Now, let's introduce an interesting complement to help us deploy LangChain-powered applications as a REST API: &lt;a href="https://python.langchain.com/docs/langserve/" rel="noopener noreferrer"&gt;LangServe&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is LangServe?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://python.langchain.com/docs/langserve" rel="noopener noreferrer"&gt;LangServe&lt;/a&gt; is a helpful tool designed to simplify the deployment of LangChain applications as REST APIs. Instead of having to manually take care of the REST logic for your LLM deployment (like exposing endpoints or serving API documentation) we can get LangServe to do that for us. It's built by the same team behind LangChain, ensuring seamless integration and a developer-friendly experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use LangServe?
&lt;/h2&gt;

&lt;p&gt;In the previous parts of this LangChain series, we've seen how to deploy a LangChain-powered application and how to talk to it. Isn't that enough? Well, LangServe offers several key advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rapid deployment:&lt;/strong&gt; LangServe drastically reduces the amount of boilerplate code needed to expose your LangChain applications as APIs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic API documentation:&lt;/strong&gt; LangServe automatically generates interactive API documentation for your deployed chains, making it easy for others (or your future self, if you're like me) to understand and use your services.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in playground:&lt;/strong&gt; LangServe provides a simple web playground for interacting with your deployed LangChain applications directly from your browser. This is incredibly helpful for testing and debugging.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardized interface:&lt;/strong&gt; LangServe helps you create consistent, well-structured APIs for your LangChain applications, making them easier to integrate with other services and front-end applications.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified client interaction:&lt;/strong&gt; LangServe comes with a corresponding client library that simplifies calling your deployed chains from other Python or JavaScript applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How does LangServe work?
&lt;/h2&gt;

&lt;p&gt;LangServe leverages the power of &lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; and &lt;a href="https://docs.pydantic.dev/latest/" rel="noopener noreferrer"&gt;pydantic&lt;/a&gt; to create a robust and efficient serving layer for your LangChain applications. It essentially wraps your LangChain chains or agents, turning them into FastAPI endpoints.&lt;/p&gt;

&lt;p&gt;Let's look at an example and see how that all comes together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a LangServe application
&lt;/h2&gt;

&lt;p&gt;Let's say you have the following LangChain application that uses Gemini:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_google_genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatGoogleGenerativeAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatGoogleGenerativeAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that answers questions about a given topic.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's how you would adapt it for LangServe, which you can save as &lt;code&gt;app.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_google_genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatGoogleGenerativeAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.output_parsers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StrOutputParser&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langserve&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_routes&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LangChain Server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A simple API server using LangChain&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Runnable interfaces&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatGoogleGenerativeAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that answers questions about a given topic.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nc"&gt;StrOutputParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;add_routes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/my-chain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, create a &lt;code&gt;requirements.txt&lt;/code&gt; file with our dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;langserve
langchain-google-genai
uvicorn
fastapi
sse_starlette
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that's it! With these simple changes, your chain is now ready to be served. You can install dependencies and run this application using the following commands. Make sure to replace the &lt;code&gt;your_google_api_key&lt;/code&gt; string with your &lt;a href="https://ai.google.dev/gemini-api/docs/api-key" rel="noopener noreferrer"&gt;Gemini API key&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_google_api_key"&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts a server on port 8000 by default.&lt;/p&gt;
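
&lt;p&gt;You can also hit the chain directly over HTTP. LangServe exposes an &lt;code&gt;invoke&lt;/code&gt; endpoint under the path you registered, and the request body wraps your chain's input in an &lt;code&gt;input&lt;/code&gt; field (a sketch, assuming the server above is running locally):&lt;/p&gt;

```shell
# Assumes the server from app.py is running locally on port 8000.
curl -X POST http://localhost:8000/my-chain/invoke \
  -H "Content-Type: application/json" \
  -d '{"input": {"input": "Tell me about Google Cloud Platform"}}'
```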

&lt;h3&gt;
  
  
  Interacting with your LangServe application
&lt;/h3&gt;

&lt;p&gt;Once your server is running, you can interact with it in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Through the automatically generated API docs:&lt;/strong&gt; Navigate to &lt;code&gt;http://localhost:8000/docs&lt;/code&gt; in your browser to see the interactive API documentation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using the built-in playground:&lt;/strong&gt; Go to &lt;code&gt;http://localhost:8000/my-chain/playground/&lt;/code&gt; to try out your chain directly in a simple web interface.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using the LangServe client:&lt;/strong&gt; You can use the provided client library to interact with your API programmatically from other Python or JavaScript applications. Here's a simple Python example:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langserve&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RemoteRunnable&lt;/span&gt;

&lt;span class="n"&gt;remote_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RemoteRunnable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/my-chain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;remote_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me about Google Cloud Platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvpcnn6eamfxkine5xcr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvpcnn6eamfxkine5xcr.png" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Containerizing our application
&lt;/h3&gt;

&lt;p&gt;You can also easily containerize your LangServe application to deploy on a platform like GKE, just like we did with our previous examples.&lt;/p&gt;

&lt;p&gt;First, create a &lt;code&gt;Dockerfile&lt;/code&gt; to define how to assemble our image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use an official Python runtime as a parent image&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3-slim&lt;/span&gt;

&lt;span class="c"&gt;# Set the working directory in the container&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Copy the current directory contents into the container at /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /app&lt;/span&gt;

&lt;span class="c"&gt;# Install any needed packages specified in requirements.txt&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Make port 80 available to the world outside this container&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 80&lt;/span&gt;

&lt;span class="c"&gt;# Run app.py when the container launches&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [ "python", "app.py" ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, build the container image and push it to &lt;a href="https://cloud.google.com/artifact-registry/docs" rel="noopener noreferrer"&gt;Artifact Registry&lt;/a&gt;. Don't forget to replace &lt;code&gt;PROJECT_ID&lt;/code&gt; with your Google Cloud project ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Authenticate with Google Cloud&lt;/span&gt;
gcloud auth login

&lt;span class="c"&gt;# Create the repository&lt;/span&gt;
gcloud artifacts repositories create images &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us

&lt;span class="c"&gt;# Configure authentication to the desired repository&lt;/span&gt;
gcloud auth configure-docker us-docker.pkg.dev/PROJECT_ID/images

&lt;span class="c"&gt;# Build the image&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1 &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Push the image&lt;/span&gt;
docker push us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few moments, your container image will be stored in your Artifact Registry repository.&lt;/p&gt;

&lt;p&gt;Now, let's deploy this image to our GKE cluster. You can create a GKE cluster through the &lt;a href="https://console.cloud.google.com/kubernetes" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt; or using the &lt;code&gt;gcloud&lt;/code&gt; command-line tool, again taking care to replace &lt;code&gt;PROJECT_ID&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud container clusters create-auto langchain-cluster \
  --project=PROJECT_ID \
  --region=us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once your cluster is up and running, create a YAML file with your Kubernetes deployment and service manifests. Let's call it &lt;code&gt;deployment.yaml&lt;/code&gt;, replacing &lt;code&gt;PROJECT_ID&lt;/code&gt; as well as &lt;code&gt;YOUR_GOOGLE_API_KEY&lt;/code&gt; with your &lt;a href="https://ai.google.dev/gemini-api/docs/api-key" rel="noopener noreferrer"&gt;Gemini API key&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# Scale as needed&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Add selector here&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-container&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GOOGLE_API_KEY&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;YOUR_GOOGLE_API_KEY&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoadBalancer&lt;/span&gt; &lt;span class="c1"&gt;# Exposes the service externally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the manifest to your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get the context of your cluster&lt;/span&gt;
gcloud container clusters get-credentials langchain-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1

&lt;span class="c"&gt;# Deploy the manifest&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a deployment with three replicas of your LangChain application and exposes it externally through a load balancer. You can adjust the number of replicas based on your expected load.&lt;/p&gt;
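&lt;p&gt;To find out where your application is reachable, look up the external IP address assigned to the load balancer (it can take a minute or two to be provisioned), then call the chain through it, replacing &lt;code&gt;EXTERNAL_IP&lt;/code&gt; with the address from the first command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Wait for an EXTERNAL-IP to appear
kubectl get service langchain-service

# Invoke the chain through the load balancer
curl -X POST http://EXTERNAL_IP/my-chain/invoke \
  -H "Content-Type: application/json" \
  -d '{"input": "Tell me about Google Cloud Platform"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;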

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;LangServe bridges the gap between development and production, making it easier than ever to share your AI applications with the world. By providing a simple, standardized way to serve your chains as APIs, LangServe unlocks a whole new level of accessibility and usability for your LangChain projects. Whether you're building internal tools or public-facing applications, LangServe streamlines the process, letting you focus on crafting impactful applications with LangChain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dive into the &lt;a href="https://python.langchain.com/docs/langserve/" rel="noopener noreferrer"&gt;LangServe documentation&lt;/a&gt; for a more in-depth look at its features and capabilities.
&lt;/li&gt;
&lt;li&gt;Experiment with deploying a LangServe application to GKE using the containerization techniques we've covered.
&lt;/li&gt;
&lt;li&gt;Explore the LangServe client library to see how you can easily integrate your deployed chains with other applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this post, we conclude our journey through the world of LangChain, from its core concepts to advanced deployment strategies with GKE, open models, and now, streamlined serving with LangServe. I hope this series has empowered you to build and deploy your own amazing AI-powered applications!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ai</category>
      <category>langchain</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Two years with Obsidian: How notes changed the way I store information</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Mon, 10 Feb 2025 15:38:21 +0000</pubDate>
      <link>https://dev.to/olivi-eh/two-years-with-obsidian-how-notes-changed-the-way-i-store-information-4iaf</link>
      <guid>https://dev.to/olivi-eh/two-years-with-obsidian-how-notes-changed-the-way-i-store-information-4iaf</guid>
      <description>&lt;p&gt;I've been storing and keeping track of information in various ways for a long time. First using physical notes, then simple digital text files, and finally I jumped from app to app as I encountered issues that irritated me that I had no control over.&lt;/p&gt;

&lt;p&gt;Near the tail end of 2022, I came across what I thought might be the answer to all my woes: a note-taking app built by a small team, with the name of &lt;a href="https://obsidian.md/" rel="noopener noreferrer"&gt;Obsidian&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in an Obsidian?
&lt;/h2&gt;

&lt;p&gt;Obsidian is a multi-platform note-taking and writing app. Simple enough. But aren't there plenty of those around? Yes, absolutely, but they each have downsides that I wasn't able to live with long-term. With &lt;a href="https://docs.google.com/" rel="noopener noreferrer"&gt;Google Docs&lt;/a&gt; it was the difficulty of linking between notes (this has improved since, but it's still not quite what I want). With &lt;a href="https://evernote.com/" rel="noopener noreferrer"&gt;Evernote&lt;/a&gt; my notes were in a proprietary format and stuck in the cloud. &lt;a href="https://www.notion.com/" rel="noopener noreferrer"&gt;Notion&lt;/a&gt; also had the cloud-first problem and stored notes in an awkward, non-standard Markdown format. And the list goes on.&lt;/p&gt;

&lt;p&gt;Here's what Obsidian provides that sold it to me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local-first&lt;/strong&gt; providing tangible &lt;code&gt;.md&lt;/code&gt; files in a local directory structure I can interact with.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portable notes&lt;/strong&gt; in standard Markdown format allowing me to easily migrate to other platforms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Links between notes&lt;/strong&gt; gives the ability to quickly move to related notes using wiki-style links.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YAML frontmatter&lt;/strong&gt; rendering key-value pairs of metadata for each note in a beautiful way.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graphs and canvases&lt;/strong&gt; allowing me to easily visualize notes and their connections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile friendly&lt;/strong&gt; with support for all of the same features that the desktop version offers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native sync&lt;/strong&gt; providing &lt;a href="https://obsidian.md/sync" rel="noopener noreferrer"&gt;end-to-end encrypted sync and version control&lt;/a&gt; for a modest monthly fee.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible&lt;/strong&gt; with a broad catalog of &lt;a href="https://obsidian.md/plugins" rel="noopener noreferrer"&gt;community-created plugins&lt;/a&gt; and themes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A4800%2Fformat%3Awebp%2F0%2AnCWoTgje989PFqSz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A4800%2Fformat%3Awebp%2F0%2AnCWoTgje989PFqSz.png" alt="Screenshot of my Obsidian vault opened on the graph view" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why store information?
&lt;/h2&gt;

&lt;p&gt;The way people interact with pieces of information is very personal and differs from person to person, but these are the main reasons I've been maintaining a repository of notes over the past decade or so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Noting down information that I know I don't need in the short-term &lt;strong&gt;frees cognitive space&lt;/strong&gt; to think about and remember other things.&lt;/li&gt;
&lt;li&gt;Counter-intuitively, noting down information that I &lt;em&gt;do&lt;/em&gt; need in the short-term &lt;strong&gt;helps me remember&lt;/strong&gt; it better. The simple act of writing down reminders engraves them in my short-term memory.&lt;/li&gt;
&lt;li&gt;Doing research in notes &lt;strong&gt;prevents me from doing duplicative research&lt;/strong&gt; the next year or the next decade. At the very least, it gives me a foundation to work with instead of starting from scratch multiple times.&lt;/li&gt;
&lt;li&gt;It's a sort of &lt;strong&gt;knowledge insurance for the future&lt;/strong&gt;. I intend to live for at least a handful more decades, and that's plenty of time to forget things (either because of an illness, or simply because it's been so long).&lt;/li&gt;
&lt;li&gt;If I were to &lt;strong&gt;author an autobiography&lt;/strong&gt; in my later years, all of the material would already be there, waiting to be pieced together into a coherent story.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So with that said, what are the kinds of notes that I have in Obsidian? Glad you asked! Here's a non-exhaustive list (in no particular order) of different note categories, with some examples for each:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Journaling &amp;amp; retrospectives&lt;/strong&gt; ("Year 2024", "2025-02-09", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brainstorming&lt;/strong&gt; ("3D printing ideas", "Photography ideas", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge&lt;/strong&gt; ("Interesting urbanism studies", "How to count things in Japanese", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal&lt;/strong&gt; ("5 years life plan &amp;amp; goals", "History of addresses lived at", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Career&lt;/strong&gt; ("Onboarding to a new job", "Employment history", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finances&lt;/strong&gt; ("Tax return forms to expect", "TFSA contribution table", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health&lt;/strong&gt; ("Eye exam &amp;amp; prescription history", "Family health history", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trips &amp;amp; events&lt;/strong&gt; ("Pre-travel checklist", "List of flights taken", ...)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media consumed &amp;amp; backlog&lt;/strong&gt; ("Books I've read", "Christmas films I want to watch", ...)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgghdtxnmcpv850bvxal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkgghdtxnmcpv850bvxal.png" alt="Screenshot of a note I created to act as an overview of my personal notes" width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Templates to reduce repetition
&lt;/h2&gt;

&lt;p&gt;The first community-built plugin that I ended up trying out was &lt;a href="https://obsidian.md/plugins?id=quickadd" rel="noopener noreferrer"&gt;QuickAdd&lt;/a&gt;. This plugin lets you create custom commands in the command palette, each configured to duplicate a specific template note. For example, you could create a note called "New trip template" and configure a command called "Add new trip" which would duplicate that particular note and open it for you to fill out as desired.&lt;/p&gt;

&lt;p&gt;In my Obsidian vault I've set up many of these templates, which both saves me a lot of time and ensures consistency between notes of the same category or type. When I open the command palette and search for "QuickAdd", they all show up:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz9ml4f0d693hy6x89uv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsz9ml4f0d693hy6x89uv.png" alt="Screenshot of the command palette showing QuickAdd" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's say I'm going on a trip soon. I select the &lt;strong&gt;Add new trip&lt;/strong&gt; command, enter a name ("Trip to the Land of OOO") and a note is automatically created, stored at the expected location, with the relevant template (both the YAML metadata and the Markdown note itself) ready for me to fill out!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgbf3ji79rl1ydkx5o2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgbf3ji79rl1ydkx5o2w.png" alt="Screenshot of a generated trip built from its template" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since templates mean I get to create a lot of notes really easily, I wanted to prevent a potential issue where my directories would be full of notes of all kinds mixed together. To solve this, I have the templating plugin set up to place the notes in a relevant &lt;code&gt;_items/&lt;/code&gt; directory within the root-level category directory. This allows me to easily find the non-templated notes (in this case, something like "Packing list").&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7rvxy474tssnynt4kxh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7rvxy474tssnynt4kxh.png" alt="Screenshot of the directory structure of my vault, showing the Trips notes" width="452" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Scripting to leverage external metadata
&lt;/h2&gt;

&lt;p&gt;One of the advantages of using a local-first notes app with an open, portable format is that I can easily interact with the notes outside of the note-taking app itself. This means that I can, among other things, build custom scripts or pipelines that create or modify notes.&lt;/p&gt;

&lt;p&gt;I currently do this for three types of notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch-converting Google Contacts metadata to &lt;em&gt;people notes&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Generating &lt;em&gt;concert notes&lt;/em&gt; from a &lt;a href="https://www.setlist.fm/" rel="noopener noreferrer"&gt;setlist.fm&lt;/a&gt; URL which then auto-fills metadata like venue, tour name, and setlist.&lt;/li&gt;
&lt;li&gt;Injecting metadata into &lt;em&gt;media notes&lt;/em&gt; using public APIs like &lt;a href="https://www.igdb.com/api" rel="noopener noreferrer"&gt;IGDB&lt;/a&gt; to auto-fill metadata like release date, synopsis, rating, and more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffeavuc2qv9rv1z0v7x8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffeavuc2qv9rv1z0v7x8s.png" alt="Screenshot of the Back to the Future note after injecting IMDb metadata" width="800" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying notes to render tables
&lt;/h2&gt;

&lt;p&gt;Something that I missed after having used Notion for a few years was the ability to create rendered tables out of notes with custom columns, filters, and sorting. Obsidian doesn't have that built-in (though it is &lt;a href="https://obsidian.md/roadmap/" rel="noopener noreferrer"&gt;on the roadmap&lt;/a&gt;), but there is a community-built plugin called &lt;a href="https://obsidian.md/plugins?id=dataview" rel="noopener noreferrer"&gt;Dataview&lt;/a&gt; that offers most of what I was looking for.&lt;/p&gt;

&lt;p&gt;Dataview works by parsing code blocks that start with &lt;code&gt;```dataview&lt;/code&gt; and contain what they call Dataview Query Language (it's essentially SQL), then rendering the result of each query in place. The query language lets you do parsing, filtering, sorting, and grouping, and it even has some limited support for expressions and function calls.&lt;/p&gt;

&lt;p&gt;I currently use Dataview for rendering tables of my media backlog, trips, and events.&lt;/p&gt;

&lt;p&gt;Below you'll find an example of a Dataview table note I created and how it renders. The query essentially translates to: build a table with three columns (title, year, rating) made up of all notes of category "films" (excluding the template note), and sort by &lt;a href="https://www.imdb.com/" rel="noopener noreferrer"&gt;IMDb&lt;/a&gt; rating.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

table without id
    string("[[" + file.path + "|" + title + "]]") as Title,
    year as Released,
    apiRating as "IMDb"
where
    contains(category,[[Films]]) and
    !contains(file.name,"template")
sort apirating desc


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsmku5urftfgbcef2x5w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsmku5urftfgbcef2x5w.png" alt="Screenshot of the " width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Journaling to clear my mind
&lt;/h2&gt;

&lt;p&gt;I have a confession to make. Before 2024, I'd never tried journaling. I decided to give it a try early last year, and it's been useful so far! It helps me remember what I do on a day-to-day basis, track illnesses like the flu, and put nagging thoughts in order. On that first point, it's already helping me quickly answer questions like "when was the last time I chatted with so-and-so, and what did we talk about?" (with the search and backlink functionality of Obsidian doing the heavy lifting).&lt;/p&gt;

&lt;p&gt;Since I was planning to journal every day, I wanted to make the process as streamlined and easy as possible, so as to remove any cognitive friction that would push me towards skipping a day (or ten). This is the workflow I ended up building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A template for the daily notes (&lt;code&gt;_meta/templates/Daily template&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The built-in &lt;a href="https://help.obsidian.md/Plugins/Daily+notes" rel="noopener noreferrer"&gt;Daily notes plugin&lt;/a&gt; to manage and format daily notes (&lt;code&gt;YYYY/MM-MMMM/YYYY-MM-DD-dddd&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://obsidian.md/plugins?id=calendar" rel="noopener noreferrer"&gt;Calendar plugin&lt;/a&gt; to add a calendar in the sidebar that links to the relevant daily notes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez0l9db7dnxremo9oxjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez0l9db7dnxremo9oxjj.png" alt="Screenshot of the Calendar plugin" width="589" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujf2hvlmplo6tx2amtwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujf2hvlmplo6tx2amtwi.png" alt="Screenshot of the template I use for daily journaling" width="800" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;And now, after two years with Obsidian, here are my takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The effort of migrating to yet another app was daunting, but now that all my notes are in an open format, I'm much less concerned about any hypothetical migration in the future (if this app were to cease development, for example).&lt;/li&gt;
&lt;li&gt;It's not necessary to come up with all the notes you'll ever need right away. Managing personal notes is a marathon, not a sprint. In fact, it's better to create a particular note at the moment you actually need it (instead of trying to proactively come up with future use-cases that haven't come to pass yet).&lt;/li&gt;
&lt;li&gt;Reinventing the wheel is not always the best use of time. It's worth checking whether someone else has built a similar pipeline, plugin, or system that gets you closer to what you want to achieve.&lt;/li&gt;
&lt;li&gt;Keeping up with challenges (like journaling) is much easier if you reduce the friction necessary to complete these challenges. Make it so easy that skipping a day would sound silly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I even have a small backlog of improvement ideas for the future:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Play around with different themes and styles (I'm still using the default theme).&lt;/li&gt;
&lt;li&gt;Build a sort of "CRM" (using that term very loosely) to help me maintain relationships better.&lt;/li&gt;
&lt;li&gt;Look into plugins to do task management and habit tracking.&lt;/li&gt;
&lt;li&gt;Build a small script that could pull weather data into my daily notes.&lt;/li&gt;
&lt;li&gt;Write periodic year and quarter retrospective notes (highlights, trips taken, people hung out with, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you use Obsidian, or if you are thinking of giving it a try, I would love to hear how you approach note-taking!&lt;/p&gt;

</description>
      <category>obsidian</category>
      <category>notetaking</category>
      <category>notes</category>
      <category>information</category>
    </item>
    <item>
      <title>Leverage open models like Gemma 2 on GKE with LangChain</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Thu, 06 Feb 2025 19:13:57 +0000</pubDate>
      <link>https://dev.to/googlecloud/leverage-open-models-like-gemma-2-on-gke-with-langchain-29ki</link>
      <guid>https://dev.to/googlecloud/leverage-open-models-like-gemma-2-on-gke-with-langchain-29ki</guid>
      <description>&lt;p&gt;In my previous posts, we explored how &lt;a href="https://dev.to/googlecloud/simplify-development-of-ai-powered-applications-with-langchain-2pob"&gt;LangChain simplifies AI application development&lt;/a&gt; and how to &lt;a href="https://dev.to/googlecloud/deploy-gemini-powered-langchain-applications-on-gke-42la"&gt;deploy Gemini-powered LangChain applications on GKE&lt;/a&gt;. Now, let's take a look at a slightly different approach: running your own instance of &lt;a href="https://ai.google.dev/gemma" rel="noopener noreferrer"&gt;Gemma&lt;/a&gt;, Google's open large language model, directly within your GKE cluster and integrating it with LangChain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why choose Gemma on GKE?
&lt;/h2&gt;

&lt;p&gt;While using an LLM endpoint like Gemini is convenient, running an open model like Gemma 2 on your GKE cluster can offer several advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Control:&lt;/strong&gt; You have complete control over the model, its resources, and its scaling. This is particularly important for applications with strict performance or security requirements.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customization:&lt;/strong&gt; You can fine-tune the model on your own datasets to optimize it for specific tasks or domains.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization:&lt;/strong&gt; For high-volume usage, running your own instance can potentially be more cost-effective than using the API.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data locality:&lt;/strong&gt; Keep your data and model within your controlled environment, which can be crucial for compliance and privacy.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experimentation:&lt;/strong&gt; You can experiment with the latest research and techniques without being limited by the API's features.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deploying Gemma on GKE
&lt;/h2&gt;

&lt;p&gt;Deploying Gemma on GKE involves several steps, from setting up your GKE cluster to configuring LangChain to use your Gemma instance as its LLM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up credentials
&lt;/h3&gt;

&lt;p&gt;To be able to use the Gemma 2 model, you first need a &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; account. Start by creating one if you don't already have one, then create an access token with &lt;code&gt;read&lt;/code&gt; permissions from &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;your settings page&lt;/a&gt;. Make sure to note down the token value, as we'll need it in a bit.&lt;/p&gt;

&lt;p&gt;Then, go to the &lt;a href="https://www.kaggle.com/models/google/gemma" rel="noopener noreferrer"&gt;model consent page&lt;/a&gt; to accept the terms and conditions of using the Gemma 2 model. Once that is done, we're ready to deploy our open model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up your GKE Cluster
&lt;/h3&gt;

&lt;p&gt;If you don't already have a GKE cluster, you can create one through the &lt;a href="https://console.cloud.google.com/kubernetes" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt; or using the &lt;code&gt;gcloud&lt;/code&gt; command-line tool. Make sure to choose a machine type with sufficient resources to run Gemma, such as the &lt;code&gt;g2-standard&lt;/code&gt; family, which includes an attached NVIDIA L4 GPU. To simplify this, we can create a &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview" rel="noopener noreferrer"&gt;GKE Autopilot cluster&lt;/a&gt;, which provisions appropriate nodes automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud container clusters create-auto langchain-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
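&lt;p&gt;Once the cluster is ready, point &lt;code&gt;kubectl&lt;/code&gt; at it by fetching its credentials. A quick sketch, assuming the same cluster name, project, and region as above:&lt;/p&gt;

```shell
# Configure kubectl to talk to the new Autopilot cluster
gcloud container clusters get-credentials langchain-cluster \
  --project=PROJECT_ID \
  --region=us-central1
```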



&lt;h3&gt;
  
  
  Deploy a Gemma 2 instance
&lt;/h3&gt;

&lt;p&gt;For this example, we'll deploy an instruction-tuned instance of Gemma 2 using a vLLM image. The following manifest describes a deployment and corresponding service for the &lt;code&gt;gemma-2-2b-it&lt;/code&gt; model. Replace &lt;code&gt;HUGGINGFACE_TOKEN&lt;/code&gt; with the token you generated earlier.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma-deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma-server&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma-server&lt;/span&gt;
        &lt;span class="na"&gt;ai.gke.io/model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma-2-2b-it&lt;/span&gt;
        &lt;span class="na"&gt;ai.gke.io/inference-server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm&lt;/span&gt;
        &lt;span class="na"&gt;examples.ai.gke.io/source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model-garden&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inference-server&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250114_0916_RC00_maas&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;34Gi&lt;/span&gt;
            &lt;span class="na"&gt;ephemeral-storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10Gi&lt;/span&gt;
            &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;34Gi&lt;/span&gt;
            &lt;span class="na"&gt;ephemeral-storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10Gi&lt;/span&gt;
            &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
        &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;python&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-m&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;vllm.entrypoints.api_server&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--host=0.0.0.0&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--port=8000&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--model=google/gemma-2-2b-it&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--tensor-parallel-size=1&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--swap-space=16&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--gpu-memory-utilization=0.95&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--enable-chunked-prefill&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--disable-log-stats&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MODEL_ID&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;google/gemma-2-2b-it&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DEPLOY_SOURCE&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UI_NATIVE_MODEL"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HUGGING_FACE_HUB_TOKEN&lt;/span&gt;
          &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hf-secret&lt;/span&gt;
              &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hf_api_token&lt;/span&gt;
        &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/dev/shm&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dshm&lt;/span&gt;
      &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dshm&lt;/span&gt;
        &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;medium&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Memory&lt;/span&gt;
      &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;cloud.google.com/gke-accelerator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia-l4&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemma-server&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8000&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hf-secret&lt;/span&gt;
&lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Opaque&lt;/span&gt;
&lt;span class="na"&gt;stringData&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hf_api_token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;HUGGINGFACE_TOKEN&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save this to a file called &lt;code&gt;gemma-2-deployment.yaml&lt;/code&gt;, then deploy it to your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; gemma-2-deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
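&lt;p&gt;Before moving on, it's worth verifying that the model server actually came up. Here's a sketch of one way to check (the first startup can take several minutes while the image and model weights download):&lt;/p&gt;

```shell
# Wait for the Gemma deployment to report ready
kubectl wait --for=condition=Available deployment/gemma-deployment --timeout=1200s

# Forward the service locally and send a test prompt to the vLLM /generate endpoint
kubectl port-forward service/llm-service 8000:8000 &
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Why is the sky blue?", "temperature": 0.7, "max_tokens": 64}'
```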



&lt;h2&gt;
  
  
  Deploying LangChain on GKE
&lt;/h2&gt;

&lt;p&gt;Now that we have our GKE cluster and Gemma deployed, we need to create our LangChain application and deploy it. If you've followed my previous post, you'll notice that these steps are very similar. The main differences are that we're pointing LangChain to Gemma instead of Gemini, and that our LangChain application uses a &lt;a href="https://python.langchain.com/docs/how_to/custom_llm/" rel="noopener noreferrer"&gt;custom LLM class&lt;/a&gt; to call our local instance of Gemma.&lt;/p&gt;

&lt;h3&gt;
  
  
  Containerize your LangChain application
&lt;/h3&gt;

&lt;p&gt;First, we need to package our LangChain application into a Docker container. This involves creating a &lt;code&gt;Dockerfile&lt;/code&gt; that specifies the environment and dependencies for our application. Here is a Python application using LangChain and Gemma, which we'll save as &lt;code&gt;app.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.callbacks.manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CallbackManagerForLLMRun&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.language_models.llms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VLLMServerLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;vllm_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;

    &lt;span class="nd"&gt;@property&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_llm_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vllm_server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;run_manager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CallbackManagerForLLMRun&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vllm_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;json_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;predictions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;json_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
              &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;predictions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
              &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unexpected response format from vLLM server: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RequestException&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error communicating with vLLM server: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;KeyError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
          &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error parsing vLLM server response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Response was: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VLLMServerLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vllm_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://llm-service:8000/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that answers questions about a given topic.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_app&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;talkToGemini&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_app&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, create a &lt;code&gt;Dockerfile&lt;/code&gt; to define how to assemble our image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use an official Python runtime as a parent image&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3-slim&lt;/span&gt;

&lt;span class="c"&gt;# Set the working directory in the container&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Copy the current directory contents into the container at /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /app&lt;/span&gt;

&lt;span class="c"&gt;# Install any needed packages specified in requirements.txt&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Make port 80 available to the world outside this container&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 80&lt;/span&gt;

&lt;span class="c"&gt;# Run app.py when the container launches&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [ "python", "app.py" ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For our dependencies, create the &lt;code&gt;requirements.txt&lt;/code&gt; file containing LangChain and a web framework, Flask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;langchain
flask
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, build the container image and push it to &lt;a href="https://cloud.google.com/artifact-registry/docs" rel="noopener noreferrer"&gt;Artifact Registry&lt;/a&gt;. Don't forget to replace &lt;code&gt;PROJECT_ID&lt;/code&gt; with your Google Cloud project ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Authenticate with Google Cloud&lt;/span&gt;
gcloud auth login

&lt;span class="c"&gt;# Create the repository&lt;/span&gt;
gcloud artifacts repositories create images &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us

&lt;span class="c"&gt;# Configure authentication to the desired repository&lt;/span&gt;
gcloud auth configure-docker us-docker.pkg.dev/PROJECT_ID/images

&lt;span class="c"&gt;# Build the image&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1 &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Push the image&lt;/span&gt;
docker push us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few moments, your container image will be stored in your Artifact Registry repository.&lt;/p&gt;
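
&lt;p&gt;Before moving on, you can optionally confirm the push landed. Assuming the &lt;code&gt;images&lt;/code&gt; repository created above, list its contents:&lt;/p&gt;

```shell
# List the container images stored in the repository.
# Replace PROJECT_ID with your Google Cloud project ID.
gcloud artifacts docker images list us-docker.pkg.dev/PROJECT_ID/images
```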

&lt;h3&gt;
  
  
  Deploy to GKE
&lt;/h3&gt;

&lt;p&gt;Create a YAML file with your Kubernetes deployment and service manifests. Let's call it &lt;code&gt;deployment.yaml&lt;/code&gt;, remembering to replace &lt;code&gt;PROJECT_ID&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# Scale as needed&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Add selector here&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-container&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoadBalancer&lt;/span&gt; &lt;span class="c1"&gt;# Exposes the service externally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the manifest to your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get the context of your cluster&lt;/span&gt;
gcloud container clusters get-credentials langchain-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1

&lt;span class="c"&gt;# Deploy the manifest&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a deployment with three replicas of your LangChain application and exposes it externally through a load balancer. You can adjust the number of replicas based on your expected load.&lt;/p&gt;
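
&lt;p&gt;A couple of standard &lt;code&gt;kubectl&lt;/code&gt; commands are handy at this point; the names below come from the manifest above:&lt;/p&gt;

```shell
# Wait until all replicas are rolled out and serving
kubectl rollout status deployment/langchain-deployment

# Inspect the pods backing the deployment
kubectl get pods -l app=langchain-app

# Adjust the replica count later without editing the manifest
kubectl scale deployment/langchain-deployment --replicas=5
```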

&lt;h3&gt;
  
  
  Interact with your deployed application
&lt;/h3&gt;

&lt;p&gt;Once the service is deployed, you can get the external IP address of your application using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;EXTERNAL_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;kubectl get service/langchain-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.loadBalancer.ingress[0].ip}'&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
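
&lt;p&gt;If &lt;code&gt;EXTERNAL_IP&lt;/code&gt; comes back empty, the load balancer is most likely still provisioning, which can take a minute or two. You can watch until an address appears:&lt;/p&gt;

```shell
# The EXTERNAL-IP column reads "pending" until provisioning completes
kubectl get service langchain-service --watch
```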



&lt;p&gt;You can now send requests to your LangChain application running on GKE. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"input": "Tell me a fun fact about hummingbirds"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  http://&lt;span class="nv"&gt;$EXTERNAL_IP&lt;/span&gt;/ask
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Considerations and enhancements
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scaling:&lt;/strong&gt; You can scale your Gemma deployment independently of your LangChain application based on the load generated by the model.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; Use &lt;a href="https://cloud.google.com/monitoring" rel="noopener noreferrer"&gt;Cloud Monitoring&lt;/a&gt; and &lt;a href="https://cloud.google.com/logging" rel="noopener noreferrer"&gt;Cloud Logging&lt;/a&gt; to track the performance of both Gemma and your LangChain application. Look for error rates, latency, and resource utilization.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning:&lt;/strong&gt; Consider fine-tuning Gemma on your own dataset to improve its performance on your specific use case.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Implement appropriate security measures, such as network policies and authentication, to protect your Gemma instance.&lt;/li&gt;
&lt;/ul&gt;
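
&lt;p&gt;On the scaling point, a Horizontal Pod Autoscaler can manage the replica count for you. A minimal sketch, reusing the deployment name from earlier; note that CPU-based autoscaling needs resource requests on the container (GKE Autopilot applies defaults):&lt;/p&gt;

```shell
# Keep between 2 and 10 replicas, targeting 70% average CPU utilization.
# Tune these thresholds to your own workload.
kubectl autoscale deployment/langchain-deployment --min=2 --max=10 --cpu-percent=70

# Inspect the resulting HorizontalPodAutoscaler
kubectl get hpa langchain-deployment
```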

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Deploying Gemma on GKE and integrating it with LangChain provides a powerful and flexible way to build AI-powered applications. You gain fine-grained control over your model and infrastructure while still leveraging the developer-friendly features of LangChain. This approach allows you to tailor your setup to your specific needs, whether it's optimizing for performance, cost, or control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explore the &lt;a href="https://ai.google.dev/gemma" rel="noopener noreferrer"&gt;Gemma documentation&lt;/a&gt; for more details on the model and its capabilities.
&lt;/li&gt;
&lt;li&gt;Check out the &lt;a href="https://python.langchain.com/docs/introduction/" rel="noopener noreferrer"&gt;LangChain documentation&lt;/a&gt; for advanced use cases and integrations.
&lt;/li&gt;
&lt;li&gt;Dive deeper into &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/kubernetes-engine-overview" rel="noopener noreferrer"&gt;GKE documentation&lt;/a&gt; for running production workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next post, we will take a look at how to streamline LangChain deployments using LangServe.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ai</category>
      <category>langchain</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Deploy Gemini-powered LangChain applications on GKE</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Tue, 28 Jan 2025 19:38:15 +0000</pubDate>
      <link>https://dev.to/googlecloud/deploy-gemini-powered-langchain-applications-on-gke-42la</link>
      <guid>https://dev.to/googlecloud/deploy-gemini-powered-langchain-applications-on-gke-42la</guid>
      <description>&lt;p&gt;In my previous post, we explored how &lt;a href="https://dev.to/googlecloud/simplify-development-of-ai-powered-applications-with-langchain-2pob"&gt;LangChain simplifies the development of AI-powered applications&lt;/a&gt;. We saw how its modularity, flexibility, and extensibility make it a powerful tool for working with large language models (LLMs) like &lt;a href="https://ai.google.dev/gemini-api" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;. Now, let's take it a step further and see how we can deploy and scale our LangChain applications using the robust infrastructure of &lt;a href="https://cloud.google.com/kubernetes-engine" rel="noopener noreferrer"&gt;Google Kubernetes Engine (GKE)&lt;/a&gt; and the power of Gemini!&lt;/p&gt;

&lt;h2&gt;
  
  
  Why GKE for LangChain?
&lt;/h2&gt;

&lt;p&gt;You might be wondering, "Why bother with Kubernetes? Isn't it complex?" While Kubernetes does have a learning curve (trust me, I've been through it!), GKE simplifies its management significantly by handling the heavy lifting for you, so you can focus on your application.&lt;/p&gt;

&lt;p&gt;Here's why GKE is an excellent choice for deploying LangChain applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; GKE allows you to easily scale your application up or down based on demand. This is crucial for handling fluctuating traffic to your AI-powered features. Imagine your chatbot suddenly going viral; GKE ensures it doesn't crash under the load.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability:&lt;/strong&gt; With GKE, your application runs on a cluster of machines, providing high availability and fault tolerance. If one machine fails, your application keeps running seamlessly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource efficiency:&lt;/strong&gt; GKE optimizes resource utilization, ensuring your application uses only what it needs. This can lead to cost savings, especially when dealing with resource-intensive LLMs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless integration with Google Cloud:&lt;/strong&gt; GKE integrates smoothly with other Google Cloud services like &lt;a href="https://cloud.google.com/storage" rel="noopener noreferrer"&gt;Cloud Storage&lt;/a&gt;, &lt;a href="https://cloud.google.com/sql" rel="noopener noreferrer"&gt;Cloud SQL&lt;/a&gt;, and, importantly, &lt;a href="https://cloud.google.com/vertex-ai" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt;, where Gemini and other LLMs are hosted.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning and rollbacks:&lt;/strong&gt; GKE allows you to easily manage different versions of your application, making updates and rollbacks a breeze. This is incredibly useful when experimenting with different prompts or model parameters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But that's enough talking, let's build something!&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying LangChain on GKE
&lt;/h2&gt;

&lt;p&gt;Let's walk through an example of deploying a simple LangChain application that uses Gemini on GKE. We'll build a basic service, similar to the example from the previous post, but this time, it will be packaged as a containerized application ready for deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Containerize your LangChain application
&lt;/h3&gt;

&lt;p&gt;First, we need to package our LangChain application into a Docker container. This involves creating a &lt;code&gt;Dockerfile&lt;/code&gt; that specifies the environment and dependencies for our application. Here is a Python application using LangChain and Gemini, which we'll save as &lt;code&gt;app.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_google_genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatGoogleGenerativeAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatGoogleGenerativeAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-1.5-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that answers questions about a given topic.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_app&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;talkToGemini&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_app&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
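
&lt;p&gt;Before containerizing, it's worth a quick local smoke test. A sketch, assuming you have a Gemini API key handy (binding port 80 may require elevated privileges on your machine):&lt;/p&gt;

```shell
# Install the dependencies used by app.py
pip install langchain langchain-google-genai flask

# ChatGoogleGenerativeAI reads the Gemini API key from this variable
export GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY

# Start the server, then call it from a second terminal
python app.py
curl -X POST -H "Content-Type: application/json" \
  -d '{"input": "Tell me a fun fact about hummingbirds"}' \
  http://localhost/ask
```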



&lt;p&gt;Then, create a &lt;code&gt;Dockerfile&lt;/code&gt; to define how to assemble our image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use an official Python runtime as a parent image&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3-slim&lt;/span&gt;

&lt;span class="c"&gt;# Set the working directory in the container&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="c"&gt;# Copy the current directory contents into the container at /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . /app&lt;/span&gt;

&lt;span class="c"&gt;# Install any needed packages specified in requirements.txt&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Make port 80 available to the world outside this container&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 80&lt;/span&gt;

&lt;span class="c"&gt;# Run app.py when the container launches&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [ "python", "app.py" ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For our dependencies, create the &lt;code&gt;requirements.txt&lt;/code&gt; file containing LangChain and a web framework, Flask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;langchain
langchain-google-genai
flask
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, build the container image and push it to &lt;a href="https://cloud.google.com/artifact-registry/docs" rel="noopener noreferrer"&gt;Artifact Registry&lt;/a&gt;. Don't forget to replace &lt;code&gt;PROJECT_ID&lt;/code&gt; with your Google Cloud project ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Authenticate with Google Cloud&lt;/span&gt;
gcloud auth login

&lt;span class="c"&gt;# Create the repository&lt;/span&gt;
gcloud artifacts repositories create images &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us

&lt;span class="c"&gt;# Configure authentication to the desired repository&lt;/span&gt;
gcloud auth configure-docker us-docker.pkg.dev/PROJECT_ID/images

&lt;span class="c"&gt;# Build the image&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1 &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Push the image&lt;/span&gt;
docker push us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few moments, your container image will be stored in your Artifact Registry repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy to GKE
&lt;/h3&gt;

&lt;p&gt;Now, let's deploy this image to our GKE cluster. You can create a GKE cluster through the &lt;a href="https://console.cloud.google.com/kubernetes" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt; or with the &lt;code&gt;gcloud&lt;/code&gt; command-line tool, again taking care to replace &lt;code&gt;PROJECT_ID&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud container clusters create-auto langchain-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
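
&lt;p&gt;Autopilot cluster creation usually takes a few minutes. You can confirm the cluster is ready before deploying anything to it:&lt;/p&gt;

```shell
# The STATUS column should read RUNNING once provisioning finishes
gcloud container clusters list --filter="name=langchain-cluster"
```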



&lt;p&gt;Once your cluster is up and running, create a YAML file with your Kubernetes deployment and service manifests. Let's call it &lt;code&gt;deployment.yaml&lt;/code&gt;; replace &lt;code&gt;PROJECT_ID&lt;/code&gt; again, and set &lt;code&gt;YOUR_GOOGLE_API_KEY&lt;/code&gt; to your &lt;a href="https://ai.google.dev/gemini-api/docs/api-key" rel="noopener noreferrer"&gt;Gemini API key&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-deployment&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# Scale as needed&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Add selector here&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-container&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;us-docker.pkg.dev/PROJECT_ID/images/my-langchain-app:v1&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GOOGLE_API_KEY&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;YOUR_GOOGLE_API_KEY&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langchain-app&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoadBalancer&lt;/span&gt; &lt;span class="c1"&gt;# Exposes the service externally&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the manifest to your cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get the context of your cluster&lt;/span&gt;
gcloud container clusters get-credentials langchain-cluster &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1

&lt;span class="c"&gt;# Deploy the manifest&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a deployment with three replicas of your LangChain application and exposes it externally through a load balancer. You can adjust the number of replicas based on your expected load.&lt;/p&gt;
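
&lt;p&gt;A note on the API key: a plain-text value in &lt;code&gt;deployment.yaml&lt;/code&gt; is fine for experimentation, but for anything shared you may prefer a Kubernetes Secret. One way to retrofit that (the secret name here is just an example):&lt;/p&gt;

```shell
# Store the key in a Secret instead of the manifest
kubectl create secret generic gemini-api-key \
  --from-literal=GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY

# Point the deployment's environment at the Secret
kubectl set env deployment/langchain-deployment --from=secret/gemini-api-key
```

You would then also drop the plain-text &lt;code&gt;value&lt;/code&gt; from the manifest so the Secret is the single source of truth.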

&lt;h3&gt;
  
  
  Interact with your deployed application
&lt;/h3&gt;

&lt;p&gt;Once the service is deployed, you can get the external IP address of your application using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;EXTERNAL_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sb"&gt;$(&lt;/span&gt;kubectl get service/langchain-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.loadBalancer.ingress[0].ip}'&lt;/span&gt;&lt;span class="sb"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now send requests to your LangChain application running on GKE. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"input": "Tell me a fun fact about hummingbirds"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  http://&lt;span class="nv"&gt;$EXTERNAL_IP&lt;/span&gt;/ask
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Taking it further
&lt;/h2&gt;

&lt;p&gt;This is just a basic example, but you can expand on it in many ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Integrate with other Google Cloud services:&lt;/strong&gt; Use Cloud SQL to store conversation history, or Cloud Storage to load documents for your chatbot to reference.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement more complex LangChain flows:&lt;/strong&gt; Build sophisticated applications with chains, agents, and memory, all running reliably on GKE.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up CI/CD:&lt;/strong&gt; Automate the build and deployment process using tools like &lt;a href="https://cloud.google.com/build" rel="noopener noreferrer"&gt;Cloud Build&lt;/a&gt; and &lt;a href="https://cloud.google.com/deploy" rel="noopener noreferrer"&gt;Cloud Deploy&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor and optimize:&lt;/strong&gt; Use &lt;a href="https://cloud.google.com/monitoring" rel="noopener noreferrer"&gt;Cloud Monitoring&lt;/a&gt; and &lt;a href="https://cloud.google.com/logging" rel="noopener noreferrer"&gt;Cloud Logging&lt;/a&gt; to track the performance and health of your application.&lt;/li&gt;
&lt;/ul&gt;
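&lt;p&gt;To make the memory idea above concrete, here is a minimal, framework-free sketch of per-user conversation history. The &lt;code&gt;ChatHistory&lt;/code&gt; helper is hypothetical, purely for illustration; in a real deployment you would use LangChain's chat-history classes and persist the turns in Cloud SQL:&lt;/p&gt;

```python
from collections import defaultdict


class ChatHistory:
    """Hypothetical in-memory store for per-user conversation turns.

    A real app would persist these rows (e.g. in Cloud SQL) so history
    survives pod restarts; this sketch only shows the shape.
    """

    def __init__(self, max_turns: int = 20):
        self.max_turns = max_turns
        self._turns = defaultdict(list)  # user_id -> [(role, text), ...]

    def add(self, user_id: str, role: str, text: str) -> None:
        turns = self._turns[user_id]
        turns.append((role, text))
        # Keep only the most recent turns so prompts stay small.
        del turns[:-self.max_turns]

    def as_messages(self, user_id: str) -> list[tuple[str, str]]:
        # Returned in the (role, content) shape LangChain prompts accept.
        return list(self._turns[user_id])


history = ChatHistory(max_turns=2)
history.add("alice", "human", "Tell me about hummingbirds")
history.add("alice", "ai", "They can hover in place.")
history.add("alice", "human", "How fast do their wings beat?")
print(history.as_messages("alice"))
```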

&lt;h2&gt;
  
  
  Continue your journey
&lt;/h2&gt;

&lt;p&gt;Deploying LangChain applications on GKE with Gemini gives you scalability, reliability, and operational control. By combining the developer-friendly abstractions of LangChain, the capabilities of Gemini, and the robustness of GKE, you have everything you need to build AI-powered applications that can handle real-world demand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dive deeper into the &lt;a href="https://cloud.google.com/kubernetes-engine/docs" rel="noopener noreferrer"&gt;GKE documentation&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Explore the &lt;a href="https://cloud.google.com/vertex-ai/docs" rel="noopener noreferrer"&gt;Vertex AI documentation&lt;/a&gt; for more advanced LLM management and deployment options.
&lt;/li&gt;
&lt;li&gt;Check out the &lt;a href="https://python.langchain.com/docs/get_started/introduction" rel="noopener noreferrer"&gt;LangChain documentation&lt;/a&gt; for more complex use cases and examples.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a future post, I will look into using an open model called Gemma!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ai</category>
      <category>langchain</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Simplify development of AI-powered applications with LangChain</title>
      <dc:creator>Olivier Bourgeois</dc:creator>
      <pubDate>Tue, 01 Oct 2024 15:10:38 +0000</pubDate>
      <link>https://dev.to/googlecloud/simplify-development-of-ai-powered-applications-with-langchain-2pob</link>
      <guid>https://dev.to/googlecloud/simplify-development-of-ai-powered-applications-with-langchain-2pob</guid>
      <description>&lt;p&gt;Large language models (LLMs) like &lt;a href="https://ai.google.dev/gemini-api" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; can generate human-quality text, translate languages, and answer questions in an informative way. But writing applications that use these LLMs effectively can be tricky, and models all have their own distinct APIs and supported features. That’s where &lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is LangChain?
&lt;/h2&gt;

&lt;p&gt;LangChain is an open source framework designed to help developers build applications that use LLMs. It provides a standardized interface and set of tools for interacting with a variety of different LLMs, making it easier to incorporate them into your applications. Think of it like a universal adapter that lets you plug in any LLM and start using it with a consistent set of commands. This simplifies development by abstracting away the complexities of individual LLM APIs and allowing you to focus on building your application logic.&lt;/p&gt;

&lt;p&gt;With LangChain, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use different language models&lt;/strong&gt; by easily switching between multiple models without rewriting your application logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect to various data sources&lt;/strong&gt; such as documents and databases to provide context and grounding to the LLM responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create complex flows&lt;/strong&gt; by chaining together multiple pre-built components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engage in dynamic conversations&lt;/strong&gt; by building chatbots that can remember past interactions and user preferences.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access and manage external knowledge&lt;/strong&gt; by integrating with APIs and other sources of real-time information.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why would you use LangChain?
&lt;/h2&gt;

&lt;p&gt;Imagine you're building a chatbot that needs to answer questions based on your company's internal documents. You have to write custom code to load those documents, format them for the LLM, send the API request, parse the response, and potentially even handle errors. Now, imagine needing to do this across multiple projects with different LLMs and data sources. That's a lot of repetitive and complex code!&lt;/p&gt;

&lt;p&gt;LangChain simplifies the development of LLM-powered applications by abstracting away the concepts shared between models. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modularity&lt;/strong&gt; by breaking down your application into reusable components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexibility&lt;/strong&gt; by being able to easily swap out models and components.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensibility&lt;/strong&gt; by allowing you to customize and extend the framework based on your needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This allows you to focus on your application logic instead of reinventing the wheel.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you use LangChain?
&lt;/h2&gt;

&lt;p&gt;Getting started is simple! LangChain is available as a package for both &lt;a href="https://python.langchain.com/docs/introduction/" rel="noopener noreferrer"&gt;Python&lt;/a&gt; and &lt;a href="https://js.langchain.com/docs/introduction/" rel="noopener noreferrer"&gt;JavaScript&lt;/a&gt;, and offers extensive documentation and resources. In addition, the LangChain developer community is vast and lots of bindings have been created for other languages, such as &lt;a href="https://docs.langchain4j.dev/" rel="noopener noreferrer"&gt;LangChain4j&lt;/a&gt; for Java. To see which LLM models (and related features) are supported by LangChain, you can take a look at the &lt;a href="https://python.langchain.com/docs/integrations/llms/" rel="noopener noreferrer"&gt;official tables for LLM models&lt;/a&gt; and for &lt;a href="https://python.langchain.com/docs/integrations/chat/" rel="noopener noreferrer"&gt;chat models&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Let’s take a look at how quickly you can get a LangChain application running in Python. In this example we’ll use the latest Gemini Pro model, but the steps are similar for any model you choose.&lt;/p&gt;

&lt;p&gt;First, you install the required packages. In our case, the core LangChain package as well as the LangChain Google AI package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain langchain-google-genai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set your Gemini API key, which you can generate following &lt;a href="https://ai.google.dev/gemini-api/docs/api-key" rel="noopener noreferrer"&gt;these instructions&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And with only a few lines of code, you have a working Q&amp;amp;A application powered by both templating and chaining features!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_google_genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatGoogleGenerativeAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatGoogleGenerativeAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-1.5-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that creates poems in {input_language} containing {line_count} lines about a given topic.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_language&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;French&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;line_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Google Cloud&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
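&lt;p&gt;The &lt;code&gt;prompt | llm&lt;/code&gt; line works because LangChain runnables overload Python's &lt;code&gt;|&lt;/code&gt; operator to compose steps into a chain. The &lt;code&gt;Runnable&lt;/code&gt; class below is a hypothetical, framework-free sketch of that idea, not LangChain's actual implementation:&lt;/p&gt;

```python
class Runnable:
    """Hypothetical stand-in for a LangChain runnable: a wrapped
    function that supports `|` composition, purely for illustration."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # left | right: feed left's output into right.
        return Runnable(lambda value: other.invoke(self.invoke(value)))


# Two toy steps standing in for the prompt template and the model.
prompt = Runnable(lambda d: f"Write {d['line_count']} lines about {d['input']}")
fake_llm = Runnable(lambda text: f"[model response to: {text}]")

chain = prompt | fake_llm
print(chain.invoke({"line_count": "4", "input": "Google Cloud"}))
# → [model response to: Write 4 lines about Google Cloud]
```

&lt;p&gt;Because every step exposes the same &lt;code&gt;invoke&lt;/code&gt; interface, you can swap any stage (for example, a different model) without touching the rest of the chain.&lt;/p&gt;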



&lt;h2&gt;
  
  
  Try it out on Google Cloud!
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud&lt;/a&gt; is a great platform for developing and running enterprise-ready LangChain applications. With powerful compute resources, seamless integration with other Google Cloud services, and an extensive collection of pre-hosted LLMs to choose from in &lt;a href="https://cloud.google.com/model-garden" rel="noopener noreferrer"&gt;Model Garden on Vertex AI&lt;/a&gt;, you have everything you need to build and deploy your AI-powered applications.&lt;/p&gt;

&lt;p&gt;Explore the following resources to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/reasoning-engine/overview" rel="noopener noreferrer"&gt;Documentation - LangChain on Vertex AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/databases/build-rag-applications-with-langchain-and-google-cloud" rel="noopener noreferrer"&gt;Article - Build supercharged gen AI applications with LangChain and Google Cloud databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=l7tNx52bnsc" rel="noopener noreferrer"&gt;Video - Building generative AI apps on Google Cloud with LangChain&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a later post, I will take a look at how you can use LangChain to connect to a local Gemma instance, all running in a &lt;a href="https://cloud.google.com/kubernetes-engine/" rel="noopener noreferrer"&gt;Google Kubernetes Engine (GKE)&lt;/a&gt; cluster.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ai</category>
      <category>langchain</category>
      <category>googlecloud</category>
    </item>
  </channel>
</rss>
