<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Greg Mate</title>
    <description>The latest articles on DEV Community by Greg Mate (@gerimate).</description>
    <link>https://dev.to/gerimate</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1079609%2Feb9cc87f-0fcf-4df9-bf8e-5ce57cfc4060.png</url>
      <title>DEV Community: Greg Mate</title>
      <link>https://dev.to/gerimate</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gerimate"/>
    <language>en</language>
    <item>
      <title>Your AI agent is leaking memory across users. Here's why and how to stop it</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Mon, 22 Jun 2026 11:13:06 +0000</pubDate>
      <link>https://dev.to/gerimate/the-part-most-agent-demos-skip-acting-as-a-specific-user-with-real-memory-3b4h</link>
      <guid>https://dev.to/gerimate/the-part-most-agent-demos-skip-acting-as-a-specific-user-with-real-memory-3b4h</guid>
      <description>&lt;p&gt;&lt;strong&gt;Most agent demos connect to a CRM and update a record. Impressive in a presentation. Broken the moment a second user shows up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The hard part is not the tool call. It is acting as the &lt;em&gt;right person&lt;/em&gt;, remembering only &lt;em&gt;their&lt;/em&gt; context, and making sure nothing leaks across users. This is the part that gets hand-waved in demos because it is genuinely annoying to get right.&lt;/p&gt;

&lt;p&gt;We ran into this building a reference implementation for the &lt;a href="https://luma.com/83kxpwnj" rel="noopener noreferrer"&gt;Scalekit x Actian x Render Agents in Production Hackathon&lt;/a&gt; in San Francisco on June 27. Here is what we learned.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with agent memory in multi-user systems
&lt;/h2&gt;

&lt;p&gt;When you build an agent that acts on behalf of a user, there are two separate scoping problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is the agent &lt;em&gt;allowed to do&lt;/em&gt; on their behalf (identity, permissions, tokens)&lt;/li&gt;
&lt;li&gt;What does the agent &lt;em&gt;remember&lt;/em&gt; about them (context, history, prior decisions)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most implementations treat these as the same problem. They are not. &lt;a href="https://www.scalekit.com/" rel="noopener noreferrer"&gt;Scalekit&lt;/a&gt; handles the first one well. &lt;a href="https://www.actian.com/databases/vectorai-db/" rel="noopener noreferrer"&gt;VectorAI DB&lt;/a&gt; handles the second. But getting them to agree on who the current user &lt;em&gt;is&lt;/em&gt; requires deliberate wiring.&lt;/p&gt;

&lt;p&gt;The naive approach, a single shared vector index for all users, fails quietly. Alice's agent starts pulling context that belongs to Bob. Nothing crashes. No errors. The output just gets subtly wrong in ways that are hard to debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  One collection per user
&lt;/h2&gt;

&lt;p&gt;VectorAI DB does not have a native multi-tenancy API. There are no user-scoped namespaces, no per-user tokens, no RBAC on individual collections. The Community Edition ships one isolation primitive: collections.&lt;/p&gt;

&lt;p&gt;So the pattern is simple. One collection per user, named after the same identifier your auth layer already uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;actian_vectorai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorAIClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VectorParams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CollectionExistsError&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost:6574&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_or_create_user_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-memories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collections&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;vectors_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;VectorParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cosine&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;CollectionExistsError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key detail: whatever string Scalekit uses as the user identifier for its connected account is the string you pass here. One source of truth, no mapping table, no sync to maintain.&lt;/p&gt;

&lt;p&gt;Two things you cannot change after collection creation: vector dimension and distance metric. Pick your embedding model before you create any collections. Changing either later requires deleting the collection and losing all its data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting it running locally
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;actian-vectorai-client

docker pull actian/vectorai:latest
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; vectorai &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ./local_data:/var/lib/actian-vectorai &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 6573-6575:6573-6575 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;ACTIAN_VECTORAI_ACCEPT_EULA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YES &lt;span class="se"&gt;\&lt;/span&gt;
  actian/vectorai:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The container will not start without &lt;code&gt;ACTIAN_VECTORAI_ACCEPT_EULA=YES&lt;/code&gt;. No error, just an immediate exit with code 1.&lt;/p&gt;

&lt;p&gt;One thing that will trip you up: the pip package is &lt;code&gt;actian-vectorai-client&lt;/code&gt; but the import is &lt;code&gt;actian_vectorai&lt;/code&gt;. Different strings. It will fail at import time if you use the package name.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;actian_vectorai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorAIClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VectorParams&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PointStruct&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorAIClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost:6574&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The dependency conflict nobody warns you about
&lt;/h2&gt;

&lt;p&gt;If you are combining this with &lt;code&gt;scalekit-sdk-python&lt;/code&gt;, you will hit a dependency conflict that is not version-specific and not obvious from the error message.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;scalekit-sdk-python==2.12.0&lt;/code&gt; pins &lt;code&gt;protobuf&amp;lt;7.0.0&lt;/code&gt;. &lt;code&gt;actian-vectorai-client&lt;/code&gt; needs &lt;code&gt;protobuf&amp;gt;=6.31.1&lt;/code&gt;. When pip resolves this, it downgrades protobuf, and then &lt;code&gt;actian_vectorai&lt;/code&gt; fails at import time with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;google.protobuf.runtime_version.VersionError: Detected incompatible Protobuf 
Gencode/Runtime versions when loading actian_vectorai_common.proto: 
gencode 6.31.1 runtime 5.29.6.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install everything except scalekit normally&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; scalekit-sdk-python requirements.txt &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/req.txt
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /tmp/req.txt

&lt;span class="c"&gt;# Then install scalekit without its dependency resolution&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;scalekit-sdk-python&lt;span class="o"&gt;==&lt;/span&gt;2.12.0 &lt;span class="nt"&gt;--no-deps&lt;/span&gt;

&lt;span class="c"&gt;# Explicitly reinstate the versions actian-vectorai-client needs&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"protobuf&amp;gt;=6.31.1"&lt;/span&gt; &lt;span class="s2"&gt;"grpcio-status&amp;gt;=1.67.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works because scalekit's &lt;code&gt;&amp;lt;1.67&lt;/code&gt; grpcio-status constraint is stale metadata. At runtime, the newer versions are compatible. The &lt;code&gt;--no-deps&lt;/code&gt; flag skips the constraint check. Not a blessed install path from Scalekit's side, but it works and the combination has been stable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The capacity behavior you should know before demo day
&lt;/h2&gt;

&lt;p&gt;Community Edition caps at 5,000 vectors total, across all your collections combined. That is not the surprising part.&lt;/p&gt;

&lt;p&gt;The surprising part: the cap is enforced asynchronously. Writes succeed past the limit. About 30 seconds later, a background enforcement task runs and blocks further writes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CapacityExceededError: Vector capacity exceeded: 5,005 vectors stored, 
limit is 5,000. Delete vectors or upgrade your licence to continue.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During a demo, this means your inserts can succeed, your reads can succeed, and then your next write silently fails half a minute later with no indication of why at the point of the call. Worth knowing before you have 10 people watching.&lt;/p&gt;

&lt;p&gt;The 30-day trial unlocks 1 million vectors. Get that set up before the day if you are planning anything beyond a few users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying to Render
&lt;/h2&gt;

&lt;p&gt;VectorAI DB is Docker-only right now, which &lt;a href="https://render.com/" rel="noopener noreferrer"&gt;Render&lt;/a&gt; handles natively. Pull &lt;code&gt;actian/vectorai:latest&lt;/code&gt; directly as a private Docker service, no custom Dockerfile needed.&lt;/p&gt;

&lt;p&gt;Two things the service needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ACTIAN_VECTORAI_ACCEPT_EULA=YES&lt;/code&gt; as an environment variable&lt;/li&gt;
&lt;li&gt;A persistent disk mounted at &lt;code&gt;/var/lib/actian-vectorai&lt;/code&gt;, or you lose all data on every redeploy. This cannot be set via &lt;code&gt;render.yaml&lt;/code&gt; on an existing service. It has to be added manually through the Render dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep the VectorAI DB service private, not public-facing. Your agent app connects to it over Render's internal network at &lt;code&gt;vectorai-db:6574&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build this at the hackathon
&lt;/h2&gt;

&lt;p&gt;On June 27 in San Francisco, Scalekit, Actian, and Render are running a build day focused on agents that act as real users with real permissions. If this is the problem you want to work on, &lt;a href="https://luma.com/83kxpwnj" rel="noopener noreferrer"&gt;register here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our team will be on-site all day. The participant guide is live &lt;a href="https://gerimate.github.io/actian-hackathon-guide/" rel="noopener noreferrer"&gt;here&lt;/a&gt; with the install commands, the per-user pattern, and everything else in this post in a format you can keep open during the build.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>docker</category>
    </item>
    <item>
      <title>I Built a Python Agent That Uses a Vector DB as Memory, Not Retrieval</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Thu, 11 Jun 2026 17:37:12 +0000</pubDate>
      <link>https://dev.to/gerimate/i-built-a-python-agent-that-uses-a-vector-db-as-memory-not-retrieval-135e</link>
      <guid>https://dev.to/gerimate/i-built-a-python-agent-that-uses-a-vector-db-as-memory-not-retrieval-135e</guid>
      <description>&lt;p&gt;&lt;strong&gt;Vector databases are almost always talked about in the context of RAG. Store your documents, embed them, retrieve the relevant chunks at inference time. That's the default pattern and it works — until it doesn't.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;I've been working on &lt;a href="https://www.actian.com/databases/vectorai-db/" rel="noopener noreferrer"&gt;Actian VectorAI DB&lt;/a&gt; and started wondering: what if the vector DB isn't a document store at all? What if it's a memory layer for an agent?&lt;/p&gt;

&lt;p&gt;So I built it to find out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;The distinction sounds subtle but it matters. In a classic RAG setup, you pre-load a vector store with documents. The corpus is static. The agent queries it but never changes it.&lt;/p&gt;

&lt;p&gt;What I wanted to build was different. An agent that writes to the vector store as it runs — storing every interaction as a vector — and then searches its own past conversations semantically when it needs context. The corpus is built from the agent's own history, not from documents you loaded upfront.&lt;/p&gt;

&lt;p&gt;The agent is the author of its own knowledge base.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.vectoraidb.actian.com/home/quickstart/quickstart" rel="noopener noreferrer"&gt;Everything runs locally&lt;/a&gt;. No cloud, no external API calls, nothing leaving the machine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Actian VectorAI DB&lt;/strong&gt;: vector store and semantic search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama + llama3.2&lt;/strong&gt;: local LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.vectoraidb.actian.com/academy/tutorials/leverage-open-source-embedding-models" rel="noopener noreferrer"&gt;BAAI/bge-small-en-v1.5&lt;/a&gt;&lt;/strong&gt;: embedding model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt;: the glue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fully local constraint wasn't just a preference, rather the core to the premise. If the agent is storing personal memory, it shouldn't be doing it in someone else's cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Every time you send the agent a message, it does four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embeds your message as a vector&lt;/li&gt;
&lt;li&gt;Searches VectorAI DB for semantically similar past interactions&lt;/li&gt;
&lt;li&gt;Injects the relevant memories into the system prompt&lt;/li&gt;
&lt;li&gt;Responds, then stores the full exchange back into VectorAI DB&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;See:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process a user message and return the assistant reply.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Embed the incoming message for semantic search
&lt;/span&gt;    &lt;span class="n"&gt;query_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Recall semantically relevant memories (cross-session by default).
&lt;/span&gt;    &lt;span class="c1"&gt;# score_threshold=0.50 prevents loosely-related memories from being injected
&lt;/span&gt;    &lt;span class="c1"&gt;# as context. min_importance=0.5 excludes low-confidence episodic fragments
&lt;/span&gt;    &lt;span class="c1"&gt;# (episodes are stored at 0.3, explicit facts at 0.9).
&lt;/span&gt;    &lt;span class="n"&gt;past_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;score_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Build system prompt with injected memories
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;past_memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Extend short-term conversation window
&lt;/span&gt;    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# 5. Call the local LLM via Ollama
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;assistant_reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# 6. Append reply to short-term window
&lt;/span&gt;    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;assistant_reply&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# 7. Persist this exchange as an episodic long-term memory
&lt;/span&gt;    &lt;span class="c1"&gt;# Episodic importance is kept low (0.3) intentionally: the agent's own
&lt;/span&gt;    &lt;span class="c1"&gt;# replies may contain errors or hallucinations. Explicit facts stored via
&lt;/span&gt;    &lt;span class="c1"&gt;# remember_fact() use importance=0.9 and will always rank above episodes.
&lt;/span&gt;    &lt;span class="n"&gt;memory_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User said: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Agent replied: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;assistant_reply&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;memory_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memory_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;importance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;assistant_reply&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The search is cross-session by default. A memory from last Tuesday will surface today if it's semantically close enough to what you're asking. The collection lives on disk via Docker volume so it persists across restarts.&lt;/p&gt;

&lt;p&gt;There's also a &lt;code&gt;remember: &amp;lt;fact&amp;gt;&lt;/code&gt; command to store explicit high-importance facts at a higher importance score, separately from the episodic conversation log.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Broke Along the Way
&lt;/h2&gt;

&lt;p&gt;The embedding model defaulted to a HuggingFace download on first run, which immediately broke the fully local setup. Fixed it by loading the model with &lt;code&gt;local_files_only=True&lt;/code&gt; and requiring a one-time manual download before the first run — so the embedding step is fully offline on every subsequent run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memory Decay Problem
&lt;/h2&gt;

&lt;p&gt;The first version had a flat importance score for every interaction. Every exchange stored at &lt;code&gt;0.6&lt;/code&gt;, explicit facts at &lt;code&gt;0.9&lt;/code&gt;. No decay, no forgetting — the collection just grew indefinitely. That's fine as a proof of concept but it's not how memory actually works. Old, rarely referenced memories shouldn't compete equally with recent, frequently accessed ones.&lt;/p&gt;

&lt;p&gt;So I added importance-weighted decay. Every memory now gets scored on four signals before being returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;age_hours&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;
&lt;span class="n"&gt;recency&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;age_hours&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;168&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# half-life ~1 week
&lt;/span&gt;&lt;span class="n"&gt;freq&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;access_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# saturates at 10 accesses
&lt;/span&gt;
&lt;span class="n"&gt;final_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="mf"&gt;0.6&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;cosine_similarity&lt;/span&gt;
  &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt;
  &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;recency&lt;/span&gt;
  &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;access_frequency&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cosine similarity still does the heavy lifting — it has to, otherwise semantically irrelevant memories would surface. But recency and access frequency now influence ranking. A memory from six weeks ago that's never been referenced again will lose ground to a recent one, even if the raw cosine similarity is similar.&lt;/p&gt;

&lt;p&gt;The weights and half-life are module-level constants so they're easy to tune without touching the logic.&lt;/p&gt;

&lt;p&gt;The recall path also tracks access — every time a memory surfaces in a query, its &lt;code&gt;access_count&lt;/code&gt; increments and &lt;code&gt;last_accessed&lt;/code&gt; updates. Memories that keep coming up stay relevant. Ones that don't, fade.&lt;/p&gt;

&lt;p&gt;Here's what the ranked output looks like against four synthetic test memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rank  Score    Imp   Content
  1   0.9135   0.9   recent + high access (1 hr old, 8 accesses)
  2   0.6776   0.9   old + high importance (30 days, 0 accesses)
  3   0.6704   0.3   recent + no access (2 hrs old, 0 accesses)
  4   0.5112   0.3   old + low importance (60 days, 0 accesses)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recent, frequently accessed memory dominates. The old, low-importance one drops to the bottom regardless of semantic similarity. That's the behavior you want from something calling itself memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hallucination Problem
&lt;/h2&gt;

&lt;p&gt;Persistent memory introduces a risk that RAG pipelines don't have in the same way: if the agent hallucinates something and stores it, that hallucination gets recalled as a confident memory in the next session. The wrong information compounds.&lt;/p&gt;

&lt;p&gt;Three risks needed fixing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The LLM had no instruction to stay within recalled memories.&lt;/strong&gt; The original system prompt said "use these memories when relevant" — permissive enough that the model would freely supplement from its training data when memory was thin. Three explicit rules were added: only use facts from the listed memories for personal claims, say "I don't know" when no memory covers a question, and never infer or guess personal details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hallucinated replies were stored and recalled as truth.&lt;/strong&gt; Every exchange was stored at &lt;code&gt;importance=0.6&lt;/code&gt;, meaning a hallucinated reply could be recalled next session and treated as a confident memory. Episodic importance was lowered to &lt;code&gt;0.3&lt;/code&gt; — well below explicit facts at &lt;code&gt;0.9&lt;/code&gt; — so bad replies can never outrank things the user deliberately told the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weakly-matched memories were being injected as context.&lt;/strong&gt; The &lt;a href="https://docs.vectoraidb.actian.com/docs/fundamentals/semantic-search/score-threshold-search-task#score-threshold-search" rel="noopener noreferrer"&gt;recall threshold&lt;/a&gt; was low enough to pull in semantically distant memories that could mislead the LLM. The threshold was raised and a &lt;code&gt;min_importance&lt;/code&gt; filter added so episodic fragments are excluded from injection entirely. Only explicitly stored facts ever reach the LLM.&lt;/p&gt;

&lt;p&gt;The importance ladder now looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;importance&lt;/span&gt;=&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;9&lt;/span&gt;  -&amp;gt;  &lt;span class="n"&gt;explicit&lt;/span&gt; &lt;span class="n"&gt;facts&lt;/span&gt; (&lt;span class="n"&gt;remember&lt;/span&gt;: &amp;lt;&lt;span class="n"&gt;fact&lt;/span&gt;&amp;gt;)   &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="n"&gt;recalled&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; ≥ &lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;50&lt;/span&gt;
&lt;span class="n"&gt;importance&lt;/span&gt;=&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;5&lt;/span&gt;  -&amp;gt;  &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;min_importance&lt;/span&gt; &lt;span class="n"&gt;gate&lt;/span&gt;             &amp;lt;- &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;
&lt;span class="n"&gt;importance&lt;/span&gt;=&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;  -&amp;gt;  &lt;span class="n"&gt;episodic&lt;/span&gt; &lt;span class="n"&gt;exchanges&lt;/span&gt; (&lt;span class="n"&gt;chat&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;)   &lt;span class="n"&gt;never&lt;/span&gt; &lt;span class="n"&gt;recalled&lt;/span&gt;, &lt;span class="n"&gt;never&lt;/span&gt; &lt;span class="n"&gt;injected&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A test suite with 5 offline pytest tests guards all three risks — mocking both the memory store and the LLM call, then inspecting the messages array sent to the model before it responds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;5 passed in 10.56s ✓
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I Found
&lt;/h2&gt;

&lt;p&gt;When I examined how VectorAI DB was actually being used in the implementation, the key finding was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The corpus is built dynamically from the agent's own past conversations, not from a pre-loaded document index. The agent is the author of its own knowledge base, which accumulates at runtime.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the thing that makes this memory rather than retrieval. It's a small shift in how you think about what a vector DB is for: not a document store you query at inference time, but a persistent layer that grows with the agent over time, and now one that forgets appropriately too.&lt;/p&gt;

&lt;p&gt;The agent works. Cross-session recall is functioning, decay is verified, the stack is fully local.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Testing retrieval quality as the memory grows over longer periods&lt;/li&gt;
&lt;li&gt;Exploring what other use cases this pattern unlocks beyond conversation memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Find the repo &lt;a href="https://github.com/gerimate/vectorai-db-agent-memory" rel="noopener noreferrer"&gt;here&lt;/a&gt;. If you're working on anything in this space — agentic memory, local-first AI stacks, or just fighting with MCP setup — I'd love to hear what you're seeing in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>vectordatabase</category>
      <category>llm</category>
    </item>
    <item>
      <title>We're running our first hackathon: Build with VectorAI DB, win Claude subscriptions</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Thu, 09 Apr 2026 09:39:39 +0000</pubDate>
      <link>https://dev.to/gerimate/were-running-our-first-hackathon-build-with-vectorai-db-win-claude-subscriptions-2f0c</link>
      <guid>https://dev.to/gerimate/were-running-our-first-hackathon-build-with-vectorai-db-win-claude-subscriptions-2f0c</guid>
      <description>&lt;p&gt;The Actian VectorAI DB Build Challenge is our first community hackathon, and we want to see what you build. Solo or team, beginner or experienced, local or cloud. If you've been looking for a reason to actually ship something with a vector database, this is it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;April 13-18, 2026 | Virtual | &lt;a href="https://dorahacks.io/hackathon/2097/detail" rel="noopener noreferrer"&gt;Register on DoraHacks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What you're building
&lt;/h3&gt;

&lt;p&gt;An AI application that solves a real, tangible problem using Actian VectorAI DB. It can run on your laptop, on a server, in the cloud, wherever. The only rule: VectorAI DB has to be a core part of your stack, not something you bolted on at the end.&lt;/p&gt;

&lt;p&gt;Your project also needs to go beyond basic similarity search. Pick at least one of these:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid Fusion&lt;/strong&gt; - combine multiple search signals into one ranked result. Not just meaning, not just keywords. Both, fused together.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What that looks like in practice:&lt;/em&gt; A job board that ranks candidates by semantic fit ("backend engineer who gets distributed systems") AND keyword match ("Golang, Kubernetes") merged into one list using RRF or DBSF.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filtered Search&lt;/strong&gt; - pair vector search with structured filters on your data so results are actually useful, not just semantically close.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What that looks like in practice:&lt;/em&gt; A campus event finder that understands what you're looking for but also filters by date, location, and student org. So you're finding events you can go to, not just events that sound similar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Named Vectors / Multimodal&lt;/strong&gt; - store and search across different data types in the same collection. Text, images, audio, whatever fits your idea.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What that looks like in practice:&lt;/em&gt; A study tool where you search your notes by typing a question or uploading a diagram. Both hit the same knowledge base, just through different vector spaces.&lt;/p&gt;

&lt;p&gt;Bonus points for running locally, on ARM, or offline. No fixed weight, judges' call.&lt;/p&gt;




&lt;h3&gt;
  
  
  Not sure what to build?
&lt;/h3&gt;

&lt;p&gt;Some starting points, but don't let these limit you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A RAG app over any dataset you actually care about (research papers, course notes, documentation, news)&lt;/li&gt;
&lt;li&gt;A semantic search tool with smart filters (campus events, job listings, study materials)&lt;/li&gt;
&lt;li&gt;A recommendation engine that combines meaning and metadata&lt;/li&gt;
&lt;li&gt;An anomaly detection or monitoring system&lt;/li&gt;
&lt;li&gt;An AI agent with vector-powered memory&lt;/li&gt;
&lt;li&gt;A multimodal search tool across text and images&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;The database runs in Docker and works natively on Mac (including Apple Silicon), Linux, and Windows. No Rosetta, no platform flags needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo and start the database&lt;/span&gt;
docker compose up

&lt;span class="c"&gt;# Install the Python client&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;actian-vectorai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not sure where to begin? Start with the featured RAG example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; examples/rag/requirements.txt
python examples/rag/rag_example.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It walks you through building a complete retrieval-augmented generation app from scratch. You'll have something running in under 10 minutes.&lt;/p&gt;

&lt;p&gt;VectorAI DB handles storage and search. You bring your own embedding model. A good default to start with is &lt;code&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/code&gt;, fast, lightweight, and works well for most text use cases.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;sentence-transformers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the full API docs and more examples, check the repo README linked in Discord.&lt;/p&gt;




&lt;h3&gt;
  
  
  Prizes
&lt;/h3&gt;

&lt;p&gt;🥇 1st place team: Claude Max 5x, 3 months per person&lt;/p&gt;

&lt;p&gt;🥈 2nd place team: Claude Max 5x, 1 month per person&lt;/p&gt;

&lt;p&gt;🥉 3rd place team: Claude Pro, 1 month per person&lt;/p&gt;

&lt;p&gt;Teams of up to 4. Solo submissions welcome.&lt;/p&gt;




&lt;h3&gt;
  
  
  How we judge
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use of Actian VectorAI DB (30%):&lt;/strong&gt; Is VectorAI DB doing real work in this app? Does the team know why they used it the way they did?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-world impact (25%):&lt;/strong&gt; Does it solve something people actually care about? Would someone use this?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical execution (25%):&lt;/strong&gt; Does it work? Is the code coherent and the architecture thought through?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo and presentation (20%):&lt;/strong&gt; Can you explain what you built and why it matters?&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  How to submit
&lt;/h3&gt;

&lt;p&gt;All submissions go through DoraHacks. You'll need a public GitHub or GitLab repo with a README, a working demo (video, Loom, or live link), and a short write-up covering what you built, why, and which technical requirement you used.&lt;/p&gt;

&lt;p&gt;Results announced April 20 on Discord.&lt;/p&gt;




&lt;h3&gt;
  
  
  Join us
&lt;/h3&gt;

&lt;p&gt;Register: &lt;a href="https://dorahacks.io/hackathon/2097/detail" rel="noopener noreferrer"&gt;dorahacks.io/hackathon/2097/detail&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Discord for support, team formation, and progress sharing: &lt;a href="https://discord.gg/432A2M63Py" rel="noopener noreferrer"&gt;discord.gg/432A2M63Py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drop a comment if you're in. See you April 13.&lt;/p&gt;

</description>
      <category>hackathon</category>
      <category>vectordatabase</category>
      <category>database</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building Your First AI Agent Without Frameworks</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Fri, 13 Jun 2025 10:50:56 +0000</pubDate>
      <link>https://dev.to/gerimate/building-your-first-ai-agent-without-frameworks-l5p</link>
      <guid>https://dev.to/gerimate/building-your-first-ai-agent-without-frameworks-l5p</guid>
      <description>&lt;p&gt;&lt;strong&gt;Want to understand how AI agents actually work? Let's build one from scratch before jumping into frameworks.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Most AI agent tutorials start with &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; or &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, which are great tools, but they can make it hard to understand what's happening underneath. &lt;/p&gt;

&lt;p&gt;An agent is really just a language model that can call functions. Once you understand that, frameworks make way more sense.&lt;/p&gt;

&lt;p&gt;Today we're building a customer support system using &lt;a href="https://platform.openai.com/docs/api-reference" rel="noopener noreferrer"&gt;OpenAI's API&lt;/a&gt; and Python. This will give you the fundamentals that make any agent framework easier to use and debug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we're building:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A routing system that decides which "specialist" handles each query&lt;/li&gt;
&lt;li&gt;Function-calling agents that can search FAQs and analyze sentiment
&lt;/li&gt;
&lt;li&gt;Simple state management to track conversations&lt;/li&gt;
&lt;li&gt;Logic to escalate to humans when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you'll understand how agents work under the hood, making you much more effective when you do use frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Agent is Just an LLM with Tools
&lt;/h2&gt;

&lt;p&gt;Seriously, that's all there is to it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Language model&lt;/strong&gt; with a specific job&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functions&lt;/strong&gt; it can call &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logic&lt;/strong&gt; to decide when to use them&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else is just orchestration.&lt;/p&gt;

&lt;p&gt;Let's start with the simplest possible agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="c1"&gt;# Set up OpenAI (get your API key from https://platform.openai.com/api-keys)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Create tool descriptions for the model
&lt;/span&gt;        &lt;span class="n"&gt;tool_descriptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;tool_descriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__doc__&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Function &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The input query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Call OpenAI with function calling
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_descriptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Let the model decide when to use tools
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Handle function calls
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
            &lt;span class="n"&gt;arguments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Execute the function
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Regular response if no function call
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Test it out
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_faq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the FAQ database for answers&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;faqs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shipping&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Standard shipping takes 3-5 business days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refunds processed within 5-7 business days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;return&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Returns accepted within 30 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;faqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No FAQ found for that topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Create an FAQ agent
&lt;/span&gt;&lt;span class="n"&gt;faq_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FAQ Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a helpful FAQ assistant. Use the search_faq function to find answers to customer questions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_faq&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Test it
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How long does shipping take?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# FAQ Assistant: Standard shipping takes 3-5 business days
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Done.&lt;/strong&gt; You just built an AI agent. It understands questions, knows when to use its tool, and gives helpful answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding More Specialists
&lt;/h2&gt;

&lt;p&gt;Now let's add agents that handle different stuff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_sentiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Analyze the emotional tone of customer messages&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Simple keyword approach - you could use [Transformers](https://huggingface.co/docs/transformers/index) for a real sentiment model
&lt;/span&gt;    &lt;span class="n"&gt;negative_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;angry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frustrated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terrible&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;urgent_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;immediately&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;asap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;emergency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;urgent_words&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;URGENT: Customer needs immediate attention&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;negative_words&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEGATIVE: Customer is frustrated, handle with care&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NEUTRAL: Standard response appropriate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_escalation_needed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Determine if human escalation is needed&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;escalation_triggers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speak to manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel account&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;legal action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complaint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lawsuit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terrible service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trigger&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;trigger&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;escalation_triggers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ESCALATE: Route to human agent immediately&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTINUE: AI agent can handle this query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Create specialized agents
&lt;/span&gt;&lt;span class="n"&gt;sentiment_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sentiment Analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You analyze customer emotions. Use analyze_sentiment to understand how the customer is feeling.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;analyze_sentiment&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;escalation_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SimpleAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Escalation Manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You decide when customers need human help. Use check_escalation_needed to evaluate queries.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_escalation_needed&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Router: Deciding Who Handles What
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting - we need something to decide which agent handles each message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sentiment_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;escalation_agent&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Decide which agent should handle this query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Save the conversation
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Basic routing - you could make this way smarter
&lt;/span&gt;        &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Check for escalation triggers first
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complaint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lawsuit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="c1"&gt;# Check for emotional language
&lt;/span&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query_lower&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;angry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frustrated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;urgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terrible&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
            &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="c1"&gt;# Default to FAQ for standard questions
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Get response from the right agent
&lt;/span&gt;        &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Save that too
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Routed to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_conversation_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get a summary of the conversation so far&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No conversation yet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Conversation with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; exchanges:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:]):&lt;/span&gt;  &lt;span class="c1"&gt;# Last 2 exchanges
&lt;/span&gt;            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;

&lt;span class="c1"&gt;# Test the complete system
&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Customer Support Agent System ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Test different types of queries
&lt;/span&gt;&lt;span class="n"&gt;test_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How long does shipping take?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m really frustrated with this terrible service!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I want to speak to your manager right now!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s your return policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Conversation Summary:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_conversation_summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making It Smarter: Let the AI Do the Routing
&lt;/h2&gt;

&lt;p&gt;Keyword matching works, but we can do better. Let's use the LLM itself to make routing decisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SmartRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sentiment_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;escalation_agent&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Use AI to decide which agent should handle the query&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;routing_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re routing customer queries to specialists.

        Options:
        - faq: Standard questions about policies, shipping, returns
        - sentiment: Upset or frustrated customers  
        - escalation: Complex complaints or requests for managers

        Customer: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

        Which specialist? Just answer: faq, sentiment, or escalation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;routing_prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Default to FAQ if something weird happens
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Get response from chosen agent
&lt;/span&gt;        &lt;span class="n"&gt;agent_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Smart routed to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Test smart routing
&lt;/span&gt;&lt;span class="n"&gt;smart_router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SmartRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Smart Routing Test ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;smart_test_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My package is late and I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m getting married tomorrow!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Do you accept international credit cards?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is absolutely ridiculous, I want my money back immediately!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can I return something I bought 3 weeks ago?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;smart_test_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;smart_router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;smart_route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding Memory: Making Conversations Actually Work
&lt;/h2&gt;

&lt;p&gt;Real support conversations build on what happened before. Here's how to add memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryAwareRouter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;faq_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sentiment_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;escalation_agent&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resolved_issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Process query with full conversation context&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Save current message
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;now&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="c1"&gt;# Build context summary
&lt;/span&gt;        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;routing_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Previous conversation context:
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Current message: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;

        Which specialist should handle this?
        - faq: Standard questions
        - sentiment: Emotional customers
        - escalation: Complex issues or if already escalated

        Consider the conversation history. Answer: faq, sentiment, or escalation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;routing_prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Update customer context based on routing
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# Get enhanced response with context
&lt;/span&gt;        &lt;span class="n"&gt;agent_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_contextual_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Add to memory
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_choice&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Contextual routing to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Build conversation context summary&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New conversation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Conversation history: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; messages&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer escalated: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Negative sentiment detected: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; times&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="c1"&gt;# Include last few exchanges
&lt;/span&gt;        &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_memory&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_contextual_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get response with conversation context&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Add context to the agent's response
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;agent_name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Customer previously escalated] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Customer has been frustrated multiple times] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="c1"&gt;# Test memory-aware system
&lt;/span&gt;&lt;span class="n"&gt;memory_router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryAwareRouter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Memory-Aware Conversation ===&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conversation_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s your return policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;That&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s not good enough, I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m really frustrated!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I want to speak to someone who can actually help me!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fine, what information do you need for the return?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;conversation_flow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Actually Built
&lt;/h2&gt;

&lt;p&gt;You just created a complete customer support system using basic Python and OpenAI. Here's what you learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fundamentals:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Agents = LLM + functions + routing logic&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Function calling&lt;/strong&gt; lets agents take actions&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Smart routing&lt;/strong&gt; decides who handles what&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;State management&lt;/strong&gt; keeps conversations coherent&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Memory&lt;/strong&gt; makes agents context-aware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You'll understand what frameworks actually do for you&lt;/li&gt;
&lt;li&gt;Easier to debug when things go wrong&lt;/li&gt;
&lt;li&gt;You can customize behavior exactly how you want&lt;/li&gt;
&lt;li&gt;Works with any LLM provider&lt;/li&gt;
&lt;li&gt;Good foundation before learning frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Making It Production Ready
&lt;/h2&gt;

&lt;p&gt;To actually deploy this, you'd need:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The basics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error handling (APIs fail)&lt;/li&gt;
&lt;li&gt;Database for conversation storage&lt;/li&gt;
&lt;li&gt;Rate limiting (prevent abuse)&lt;/li&gt;
&lt;li&gt;Proper logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The nice-to-haves:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real sentiment analysis model&lt;/li&gt;
&lt;li&gt;Integration with your FAQ database&lt;/li&gt;
&lt;li&gt;Actual escalation to humans (&lt;a href="https://api.slack.com/" rel="noopener noreferrer"&gt;Slack API&lt;/a&gt;, email, etc.)&lt;/li&gt;
&lt;li&gt;Analytics on what's working&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When frameworks make sense:&lt;/strong&gt;&lt;br&gt;
Now you understand what &lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, and &lt;a href="https://microsoft.github.io/autogen/" rel="noopener noreferrer"&gt;AutoGen&lt;/a&gt; do - they handle the routing and orchestration you just built manually. They're great when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need complex multi-step workflows&lt;/li&gt;
&lt;li&gt;You want pre-built integrations and tools&lt;/li&gt;
&lt;li&gt;You're working on a team that benefits from standardized patterns&lt;/li&gt;
&lt;li&gt;You need features like human-in-the-loop or advanced state management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is knowing when the abstraction helps versus when you need more control.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;AI agents are organized LLMs with specific jobs and the ability to call functions. The "multi-agent" part is smart routing and state management.&lt;/p&gt;

&lt;p&gt;Understanding these fundamentals makes you better at using any framework because you know what's happening underneath. Start here, then use frameworks when their features solve real problems you're facing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built something cool with this? I'd love to see what you made - drop it in the comments!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>python</category>
    </item>
    <item>
      <title>How to Prevent AI Agents From Breaking in Production</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Fri, 06 Jun 2025 12:21:12 +0000</pubDate>
      <link>https://dev.to/gerimate/how-to-prevent-ai-agents-from-breaking-in-production-24c3</link>
      <guid>https://dev.to/gerimate/how-to-prevent-ai-agents-from-breaking-in-production-24c3</guid>
      <description>&lt;p&gt;Deploying AI agents in production is trickier than most teams expect. What works perfectly in development often becomes a reliability nightmare once real traffic hits.&lt;/p&gt;

&lt;p&gt;After looking at incident reports, some clear patterns emerge. The same few issues keep causing the majority of production failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://hugobowne.substack.com/p/why-ai-agents-fail-in-productionand" rel="noopener noreferrer"&gt;42% of AI agent failures come from hallucinated API calls&lt;/a&gt;&lt;/strong&gt;, and another &lt;strong&gt;&lt;a href="https://www.bankinfosecurity.com/popular-gpus-used-ai-systems-vulnerable-to-memory-leak-flaw-a-24135" rel="noopener noreferrer"&gt;23% are GPU memory leaks&lt;/a&gt;&lt;/strong&gt;. These aren't edge cases - they're systematic problems that need systematic solutions.&lt;/p&gt;

&lt;p&gt;Here's what's actually breaking and how to prevent it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common failure patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Hallucinated API calls
&lt;/h3&gt;

&lt;p&gt;LLMs generate code that looks correct but calls non-existent methods or deprecated endpoints. Traditional validation tools miss this because the code is syntactically valid - it just references APIs that don't exist in your environment.&lt;/p&gt;

&lt;p&gt;Teams often spend significant time debugging what appears to be infrastructure issues when the root cause is the AI making incorrect assumptions about available APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPU memory leaks
&lt;/h3&gt;

&lt;p&gt;A &lt;a href="https://www.bankinfosecurity.com/popular-gpus-used-ai-systems-vulnerable-to-memory-leak-flaw-a-24135" rel="noopener noreferrer"&gt;known vulnerability in AMD, Apple, and Qualcomm GPUs&lt;/a&gt; can cause AI workloads to leak over 180MB per inference cycle. In Kubernetes environments, this can cascade across pods and eventually crash entire nodes.&lt;/p&gt;

&lt;p&gt;Standard monitoring often doesn't catch this until resource exhaustion is already occurring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cascading failures
&lt;/h3&gt;

&lt;p&gt;AI agents are more interconnected than typical microservices. A single malformed operation can stall agent threads for extended periods, and recovery processes often reset accumulated context, leading to broader system failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Insufficient observability
&lt;/h3&gt;

&lt;p&gt;Most teams monitor traditional infrastructure metrics but lack visibility into AI-specific behavior like GPU utilization patterns, token consumption, and model performance degradation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Constrain API generation
&lt;/h3&gt;

&lt;p&gt;Instead of relying on post-generation validation, limit what the LLM can suggest in the first place by providing explicit API context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract what's actually available
&lt;/span&gt;&lt;span class="n"&gt;global_deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_imports&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;codebase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;local_deps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_function_calls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_module&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Tell the LLM what it can actually use
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Available APIs: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;global_deps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Local functions: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;local_deps&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Task: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Teams using dependency-constrained prompting report fewer API hallucinations. The approach is straightforward: if you don't tell the LLM about APIs that don't exist, it's less likely to invent them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implement GPU resource controls
&lt;/h3&gt;

&lt;p&gt;Set explicit resource limits in your container orchestration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4Gi"&lt;/span&gt;
  &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4Gi"&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitor GPU memory usage and restart containers before they crash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nv"&gt;vram_usage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;nvidia-smi &lt;span class="nt"&gt;--query-gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;memory.used &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv,noheader,nounits&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$vram_usage&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; 7500 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;  &lt;span class="c"&gt;# 90% of 8GB&lt;/span&gt;
    kubectl rollout restart deployment/ai-agent
  &lt;span class="k"&gt;fi
  &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;30
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This type of proactive monitoring has reduced OOM crashes in production environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Version AI components as units
&lt;/h3&gt;

&lt;p&gt;AI agents consist of multiple interdependent components: models, vector databases, prompt templates, and configuration. These should be &lt;a href="https://www.dbos.dev/blog/durable-execution-crashproof-ai-agents" rel="noopener noreferrer"&gt;versioned and deployed together&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ai-agent-chart/Chart.yaml&lt;/span&gt;
&lt;span class="na"&gt;dependencies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-model&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.2.3"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vector-db&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.9.1"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prompt-templates&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2.1.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploying the entire bundle as a unit prevents version mismatches that can cause subtle but significant failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add AI-specific monitoring
&lt;/h3&gt;

&lt;p&gt;Traditional APM tools don't capture AI-specific metrics. You need to track GPU utilization, token consumption, and model performance alongside business outcomes. &lt;a href="https://latitude-blog.ghost.io/blog/best-practices-for-llm-observability-in-cicd/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; provides a good foundation for this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ai_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai_inference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt.length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response.length&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inference.duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens.consumed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;count_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correlating these metrics with infrastructure data helps identify when GPU pressure affects response quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build resilient fallback systems
&lt;/h3&gt;

&lt;p&gt;Implement &lt;a href="https://botpress.com/blog/ai-agent-routing" rel="noopener noreferrer"&gt;circuit breakers&lt;/a&gt; for external API calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tenacity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stop_after_attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wait_exponential&lt;/span&gt;

&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;stop_after_attempt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;wait_exponential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;multiplier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_external_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Have a clear escalation path when AI components fail:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ai_with_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ai_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;AIAgentError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rule_based_handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;escalate_to_human&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request escalated to support team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making AI agents production-ready
&lt;/h2&gt;

&lt;p&gt;AI agents in production require the same operational discipline as any other critical system. The difference is that they have unique failure modes that traditional monitoring and deployment practices don't address.&lt;/p&gt;

&lt;p&gt;Teams that succeed treat AI agents as complex distributed systems with proper observability, resource management, and graceful degradation. The ones that struggle try to deploy them like traditional applications.&lt;/p&gt;

&lt;p&gt;The good news is that once you address these systematic issues, AI agents become much more predictable and reliable in production environments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Deploy AI Agents Without Infrastructure Headaches</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Fri, 30 May 2025 11:17:40 +0000</pubDate>
      <link>https://dev.to/gerimate/deploy-ai-agents-without-infrastructure-headaches-4230</link>
      <guid>https://dev.to/gerimate/deploy-ai-agents-without-infrastructure-headaches-4230</guid>
      <description>&lt;p&gt;Platform engineers have a new nightmare: explaining to their CTO why the AI agent deployment that worked perfectly in staging is now burning through $50,000/month in production. The Terraform config looks flawless. The security groups are properly configured. The ECS tasks are healthy. But somehow, the vector database is choking on embeddings, the LLM gateway is routing traffic to the wrong regions, and the workflow orchestration is stuck in an infinite retry loop.&lt;/p&gt;

&lt;p&gt;Traditional IaC tools weren't built for this complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Traditional IaC Can't Handle AI Workloads
&lt;/h2&gt;

&lt;p&gt;When ChatGPT generates your Terraform config, it looks perfect. But deploy it and everything breaks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This looks right but will fail in production&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"ai_agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ai-agent-sg"&lt;/span&gt;

  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0/0"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# ❌ Too permissive&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_ecs_service"&lt;/span&gt; &lt;span class="s2"&gt;"ai_agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ai-agent"&lt;/span&gt;
  &lt;span class="nx"&gt;cluster&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;task_definition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_ecs_task_definition&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ai_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;

  &lt;span class="c1"&gt;# ❌ Missing: vector DB networking, LLM provider configs, &lt;/span&gt;
  &lt;span class="c1"&gt;# retry policies, cost controls, monitoring...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LLMs generating IaC are trained on public examples, not production systems. They miss vector database networking, multi-provider LLM failover, and other complexities that break under real traffic.&lt;/p&gt;

&lt;p&gt;AI agents need &lt;a href="https://www.madrona.com/ai-agent-infrastructure-three-layers-tools-data-orchestration/" rel="noopener noreferrer"&gt;completely different infrastructure&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Traditional Layer:         AI-Specific Layer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Compute (ECS/Lambda)     - Vector Database (Pinecone/Weaviate)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Storage (S3/EBS)         - LLM Gateway (Multi-provider routing)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Database (RDS)           - Workflow Orchestration (Temporal/Prefect)&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Networking (VPC/ALB)     - Model Serving &amp;amp; State Management&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each has its own failure modes and scaling patterns that traditional IaC treats as generic cloud resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pulumi for AI Infrastructure
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.pulumi.com/solutions/ai/" rel="noopener noreferrer"&gt;Pulumi has native AI providers&lt;/a&gt; that treat vector databases and LLM gateways as real infrastructure. The trade-off? Your team needs to learn TypeScript/Python instead of HCL, and you're betting on a smaller ecosystem than Terraform's.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alternative approaches:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom Terraform providers&lt;/strong&gt; - Build your own for AI services (more work, but stays in Terraform)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform + scripts&lt;/strong&gt; - Use Terraform for basic infra, scripts for AI-specific parts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS CDK&lt;/strong&gt; - Good if you're AWS-only
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;pinecone&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@pulumi/pinecone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;temporal&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@pulumi/temporal&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Native vector database support&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vectorIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;pinecone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;knowledge-base&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer-support-kb&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cosine&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;dimension&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;serverless&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Workflow orchestration as code&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiWorkflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;temporal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Namespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai-workflows&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer-support&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;retention&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;7d&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Temporal Handles Complex AI Workflows
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://temporal.io/blog/nine-ways-to-use-temporal-in-your-ai-workflows" rel="noopener noreferrer"&gt;Temporal manages the orchestration&lt;/a&gt; that AI agents need. Downsides: another system to operate, and your team needs to learn workflow concepts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alternatives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prefect&lt;/strong&gt; - Similar to Temporal but more Python-native&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step Functions&lt;/strong&gt; - AWS-native, simpler but less powerful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Jobs&lt;/strong&gt; - If you want to stay close to K8s
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@workflow.defn&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CustomerSupportAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@workflow.run&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Survives infrastructure failures
&lt;/span&gt;        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_activity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;start_to_close_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Automatic retries with backoff
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_activity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;call_llm_with_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;retry_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maximum_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Long-running workflows (hours/days/weeks)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;needs_human_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_condition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;search_attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostOptimizedAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ComponentResource&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Spot instances for training
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;training_cluster&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Cluster&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-training&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;capacity_providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FARGATE_SPOT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Reserved capacity for production
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inference_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ecs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-inference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;desired_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_optimal_capacity&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Security and Operational Considerations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;API Key Management:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use AWS Secrets Manager or Azure Key Vault for LLM API keys&lt;/li&gt;
&lt;li&gt;Rotate keys automatically (most AI providers support this)&lt;/li&gt;
&lt;li&gt;Never put API keys in your IaC code - use secret references&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rollback Strategy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI infrastructure changes can break in subtle ways&lt;/li&gt;
&lt;li&gt;Always test rollbacks in staging first&lt;/li&gt;
&lt;li&gt;Keep vector database backups before schema changes&lt;/li&gt;
&lt;li&gt;Use blue-green deployments for model updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Team Training:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Budget 2-4 weeks for engineers to learn Pulumi + Temporal&lt;/li&gt;
&lt;li&gt;Start with one person, then spread knowledge&lt;/li&gt;
&lt;li&gt;Document your AI infrastructure patterns for the team&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Monitoring That Actually Matters
&lt;/h2&gt;

&lt;p&gt;Regular monitoring misses what's important for AI systems. &lt;a href="https://my.idc.com/getdoc.jsp?containerId=prUS52758624" rel="noopener noreferrer"&gt;AI infrastructure spending hits $223 billion by 2028&lt;/a&gt;, so you need proper observability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiMetrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dashboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai-observability&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;dashboardBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;pulumi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jsonStringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;widgets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;metric&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="c1"&gt;// Traditional metrics&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AWS/ECS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CPUUtilization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AWS/ECS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MemoryUtilization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;

                    &lt;span class="c1"&gt;// AI-specific metrics that actually matter&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/VectorDB&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;QueryLatency&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/LLM&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TokensPerSecond&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/LLM&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ResponseQuality&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/Workflow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CompletionRate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI/Cost&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DollarPerInteraction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI System Health&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Alert on cost spikes&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;costSpike&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetricAlarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ai-cost-spike&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;comparisonOperator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GreaterThanThreshold&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metricName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DollarPerInteraction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Alert if cost per interaction &amp;gt; $0.50&lt;/span&gt;
    &lt;span class="na"&gt;alarmDescription&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AI infrastructure costs spiking&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Teams Are Seeing
&lt;/h2&gt;

&lt;p&gt;People adopting AI-native infrastructure report significant improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10-100x lower costs&lt;/strong&gt; with &lt;a href="https://www.pulumi.com/blog/pinecone-serverless/" rel="noopener noreferrer"&gt;serverless vector databases&lt;/a&gt; vs. provisioned capacity&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.qwak.com/post/llm-cost" rel="noopener noreferrer"&gt;Self-hosted models can cost significantly less&lt;/a&gt; than API-based solutions for high-volume workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies using &lt;a href="https://temporal.io/blog/build-resilient-agentic-ai-with-temporal" rel="noopener noreferrer"&gt;Temporal for AI workflows&lt;/a&gt; report significantly reduced debugging time and improved reliability for long-running AI processes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start here:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check your AI costs&lt;/strong&gt; - How much are you spending compared to self-hosted options?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick one AI workflow&lt;/strong&gt; to rebuild as a test&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try &lt;a href="https://docs.pinecone.io/integrations/pulumi" rel="noopener noreferrer"&gt;Pulumi with Pinecone&lt;/a&gt;&lt;/strong&gt; - deploy a test vector database&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Next month:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Move critical AI workflows to Temporal&lt;/li&gt;
&lt;li&gt;Set up cost monitoring and alerts&lt;/li&gt;
&lt;li&gt;Add AI-specific observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Companies building reliable, cheap AI infrastructure stopped using traditional IaC tools. They switched to AI-native approaches that treat AI workloads properly.&lt;/p&gt;

&lt;p&gt;Your call: Keep fighting with Terraform and burning money, or use patterns that actually work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>infrastructureascode</category>
      <category>terraform</category>
    </item>
    <item>
      <title>AI Deployment: Why Serverless is Perfect (and Terrible)</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Wed, 28 May 2025 10:40:29 +0000</pubDate>
      <link>https://dev.to/gerimate/ai-deployment-why-serverless-is-perfect-and-terrible-4phl</link>
      <guid>https://dev.to/gerimate/ai-deployment-why-serverless-is-perfect-and-terrible-4phl</guid>
      <description>&lt;p&gt;Your AI agent works perfectly in development. You've tested the reasoning chains, the tool integrations are solid, and the responses are exactly what users need. Then you deploy to production and everything breaks.&lt;/p&gt;

&lt;p&gt;The timeout kills your multi-step workflows after 15 minutes. Your bundle exceeds the 250MB limit because you need scikit-learn, pandas, and a vector database client. Cold starts take 6+ seconds while your models load, making real-time interactions impossible.&lt;/p&gt;

&lt;p&gt;Sound familiar? You're not alone. One developer working on an e-commerce recommendation engine discovered that "scikit-learn and pandas libraries increased the size of my deployment package beyond the AWS Lambda package limits." Another found their TensorFlow model loading caused API calls to timeout after 29 seconds.&lt;/p&gt;

&lt;p&gt;Here's the thing: serverless isn't broken for AI. You're just hitting the boundaries of what it was designed for. Traditional serverless platforms were built for quick, stateless web requests—not long-running AI agent workflows that need to maintain context, load large models, and perform complex reasoning chains.&lt;/p&gt;

&lt;p&gt;But before you abandon serverless entirely, understand this: for certain AI workloads, serverless is absolutely perfect. The question isn't whether to use serverless for AI—it's knowing when it works brilliantly and when it fails catastrophically.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Serverless Shines for AI Deployments
&lt;/h2&gt;

&lt;p&gt;Serverless excels in three specific AI scenarios that traditional infrastructure can't match.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unpredictable Traffic Patterns&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI applications often experience extreme traffic variability. Your chatbot gets mentioned in a tweet and suddenly handles 1000x normal load. A content generation API processes 10 requests per hour during quiet periods, then 1000 requests during marketing campaigns.&lt;/p&gt;

&lt;p&gt;Serverless platforms automatically scale from zero to thousands of concurrent executions without configuration. AWS Lambda provides &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html" rel="noopener noreferrer"&gt;1,000 concurrent executions by default&lt;/a&gt;, scaling instantly based on demand. You pay only for actual compute time—not idle servers waiting for the next AI inference request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event-Driven AI Processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many AI workflows fit perfectly into event-driven patterns. Document uploaded → extract text → summarize content. New customer signup → analyze preferences → generate personalized recommendations. Code commit → run AI code review → post feedback.&lt;/p&gt;

&lt;p&gt;These discrete, triggered operations align with serverless strengths. Each event spawns an independent function execution that processes the task and terminates. No need to manage background services or polling mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple Inference Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lightweight AI operations—sentiment analysis, text classification, simple embeddings generation—work excellently in serverless environments. These tasks typically complete within seconds, use manageable dependencies, and don't require complex state management.&lt;/p&gt;

&lt;p&gt;A sentiment analysis API using a pre-trained model can process requests in under 100ms with warm starts, providing excellent user experience while benefiting from serverless cost efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Serverless Reality Check
&lt;/h2&gt;

&lt;p&gt;The problems start when your AI workloads bump against fundamental serverless constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeout Limitations Kill Complex Workflows
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-timeout.html" rel="noopener noreferrer"&gt;AWS Lambda caps execution at &lt;strong&gt;15 minutes maximum&lt;/strong&gt;&lt;/a&gt;. &lt;a href="https://vercel.com/docs/functions/configuring-functions/duration" rel="noopener noreferrer"&gt;Vercel Functions limits vary by plan&lt;/a&gt;: &lt;strong&gt;60 seconds on Hobby, 300 seconds on Pro, 900 seconds on Enterprise&lt;/strong&gt;. &lt;a href="https://developers.cloudflare.com/workers/platform/limits/" rel="noopener noreferrer"&gt;Cloudflare Workers allows unlimited wall-clock time&lt;/a&gt; but restricts &lt;strong&gt;CPU time to 5 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Multi-step AI agent workflows routinely exceed these limits. Consider a research agent that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Searches multiple data sources (2-3 minutes)&lt;/li&gt;
&lt;li&gt;Processes and analyzes findings (3-5 minutes)
&lt;/li&gt;
&lt;li&gt;Generates comprehensive report (5-8 minutes)&lt;/li&gt;
&lt;li&gt;Formats and delivers output (1-2 minutes)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total runtime: 11-18 minutes. This workflow will fail on most serverless platforms or hit timeout limits that kill execution before completion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world example&lt;/strong&gt;: AI agents performing "extract, transform, and load (ETL) jobs and content generation workflows such as creating PDF files or media transcoding require fast, scalable local storage to process large amounts of data quickly"—operations that frequently exceed serverless timeout constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bundle Size Problems Block AI Dependencies
&lt;/h3&gt;

&lt;p&gt;Traditional serverless deployments face &lt;a href="https://stackoverflow.com/questions/54632009/how-to-increase-the-maximum-size-of-the-aws-lambda-deployment-package-requesten" rel="noopener noreferrer"&gt;severe size restrictions&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda ZIP packages&lt;/strong&gt;: 50MB compressed, 250MB uncompressed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel Functions&lt;/strong&gt;: 250MB uncompressed including layers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Workers&lt;/strong&gt;: 3MB free, 10MB paid plans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Popular AI libraries routinely exceed these limits. Scikit-learn, pandas, numpy, and scipy together often surpass 250MB. Add a vector database client like Pinecone or Weaviate, plus an LLM SDK, and you're well beyond platform constraints.&lt;/p&gt;

&lt;p&gt;The introduction of &lt;a href="https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/" rel="noopener noreferrer"&gt;&lt;strong&gt;AWS Lambda container images&lt;/strong&gt;&lt;/a&gt; (up to 10GB) fundamentally changes this landscape, but requires more complex deployment processes and sacrifices some serverless simplicity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cold Start Performance Destroys User Experience
&lt;/h3&gt;

&lt;p&gt;AI workloads suffer dramatically from cold start penalties. &lt;a href="https://www.bounteous.com/insights/improving-your-lambda-coldstart-performance-with-aws-lambda-snapstart/" rel="noopener noreferrer"&gt;Research shows that &lt;strong&gt;99.9% of cold starts take up to 6.99 seconds&lt;/strong&gt;&lt;/a&gt; for Java-based AI applications, while warm starts complete in just 33 milliseconds.&lt;/p&gt;

&lt;p&gt;Loading TensorFlow models can cause initial API calls to timeout after 29 seconds during cold starts, though subsequent warm function calls process images in under one second. This unpredictable performance makes serverless unsuitable for real-time AI interactions where users expect immediate responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cold start penalty compounds with AI complexity&lt;/strong&gt;: larger models, more dependencies, and initialization-heavy frameworks all extend startup times beyond acceptable user experience thresholds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making Serverless Work: Practical Patterns
&lt;/h2&gt;

&lt;p&gt;You can work around serverless limitations with architectural patterns designed for AI workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Workflow Suspension and Resume
&lt;/h3&gt;

&lt;p&gt;Break long-running AI processes into discrete steps with state persistence between invocations. Each step saves progress to external storage, enabling the next function to continue from checkpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 1: Initial Analysis&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analyzeInput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;performAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Save state to Redis/DynamoDB&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; 
    &lt;span class="na"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;analysis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;nextStep&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;generate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Trigger next step&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;triggerNextStep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Step 2: Content Generation  &lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generateContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;loadState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateFromAnalysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workflowId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;finalResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern enables unlimited workflow duration by staying within individual function timeout limits while maintaining progress state.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. External State Management
&lt;/h3&gt;

&lt;p&gt;AI agents require sophisticated state management beyond serverless stateless models. Externalize all persistent data to dedicated storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redis/ElastiCache&lt;/strong&gt;: Conversation context, short-term agent memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL/MongoDB&lt;/strong&gt;: Long-term user preferences, interaction history
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.pinecone.io/" rel="noopener noreferrer"&gt;Vector databases&lt;/a&gt;&lt;/strong&gt;: Embeddings storage for semantic search and RAG
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chatAgent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Load conversation context&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`chat:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Process with context&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Update conversation state&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`chat:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;lastActivity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Container-Based Deployment
&lt;/h3&gt;

&lt;p&gt;Use AWS Lambda container images to eliminate bundle size constraints. Include complete AI frameworks and pre-trained models within container deployments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; public.ecr.aws/lambda/python:3.9&lt;/span&gt;

&lt;span class="c"&gt;# Copy model files during build&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; models/ ${LAMBDA_TASK_ROOT}/models/&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; app.py ${LAMBDA_TASK_ROOT}&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["app.lambda_handler"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Container deployment enables 10GB packages while maintaining serverless operational benefits, though with increased deployment complexity.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Smart Cold Start Mitigation
&lt;/h3&gt;

&lt;p&gt;Implement strategies to minimize cold start impact:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Pre-warming&lt;/strong&gt;: Use scheduled functions to keep models loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Scheduled every 5 minutes&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;keepWarm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modelExists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkModelAvailability&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;modelExists&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;downloadAndCacheModel&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;model ready&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Progressive Response&lt;/strong&gt;: Return immediate acknowledgment, then stream results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiInference&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Immediate response&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateId&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendInitialResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;responseId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Background processing with streaming updates&lt;/span&gt;
  &lt;span class="nf"&gt;processInBackground&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;responseId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;responseId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Platform-Specific Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Lambda: Enterprise-Grade with Complexity Trade-offs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Longest timeouts (15 minutes), container support up to 10GB, mature ecosystem, &lt;a href="https://aws.amazon.com/lambda/provisioned-concurrency/" rel="noopener noreferrer"&gt;Provisioned Concurrency for predictable performance&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Complex AI workflows, enterprise deployments requiring compliance and integration with AWS services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;: Cold start performance, complex configuration for container deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vercel Functions: Developer Experience with Timeout Constraints
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Excellent developer experience, edge distribution, &lt;a href="https://vercel.com/guides/what-can-i-do-about-vercel-serverless-functions-timing-out" rel="noopener noreferrer"&gt;Fluid Compute for extended durations&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Simple AI APIs, content generation workflows, applications prioritizing deployment simplicity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;: Aggressive timeout limits (60 seconds on free tier), bundle size restrictions persist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Workers: Global Edge with Memory Constraints
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;: Global edge distribution, unlimited wall-clock time, &lt;a href="https://developers.cloudflare.com/changelog/2025-03-25-higher-cpu-limits/" rel="noopener noreferrer"&gt;recent CPU limit increases to 5 minutes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Real-time AI inference requiring global distribution, lightweight AI operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;: 128MB memory limit, 10MB maximum bundle size, V8 runtime restrictions.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to Use Serverless for AI
&lt;/h2&gt;

&lt;p&gt;Certain AI workloads fundamentally conflict with serverless constraints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always-On AI Agents&lt;/strong&gt;: Customer service bots, monitoring systems, and agents requiring continuous availability benefit from dedicated infrastructure avoiding cold start penalties.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Heavy Model Inference&lt;/strong&gt;: Large language models requiring substantial memory (8GB+ RAM) or &lt;a href="https://aws.amazon.com/ec2/instance-types/p4/" rel="noopener noreferrer"&gt;specialized hardware (GPUs)&lt;/a&gt; exceed serverless platform capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complex Multi-Agent Systems&lt;/strong&gt;: Workflows requiring persistent communication between multiple AI agents, shared memory, or complex coordination patterns work better with traditional infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-Volume Production Workloads&lt;/strong&gt;: Applications processing thousands of AI requests per minute may find dedicated infrastructure more cost-effective than per-invocation serverless pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Architectures: Best of Both Worlds
&lt;/h2&gt;

&lt;p&gt;Most production AI systems benefit from hybrid approaches combining serverless and traditional infrastructure. &lt;a href="https://aws.amazon.com/step-functions/" rel="noopener noreferrer"&gt;AWS Step Functions&lt;/a&gt; provides excellent orchestration for these patterns:&lt;/p&gt;

&lt;h3&gt;
  
  
  Router Pattern
&lt;/h3&gt;

&lt;p&gt;Use serverless functions as intelligent routers directing requests to appropriate processing infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiRouter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;complexity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;analyzeRequestComplexity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;simple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processServerless&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;queueForContainerProcessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hot/Cold Architecture
&lt;/h3&gt;

&lt;p&gt;Maintain always-on infrastructure for baseline load, serverless for traffic spikes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Containers handle predictable, consistent traffic&lt;/li&gt;
&lt;li&gt;Serverless functions scale for demand peaks&lt;/li&gt;
&lt;li&gt;Cost optimization through usage pattern matching&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Making the Right Choice for Your AI Deployment
&lt;/h2&gt;

&lt;p&gt;Use this decision framework when evaluating serverless for AI workloads:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Serverless When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Execution time consistently under 10 minutes&lt;/li&gt;
&lt;li&gt;Traffic patterns are unpredictable or bursty
&lt;/li&gt;
&lt;li&gt;Dependencies fit within platform bundle limits (or container deployment acceptable)&lt;/li&gt;
&lt;li&gt;Workflow can be broken into discrete steps&lt;/li&gt;
&lt;li&gt;Cold start latency is acceptable for use case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Traditional Infrastructure When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workflows require 15+ minutes execution time&lt;/li&gt;
&lt;li&gt;Always-on availability is critical&lt;/li&gt;
&lt;li&gt;Memory requirements exceed 10GB&lt;/li&gt;
&lt;li&gt;Complex multi-agent coordination needed&lt;/li&gt;
&lt;li&gt;Consistent sub-second response times required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Hybrid When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic patterns combine baseline and spike loads&lt;/li&gt;
&lt;li&gt;Some workflows fit serverless constraints, others don't&lt;/li&gt;
&lt;li&gt;Cost optimization across variable usage patterns is priority&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Serverless isn't universally perfect or terrible for AI deployment—it's contextual. Simple, discrete AI operations work excellently in serverless environments, providing cost efficiency and automatic scaling. Complex, long-running AI agent workflows require architectural adaptations or alternative infrastructure.&lt;/p&gt;

&lt;p&gt;The key is matching your specific AI workload characteristics to platform capabilities rather than forcing incompatible patterns. As serverless platforms continue evolving—container support, extended timeouts, &lt;a href="https://aws.amazon.com/blogs/compute/optimizing-cold-start-performance-of-aws-lambda-using-advanced-priming-strategies-with-snapstart/" rel="noopener noreferrer"&gt;better cold start performance&lt;/a&gt;—the viable use cases for serverless AI will expand.&lt;/p&gt;

&lt;p&gt;Start by auditing your current AI deployment challenges against serverless constraints. If timeout limits, bundle sizes, or cold start performance block your use case, consider hybrid architectures or traditional infrastructure. If your workflows fit serverless patterns, you'll benefit from simplified operations and automatic scaling.&lt;/p&gt;

&lt;p&gt;The serverless AI landscape changes rapidly. What's impossible today may be trivial next year. But right now, success depends on honest assessment of your requirements against current platform realities—not wishful thinking about what serverless should support.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>serverless</category>
      <category>devops</category>
    </item>
    <item>
      <title>5 Developer Pain Points Solved by Internal Developer Platforms</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Fri, 16 May 2025 12:03:56 +0000</pubDate>
      <link>https://dev.to/gerimate/5-developer-pain-points-solved-by-internal-developer-platforms-1bd6</link>
      <guid>https://dev.to/gerimate/5-developer-pain-points-solved-by-internal-developer-platforms-1bd6</guid>
      <description>&lt;p&gt;Ever feel like you spend more time wrestling with tools than actually building stuff? You're not alone.&lt;/p&gt;

&lt;p&gt;According to &lt;a href="https://about.gitlab.com/the-source/platform/devops-teams-want-to-shake-off-diy-toolchains-a-platform-is-the-answer/" rel="noopener noreferrer"&gt;GitLab's research&lt;/a&gt;, developers waste up to 75% of their time just maintaining toolchains rather than coding. Even worse, over 78% of DevOps professionals report wasting between 25-100% of their time keeping their toolchain running.&lt;/p&gt;

&lt;p&gt;Traditional development is like being handed a giant bin of unsorted LEGO bricks and told to build a castle. You spend most of your time digging through the pile looking for the right pieces, and everyone builds differently.&lt;/p&gt;

&lt;p&gt;Platform engineering is like getting those official LEGO kits with sorted pieces, clear instructions, and modular components. You still have creative freedom, but you're not wasting hours hunting for that one specific brick or reinventing foundations that have already been perfected.&lt;/p&gt;

&lt;p&gt;I've spent years documenting developer workflows and watching teams struggle with the same problems over and over. Let's look at five major pain points and how Internal Developer Platforms (IDPs) actually solve them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's an Internal Developer Platform anyway?
&lt;/h2&gt;

&lt;p&gt;Before diving in, a quick definition: an IDP is a &lt;a href="https://spacelift.io/blog/what-is-an-internal-developer-platform" rel="noopener noreferrer"&gt;self-service layer&lt;/a&gt; that sits on top of your infrastructure and tools, abstracting away complexity so developers can focus on building rather than configuring. Think of it as a unified interface for your entire development lifecycle.&lt;/p&gt;

&lt;p&gt;No more jumping between 10+ tools just to deploy a simple feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pain Point #1: Deployment Bottlenecks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;How long does it take your team to get code from commit to production? For most teams, it's days or weeks. &lt;a href="https://shipyard.build/blog/improve-dora-change-lead-time/" rel="noopener noreferrer"&gt;Elite teams deploy in under a day&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The bottleneck isn't usually the code—it's the deployment process itself. When deployments require specialized knowledge or manual steps, everything slows down. If the one person who knows how to deploy is on vacation, you're stuck.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;IDPs provide self-service templates for deployments. Instead of developers needing to understand the underlying infrastructure, they get standardized workflows with the right guardrails.&lt;/p&gt;

&lt;p&gt;With a platform approach, your team can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy without waiting for DevOps/platform teams&lt;/li&gt;
&lt;li&gt;Use templates that enforce best practices&lt;/li&gt;
&lt;li&gt;Automate the entire CI/CD pipeline&lt;/li&gt;
&lt;li&gt;Deploy with a single click or command&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;You don't need a huge budget to implement this. Start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;GitHub Actions&lt;/a&gt; or GitLab CI for automated pipelines&lt;/li&gt;
&lt;li&gt;Docker (used by &lt;a href="https://survey.stackoverflow.co/2024/technology" rel="noopener noreferrer"&gt;59% of professional developers&lt;/a&gt;) for consistent environments&lt;/li&gt;
&lt;li&gt;Standardized deployment scripts checked into your repo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set up templates for your most common deployment types and build from there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pain Point #2: Context Switching Costs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Each interruption costs developers &lt;a href="https://axolo.co/blog/p/cost-context-switching-developer-workflow" rel="noopener noreferrer"&gt;20+ minutes to regain focus&lt;/a&gt;. When developers have to switch between different tasks, tools, and contexts, productivity tanks.&lt;/p&gt;

&lt;p&gt;The math is brutal: for a team of 10 engineers losing 10 minutes per context switch at $72/hour, that's $120 lost per build. With 50 builds per day and 22 working days, you're burning $132,000 monthly in lost productivity.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.cortex.io/report/the-2024-state-of-developer-productivity" rel="noopener noreferrer"&gt;2024 State of Developer Productivity report&lt;/a&gt; found "time spent gathering project context" tied for the biggest productivity leak (26%).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Platform engineering attacks this by creating unified interfaces and standardized workflows. Instead of switching between CI/CD tools, cloud consoles, monitoring dashboards, and ticketing systems, developers get a single interface.&lt;/p&gt;

&lt;p&gt;Implementing an IDP gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One portal for accessing all development resources&lt;/li&gt;
&lt;li&gt;Integrated workflows that reduce tool-switching&lt;/li&gt;
&lt;li&gt;Standardized processes that become muscle memory&lt;/li&gt;
&lt;li&gt;Fewer interruptions due to missing context&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;For smaller teams, you can start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A centralized dashboard linking to your most-used tools&lt;/li&gt;
&lt;li&gt;Consistent CLI tools that work across projects&lt;/li&gt;
&lt;li&gt;Documentation that follows the same structure for all services&lt;/li&gt;
&lt;li&gt;Automating workflows that currently require multiple tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pain Point #3: Environment Inconsistency
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;"It works on my machine" might be the most frustrating phrase in software development. Environment inconsistencies waste countless hours on debugging issues that only appear in specific environments.&lt;/p&gt;

&lt;p&gt;When dev, test, and production environments don't match, you're essentially testing different systems. Problems appear out of nowhere during deployment, and fixing them becomes a painful guessing game.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;IDPs provide standardized environment templates and self-service provisioning. This ensures consistency across all stages of development.&lt;/p&gt;

&lt;p&gt;With a platform approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every environment uses identical configurations&lt;/li&gt;
&lt;li&gt;Developers can spin up environments on-demand&lt;/li&gt;
&lt;li&gt;Configuration changes propagate consistently&lt;/li&gt;
&lt;li&gt;Local development matches production&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Begin with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.docker.com/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt; for containerizing applications&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt; for local development environments&lt;/li&gt;
&lt;li&gt;Environment configuration stored as code&lt;/li&gt;
&lt;li&gt;Automated environment provisioning scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even small teams can implement these practices incrementally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pain Point #4: Cognitive Load from Multiple Tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Most teams juggle 6+ different tools, with 13% managing up to 14 different tools in their development chain. Each tool has its own interface, quirks, and mental model.&lt;/p&gt;

&lt;p&gt;Learning and remembering how to use all these tools creates massive cognitive overhead, especially for new team members.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/resources/articles/software-development/what-is-platform-engineering" rel="noopener noreferrer"&gt;Platform engineering&lt;/a&gt; streamlines development by providing standardized tools and interfaces. IDPs create a single point of entry for developers to access everything they need.&lt;/p&gt;

&lt;p&gt;Implementing a platform approach gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uniform interfaces across different tools&lt;/li&gt;
&lt;li&gt;Standardized workflows that work the same way everywhere&lt;/li&gt;
&lt;li&gt;Simplified onboarding for new team members&lt;/li&gt;
&lt;li&gt;Lower learning curve for daily tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Start by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auditing your current toolchain to identify redundancies&lt;/li&gt;
&lt;li&gt;Creating consistent interfaces for your most-used tools&lt;/li&gt;
&lt;li&gt;Building wrapper scripts that standardize common commands&lt;/li&gt;
&lt;li&gt;Setting up a simple internal portal or wiki that provides single-point access&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pain Point #5: Security &amp;amp; Compliance Overhead
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Security is crucial but often becomes a productivity killer. Manual security reviews, compliance checks, and remediations consume valuable development time and delay deployments.&lt;/p&gt;

&lt;p&gt;When security is bolted on at the end rather than built in from the start, it creates friction and frustration.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;Platform engineering embraces "self-service with guardrails." IDPs build security into workflows rather than tacking it on afterward.&lt;/p&gt;

&lt;p&gt;With a platform approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security scanning happens automatically in pipelines&lt;/li&gt;
&lt;li&gt;Compliance checks run continuously&lt;/li&gt;
&lt;li&gt;Policy enforcement happens transparently&lt;/li&gt;
&lt;li&gt;Developers get instant feedback on security issues&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;Even small teams can implement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-commit hooks for basic security checks&lt;/li&gt;
&lt;li&gt;Automated vulnerability scanning in CI pipelines&lt;/li&gt;
&lt;li&gt;Compliance-as-code using tools like &lt;a href="https://www.openpolicyagent.org/" rel="noopener noreferrer"&gt;OPA&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Security templates for new projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Leveraging What You Already Have
&lt;/h2&gt;

&lt;p&gt;The good news? You probably already have the foundation for platform engineering in place. The trick is connecting these pieces into a cohesive experience:&lt;/p&gt;

&lt;p&gt;Your Git workflow can expand beyond code versioning to include configuration and &lt;a href="https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac" rel="noopener noreferrer"&gt;Infrastructure as Code&lt;/a&gt; specs.&lt;/p&gt;

&lt;p&gt;Those Docker containers you use for local development? With some standardization, they become the basis for consistent environments across your pipeline.&lt;/p&gt;

&lt;p&gt;That CI/CD pipeline you built for testing? It can become the backbone of a self-service deployment platform.&lt;/p&gt;

&lt;p&gt;The key isn't getting new tools—it's connecting what you have in smarter ways. Focus on eliminating the manual steps between these systems first, then build interfaces that make the process seamless.&lt;/p&gt;

&lt;p&gt;What's your team's biggest development pain point? Let me know in the comments!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>cicd</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Streamlining Multi-Tenant Kubernetes: A Practical Implementation Guide for 2025</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Wed, 14 May 2025 14:55:05 +0000</pubDate>
      <link>https://dev.to/gerimate/streamlining-multi-tenant-kubernetes-a-practical-implementation-guide-for-2025-1bin</link>
      <guid>https://dev.to/gerimate/streamlining-multi-tenant-kubernetes-a-practical-implementation-guide-for-2025-1bin</guid>
      <description>&lt;p&gt;Let's face it: running multiple applications on separate clusters is a resource nightmare. If you've got different teams or customers needing isolated environments, you're probably spending way more on infrastructure than you need to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/security/multi-tenancy/" rel="noopener noreferrer"&gt;Multi-tenancy in Kubernetes&lt;/a&gt; offers a solution, but it comes with its own set of challenges. How do you ensure proper isolation? What about resource allocation? And the big one – security?&lt;/p&gt;

&lt;p&gt;This guide provides practical steps for implementing multi-tenant Kubernetes that actually works in production environments. By the end, you'll have a roadmap for consolidating your infrastructure while maintaining isolation where it matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Multi-Tenancy Actually Means in 2025
&lt;/h2&gt;

&lt;p&gt;Multi-tenancy has become a bit of a buzzword, but at its core, it still means the same thing: multiple users sharing the same infrastructure. In Kubernetes, we typically see two flavors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.loft.sh/blog/kubernetes-multi-tenancy-10-essential-considerations" rel="noopener noreferrer"&gt;Multiple teams within an organization&lt;/a&gt;&lt;/strong&gt;: Different departments or projects sharing a cluster, where team members have access through kubectl or GitOps controllers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multiple customer instances&lt;/strong&gt;: SaaS applications running customer workloads on shared infrastructure&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key tradeoffs haven't changed much over the years, either. You're always balancing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt;: Keeping tenants from accessing or messing with each other's resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource efficiency&lt;/strong&gt;: Maximizing hardware utilization and reducing costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational complexity&lt;/strong&gt;: Making sure your team can actually manage this setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What has changed are the tools and patterns. Pure namespace-based isolation is still common, but we've seen a shift toward more sophisticated approaches using hierarchical namespaces, virtual clusters, and service meshes. Let's start with the building blocks you'll need for a practical implementation.&lt;/p&gt;

&lt;p&gt;For more details about how the platform approaches multi-tenancy, check &lt;a href="https://kubernetes.io/docs/concepts/security/multi-tenancy/" rel="noopener noreferrer"&gt;Kubernetes documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Building Blocks: Practical Implementation Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Namespace Configuration That Actually Works
&lt;/h3&gt;

&lt;p&gt;Namespaces are your first line of defense in multi-tenancy. Here's a modern namespace configuration with isolation in mind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;tenant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;baseline&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/audit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted&lt;/span&gt;
    &lt;span class="na"&gt;pod-security.kubernetes.io/warn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted&lt;/span&gt;
    &lt;span class="na"&gt;networking.k8s.io/isolation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;enabled&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does a few key things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a dedicated namespace for the tenant&lt;/li&gt;
&lt;li&gt;Labels it for easier filtering and policy targeting&lt;/li&gt;
&lt;li&gt;Applies Pod Security Standards (the modern replacement for Pod Security Policies)&lt;/li&gt;
&lt;li&gt;Marks it for network isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When organizing namespaces, many teams follow a pattern like &lt;code&gt;{tenant}-{environment}&lt;/code&gt; (e.g., &lt;code&gt;marketing-dev&lt;/code&gt;, &lt;code&gt;marketing-prod&lt;/code&gt;). For SaaS applications, you might use customer IDs or similar identifiers.&lt;/p&gt;

&lt;p&gt;The key thing to remember: namespaces alone aren't enough for true isolation. They're just containers for resources – you need additional controls to enforce boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  RBAC That Actually Isolates Tenants
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.illumio.com/cybersecurity-101/rbac" rel="noopener noreferrer"&gt;Role-Based Access Control (RBAC)&lt;/a&gt; is essential for preventing tenants from accessing each other's resources. Here's a pattern that works well in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Tenant admin role&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-admin&lt;/span&gt;
&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apps"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pods"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;services"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployments"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jobs"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;networking.k8s.io"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingresses"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;apiGroups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configmaps"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secrets"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;verbs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;watch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;update"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patch"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delete"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="c1"&gt;# Binding for tenant admin&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RoleBinding&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-admin-binding&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
&lt;span class="na"&gt;subjects&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-admin&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;span class="na"&gt;roleRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Role&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-admin&lt;/span&gt;
  &lt;span class="na"&gt;apiGroup&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rbac.authorization.k8s.io&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice a few important things here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The role is scoped to a specific namespace (&lt;code&gt;tenant-a&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;It grants permissions for common resources but nothing cluster-wide&lt;/li&gt;
&lt;li&gt;The binding associates a user with this role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is simple but effective: create a set of standard roles for each tenant (admin, developer, viewer), each scoped to the tenant's namespace(s). &lt;/p&gt;

&lt;p&gt;One mistake I see teams make is being too generous with permissions. Start restrictive and loosen gradually as needed – it's much easier than trying to lock things down after a breach.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Policies That Actually Isolate Traffic
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://workos.com/blog/tenant-isolation-in-multi-tenant-systems" rel="noopener noreferrer"&gt;Network isolation&lt;/a&gt; is critical for multi-tenancy. By default, all pods in a Kubernetes cluster can talk to each other – not what you want in a multi-tenant environment.&lt;/p&gt;

&lt;p&gt;Here's a practical network policy that isolates tenant traffic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-isolation&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# Applies to all pods in namespace&lt;/span&gt;
  &lt;span class="na"&gt;policyTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Egress&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;tenant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;tenant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;common-services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This policy does two important things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows ingress traffic only from the same tenant's namespace&lt;/li&gt;
&lt;li&gt;Allows egress traffic only to the same tenant's namespace or to namespaces labeled as common services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second part is particularly important – your tenants probably need access to shared services like monitoring, logging, or databases. By labeling those namespaces as &lt;code&gt;common-services: "true"&lt;/code&gt;, you create controlled exceptions to your isolation rules.&lt;/p&gt;

&lt;p&gt;A common mistake is forgetting about DNS and other cluster services. Make sure your network policies allow access to kube-system services that tenants need, or you'll have some very confusing debugging sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resource Quotas to Prevent Noisy Neighbors
&lt;/h3&gt;

&lt;p&gt;One bad tenant can ruin the party for everyone by consuming all available resources. Resource quotas prevent this "noisy neighbor" problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ResourceQuota&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-quota&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10"&lt;/span&gt;
    &lt;span class="na"&gt;requests.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20Gi&lt;/span&gt;
    &lt;span class="na"&gt;limits.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20"&lt;/span&gt; 
    &lt;span class="na"&gt;limits.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;40Gi&lt;/span&gt;
    &lt;span class="na"&gt;persistentvolumeclaims&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20"&lt;/span&gt;
    &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;30"&lt;/span&gt;
    &lt;span class="na"&gt;count/deployments.apps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;25"&lt;/span&gt;
    &lt;span class="na"&gt;count/statefulsets.apps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This quota sets limits on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU and memory consumption (both requests and limits)&lt;/li&gt;
&lt;li&gt;Number of persistent volume claims (storage)&lt;/li&gt;
&lt;li&gt;Number of services and workloads (deployments, statefulsets)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Setting appropriate quota sizes takes some experimentation. Monitor actual usage patterns and adjust accordingly – too restrictive and legitimate workloads fail, too loose and you're back to the noisy neighbor problem.&lt;/p&gt;

&lt;p&gt;Pro tip: In addition to ResourceQuotas (which operate at namespace level), use LimitRanges to set default and maximum limits for individual containers. This prevents tenants from creating resource-hungry pods that still fit within their overall quota.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Implementation Benefits
&lt;/h2&gt;

&lt;p&gt;Research and industry reports show clear benefits when organizations implement proper multi-tenancy in Kubernetes environments:&lt;/p&gt;

&lt;p&gt;According to documented implementations, organizations typically see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;30-40% reduction in infrastructure costs by consolidating multiple single-tenant clusters&lt;/li&gt;
&lt;li&gt;Significant decrease in time spent on cluster maintenance and updates&lt;/li&gt;
&lt;li&gt;Improved resource utilization, often doubling from around 30-35% to 70% or more&lt;/li&gt;
&lt;li&gt;Better standardization across development teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, implementation isn't without challenges. Common issues include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Resistance from teams concerned about workload security and isolation&lt;/li&gt;
&lt;li&gt;Migration complexity for existing applications&lt;/li&gt;
&lt;li&gt;Learning curve for new multi-tenant tooling and workflows&lt;/li&gt;
&lt;li&gt;Special accommodations needed for resource-intensive or security-sensitive workloads&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This highlights an important point: multi-tenancy isn't all-or-nothing. Many successful implementations use a hybrid approach, keeping some high-security or high-performance workloads on dedicated clusters while consolidating standard workloads in shared environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving the Big Three Challenges
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: Security Vulnerabilities
&lt;/h3&gt;

&lt;p&gt;Cross-tenant data leakage and escalation attacks are the nightmare scenarios in multi-tenant environments. Here's a practical security checklist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Enforce Pod Security Standards&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Namespace&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
     &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;pod-security.kubernetes.io/enforce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted&lt;/span&gt;
       &lt;span class="na"&gt;pod-security.kubernetes.io/enforce-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.29&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "restricted" profile prevents pods from running as privileged, accessing host namespaces, or using dangerous capabilities.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Isolate tenant storage&lt;/strong&gt;:&lt;br&gt;
Use StorageClasses with tenant-specific access controls, or better yet, separate storage backends for sensitive data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implement regular security scanning&lt;/strong&gt;:&lt;br&gt;
Tools like Trivy, Falco, and Kube-bench can identify vulnerabilities in your multi-tenant setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit, audit, audit&lt;/strong&gt;:&lt;br&gt;
Enable audit logging and regularly review access patterns – many breaches are detected through unusual access.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Challenge 2: Resource Contention
&lt;/h3&gt;

&lt;p&gt;Even with resource quotas, you can still run into contention issues. Here are some practical solutions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pod Priority and Preemption&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scheduling.k8s.io/v1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PriorityClass&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-high-priority&lt;/span&gt;
   &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1000000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Assign different priority classes to tenant workloads based on their importance.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Node Anti-Affinity&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;podAntiAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;labelSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
           &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant&lt;/span&gt;
             &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
             &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
             &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
         &lt;span class="na"&gt;topologyKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kubernetes.io/hostname"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents multiple pods from the same tenant being scheduled on the same node, distributing the load.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Quality of Service Classes&lt;/strong&gt;:
Set appropriate QoS classes (Guaranteed, Burstable, BestEffort) for different tenant workloads to influence how they're treated under resource pressure.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Challenge 3: Operational Complexity
&lt;/h3&gt;

&lt;p&gt;Managing dozens or hundreds of tenants manually isn't feasible. Here's how to simplify operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automate tenant provisioning&lt;/strong&gt;:&lt;br&gt;
Create a standardized process for spinning up new tenant namespaces, applying policies, and setting quotas.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use a tenant operator&lt;/strong&gt;:&lt;br&gt;
Tools like &lt;a href="https://projectcapsule.dev/" rel="noopener noreferrer"&gt;Capsule&lt;/a&gt; or the &lt;a href="https://developers.redhat.com/articles/2024/02/14/deep-dive-stakaters-multi-tenant-operator" rel="noopener noreferrer"&gt;Multi-Tenant Operator&lt;/a&gt; can handle tenant lifecycle management, from creation to termination:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;   &lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenancy.stakater.com/v1alpha1&lt;/span&gt;
   &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tenant&lt;/span&gt;
   &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a&lt;/span&gt;
   &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;owners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-admin&lt;/span&gt;
       &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User&lt;/span&gt;
     &lt;span class="na"&gt;namespaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tenant-a-dev&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tenant-a-prod&lt;/span&gt;
     &lt;span class="na"&gt;quota&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;hard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
         &lt;span class="na"&gt;requests.cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;10'&lt;/span&gt;
         &lt;span class="na"&gt;requests.memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20Gi&lt;/span&gt;
     &lt;span class="na"&gt;resourcePooling&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
     &lt;span class="na"&gt;namespacePrefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tenant-a-&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implement tenant-aware monitoring&lt;/strong&gt;:&lt;br&gt;
Tag all metrics and logs with tenant identifiers to simplify debugging and enable tenant-specific dashboards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create self-service capabilities&lt;/strong&gt;:&lt;br&gt;
Build internal tools that let tenants manage their own resources within the constraints you define.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Wrapping Up: Is Multi-Tenancy Right for You?
&lt;/h2&gt;

&lt;p&gt;Multi-tenant Kubernetes isn't a silver bullet, but it can significantly reduce costs and operational overhead when implemented correctly. Here's a quick checklist to decide if it's right for your organization:&lt;/p&gt;

&lt;p&gt;✅ You have multiple teams or customers using similar infrastructure&lt;br&gt;
✅ You're comfortable with the security implications of shared infrastructure&lt;br&gt;
✅ You have the operational maturity to implement and maintain isolation&lt;br&gt;
✅ The cost savings outweigh the increased complexity&lt;/p&gt;

&lt;p&gt;The implementation patterns we've covered – namespace isolation, RBAC, network policies, and resource quotas – provide a solid foundation for most multi-tenant environments. Start small, perhaps with just two teams or customers, and expand as you gain confidence in your isolation mechanisms.&lt;/p&gt;

&lt;p&gt;Remember, you don't have to go all-in on multi-tenancy. Many organizations use a hybrid approach, with shared clusters for most workloads and dedicated clusters for high-security or high-performance applications.&lt;/p&gt;

&lt;p&gt;Whatever approach you choose, make sure your teams understand the boundaries and limitations of your multi-tenant setup. Technical controls are important, but so is user education – a confused tenant can unintentionally cause problems for everyone.&lt;/p&gt;

&lt;p&gt;What's your experience with multi-tenant Kubernetes? Have you implemented any of these patterns, or do you have alternative approaches? Share your thoughts in the comments below.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>multitenancy</category>
    </item>
    <item>
      <title>Goodbye, 2023! dyrector.io’s Annual Recap</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Wed, 20 Dec 2023 11:04:33 +0000</pubDate>
      <link>https://dev.to/dyrectorio/goodbye-2023-dyrectorios-annual-recap-abl</link>
      <guid>https://dev.to/dyrectorio/goodbye-2023-dyrectorios-annual-recap-abl</guid>
      <description>&lt;p&gt;&lt;strong&gt;2023 is coming to an end, which means it's time to revisit what happened with the team and the project of dyrector.io in the past 12 months.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  January – Full Stack Highlighted dyrector.io
&lt;/h2&gt;

&lt;p&gt;After the lengthy Christmas break with a full stomach and a couple extra kilograms the real surprise caught us blind-sided. &lt;strong&gt;&lt;a href="https://thefullstack.network/" rel="noopener noreferrer"&gt;The Full Stack&lt;/a&gt;&lt;/strong&gt; platform featured dyrector.io in its highlights.&lt;/p&gt;

&lt;p&gt;Team-wise the most notable event was our Minus 30 hike in the pleasant January weather, which was a great occasion to have a chat about both technology related and unrelated things, and also to taste some pálinka.&lt;/p&gt;

&lt;h2&gt;
  
  
  February – dyrector.io Alpha Dropped
&lt;/h2&gt;

&lt;p&gt;The first weeks of February were all about attending &lt;strong&gt;&lt;a href="https://fosdem.org/2024/" rel="noopener noreferrer"&gt;FOSDEM&lt;/a&gt;&lt;/strong&gt; and the upcoming launch of dyrector.io on Product Hunt. On the day of the launch we made alpha access available.&lt;/p&gt;

&lt;p&gt;Our &lt;strong&gt;&lt;a href="https://www.producthunt.com/products/dyrector-io-platform#dyrector-io" rel="noopener noreferrer"&gt;Product Hunt launch&lt;/a&gt;&lt;/strong&gt; turned out to be a shot at the buzzer, but we still did nice. With a launch 6 hours into the voting, we reached the #11 spot. The same day we made a new release and a demo video. Busier than planned, but we did good.&lt;/p&gt;

&lt;p&gt;At the conference in Belgium, we were able to catch up with a lot of likeminded people eager to learn about open-source software.&lt;/p&gt;

&lt;p&gt;At the same time, our teammate, Levi showed up in the local cloud meetup scene as organizer and a presenter, too. Another teammate of ours, Nándi was interviewed in the &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=_qFJ5GEs2w4" rel="noopener noreferrer"&gt;podcast&lt;/a&gt;&lt;/strong&gt; series of Uptime Community about DevOps, ChatGPT, and open-source.&lt;/p&gt;

&lt;h2&gt;
  
  
  March – Three (Hundred) Is the Magic Number
&lt;/h2&gt;

&lt;p&gt;We doubled down on catering to a self-hosting audience in the first months of 2023, which helped us reach 300 stars on GitHub on the 3rd of March. We published a bunch of blog posts about self-hosting certain types of applications, which you can find here.&lt;/p&gt;

&lt;p&gt;In March, we published our &lt;strong&gt;&lt;a href="https://github.com/dyrector-io/awesome-infrastructure-questions" rel="noopener noreferrer"&gt;Awesome repository&lt;/a&gt;&lt;/strong&gt; containing infrastructure related questions. We consider it useful when someone is onboarded to a new project maintaining infrastructure.&lt;/p&gt;

&lt;p&gt;Another important event of the month was when &lt;strong&gt;&lt;a href="https://blog.dyrector.io/2023-03-21-docker-hub-registry-alternatives/" rel="noopener noreferrer"&gt;Docker announced the end of Free Teams on Docker Hub&lt;/a&gt;&lt;/strong&gt;. Backlash was inevitable and so was the organization backing out of their plans of monetizing Free Teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  April – Adventures in the UK and Hungary
&lt;/h2&gt;

&lt;p&gt;A portion of our team took a business trip in the UK to visit Hanover Displays at their HQ in Brighton. While Levi and Gopher was there, they paid a visit to the LEGO HQ for a meetup, as well.&lt;/p&gt;

&lt;p&gt;After the trip in the UK, we went out of the office for a few days of team building when we could unwind with the whole team.&lt;/p&gt;

&lt;p&gt;Levi attended KubeCon in Amsterdam, too, which turned out to be the funniest way to reach 420 stars on GitHub on April 20th. Trust me, we didn’t plan this whatsoever.&lt;/p&gt;

&lt;h2&gt;
  
  
  May – 0.4.0 &amp;amp; Roadmap Published
&lt;/h2&gt;

&lt;p&gt;After a Q1 busy with refactoring and making dyrector.io’s code more efficient, we started to make new releases faster. The first step was making 0.4.0, which didn’t deliver any significant changes to functionality, but it was important to accelerate our release cycle in the long run.&lt;/p&gt;

&lt;p&gt;At the same time, we published our &lt;strong&gt;&lt;a href="https://github.com/orgs/dyrector-io/projects/2" rel="noopener noreferrer"&gt;roadmap&lt;/a&gt;&lt;/strong&gt; on GitHub and added new issues to the repository.&lt;/p&gt;

&lt;p&gt;We also made some new friends: ConfigCat reviewed the platform on their &lt;strong&gt;&lt;a href="https://configcat.com/blog/2023/05/16/introducing-dyrectorio-to-configcat-users/" rel="noopener noreferrer"&gt;blog&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  June – Team Building in Croatia &amp;amp; 1000000000 Stars
&lt;/h2&gt;

&lt;p&gt;Release 0.5.0 was a special moment for our team. It was the first version in months that included new features. To celebrate this special moment, we went to Croatia to finish working on the new version and chill at the sunny beach.&lt;/p&gt;

&lt;p&gt;This was the perfect way to kick off our summer. After the trip to Croatia, we were able to consistently release on a bi-weekly basis, shipping new features again and again.&lt;/p&gt;

&lt;p&gt;After 0.5.0 dropped, we passed 512 stars on GitHub, or 1000000000 in binary.&lt;/p&gt;

&lt;h2&gt;
  
  
  July – Automated Deployments With dyrector.io
&lt;/h2&gt;

&lt;p&gt;One of the most significant features we added this year was the auto-deployment capability. The &lt;strong&gt;&lt;a href="https://blog.dyrector.io/2023-07-20-dyrector-io-github-actions-continuous-deployment/" rel="noopener noreferrer"&gt;GitHub Actions compatible feature&lt;/a&gt;&lt;/strong&gt; came out on July 14 in release 0.6.0.&lt;/p&gt;

&lt;p&gt;A very pleasant surprise was when Nevo David mentioned dyrector.io in his &lt;strong&gt;&lt;a href="https://dev.to/github20k/7-open-source-projects-you-should-contribute-to-in-2023-1nph"&gt;blog post&lt;/a&gt;&lt;/strong&gt;, which resulted in increased exposure and interest in the platform. In a few days we gained hundreds of stars on GitHub.&lt;/p&gt;

&lt;p&gt;Even though it was the middle of the summer, we took no breaks. Between publishing new releases full of new features, we went to Lake Balaton to sail and Nándi and Geri even completed the Lake Balaton Cross Swimming.&lt;/p&gt;

&lt;p&gt;At the end of July Levi attended WeAreDevelopers 2023 in Berlin.&lt;/p&gt;

&lt;h2&gt;
  
  
  August – dyrector.io Turns International
&lt;/h2&gt;

&lt;p&gt;The most significant change was an internal change: our teammate, Nándi moved to the Netherlands with his girlfriend. We officially became a remote-first company, while the rest of the team still showed up at the office every day. We had a goodbye party for him where we said farewell with a few cans of his favorite beverages for the road.&lt;/p&gt;

&lt;p&gt;We launched dyrector.io on a new platform called &lt;strong&gt;&lt;a href="https://devhunt.org/tool/dyrectorio" rel="noopener noreferrer"&gt;Dev Hunt&lt;/a&gt;&lt;/strong&gt;, which is an open-source Product Hunt alternative. With the help of our community, we were able to reach the #1 spot and the Developer Tool of the Week title that comes with it.&lt;/p&gt;

&lt;p&gt;In other cloud-related news HashiCorp &lt;strong&gt;&lt;a href="https://www.hashicorp.com/blog/hashicorp-adopts-business-source-license" rel="noopener noreferrer"&gt;announced&lt;/a&gt;&lt;/strong&gt; they're changing their products' license, including Terraform’s, to Business Source License, which sparked the foundation of OpenTF, which later was named OpenTofu.&lt;/p&gt;

&lt;h2&gt;
  
  
  September – Product Hunt Launch #2
&lt;/h2&gt;

&lt;p&gt;The majority of August was spent on preparations for our &lt;strong&gt;&lt;a href="https://www.producthunt.com/products/dyrector-io-platform#dyrector-io-3" rel="noopener noreferrer"&gt;Product Hunt launch&lt;/a&gt;&lt;/strong&gt; in September. The date was set – September 8th. We knew a product like ours only has a chance of a significant result on a Friday.&lt;/p&gt;

&lt;p&gt;The result: #6 in the daily rankings, top 50 in the weekly with around 260 votes. Definitely an impressive result with a heavily developer-focused tool.&lt;/p&gt;

&lt;p&gt;In the meantime, Levi took care of networking: he appeared in the &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=ZU6ql5Wlcs0" rel="noopener noreferrer"&gt;Follow The Pattern&lt;/a&gt;&lt;/strong&gt; podcast, attended InfoBip’s Shift conference in Croatia, and went to Kubernetes Community Days in Vienna.&lt;/p&gt;

&lt;h2&gt;
  
  
  October – darklens Enters the Scene
&lt;/h2&gt;

&lt;p&gt;The biggest achievement of October in our household was a one-week sprint when more than half of the team was on vacation. Three teammates of ours joined forces, two developers and one marketer, to develop a complimentary product to dyrector.io.&lt;/p&gt;

&lt;p&gt;We named this tool &lt;strong&gt;&lt;a href="https://github.com/dyrector-io/darklens" rel="noopener noreferrer"&gt;darklens&lt;/a&gt;&lt;/strong&gt;, which makes Docker logs and container settings available in your browser. A week after the sprint we launched darklens on Product Hunt for an impressive #14 spot with 140 upvotes.&lt;/p&gt;

&lt;h2&gt;
  
  
  November – Team Building in Portugal
&lt;/h2&gt;

&lt;p&gt;Over the summer, the whole team was able to snag developer tickets to Web Summit in Lisbon. Soon as we got the confirmation, we started planning our travel to Portugal. With a little sightseeing and networking at the conference, the week we spent in Lisbon turned out to be a blast. We made a lot of new connections.&lt;/p&gt;

&lt;p&gt;One of the coolest things of the year was when people found the invitation card for our CTF puzzle and came to our Discord channel or stopped by to say hi at Web Summit.&lt;/p&gt;

&lt;h2&gt;
  
  
  December – 0.10.0. Dropped
&lt;/h2&gt;

&lt;p&gt;The latest release of dyrector.io, 0.10.0 dropped in early December. You can find out more about it on &lt;strong&gt;&lt;a href="https://github.com/dyrector-io/dyrectorio/releases/tag/0.10.0" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s it for 2023. So long, and thanks for all the fish!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This blogpost was written by the team of &lt;a href="https://dyrectorio.com" rel="noopener noreferrer"&gt;dyrector.io&lt;/a&gt;. dyrector.io is an open-source continuous delivery &amp;amp; deployment platform with version management.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support us with a star on &lt;a href="https://github.com/dyrector-io/dyrectorio/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>startup</category>
    </item>
    <item>
      <title>5 Use Cases When Containerization Is Absolutely Useless for You</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Thu, 30 Nov 2023 14:19:59 +0000</pubDate>
      <link>https://dev.to/dyrectorio/5-use-cases-when-containerization-is-absolutely-useless-for-you-c3p</link>
      <guid>https://dev.to/dyrectorio/5-use-cases-when-containerization-is-absolutely-useless-for-you-c3p</guid>
      <description>&lt;h2&gt;
  
  
  #1 Static, Unchanging Environments
&lt;/h2&gt;

&lt;p&gt;If your application has minimal dependencies and operates consistently across different environments without the need for isolation, containerization may offer little benefit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your application will be the only process executed on the machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #2 Limited Scalability Needs
&lt;/h2&gt;

&lt;p&gt;For applications with predictable and steady workloads that do not require rapid scaling or dynamic resource allocation, the overhead of containerization might outweigh the advantages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small scale IoT apps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #3 Simple, Standalone Applications
&lt;/h2&gt;

&lt;p&gt;In cases where your application is straightforward, lacks dependencies, and isn't part of a larger ecosystem with varied technologies, containerization may introduce unnecessary complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero dependency binaries, and also debugging a host process is more straightforward than doing the same with a container.&lt;/li&gt;
&lt;li&gt;Offline applications installed from external medium, running without internet connection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #4 Resource-Constrained Environments
&lt;/h2&gt;

&lt;p&gt;On systems with extremely limited resources, such as embedded devices or constrained hardware, the overhead of running containerization platforms might not be justified.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Microelectronics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  #5 Desktop Applications
&lt;/h2&gt;

&lt;p&gt;Sounds exotic, huh? For a good reason. It would be very unusual to use containers for desktop applications. Though similar isolation techniques exist, it is not widespread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cs_16_nosteam_portable.exe😅&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  If You Really Need to Containerize...
&lt;/h2&gt;

&lt;p&gt;You can use dyrector.io to deploy and manage containerized services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⭐ Star dyrector.io on GitHub:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/dyrector-io/dyrectorio" rel="noopener noreferrer"&gt;https://github.com/dyrector-io/dyrectorio&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>containers</category>
      <category>docker</category>
      <category>kubernetes</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Dagger 101: How to Get Started with Containerized CI Workflows</title>
      <dc:creator>Greg Mate</dc:creator>
      <pubDate>Thu, 23 Nov 2023 11:04:26 +0000</pubDate>
      <link>https://dev.to/dyrectorio/dagger-101-how-to-get-started-with-containerized-ci-workflows-105m</link>
      <guid>https://dev.to/dyrectorio/dagger-101-how-to-get-started-with-containerized-ci-workflows-105m</guid>
      <description>&lt;p&gt;&lt;strong&gt;Continuous Integration and Continuous Delivery are the secret sauces of shipping new features consistently and reliably to your software. However, the effectiveness of this process is closely tied to the tooling that orchestrates it. Some of the pain points of CI/CD systems are slow feedback loops, vendor lock-in, lack of abstraction, limited composability, or YAML itself. This is where Dagger comes into the spotlight, promising a more unified and accelerated path.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The development and deployment process at dyrector.io has already become much faster each year as we adopt and integrate better tools and methods. However, we aim to further unify and accelerate this. Dagger philosophy aligns with what we consider crucial for a truly rapid and seamless process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local testing: Enable developers to test their code instantly, locally&lt;/li&gt;
&lt;li&gt;Programmable CI: Replace messy YAML-based, complex CI with code&lt;/li&gt;
&lt;li&gt;Compatibility: If it runs in a container, you can add it to your pipeline&lt;/li&gt;
&lt;li&gt;Portability: The same pipeline can run on your local machine, a CI runner, a dedicated server, or any container hosting service&lt;/li&gt;
&lt;li&gt;Universal caching: Every operation is cached by default, and caching works the same everywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Currently, we have the option to use our own dyrector.io (we’ll refer to it as dyo many times in this blog post) go CLI with our commands or Docker Compose with its YAML to spin up our stack for local testing, while we also maintain a GitHub Actions workflow for running end-to-end tests on GitHub. This setup lacks coherence, as we cannot employ the specialized GitHub Actions workflow YAML in a local setting or with a different CI/CD environment.&lt;/p&gt;

&lt;p&gt;We want to get closer to being able to ship every single day, or even multiple times a day, as quickly as we possibly can, using the same tool running locally and in CI. Dagger feels like an actual innovation in CI/CD, and it seems it will enable us to do that. There is also a strong focus on getting feedback from the community and utilizing it when we’re designing and building something that people really need.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up Dagger CI/CD
&lt;/h2&gt;

&lt;p&gt;We would like to use Dagger locally with the dyo Go CLI, and for this we need the Dagger Go SDK for integration (there are many Dagger SDKs) and the Dagger Engine, which will run our pipelines. We developed a small proof of concept (POC) to evaluate if we could use our entire stack locally with Dagger. If this POC will be successful, we plan to use the same setup in our GitHub workflow, essentially using GitHub Actions just to trigger the Dagger pipeline.&lt;/p&gt;

&lt;p&gt;Steps to set up Dagger for our project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Dagger Go SDK
(again, you can use any other Dagger SDK for your project, but we use Go)
Go to your existing project – in our case it is dyrectorio.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ go get dagger.io/dagger
$ go mod tidy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Add local Dagger test to our Makefile
It is for simple and fast “make test” (similarly to our other commands).
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Shortcut for local testing
.PHONY: test
test:
    go run golang/cmd/dagger/main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Create Dagger main.go&lt;br&gt;
We already have dyo, dagent and crane in our golang/cmd, so put dagger here too.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Import Dagger SDK&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Create a Dagger client using the SDK&lt;br&gt;
This will allow you to interact with the Dagger Engine and create pipelines.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create Dagger pipelines&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Additional note:&lt;br&gt;
We can also install the Dagger CLI if we want to, but this is an optional tool to interact with the Dagger Engine from the command-line – it has a nice terminal UI though, with parallel progress bars that are visually impressive if you are into that sort of thing.&lt;/p&gt;

&lt;p&gt;Install the Dagger CLI&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ cd /usr/local
$ curl -L https://dl.dagger.io/dagger/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Workflow Integration
&lt;/h2&gt;

&lt;p&gt;As you will see, the “Dagger way” is a very “Docker-ish” way - no surprise, one of the co-founders of Dagger is Solomon Hykes, earlier founder and technical director of Docker.&lt;/p&gt;

&lt;p&gt;To show you concrete code examples from our POC:&lt;/p&gt;

&lt;p&gt;Import Dagger SDK&lt;br&gt;
In our main.go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import (
    "context"
    "dagger.io/dagger"
    …)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a Dagger client using the SDK&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func initDaggerClient(ctx context.Context) *dagger.Client {
    client, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stdout))
    if err != nil {
        panic(err)
    }
    return client
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we can call this initDaggerClient() function in our main() like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    ctx := context.Background()
    client := initDaggerClient(ctx)
    defer client.Close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run unit tests on our NestJS-based Crux backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func runCruxUnitTestPipeline(ctx context.Context, client *dagger.Client) {
    log.Info().Msg("Run crux unit test pipeline...")

    _, err := client.Container().From("node:20-alpine").
        WithDirectory("/src", client.Host().Directory("web/crux/"), dagger.ContainerWithDirectoryOpts{
            Exclude: []string{"node_modules"},
        }).
        WithWorkdir("/src").
        WithExec([]string{"npm", "ci"}).
        WithExec([]string{"npm", "run", "test"}).
        Stdout(ctx)
    if err != nil {
        panic(err)
    }

    log.Info().Msg("Crux unit test pipeline done.")
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can call this runCruxUnitTestPipeline() function in our main():&lt;br&gt;
    runCruxUnitTestPipeline(ctx, client)&lt;/p&gt;

&lt;p&gt;Run unit tests on our Next.js-based Crux UI frontend is very similar to the above code, we only need to change the host directory to “web/crux-ui/” and an additional “.next” exclusion, everything else remains the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    WithDirectory("/src", client.Host().Directory("web/crux-ui/"), dagger.ContainerWithDirectoryOpts{
        Exclude: []string{"node_modules", ".next"},
    }).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A slightly more advanced example when we run our Crux backend in production mode (as we do for e2e test) with a connected PostgreSQL DB service container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func getEnv(envPath string) map[string]string {
    cruxEnv, err := godotenv.Read(envPath)
    if err != nil {
        panic(err)
    }
    return cruxEnv
}

func getCruxPostgres(client *dagger.Client, cruxEnv map[string]string) *dagger.Container {
    databaseURL := cruxEnv["DATABASE_URL"]
    parsedURL, err := url.Parse(databaseURL)
    if err != nil {
        panic(err)
    }
    postgresUsername := parsedURL.User.Username()
    postgresPassword, _ := parsedURL.User.Password()
    postgresDB := strings.TrimPrefix(parsedURL.Path, "/")

    dataCache := client.CacheVolume("data")

    cruxPostgres := client.Pipeline("crux-postgres").Container().From("postgres:14.2-alpine").
        WithMountedCache("/data", dataCache).
        WithEnvVariable("POSTGRES_USER", postgresUsername).
        WithEnvVariable("POSTGRES_PASSWORD", postgresPassword).
        WithEnvVariable("POSTGRES_DB", postgresDB).
        WithEnvVariable("PGDATA", "/data/postgres").
        WithExposedPort(5432)

    return cruxPostgres
}

func runCruxProd(ctx context.Context, client *dagger.Client, cruxPostgres *dagger.Container) *dagger.Container {
    crux := client.Pipeline("crux").Container().From("node:20-alpine")
    crux = crux.
        WithDirectory("/src", client.Host().Directory("web/crux/"), dagger.ContainerWithDirectoryOpts{
            Exclude: []string{"node_modules"},
        }).
        WithWorkdir("/src").
        WithServiceBinding("localhost", cruxPostgres).
        // WithEnvVariable("NOCACHE", time.Now().String()).
        WithExec([]string{"npm", "ci"}).
        WithExec([]string{"npm", "run", "build"}).
        WithExec([]string{"npm", "run", "prisma:migrate"}).
        WithExec([]string{"npm", "run", "start:prod"})

    _, err := crux.Stdout(ctx)
    if err != nil {
        panic(err)
    }

    return crux
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can run the above code in our main() like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    cruxEnv := getEnv("web/crux/.env") 
    cruxPostgres := getCruxPostgres(client, cruxEnv) 
    runCruxProd(ctx, client, cruxPostgres) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We would like to note that we made our POC with Dagger 0.8.x during September, so the code snippets above will show that. But even then the new API development of Dagger Services v2 (which we will need for our complex e2e pipeline) was in progress at Dagger in a separate feature branch and they promised on their Discord forum back then that this new API with some breaking changes will be included in Dagger 0.9. It wasn’t just us showing demand for parallel long running service containers - and they kept their word and it is indeed included in Dagger 0.9.0 released at the end of October. Shouts to Team Dagger!&lt;/p&gt;

&lt;p&gt;We put our POC on hold in October, but we have been keeping an eye on Service v2 developments and news. We will try out Service v2 in the near future and dedicate another blog post to whether we managed to solve our entire e2e pipeline with Dagger.&lt;/p&gt;

&lt;p&gt;Dagger efficiently caches each step of the pipelines, automatically handling the caching of source code copies, containers and builds, and when developers configure it programmatically, it also caches mounted volumes such as database data, node_modules, and Go build-cache. Our logs provide clear examples of this on reruns without code modifications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    copy web/crux/ CACHED
    &amp;gt; in host.directory web/crux/
    …
    pull docker.io/library/postgres:14.2-alpine CACHED
    &amp;gt; in crux-postgres &amp;gt; from postgres:14.2-alpine
    &amp;gt; in crux &amp;gt; service bvqf991cmob5i.97ul8ph8qf1qc.dagger.local
    …
    exec docker-entrypoint.sh postgres
    &amp;gt; in crux &amp;gt; service bvqf991cmob5i.97ul8ph8qf1qc.dagger.local
    [0.15s] PostgreSQL Database directory appears to contain a database; Skipping initialization
    …
    [0.30s] 2023-11-08 10::11.131 UTC [15] LOG:  database system is ready to accept connections
    ...
    exec docker-entrypoint.sh npm run build CACHED
    &amp;gt; in crux
    exec docker-entrypoint.sh npm run prisma:migrate CACHED
    &amp;gt; in crux
    exec docker-entrypoint.sh npm ci CACHED
    &amp;gt; in crux
    copy / /src CACHED
    &amp;gt; in crux
    exec docker-entrypoint.sh npm run start:prod
    &amp;gt; in crux
    [0.57s] &amp;gt; crux@0.7.0 start:prod
    [0.57s] &amp;gt; node dist/main
    [2.31s] [Nest] 33  - 11/07/2023, 14:24:13.142 AM     LOG [NestFactory] Starting Nest application...
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Challenges and Lessons Learned
&lt;/h2&gt;

&lt;p&gt;We were able to run most of our stack with Dagger 0.8.x, the Crux backend and the Crux-UI frontend separately, but our entire e2e test will require Dagger 0.9.x with the Services v2 API that we can run Crux, Crux-ui, Traefik and Kratos as long running service containers for the Playwright e2e container. &lt;/p&gt;

&lt;p&gt;If you want to know more about the Services v2, Dagger wrote a blog post about it here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dagger 0.9: Host-to-container, container-to-host, and other networking improvements:&lt;/strong&gt; &lt;a href="https://dagger.io/blog/dagger-0-9" rel="noopener noreferrer"&gt;https://dagger.io/blog/dagger-0-9&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices for Dagger CI/CD
&lt;/h2&gt;

&lt;p&gt;The fact that we can write the CI/CD code in Go and in a docker-like style had a refreshing effect on us. Here are some general tips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Iterate small:&lt;/strong&gt; Start with a small POC to understand how Dagger fits into your workflow before scaling up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community engagement:&lt;/strong&gt; Stay active in Dagger's community forums or Discord channels for support and to keep up with the latest developments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation:&lt;/strong&gt; Keep your Dagger configurations well-documented to ease onboarding and maintenance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor and optimize:&lt;/strong&gt; Regularly review the performance of your pipelines and optimize caching strategies for better efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We have seen firsthand the transformative nature of Dagger and the flexibility of its programmable pipelines. It stands out as a forward-thinking solution, addressing typical CI/CD bottlenecks with a developer-centric approach. Since Dagger is relatively new and evolving, keeping an eye on updates and community feedback can help in adopting best practices as they emerge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dagger Resources
&lt;/h2&gt;

&lt;p&gt;There's still lot to learn about Dagger, so it might be worth the time to check out the following resources to learn about this tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can explore further on Dagger's official website: &lt;strong&gt;&lt;a href="https://dagger.io" rel="noopener noreferrer"&gt;https://dagger.io&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;For those eager to dive deeper into Dagger's capabilities, the Dagger documentation is an excellent resource: &lt;strong&gt;&lt;a href="https://docs.dagger.io" rel="noopener noreferrer"&gt;https://docs.dagger.io&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;For absolute hackers: &lt;strong&gt;&lt;a href="https://github.com/dagger/dagger" rel="noopener noreferrer"&gt;https://github.com/dagger/dagger&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Dagger Discord community: &lt;strong&gt;&lt;a href="https://discord.gg/dagger-io" rel="noopener noreferrer"&gt;https://discord.gg/dagger-io&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This blogpost was written by the team of &lt;a href="https://dyrectorio.com" rel="noopener noreferrer"&gt;dyrector.io&lt;/a&gt;. dyrector.io is an open-source continuous delivery &amp;amp; deployment platform with version management.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Support us with a star on &lt;a href="https://github.com/dyrector-io/dyrectorio/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>dagger</category>
      <category>cicd</category>
      <category>containers</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
