<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DevOps Start</title>
    <description>The latest articles on DEV Community by DevOps Start (@devopsstart).</description>
    <link>https://dev.to/devopsstart</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3862044%2F9672d1b5-f8fd-4473-998f-30a47c07608f.png</url>
      <title>DEV Community: DevOps Start</title>
      <link>https://dev.to/devopsstart</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devopsstart"/>
    <language>en</language>
    <item>
      <title>Mitigate CVE-2026-20245: Cisco SD-WAN Manager Exploit Guide</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Fri, 26 Jun 2026 13:02:06 +0000</pubDate>
      <link>https://dev.to/devopsstart/mitigate-cve-2026-20245-cisco-sd-wan-manager-exploit-guide-18h2</link>
      <guid>https://dev.to/devopsstart/mitigate-cve-2026-20245-cisco-sd-wan-manager-exploit-guide-18h2</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published on devopsstart.com. It provides a step-by-step guide to detect, block, and patch the critical CVE-2026-20245 remote code execution vulnerability in Cisco SD-WAN Manager.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A single unauthenticated API request can give an attacker full control over your Cisco SD-WAN Manager (vManage). If you are running a vulnerable version, you need to act within hours, not days. This guide explains the CVE-2026-20245 remote code execution bug, shows how to detect it, gives a no-patch workaround, and walks through the official fix. You will leave with a repeatable checklist that secures your SD-WAN control plane.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem
&lt;/h2&gt;

&lt;p&gt;CVE-2026-20245 is a critical (CVSS 9.8) remote code execution vulnerability in Cisco SD-WAN Manager. The flaw exists in the REST API endpoints that handle device registration and configuration push. An unauthenticated attacker can send specially crafted HTTP requests to the vManage server and execute arbitrary commands with root privileges. This means full device compromise: the attacker can pivot to managed routers, steal configuration secrets, and disrupt WAN traffic. The exploit does not require any prior access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Root Causes
&lt;/h2&gt;

&lt;p&gt;Three conditions make this exploit possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unpatched software version.&lt;/strong&gt; Affected releases include Cisco SD-WAN Manager versions 20.3.x before 20.3.4, 20.6.x before 20.6.2, and 20.9.x before 20.9.1. If you are running one of these, the vulnerable code path is present by default.&lt;/p&gt;
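
&lt;p&gt;The version ranges above can be checked mechanically once you have the output of &lt;code&gt;show version&lt;/code&gt;. The following is a minimal Python sketch, not an official Cisco tool: the names &lt;code&gt;FIXED_RELEASES&lt;/code&gt; and &lt;code&gt;is_affected&lt;/code&gt; are hypothetical, and the table only encodes the three release trains listed in this article.&lt;/p&gt;

```python
from typing import Optional, Tuple

# First fixed release per train, copied from the list in this article.
FIXED_RELEASES = {
    (20, 3): (20, 3, 4),
    (20, 6): (20, 6, 2),
    (20, 9): (20, 9, 1),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '20.6.1' into (20, 6, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version: str) -> Optional[bool]:
    """Return True if affected, False if fixed, None if the train is not listed."""
    parsed = parse_version(version)
    fixed = FIXED_RELEASES.get(parsed[:2])
    if fixed is None:
        return None  # train not covered here; consult the vendor advisory
    return fixed > parsed  # anything older than the first fixed release is affected


if __name__ == "__main__":
    for candidate in ("20.3.3", "20.6.2", "20.9.0", "20.12.1"):
        print(candidate, is_affected(candidate))
```

&lt;p&gt;A result of &lt;code&gt;None&lt;/code&gt; means the release train is not in the table, so the sketch deliberately refuses to guess.&lt;/p&gt;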

&lt;p&gt;&lt;strong&gt;Management interface exposed to untrusted networks.&lt;/strong&gt; The API endpoint &lt;code&gt;/dataservice/device/config&lt;/code&gt; is reachable on TCP port 443. Many organisations place vManage on a flat management network or expose the web UI to the internet for remote access. A misconfigured firewall or lack of ACLs lets attackers reach the vulnerable service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing authentication enforcement for certain API calls.&lt;/strong&gt; Some API routes were not properly guarded. The &lt;code&gt;/dataservice/device/config&lt;/code&gt; endpoint expects a token, but prior to the patch the token validation was skipped for requests that included a specific Content-Type header. A simple &lt;code&gt;curl&lt;/code&gt; command bypasses authentication entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;Follow these steps to mitigate and then patch the vulnerability. The workaround should be applied immediately; schedule the patch window as soon as possible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 – Immediate Workaround (Block the Vulnerable Endpoint)
&lt;/h3&gt;

&lt;p&gt;Do not rely on vManage's own CLI for this; it does not use a standard IOS-style configuration system. Instead, filter at the network layer. On the firewall or router that sits in front of vManage, add an access control entry that drops all traffic from untrusted sources to the vManage IP on port 443. For more precise blocking, configure deep packet inspection to drop HTTP requests with a URI containing &lt;code&gt;/dataservice/device/config&lt;/code&gt;. If your firewall does not support L7 inspection, an iptables rule on the vManage Linux host is a fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ssh admin@vmanage-ip
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-A&lt;/span&gt; INPUT &lt;span class="nt"&gt;-p&lt;/span&gt; tcp &lt;span class="nt"&gt;--dport&lt;/span&gt; 443 &lt;span class="nt"&gt;-m&lt;/span&gt; string &lt;span class="nt"&gt;--string&lt;/span&gt; &lt;span class="s2"&gt;"/dataservice/device/config"&lt;/span&gt; &lt;span class="nt"&gt;--algo&lt;/span&gt; bm &lt;span class="nt"&gt;-j&lt;/span&gt; DROP
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables-save | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/iptables/rules.v4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



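&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; This iptables rule is meant to drop only requests targeting that specific path, leaving all other API calls and web GUI access functional. Keep in mind that the &lt;code&gt;string&lt;/code&gt; match inspects raw packet payloads, and traffic on port 443 is TLS-encrypted, so the rule can only see the URI if TLS is terminated before packets reach this chain; otherwise, rely on the source-based block on the upstream firewall. Verify the rule is active: &lt;code&gt;sudo iptables -L INPUT -v -n | grep DROP&lt;/code&gt;.&lt;/p&gt;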

&lt;p&gt;If you cannot apply network-layer filtering immediately, disable all REST API services via the vManage GUI: &lt;em&gt;Administration &amp;gt; Settings &amp;gt; REST API &amp;gt; Disable&lt;/em&gt;. This breaks automation workflows, but it is safer than leaving the endpoint open.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 – Apply the Official Patch
&lt;/h3&gt;

&lt;p&gt;Download the patched version from the &lt;a href="https://software.cisco.com/download/home/286310553" rel="noopener noreferrer"&gt;Cisco Software Download Center&lt;/a&gt;. The recommended versions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;20.3.4 (or later)&lt;/li&gt;
&lt;li&gt;20.6.2 (or later)&lt;/li&gt;
&lt;li&gt;20.9.1 (or later)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Procedure:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Back up the current configuration using the GUI: &lt;em&gt;Administration &amp;gt; Maintenance &amp;gt; Backup/Restore&lt;/em&gt;. Download a full backup.&lt;/li&gt;
&lt;li&gt;Upgrade the SD-WAN Manager first, then the controllers (vBond, vSmart). Use the GUI &lt;em&gt;Administration &amp;gt; Software Upgrade&lt;/em&gt; or the CLI:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;request software &lt;span class="nb"&gt;install&lt;/span&gt; &amp;lt;filename&amp;gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;request software activate
&lt;span class="nv"&gt;$ &lt;/span&gt;reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;After reboot, verify the version:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;show version
Cisco SD-WAN Manager 20.9.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3 – Post-Mitigation Verification
&lt;/h3&gt;

&lt;p&gt;Test that the exploit no longer works. From a separate host, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl &lt;span class="nt"&gt;-k&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://&amp;lt;vmanage-ip&amp;gt;/dataservice/device/config"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/xml"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;config&amp;gt;&amp;lt;device-ip&amp;gt;10.0.0.1&amp;lt;/device-ip&amp;gt;&amp;lt;/config&amp;gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the patch is applied, you get a &lt;code&gt;403 Forbidden&lt;/code&gt; or &lt;code&gt;401 Unauthorized&lt;/code&gt; response. A successful &lt;code&gt;200&lt;/code&gt; response means the endpoint is still accessible – double-check your firewall and patch version.&lt;/p&gt;
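
&lt;p&gt;If you script this check across several controllers, the interpretation above can be captured in a small helper. This is a hedged Python sketch: &lt;code&gt;interpret_status&lt;/code&gt; is a hypothetical name, and the &lt;code&gt;None&lt;/code&gt; branch (no HTTP response at all, for example a timeout when a firewall silently drops the packets) is an assumption added here rather than something the steps above spell out.&lt;/p&gt;

```python
from typing import Optional


def interpret_status(status_code: Optional[int]) -> str:
    """Map the HTTP status of the probe request to a verdict."""
    if status_code is None:
        # No HTTP response at all, e.g. a timeout because a firewall dropped the packets.
        return "no response: blocked at the network layer or host unreachable"
    if status_code in (401, 403):
        return "rejected: authentication is being enforced"
    if status_code == 200:
        return "exposed: re-check firewall rules and patch version"
    return "inconclusive: inspect the response manually"


if __name__ == "__main__":
    for code in (401, 403, 200, 500, None):
        print(code, interpret_status(code))
```

&lt;p&gt;Any status outside the three cases named above is reported as inconclusive rather than treated as safe.&lt;/p&gt;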

&lt;p&gt;Also monitor syslog for any unusual connection attempts. If you used the iptables rule, check its counters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-L&lt;/span&gt; INPUT &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"dataservice"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A non-zero packet count means the vulnerable endpoint was targeted (possibly a live attack).&lt;/p&gt;
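
&lt;p&gt;To alert on that counter automatically, the listing can be parsed. The sketch below is illustrative Python under stated assumptions: &lt;code&gt;matched_packets&lt;/code&gt; and &lt;code&gt;SAMPLE&lt;/code&gt; are hypothetical, the sample listing is made up to resemble &lt;code&gt;iptables -L INPUT -v -n&lt;/code&gt; output, and the packet count is assumed to be the first column, optionally abbreviated with a K, M, or G suffix.&lt;/p&gt;

```python
# Multipliers iptables uses when it abbreviates counters (pass -x for exact values).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}

# Hypothetical listing for illustration only; real output will differ.
SAMPLE = """\
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
   12   720 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:443 STRING match  "/dataservice/device/config" ALGO name bm TO 65535
"""


def parse_count(field: str) -> int:
    """Convert a counter such as '12' or '3K' into an integer."""
    suffix = field[-1]
    if suffix in SUFFIXES:
        return int(field[:-1]) * SUFFIXES[suffix]
    return int(field)


def matched_packets(listing: str, needle: str = "/dataservice/device/config") -> int:
    """Sum the packet counters of every rule line that mentions the needle."""
    total = 0
    for line in listing.splitlines():
        if needle in line:
            total += parse_count(line.split()[0])
    return total


if __name__ == "__main__":
    hits = matched_packets(SAMPLE)
    print("possible probing detected" if hits else "no hits", hits)
```

&lt;p&gt;Feeding the function the real command output, for example captured by a cron job, turns the manual check into something a monitoring system can poll.&lt;/p&gt;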

&lt;h2&gt;
  
  
  Prevention
&lt;/h2&gt;

&lt;p&gt;Long term, treat the SD-WAN Manager as a bastion host.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Segment the management network.&lt;/strong&gt; Place vManage behind a firewall that only allows access from trusted jump boxes. Use Cisco TrustSec or 802.1X for additional network access control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enforce RBAC.&lt;/strong&gt; Ensure only read-only API tokens are used where possible. Assign the &lt;code&gt;netadmin&lt;/code&gt; role sparingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enable API rate limiting.&lt;/strong&gt; In the vManage GUI, go to &lt;em&gt;Administration &amp;gt; Settings &amp;gt; API Rate Limiting&lt;/em&gt; and set a low limit (for example, 100 requests per minute).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integrate with a SIEM.&lt;/strong&gt; Forward vManage logs to your SIEM and alert on any &lt;code&gt;/dataservice/device/config&lt;/code&gt; requests. Add a log-action to your firewall rule to capture hits.&lt;/p&gt;

&lt;p&gt;For a broader security posture, review our CVE remediation guides, such as &lt;a href="https://dev.to/troubleshooting/how-to-fix-cve-2026-43284-preventing-dirty-frag-pod-escapes"&gt;How to Fix CVE-2026-43284: Preventing Dirty Frag Pod Escapes&lt;/a&gt; and &lt;a href="https://dev.to/troubleshooting/how-to-mitigate-copy-fail-cve-2026-31431-with-seccomp"&gt;How to Mitigate Copy Fail (CVE-2026-31431) with Seccomp&lt;/a&gt;. These articles apply similar workaround-first, patch-second patterns to infrastructure vulnerabilities.&lt;/p&gt;

&lt;p&gt;Act now. A remote root shell in your SD-WAN control plane is one curl request away.&lt;/p&gt;

</description>
      <category>cve202620245</category>
      <category>ciscosdwanmanagerrce</category>
      <category>remotecodeexecution</category>
      <category>mitigationguide</category>
    </item>
    <item>
      <title>Mastering DevOps Fundamentals: A Practical Guide</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Fri, 26 Jun 2026 08:32:34 +0000</pubDate>
      <link>https://dev.to/devopsstart/mastering-devops-fundamentals-a-practical-guide-1kfn</link>
      <guid>https://dev.to/devopsstart/mastering-devops-fundamentals-a-practical-guide-1kfn</guid>
      <description>&lt;p&gt;True DevOps mastery isn't about adopting specific tools or methodologies in isolation, but about deeply integrating cultural shifts with practical, often low-tech, engineering discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this take
&lt;/h2&gt;

&lt;p&gt;I've seen too many organizations spend fortunes on tools, proclaiming they're "doing DevOps" because they bought a CI/CD platform or deployed Kubernetes, only to find their release cycles are still glacial, incidents are frequent, and teams are burnt out. The mistake is believing that technology alone can fix systemic issues rooted in process, communication, and organizational structure.&lt;/p&gt;

&lt;p&gt;Consider a team I worked with that adopted a high-end CI/CD pipeline. They integrated static analysis, unit tests, and automated deployments to Kubernetes v1.28. Sounds great, right? The problem was that their cultural "inner loop" was broken. Developers would push code, the pipeline would fail due to flaky tests or environment mismatches, and instead of fixing the root cause, they'd spend hours manually re-running builds or merging stale branches. The tools allowed them to fail faster and more often, but didn't address the lack of shared ownership over the pipeline's health or the inconsistent local development environments. They were doing "continuous integration" in name, but not in spirit. This led to a significant increase in lead time for changes, the exact opposite of the DevOps promise.&lt;/p&gt;

&lt;p&gt;Another common anti-pattern is the "Kubernetes or bust" mentality. A growing startup I advised decided to migrate all their services to Kubernetes v1.29 to "be cloud-native" and "do DevOps." They skipped crucial steps like improving their logging, monitoring, and incident response for simpler, monolithic applications. The result? They suddenly had a massively complex distributed system with no effective way to observe its behavior or debug issues. When a pod started crashing (a classic &lt;code&gt;CrashLoopBackOff&lt;/code&gt; scenario, often due to misconfigured liveness probes), developers were lost. They lacked the fundamental observability practices that would have been valuable even with a simple virtual machine deployment. The advanced tooling amplified their lack of basic operational discipline, turning minor issues into major outages. If you're struggling to debug a monolith, a microservice running on Kubernetes will only make things worse. You can get a sense of how complex these issues can be by checking out guides like &lt;a href="https://dev.to/troubleshooting/crashloopbackoff-kubernetes"&gt;Fix CrashLoopBackOff in Kubernetes Pods&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally, many teams declare they're "shifting left" security by integrating vulnerability scanners into their CI/CD pipeline. This is a good start. However, if they don't simultaneously implement basic supply chain security practices, enforce least privilege, or establish a robust incident response plan, they're only patching a symptom. I've seen teams with pristine SAST reports fall victim to dependency confusion attacks or secret leaks because their fundamental security hygiene was lacking. Shifting left is only effective when paired with a strong security posture across the entire lifecycle, including runtime protection and a clear understanding of the threat model. The most sophisticated security tools won't help if your team uses weak passwords or stores secrets in plaintext in Git.&lt;/p&gt;

&lt;h2&gt;
  
  
  The strongest counter-argument
&lt;/h2&gt;

&lt;p&gt;The strongest counter-argument is that while cultural shifts are paramount, tools are not merely optional accessories; they are fundamental enablers without which those cultural aspirations often remain just that: aspirations. Without robust tooling, the "culture of automation" becomes a pipe dream, drowned out by manual toil and inconsistent processes. It's difficult to argue for a blameless culture around incident response if every deployment requires 20 manual steps and a prayer.&lt;/p&gt;

&lt;p&gt;Tools provide the scaffolding for DevOps practices. Git v2.43, for example, isn't just a version control system; it's the bedrock of collaborative development, enabling practices like pull requests, code reviews, and reproducible builds. Without a solid version control system, concepts like Infrastructure as Code (IaC) or GitOps are simply impossible. Imagine trying to manage Terraform v1.7.0 configuration files or Kubernetes manifests without Git history, branching, or merge capabilities. You'd quickly descend into "config drift" hell and a manual, error-prone deployment process. The &lt;a href="https://developer.hashicorp.com/terraform/language/state" rel="noopener noreferrer"&gt;official Terraform documentation on state management&lt;/a&gt; makes it clear that a robust backend with locking, like Amazon S3 with DynamoDB, is crucial, and that's just one piece of the tooling puzzle for reliability.&lt;/p&gt;

&lt;p&gt;Similarly, CI/CD platforms like GitHub Actions, GitLab CI v16.8, or Jenkins v2.426 don't just "do" automation; they enforce it. They standardize the build, test, and deployment process, making it repeatable and auditable. This consistency is what allows teams to gain confidence in frequent releases, which in turn fosters a culture of small, incremental changes, easier debugging, and quicker feedback loops. It's not the &lt;em&gt;presence&lt;/em&gt; of the tool, but its &lt;em&gt;effective application&lt;/em&gt; that makes the difference. If you can automate a deployment process from commit to production in under 10 minutes with high confidence, that speed and reliability fundamentally change how teams approach their work. It shifts their focus from "will it deploy?" to "is the feature valuable?"&lt;/p&gt;

&lt;p&gt;Moreover, modern infrastructure complexity practically &lt;em&gt;demands&lt;/em&gt; sophisticated tooling. Managing hundreds of microservices, thousands of containers, and petabytes of data without tools for observability (Prometheus v2.48, Grafana v10.3), centralized logging (Elastic Stack v8.11), and automated incident response (PagerDuty) is simply unfeasible. These tools collect the data and provide the insights necessary for a team to truly understand their systems, respond effectively to failures, and iterate on improvements. The tools are not just "nice-to-haves"; they are the very mechanisms through which a modern DevOps team gains visibility, control, and ultimately, reliability at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exceptions where a tool-first approach still wins
&lt;/h2&gt;

&lt;p&gt;There are specific scenarios where a judicious, tool-first approach can indeed catalyze cultural change and deliver immediate, tangible benefits, even before every cultural nuance of "DevOps" has permeated an organization. These are typically situations where a clear, pressing technical need can be met by a well-understood tool, subsequently driving process improvements.&lt;/p&gt;

&lt;p&gt;One such scenario is when a team is struggling with inconsistent infrastructure deployments. Introducing Infrastructure as Code (IaC) with a tool like Terraform v1.7.0, even with minimal initial buy-in beyond the core infrastructure team, can be a game-changer. By centralizing infrastructure definitions in Git, it immediately enforces a single source of truth, makes changes auditable, and enables repeatable deployments. Developers who previously waited days for manual infrastructure provisioning suddenly get resources in minutes via self-service pipelines. This tangible benefit often kickstarts broader adoption of version control for everything, peer review processes, and a shared understanding of infrastructure, organically fostering a "you build it, you run it" mentality. Using tools like Terraform with established best practices can dramatically improve the security posture of your infrastructure, as discussed in &lt;a href="https://dev.to/blog/secure-terraform-prs-with-an-architecture-firewall"&gt;Secure Terraform PRs with an Architecture Firewall&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Another strong case is in highly regulated industries or environments with strict compliance requirements. Here, adopting specific security, audit, or compliance tools isn't just a best practice; it's often a legal mandate. Consider a robust software supply chain security solution that scans dependencies (for example, with tools like Trivy v0.49 or Snyk v1.1270) and enforces policies throughout the CI/CD pipeline. The tool dictates a new, more secure process, forcing developers to address vulnerabilities earlier and providing auditors with clear evidence of compliance. While the ideal is cultural adoption, the immediate need for compliance can drive the integration of specific tools which then educate and influence behavior. These tools act as a guardrail, preventing non-compliant actions and establishing a baseline for security that might otherwise be overlooked. This proactive stance is critical for avoiding security incidents and can be bolstered by advanced strategies discussed in &lt;a href="https://dev.to/blog/supply-chain-security-proxy-move-beyond-vulnerability-scanni"&gt;Supply Chain Security Proxy: Move Beyond Vulnerability Scann&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Finally, in rapidly scaling organizations, specialized tools become indispensable for managing complexity, even if the cultural maturity lags. When you move from a handful of services to hundreds, or from a few dozen users to millions, tools for advanced observability (like a comprehensive ELK stack for logs, Prometheus v2.48 for metrics, and Jaeger for tracing), performance testing, and cloud cost management (FinOps tools) are not optional. They provide the necessary visibility and control to prevent total collapse. While a fully mature DevOps culture would use these tools to drive continuous improvement, their initial implementation often serves as a survival mechanism, providing the data necessary to react to growth challenges. The insights gained from these tools can then be used to evangelize and drive the cultural changes required for long-term sustainability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Fundamentals You Should Prioritize
&lt;/h2&gt;

&lt;p&gt;If you're looking to truly master DevOps fundamentals, shift your focus from chasing the latest shiny tool to building a solid foundation of engineering discipline and fostering the right cultural habits. Here's where to put your energy:&lt;/p&gt;

&lt;h3&gt;
  
  
  Version Control Everything
&lt;/h3&gt;

&lt;p&gt;This isn't just about your application code. Your infrastructure configurations, Kubernetes manifests, documentation, database schemas, and even your &lt;code&gt;.env&lt;/code&gt; files (sanitized, of course) belong in Git. This provides an auditable history, enables collaboration, and is the absolute bedrock for automation.&lt;/p&gt;

&lt;p&gt;Here's a simple example of cloning a repository and checking its status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;git clone https://github.com/your-org/your-repo.git
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-repo
&lt;span class="nv"&gt;$ &lt;/span&gt;git status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;On branch main
Your branch is up to &lt;span class="nb"&gt;date &lt;/span&gt;with &lt;span class="s1"&gt;'origin/main'&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;

nothing to commit, working tree clean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
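
&lt;p&gt;The "sanitized, of course" caveat for &lt;code&gt;.env&lt;/code&gt; files deserves a concrete illustration. One way to read it, sketched here in Python as an assumption rather than a prescribed workflow, is to commit a template that keeps the keys and comments but blanks every value. The function name &lt;code&gt;sanitize_env&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def sanitize_env(text: str) -> str:
    """Blank out every value in .env-style text, keeping keys and comments."""
    cleaned = []
    for line in text.splitlines():
        stripped = line.strip()
        # Keep blank lines, comments, and anything that is not an assignment as-is.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            cleaned.append(line)
            continue
        key, _, _value = line.partition("=")  # split on the first '=' only
        cleaned.append(f"{key.rstrip()}=")
    return "\n".join(cleaned)


if __name__ == "__main__":
    example = "# database\nDB_PASSWORD=hunter2\n\nAPI_URL = https://example.invalid"
    print(sanitize_env(example))
```

&lt;p&gt;The output could be committed as something like &lt;code&gt;.env.example&lt;/code&gt;, while the real file stays out of Git via &lt;code&gt;.gitignore&lt;/code&gt;.&lt;/p&gt;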



&lt;h3&gt;
  
  
  Automate Relentlessly (But Smartly)
&lt;/h3&gt;

&lt;p&gt;CI/CD isn't a destination; it's a continuous process of reducing manual effort and improving feedback loops. Start with the most repetitive, error-prone tasks. Build, test, scan, and deploy automatically. But don't just automate bad processes; fix the processes first, then automate them.&lt;/p&gt;

&lt;p&gt;A basic GitHub Actions workflow for a Node.js application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Node.js CI&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;main&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Use Node.js 20.x&lt;/span&gt;
      &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
      &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;20.x'&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run tests&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This workflow ensures that every push and pull request targeting the &lt;code&gt;main&lt;/code&gt; branch automatically runs the tests, giving immediate feedback.&lt;/p&gt;

&lt;h3&gt;
  
  
  Treat Infrastructure as Code (IaC)
&lt;/h3&gt;

&lt;p&gt;Stop clicking in the console. Define your infrastructure (servers, networks, databases, Kubernetes clusters) in code using tools like Terraform v1.7.0 or Pulumi v3.100. This makes your infrastructure reproducible, versionable, and testable. It's the only way to scale infrastructure reliably.&lt;/p&gt;

&lt;p&gt;Here's a simple &lt;code&gt;main.tf&lt;/code&gt; file using Terraform v1.7.0 to define an AWS S3 bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.tf for Terraform v1.7.0&lt;/span&gt;

&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 5.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"example_bucket"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-unique-application-logs-bucket-12345"&lt;/span&gt; &lt;span class="c1"&gt;# Must be globally unique&lt;/span&gt;
  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Dev"&lt;/span&gt;
    &lt;span class="nx"&gt;Project&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"MyApp"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"bucket_name"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;example_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bucket&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"The name of the S3 bucket"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To apply this, you would run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;terraform init
&lt;span class="nv"&gt;$ &lt;/span&gt;terraform plan
&lt;span class="nv"&gt;$ &lt;/span&gt;terraform apply &lt;span class="nt"&gt;--auto-approve&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output of &lt;code&gt;terraform apply&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Terraform will perform the following actions:

  &lt;span class="c"&gt;# aws_s3_bucket.example_bucket will be created&lt;/span&gt;
  + resource &lt;span class="s2"&gt;"aws_s3_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"example_bucket"&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      + accel_transfer_enabled &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;
      + acl                    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + arn                    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + bucket                 &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-unique-application-logs-bucket-12345"&lt;/span&gt;
      + bucket_domain_name     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + bucket_prefix          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + bucket_regional_domain_name &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + force_destroy          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;
      + &lt;span class="nb"&gt;id&lt;/span&gt;                     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + object_lock_enabled    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;
      + policy                 &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + region                 &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + request_payer          &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + tags                   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
          + &lt;span class="s2"&gt;"Environment"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Dev"&lt;/span&gt;
          + &lt;span class="s2"&gt;"Project"&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"MyApp"&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
      + tags_all               &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
          + &lt;span class="s2"&gt;"Environment"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Dev"&lt;/span&gt;
          + &lt;span class="s2"&gt;"Project"&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"MyApp"&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
      + website_domain         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
      + website_endpoint       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;known after apply&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

Plan: 1 to add, 0 to change, 0 to destroy.
aws_s3_bucket.example_bucket: Creating...
aws_s3_bucket.example_bucket: Creation &lt;span class="nb"&gt;complete &lt;/span&gt;after 1s &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-unique-application-logs-bucket-12345]

Apply &lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

bucket_name &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-unique-application-logs-bucket-12345"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



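&lt;p&gt;S3 bucket names must be globally unique across all AWS accounts, so replace &lt;code&gt;my-unique-application-logs-bucket-12345&lt;/code&gt; with a name of your own before you apply.&lt;/p&gt;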

&lt;h3&gt;
  
  
  Monitor and Observe Deeply
&lt;/h3&gt;

&lt;p&gt;You can't fix what you can't see. Instrument your applications and infrastructure to collect metrics, logs, and traces. Use tools like Prometheus v2.48 for metrics, Grafana v10.3 for dashboards, and a robust logging solution (for example, Loki v2.9 or Elastic Stack v8.11) to understand system behavior. Proactive monitoring helps you detect issues before your users do. Don't just watch CPU and memory; track business metrics and application health indicators as well. For example, monitor your API's 99th percentile latency and error rates, not just whether the process is running.&lt;/p&gt;
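
&lt;p&gt;To make the latency example concrete, here is a minimal Python sketch of why a 99th percentile tells you more than an average. The request sample and the helper names &lt;code&gt;percentile&lt;/code&gt; and &lt;code&gt;summarize&lt;/code&gt; are illustrative assumptions, not part of any monitoring tool; in practice your metrics backend would compute these values for you.&lt;/p&gt;

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


def summarize(requests):
    """Summarize a list of (latency_ms, http_status) tuples."""
    latencies = [latency for latency, _ in requests]
    server_errors = sum(1 for _, status in requests if status >= 500)
    return {
        "mean_ms": sum(latencies) / len(latencies),
        "p99_ms": percentile(latencies, 99),
        "error_rate": server_errors / len(requests),
    }


# Hypothetical sample: 98 fast requests, one slow request, one failed request.
sample = [(20, 200)] * 98 + [(900, 200), (1500, 503)]
summary = summarize(sample)
print(summary)
```

&lt;p&gt;In this made-up sample the mean latency looks healthy, while the 99th percentile and the error rate expose the slow and failed requests that a "process is running" check would miss.&lt;/p&gt;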

&lt;h3&gt;
  
  
  Embrace Feedback Loops and a Blameless Culture
&lt;/h3&gt;

&lt;p&gt;DevOps is fundamentally about continuous improvement. After an incident, conduct blameless post-mortems focusing on system and process failures, not individual blame. Learn from mistakes, implement corrective actions, and share knowledge. Encourage frequent, open communication between development and operations teams. This fosters trust and a shared sense of responsibility. If something breaks in production, the question should always be "What can we do to prevent this type of failure again?" not "Whose fault was this?"&lt;/p&gt;

&lt;h3&gt;
  
  
  Shift Security Left, but Don't Forget Right
&lt;/h3&gt;

&lt;p&gt;Integrate security into every stage of your development pipeline. This means security training for developers, static and dynamic analysis in CI/CD, dependency scanning, and threat modeling. However, don't stop there. Implement runtime security measures, network segmentation, robust access controls, and a solid incident response plan. A comprehensive approach covers the entire lifecycle, from design to production. Focusing solely on "shift left" without considering runtime protection is like locking your front door but leaving your back door wide open.&lt;/p&gt;

&lt;p&gt;Mastering DevOps fundamentals means internalizing these core principles. It's about changing how teams work together, improving their engineering practices, and relentlessly seeking efficiency and reliability. The tools are there to support these efforts, but they are not the goal in themselves. Without the underlying cultural and practical discipline, even the most sophisticated tools will only lead to more complex problems. Start simple, focus on the fundamentals, and let your problems drive your tool choices, rather than letting tools dictate your approach.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>cicd</category>
      <category>automation</category>
      <category>bestpractices</category>
    </item>
    <item>
      <title>LLM Prompt Caching with Git to Cut API Costs</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Thu, 25 Jun 2026 08:30:04 +0000</pubDate>
      <link>https://dev.to/devopsstart/llm-prompt-caching-with-git-to-cut-api-costs-121h</link>
      <guid>https://dev.to/devopsstart/llm-prompt-caching-with-git-to-cut-api-costs-121h</guid>
      <description>&lt;p&gt;If your CI/CD pipelines call LLM APIs like OpenAI's GPT-4, you've probably noticed the token costs. Automated systems that generate documentation or review code often run the same prompts repeatedly, leading to high bills. You can reduce these costs significantly by implementing a simple prompt cache using a tool you already have: Git.&lt;/p&gt;

&lt;p&gt;This article explains how to use a dedicated Git repository as a database-free key-value cache for LLM prompts and responses. Before calling an expensive API, your script checks a local Git clone for a cached answer. If found, it uses the saved response, avoiding the API call entirely. In CI/CD environments where prompts are frequently repeated, this method can cut API costs by over 50%, because every cache hit replaces a billed API call; the actual saving tracks your cache hit rate.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Git-Based Caching Works
&lt;/h2&gt;

&lt;p&gt;The approach treats a Git repository as a key-value store. You simply create a new, dedicated repository to act as the cache.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Key:&lt;/strong&gt; A SHA256 hash of the prompt's content. Hashing ensures that even a one-character difference creates a unique key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value:&lt;/strong&gt; The LLM's response, stored as a plain text file. The filename is the key, for example, &lt;code&gt;5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When your script needs an LLM response, it first calculates the prompt's hash. It then checks if a file with that name exists in its local clone of the cache repository. If it does, that's a cache hit. If not, it's a miss.&lt;/p&gt;
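
&lt;p&gt;One thing a prompt-only key does not capture is which model and settings produced the response. The following sketch is a variation on the scheme above, not the method used in the rest of this article: it hashes the prompt together with a model name and temperature, so that changing either one produces a different cache file. The function name &lt;code&gt;cache_key&lt;/code&gt; and the model names are hypothetical.&lt;/p&gt;

```python
import hashlib
import json


def cache_key(prompt, model="example-model", temperature=0.0):
    """Return a SHA256 hex digest covering the prompt and the request settings."""
    # sort_keys gives a stable serialization, so equal inputs always hash the same.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = cache_key("Summarize this diff.")
key_b = cache_key("Summarize this diff!")  # one character differs
key_c = cache_key("Summarize this diff.", model="another-model")
print(key_a)
print(key_a != key_b, key_a != key_c)
```

&lt;p&gt;If you only ever call one model with fixed settings, hashing the prompt alone, as described above, is sufficient.&lt;/p&gt;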

&lt;h2&gt;
  
  
  The Caching Workflow in Action
&lt;/h2&gt;

&lt;p&gt;The logic for your application or CI script follows a "check-miss-write" pattern.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Clone/Pull:&lt;/strong&gt; Before running, ensure your script has an up-to-date local clone of the cache repository. A quick &lt;code&gt;git pull&lt;/code&gt; is all you need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate Hash:&lt;/strong&gt; Take the full prompt string and generate its SHA256 hash. This becomes your cache key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check for Key:&lt;/strong&gt; Look for a file named after the hash in the local cache repository.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handle the Result:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache Hit:&lt;/strong&gt; If the file exists, read its contents. This is your LLM response. No API call is made.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache Miss:&lt;/strong&gt; If the file does not exist, call the actual LLM API to get a new response.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write to Cache:&lt;/strong&gt; On a cache miss, save the new response to a file named after the prompt's hash. Then, commit and push this new file to the remote cache repository.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example of a cache repository's structure&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; llm-cache/
0a3b...
1c5d...
5e88...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This workflow ensures that the next time the same prompt is encountered by any user or pipeline with access to the repo, it will be a cache hit. This is particularly effective in CI pipelines that &lt;a href="https://dev.to/tutorials/how-to-build-ai-agents-for-kubernetes-deployments"&gt;build AI agents for Kubernetes deployments&lt;/a&gt;, where environment setup prompts are often identical across runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Python Implementation Example
&lt;/h2&gt;

&lt;p&gt;Here is a simple Python function that implements this caching logic. It uses only the standard library modules &lt;code&gt;hashlib&lt;/code&gt;, &lt;code&gt;os&lt;/code&gt;, and &lt;code&gt;subprocess&lt;/code&gt;. You can consult the official &lt;a href="https://docs.python.org/3/library/hashlib.html" rel="noopener noreferrer"&gt;Python hashlib documentation&lt;/a&gt; for more details on hashing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="c1"&gt;# --- Configuration ---
# IMPORTANT: Update this to the absolute path of your cache repository clone.
&lt;/span&gt;&lt;span class="n"&gt;CACHE_DIR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/path/to/your/local/llm-cache-repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_llm_response_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm_api_call_func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Gets an LLM response, using a Git-based file cache to avoid duplicate API calls.

    Args:
        prompt: The full prompt string to send to the LLM.
        llm_api_call_func: A function that takes a prompt string and returns the API response.

    Returns:
        The LLM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s response, either from the cache or a new API call.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Ensure the cache is up-to-date
&lt;/span&gt;    &lt;span class="c1"&gt;# A production implementation should include robust error handling for Git commands.
&lt;/span&gt;    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pull&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CACHE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Generate the cache key
&lt;/span&gt;    &lt;span class="n"&gt;prompt_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cache_file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CACHE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Check for a cache hit
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_file_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CACHE HIT: Found response for hash &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt_hash&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Handle a cache miss
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CACHE MISS: Calling API for hash &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt_hash&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm_api_call_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 5. Write the new response to the cache and push
&lt;/span&gt;    &lt;span class="c1"&gt;# Note: The prompt and response are stored in plain text. Do not use this method for sensitive data.
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Adding new response to cache...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_file_path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CACHE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;commit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add cache for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt_hash&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CACHE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;push&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;cwd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;CACHE_DIR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="c1"&gt;# --- Example Usage ---
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fake_openai_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Replace this with your actual client.chat.completions.create() call
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- Faking expensive API call ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is the LLM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s answer to the prompt starting with: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;my_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate a Kubernetes Deployment YAML for a Python Flask app named &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my-app&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; listening on port 5000.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# First call (will be a miss)
&lt;/span&gt;    &lt;span class="n"&gt;response1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_llm_response_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fake_openai_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Response 1:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Second call (will be a hit)
&lt;/span&gt;    &lt;span class="n"&gt;response2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_llm_response_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fake_openai_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Response 2:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Benefits of This Approach
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Reduction:&lt;/strong&gt; Avoids expensive API calls for repeated prompts. With GPT-4 Turbo input prices around $10 per million tokens, caching just a few hundred complex prompts in CI can lead to substantial savings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No New Infrastructure:&lt;/strong&gt; It uses your existing Git provider, so there is no need to set up or maintain a separate caching service like Redis or Memcached.&lt;/li&gt;
&lt;li&gt;
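&lt;strong&gt;Audit Trail:&lt;/strong&gt; The Git history provides a version-controlled log of every cached response and when it was added. Because the filename is only a hash, the prompt text itself is not recoverable unless you also store it alongside the response.&lt;/li&gt;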
&lt;li&gt;
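&lt;strong&gt;Faster Execution on Hits:&lt;/strong&gt; Reading a local file takes milliseconds, while a network API call can take several seconds. This speeds up CI/CD jobs that get a cache hit. Note that the &lt;code&gt;git pull&lt;/code&gt; at the start of each lookup is itself a network round trip, so a job that makes many calls benefits from pulling once per job rather than once per prompt.&lt;/li&gt;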
&lt;/ul&gt;
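
&lt;p&gt;The cost argument is simple arithmetic, sketched below. Only the $10 per million input tokens figure comes from the list above; the call count, hit count, and prompt size are hypothetical numbers chosen for illustration, and output-token charges are ignored to keep the example short.&lt;/p&gt;

```python
def input_cost_usd(calls, tokens_per_call, usd_per_million_tokens):
    """Input-token cost in USD for a number of API calls of equal prompt size."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# Hypothetical workload: 1,000 pipeline prompts of 2,000 tokens, 600 served from cache.
TOTAL_CALLS = 1000
CACHE_HITS = 600
TOKENS_PER_PROMPT = 2000
USD_PER_MILLION_INPUT_TOKENS = 10  # figure quoted above

uncached = input_cost_usd(TOTAL_CALLS, TOKENS_PER_PROMPT, USD_PER_MILLION_INPUT_TOKENS)
cached = input_cost_usd(TOTAL_CALLS - CACHE_HITS, TOKENS_PER_PROMPT, USD_PER_MILLION_INPUT_TOKENS)
saved = uncached - cached
print(f"without cache: ${uncached:.2f}, with cache: ${cached:.2f}, saved: ${saved:.2f}")
```

&lt;p&gt;The saving is proportional to the hit rate, so it is worth logging hits and misses in your pipeline before assuming a particular percentage.&lt;/p&gt;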

&lt;h2&gt;
  
  
  Limitations and Considerations
&lt;/h2&gt;

&lt;p&gt;This method is pragmatic but has trade-offs compared to a dedicated caching system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manual Cache Invalidation:&lt;/strong&gt; To get a fresh response for a cached prompt, you must manually delete the file from the repository (&lt;code&gt;git rm &amp;lt;hash&amp;gt;&lt;/code&gt;, commit and push). There is no built-in time-to-live (TTL) mechanism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repository Size:&lt;/strong&gt; The cache repository will grow indefinitely. While text-based responses are small, this method is unsuitable for caching large files like images or audio. Regular maintenance may be needed to prune old entries.&lt;/li&gt;
&lt;li&gt;
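&lt;strong&gt;Concurrency and Race Conditions:&lt;/strong&gt; If two CI jobs miss on the &lt;em&gt;same prompt&lt;/em&gt; simultaneously, they will both call the LLM API, so one of those calls is wasted regardless. They will then race to commit and push the new file, and the second &lt;code&gt;git push&lt;/code&gt; will be rejected because the remote has moved on. Without retry logic (for example, &lt;code&gt;git pull&lt;/code&gt; and push again), the losing job fails even though it already has a valid response. Because both jobs added a file with the same name, the pull can also hit a merge conflict if the two responses differ.&lt;/li&gt;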
&lt;/ul&gt;
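
&lt;p&gt;The retry logic mentioned in the last point could look roughly like the sketch below. It is an assumption-laden outline rather than a hardened implementation: the function name &lt;code&gt;push_with_retry&lt;/code&gt; and its &lt;code&gt;run&lt;/code&gt; parameter are hypothetical, and it uses &lt;code&gt;git pull --rebase&lt;/code&gt; instead of a plain pull so that the local commit is replayed on top of whatever the other job pushed.&lt;/p&gt;

```python
import subprocess


def push_with_retry(cache_dir, attempts=3, run=subprocess.run):
    """Push the cache commit, rebasing onto the remote when a push is rejected.

    Returns True once a push succeeds and False if the attempts run out or the
    rebase cannot be completed. The run parameter exists so the Git calls can be
    replaced in tests.
    """
    for _ in range(attempts):
        if run(["git", "push"], cwd=cache_dir).returncode == 0:
            return True
        # The push was rejected, most likely because another job pushed first.
        if run(["git", "pull", "--rebase"], cwd=cache_dir).returncode != 0:
            # Probably both jobs added the same hash file with different text.
            # Leave the clone in a clean state and let the caller decide.
            run(["git", "rebase", "--abort"], cwd=cache_dir)
            return False
    return False
```

&lt;p&gt;A failed push does not need to fail the pipeline: the job already holds a usable response, so the caller can log the result and continue.&lt;/p&gt;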

&lt;p&gt;This Git-based caching technique is most effective in environments with high prompt repetition, such as CI/CD pipelines for code analysis, documentation generation, or testing. For applications requiring high-throughput, atomic operations, or automatic cache eviction, a dedicated solution like Redis is more appropriate. For many teams, however, this simple approach provides a significant benefit for minimal effort.&lt;/p&gt;

</description>
      <category>llmpromptcaching</category>
      <category>llmcostoptimization</category>
      <category>gitkeyvaluestore</category>
      <category>llmops</category>
    </item>
    <item>
      <title>How to Detect and Prevent Malicious AI Agent Skills</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Wed, 24 Jun 2026 08:30:44 +0000</pubDate>
      <link>https://dev.to/devopsstart/how-to-detect-and-prevent-malicious-ai-agent-skills-3ea0</link>
      <guid>https://dev.to/devopsstart/how-to-detect-and-prevent-malicious-ai-agent-skills-3ea0</guid>
      <description>&lt;p&gt;Malicious AI agent skills are tool or server dependencies (often via Model Context Protocol or MCP) that present a benign interface to the LLM but execute unauthorized actions on the host system. You detect these by monitoring for unauthorized tool calls, unexpected egress traffic or attempts to access sensitive system files like &lt;code&gt;/etc/shadow&lt;/code&gt;. Prevention requires a zero-trust architecture where skills are isolated in sandboxes with strict network egress filters and human-in-the-loop approvals for destructive actions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the error means
&lt;/h2&gt;

&lt;p&gt;When you integrate "skills" into an AI agent, you add executable dependencies to your runtime. A malicious skill acts as a Trojan horse, allowing an agent to perform "silent failures" or "unauthorized escalation."&lt;/p&gt;

&lt;p&gt;Unlike a traditional software crash, the error here is a security breach. You might see an agent unexpectedly exfiltrating &lt;code&gt;.env&lt;/code&gt; files, executing &lt;code&gt;rm -rf /&lt;/code&gt;, or sending internal system prompts to an external API. If your agent calls tools you didn't authorize or accesses paths outside its designated workspace, your skill supply chain is compromised.&lt;/p&gt;
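
&lt;p&gt;A check for "paths outside its designated workspace" can be sketched in a few lines of Python. The workspace location and the function name &lt;code&gt;is_inside_workspace&lt;/code&gt; are assumptions made for illustration; a real agent runtime would apply a check like this to every file path a tool call requests, before the call is executed.&lt;/p&gt;

```python
from pathlib import Path


def is_inside_workspace(workspace, requested):
    """Return True if the requested path resolves to a location inside the workspace."""
    root = Path(workspace).resolve()
    # Joining with an absolute path discards the root, and resolve() collapses
    # any ".." segments, so traversal attempts end up outside the workspace.
    target = (root / requested).resolve()
    return target == root or root in target.parents


WORKSPACE = "/srv/agent/workspace"  # hypothetical workspace directory

for candidate in ["notes/todo.md", "../../etc/shadow", "/etc/shadow"]:
    verdict = "allow" if is_inside_workspace(WORKSPACE, candidate) else "BLOCK"
    print(f"{verdict}: {candidate}")
```

&lt;p&gt;This only covers path traversal. A file such as &lt;code&gt;.env&lt;/code&gt; that lives inside the workspace still passes the check, so sensitive files need their own deny rules.&lt;/p&gt;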

&lt;h2&gt;
  
  
  Root Causes
&lt;/h2&gt;

&lt;p&gt;The vulnerability comes from treating AI tools as static configuration rather than executable code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implicit Trust in the Supply Chain&lt;/strong&gt;&lt;br&gt;
Many developers install MCP servers from community repositories without auditing the source. Similar to a malicious npm package, an MCP server can contain obfuscated code that triggers only under specific prompt conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indirect Prompt Injection&lt;/strong&gt;&lt;br&gt;
An agent may read a malicious file (for example, a README.md in a repo) containing hidden instructions. These instructions trick the LLM into using a legitimate skill, such as a shell executor, to perform a malicious action that bypasses the user's intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over-Privileged Environments&lt;/strong&gt;&lt;br&gt;
Running AI agents with the same permissions as the local user is a critical failure. If the agent has root access or full SSH key access, one compromised skill can compromise the entire workstation or cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lack of Egress Control&lt;/strong&gt;&lt;br&gt;
Most agent runtimes allow unrestricted outbound HTTP requests. This allows malicious skills to "phone home" with stolen secrets or API keys.&lt;/p&gt;
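
&lt;p&gt;As a rough illustration of what an egress allowlist means at the application layer, the sketch below accepts only HTTPS requests to explicitly listed hosts. The names &lt;code&gt;ALLOWED_HOSTS&lt;/code&gt; and &lt;code&gt;is_allowed_url&lt;/code&gt; are hypothetical, and the single allowed host is just an example. A malicious skill can bypass a check that runs inside its own process, so treat this as a complement to network-level filtering, not a replacement.&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Example allowlist; replace with the API endpoints your agent actually needs.
ALLOWED_HOSTS = {"api.anthropic.com"}


def is_allowed_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Allow only HTTPS requests whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed_hosts


for url in [
    "https://api.anthropic.com/v1/messages",
    "https://attacker.example/upload",
    "https://api.anthropic.com.attacker.example/",
]:
    print(is_allowed_url(url), url)
```

&lt;p&gt;Matching the exact hostname, rather than a prefix or substring, is what rejects the look-alike domain in the last example.&lt;/p&gt;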
&lt;h2&gt;
  
  
  Detection and Neutralization
&lt;/h2&gt;

&lt;p&gt;To stop malicious skills, implement layered defense focusing on isolation and auditing. For those building custom agents, refer to the &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Model Context Protocol documentation&lt;/a&gt; to understand standard communication patterns.&lt;/p&gt;
&lt;h3&gt;
  
  
  Static Audit of MCP Servers
&lt;/h3&gt;

&lt;p&gt;Before adding a server to your &lt;code&gt;claude_desktop_config.json&lt;/code&gt; or agent config, audit the entry point. Search for &lt;code&gt;curl&lt;/code&gt;, &lt;code&gt;wget&lt;/code&gt;, or &lt;code&gt;eval&lt;/code&gt; calls that fetch remote scripts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Search for suspicious remote execution patterns in a local MCP server directory&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-rE&lt;/span&gt; &lt;span class="s2"&gt;"curl|wget|eval|exec|base64"&lt;/span&gt; ./mcp-servers/suspicious-tool/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Implement a Restricted Runtime
&lt;/h3&gt;

&lt;p&gt;Never run agent skills directly on your host. Use a containerized environment with limited resources. For those managing agents on a cluster, integrate &lt;a href="https://dev.to/tutorials/llm-observability-on-kubernetes-a-practical-guide"&gt;LLM Observability on Kubernetes: A Practical Guide&lt;/a&gt; to monitor tool-call latency and volume. I have seen this prevent total host compromise in environments with &amp;gt;10 nodes by trapping the agent in a non-privileged namespace.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run a potentially risky MCP server in a restricted Docker container&lt;/span&gt;
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; mcp-sandbox &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"512m"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cpus&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"0.5"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bridge"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--read-only&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /tmp/agent-data:/data:rw &lt;span class="se"&gt;\&lt;/span&gt;
  mcp-server-image:v1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Enforce Network Egress Filtering
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;iptables&lt;/code&gt; or a service mesh to block all outbound traffic except to known API endpoints. A default-deny policy stops basic "phone-home" malware that tries to reach arbitrary hosts, although it cannot prevent exfiltration through an endpoint that is itself on the allow-list.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
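&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add the allow rules first (hostnames are resolved once, when each rule is added), then drop everything else&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-A&lt;/span&gt; OUTPUT &lt;span class="nt"&gt;-o&lt;/span&gt; lo &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT
&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-A&lt;/span&gt; OUTPUT &lt;span class="nt"&gt;-m&lt;/span&gt; conntrack &lt;span class="nt"&gt;--ctstate&lt;/span&gt; ESTABLISHED,RELATED &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT
&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-A&lt;/span&gt; OUTPUT &lt;span class="nt"&gt;-p&lt;/span&gt; udp &lt;span class="nt"&gt;--dport&lt;/span&gt; 53 &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT
&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-A&lt;/span&gt; OUTPUT &lt;span class="nt"&gt;-p&lt;/span&gt; tcp &lt;span class="nt"&gt;--dport&lt;/span&gt; 443 &lt;span class="nt"&gt;-d&lt;/span&gt; api.anthropic.com &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT
&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-A&lt;/span&gt; OUTPUT &lt;span class="nt"&gt;-p&lt;/span&gt; tcp &lt;span class="nt"&gt;--dport&lt;/span&gt; 443 &lt;span class="nt"&gt;-d&lt;/span&gt; github.com &lt;span class="nt"&gt;-j&lt;/span&gt; ACCEPT
&lt;span class="nb"&gt;sudo &lt;/span&gt;iptables &lt;span class="nt"&gt;-P&lt;/span&gt; OUTPUT DROP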
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
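
&lt;p&gt;Because the firewall rules above pin IP addresses at the moment they are added, it can help to repeat the allow-list check inside the agent wrapper as well. The following is a minimal Python sketch of that idea, not part of any agent runtime: the &lt;code&gt;egress_allowed&lt;/code&gt; helper and the &lt;code&gt;ALLOWED_HOSTS&lt;/code&gt; set are hypothetical names, and the two hosts simply mirror the rules above.&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Hypothetical allow-list that mirrors the two hosts in the iptables rules.
ALLOWED_HOSTS = {"api.anthropic.com", "github.com"}


def egress_allowed(url):
    """Return True only for HTTPS URLs whose exact hostname is allow-listed."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(egress_allowed("https://api.anthropic.com/v1/messages"))   # True
print(egress_allowed("https://github.com.evil.example/upload"))  # False
```

&lt;p&gt;Matching the exact hostname, rather than a prefix or substring, is what rejects look-alike domains such as the second example.&lt;/p&gt;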



&lt;h3&gt;
  
  
  Structured Tool Logging
&lt;/h3&gt;

&lt;p&gt;Configure your agent to log every &lt;code&gt;tool_use&lt;/code&gt; call, including the exact arguments passed and the raw output returned.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Piping agent logs to a file for forensic analysis&lt;/span&gt;
agent-runtime &lt;span class="nt"&gt;--log-level&lt;/span&gt; debug 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;tee &lt;/span&gt;agent_audit.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Prevention Strategies
&lt;/h2&gt;

&lt;p&gt;Shift from a "trust-by-default" to a "zero-trust" agent architecture to avoid future compromises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Bill of Materials (AIBOM)&lt;/strong&gt;&lt;br&gt;
Maintain a versioned list of every MCP server and model version used in production. Do not allow "latest" tags; pin to specific git hashes. This prevents "poisoned" updates from automatically entering your environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human-in-the-Loop (HITL)&lt;/strong&gt;&lt;br&gt;
Configure your agent interface to require manual approval for destructive tools, such as &lt;code&gt;delete_file&lt;/code&gt;, &lt;code&gt;execute_shell&lt;/code&gt;, or &lt;code&gt;send_email&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Least Privilege&lt;/strong&gt;&lt;br&gt;
Create a dedicated OS user for the agent with no sudo privileges and restricted directory access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secret Management&lt;/strong&gt;&lt;br&gt;
Use a secret manager instead of environment variables. This prevents skills from simply calling &lt;code&gt;env&lt;/code&gt; to steal your keys, a common tactic seen in &lt;a href="https://dev.to/blog/github-actions-security-how-to-stop-secret-leaks-in-cicd"&gt;GitHub Actions Security: How to Stop Secret Leaks in CI/CD&lt;/a&gt;.&lt;/p&gt;
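
&lt;p&gt;The pinning rule from the AIBOM strategy can be checked automatically. This is a rough sketch under stated assumptions: the &lt;code&gt;AIBOM&lt;/code&gt; dictionary, its server names, and the hash are placeholders invented for illustration, and a "pinned" reference is taken to mean a full 40-character SHA-1 commit hash.&lt;/p&gt;

```python
import re

# Hypothetical AIBOM: server name mapped to the ref it is installed from.
# The names and the hash below are placeholders for illustration only.
AIBOM = {
    "filesystem-server": "9f2c1e7a4b6d8035c1f0a2e4b6d8f0a1c3e5b7d9",
    "search-server": "latest",
    "mail-server": "v1.0.0",
}

# A full SHA-1 git commit hash is exactly 40 lowercase hex characters.
FULL_COMMIT = re.compile(r"[0-9a-f]{40}")


def unpinned_entries(aibom):
    """Return the names whose ref is not a full commit hash, sorted."""
    return sorted(
        name for name, ref in aibom.items()
        if FULL_COMMIT.fullmatch(ref) is None
    )


print(unpinned_entries(AIBOM))  # ['mail-server', 'search-server']
```

&lt;p&gt;A check like this could run in CI so that a mutable tag such as &lt;code&gt;latest&lt;/code&gt; or &lt;code&gt;v1.0.0&lt;/code&gt; fails the build before it reaches production.&lt;/p&gt;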

&lt;h3&gt;
  
  
  Next Steps for DevOps Teams
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Audit your current &lt;code&gt;config.json&lt;/code&gt; files for third-party MCP servers.&lt;/li&gt;
&lt;li&gt;Wrap your agent runtime in a Docker container with the &lt;code&gt;--read-only&lt;/code&gt; flag.&lt;/li&gt;
&lt;li&gt;Implement an egress allow-list to restrict tool communications.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>aiagentsecurity</category>
      <category>modelcontextprotocol</category>
      <category>devopssecurity</category>
      <category>llmsandboxing</category>
    </item>
    <item>
      <title>Kubernetes Troubleshooting: Why Did My Pod Die?</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Tue, 23 Jun 2026 08:32:37 +0000</pubDate>
      <link>https://dev.to/devopsstart/kubernetes-troubleshooting-why-did-my-pod-die-1m1p</link>
      <guid>https://dev.to/devopsstart/kubernetes-troubleshooting-why-did-my-pod-die-1m1p</guid>
      <description>&lt;p&gt;Pods die because of scheduling failures, startup crashes or runtime terminations. When a Kubernetes pod fails, it rarely tells you exactly why in a single line. Instead, it gives you a status like &lt;code&gt;CrashLoopBackOff&lt;/code&gt; or &lt;code&gt;Pending&lt;/code&gt;, which are symptoms rather than root causes. To fix a pod, you must distinguish between these three failure modes. This guide provides a decision tree for diagnosing pod deaths, moving from high-level status checks to deep-dive container inspection.&lt;/p&gt;

&lt;p&gt;For a comprehensive look at the most common restart cycles, check out the guide on &lt;a href="https://dev.to/troubleshooting/how-to-fix-kubernetes-crashloopbackoff-in-production"&gt;how to fix Kubernetes CrashLoopBackOff in production&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the failure states
&lt;/h2&gt;

&lt;p&gt;A pod "death" is a transition in the Pod Lifecycle. When you see &lt;code&gt;CrashLoopBackOff&lt;/code&gt;, the container started, crashed, and Kubernetes is now waiting an exponentially increasing amount of time before trying to start it again to avoid hammering the node.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;OOMKilled&lt;/code&gt; means the Linux kernel terminated the process because it exceeded its memory limit. I have seen this happen frequently in Java applications where the JVM heap is set higher than the Kubernetes memory limit. &lt;code&gt;ImagePullBackOff&lt;/code&gt; means the kubelet cannot retrieve the container image from the registry. Each of these states points to a different layer of the stack: the infrastructure, the container runtime or the application code itself. Refer to the official &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/" rel="noopener noreferrer"&gt;Kubernetes Pod Lifecycle documentation&lt;/a&gt; for the full state machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common root causes
&lt;/h2&gt;

&lt;p&gt;Pod failures generally fall into three categories.&lt;/p&gt;

&lt;h3&gt;
  
  
  Startup and Configuration Failures
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Missing Environment Variables&lt;/strong&gt;: The application panics immediately because a required database URL or API key is missing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invalid Image Tags&lt;/strong&gt;: A typo in the image version or a deleted tag in the registry leads to &lt;code&gt;ErrImagePull&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Registry Authentication&lt;/strong&gt;: The &lt;code&gt;imagePullSecrets&lt;/code&gt; are missing or the service account lacks permission to pull from a private repository.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Resource and Infrastructure Constraints
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Limits (OOMKilled)&lt;/strong&gt;: The container attempted to allocate more memory than defined in its &lt;code&gt;limits&lt;/code&gt; section.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU Throttling&lt;/strong&gt;: Extreme throttling can trigger Liveness probe timeouts, causing Kubernetes to kill and restart the pod.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduling Constraints&lt;/strong&gt;: Pods stuck in &lt;code&gt;Pending&lt;/code&gt; usually suffer from "Insufficient cpu" or "Insufficient memory" on all available nodes, or they have &lt;code&gt;nodeSelector&lt;/code&gt; constraints that no node meets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Health Check Failures
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Liveness Probe Misconfiguration&lt;/strong&gt;: The probe checks a &lt;code&gt;/health&lt;/code&gt; endpoint that takes 10 seconds to respond, but the &lt;code&gt;timeoutSeconds&lt;/code&gt; is set to 1. Kubernetes assumes the app is dead and kills it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow Startup&lt;/strong&gt;: Heavy frameworks like Spring Boot may take 60 seconds to start. If the Liveness probe starts checking after 10 seconds, the pod is killed before it ever becomes ready.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step-by-step recovery process
&lt;/h2&gt;

&lt;p&gt;Follow this decision tree to isolate the root cause.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Identify the Symptom
&lt;/h3&gt;

&lt;p&gt;Start with the high-level status.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get pods
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Observation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If status is &lt;code&gt;Pending&lt;/code&gt; → Go to Step 2.&lt;/li&gt;
&lt;li&gt;If status is &lt;code&gt;CrashLoopBackOff&lt;/code&gt; or &lt;code&gt;Error&lt;/code&gt; → Go to Step 3.&lt;/li&gt;
&lt;li&gt;If status is &lt;code&gt;Running&lt;/code&gt; but the pod keeps restarting → Go to Step 4.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Debugging Pending Pods
&lt;/h3&gt;

&lt;p&gt;If the pod isn't even starting, check the events.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at the &lt;code&gt;Events&lt;/code&gt; section at the bottom. If you see &lt;code&gt;FailedScheduling&lt;/code&gt;, check for taints or resource pressure. If you see &lt;code&gt;FailedMount&lt;/code&gt;, your PVC is likely stuck in another zone or not bound.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Debugging CrashLoops and Errors
&lt;/h3&gt;

&lt;p&gt;If the pod starts and then dies, check the application logs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--previous&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--previous&lt;/code&gt; flag is critical. It allows you to see the logs from the container that just crashed, rather than the logs of the new container currently starting.&lt;/p&gt;

&lt;p&gt;If logs are empty, check the exit code using &lt;code&gt;kubectl describe pod&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
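&lt;strong&gt;Exit Code 137&lt;/strong&gt;: The container was killed with &lt;code&gt;SIGKILL&lt;/code&gt; (128 + 9). This is usually &lt;code&gt;OOMKilled&lt;/code&gt;; confirm it in the &lt;code&gt;Reason&lt;/code&gt; field under &lt;code&gt;Last State&lt;/code&gt;, then raise the memory limit in your manifest or fix the leak.&lt;/li&gt;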
&lt;li&gt;
&lt;strong&gt;Exit Code 1&lt;/strong&gt;: Application crash (NullPointerException, missing config, etc.).&lt;/li&gt;
&lt;/ul&gt;
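
&lt;p&gt;Exit codes above 128 follow the shell convention of 128 plus the signal number, which is why 137 maps to signal 9. As a small illustration (the &lt;code&gt;describe_exit_code&lt;/code&gt; helper is a hypothetical name, and the signal names assume a Linux host), the decoding can be sketched in Python:&lt;/p&gt;

```python
import signal


def describe_exit_code(code):
    """Explain a container exit code using the 128 + signal convention."""
    if code == 0:
        return "exited cleanly"
    if code in range(129, 160):
        # Codes above 128 mean the process was ended by signal (code - 128).
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = "signal " + str(code - 128)
        return "killed by " + name
    return "application error (exit status " + str(code) + ")"


print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(143))  # killed by SIGTERM
print(describe_exit_code(1))    # application error (exit status 1)
```

&lt;p&gt;A 143 usually means the container was asked to stop with &lt;code&gt;SIGTERM&lt;/code&gt;, for example during a rollout, rather than crashing by itself.&lt;/p&gt;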

&lt;h3&gt;
  
  
  Step 4: Debugging Pods that Restart
&lt;/h3&gt;

&lt;p&gt;If the pod is &lt;code&gt;Running&lt;/code&gt; but the restart count is climbing, the Liveness probe is likely killing it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s2"&gt;"Liveness probe failed"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implement a &lt;code&gt;startupProbe&lt;/code&gt;. Kubernetes holds off Liveness and Readiness probes until the startup probe succeeds, so a slow boot is no longer mistaken for a deadlock. In clusters with slow-starting legacy apps, this removes most of the restarts caused by probes firing during boot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example Startup Probe for slow apps&lt;/span&gt;
&lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/healthz&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
  &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
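
&lt;p&gt;The two numbers in the manifest above set the startup budget: the container gets roughly &lt;code&gt;failureThreshold&lt;/code&gt; times &lt;code&gt;periodSeconds&lt;/code&gt; seconds before Kubernetes gives up and restarts it. A short Python sketch of that arithmetic follows; the helper name and the 60-second boot time are illustrative assumptions.&lt;/p&gt;

```python
def startup_budget_seconds(failure_threshold, period_seconds, initial_delay_seconds=0):
    """Longest time a container may take to pass its startup probe."""
    return initial_delay_seconds + failure_threshold * period_seconds


# Values from the manifest above: failureThreshold=30, periodSeconds=10.
budget = startup_budget_seconds(30, 10)
print(budget)  # 300

# A hypothetical app that needs about 60 seconds to boot keeps this much headroom.
print(budget - 60)  # 240
```

&lt;p&gt;If the slowest observed boot approaches the budget, raise &lt;code&gt;failureThreshold&lt;/code&gt; rather than loosening the Liveness probe.&lt;/p&gt;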



&lt;h3&gt;
  
  
  Step 5: Advanced Inspection
&lt;/h3&gt;

&lt;p&gt;If you cannot get logs and the pod dies too fast to &lt;code&gt;exec&lt;/code&gt; into, use an ephemeral debug container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl debug &lt;span class="nt"&gt;-it&lt;/span&gt; &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox &lt;span class="nt"&gt;--target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;container-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



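&lt;p&gt;This adds an ephemeral container to the failing pod without restarting it. The container shares the pod's network namespace and, because of &lt;code&gt;--target&lt;/code&gt;, the process namespace of the target container, so you can inspect processes and network state in real time and reach the target's filesystem through &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/root&lt;/code&gt;.&lt;/p&gt;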

&lt;h2&gt;
  
  
  How to prevent future failures
&lt;/h2&gt;

&lt;p&gt;Preventing pod failure requires a production-ready manifest checklist. Never deploy a pod without these four elements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explicit Resource Requests and Limits&lt;/strong&gt;: Set &lt;code&gt;requests&lt;/code&gt; to what the app needs to run and &lt;code&gt;limits&lt;/code&gt; to a reasonable ceiling. This prevents a single pod from consuming all node memory and triggering a node-wide Out-of-Memory event.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proper Probe Hierarchy&lt;/strong&gt;: Use &lt;code&gt;startupProbe&lt;/code&gt; for initial boot, &lt;code&gt;livenessProbe&lt;/code&gt; for deadlock detection and &lt;code&gt;readinessProbe&lt;/code&gt; to control traffic flow.&lt;/li&gt;
&lt;li&gt;
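&lt;strong&gt;Non-Root Users&lt;/strong&gt;: Set &lt;code&gt;runAsNonRoot&lt;/code&gt;, &lt;code&gt;runAsUser&lt;/code&gt; and &lt;code&gt;fsGroup&lt;/code&gt; in &lt;code&gt;securityContext&lt;/code&gt; so the container runs unprivileged and can still write to its mounted volumes instead of crashing on permission errors.&lt;/li&gt;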
&lt;li&gt;
&lt;strong&gt;Graceful Shutdown&lt;/strong&gt;: Handle &lt;code&gt;SIGTERM&lt;/code&gt; in your application code. This lets the application drain in-flight connections before &lt;code&gt;terminationGracePeriodSeconds&lt;/code&gt; (30 seconds by default) expires and Kubernetes sends &lt;code&gt;SIGKILL&lt;/code&gt;, avoiding 502 errors during deployments.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>kubernetestroubleshooting</category>
      <category>crashloopbackoff</category>
      <category>kubernetespods</category>
      <category>devopsguide</category>
    </item>
    <item>
      <title>How to Debug OOMKilled Pods in Kubernetes: A Step-by-Step Guide</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Mon, 22 Jun 2026 09:24:18 +0000</pubDate>
      <link>https://dev.to/devopsstart/how-to-debug-oomkilled-pods-in-kubernetes-a-step-by-step-guide-2daj</link>
      <guid>https://dev.to/devopsstart/how-to-debug-oomkilled-pods-in-kubernetes-a-step-by-step-guide-2daj</guid>
      <description>&lt;h2&gt;
  
  
  What Does OOMKilled Actually Mean?
&lt;/h2&gt;

&lt;p&gt;An &lt;code&gt;OOMKilled&lt;/code&gt; status occurs when the Linux kernel's Out-of-Memory (OOM) killer terminates a process to prevent a system-wide crash. In Kubernetes, this is signaled by Exit Code 137. The primary goal of the OOM killer is to reclaim memory to keep the underlying node stable.&lt;/p&gt;

&lt;p&gt;There are two distinct triggers for this event. First, a container may exceed its specified memory &lt;code&gt;limit&lt;/code&gt;, in which case the kernel's cgroup OOM killer terminates a process inside that container. Second, the entire node may run out of memory, forcing the kernel to select a "victim" process; the kubelet biases that choice by assigning each container an &lt;code&gt;oom_score_adj&lt;/code&gt; based on the pod's Quality of Service (QoS) class. I've seen this happen frequently in clusters where "Burstable" pods are overcommitted, causing the kernel to kill pods that were technically under their own limits just to save the node. Detailed resource management specifications are available in the &lt;a href="https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/" rel="noopener noreferrer"&gt;official Kubernetes documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is Your Pod Getting OOMKilled?
&lt;/h2&gt;

&lt;p&gt;Root causes typically fall into three categories: application leaks, misconfigured limits, or node-level pressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Memory Leaks
&lt;/h3&gt;

&lt;p&gt;This is the most common cause for pods that operate normally for several hours before crashing. A leak occurs when an application allocates memory but fails to release it. In Java, this often happens when objects are stored in static collections that are never cleared. In Go, common culprits include leaked goroutines that never exit or slices that grow indefinitely. In these cases, memory usage climbs steadily until it hits the hard limit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Improper Resource Limits
&lt;/h3&gt;

&lt;p&gt;Engineers often set &lt;code&gt;limits&lt;/code&gt; too close to &lt;code&gt;requests&lt;/code&gt;. Many applications have a "bursty" startup phase. For example, a Spring Boot application loading numerous beans into memory may spike to 600Mi during initialization. If the &lt;code&gt;limit&lt;/code&gt; is set to 512Mi, the pod will be killed before it ever reaches a healthy state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Node-Level Memory Pressure
&lt;/h3&gt;

&lt;p&gt;When multiple pods are configured with &lt;code&gt;requests&lt;/code&gt; significantly lower than their &lt;code&gt;limits&lt;/code&gt;, they can all attempt to burst simultaneously. If the aggregate usage exceeds the node's physical RAM, the kernel triggers a node-level OOM event. The kernel targets pods with the lowest priority or those consuming the most memory relative to their requested amount.&lt;/p&gt;
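
&lt;p&gt;A quick way to see whether a node is exposed to this is to compare the sum of memory limits with what the node can actually provide. The sketch below uses hypothetical figures (six pods, 8 GiB allocatable) and a made-up helper name; it only illustrates the arithmetic.&lt;/p&gt;

```python
def overcommit_ratio(pod_limits_mib, node_allocatable_mib):
    """Sum of memory limits divided by the memory the node can hand out."""
    return sum(pod_limits_mib) / node_allocatable_mib


# Hypothetical node: 8 GiB allocatable, six pods that may each burst to 2 GiB.
ratio = overcommit_ratio([2048] * 6, 8192)
print(ratio)  # 1.5
```

&lt;p&gt;Any ratio above 1.0 means the node cannot satisfy every pod bursting to its limit at the same time, so a node-level OOM event becomes possible.&lt;/p&gt;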

&lt;h2&gt;
  
  
  How to Diagnose and Fix OOMKilled Pods
&lt;/h2&gt;

&lt;p&gt;Follow this systematic workflow to move from a crashing pod to a verified root cause.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Confirm the Termination Reason
&lt;/h3&gt;

&lt;p&gt;Identify the failing pod and verify the exit code using &lt;code&gt;kubectl&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get pods
NAME                               READY   STATUS             RESTARTS   AGE
api-gateway-7f8db6d9-abc12          0/1     CrashLoopBackOff   4          12m

&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl describe pod api-gateway-7f8db6d9-abc12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Locate the &lt;code&gt;Last State&lt;/code&gt; section in the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Containers:
  api-container:
    State:          Running
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Finished:     Thu, 24 Oct 2024 14:22:01 +0000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Reason: OOMKilled&lt;/code&gt; confirms an OOM event; &lt;code&gt;Exit Code: 137&lt;/code&gt; on its own only shows that the process received &lt;code&gt;SIGKILL&lt;/code&gt;. If the pod cycles through crashes without this reason, refer to our guide on &lt;a href="https://dev.to/troubleshooting/crashloopbackoff-kubernetes"&gt;Fix CrashLoopBackOff in Kubernetes Pods&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Analyze the "Working Set" Metric
&lt;/h3&gt;

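&lt;p&gt;Avoid using &lt;code&gt;container_memory_usage_bytes&lt;/code&gt; in Prometheus, as it includes page cache that the kernel can reclaim under pressure. Track &lt;code&gt;container_memory_working_set_bytes&lt;/code&gt; instead: it excludes inactive file cache, it is the figure the kubelet uses for eviction decisions, and it is the closest metric to the memory that counts against the limit.&lt;/p&gt;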
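
&lt;p&gt;To make the difference between the two metrics concrete: cAdvisor derives the working set by subtracting inactive file cache from total usage. The following Python sketch reproduces that subtraction with hypothetical byte counts; the function name is illustrative.&lt;/p&gt;

```python
MIB = 1024 * 1024


def working_set_bytes(usage_bytes, inactive_file_bytes):
    """Working set as cAdvisor derives it: usage minus inactive file cache."""
    return max(0, usage_bytes - inactive_file_bytes)


# Hypothetical container: 900 MiB total usage, 300 MiB of it reclaimable cache.
usage = 900 * MIB
inactive_file = 300 * MIB
print(working_set_bytes(usage, inactive_file) // MIB)  # 600
```

&lt;p&gt;In this example a dashboard built on usage would show 900 MiB and suggest the pod is near a 1Gi limit, while only 600 MiB is memory the kernel cannot reclaim.&lt;/p&gt;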

&lt;p&gt;Execute this PromQL query in your Grafana dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sum by (pod) (container_memory_working_set_bytes{pod="api-gateway-7f8db6d9-abc12", container!=""})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A "sawtooth" pattern (steady growth followed by a sharp drop to zero) indicates a memory leak. A flat line with a sudden, vertical spike suggests a resource limit issue or a specific heavy request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Application Profiling
&lt;/h3&gt;

&lt;p&gt;Increasing limits without profiling only delays the crash. If a leak is suspected, you must analyze the heap.&lt;/p&gt;

&lt;p&gt;For Go applications, integrate &lt;code&gt;pprof&lt;/code&gt;. Add this to your &lt;code&gt;main.go&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
    &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="s"&gt;"net/http/pprof"&lt;/span&gt; &lt;span class="c"&gt;// registers the /debug/pprof handlers on the default mux&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Serve pprof on a separate, loopback-only port so profiles are not exposed to the network&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost:6060"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}()&lt;/span&gt;
    &lt;span class="c"&gt;// your app logic&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Capture a heap profile while the pod is under load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec &lt;/span&gt;api-gateway-7f8db6d9-abc12 &lt;span class="nt"&gt;--&lt;/span&gt; curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:6060/debug/pprof/heap &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; heap.pprof
go tool pprof &lt;span class="nt"&gt;-http&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;:8080 heap.pprof
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Java applications, trigger a heap dump using &lt;code&gt;jcmd&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec &lt;/span&gt;api-gateway-7f8db6d9-abc12 &lt;span class="nt"&gt;--&lt;/span&gt; jcmd 1 GC.heap_dump /tmp/heapdump.hprof
kubectl &lt;span class="nb"&gt;cp &lt;/span&gt;api-gateway-7f8db6d9-abc12:/tmp/heapdump.hprof ./heapdump.hprof
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Analyze the &lt;code&gt;.hprof&lt;/code&gt; file in VisualVM or Eclipse MAT to identify the leaking class.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Right-Sizing the Limits
&lt;/h3&gt;

&lt;p&gt;Calculate the new limit based on observed peaks. A common rule of thumb is to set the &lt;code&gt;limit&lt;/code&gt; 20% to 30% above the peak &lt;code&gt;container_memory_working_set_bytes&lt;/code&gt; observed during a full load test.&lt;/p&gt;
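
&lt;p&gt;As a worked example of that headroom rule, the sketch below turns an observed peak into a range of candidate limits. The 800 MiB peak and the helper name are assumptions made for illustration, not measurements.&lt;/p&gt;

```python
MIB = 1024 * 1024


def limit_range_mib(peak_working_set_bytes, low=0.20, high=0.30):
    """Return (min, max) memory limits in MiB with 20% to 30% headroom."""
    peak_mib = peak_working_set_bytes / MIB
    return (round(peak_mib * (1 + low)), round(peak_mib * (1 + high)))


# Hypothetical peak of 800 MiB observed during a load test.
print(limit_range_mib(800 * MIB))  # (960, 1040)
```

&lt;p&gt;For a peak like this one, a limit of 1Gi (1024Mi) falls inside the suggested range.&lt;/p&gt;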

&lt;p&gt;Update your deployment manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250m"&lt;/span&gt;
  &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1Gi"&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;500m"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply the change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to Prevent OOMKills in the Future
&lt;/h2&gt;

&lt;p&gt;To stop reacting to memory crashes, implement these three architectural strategies.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deploy Vertical Pod Autoscaler (VPA)&lt;/strong&gt;: Run VPA in recommendation-only mode (&lt;code&gt;updateMode: "Off"&lt;/code&gt;). Its Recommender component analyzes historical usage and suggests resource requests, reducing the guesswork that leads to OOM events.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enforce Guaranteed QoS&lt;/strong&gt;: For critical workloads, set &lt;code&gt;requests&lt;/code&gt; exactly equal to &lt;code&gt;limits&lt;/code&gt; for both CPU and memory in every container. This assigns the pod to the "Guaranteed" QoS class, making it the last candidate for eviction when the node runs out of RAM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Coordinate with HPA&lt;/strong&gt;: Ensure your Horizontal Pod Autoscaler (HPA) is tuned. If HPA triggers new pods based on CPU but your pods are OOMKilled due to memory, you'll experience cascading failures. Read &lt;a href="https://dev.to/blog/kubernetes-hpa-deep-dive-autoscaling-explained"&gt;Kubernetes HPA Deep Dive: Autoscaling Explained&lt;/a&gt; for coordination tips.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
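
&lt;p&gt;The QoS class mentioned in the second strategy is derived from the manifest, so it can be predicted before deploying. Here is a simplified Python sketch of the documented rules; the &lt;code&gt;qos_class&lt;/code&gt; function is a hypothetical helper that compares quantities as plain strings, so &lt;code&gt;"500m"&lt;/code&gt; and &lt;code&gt;"0.5"&lt;/code&gt; would not be treated as equal.&lt;/p&gt;

```python
def qos_class(containers):
    """Classify a pod from its containers' resources, per the documented QoS rules.

    Each container is a dict with optional "requests" and "limits" dicts.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit, so only the limit
            # must exist and any explicit request must match it.
            if limit is None or requests.get(resource, limit) != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# The resources block from Step 4: requests are lower than limits.
step4 = {
    "requests": {"memory": "512Mi", "cpu": "250m"},
    "limits": {"memory": "1Gi", "cpu": "500m"},
}
print(qos_class([step4]))  # Burstable
```

&lt;p&gt;Raising the requests in that manifest to match the limits would move the pod to "Guaranteed".&lt;/p&gt;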

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Why did my pod get OOMKilled even though it was below its limit?&lt;/strong&gt;&lt;br&gt;
A: This is a node-level OOM event. When the physical RAM of the node is exhausted, the kernel kills pods based on QoS class. "BestEffort" pods die first, then "Burstable" pods. If your pod is "Burstable" and the node is under pressure, it can be killed regardless of its individual limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What is the difference between RSS and Working Set?&lt;/strong&gt;&lt;br&gt;
A: Resident Set Size (RSS) is the memory physically held in RAM. Working Set, as cAdvisor reports it, is total usage minus inactive file cache, so it covers RSS plus page cache that is still in active use. Kubernetes relies on Working Set for eviction decisions because it approximates the memory the container cannot give back under pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Should I enable swap in Kubernetes to prevent OOMKills?&lt;/strong&gt;&lt;br&gt;
A: No. While Kubernetes 1.28+ has improved swap support, relying on swap usually masks memory leaks and introduces severe latency spikes. It is better to fix the leak or increase the node size.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Next Steps
&lt;/h2&gt;

&lt;p&gt;Debugging &lt;code&gt;OOMKilled&lt;/code&gt; pods requires moving beyond &lt;code&gt;kubectl get pods&lt;/code&gt; and into memory metrics and heap profiling. By distinguishing between container-level and node-level OOM events, you can apply the correct fix, whether that is tuning JVM heap sizes or adjusting your QoS class.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your next steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Audit your current deployments for "Burstable" pods with wide gaps between requests and limits.&lt;/li&gt;
&lt;li&gt;Scrape the kubelet's cAdvisor metrics with Prometheus to track &lt;code&gt;container_memory_working_set_bytes&lt;/code&gt;, and install &lt;code&gt;kube-state-metrics&lt;/code&gt; to compare it against the configured limits.&lt;/li&gt;
&lt;li&gt;Integrate &lt;code&gt;pprof&lt;/code&gt; or &lt;code&gt;jcmd&lt;/code&gt; into your base container images to make profiling immediate during incidents.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>kubernetestroubleshooting</category>
      <category>oomkilled</category>
      <category>kubernetesmemorylimits</category>
      <category>devopsguide</category>
    </item>
    <item>
      <title>Cilium vs Calico: CNI Comparison for Platform Teams</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Sun, 21 Jun 2026 08:47:09 +0000</pubDate>
      <link>https://dev.to/devopsstart/cilium-vs-calico-cni-comparison-for-platform-teams-54i2</link>
      <guid>https://dev.to/devopsstart/cilium-vs-calico-cni-comparison-for-platform-teams-54i2</guid>
      <description>&lt;p&gt;Choosing a Container Network Interface (CNI) for Kubernetes used to be a simple decision. Does it connect pods? Good enough. Today, that choice is a foundational platform decision that dictates your cluster's security posture, performance limits, and observability capabilities for years to come. Your CNI is no longer just a networking plugin; it's a critical piece of infrastructure that can either accelerate or bottleneck your entire stack.&lt;/p&gt;

&lt;p&gt;In 2026, the conversation is dominated by two leaders: Cilium and Calico. Both are powerful, mature, and trusted in massive production environments. But they represent different philosophies and technical approaches. Calico offers battle-tested flexibility with a rich history, while Cilium represents a new, eBPF-native world of networking. This guide will cut through the noise to help your platform team make the right long-term decision, focusing not just on features, but on the strategic implications for performance, security, and operations at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Differentiator: The Rise of eBPF
&lt;/h2&gt;

&lt;p&gt;To understand the Cilium vs. Calico debate, you first have to understand eBPF (extended Berkeley Packet Filter). Think of eBPF as a way to run sandboxed, event-driven programs directly within the Linux kernel without changing kernel source code. For networking, this is a game-changer. You can find a deep dive on the technology at the official &lt;a href="https://ebpf.io/what-is-ebpf/" rel="noopener noreferrer"&gt;eBPF &amp;amp; Cilium community site&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For years, Kubernetes networking relied on &lt;code&gt;iptables&lt;/code&gt; or &lt;code&gt;IPVS&lt;/code&gt;. These tools are powerful but operate by creating long, complex chains of rules. As your cluster grows, with thousands of services and pods, traversing these chains for every single packet adds significant CPU overhead and latency. It's like sending a package through a city with thousands of traffic stops.&lt;/p&gt;

&lt;p&gt;eBPF offers a direct path. eBPF programs can be attached to network interfaces to make intelligent routing and filtering decisions right at the source, bypassing the cumbersome &lt;code&gt;iptables&lt;/code&gt; and &lt;code&gt;conntrack&lt;/code&gt; subsystems entirely. The result is a faster, more efficient, and more programmable data path.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cilium&lt;/strong&gt; was built from the ground up with eBPF as its core. Every feature (networking, observability, and security) is designed to leverage the power and efficiency of eBPF.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calico&lt;/strong&gt; started with a standard Linux networking data plane that uses &lt;code&gt;iptables&lt;/code&gt; for policy enforcement, with BGP for route distribution and optional IP-in-IP encapsulation. It's incredibly robust and well-understood. Recognizing the power of eBPF, Calico introduced a pluggable eBPF data plane. This gives you a choice, but it also means eBPF is an add-on to its core architecture rather than its native foundation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This fundamental difference in architecture is the source of nearly every other distinction between the two projects.&lt;/p&gt;
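
&lt;p&gt;To make that difference concrete, here is a rough sketch of how the eBPF data path might be switched on in each project. It assumes Cilium is installed with Helm and Calico with the Tigera operator; the API server address is a placeholder, and the option names should be checked against the versions you actually run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Cilium: Helm values (sketch). eBPF is always on; these values
# additionally replace kube-proxy with Cilium's eBPF service handling.
kubeProxyReplacement: true    # older releases used the value "strict"
k8sServiceHost: 10.0.0.10     # placeholder: your API server address
k8sServicePort: 6443          # placeholder: your API server port
---
# Calico: operator Installation resource (sketch). eBPF is opt-in.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: BPF       # the default is Iptables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In Cilium the eBPF data path is the only one, so the setting only extends what it covers. In Calico it is a switch on an otherwise unchanged installation, and the eBPF mode has additional prerequisites (such as direct access to the API server) that are not shown here.&lt;/p&gt;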

&lt;h2&gt;
  
  
  Network Policy and Security Showdown
&lt;/h2&gt;

&lt;p&gt;Both Cilium and Calico provide robust network security, but their approaches and advanced capabilities differ significantly due to eBPF.&lt;/p&gt;

&lt;h3&gt;
  
  
  Calico's Policy Engine
&lt;/h3&gt;

&lt;p&gt;Calico has long been the gold standard for network policy. It supports standard Kubernetes &lt;code&gt;NetworkPolicy&lt;/code&gt; resources and extends them with its own resources under &lt;code&gt;projectcalico.org/v3&lt;/code&gt;: a namespaced &lt;code&gt;NetworkPolicy&lt;/code&gt; and a cluster-wide &lt;code&gt;GlobalNetworkPolicy&lt;/code&gt;. Both add richer features such as explicit &lt;code&gt;Deny&lt;/code&gt; rules and rule ordering.&lt;/p&gt;

&lt;p&gt;A typical Calico policy is direct and focuses on L3/L4 (IP and port) filtering. Here's an example that allows ingress from pods with the &lt;code&gt;app: frontend&lt;/code&gt; label to port &lt;code&gt;6379/TCP&lt;/code&gt; on pods with the &lt;code&gt;app: database&lt;/code&gt; label.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;projectcalico.org/v3&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allow-frontend-to-db&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app == 'database'&lt;/span&gt;
  &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
    &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app == 'frontend'&lt;/span&gt;
    &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;6379&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is clear, effective, and works reliably across Calico's standard and eBPF data planes. For many organizations, this level of control is perfectly sufficient.&lt;/p&gt;
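
&lt;p&gt;The explicit deny rules and ordering mentioned earlier can be sketched with a &lt;code&gt;GlobalNetworkPolicy&lt;/code&gt;. The policy name and the &lt;code&gt;order&lt;/code&gt; value below are illustrative assumptions, not taken from a real deployment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: database-ingress-guard   # hypothetical name
spec:
  order: 100                     # lower values are evaluated first
  selector: app == 'database'    # applies in every namespace
  types:
  - Ingress
  ingress:
  # Rules are evaluated top to bottom; the first match decides.
  - action: Allow
    protocol: TCP
    source:
      selector: app == 'frontend'
    destination:
      ports:
      - 6379
  # Everything that did not match the rule above is denied explicitly.
  - action: Deny
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the policy is cluster-wide, it acts as a guardrail that namespace owners cannot accidentally loosen, which is difficult to express with the standard Kubernetes &lt;code&gt;NetworkPolicy&lt;/code&gt; alone.&lt;/p&gt;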

&lt;h3&gt;
  
  
  Cilium's L7-Aware Security
&lt;/h3&gt;

&lt;p&gt;Cilium can enforce policy at a much higher level of the stack. Its eBPF data path transparently redirects selected traffic to a node-local Envoy proxy, which parses application-layer protocols like HTTP, gRPC, and Kafka. This enables granular, identity-aware policies that &lt;code&gt;iptables&lt;/code&gt;-based systems cannot express.&lt;/p&gt;

&lt;p&gt;Imagine you want to allow the &lt;code&gt;billing-service&lt;/code&gt; to read from a &lt;code&gt;metrics-api&lt;/code&gt; but not write to it. With Cilium, you can enforce this at the API level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cilium.io/v2"&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumNetworkPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-aware-policy-example&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpointSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metrics-api&lt;/span&gt;
  &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;fromEndpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;billing-service&lt;/span&gt;
    &lt;span class="na"&gt;toPorts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080"&lt;/span&gt;
        &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET"&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/v1/metrics"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This policy enforces that only &lt;code&gt;GET&lt;/code&gt; requests to &lt;code&gt;/api/v1/metrics&lt;/code&gt; are allowed from the &lt;code&gt;billing-service&lt;/code&gt;. Any &lt;code&gt;POST&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;, or other request is rejected by Cilium's proxy (typically with an HTTP 403 response), even if it arrives on the correct port. This is a significant security enhancement: it brings zero-trust enforcement to the API level without requiring a service mesh.&lt;/p&gt;

&lt;p&gt;For encryption, both projects support WireGuard for transparent, in-transit encryption between nodes, providing a modern and performant alternative to IPsec.&lt;/p&gt;
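
&lt;p&gt;As a minimal sketch, enabling WireGuard might look like the following. It assumes a Helm-based Cilium install and a Calico cluster whose nodes have WireGuard support in the kernel; verify the settings against your versions before relying on them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Cilium: Helm values (sketch)
encryption:
  enabled: true
  type: wireguard
---
# Calico: FelixConfiguration (sketch)
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  wireguardEnabled: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you already customize the default &lt;code&gt;FelixConfiguration&lt;/code&gt;, patch the existing resource rather than overwriting it with this fragment.&lt;/p&gt;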

&lt;h2&gt;
  
  
  Performance and Scalability in 2026
&lt;/h2&gt;

&lt;p&gt;In a small cluster, the performance difference between Cilium and Calico's eBPF mode might be negligible. But when you scale to hundreds of nodes and tens of thousands of pods, the architectural differences become stark.&lt;/p&gt;

&lt;p&gt;The primary performance advantage of eBPF comes from avoiding the &lt;code&gt;iptables&lt;/code&gt; and &lt;code&gt;conntrack&lt;/code&gt; kernel subsystems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced CPU Overhead:&lt;/strong&gt; Every packet doesn't need to traverse long chains of &lt;code&gt;iptables&lt;/code&gt; rules. This frees up significant CPU cycles on each node, which can then be used by your applications. This is especially noticeable on nodes with high connection churn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower Latency:&lt;/strong&gt; By processing packets on a more direct path, eBPF reduces the per-packet latency within the cluster. This can have a meaningful impact on the performance of latency-sensitive applications like databases or real-time APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Throughput:&lt;/strong&gt; The efficiency of the eBPF data path allows for higher overall network throughput, a key consideration for data-intensive workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While Calico's eBPF mode also gains these benefits, Cilium's eBPF implementation is often considered more mature and deeply integrated, having been the project's sole focus since its inception. In large-scale clusters, this maturity can translate to better stability and more predictable performance under extreme load. If you are deploying large clusters, for example when you &lt;a href="https://dev.to/tutorials/deploy-eks-cluster-with-terraform"&gt;deploy an EKS cluster with Terraform&lt;/a&gt;, this long-term performance profile becomes a critical factor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability: Seeing the Unseen
&lt;/h2&gt;

&lt;p&gt;Troubleshooting network issues in Kubernetes can be a nightmare. Is a packet being dropped by a network policy? Is a service not responding? This is where built-in observability tools make a huge difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cilium's Hubble
&lt;/h3&gt;

&lt;p&gt;Cilium comes with Hubble, a powerful, purpose-built observability platform that leverages eBPF to provide deep visibility into network flows without any application instrumentation. It gives you a service dependency map, real-time flow data, and policy troubleshooting tools out of the box.&lt;/p&gt;

&lt;p&gt;With the Hubble CLI, you can immediately see what's happening. For instance, to see all DNS requests and replies in real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="c"&gt;# Assuming Cilium v1.15+ and Hubble is enabled&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;hubble observe &lt;span class="nt"&gt;--protocol&lt;/span&gt; dns &lt;span class="nt"&gt;-f&lt;/span&gt;

TIMESTAMP                  SOURCE                      DESTINATION                 TYPE          VERDICT   SUMMARY
May 20 12:34:56.123Z   kube-system/coredns-abcde   default/my-app-fghij        L7:response   FORWARDED   DNS Qry: A, Rcode: NOERROR
May 20 12:34:56.456Z   default/my-app-fghij        kube-system/coredns-abcde   L7:request    FORWARDED   DNS Qry: my-external-service.com.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Hubble UI provides a graphical representation of this data, making it incredibly easy to spot anomalies or understand service communication patterns. If you're constantly fighting weird networking bugs or dealing with a &lt;code&gt;CrashLoopBackOff&lt;/code&gt;, this level of insight can save hours of debugging.&lt;/p&gt;
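
&lt;p&gt;Hubble is not necessarily enabled by default. Assuming a Helm-based install, a minimal set of values to turn on Hubble, its relay, and the UI might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Cilium Helm values (sketch)
hubble:
  enabled: true
  relay:
    enabled: true   # aggregates flows from all nodes for the CLI and UI
  ui:
    enabled: true   # deploys the graphical service map
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;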

&lt;h3&gt;
  
  
  Calico's Observability
&lt;/h3&gt;

&lt;p&gt;Calico Enterprise offers similar observability features, including a dynamic service graph and flow visualization. In the open-source version, observability is typically achieved by exporting metrics to Prometheus and building Grafana dashboards. This is a powerful and flexible approach, but it requires more setup and integration effort. You can get flow logs, but they are not as rich or easily queryable as what Hubble provides out of the box.&lt;/p&gt;

&lt;p&gt;For teams that want a "batteries-included" observability solution tightly coupled with their CNI, Hubble is a clear winner. For teams that prefer to build their own observability stack around standards like Prometheus, Calico's approach is perfectly viable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Blurring Lines with Service Mesh
&lt;/h2&gt;

&lt;p&gt;The next battleground for CNIs is the service mesh. Both Cilium and Calico are expanding their feature sets to provide capabilities traditionally associated with tools like Istio or Linkerd.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cilium Service Mesh:&lt;/strong&gt; Cilium offers service mesh features like traffic management (retries, timeouts), canary deployments, and mTLS encryption without requiring a sidecar proxy in every pod. It achieves this with a shared, per-node Envoy proxy, with eBPF steering traffic to it only when L7 processing is needed. This sidecar-less model promises better performance and lower resource overhead compared to traditional service meshes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calico and Istio:&lt;/strong&gt; Calico's strategy is to provide best-in-class integration with Istio. It can accelerate Istio's data plane by offloading some of the networking functions to its eBPF layer. This allows you to combine Calico's robust CNI and policy engine with the full, mature feature set of the Istio service mesh.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This presents a strategic choice: do you want a single, integrated platform for networking, security, and service mesh (Cilium), or do you prefer a modular approach, combining a dedicated CNI (Calico) with a dedicated service mesh (Istio)?&lt;/p&gt;

&lt;h2&gt;
  
  
  Making the Choice: A Decision Framework
&lt;/h2&gt;

&lt;p&gt;There is no single "best" CNI. The right choice depends entirely on your team's priorities, existing infrastructure, and future goals.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose Cilium if
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance is your top priority.&lt;/strong&gt; You run latency-sensitive or high-throughput applications and want to squeeze every ounce of performance from your infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need advanced, L7-aware security.&lt;/strong&gt; Your security model requires enforcing policies at the API level (HTTP, gRPC) and you want to build a zero-trust network.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want built-in, out-of-the-box observability.&lt;/strong&gt; Your team values integrated tools like Hubble for rapid troubleshooting and dependency mapping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are building a new platform and are "all-in" on eBPF.&lt;/strong&gt; You are comfortable adopting a modern, eBPF-native stack and are interested in its integrated service mesh capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Calico if
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need maximum flexibility and a proven track record.&lt;/strong&gt; You operate in diverse environments (on-prem, multiple clouds) and need a CNI that supports various data planes (standard Linux, eBPF, Windows).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your organization already has a significant investment in Istio.&lt;/strong&gt; Calico's deep integration with Istio allows you to enhance your existing service mesh with a powerful CNI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You prefer a more gradual adoption of eBPF.&lt;/strong&gt; You want the option to run some clusters with the well-understood standard Linux data plane while experimenting with the eBPF data plane in others.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your team has deep expertise in traditional Linux networking.&lt;/strong&gt; Calico's architecture will feel familiar and its use of BGP for routing is a standard in many enterprises.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions (FAQ)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Is eBPF mature enough for production in 2026?&lt;/strong&gt;&lt;br&gt;
Absolutely. eBPF has been a stable part of the Linux kernel for years. It's used in production by massive tech companies like Google, Meta, and Netflix to handle networking, security, and observability at an immense scale. Both Cilium and Calico's eBPF implementations are considered production-grade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Can I migrate from Calico to another CNI like Cilium?&lt;/strong&gt;&lt;br&gt;
Yes, but it's a complex and disruptive process. A CNI migration typically requires a "rip and replace" approach, which involves draining nodes, uninstalling the old CNI, installing the new one, and then uncordoning the nodes. This requires careful planning and significant downtime. It's far better to make the right choice from the beginning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Does Calico's standard Linux data plane still have a place?&lt;/strong&gt;&lt;br&gt;
Yes. For environments with older Linux kernels that lack the necessary eBPF features, or for teams who prioritize stability and familiarity over cutting-edge performance, the standard &lt;code&gt;iptables&lt;/code&gt;-based data plane is still a very solid and reliable choice. It's one of Calico's key strengths: providing that flexibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. What about other CNIs like Flannel or Weave Net?&lt;/strong&gt;&lt;br&gt;
While Flannel and Weave Net are excellent for getting started or for smaller, less demanding clusters, they generally lack the advanced security, observability, and performance features of Cilium and Calico. For any serious production platform, the choice has largely narrowed to these two.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Choosing between Cilium and Calico is a strategic decision that will shape your Kubernetes platform for years. It's a choice between an eBPF-native, all-in-one solution that pushes the boundaries of performance and security (Cilium), and a flexible, battle-hardened CNI that offers a choice of data planes and deep integration with the wider ecosystem (Calico).&lt;/p&gt;

&lt;p&gt;Your next step should be hands-on evaluation. Don't just read about it. Spin up two identical test clusters. &lt;a href="https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/" rel="noopener noreferrer"&gt;Install Cilium&lt;/a&gt; on one and &lt;a href="https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart" rel="noopener noreferrer"&gt;Calico with the eBPF data plane&lt;/a&gt; on the other. Deploy a sample application and use tools like &lt;code&gt;netperf&lt;/code&gt; to measure throughput and latency. Test a complex network policy on each. Use Hubble to visualize traffic. Only by seeing how they operate in your environment can you make a truly informed decision for the future of your platform.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cni</category>
      <category>comparison</category>
      <category>cilium</category>
    </item>
    <item>
      <title>NGINX vs Traefik: Kubernetes Ingress Comparison Guide</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Sat, 20 Jun 2026 08:28:19 +0000</pubDate>
      <link>https://dev.to/devopsstart/nginx-vs-traefik-kubernetes-ingress-comparison-guide-21gd</link>
      <guid>https://dev.to/devopsstart/nginx-vs-traefik-kubernetes-ingress-comparison-guide-21gd</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Choosing between NGINX and Traefik isn't about finding the "better" tool, but about deciding where you want your operational toil to live. For a platform engineer, the choice comes down to a trade-off between raw, predictable performance and developer agility. NGINX is the industry titan, offering unmatched throughput and a configuration model that's been battle-tested for decades. Traefik, conversely, was born for the cloud-native era, treating infrastructure as a fluid entity where services appear and disappear in seconds.&lt;/p&gt;

&lt;p&gt;To evaluate these, you must look beyond the feature checklist. You need to consider Day 2 operations: how the controller handles 500+ microservices, the complexity of managing TLS certificates, and the latency introduced during configuration reloads. As the industry shifts toward the &lt;a href="https://kubernetes.io/docs/concepts/services-networking/gateway/" rel="noopener noreferrer"&gt;Kubernetes Gateway API&lt;/a&gt;, both tools are evolving, but their core philosophies remain distinct. You're choosing between a high-performance proxy that adapts to Kubernetes and a Kubernetes-native orchestrator that happens to be a proxy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side-by-Side Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;NGINX Ingress Controller&lt;/th&gt;
&lt;th&gt;Traefik Proxy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Event-driven worker processes (C)&lt;/td&gt;
&lt;td&gt;Goroutine-based (Go)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Config Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Annotations → nginx.conf&lt;/td&gt;
&lt;td&gt;CRDs → Dynamic Config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Config Updates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reloads (potential connection drops)&lt;/td&gt;
&lt;td&gt;Hot-reloads (zero downtime)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TLS/SSL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;External (cert-manager)&lt;/td&gt;
&lt;td&gt;Native ACME/Let's Encrypt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;External (Prometheus/Grafana)&lt;/td&gt;
&lt;td&gt;Built-in Dashboard + Metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher raw throughput, lower CPU&lt;/td&gt;
&lt;td&gt;High, but slightly higher overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning Curve&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Steep for complex routing&lt;/td&gt;
&lt;td&gt;Moderate, native K8s feel&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gateway API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;First-class citizen&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  NGINX: The High-Performance Workhorse
&lt;/h2&gt;

&lt;p&gt;NGINX Ingress is the safe bet for environments where every millisecond of latency counts and traffic patterns are relatively stable. Its strength lies in its efficiency. Because it's written in C, it handles massive concurrency with a smaller memory footprint than Go-based alternatives. In high-load scenarios, NGINX typically maintains a lower p99 latency than Traefik.&lt;/p&gt;

&lt;p&gt;However, the "NGINX way" often involves a heavy reliance on annotations. If you need complex routing, you end up with an &lt;code&gt;Ingress&lt;/code&gt; resource cluttered with &lt;code&gt;nginx.ingress.kubernetes.io&lt;/code&gt; keys. For advanced logic, you have to use "snippets", which are essentially raw NGINX config fragments injected into the template. This is powerful but dangerous: a syntax error in a snippet can produce an invalid NGINX configuration and block configuration updates for every Ingress the controller serves.&lt;/p&gt;
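
&lt;p&gt;For a sense of what a snippet looks like, here is a minimal, hypothetical fragment that adds a response header. The header name and value are made up for illustration, and the annotation only takes effect if snippet annotations are allowed in the controller's configuration, which recent releases tend to disable by default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Fragment of an Ingress manifest (sketch)
metadata:
  annotations:
    # Raw NGINX directives, injected into the generated location block.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Request-Source: edge";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Everything under the annotation is passed through as NGINX configuration, which is why a missing semicolon here is a controller-level problem rather than a problem with one Ingress.&lt;/p&gt;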

&lt;p&gt;One major pain point is the reload mechanism. While the controller tries to minimize disruption, changing certain global settings triggers a reload of the NGINX process. I've seen this cause intermittent connection drops in clusters with &amp;gt;100 nodes when managing long-lived WebSockets or gRPC streams.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;networking.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Ingress&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-gateway&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Example of annotation-heavy config&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/rewrite-target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/backend-protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HTTPS"&lt;/span&gt;
    &lt;span class="na"&gt;nginx.ingress.kubernetes.io/limit-rps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ingressClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api.example.com&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/v1&lt;/span&gt;
        &lt;span class="na"&gt;pathType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prefix&lt;/span&gt;
        &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-service&lt;/span&gt;
            &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Traefik: The Cloud-Native Orchestrator
&lt;/h2&gt;

&lt;p&gt;Traefik is designed for the "churn" of microservices. It doesn't just read a config file; it listens to the Kubernetes API server. When a new service is deployed or an HPA scales your pods, Traefik updates its routing table in real-time without restarting or reloading.&lt;/p&gt;

&lt;p&gt;The standout feature is the native CRD approach. Instead of messy annotations, Traefik uses &lt;code&gt;IngressRoute&lt;/code&gt; and &lt;code&gt;Middleware&lt;/code&gt; resources. This allows you to define a "RateLimit" middleware once and attach it to ten different services, rather than duplicating annotations across ten different Ingress files. This architectural choice reduces YAML duplication by roughly 40% in large-scale deployments.&lt;/p&gt;

&lt;p&gt;The built-in Let's Encrypt integration is a massive win for platform teams. You don't need to install and manage &lt;code&gt;cert-manager&lt;/code&gt; and its associated &lt;code&gt;ClusterIssuers&lt;/code&gt; if you only need basic ACME automation. Traefik handles the challenge and renewal internally.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Middleware&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate-limit-api&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rateLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;average&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
    &lt;span class="na"&gt;burst&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;

&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;traefik.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IngressRoute&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-route&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;entryPoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;websecure&lt;/span&gt;
  &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Host(`api.example.com`) &amp;amp;&amp;amp; PathPrefix(`/v1`)&lt;/span&gt;
    &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rule&lt;/span&gt;
    &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-service&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
    &lt;span class="na"&gt;middlewares&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rate-limit-api&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
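
&lt;p&gt;The Let's Encrypt integration mentioned earlier is configured separately, in Traefik's static configuration. The sketch below assumes an HTTP-01 challenge on a &lt;code&gt;web&lt;/code&gt; entry point; the resolver name, email address, and storage path are placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Traefik static configuration (sketch)
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
certificatesResolvers:
  letsencrypt:                  # hypothetical resolver name
    acme:
      email: ops@example.com    # placeholder
      storage: /data/acme.json  # should live on persistent storage
      httpChallenge:
        entryPoint: web
---
# IngressRoute fragment that requests a certificate from the resolver
spec:
  tls:
    certResolver: letsencrypt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because certificates are kept in a local file, this setup is best suited to a single Traefik replica. If you run several replicas, &lt;code&gt;cert-manager&lt;/code&gt; is usually the safer choice.&lt;/p&gt;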



&lt;h2&gt;
  
  
  When to Choose Which
&lt;/h2&gt;

&lt;p&gt;The decision should be based on your team's operational capacity and your application's traffic profile.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose NGINX if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are running a high-throughput API where raw latency is your primary KPI.&lt;/li&gt;
&lt;li&gt;You have a small number of stable services that don't change their routing rules hourly.&lt;/li&gt;
&lt;li&gt;Your team is already comfortable with NGINX syntax and wants a predictable tool.&lt;/li&gt;
&lt;li&gt;You are leveraging a CNI like Cilium to handle some of the L7 logic and only need NGINX for the edge. If you're evaluating networking layers, see our Cilium vs. Calico CNI comparison (&lt;code&gt;/comparisons/kubernetes-cni-comparison-cilium-vs-calico-for-platform-team&lt;/code&gt;) for more context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Traefik if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are managing a volatile microservices environment with frequent deployments and auto-scaling.&lt;/li&gt;
&lt;li&gt;You want a built-in dashboard to visualize traffic flow without configuring a complex Grafana stack.&lt;/li&gt;
&lt;li&gt;You want to reduce "YAML bloat" by using reusable Middlewares.&lt;/li&gt;
&lt;li&gt;You prefer a tool that feels like a native part of the Kubernetes API rather than an external proxy ported to K8s.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does Traefik support the standard Kubernetes Ingress resource?&lt;/strong&gt;&lt;br&gt;
Yes, Traefik supports the standard &lt;code&gt;Ingress&lt;/code&gt; resource for compatibility, but to unlock features like advanced Middlewares and complex routing rules, you must use the &lt;code&gt;IngressRoute&lt;/code&gt; CRD.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which one is better for gRPC traffic?&lt;/strong&gt;&lt;br&gt;
Both support gRPC, but NGINX generally provides better raw performance for gRPC streams. However, Traefik's lack of reload-based disruptions makes it more stable for long-lived gRPC connections during configuration updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use both in the same cluster?&lt;/strong&gt;&lt;br&gt;
Yes. By using different &lt;code&gt;ingressClassName&lt;/code&gt; values, you can run both controllers. This is a common pattern when migrating from one to the other or when separating internal and external traffic.&lt;/p&gt;
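
&lt;p&gt;As a rough sketch of that split, two Ingress resources can target different controllers through &lt;code&gt;ingressClassName&lt;/code&gt;. The class names &lt;code&gt;nginx&lt;/code&gt; and &lt;code&gt;traefik&lt;/code&gt;, the hostnames, and the service names below are assumptions; run &lt;code&gt;kubectl get ingressclass&lt;/code&gt; to see which classes your installations actually registered.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative only: class names, hosts, and services are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: public-api
spec:
  ingressClassName: nginx        # picked up by the NGINX controller
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: internal-tools
spec:
  ingressClassName: traefik      # picked up by Traefik
  rules:
  - host: tools.internal.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tools-service
            port:
              number: 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each controller only reconciles the resources that carry its own class, so the two can coexist without claiming each other's routes.&lt;/p&gt;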

&lt;h2&gt;
  
  
  Migration and Adoption Checklist
&lt;/h2&gt;

&lt;p&gt;If you are moving from NGINX to Traefik (or vice versa), avoid a "big bang" migration. The risk of a total outage is too high.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Deployment&lt;/strong&gt;: Install the new controller alongside the old one. Use different &lt;code&gt;ingressClassName&lt;/code&gt; values to ensure they don't fight over the same resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canary Routing&lt;/strong&gt;: Use a DNS weight shift (e.g., Route53 or Cloudflare) to send 5% of traffic to the new controller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Middleware Mapping&lt;/strong&gt;: Map your NGINX snippets to Traefik Middlewares. Document every custom header or rewrite rule.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cert Validation&lt;/strong&gt;: If moving to Traefik, verify ACME challenge propagation before deleting your &lt;code&gt;cert-manager&lt;/code&gt; setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability Sync&lt;/strong&gt;: Ensure your Prometheus scrapers are updated to pull the specific metrics format of the new controller. If you're struggling with pod stability during this rollout, see our guide on /troubleshooting/crashloopbackoff-kubernetes to debug fast.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once 100% of traffic is stable on the new controller, prune the old Ingress resources and uninstall the legacy controller to reclaim cluster resources.&lt;/p&gt;

</description>
      <category>kubernetesingress</category>
      <category>nginxingresscontroller</category>
      <category>traefikproxy</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>Fix GitLab CI Docker daemon connection error in 3 steps</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Fri, 19 Jun 2026 18:59:16 +0000</pubDate>
      <link>https://dev.to/devopsstart/fix-gitlab-ci-docker-daemon-connection-error-in-3-steps-240o</link>
      <guid>https://dev.to/devopsstart/fix-gitlab-ci-docker-daemon-connection-error-in-3-steps-240o</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published on DevOpsStart.com. Here's a quick fix for the dreaded 'Cannot connect to the Docker daemon' error in GitLab CI.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;You have a GitLab CI job that tries to run &lt;code&gt;docker build&lt;/code&gt; or &lt;code&gt;docker info&lt;/code&gt;, and it fails with this error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Cannot connect to the Docker daemon at tcp://docker:2375. Is the docker daemon running?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error means your job is running inside a container that has the &lt;code&gt;docker&lt;/code&gt; CLI installed but no Docker daemon (&lt;code&gt;dockerd&lt;/code&gt;) to talk to. Using the &lt;code&gt;docker&lt;/code&gt; image from Docker Hub as your job image gives you the client, but nothing starts a daemon inside the job container. Without a reachable daemon, every &lt;code&gt;docker&lt;/code&gt; command that needs one returns this error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Root Causes
&lt;/h2&gt;

&lt;p&gt;Three misconfigurations commonly cause this error.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Missing &lt;code&gt;docker&lt;/code&gt; service in &lt;code&gt;.gitlab-ci.yml&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;You must define a &lt;code&gt;services&lt;/code&gt; block that runs the daemon. The standard approach is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker:dind&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, your job has no daemon to connect to.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Incorrect or missing &lt;code&gt;DOCKER_HOST&lt;/code&gt; variable
&lt;/h3&gt;

&lt;p&gt;When you use the &lt;code&gt;docker&lt;/code&gt; executor, the dind daemon runs as a separate service container that GitLab Runner connects to the job container (there is no Pod involved unless you use the Kubernetes executor). You must tell the client where to find it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;DOCKER_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://docker:2375&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



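&lt;p&gt;If &lt;code&gt;DOCKER_HOST&lt;/code&gt; is set to &lt;code&gt;tcp://localhost:2375&lt;/code&gt;, the client looks for a daemon inside the job container itself, where nothing is listening. If the variable is missing entirely, the client typically falls back to the local Unix socket (&lt;code&gt;/var/run/docker.sock&lt;/code&gt;), which does not exist inside your job container.&lt;/p&gt;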

&lt;h3&gt;
  
  
  3. Runner not in &lt;code&gt;privileged&lt;/code&gt; mode
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;docker:dind&lt;/code&gt; service requires &lt;code&gt;privileged&lt;/code&gt; mode to run its own Docker daemon. If your GitLab Runner's &lt;code&gt;config.toml&lt;/code&gt; sets &lt;code&gt;privileged = false&lt;/code&gt; or leaves the option unset, the dind container cannot start. Check your runner configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /etc/gitlab-runner/config.toml
&lt;span class="o"&gt;[[&lt;/span&gt;runners]]
  name &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-runner"&lt;/span&gt;
  url &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://gitlab.com/"&lt;/span&gt;
  token &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
  executor &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"docker"&lt;/span&gt;
  &lt;span class="o"&gt;[&lt;/span&gt;runners.docker]
    privileged &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;true
    &lt;/span&gt;pull_policy &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"always"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;privileged&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt; or absent, the dind service never boots, and you get the &lt;code&gt;Cannot connect to the Docker daemon&lt;/code&gt; error.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;A complete fix takes three steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: add the &lt;code&gt;docker:dind&lt;/code&gt; service
&lt;/h3&gt;

&lt;p&gt;Add this to your &lt;code&gt;.gitlab-ci.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker:24.0.7&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker:dind&lt;/span&gt;
    &lt;span class="na"&gt;alias&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker&lt;/span&gt;

&lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;DOCKER_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp://docker:2375&lt;/span&gt;
  &lt;span class="na"&gt;DOCKER_TLS_CERTDIR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;DOCKER_HOST&lt;/code&gt; value &lt;code&gt;tcp://docker:2375&lt;/code&gt; tells the client to use the service container named &lt;code&gt;docker&lt;/code&gt; (the GitLab Runner resolves the service alias to the container hostname). Setting &lt;code&gt;DOCKER_TLS_CERTDIR: ""&lt;/code&gt; disables TLS for simplicity, so the daemon listens unencrypted on port 2375.&lt;/p&gt;
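
&lt;p&gt;If you would rather keep TLS on, a variant along the following lines is a reasonable starting point. It is a sketch based on the pattern in GitLab's Docker-in-Docker documentation, not a drop-in config: it assumes your runner shares the &lt;code&gt;/certs/client&lt;/code&gt; directory between the service and job containers (for the &lt;code&gt;docker&lt;/code&gt; executor, through the &lt;code&gt;volumes&lt;/code&gt; setting in &lt;code&gt;config.toml&lt;/code&gt;), and the pinned version is only an example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch of a TLS-enabled setup; verify against your runner before relying on it.
image: docker:24.0.7

services:
  - name: docker:24.0.7-dind      # same release as the job image
    alias: docker

variables:
  DOCKER_HOST: tcp://docker:2376        # TLS port instead of 2375
  DOCKER_TLS_CERTDIR: "/certs"          # dind generates its certificates here
  DOCKER_TLS_VERIFY: "1"
  DOCKER_CERT_PATH: "/certs/client"     # assumes this path is shared with the job
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this variant the client talks to port 2376, so a leftover &lt;code&gt;tcp://docker:2375&lt;/code&gt; value elsewhere in the pipeline will reproduce the original connection error.&lt;/p&gt;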

&lt;h3&gt;
  
  
  Step 2: test with a simple job
&lt;/h3&gt;

&lt;p&gt;Add a minimal job to verify the fix works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;build&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker info&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker build -t my-app .&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;docker info&lt;/code&gt; command confirms the daemon is reachable. If it succeeds, your pipeline is ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: check runner logs
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;docker info&lt;/code&gt; still fails, inspect the runner logs. On your GitLab Runner host, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker logs &amp;lt;runner-container-id&amp;gt; &lt;span class="nt"&gt;--tail&lt;/span&gt; 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for lines like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Starting container service-docker:dind-0 ... 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see no such line, the service never started. Check &lt;code&gt;privileged&lt;/code&gt; mode again in &lt;code&gt;config.toml&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention
&lt;/h2&gt;

&lt;p&gt;To avoid this error in future:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always add &lt;code&gt;services: - docker:dind&lt;/code&gt; to any job that needs &lt;code&gt;docker&lt;/code&gt; commands.&lt;/li&gt;
&lt;li&gt;Always set &lt;code&gt;DOCKER_HOST: tcp://docker:2375&lt;/code&gt; as a top-level variable in your pipeline.&lt;/li&gt;
&lt;li&gt;Verify your GitLab Runner's &lt;code&gt;config.toml&lt;/code&gt; has &lt;code&gt;privileged = true&lt;/code&gt; before you push new jobs.&lt;/li&gt;
&lt;li&gt;For a production-grade setup, pin the service to a versioned tag such as &lt;code&gt;docker:24.0.7-dind&lt;/code&gt; that matches your job image instead of the floating &lt;code&gt;docker:dind&lt;/code&gt; tag. Keeping the client and the daemon on the same release reduces the chance of version mismatches.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>gitlabcidockerdaemon</category>
      <category>dockerdindservice</category>
      <category>gitlabcicannotconnectdockerdae</category>
      <category>gitlabrunnerprivilegedmode</category>
    </item>
    <item>
      <title>How to Fix Kubernetes CrashLoopBackOff in Production</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Fri, 19 Jun 2026 09:04:30 +0000</pubDate>
      <link>https://dev.to/devopsstart/how-to-fix-kubernetes-crashloopbackoff-in-production-34fl</link>
      <guid>https://dev.to/devopsstart/how-to-fix-kubernetes-crashloopbackoff-in-production-34fl</guid>
      <description>&lt;h2&gt;
  
  
  Problem: What CrashLoopBackOff actually means
&lt;/h2&gt;

&lt;p&gt;When you see &lt;code&gt;CrashLoopBackOff&lt;/code&gt; in your &lt;code&gt;kubectl get pods&lt;/code&gt; output, you aren't looking at a specific error, but a state. It is a symptom. It tells you that the kubelet tried to start your container, the container crashed, and Kubernetes is now waiting before trying again.&lt;/p&gt;

&lt;p&gt;To prevent the API server and the node from being hammered by a process that crashes instantly, Kubernetes implements an exponential backoff delay. The first restart happens quickly, but subsequent failures increase the wait time (10s, 20s, 40s, up to a maximum of 5 minutes). If you don't intervene, your pod will spend more time waiting than running, making it nearly impossible to catch logs in real-time.&lt;/p&gt;

&lt;p&gt;You can find more details on the pod lifecycle in the official &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/" rel="noopener noreferrer"&gt;Kubernetes Documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Root Causes: The "Why" behind the crash
&lt;/h2&gt;

&lt;p&gt;Diagnosing a crash loop requires moving from the symptom to the cause. I've seen this fail most often in clusters with &amp;gt;50 nodes where configuration drift becomes common. Root causes generally fall into three severity tiers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Low Severity: Configuration and Environment
&lt;/h3&gt;

&lt;p&gt;These are "fail-fast" errors. The application starts, realizes it is missing a critical piece of information, and exits. Common culprits include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Missing ConfigMaps or Secrets&lt;/strong&gt;: The pod is configured to expect a volume or environment variable that doesn't exist.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incorrect Environment Variables&lt;/strong&gt;: A typo in a database URL or a missing API key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrong Command/Args&lt;/strong&gt;: An incorrect entrypoint in the Dockerfile or a typo in the &lt;code&gt;args&lt;/code&gt; section of the YAML.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Medium Severity: Resource and Infrastructure
&lt;/h3&gt;

&lt;p&gt;These crashes are often intermittent or happen shortly after the app begins processing traffic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OOMKilled (Exit Code 137)&lt;/strong&gt;: The container exceeded its memory limit. This is the most common production crash, and each kill takes that replica completely offline until it restarts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Liveness Probe Death Spiral&lt;/strong&gt;: The application takes 30 seconds to start, but the liveness probe kills it after 10 seconds. The pod is healthy, but Kubernetes thinks it's dead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Permissions&lt;/strong&gt;: The container user doesn't have write access to a mounted PersistentVolume.&lt;/li&gt;
&lt;/ul&gt;
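
&lt;p&gt;For the liveness probe case in particular, one common remedy is a &lt;code&gt;startupProbe&lt;/code&gt;, which holds back the liveness check until the application has finished booting. The fragment below is a minimal sketch; the image, path, port, and timings are assumptions to adapt to your own service.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Container spec fragment; image, path, port, and timings are illustrative.
containers:
- name: app
  image: registry.example.com/app:1.0.0   # placeholder image
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 5
    failureThreshold: 12    # allows up to 60s (12 x 5s) for startup
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
    failureThreshold: 3     # only evaluated once the startup probe has succeeded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the liveness probe stays disabled until the startup probe passes, a slow boot no longer looks like a dead container.&lt;/p&gt;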

&lt;h3&gt;
  
  
  High Severity: External Dependencies
&lt;/h3&gt;

&lt;p&gt;The application is healthy, but its environment is hostile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database Connection Timeouts&lt;/strong&gt;: The app crashes because it cannot reach the DB due to a firewall rule or incorrect credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DNS Resolution Failures&lt;/strong&gt;: CoreDNS issues prevent the app from finding other services within the cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency API Outages&lt;/strong&gt;: A hard dependency (like an external Auth provider) is down, and the app isn't designed to handle the failure gracefully.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Solution: Step-by-Step Production Triage
&lt;/h2&gt;

&lt;p&gt;When a production service goes into &lt;code&gt;CrashLoopBackOff&lt;/code&gt;, follow this severity-based logic tree. Do not guess; use the data provided by the cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: The Rapid Triage
&lt;/h3&gt;

&lt;p&gt;Start by checking the pod status and events. This tells you if the crash is happening because of the image, the scheduler, or the application itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe pod &amp;lt;pod-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at the &lt;code&gt;Containers&lt;/code&gt; section for the &lt;code&gt;Last State&lt;/code&gt;. You will see an &lt;code&gt;Exit Code&lt;/code&gt;. This is your most important clue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Output:&lt;/strong&gt;&lt;br&gt;
You should see a section like this:&lt;br&gt;
&lt;code&gt;Last State: Terminated&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Reason: Error&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Exit Code: 137&lt;/code&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Decoding the Exit Code
&lt;/h3&gt;

&lt;p&gt;Map the exit code from &lt;code&gt;describe&lt;/code&gt; to your action plan:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exit Code 0&lt;/strong&gt;: The app finished its task and exited. If this is a Deployment, it shouldn't happen. You likely need a Job instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit Code 1&lt;/strong&gt;: General application crash. Move to Step 3 to check logs.&lt;/li&gt;
&lt;li&gt;
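&lt;strong&gt;Exit Code 137&lt;/strong&gt;: The process was killed with SIGKILL, most often by the OOM killer. If &lt;code&gt;Reason&lt;/code&gt; shows &lt;code&gt;OOMKilled&lt;/code&gt;, increase the memory limit in the deployment YAML or fix the leak.&lt;/li&gt;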
&lt;li&gt;
&lt;strong&gt;Exit Code 139&lt;/strong&gt;: Segmentation fault. This is usually a binary incompatibility or a memory corruption issue in the code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit Code 143&lt;/strong&gt;: SIGTERM. The container was asked to stop and was terminated by the signal instead of shutting down cleanly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 3: Retrieving the "Hidden" Logs
&lt;/h3&gt;

&lt;p&gt;If a pod is crashing, &lt;code&gt;kubectl logs &amp;lt;pod-name&amp;gt;&lt;/code&gt; often returns nothing because the current container has just started and hasn't logged anything yet. You must check the logs of the previous failed instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--previous&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the logs show a connection timeout to a database, check your network policies or secret values. If you need a faster way to handle urgent fixes, the guide at &lt;code&gt;/tips/rapid-rollback-kubectl-set-image-for-urgent-fixes&lt;/code&gt; shows how to revert to a known working image version.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Debugging "Silent" Crashes with Ephemeral Containers
&lt;/h3&gt;

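&lt;p&gt;Sometimes logs are empty and the exit code is vague. In Kubernetes v1.23+ (ephemeral containers became stable in v1.25), you can use &lt;code&gt;kubectl debug&lt;/code&gt; to attach an ephemeral container with debugging tools to the crashing pod. With &lt;code&gt;--target&lt;/code&gt;, it shares the process namespace of the named container. The &lt;code&gt;busybox&lt;/code&gt; image used here ships &lt;code&gt;wget&lt;/code&gt;, &lt;code&gt;nslookup&lt;/code&gt;, and &lt;code&gt;vi&lt;/code&gt;; pick a fuller image if you need &lt;code&gt;curl&lt;/code&gt; or &lt;code&gt;dig&lt;/code&gt;.&lt;br&gt;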
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl debug &lt;span class="nt"&gt;-it&lt;/span&gt; &amp;lt;pod-name&amp;gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;busybox &lt;span class="nt"&gt;--target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;container-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once inside, you can inspect the &lt;code&gt;/tmp&lt;/code&gt; directory, check network connectivity, or look at the filesystem to see if a config file was mounted incorrectly. For a wider array of helpful commands, refer to the /tips/kubectl-essential-commands guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention: Stopping the loop before it starts
&lt;/h2&gt;

&lt;p&gt;Prevention is about shifting from "fixing" to "hardening". Implement these four strategies to eliminate &lt;code&gt;CrashLoopBackOff&lt;/code&gt; in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Right-Size Resources&lt;/strong&gt;: Use a Vertical Pod Autoscaler (VPA) in recommendation-only mode (&lt;code&gt;updateMode: "Off"&lt;/code&gt;) to find the actual memory usage of your app. Set your &lt;code&gt;limits&lt;/code&gt; roughly 20% higher than your &lt;code&gt;requests&lt;/code&gt; to handle spikes without triggering an OOMKill.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Graceful Probes&lt;/strong&gt;: Never set a &lt;code&gt;livenessProbe&lt;/code&gt; with the same timing as your &lt;code&gt;readinessProbe&lt;/code&gt;. Give your application an &lt;code&gt;initialDelaySeconds&lt;/code&gt; that covers its worst-case startup time, or use a &lt;code&gt;startupProbe&lt;/code&gt; for slow starters. If your app takes 20 seconds to boot, set the delay to 30 seconds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Robust Signal Handling&lt;/strong&gt;: Ensure your application handles &lt;code&gt;SIGTERM&lt;/code&gt; (Exit Code 143). If your app ignores this signal, Kubernetes will eventually force-kill it with &lt;code&gt;SIGKILL&lt;/code&gt; (Exit Code 137) after the &lt;code&gt;terminationGracePeriodSeconds&lt;/code&gt; expires.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Better Observability&lt;/strong&gt;: Integrate deep monitoring. If you are running AI workloads, implementing /tutorials/llm-observability-on-kubernetes-a-practical-guide will help you identify if crashes are caused by GPU memory exhaustion or model loading timeouts.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
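
&lt;p&gt;To make the right-sizing rule concrete, here is a small sketch of a &lt;code&gt;resources&lt;/code&gt; block with about 20% memory headroom. The figures are placeholders rather than recommendations; derive yours from observed usage.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Container spec fragment; the numbers are placeholders, not recommendations.
resources:
  requests:
    memory: "500Mi"   # assumed steady-state usage
    cpu: "250m"
  limits:
    memory: "600Mi"   # about 20% above the memory request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the request close to real usage also helps the scheduler place the pod on a node that can actually hold it.&lt;/p&gt;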

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why does my pod stay in CrashLoopBackOff even after I fix the ConfigMap?&lt;/strong&gt;&lt;br&gt;
Kubernetes uses an exponential backoff. If your pod has crashed multiple times, it might wait up to 5 minutes before the next restart attempt. You can force an immediate restart by deleting the pod: &lt;code&gt;kubectl delete pod &amp;lt;pod-name&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is an Exit Code 137 always an OOMKill?&lt;/strong&gt;&lt;br&gt;
Almost always, but not exclusively. It means the process received a &lt;code&gt;SIGKILL&lt;/code&gt; (Signal 9). The kernel's OOM killer usually sends this when a container exceeds its memory limit, but it can also happen when the kubelet force-kills a container that ignored &lt;code&gt;SIGTERM&lt;/code&gt;, when the node itself runs out of memory, or when an external process kills it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How can I prevent a crashing pod from affecting other pods on the same node?&lt;/strong&gt;&lt;br&gt;
Define strict &lt;code&gt;resources.limits&lt;/code&gt;. Without memory limits, a single leaking container can consume all node memory, triggering the Node-level OOM killer and causing unrelated pods to be evicted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Next Steps
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;CrashLoopBackOff&lt;/code&gt; is a safety mechanism, not a bug. By isolating the exit code and inspecting previous logs, you can quickly determine if you are facing a configuration error, a resource constraint, or a dependency failure.&lt;/p&gt;

&lt;p&gt;To further harden your production environment, take these next steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Audit your &lt;code&gt;livenessProbes&lt;/code&gt; to ensure they aren't too aggressive.&lt;/li&gt;
&lt;li&gt;Deploy a Vertical Pod Autoscaler (VPA) to get data-driven memory limits.&lt;/li&gt;
&lt;li&gt;Implement a structured logging pipeline (EFK/ELK) so you don't have to rely on &lt;code&gt;--previous&lt;/code&gt; logs during an incident.&lt;/li&gt;
&lt;/ul&gt;
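
&lt;p&gt;As a starting point for the VPA step, a recommendation-only object could look like the sketch below. It assumes the VPA components are already installed in the cluster, and the names &lt;code&gt;my-app&lt;/code&gt; and &lt;code&gt;my-app-vpa&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Assumes the VPA CRDs and controllers are installed; names are hypothetical.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations only, never evict pods
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After it has observed some traffic, &lt;code&gt;kubectl describe vpa my-app-vpa&lt;/code&gt; should list the recommended requests, which you can then copy into the Deployment by hand.&lt;/p&gt;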

</description>
      <category>kubernetestroubleshooting</category>
      <category>crashloopbackoff</category>
      <category>kubernetesproduction</category>
      <category>devopsguide</category>
    </item>
    <item>
      <title>Senior SRE Interview Questions &amp; Answers for 2026</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Thu, 18 Jun 2026 08:56:33 +0000</pubDate>
      <link>https://dev.to/devopsstart/senior-sre-interview-questions-answers-for-2026-42d6</link>
      <guid>https://dev.to/devopsstart/senior-sre-interview-questions-answers-for-2026-42d6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Landing a Senior Site Reliability Engineering (SRE) role in 2026 requires more than just knowing how to write a YAML file or explaining the difference between a Pod and a Deployment. The industry has shifted. We have moved past the early adoption phase of Kubernetes and into the era of Platform Engineering, where the goal is not just to manage infrastructure but to build an Internal Developer Platform (IDP) that enables self-service.&lt;/p&gt;

&lt;p&gt;Interviewers no longer test for rote memorization of Linux commands. They look for architectural judgment, the ability to manage cognitive load for developers and a deep understanding of how reliability impacts the bottom line. A Senior SRE is expected to be a force multiplier, not just a firefighter.&lt;/p&gt;

&lt;p&gt;In this guide, you will find the high-signal questions currently being asked at top-tier tech companies, along with the senior-level reasoning required to answer them. We cover everything from cell-based architectures and OpenTelemetry to the psychological nuances of blameless post-mortems.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Evolution of SRE in 2026
&lt;/h2&gt;

&lt;p&gt;Before diving into specific questions, you must understand the current landscape. SRE has largely merged with Platform Engineering. The focus is now on "Golden Paths" (standardized, supported ways to deploy software) to reduce developer friction.&lt;/p&gt;

&lt;p&gt;Furthermore, AIOps is no longer a buzzword. Senior SREs are now expected to integrate LLMs into their observability stacks for anomaly detection and automated root cause analysis. If you discuss observability, you should reference patterns for monitoring non-deterministic AI workloads, such as tracking token latency and prompt cache hit rates, to show you understand how to monitor LLM-integrated applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced System Design for Reliability
&lt;/h2&gt;

&lt;p&gt;At the senior level, scalability is not just about adding more replicas to a deployment. It is about blast radius reduction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Question: How do you design a system to prevent a single regional failure from taking down your entire global platform?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; Knowledge of Global Server Load Balancing (GSLB), Anycast routing and specifically "Cell-based Architecture."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
Do not just say "I would use a multi-region deployment." Explain the trade-offs. Mention that while multi-region provides availability, it introduces data consistency challenges tied to the CAP theorem.&lt;/p&gt;

&lt;p&gt;Explain the concept of Cells. Instead of one giant regional cluster, divide infrastructure into isolated cells (smaller, independent units of deployment). If a bug is deployed or a database locks up, it only affects one cell (e.g., 5% of users) rather than the entire region.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key components to mention:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Route53/Cloudflare:&lt;/strong&gt; For traffic steering based on health checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cell Router:&lt;/strong&gt; A thin layer that maps a User ID to a specific cell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Region Replication:&lt;/strong&gt; Using asynchronous replication such as DynamoDB Global Tables, or a consensus-based distributed database such as CockroachDB, to handle state across regions without killing latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Question: How do you handle "Thundering Herd" problems in a distributed system?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; Understanding of caching strategies and request shedding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
A thundering herd occurs when many clients retry a failed request simultaneously, crashing the recovering service. I implement three layers of defense:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Exponential Backoff with Jitter:&lt;/strong&gt; Ensure clients do not retry at the exact same millisecond. A simple &lt;code&gt;sleep(2^attempt + random_jitter)&lt;/code&gt; prevents synchronized spikes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Circuit Breakers:&lt;/strong&gt; Use a service mesh (like Istio or Linkerd) to trip the circuit and fail fast when a downstream service is overwhelmed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request Prioritization:&lt;/strong&gt; Implement a priority queue where critical traffic (e.g., checkout) is processed before background tasks (e.g., analytics).&lt;/li&gt;
&lt;/ol&gt;
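
&lt;p&gt;To make the circuit breaker layer concrete, here is a minimal sketch of an Istio &lt;code&gt;DestinationRule&lt;/code&gt; that caps queued requests and ejects failing backends. The host, the resource name, and every threshold are assumptions to tune per service.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch only: host, name, and thresholds are assumptions.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: checkout-circuit-breaker
spec:
  host: checkout.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 100   # queue cap; excess requests fail fast
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5          # eject a backend after 5 straight 5xx responses
      interval: 10s                    # how often backends are evaluated
      baseEjectionTime: 30s            # minimum time an ejected backend stays out
      maxEjectionPercent: 50           # never eject more than half of the pool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Failing fast at the mesh layer complements client-side backoff: clients that retry with jitter hit a quick rejection instead of piling onto a service that is still recovering.&lt;/p&gt;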

&lt;h2&gt;
  
  
  Observability vs. Monitoring
&lt;/h2&gt;

&lt;p&gt;Monitoring tells you that something is wrong; observability allows you to understand why it is wrong without shipping new code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Question: We have millions of metrics, but our dashboards are noisy. How do you define "actionable" SLIs and SLOs?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; The ability to link technical metrics to business value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
Most teams make the mistake of measuring CPU usage as an SLI. CPU is a cause, not a symptom. A senior SRE focuses on the user experience.&lt;/p&gt;

&lt;p&gt;I use the Four Golden Signals (Latency, Traffic, Errors, Saturation) but tie them to specific user journeys. For example, instead of "API Error Rate," I define the SLI as "Percentage of successful 'Add to Cart' requests completed within 500ms over a rolling 30-day window."&lt;/p&gt;

&lt;p&gt;If the Error Budget is exhausted, the action is a freeze on feature releases to focus on reliability. This turns a technical metric into a business decision.&lt;/p&gt;
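
&lt;p&gt;One way to express the "Add to Cart" SLI is a Prometheus recording rule, sketched below. The metric names, the &lt;code&gt;route&lt;/code&gt; and &lt;code&gt;code&lt;/code&gt; labels, and the 0.5s bucket are all assumptions about how the service is instrumented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Prometheus rule file sketch; metric and label names are hypothetical.
groups:
- name: add-to-cart-slo
  rules:
  - record: sli:add_to_cart:good_ratio_30d
    # Non-5xx requests that finished within 500ms, divided by all requests.
    expr: |
      sum(increase(http_request_duration_seconds_bucket{route="/cart/add", code!~"5..", le="0.5"}[30d]))
      /
      sum(increase(http_request_duration_seconds_count{route="/cart/add"}[30d]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Comparing this ratio against the SLO target gives the remaining error budget for the window.&lt;/p&gt;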

&lt;h3&gt;
  
  
  Question: How do you handle high-cardinality data in a distributed tracing environment?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; Experience with OpenTelemetry (OTel) and the cost implications of telemetry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
High cardinality (e.g., putting a unique UserID in every metric tag) can crash a Prometheus instance or lead to massive bills in Datadog.&lt;/p&gt;

&lt;p&gt;I recommend moving to OpenTelemetry for a vendor-agnostic approach. To keep trace volume under control, I implement sampling. Head-based sampling is cheap but decides before the outcome of a request is known, so I prefer tail-based sampling where errors matter: instead of keeping 100% of traces, we keep 100% of errors and 5% of successful requests. This provides the necessary visibility into failures without the storage overhead of every single "200 OK" request.&lt;/p&gt;
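
&lt;p&gt;A sampling policy of this kind can be sketched with the &lt;code&gt;tail_sampling&lt;/code&gt; processor from the OpenTelemetry Collector contrib distribution. The policy names and the &lt;code&gt;decision_wait&lt;/code&gt; value are assumptions, and the processor still has to be added to a traces pipeline.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Collector config fragment; policy names and timings are assumptions.
processors:
  tail_sampling:
    decision_wait: 10s          # wait for a trace's spans before deciding
    policies:
      - name: keep-all-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-successes
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A trace is kept if any policy matches it, so every error trace survives and roughly 5% of the remaining traces are sampled.&lt;/p&gt;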

&lt;h2&gt;
  
  
  Incident Management and the "SRE Mindset"
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Question: You are leading an incident where a cascading failure is occurring across three microservices. How do you manage the situation?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; Command and control (ICS), communication skills and technical triage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
First, I establish roles: an Incident Commander (IC) to coordinate, a Communications Lead to update stakeholders and an Ops Lead to handle the technical fix. I avoid having too many people directing the technical execution.&lt;/p&gt;

&lt;p&gt;Technically, my first goal is to stop the bleeding, not find the root cause. I look for the bottleneck service and apply aggressive load shedding or disable non-essential features using feature flags to lower the pressure on the system. Once the system is stable, we move to the Post-Mortem phase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Question: How do you handle a situation where a Product Manager insists on a feature release that you know will risk the Error Budget?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; Negotiation skills and a commitment to the SRE philosophy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
I do not frame it as "No, we cannot do this." I frame it as a risk management conversation.&lt;/p&gt;

&lt;p&gt;I show the current Error Budget burn rate. If only 10% of our budget for the month remains, I explain that a failed release could lead to an outage that violates our SLA, potentially costing the company $X per hour in revenue. I suggest a Canary Deployment strategy, releasing to 1% of users first. This allows the PM to get the feature out while limiting the blast radius.&lt;/p&gt;
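
&lt;p&gt;The numbers behind that conversation are simple arithmetic. The sketch below assumes a request-based SLO; the 99.9% target and the traffic figures are hypothetical, chosen so that 10% of the monthly budget remains.&lt;/p&gt;

```python
def burn_rate(slo: float, total_requests: int, failed_requests: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the budget lasts exactly one SLO window; 5.0 means it is
    being spent five times too fast.
    """
    observed_error_rate = failed_requests / total_requests
    return observed_error_rate / (1.0 - slo)


def budget_remaining(slo: float, expected_window_requests: int, failed_so_far: int) -> float:
    """Fraction of the window's error budget that is still unspent."""
    allowed_failures = (1.0 - slo) * expected_window_requests
    return 1.0 - failed_so_far / allowed_failures


SLO = 0.999  # hypothetical 99.9% availability target
remaining = budget_remaining(SLO, expected_window_requests=10_000_000, failed_so_far=9_000)
rate = burn_rate(SLO, total_requests=50_000, failed_requests=250)
print(f"budget remaining: {remaining:.0%}, current burn rate: {rate:.1f}x")
```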

&lt;h2&gt;
  
  
  Infrastructure as Code (IaC) and GitOps at Scale
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Question: Terraform is becoming slow and state locking is a constant issue for our team of 50 engineers. How do you scale your IaC?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; Experience with state management and modularization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
The monolithic state is a common failure point. I first implement state splitting, breaking the infrastructure into logical layers (e.g., Networking, Database, Application) so that a change to an app does not require locking the VPC state.&lt;/p&gt;

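&lt;p&gt;For teams moving toward massive scale, I evaluate a programmatic IaC approach such as Pulumi, which uses general-purpose languages and allows for better testing and abstraction than HCL. OpenTofu is a drop-in fork of Terraform that still uses HCL, so I treat it as a licensing and governance choice rather than an abstraction upgrade. To automate the rollout of Kubernetes workloads, I implement a GitOps pipeline using Argo CD or Flux to ensure the cluster state always matches the Git repository.&lt;/p&gt;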

&lt;h2&gt;
  
  
  Cloud-Native Security (DevSecOps)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Question: How do you implement a "Zero Trust" network in a Kubernetes environment?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What they are looking for:&lt;/strong&gt; Knowledge of Network Policies and eBPF.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Senior Answer:&lt;/strong&gt;&lt;br&gt;
Default Kubernetes networking is flat, meaning any pod can talk to any pod. To implement Zero Trust, I start with a Default Deny Network Policy for all namespaces.&lt;/p&gt;

&lt;p&gt;Then, I use a CNI that supports eBPF, such as Cilium. eBPF allows us to enforce security policies at the kernel level rather than relying on iptables, which provides better performance and deeper visibility into the network flow. I also integrate a service mesh like Istio to enforce Mutual TLS (mTLS) for all service-to-service communication, ensuring that identities are verified via certificates, not just IP addresses.&lt;/p&gt;
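
&lt;p&gt;As a starting point, the default-deny policy can be generated per namespace. The sketch below builds the manifest as a Python dictionary and prints it as JSON, which &lt;code&gt;kubectl apply -f&lt;/code&gt; accepts alongside YAML; the policy name and the namespace names are assumptions for the example.&lt;/p&gt;

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty selector matches all pods in the namespace.
            "podSelector": {},
            # Both directions are listed with no allow rules, so both are denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


for ns in ["payments", "checkout"]:  # hypothetical namespace names
    print(json.dumps(default_deny_policy(ns), indent=2))
```

&lt;p&gt;Denying egress also blocks DNS lookups, so an explicit allow rule for the cluster DNS service is usually the first exception to add.&lt;/p&gt;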

&lt;h2&gt;
  
  
  Practical Troubleshooting Scenarios
&lt;/h2&gt;

&lt;p&gt;In senior interviews, you will often get a whiteboard scenario. The interviewer does not want the right answer immediately; they want to see your debugging methodology.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario: "The database CPU is spiking to 90%, but the application traffic (requests per second) is flat. How do you debug this?"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Senior Approach:&lt;/strong&gt;&lt;br&gt;
I follow a top-down diagnostic path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify the Workload:&lt;/strong&gt; Is the CPU spike caused by an increase in total queries or is a small number of queries becoming more expensive? I check the Slow Query Log.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check for Locking/Contention:&lt;/strong&gt; I look for long-running transactions or lock waits. I also look for missing indexes: a single unoptimized query scanning a large table can spike CPU even if traffic is flat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External Factors:&lt;/strong&gt; I check for background jobs. Did a database backup start? Is an ETL process running a massive join?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Exhaustion:&lt;/strong&gt; I check if the DB is swapping to disk or, for JVM-based datastores, if memory pressure is causing excessive garbage collection.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Scenario: "A new deployment caused a spike in 5xx errors. The pods are running, but the app is failing. What do you do?"
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Senior Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Immediate Mitigation:&lt;/strong&gt; First, I trigger a rollback to the last known good image using &lt;code&gt;kubectl rollout undo deployment/&amp;lt;deployment-name&amp;gt;&lt;/code&gt;. Speed of recovery is the priority.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log Analysis:&lt;/strong&gt; I check the logs for "Panic" or "Out of Memory" (OOM) errors. If pods are restarting, I check if it is a probe failure (Liveness/Readiness).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diffing:&lt;/strong&gt; I compare the configuration changes between the failed version and the previous version. Was a secret missing? Did an environment variable change?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace Analysis:&lt;/strong&gt; I use distributed tracing to see if the 5xx is coming from the app itself or a downstream dependency that the new version is calling differently.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the biggest difference between a DevOps Engineer and an SRE?
&lt;/h3&gt;

&lt;p&gt;DevOps is a cultural philosophy focused on breaking down silos between Dev and Ops. SRE is a specific implementation of DevOps. SRE applies software engineering principles to operations problems, focusing on SLIs, SLOs and Error Budgets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which tool should I learn first: Terraform or Pulumi?
&lt;/h3&gt;

&lt;p&gt;Terraform is the industry standard and essential for any resume. However, Pulumi is gaining traction in Platform Engineering because it allows you to use general-purpose languages (TypeScript, Python, Go), making it easier to build complex logic for internal platforms.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I handle "on-call burnout" as a Senior SRE?
&lt;/h3&gt;

&lt;p&gt;Burnout is a systemic failure, not a personal one. I advocate for Operational Load tracking. If the team spends more than 50% of their time on toil (manual, repetitive work), I negotiate with leadership to halt feature work and dedicate a stability sprint to automate the causes of the alerts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Next Steps
&lt;/h2&gt;

&lt;p&gt;Passing a Senior SRE interview is about demonstrating that you can think in terms of systems, trade-offs and business risk. You are not just there to keep the lights on; you are there to build a system that can survive the failure of its individual components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Action Plan:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit your experience:&lt;/strong&gt; For every project on your resume, identify the trade-off. Why did you choose X over Y? What was the cost?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Master the Golden Signals:&lt;/strong&gt; Be ready to explain exactly how you would measure the reliability of a specific business feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Practice the Cell mindset:&lt;/strong&gt; Read up on how companies like AWS and Meta use cell-based architectures to limit blast radius.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hands-on with OTel:&lt;/strong&gt; Deploy an OpenTelemetry collector in a lab environment to understand how traces and metrics flow.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>sreinterviewquestions</category>
      <category>sitereliabilityengineering</category>
      <category>platformengineering</category>
      <category>systemdesignforreliability</category>
    </item>
    <item>
      <title>LLM Observability on Kubernetes: A Practical Guide</title>
      <dc:creator>DevOps Start</dc:creator>
      <pubDate>Wed, 17 Jun 2026 09:03:45 +0000</pubDate>
      <link>https://dev.to/devopsstart/llm-observability-on-kubernetes-a-practical-guide-3i62</link>
      <guid>https://dev.to/devopsstart/llm-observability-on-kubernetes-a-practical-guide-3i62</guid>
      <description>&lt;p&gt;Monitoring traditional applications often feels like a well-trodden path. You set up logs, grab some metrics, and perhaps add a few traces. However, integrating Large Language Models (LLMs) or AI agents, especially when running on Kubernetes, fundamentally changes this paradigm. &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt; is a different beast entirely, demanding a more nuanced approach than standard application monitoring.&lt;/p&gt;

&lt;p&gt;This tutorial is designed for DevOps, ML, or platform engineers grappling with the unique challenges of monitoring LLM-powered applications and AI agents on Kubernetes. You'll learn why traditional tools fall short and how to build a practical, end-to-end observability pipeline. We will use battle-tested Kubernetes-native tools like Prometheus, Grafana, Loki, and OpenTelemetry. The tutorial includes hands-on experience with a simple AI agent application, instrumenting it, deploying it to Kubernetes, and setting up a unified observability stack to monitor its performance, cost, and behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Traditional Observability Fails for LLM-Powered AI Agents
&lt;/h2&gt;

&lt;p&gt;Generative AI applications, particularly those powered by LLMs and complex AI agents, introduce a new dimension to observability that traditional methods struggle to address. It's no longer just about CPU and memory; it's about context, coherence, and cost per token. For effective LLM observability on Kubernetes, your standard monitoring stack needs an upgrade.&lt;/p&gt;

&lt;p&gt;Here’s why traditional monitoring falls short for LLMs and AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Non-Determinism:&lt;/strong&gt; LLMs are inherently non-deterministic. The same prompt can yield different responses, making it hard to track performance or identify regressions solely through request/response codes. You need to understand the &lt;em&gt;content&lt;/em&gt; and &lt;em&gt;quality&lt;/em&gt; of responses, not just whether the call succeeded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex Prompt/Response Dynamics:&lt;/strong&gt; The interaction isn't a simple input-output. It involves intricate prompt engineering, context windows, and diverse response formats. Observing just the HTTP status code tells you nothing about a hallucination or an off-topic answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Usage &amp;amp; Cost:&lt;/strong&gt; Every interaction with an LLM consumes tokens, which directly translates to cost, especially with proprietary models like OpenAI's GPT-4. Monitoring token usage per request, per user, or per session is critical for cost control and capacity planning. Traditional metrics simply don't capture this. For example, a simple query might cost fractions of a cent, but 10,000 queries per minute can quickly escalate costs to thousands of dollars daily.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency Nuances:&lt;/strong&gt; LLM response times are often dominated by token generation, not just initial processing. You need to differentiate between prompt processing latency and response streaming latency for accurate performance tuning of your LLM applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent Complexity:&lt;/strong&gt; This is where it gets really interesting. AI agents involve multiple steps: planning, tool selection, tool execution, memory management, and iterative reasoning. Each step is a potential failure point. You need to trace the agent's entire decision path, track tool call successes/failures, and understand how the agent arrived at its final answer. A simple error log tells you &lt;em&gt;that&lt;/em&gt; something failed, but not &lt;em&gt;why&lt;/em&gt; the agent chose a particular path or tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Black Box" Nature:&lt;/strong&gt; While you can control the inputs, the internal workings of large foundation models are opaque. Observability needs to shine a light on the model's &lt;em&gt;behavior&lt;/em&gt; at the application layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On Kubernetes, you're adept at monitoring resource utilization at the pod and container level. But for LLM applications, this is only half the story. You need to correlate Kubernetes infrastructure metrics with application-specific LLM and agent metrics to get a complete picture of your LLM observability on Kubernetes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Observability Pillars for LLM Workloads on Kubernetes
&lt;/h2&gt;

&lt;p&gt;The traditional pillars of observability (Logs, Metrics, and Traces) remain foundational, but they need to be adapted and extended for LLM workloads on Kubernetes. This integrated approach is key to achieving comprehensive LLM observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logs: The Narrative of Your AI Agent's Decisions
&lt;/h3&gt;

&lt;p&gt;For LLM applications and AI agents, logs are more than just error messages. They are the narrative of your agent's reasoning process, which makes them a core part of &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt; and your first stop when debugging agent behavior.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt/Response Logging:&lt;/strong&gt; Crucial for debugging and understanding model behavior. Log the full input prompt, context, and the LLM's raw response. This helps diagnose why a model might have hallucinated or gone off-topic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Decision Logs:&lt;/strong&gt; For AI agents, log every significant step:

&lt;ul&gt;
&lt;li&gt;Initial plan formulation.&lt;/li&gt;
&lt;li&gt;Tool selection decisions.&lt;/li&gt;
&lt;li&gt;Inputs and outputs of each tool call.&lt;/li&gt;
&lt;li&gt;Re-planning attempts.&lt;/li&gt;
&lt;li&gt;Intermediate reasoning steps.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Logging:&lt;/strong&gt; Absolutely essential. Use JSON logging to include metadata like &lt;code&gt;trace_id&lt;/code&gt;, &lt;code&gt;span_id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;session_id&lt;/code&gt;, &lt;code&gt;model_name&lt;/code&gt;, &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;token_counts&lt;/code&gt;, and &lt;code&gt;safety_flags&lt;/code&gt;. This makes logs queryable and correlatable with metrics and traces.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Integration:&lt;/strong&gt; Leverage Kubernetes' standard output (stdout/stderr) for logs. A log collector like Fluent Bit, deployed as a DaemonSet, can then ship these structured logs to a centralized logging solution like Loki, Elasticsearch, or Splunk.&lt;/li&gt;
&lt;/ul&gt;
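
&lt;p&gt;A minimal sketch of the structured logging described above, using only the Python standard library. The &lt;code&gt;JsonFormatter&lt;/code&gt; class, the logger name and the field values are assumptions for illustration; a production setup might use a dedicated JSON logging library instead.&lt;/p&gt;

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Optional metadata copied from the record when the caller supplied it.
    FIELDS = ("trace_id", "span_id", "session_id", "model_name", "token_counts")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for field in self.FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)  # stdout, so the node's log collector picks it up
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("llm_agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info(
    "llm_call_completed",
    extra={
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",  # placeholder ID
        "model_name": "example-model",
        "token_counts": {"input": 212, "output": 58},
    },
)
```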

&lt;h3&gt;
  
  
  Metrics: Quantifying LLM Performance and Cost
&lt;/h3&gt;

&lt;p&gt;Metrics provide the quantitative insights into your LLM application's health, performance, and operational cost. These are vital for effective &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Application-Level Metrics:&lt;/strong&gt; These are paramount. We'll dive into specific examples shortly, but think latency, token usage, error rates for LLM calls, and specific agent action success rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Utilization:&lt;/strong&gt; Standard Kubernetes metrics for CPU, memory, network I/O are still vital. For GPU-accelerated inference, monitoring GPU utilization, memory, and temperature is critical. Prometheus can scrape container CPU and memory from the kubelet's cAdvisor endpoint, node-level metrics from &lt;code&gt;node-exporter&lt;/code&gt;, object state from &lt;code&gt;kube-state-metrics&lt;/code&gt;, and GPU metrics from a dedicated exporter such as &lt;code&gt;DCGM Exporter&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Metrics:&lt;/strong&gt; Beyond just resource utilization, track API calls to external LLMs and internal token consumption. This allows for real-time cost estimation and budgeting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Integration:&lt;/strong&gt; Prometheus, with its &lt;code&gt;ServiceMonitor&lt;/code&gt; and &lt;code&gt;PodMonitor&lt;/code&gt; custom resources, is perfectly suited for scraping application-level metrics directly from your LLM application pods.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Traces: Following the AI Agent's Chain of Thought
&lt;/h3&gt;

&lt;p&gt;Distributed tracing is arguably the most powerful pillar for debugging complex, multi-step AI agents, which makes it central to &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt;. It visualizes the entire execution path.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;End-to-End Flow:&lt;/strong&gt; A trace provides a timeline view of a single request or agent interaction, spanning across multiple services and internal functions. For an AI agent, this means seeing the initial user query, the LLM call, the tool selection logic, the tool execution, and the final response, all linked together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Span Granularity:&lt;/strong&gt; Create spans for:

&lt;ul&gt;
&lt;li&gt;Incoming request.&lt;/li&gt;
&lt;li&gt;Each LLM API call (input prompt, model, temperature, output response).&lt;/li&gt;
&lt;li&gt;Each step of the agent's reasoning chain (e.g., "planning step", "tool invocation", "context retrieval").&lt;/li&gt;
&lt;li&gt;Each external tool call (e.g., database lookup, external API call).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Propagation:&lt;/strong&gt; Essential for connecting spans across service boundaries. OpenTelemetry's instrumentation libraries handle this automatically for common protocols such as HTTP and gRPC.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Conventions:&lt;/strong&gt; Use OpenTelemetry's generative AI semantic conventions (the &lt;code&gt;gen_ai.*&lt;/code&gt; attributes) for LLM operations. This ensures consistency and makes traces easier to interpret across different tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes Integration:&lt;/strong&gt; Deploy an OpenTelemetry Collector within your Kubernetes cluster to receive traces from your instrumented applications and export them to a tracing backend like Jaeger or Tempo.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Metrics for LLM Apps &amp;amp; AI Agents on Kubernetes
&lt;/h2&gt;

&lt;p&gt;Let's get specific about the metrics you &lt;em&gt;must&lt;/em&gt; track for LLM-powered applications and AI agents to ensure comprehensive &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Metrics for LLM Applications
&lt;/h3&gt;

&lt;p&gt;These reveal how quickly and efficiently your LLM application is serving requests.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Processing Latency:&lt;/strong&gt; Time taken from receiving a prompt to sending it to the LLM API.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;histogram&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_prompt_processing_seconds_bucket&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
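&lt;strong&gt;Response Generation Latency:&lt;/strong&gt; Time taken for the LLM to generate the full response. For streaming responses, also track time to first token (TTFT) as a separate histogram, because it reflects perceived responsiveness rather than total generation time.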

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;histogram&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_response_generation_seconds_bucket&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Request Latency:&lt;/strong&gt; End-to-end time for a user query.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;histogram&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_agent_total_request_seconds_bucket&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput (Queries Per Second):&lt;/strong&gt; Number of requests or agent interactions handled per second, derived by applying &lt;code&gt;rate()&lt;/code&gt; to a request counter.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_agent_requests_total&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Call Latency:&lt;/strong&gt; Time taken for specific tools invoked by the agent (e.g., database query, external API call).

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;histogram&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;agent_tool_call_seconds_bucket{tool_name="weather_api"}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
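
&lt;p&gt;Two ideas from the list above are easy to misread: a histogram's &lt;code&gt;_bucket&lt;/code&gt; series are cumulative, and throughput is derived from a counter rather than stored directly. The standard-library sketch below illustrates both. The bucket boundaries and sample values are assumptions, and Prometheus's real &lt;code&gt;rate()&lt;/code&gt; also handles counter resets and extrapolation, which this ignores.&lt;/p&gt;

```python
from bisect import bisect_left

# Upper bounds in seconds; these bucket boundaries are an assumption for the example.
BUCKETS = (0.1, 0.5, 1.0, 2.5, 5.0, float("inf"))


def cumulative_buckets(latencies):
    """Count observations per bucket, where each bucket also includes all smaller ones."""
    counts = [0] * len(BUCKETS)
    for value in latencies:
        first = bisect_left(BUCKETS, value)  # first bucket whose bound is at or above value
        for i in range(first, len(BUCKETS)):
            counts[i] += 1
    return dict(zip(BUCKETS, counts))


def per_second_rate(earlier_total, later_total, interval_seconds):
    """Average per-second increase of a counter between two scrapes."""
    return (later_total - earlier_total) / interval_seconds


observed = [0.05, 0.3, 0.3, 0.9, 4.0, 12.0]  # hypothetical request latencies
for bound, count in cumulative_buckets(observed).items():
    print(f"le={bound}: {count}")
print("requests/s:", per_second_rate(1200, 1500, 60))
```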

&lt;h3&gt;
  
  
  Resource Utilization Metrics for LLMs on Kubernetes
&lt;/h3&gt;

&lt;p&gt;While standard Kubernetes metrics cover CPU and memory, pay special attention to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU Utilization:&lt;/strong&gt; Percentage of GPU compute units being used. Critical for local inference.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;gauge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
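&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;gpu_utilization_percentage&lt;/code&gt; (illustrative name; NVIDIA's DCGM Exporter exposes this as &lt;code&gt;DCGM_FI_DEV_GPU_UTIL&lt;/code&gt;)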
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU Memory Usage:&lt;/strong&gt; Amount of memory allocated on the GPU.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;gauge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
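&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;gpu_memory_usage_bytes&lt;/code&gt; (illustrative name; NVIDIA's DCGM Exporter reports used framebuffer memory in MiB as &lt;code&gt;DCGM_FI_DEV_FB_USED&lt;/code&gt;)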
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU and Memory (per Pod):&lt;/strong&gt; Standard &lt;code&gt;container_cpu_usage_seconds_total&lt;/code&gt; and &lt;code&gt;container_memory_working_set_bytes&lt;/code&gt; from the kubelet's built-in cAdvisor endpoint. Pair them with resource requests and limits from &lt;code&gt;kube-state-metrics&lt;/code&gt; and node-level metrics from &lt;code&gt;node-exporter&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  LLM Cost Monitoring Metrics
&lt;/h3&gt;

&lt;p&gt;Directly impacting your budget, these are often overlooked in initial deployments and are crucial for comprehensive LLM observability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Token Count:&lt;/strong&gt; Number of tokens sent in the prompt.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_input_tokens_total&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Token Count:&lt;/strong&gt; Number of tokens received in the response.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_output_tokens_total&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total LLM API Calls:&lt;/strong&gt; Number of requests made to the underlying LLM (internal or external).

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_api_calls_total{model_name="gpt-4"}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimated Cost:&lt;/strong&gt; A derived metric calculated by multiplying token counts or API calls by their respective per-unit costs. This is often best handled in Grafana using PromQL. For example, if input tokens cost $0.01 per 1000 and output tokens cost $0.03 per 1000, you can calculate real-time cost. Because &lt;code&gt;rate()&lt;/code&gt; returns tokens per second, the expression yields dollars per second; multiply by 3600 for an hourly figure.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; (Derived in Grafana)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example (PromQL):&lt;/em&gt; &lt;code&gt;(sum(rate(llm_input_tokens_total[5m])) / 1000 * 0.01) + (sum(rate(llm_output_tokens_total[5m])) / 1000 * 0.03)&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
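
&lt;p&gt;The same cost arithmetic can be checked outside Grafana. A minimal sketch, using the example prices from the list above; the token rates are hypothetical.&lt;/p&gt;

```python
INPUT_PRICE_PER_1K = 0.01   # USD per 1,000 input tokens (example price from the text)
OUTPUT_PRICE_PER_1K = 0.03  # USD per 1,000 output tokens (example price from the text)
SECONDS_PER_DAY = 86_400


def cost_per_second(input_tokens_per_s: float, output_tokens_per_s: float) -> float:
    """Mirror the PromQL expression: token rates in, dollars per second out."""
    return (
        input_tokens_per_s / 1000 * INPUT_PRICE_PER_1K
        + output_tokens_per_s / 1000 * OUTPUT_PRICE_PER_1K
    )


def cost_per_day(input_tokens_per_s: float, output_tokens_per_s: float) -> float:
    """Project the current spend rate over a full day."""
    return cost_per_second(input_tokens_per_s, output_tokens_per_s) * SECONDS_PER_DAY


# Hypothetical sustained load: 2,000 input and 500 output tokens per second.
per_second = cost_per_second(2000, 500)
per_day = cost_per_day(2000, 500)
print(f"USD per second: {per_second:.3f}, USD per day: {per_day:,.2f}")
```

&lt;p&gt;Output tokens are priced three times higher in this example, so trimming response length can matter more than trimming prompts.&lt;/p&gt;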

&lt;h3&gt;
  
  
  Model Quality &amp;amp; AI Agent Behavior Metrics
&lt;/h3&gt;

&lt;p&gt;These are more challenging to define but crucial for understanding the LLM's effectiveness and the performance of your AI agents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Safety Guardrail Activations:&lt;/strong&gt; Count of times a safety or moderation filter was triggered (e.g., content flagged as unsafe).

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_safety_violations_total&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination Flags:&lt;/strong&gt; If your application has logic to detect potential hallucinations, count these. This is often heuristic.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;llm_hallucinations_detected_total&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Goal Completion Rate:&lt;/strong&gt; For multi-step agents, track the percentage of interactions where the agent successfully achieved its objective.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt; (for successes/failures)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;agent_goal_completions_total&lt;/code&gt;, &lt;code&gt;agent_goal_failures_total&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Call Success/Failure Rates:&lt;/strong&gt; Track how often an agent's chosen tool executed successfully.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;counter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;agent_tool_call_success_total{tool_name="database_lookup"}&lt;/code&gt;, &lt;code&gt;agent_tool_call_failure_total{tool_name="external_api"}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Number of Agent Steps/Iterations:&lt;/strong&gt; How many steps or LLM calls an agent took to complete a task. High numbers might indicate inefficiency.

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Prometheus metric type:&lt;/em&gt; &lt;code&gt;histogram&lt;/code&gt; or &lt;code&gt;gauge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
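&lt;em&gt;Example:&lt;/em&gt; &lt;code&gt;agent_iteration_steps_bucket&lt;/code&gt; (a base name ending in &lt;code&gt;_count&lt;/code&gt; is best avoided, because a histogram already generates its own &lt;code&gt;_count&lt;/code&gt; series)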
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
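
&lt;p&gt;Rates such as goal completion or tool success are derived by dividing a success counter by the sum of successes and failures. A minimal sketch, assuming the counter increases have already been read for a time window; the tool names and values are made up.&lt;/p&gt;

```python
from typing import Optional


def success_ratio(successes: float, failures: float) -> Optional[float]:
    """Share of attempts that succeeded, or None when nothing was attempted."""
    attempts = successes + failures
    if attempts == 0:
        return None  # avoid dividing by zero for idle tools
    return successes / attempts


# Hypothetical counter increases over the last hour, keyed by tool_name.
tool_calls = {
    "database_lookup": {"success": 480, "failure": 20},
    "external_api": {"success": 90, "failure": 30},
    "unused_tool": {"success": 0, "failure": 0},
}

for tool, counts in tool_calls.items():
    ratio = success_ratio(counts["success"], counts["failure"])
    label = "no calls" if ratio is None else f"{ratio:.0%}"
    print(f"{tool}: {label}")
```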

&lt;h2&gt;
  
  
  Instrumenting LLM Applications &amp;amp; Agents for Observability
&lt;/h2&gt;

&lt;p&gt;Instrumentation is where you expose the internal state of your LLM application and AI agent. &lt;a href="https://opentelemetry.io/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; is the gold standard here for its vendor-neutral approach and comprehensive support for traces, metrics, and logs, making it ideal for &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why OpenTelemetry is Essential for LLM Observability
&lt;/h3&gt;

&lt;p&gt;OpenTelemetry (OTel) provides a set of APIs, SDKs, and tools to instrument your application to generate and export telemetry data. It's language-agnostic and supports various exporters, allowing you to switch observability backends without re-instrumenting your code. For complex distributed systems like Kubernetes-hosted AI agents, OTel's distributed tracing capabilities are invaluable for understanding the flow of LLM interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Instrumentation Methods for LLM Observability
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. OpenTelemetry for Tracing and Metrics
&lt;/h4&gt;

&lt;p&gt;This is the recommended approach for deep, custom instrumentation for LLM observability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traces:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Use the OpenTelemetry SDK for your language (e.g., &lt;code&gt;opentelemetry-python&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Wrap LLM calls, tool calls, and agent steps with spans.&lt;/li&gt;
&lt;li&gt;Add relevant attributes (e.g., &lt;code&gt;model_name&lt;/code&gt;, &lt;code&gt;prompt_hash&lt;/code&gt;, &lt;code&gt;token_counts&lt;/code&gt;, &lt;code&gt;tool_name&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;) to spans.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;While OpenTelemetry can also generate metrics, for simple counter/gauge/histogram metrics that Prometheus can scrape directly, using your language's Prometheus client library (e.g., &lt;code&gt;prometheus_client&lt;/code&gt; for Python) is often simpler for HTTP exposition.&lt;/li&gt;
&lt;li&gt;For richer, more complex metrics or if you want to use OTLP for metrics, OpenTelemetry's metrics API is also powerful.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Framework Callbacks (e.g., LangChain)
&lt;/h4&gt;

&lt;p&gt;If you're using an LLM framework like LangChain, many provide callback systems that are perfect for capturing agent activity for better LLM observability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangChain Callbacks:&lt;/strong&gt; Implement custom callbacks (&lt;code&gt;BaseCallbackHandler&lt;/code&gt;) to log, emit metrics, or create traces at various points:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;on_llm_start&lt;/code&gt;/&lt;code&gt;on_llm_end&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;on_tool_start&lt;/code&gt;/&lt;code&gt;on_tool_end&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;on_chain_start&lt;/code&gt;/&lt;code&gt;on_chain_end&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;on_agent_action&lt;/code&gt;/&lt;code&gt;on_agent_finish&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrating with OpenTelemetry:&lt;/strong&gt; Within these callbacks, you can explicitly create OpenTelemetry spans and add attributes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Custom Wrappers
&lt;/h4&gt;

&lt;p&gt;For simpler LLM integrations or when frameworks don't offer enough hooks, you can create custom wrapper functions around your LLM API calls and agent logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prometheus_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;set_global_textmap&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.resources&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.trace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TracerProvider&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.trace.export&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ConsoleSpanExporter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;BatchSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.exporter.otlp.proto.grpc.trace_exporter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OTLPSpanExporter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.requests&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestsInstrumentor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.trace.sampling&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ALWAYS_ON&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagators.b3&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;B3Format&lt;/span&gt; &lt;span class="c1"&gt;# pip install opentelemetry-propagator-b3 (newer releases prefer B3MultiFormat)
&lt;/span&gt;
&lt;span class="c1"&gt;# --- OpenTelemetry Setup (for Tracing) ---
# In a real application, point the OTLPSpanExporter at your OpenTelemetry Collector.
# For local debugging without a collector, swap in the ConsoleSpanExporter imported above.
&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm-agent-app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TracerProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ALWAYS_ON&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# In a K8s cluster, this endpoint should point to the OTel Collector service.
&lt;/span&gt;&lt;span class="n"&gt;otlp_exporter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OTLPSpanExporter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://otel-collector.observability:4317&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;insecure&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;span_processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BatchSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;otlp_exporter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_span_processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;span_processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracer_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.agent.tracer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Propagator for B3 headers (commonly used in microservices)
&lt;/span&gt;&lt;span class="nf"&gt;set_global_textmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;B3Format&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Instrument requests library for outgoing HTTP calls (e.g., to actual LLM API)
&lt;/span&gt;&lt;span class="nc"&gt;RequestsInstrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# --- Prometheus Metrics Setup ---
# Latency of the LLM call itself
&lt;/span&gt;&lt;span class="n"&gt;LLM_CALL_LATENCY_SECONDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_call_latency_seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Latency of LLM API calls in seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;20.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Total input and output tokens
&lt;/span&gt;&lt;span class="n"&gt;LLM_INPUT_TOKENS_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_input_tokens_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total input tokens processed by LLM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;LLM_OUTPUT_TOKENS_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_output_tokens_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total output tokens generated by LLM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agent tool call success/failure
&lt;/span&gt;&lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent_tool_calls_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total agent tool calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# status: 'success' or 'failure'
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agent overall request latency
&lt;/span&gt;&lt;span class="n"&gt;AGENT_REQUEST_LATENCY_SECONDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent_request_latency_seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;End-to-end latency of agent requests in seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;20.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Structured Logger Setup ---
# Note: In a real Flask app, this would be integrated into the app's logger.
# This is a simplified example.
&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%(message)s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# The formatter will produce JSON
&lt;/span&gt;    &lt;span class="n"&gt;datefmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%dT%H:%M:%S%z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Override the default formatter to inject trace_id and span_id
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JsonFormatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Formatter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;log_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;formatTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datefmt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;levelname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# Default to plain message
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm-agent-app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;component&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Attempt to parse message as JSON and merge
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;msg_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Merge only JSON objects; scalars and lists stay as the plain message
&lt;/span&gt;                &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg_dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Ensure a 'message' field is always present
&lt;/span&gt;                    &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="c1"&gt;# Message is not JSON, use original record.getMessage()
&lt;/span&gt;
        &lt;span class="c1"&gt;# Inject trace_id and span_id
&lt;/span&gt;        &lt;span class="n"&gt;current_span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_current_span&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_span&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;current_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_span_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;is_valid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_span_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;032x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 32 hex chars, zero-padded&lt;/span&gt;
            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_span_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;016x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 16 hex chars, zero-padded&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Remove default handler and add one with custom formatter
# Note: For Flask, actual integration may differ.
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;[:]:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
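&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setFormatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;JsonFormatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datefmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%dT%H:%M:%S%z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;propagate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="c1"&gt;# Prevent a duplicate plain-text line from the root handler created by basicConfig&lt;/span&gt;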


&lt;span class="c1"&gt;# --- Mock LLM and Agent Logic ---
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates an LLM API call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_api_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.request.type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.prompts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt; &lt;span class="c1"&gt;# Store as JSON string
&lt;/span&gt;
        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# Simulate variable latency
&lt;/span&gt;        &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

        &lt;span class="c1"&gt;# Simulate token usage
&lt;/span&gt;        &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="c1"&gt;# Example
&lt;/span&gt;        &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="c1"&gt;# Example
&lt;/span&gt;
        &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is a simulated response to: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
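        &lt;span class="n"&gt;response_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="c1"&gt;# Default to success so the tool branches below do not leave it undefined&lt;/span&gt;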

        &lt;span class="c1"&gt;# Simulate agent action suggestion
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I recommend using a weather tool.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;response_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I recommend using a time tool.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;response_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Simulating an LLM error.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No tool suggested.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.tokens.total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.tokens.prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.tokens.completion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Record latency and token metrics, then emit a structured log entry
&lt;/span&gt;        &lt;span class="n"&gt;LLM_CALL_LATENCY_SECONDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;LLM_INPUT_TOKENS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;LLM_OUTPUT_TOKENS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_call_completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM call finished&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_weather_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates an external weather API call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.parameters.city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate the latency of the external weather API&lt;/span&gt;
        &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is sunny with 25°C.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather tool call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_time_tool&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates an external time API call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate the latency of the external time API&lt;/span&gt;
        &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The current time is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%H&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time tool call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;llm_agent_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The core logic of our simple LLM agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract trace context from incoming request headers
&lt;/span&gt;    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request_headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_agent_request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user.query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;agent_start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent request initiated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 1: Ask the mock LLM for a response and an optional tool suggestion
&lt;/span&gt;        &lt;span class="n"&gt;llm_response_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mock_llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo-mock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Agent decision making based on LLM response
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;llm_response_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent decided to call tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_decision_making&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;decision_span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;decision_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision.type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;decision_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision.tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mock_weather_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Tool result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mock_time_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Tool result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent attempted to call unknown tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
                    &lt;span class="c1"&gt;# Use a fixed label value: model-generated tool names would create unbounded label cardinality
&lt;/span&gt;                    &lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;agent_end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;total_latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;agent_start_time&lt;/span&gt;
        &lt;span class="n"&gt;AGENT_REQUEST_LATENCY_SECONDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.total_latency_seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.final_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_finished&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent request completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_response_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_answer&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Python snippet shows the three building blocks of LLM observability on Kubernetes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenTelemetry &lt;code&gt;tracer&lt;/code&gt;&lt;/strong&gt;: Creates spans for the overall agent request, the LLM call, and each tool call, and attaches attributes to them for context. The &lt;code&gt;extract(request_headers)&lt;/code&gt; call reads the incoming trace context from the request headers, so the agent's spans join the caller's trace instead of starting a new one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus client metrics&lt;/strong&gt;: &lt;code&gt;Histogram&lt;/code&gt; metrics record LLM-call and end-to-end latencies, and &lt;code&gt;Counter&lt;/code&gt; metrics track token usage and tool-call outcomes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Logging&lt;/strong&gt;: &lt;code&gt;logger.info&lt;/code&gt; calls output JSON logs, including &lt;code&gt;trace_id&lt;/code&gt; and &lt;code&gt;span_id&lt;/code&gt; for easy correlation. The custom &lt;code&gt;JsonFormatter&lt;/code&gt; ensures proper structure.&lt;/li&gt;
&lt;/ul&gt;
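
&lt;p&gt;For the log-to-trace correlation in the last bullet to work, the IDs in the logs must be rendered the same way the tracing backend displays them. OpenTelemetry exposes them as integers, while backends show a 32-character trace ID and a 16-character span ID in zero-padded lowercase hex. A minimal, stdlib-only sketch of that rendering follows; the helper names are hypothetical and not part of the article's application.&lt;/p&gt;

```python
# Hypothetical helpers (not from the article's app.py): render OpenTelemetry
# integer IDs as the fixed-width lowercase hex that tracing backends display.


def hex_trace_id(trace_id: int) -> str:
    """128-bit trace ID as 32 zero-padded hex characters."""
    return format(trace_id, "032x")


def hex_span_id(span_id: int) -> str:
    """64-bit span ID as 16 zero-padded hex characters."""
    return format(span_id, "016x")


def correlation_fields(trace_id: int, span_id: int) -> dict:
    """Fields to merge into a JSON log record for log-to-trace correlation."""
    return {"trace_id": hex_trace_id(trace_id), "span_id": hex_span_id(span_id)}


if __name__ == "__main__":
    # An ID with leading zero bits keeps its full width after formatting.
    print(correlation_fields(0x00AB, 0x01))
```

&lt;p&gt;The OpenTelemetry Python API also provides &lt;code&gt;format_trace_id&lt;/code&gt; and &lt;code&gt;format_span_id&lt;/code&gt; in &lt;code&gt;opentelemetry.trace&lt;/code&gt;, which produce the same zero-padded form.&lt;/p&gt;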

&lt;h2&gt;
  
  
  Leveraging Kubernetes-Native Observability Tools for LLMs
&lt;/h2&gt;

&lt;p&gt;Now let's connect this instrumentation to the observability tools you already run in your Kubernetes cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prometheus &amp;amp; Grafana for LLM Metrics
&lt;/h3&gt;

&lt;p&gt;Prometheus is the de facto standard for metric collection in Kubernetes, and Grafana visualizes those metrics in dashboards.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus Operator:&lt;/strong&gt; The easiest way to deploy and manage Prometheus in Kubernetes is using the &lt;a href="https://prometheus-operator.dev/" rel="noopener noreferrer"&gt;Prometheus Operator&lt;/a&gt;. It introduces Custom Resource Definitions (CRDs) like &lt;code&gt;ServiceMonitor&lt;/code&gt; and &lt;code&gt;PodMonitor&lt;/code&gt; that simplify scraping configuration for your LLM applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana:&lt;/strong&gt; A leading open-source dashboarding tool that integrates seamlessly with Prometheus for visualizing LLM metrics.&lt;/li&gt;
&lt;/ol&gt;
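
&lt;p&gt;To make the &lt;code&gt;ServiceMonitor&lt;/code&gt; idea concrete, here is a rough sketch that builds one as a Python dictionary and prints it as JSON. The resource name, namespaces, the &lt;code&gt;app&lt;/code&gt; label, and the &lt;code&gt;http&lt;/code&gt; port name are assumptions chosen for illustration; they must match your own Service.&lt;/p&gt;

```python
import json

# Hypothetical values: the names, labels, and port name below are
# placeholders and must match your own Service definition.
service_monitor = {
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "ServiceMonitor",
    "metadata": {"name": "llm-agent-app", "namespace": "observability"},
    "spec": {
        # Namespaces to search for matching Services.
        "namespaceSelector": {"matchNames": ["default"]},
        # Select Services that carry this label.
        "selector": {"matchLabels": {"app": "llm-agent-app"}},
        # Scrape the named Service port at /metrics every 15 seconds.
        "endpoints": [{"port": "http", "path": "/metrics", "interval": "15s"}],
    },
}

if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to
    # `kubectl apply -f -`.
    print(json.dumps(service_monitor, indent=2))
```

&lt;p&gt;Depending on how Prometheus was installed, the &lt;code&gt;ServiceMonitor&lt;/code&gt; may also need a metadata label that matches the &lt;code&gt;serviceMonitorSelector&lt;/code&gt; of your Prometheus resource before it is picked up.&lt;/p&gt;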

&lt;h3&gt;
  
  
  Logging Solutions with Fluent Bit and Loki/Elasticsearch
&lt;/h3&gt;

&lt;p&gt;Centralized logging is non-negotiable for debugging microservices, including LLM-powered applications.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fluent Bit:&lt;/strong&gt; A lightweight and efficient log processor and forwarder. Deploy it as a DaemonSet on your Kubernetes nodes to collect logs from container &lt;code&gt;stdout&lt;/code&gt;/&lt;code&gt;stderr&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loki:&lt;/strong&gt; Grafana Labs' log aggregation system, designed for cost-effectiveness and scalability, especially when paired with Grafana for visualization. It indexes metadata (labels) rather than full log content, which keeps the index small even when log lines carry long prompts and responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elasticsearch/Kibana:&lt;/strong&gt; Another popular stack for log aggregation and analysis, especially powerful for full-text search.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  OpenTelemetry Collector for Traces and LLM Observability
&lt;/h3&gt;

&lt;p&gt;The OpenTelemetry Collector is an essential component for distributed tracing within your Kubernetes environment.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Role:&lt;/strong&gt; It receives, processes, and exports telemetry data. Your LLM applications send traces to the collector, which then forwards them to your chosen tracing backend (e.g., Jaeger, Tempo).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Deploy the collector as a &lt;code&gt;Deployment&lt;/code&gt; in your Kubernetes cluster, typically in your &lt;code&gt;observability&lt;/code&gt; namespace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration:&lt;/strong&gt; Configure it to receive OTLP (OpenTelemetry Protocol) traces, process them (e.g., batching, sampling), and then export them to your backend.&lt;/li&gt;
&lt;/ul&gt;
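
&lt;p&gt;The receive, process, export flow described above maps onto a Collector configuration with one &lt;code&gt;traces&lt;/code&gt; pipeline. The sketch below expresses that structure as a Python dictionary purely for illustration (the Collector itself reads it from a YAML file) and adds a small, hypothetical helper that checks that every component a pipeline references is actually defined. The exporter endpoint &lt;code&gt;tempo.observability:4317&lt;/code&gt; is a placeholder for your tracing backend.&lt;/p&gt;

```python
# Hypothetical pipeline: the exporter endpoint is a placeholder for your
# tracing backend; the structure mirrors a Collector YAML config.
collector_config = {
    "receivers": {"otlp": {"protocols": {"grpc": {"endpoint": "0.0.0.0:4317"}}}},
    "processors": {"batch": {}},
    "exporters": {
        "otlp": {"endpoint": "tempo.observability:4317", "tls": {"insecure": True}}
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["otlp"],
            }
        }
    },
}


def undefined_components(config: dict) -> list:
    """Return 'section/name' entries that a pipeline references but the config never defines."""
    missing = []
    for pipeline in config["service"]["pipelines"].values():
        for section in ("receivers", "processors", "exporters"):
            for name in pipeline.get(section, []):
                if name not in config.get(section, {}):
                    missing.append(f"{section}/{name}")
    return missing


if __name__ == "__main__":
    print(undefined_components(collector_config))
```

&lt;p&gt;A check like this catches a common mistake: defining a receiver, processor, or exporter but misspelling its name in the pipeline, or the reverse.&lt;/p&gt;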

&lt;h2&gt;
  
  
  Hands-on Guide: Building an LLM Observability Pipeline on Kubernetes
&lt;/h2&gt;

&lt;p&gt;Let's put theory into practice. We'll deploy our simple LLM agent application, then set up Prometheus, Grafana, OpenTelemetry Collector, and Loki to observe it, building out comprehensive &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A running Kubernetes cluster (Minikube, k3s, or a cloud-managed cluster).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kubectl&lt;/code&gt; (v1.25+) installed and configured to connect to your cluster.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;helm&lt;/code&gt; (v3.10+) installed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docker&lt;/code&gt; installed (to build the application image).&lt;/li&gt;
&lt;li&gt;A Docker Hub account (or other container registry) to push your image.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Prepare the LLM Agent Application
&lt;/h3&gt;

&lt;p&gt;First, let's create our Python Flask application (&lt;code&gt;app.py&lt;/code&gt;), which implements the agent logic and instrumentation we discussed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;prometheus_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_latest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;REGISTRY&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;set_global_textmap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extract&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.resources&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.trace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TracerProvider&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.trace.export&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;BatchSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.exporter.otlp.proto.grpc.trace_exporter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OTLPSpanExporter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.wsgi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenTelemetryMiddleware&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;WsgiMiddleware&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.instrumentation.requests&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestsInstrumentor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.trace.sampling&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ALWAYS_ON&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;set_meter_provider&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MeterProvider&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.exporter.prometheus&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PrometheusMetricReader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.propagators.b3&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;B3MultiFormat&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;B3Format&lt;/span&gt; &lt;span class="c1"&gt;# pip install opentelemetry-propagator-b3
&lt;/span&gt;
&lt;span class="c1"&gt;# --- OpenTelemetry Setup ---
# Resource for the service
&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm-agent-app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k8s.pod.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HOSTNAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k8s.namespace.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KUBERNETES_NAMESPACE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Tracer Provider
&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TracerProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sampler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ALWAYS_ON&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Configure OTLP exporter to send traces to the OpenTelemetry Collector in the 'observability' namespace
&lt;/span&gt;&lt;span class="n"&gt;otlp_exporter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OTLPSpanExporter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://otel-collector.observability:4317&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;insecure&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;span_processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BatchSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;otlp_exporter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_span_processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;span_processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracer_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.agent.tracer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Propagator for B3 headers (commonly used in microservices for context propagation)
&lt;/span&gt;&lt;span class="nf"&gt;set_global_textmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;B3Format&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Instrument requests library for outgoing HTTP calls (e.g., to actual LLM API)
&lt;/span&gt;&lt;span class="nc"&gt;RequestsInstrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# --- Prometheus Metrics Setup ---
# Latency of the LLM call itself
&lt;/span&gt;&lt;span class="n"&gt;LLM_CALL_LATENCY_SECONDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_call_latency_seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Latency of LLM API calls in seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;20.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Total input and output tokens
&lt;/span&gt;&lt;span class="n"&gt;LLM_INPUT_TOKENS_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_input_tokens_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total input tokens processed by LLM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;LLM_OUTPUT_TOKENS_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_output_tokens_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total output tokens generated by LLM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agent tool call success/failure
&lt;/span&gt;&lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent_tool_calls_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total agent tool calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# status: 'success' or 'failure'
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agent overall request latency
&lt;/span&gt;&lt;span class="n"&gt;AGENT_REQUEST_LATENCY_SECONDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Histogram&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;agent_request_latency_seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;End-to-end latency of agent requests in seconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;20.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Structured Logger Setup ---
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;JsonFormatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Formatter&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Base log data
&lt;/span&gt;        &lt;span class="n"&gt;log_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;formatTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datefmt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;levelname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm-agent-app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;component&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lineno&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lineno&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;funcName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;funcName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Attempt to parse message as JSON and merge. This supports logger.info(json.dumps({"event": "..."}))
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;msg_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="c1"&gt;# Merge only JSON objects; keys the object omits (such as 'message') keep their defaults
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt; &lt;span class="c1"&gt;# Message is not JSON, use original record.getMessage() as the 'message' field
&lt;/span&gt;
        &lt;span class="c1"&gt;# Inject trace_id and span_id if available
&lt;/span&gt;        &lt;span class="n"&gt;current_span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_current_span&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_span&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;current_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_span_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;is_valid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Zero-pad to the W3C Trace Context widths: 32 hex chars for trace_id, 16 for span_id
&lt;/span&gt;            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_span_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;032x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_span_context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;span_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;016x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Clear existing handlers to prevent duplicate logs from Flask's default logger
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;[:]:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;removeHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setFormatter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;JsonFormatter&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# --- Mock LLM and Agent Logic ---
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates an LLM API call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_api_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.request.type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.prompts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]))&lt;/span&gt; &lt;span class="c1"&gt;# Store as JSON string
&lt;/span&gt;
        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# Simulate variable latency
&lt;/span&gt;        &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

        &lt;span class="c1"&gt;# Simulate token usage
&lt;/span&gt;        &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="c1"&gt;# Example: base 10 tokens + words
&lt;/span&gt;        &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="c1"&gt;# Example: base 50 tokens + half of input words
&lt;/span&gt;
        &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is a simulated response to: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;response_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

        &lt;span class="c1"&gt;# Simulate agent action suggestion
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;forecast&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I recommend using a weather tool.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;response_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I recommend using a time tool.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;response_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Simulating an LLM error.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No tool suggested.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.tokens.total&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.tokens.prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.tokens.completion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm.response.content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;LLM_CALL_LATENCY_SECONDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;LLM_INPUT_TOKENS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;LLM_OUTPUT_TOKENS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_call_completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM call finished&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_weather_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates an external weather API call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.parameters.city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is sunny with 25°C.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather tool call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;mock_time_tool&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates an external time API call.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The current time is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%H&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool.result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time tool call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;llm_agent_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The core logic of our simple LLM agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract trace context from incoming request headers for distributed tracing
&lt;/span&gt;    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_agent_request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user.query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;agent_start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent request initiated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;

        &lt;span class="n"&gt;llm_response_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mock_llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo-mock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Agent decision making based on LLM response
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;llm_response_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_response_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent decided to call tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_as_current_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_decision_making&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;decision_span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;decision_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision.type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;decision_span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision.tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mock_weather_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Tool result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mock_time_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="n"&gt;final_answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Tool result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown_tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent attempted to call unknown tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
                    &lt;span class="n"&gt;AGENT_TOOL_CALLS_TOTAL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;failure&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;inc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;agent_end_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;total_latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_end_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;agent_start_time&lt;/span&gt;
        &lt;span class="n"&gt;AGENT_REQUEST_LATENCY_SECONDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;observe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.total_latency_seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_latency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.final_response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_finished&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent request completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_response_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_latency_sec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_latency&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_answer&lt;/span&gt;

&lt;span class="c1"&gt;# --- Flask App ---
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Wrap Flask app with OpenTelemetry WSGI middleware for automatic request tracing
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wsgi_app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WsgiMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wsgi_app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/healthz&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;healthz&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/metrics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GET&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Expose Prometheus metrics.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
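    &lt;span class="c1"&gt;# Set the Prometheus text exposition content type explicitly; Flask would otherwise default to text/html&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generate_latest&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/plain; version=0.0.4; charset=utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;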

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle LLM agent queries.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
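    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;silent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;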
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prompt is required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;

    &lt;span class="c1"&gt;# Pass request headers for context propagation
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;llm_agent_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# PrometheusMetricReader needs to be set up globally for default registry
&lt;/span&gt;    &lt;span class="n"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PrometheusMetricReader&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;meter_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MeterProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric_readers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;set_meter_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;meter_provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
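
&lt;p&gt;To see the context propagation in &lt;code&gt;llm_agent_handler&lt;/code&gt; at work, a caller has to send a trace header with its request. The sketch below is a minimal, standard-library-only illustration. It assumes the app uses OpenTelemetry's default W3C Trace Context propagator (if you configure B3 instead, the header names differ) and that the service is reachable at &lt;code&gt;http://localhost:5000&lt;/code&gt;, which is a placeholder. The helper names &lt;code&gt;new_traceparent&lt;/code&gt; and &lt;code&gt;build_query_request&lt;/code&gt; are hypothetical and not part of the application.&lt;/p&gt;

```python
import json
import re
import secrets
import urllib.request

# W3C Trace Context layout: version-traceid-spanid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def new_traceparent(sampled=True):
    """Build a fresh W3C traceparent value for an outgoing request."""
    trace_id = secrets.token_hex(16)  # 16 random bytes give 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes give 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def build_query_request(prompt, base_url="http://localhost:5000"):
    """Prepare (but do not send) a POST to the /query route.

    base_url is an assumed placeholder; point it at your own service.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "traceparent": new_traceparent(),
        },
    )


if __name__ == "__main__":
    req = build_query_request("What is the weather in Paris?")
    print(req.get_method(), req.full_url)
    print(req.get_header("Traceparent"))
    # Uncomment once the app is running locally:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

&lt;p&gt;If the header is picked up, the &lt;code&gt;llm_agent_request&lt;/code&gt; span should share the trace ID you generated, which makes it easy to find a specific request in your tracing backend.&lt;/p&gt;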



&lt;p&gt;Now, create a &lt;code&gt;Dockerfile&lt;/code&gt; for our application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Dockerfile&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.10-slim-bookworm&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; app.py .&lt;/span&gt;

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 5000&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And &lt;code&gt;requirements.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Flask==2.3.3
prometheus_client==0.18.0
opentelemetry-api==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-exporter-prometheus==0.46b0
opentelemetry-instrumentation-requests==0.46b0
opentelemetry-instrumentation-wsgi==0.46b0
opentelemetry-propagator-b3==1.25.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build and push your Docker image. Remember to replace &lt;code&gt;your-dockerhub-user&lt;/code&gt; with your actual Docker Hub username.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; your-dockerhub-user/llm-agent-app:v1.0.0 &lt;span class="nb"&gt;.&lt;/span&gt;
docker push your-dockerhub-user/llm-agent-app:v1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Deploy Observability Stack to Kubernetes
&lt;/h3&gt;

&lt;p&gt;We'll use Helm to deploy the core components for robust LLM observability. Create an &lt;code&gt;observability&lt;/code&gt; namespace first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace observability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2.1 Deploy Prometheus Operator and Grafana
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

&lt;span class="c"&gt;# Install kube-prometheus-stack which includes Prometheus, Grafana, and Alertmanager&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;prometheus prometheus-community/kube-prometheus-stack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; observability &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; grafana.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; grafana.adminPassword&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"prom-operator"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; grafana.service.type&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"LoadBalancer"&lt;/span&gt; &lt;span class="c"&gt;# Use "NodePort" for Minikube or local clusters&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for Prometheus and Grafana pods to be ready. You can get Grafana's LoadBalancer IP (or NodePort):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get svc &lt;span class="nt"&gt;-n&lt;/span&gt; observability prometheus-grafana
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Log in to Grafana using &lt;code&gt;admin&lt;/code&gt; as the username and &lt;code&gt;prom-operator&lt;/code&gt; as the password.&lt;/p&gt;

&lt;h4&gt;
  
  
  2.2 Deploy Loki and Fluent Bit for Logging
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

&lt;span class="c"&gt;# Install Loki&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;loki grafana/loki &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; observability &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; service.type&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ClusterIP"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; persistence.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; persistence.size&lt;span class="o"&gt;=&lt;/span&gt;10Gi &lt;span class="c"&gt;# Adjust size as needed&lt;/span&gt;

&lt;span class="c"&gt;# Install Fluent Bit&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;fluent-bit grafana/fluent-bit &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; observability &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.service.flush&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.inputs[0].name&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.inputs[0].path&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/log/containers/*.log"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.inputs[0].multiline.parser&lt;span class="o"&gt;=&lt;/span&gt;docker,cri &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.inputs[0].db&lt;span class="o"&gt;=&lt;/span&gt;/var/log/flb_kube.db &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.inputs[0].mem_buf_limit&lt;span class="o"&gt;=&lt;/span&gt;5MB &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.outputs[0].name&lt;span class="o"&gt;=&lt;/span&gt;loki &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.outputs[0].host&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"loki.observability.svc.cluster.local"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.outputs[0].port&lt;span class="o"&gt;=&lt;/span&gt;3100 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.outputs[0].labels&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"job=fluent-bit"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.outputs[0].removeKeys&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"kubernetes.host,kubernetes.labels,kubernetes.annotations,kubernetes.pod_id,kubernetes.container_id,kubernetes.docker_id,kubernetes.container_hash,kubernetes.container_image,kubernetes.daemonset,kubernetes.deployment"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; config.outputs[0].labelMapPath&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/fluent-bit/config/label_map.json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; extraFiles.label_map_json&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;kubernetes&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;container_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;container&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;namespace_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;namespace&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;pod_name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;pod&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}}"&lt;/span&gt; &lt;span class="c"&gt;# Maps K8s metadata to Loki labels&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2.3 Deploy OpenTelemetry Collector
&lt;/h4&gt;

&lt;p&gt;Create &lt;code&gt;otel-collector.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# otel-collector.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;observability&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel/opentelemetry-collector:0.100.0&lt;/span&gt; &lt;span class="c1"&gt;# Use a specific, stable version&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/otelcol"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--config=/conf/otel-collector-config.yaml"&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4317&lt;/span&gt; &lt;span class="c1"&gt;# OTLP gRPC receiver&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4318&lt;/span&gt; &lt;span class="c1"&gt;# OTLP HTTP receiver&lt;/span&gt;
        &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector-config-vol&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/conf&lt;/span&gt;
      &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector-config-vol&lt;/span&gt;
        &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector-config&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otel-collector-config&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;observability&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;otel-collector-config.yaml&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;receivers:&lt;/span&gt;
      &lt;span class="s"&gt;otlp:&lt;/span&gt;
        &lt;span class="s"&gt;protocols:&lt;/span&gt;
          &lt;span class="s"&gt;grpc:&lt;/span&gt;
          &lt;span class="s"&gt;http:&lt;/span&gt;
    &lt;span class="s"&gt;processors:&lt;/span&gt;
      &lt;span class="s"&gt;batch: # Batching for efficiency&lt;/span&gt;
        &lt;span class="s"&gt;send_batch_size: 100&lt;/span&gt;
        &lt;span class="s"&gt;timeout: 10s&lt;/span&gt;
    &lt;span class="s"&gt;exporters:&lt;/span&gt;
      &lt;span class="s"&gt;logging: # For demo purposes, logs traces to stdout. Replace with an OTLP exporter (Jaeger or Tempo) for a full setup.&lt;/span&gt;
        &lt;span class="s"&gt;verbosity: detailed&lt;/span&gt;
      &lt;span class="s"&gt;# Example OTLP exporters. Jaeger and Tempo both accept OTLP over gRPC on port 4317;&lt;/span&gt;
      &lt;span class="s"&gt;# the dedicated "jaeger" exporter was removed from the Collector and there is no "tempo" exporter.&lt;/span&gt;
      &lt;span class="s"&gt;# otlp/jaeger:&lt;/span&gt;
      &lt;span class="s"&gt;#   endpoint: "jaeger-collector.observability:4317" # Assuming Jaeger is deployed&lt;/span&gt;
      &lt;span class="s"&gt;#   tls:&lt;/span&gt;
      &lt;span class="s"&gt;#     insecure: true&lt;/span&gt;
      &lt;span class="s"&gt;# otlp/tempo:&lt;/span&gt;
      &lt;span class="s"&gt;#   endpoint: "tempo.observability:4317" # Assuming Tempo is deployed&lt;/span&gt;
      &lt;span class="s"&gt;#   tls:&lt;/span&gt;
      &lt;span class="s"&gt;#     insecure: true&lt;/span&gt;
    &lt;span class="s"&gt;service:&lt;/span&gt;
      &lt;span class="s"&gt;pipelines:&lt;/span&gt;
        &lt;span class="s"&gt;traces:&lt;/span&gt;
          &lt;span class="s"&gt;receivers: [otlp]&lt;/span&gt;
          &lt;span class="s"&gt;processors: [batch]&lt;/span&gt;
          &lt;span class="s"&gt;exporters: [logging] # Change to [otlp/jaeger] or [otlp/tempo] if you have them configured&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; otel-collector.yaml &lt;span class="nt"&gt;-n&lt;/span&gt; observability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Deploy the LLM Agent Application to Kubernetes
&lt;/h3&gt;

&lt;p&gt;Now, deploy your instrumented application. Create &lt;code&gt;llm-agent-app.yaml&lt;/code&gt;. Remember to replace &lt;code&gt;your-dockerhub-user&lt;/code&gt; with your actual Docker Hub username.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# llm-agent-app.yaml&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt; &lt;span class="c1"&gt;# Deploy in default or your app namespace&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
      &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;prometheus.io/scrape&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt; &lt;span class="c1"&gt;# Enable Prometheus scraping&lt;/span&gt;
        &lt;span class="na"&gt;prometheus.io/port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000"&lt;/span&gt;   &lt;span class="c1"&gt;# Port where metrics are exposed by the application&lt;/span&gt;
        &lt;span class="na"&gt;prometheus.io/path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/metrics"&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-dockerhub-user/llm-agent-app:v1.0.0&lt;/span&gt; &lt;span class="c1"&gt;# Replace with your image&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http-app&lt;/span&gt; &lt;span class="c1"&gt;# This port will serve both the application and /metrics&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;KUBERNETES_NAMESPACE&lt;/span&gt;
          &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;fieldRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;fieldPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metadata.namespace&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;200m"&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
          &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1000m"&lt;/span&gt;
            &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1Gi"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt; &lt;span class="c1"&gt;# Service exposed port&lt;/span&gt;
    &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http-app&lt;/span&gt; &lt;span class="c1"&gt;# Maps to containerPort: 5000&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt; &lt;span class="c1"&gt;# Name of this service port&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LoadBalancer&lt;/span&gt; &lt;span class="c1"&gt;# Use "NodePort" for local clusters like Minikube/k3s&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;monitoring.coreos.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceMonitor&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app-sm&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;observability&lt;/span&gt; &lt;span class="c1"&gt;# ServiceMonitor should be in the same namespace as Prometheus&lt;/span&gt;
  &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus&lt;/span&gt; &lt;span class="c1"&gt;# This label links to the Prometheus instance from kube-prometheus-stack&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llm-agent-app&lt;/span&gt; &lt;span class="c1"&gt;# Selects services with this label&lt;/span&gt;
  &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http&lt;/span&gt; &lt;span class="c1"&gt;# Name of the port in the Service that exposes the metrics endpoint&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/metrics&lt;/span&gt;
    &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;
    &lt;span class="na"&gt;scrapeTimeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
  &lt;span class="na"&gt;namespaceSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt; &lt;span class="c1"&gt;# Or the namespace where your app is deployed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; llm-agent-app.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait for the &lt;code&gt;llm-agent-app&lt;/code&gt; pod and service to be ready. Get its external IP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get svc llm-agent-app &lt;span class="nt"&gt;-n&lt;/span&gt; default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
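
&lt;p&gt;Until the load balancer is provisioned the Service reports no address at all, and some providers report a hostname rather than an IP. The following is a minimal Python sketch, assuming you pass it the output of &lt;code&gt;kubectl get svc llm-agent-app -n default -o json&lt;/code&gt;. The function names are hypothetical helpers for this guide, not part of any library.&lt;/p&gt;

```python
import json
from typing import Optional


def extract_lb_address(service: dict) -> Optional[str]:
    """Return the first external IP or hostname of a LoadBalancer Service.

    Returns None while the load balancer is still pending.
    """
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        # Cloud providers fill in either "ip" or "hostname", not always both.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            return address
    return None


def extract_lb_address_from_json(text: str) -> Optional[str]:
    """Same as extract_lb_address, but takes the raw JSON text printed by kubectl."""
    return extract_lb_address(json.loads(text))
```

&lt;p&gt;A &lt;code&gt;None&lt;/code&gt; result means the address is not assigned yet, so wait and query the Service again.&lt;/p&gt;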



&lt;h3&gt;
  
  
  Step 4: Interact with the Agent to Generate Data
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;curl&lt;/code&gt; or a simple script to send queries to your agent. This will generate logs, metrics, and traces, populating your LLM observability stack.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For LoadBalancer, get the external IP&lt;/span&gt;
&lt;span class="nv"&gt;AGENT_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get svc llm-agent-app &lt;span class="nt"&gt;-n&lt;/span&gt; default &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.loadBalancer.ingress[0].ip}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# For NodePort (e.g., Minikube), get Minikube IP and NodePort&lt;/span&gt;
&lt;span class="c"&gt;# AGENT_IP=$(minikube ip)&lt;/span&gt;
&lt;span class="c"&gt;# AGENT_PORT=$(kubectl get svc llm-agent-app -n default -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}')&lt;/span&gt;
&lt;span class="c"&gt;# export AGENT_URL="http://$AGENT_IP:$AGENT_PORT"&lt;/span&gt;

&lt;span class="c"&gt;# Use AGENT_IP for LoadBalancer, or AGENT_URL for NodePort&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TARGET_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://&lt;/span&gt;&lt;span class="nv"&gt;$AGENT_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="c"&gt;# or $AGENT_URL if using NodePort&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Sending requests to &lt;/span&gt;&lt;span class="nv"&gt;$TARGET_URL&lt;/span&gt;&lt;span class="s2"&gt;/query"&lt;/span&gt;

&lt;span class="c"&gt;# Send some queries&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "What is the weather like in London today?"}'&lt;/span&gt; &lt;span class="nv"&gt;$TARGET_URL&lt;/span&gt;/query
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Tell me a fun fact about Kubernetes."}'&lt;/span&gt; &lt;span class="nv"&gt;$TARGET_URL&lt;/span&gt;/query
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "What time is it?"}'&lt;/span&gt; &lt;span class="nv"&gt;$TARGET_URL&lt;/span&gt;/query
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Simulate an error now."}'&lt;/span&gt; &lt;span class="nv"&gt;$TARGET_URL&lt;/span&gt;/query &lt;span class="c"&gt;# Test error paths&lt;/span&gt;

&lt;span class="c"&gt;# Send more queries to generate enough data for dashboards&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;seq &lt;/span&gt;1 10&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Give me another interesting fact."}'&lt;/span&gt; &lt;span class="nv"&gt;$TARGET_URL&lt;/span&gt;/query &amp;amp;&amp;gt;/dev/null
  &lt;span class="nb"&gt;sleep &lt;/span&gt;0.5
  curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "What is the current weather?"}'&lt;/span&gt; &lt;span class="nv"&gt;$TARGET_URL&lt;/span&gt;/query &amp;amp;&amp;gt;/dev/null
  &lt;span class="nb"&gt;sleep &lt;/span&gt;0.5
&lt;span class="k"&gt;done

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Requests sent. Data should now be flowing into your observability stack."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
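
&lt;p&gt;If you would rather drive the agent from Python than from &lt;code&gt;curl&lt;/code&gt;, the same requests can be sketched with the standard library alone. This assumes the &lt;code&gt;/query&lt;/code&gt; endpoint and the &lt;code&gt;{"prompt": ...}&lt;/code&gt; payload used in the commands above. The helper names &lt;code&gt;build_query_request&lt;/code&gt; and &lt;code&gt;send_prompts&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import json
import urllib.error
import urllib.request


def build_query_request(target_url: str, prompt: str) -> urllib.request.Request:
    """Build the same POST /query request that the curl commands send."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=target_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_prompts(target_url: str, prompts: list, timeout: float = 30.0) -> list:
    """Send each prompt in order and return the HTTP status codes.

    Error responses are recorded instead of raised, because the simulated
    error prompt is expected to exercise the failure path.
    """
    statuses = []
    for prompt in prompts:
        request = build_query_request(target_url, prompt)
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                statuses.append(response.status)
        except urllib.error.HTTPError as err:
            statuses.append(err.code)
    return statuses
```

&lt;p&gt;For example, &lt;code&gt;send_prompts("http://" + agent_ip, ["What time is it?", "Simulate an error now."])&lt;/code&gt; returns one status code per prompt, where &lt;code&gt;agent_ip&lt;/code&gt; stands for the address you looked up earlier.&lt;/p&gt;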



&lt;p&gt;Repeat these calls a few times to generate sufficient data for visualization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Visualize LLM Observability Data in Grafana
&lt;/h3&gt;

&lt;p&gt;Access Grafana using the LoadBalancer IP or NodePort you obtained earlier.&lt;/p&gt;

&lt;h4&gt;
  
  
  5.1 Prometheus Dashboard for LLM Metrics
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Connections -&amp;gt; Data sources&lt;/strong&gt; and ensure Prometheus is configured (it should be automatically set up by &lt;code&gt;kube-prometheus-stack&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Create a new Dashboard (&lt;strong&gt;+ -&amp;gt; New Dashboard&lt;/strong&gt;).&lt;/li&gt;
&lt;li&gt;Add new panels with the following PromQL queries to monitor your LLM observability on Kubernetes:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLM Call Latency (95th Percentile):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Query: &lt;code&gt;histogram_quantile(0.95, sum by(le, model_name) (rate(llm_call_latency_seconds_bucket{app="llm-agent-app", namespace="default"}[5m])))&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Token Usage (Input/Output Rate):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Query: &lt;code&gt;sum by(model_name) (rate(llm_input_tokens_total{app="llm-agent-app", namespace="default"}[5m]))&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Query: &lt;code&gt;sum by(model_name) (rate(llm_output_tokens_total{app="llm-agent-app", namespace="default"}[5m]))&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Request Latency (99th Percentile):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Query: &lt;code&gt;histogram_quantile(0.99, sum by(le) (rate(agent_request_latency_seconds_bucket{app="llm-agent-app", namespace="default"}[5m])))&lt;/code&gt; (99th percentile end-to-end agent latency)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Tool Call Success/Failure Rates:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Query: &lt;code&gt;sum by(tool_name, status) (rate(agent_tool_calls_total{app="llm-agent-app", namespace="default"}[5m]))&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimated Cost (Example, adjust token costs as per your LLM provider):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Query: &lt;code&gt;(sum(rate(llm_input_tokens_total{app="llm-agent-app", namespace="default"}[5m])) / 1000 * 0.01) + (sum(rate(llm_output_tokens_total{app="llm-agent-app", namespace="default"}[5m])) / 1000 * 0.03)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
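&lt;em&gt;Explanation:&lt;/em&gt; This query estimates spend using hypothetical prices of $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens. Because &lt;code&gt;rate()&lt;/code&gt; returns a per-second value, the result is in dollars per second; multiply by 3600 for an hourly figure.&lt;/li&gt;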
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
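
&lt;p&gt;The cost query can also be checked by hand. The minimal Python sketch below reproduces its arithmetic and scales the per-second result to an hour. The token rates are made-up sample values, and the prices are the same hypothetical $0.01 and $0.03 per 1,000 tokens used in the query, so substitute your provider's real pricing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical prices from the example query (USD per 1,000 tokens).
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03


def estimated_cost_per_second(input_tokens_per_s, output_tokens_per_s):
    """Mirror the PromQL expression: token rate / 1000 * price, summed."""
    input_cost = input_tokens_per_s / 1000 * INPUT_PRICE_PER_1K
    output_cost = output_tokens_per_s / 1000 * OUTPUT_PRICE_PER_1K
    return input_cost + output_cost


# Made-up sample rates, as rate(...[5m]) would report them (tokens per second).
per_second = estimated_cost_per_second(200.0, 50.0)
per_hour = per_second * 3600  # rate() is per second, so scale for an hourly view

print(f"{per_second:.4f} USD/s, {per_hour:.2f} USD/h")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;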

&lt;h4&gt;
  
  
  5.2 Loki Dashboard for LLM Logs
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Connections -&amp;gt; Data sources&lt;/strong&gt; and add Loki as a new data source:

&lt;ul&gt;
&lt;li&gt;Name: &lt;code&gt;Loki&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;code&gt;http://loki.observability.svc.cluster.local:3100&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add a new panel to your dashboard. Set the visualization type to &lt;code&gt;Logs&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;Loki&lt;/code&gt; as your data source.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Log Queries (examples):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Show all logs from your app: &lt;code&gt;{container="llm-agent-app", namespace="default"}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Filter for LLM calls: &lt;code&gt;{container="llm-agent-app", namespace="default"} | json | event="llm_call_completed"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Filter for agent tool calls: &lt;code&gt;{container="llm-agent-app", namespace="default"} | json | event="tool_call"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Filter for specific trace ID: &lt;code&gt;{container="llm-agent-app", namespace="default"} | json | trace_id="&amp;lt;your-trace-id&amp;gt;"&lt;/code&gt; (you can pick a &lt;code&gt;trace_id&lt;/code&gt; from another log query or from the collector's trace output).&lt;/li&gt;
&lt;li&gt;Filter for LLM errors: &lt;code&gt;{container="llm-agent-app", namespace="default"} | json | status_code="500"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;| json&lt;/code&gt; parser stage is essential because our application emits structured JSON logs: it extracts the fields of each log line so that later stages can filter on them. After it, you can use &lt;code&gt;| line_format "{{.message}}"&lt;/code&gt; to display only the message, &lt;code&gt;| level="ERROR"&lt;/code&gt; to filter by log level, or &lt;code&gt;| prompt=~".*weather.*"&lt;/code&gt; to search within specific fields.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
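
&lt;p&gt;To make the &lt;code&gt;| json&lt;/code&gt; stage more concrete, here is a rough Python sketch of the idea behind it: parse each log line as JSON, then keep the lines whose extracted fields match. It is a conceptual illustration, not how Loki is implemented. The sample log lines are invented; only the field names &lt;code&gt;event&lt;/code&gt;, &lt;code&gt;level&lt;/code&gt;, &lt;code&gt;trace_id&lt;/code&gt;, and &lt;code&gt;message&lt;/code&gt; come from the queries above, and the fields your application emits may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# Invented sample log lines; real lines come from the llm-agent-app container.
RAW_LINES = [
    '{"level": "INFO", "event": "llm_call_completed", "trace_id": "abc123", "message": "LLM call finished"}',
    '{"level": "INFO", "event": "tool_call", "trace_id": "abc123", "message": "Calling weather tool"}',
    '{"level": "ERROR", "event": "llm_call_completed", "trace_id": "def456", "message": "LLM call failed"}',
    'not json at all',
]


def filter_logs(lines, **wanted):
    """Parse each line as JSON and keep entries whose fields equal `wanted`."""
    matches = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # this sketch simply skips lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # only JSON objects carry named fields
        if all(entry.get(key) == value for key, value in wanted.items()):
            matches.append(entry)
    return matches


# Rough equivalent of: | json | event="llm_call_completed" | line_format "{{.message}}"
for entry in filter_logs(RAW_LINES, event="llm_call_completed"):
    print(entry["message"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;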

&lt;h4&gt;
  
  
  5.3 OpenTelemetry Traces (Optional: If Jaeger/Tempo is set up)
&lt;/h4&gt;

&lt;p&gt;If you deploy a full tracing backend such as Jaeger or Tempo, you would typically add it as a data source in Grafana. You could then open Grafana's Explore view, search by service name (&lt;code&gt;llm-agent-app&lt;/code&gt;) or trace ID, and visualize the full agent execution flow with all its spans.&lt;/p&gt;

&lt;p&gt;For this tutorial, since we configured the OpenTelemetry Collector to log traces to stdout, you can inspect the collector's logs to see the trace data being received.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; observability deployment/otel-collector
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see detailed output for each trace, confirming that your application is successfully sending trace data to the collector. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"resource"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"attributes"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
      &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="s2"&gt;"service.name"&lt;/span&gt;, &lt;span class="s2"&gt;"value"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"stringValue"&lt;/span&gt;: &lt;span class="s2"&gt;"llm-agent-app"&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;,
      &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="s2"&gt;"k8s.pod.name"&lt;/span&gt;, &lt;span class="s2"&gt;"value"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"stringValue"&lt;/span&gt;: &lt;span class="s2"&gt;"llm-agent-app-..."&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;,
      &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="s2"&gt;"k8s.namespace.name"&lt;/span&gt;, &lt;span class="s2"&gt;"value"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"stringValue"&lt;/span&gt;: &lt;span class="s2"&gt;"default"&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;
    &lt;span class="o"&gt;]&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="s2"&gt;"scopeSpans"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
    &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="s2"&gt;"scope"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"name"&lt;/span&gt;: &lt;span class="s2"&gt;"llm.agent.tracer"&lt;/span&gt;, &lt;span class="s2"&gt;"version"&lt;/span&gt;: &lt;span class="s2"&gt;"1.0.0"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
      &lt;span class="s2"&gt;"spans"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
        &lt;span class="o"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"traceId"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;,
          &lt;span class="s2"&gt;"spanId"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;,
          &lt;span class="s2"&gt;"parentSpanId"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;,
          &lt;span class="s2"&gt;"name"&lt;/span&gt;: &lt;span class="s2"&gt;"llm_agent_request"&lt;/span&gt;,
          &lt;span class="s2"&gt;"kind"&lt;/span&gt;: &lt;span class="s2"&gt;"SPAN_KIND_SERVER"&lt;/span&gt;, &lt;span class="c"&gt;# Due to WSGIMiddleware&lt;/span&gt;
          &lt;span class="s2"&gt;"startTimeUnixNano"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;,
          &lt;span class="s2"&gt;"endTimeUnixNano"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;,
          &lt;span class="s2"&gt;"attributes"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
            &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="s2"&gt;"user.query"&lt;/span&gt;, &lt;span class="s2"&gt;"value"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"stringValue"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;,
            &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="s2"&gt;"agent.total_latency_seconds"&lt;/span&gt;, &lt;span class="s2"&gt;"value"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"doubleValue"&lt;/span&gt;: 1.23&lt;span class="o"&gt;}}&lt;/span&gt;,
            ...
          &lt;span class="o"&gt;]&lt;/span&gt;,
          &lt;span class="s2"&gt;"events"&lt;/span&gt;: &lt;span class="o"&gt;[]&lt;/span&gt;,
          &lt;span class="s2"&gt;"status"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"code"&lt;/span&gt;: &lt;span class="s2"&gt;"STATUS_CODE_UNSET"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;,
        &lt;span class="o"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"traceId"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;,
          &lt;span class="s2"&gt;"spanId"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;,
          &lt;span class="s2"&gt;"parentSpanId"&lt;/span&gt;: &lt;span class="s2"&gt;"..."&lt;/span&gt;, &lt;span class="c"&gt;# Parent will be llm_agent_request's spanId&lt;/span&gt;
          &lt;span class="s2"&gt;"name"&lt;/span&gt;: &lt;span class="s2"&gt;"llm_api_call"&lt;/span&gt;,
          &lt;span class="s2"&gt;"kind"&lt;/span&gt;: &lt;span class="s2"&gt;"SPAN_KIND_INTERNAL"&lt;/span&gt;,
          &lt;span class="s2"&gt;"attributes"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;
            &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="s2"&gt;"model_name"&lt;/span&gt;, &lt;span class="s2"&gt;"value"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"stringValue"&lt;/span&gt;: &lt;span class="s2"&gt;"gpt-3.5-turbo-mock"&lt;/span&gt;&lt;span class="o"&gt;}}&lt;/span&gt;,
            &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"key"&lt;/span&gt;: &lt;span class="s2"&gt;"llm.response.tokens.total"&lt;/span&gt;, &lt;span class="s2"&gt;"value"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"intValue"&lt;/span&gt;: 120&lt;span class="o"&gt;}}&lt;/span&gt;,
            ...
          &lt;span class="o"&gt;]&lt;/span&gt;,
          ...
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="c"&gt;# ... more spans for agent_decision_making, tool_call_weather etc.&lt;/span&gt;
      &lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This confirms traces are being generated and processed by the collector. Integrating a full Jaeger or Tempo setup is a worthy next step, but is beyond the immediate scope of this tutorial.&lt;/p&gt;
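
&lt;p&gt;If you capture one of these trace payloads as JSON, a short script can summarize the span hierarchy. The sketch below assumes the structure shown in the example output (&lt;code&gt;scopeSpans&lt;/code&gt;, &lt;code&gt;spans&lt;/code&gt;, &lt;code&gt;spanId&lt;/code&gt;, &lt;code&gt;parentSpanId&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;). The IDs and the trimmed payload are placeholder values, and the real collector log format depends on the exporter and its verbosity, so treat this as illustrative only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# Trimmed, made-up payload that follows the shape of the example output.
PAYLOAD = json.loads("""
{
  "resource": {"attributes": [
    {"key": "service.name", "value": {"stringValue": "llm-agent-app"}}
  ]},
  "scopeSpans": [{
    "scope": {"name": "llm.agent.tracer"},
    "spans": [
      {"spanId": "a1", "parentSpanId": "", "name": "llm_agent_request"},
      {"spanId": "b2", "parentSpanId": "a1", "name": "llm_api_call"}
    ]
  }]
}
""")


def span_tree(payload):
    """Return (span_name, parent_name) pairs; parent_name is None for roots."""
    spans = [s for scope in payload["scopeSpans"] for s in scope["spans"]]
    names_by_id = {s["spanId"]: s["name"] for s in spans}
    return [(s["name"], names_by_id.get(s.get("parentSpanId"))) for s in spans]


for name, parent in span_tree(PAYLOAD):
    print(f"{name} (parent: {parent or 'none'})")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;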

&lt;h2&gt;
  
  
  Actionable Insights, Alerting, and Troubleshooting for LLM Observability
&lt;/h2&gt;

&lt;p&gt;Collecting data is only half the battle. You need to interpret it to derive actionable insights and set up effective alerts for your &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpreting LLM Observability Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance Bottlenecks:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High LLM Call Latency (P95/P99):&lt;/strong&gt; Check the quantiles derived from &lt;code&gt;llm_call_latency_seconds_bucket&lt;/code&gt;. Is the cause the model itself, the network, or rate limiting by the LLM provider? If P95 LLM call latency stays above 5 seconds, it might indicate network issues or an overloaded LLM endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Agent Request Latency:&lt;/strong&gt; Examine traces (&lt;code&gt;llm_agent_request&lt;/code&gt; span) to pinpoint which step (LLM call, tool call, internal logic) is contributing most to the delay. Look for unusually long &lt;code&gt;tool_call&lt;/code&gt; spans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Throughput:&lt;/strong&gt; Is the request rate derived from &lt;code&gt;llm_agent_requests_total&lt;/code&gt; below expectations? Check CPU and memory utilization of the pod. Is the LLM model slow, or is the application itself the bottleneck? A 20% drop in QPS without a corresponding reduction in load might indicate an issue.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Overruns:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Spikes in Token Usage:&lt;/strong&gt; Monitor &lt;code&gt;llm_input_tokens_total&lt;/code&gt; and &lt;code&gt;llm_output_tokens_total&lt;/code&gt;. Are prompts getting unexpectedly long? Is the model generating excessively verbose responses? This could indicate a prompt engineering issue or model drift. A sudden 50% increase in output tokens per request could drastically raise costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequent LLM API Calls:&lt;/strong&gt; &lt;code&gt;llm_api_calls_total&lt;/code&gt; can highlight agents engaging in too many iterative LLM calls, indicating inefficiency.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Degradation &amp;amp; Agent Failures:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Increased LLM Error Rates:&lt;/strong&gt; Monitor &lt;code&gt;sum by(status_code) (rate(llm_call_latency_seconds_count{status_code!="200"}[5m]))&lt;/code&gt;. Correlate with logs to see actual error messages and prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increased Agent Goal Failures:&lt;/strong&gt; Track &lt;code&gt;agent_goal_failures_total&lt;/code&gt;. What types of queries are failing? Dive into traces for these failed requests to see the agent's decision path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decreased Tool Call Success Rates:&lt;/strong&gt; Track &lt;code&gt;agent_tool_calls_total{status="failure"}&lt;/code&gt; broken down by &lt;code&gt;tool_name&lt;/code&gt;. Is a specific external tool failing? This points to external dependency issues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unexpected Agent Behavior:&lt;/strong&gt; Use Loki to search for keywords in prompt/response logs or agent decision logs. "Why did the agent choose X tool here?" can often be answered by reviewing the log of its thought process.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
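
&lt;p&gt;When reading P95/P99 latency figures, it helps to remember that &lt;code&gt;histogram_quantile&lt;/code&gt; estimates the value from cumulative bucket counts by interpolating linearly inside the bucket where the target rank falls, so the result is only as precise as the bucket boundaries. The sketch below is a simplified Python rendering of that idea, not Prometheus's actual implementation: it ignores the &lt;code&gt;+Inf&lt;/code&gt; bucket and other edge cases, and the bucket boundaries and counts are invented sample data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import bisect

# Invented cumulative bucket data: (upper bound "le" in seconds, cumulative count).
BUCKETS = [(0.5, 10), (1.0, 40), (2.5, 80), (5.0, 95), (10.0, 100)]


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets via linear interpolation."""
    bounds = [le for le, _ in buckets]
    counts = [count for _, count in buckets]
    rank = q * counts[-1]  # how many observations lie at or below the quantile
    i = bisect.bisect_left(counts, rank)  # first bucket whose count reaches rank
    lower_bound = bounds[i - 1] if i else 0.0
    count_below = counts[i - 1] if i else 0
    in_bucket = counts[i] - count_below
    fraction = (rank - count_below) / in_bucket
    return lower_bound + (bounds[i] - lower_bound) * fraction


print(f"p95 is roughly {estimate_quantile(0.95, BUCKETS):.2f}s")
print(f"p99 is roughly {estimate_quantile(0.99, BUCKETS):.2f}s")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these sample buckets, every estimate between 5 and 10 seconds is an interpolation across one wide bucket, which is why finer bucket boundaries around your latency targets give more trustworthy percentiles.&lt;/p&gt;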

&lt;h3&gt;
  
  
  Setting Up Effective Alerts for LLM Workloads
&lt;/h3&gt;

&lt;p&gt;Alerts should be proactive and actionable. Use Grafana Alerting (integrated with Prometheus Alertmanager).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Critical Alerts (PagerDuty, Slack):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High Error Rate:&lt;/strong&gt; &lt;code&gt;sum(increase(llm_call_latency_seconds_count{status_code!="200"}[5m])) &amp;gt; 5&lt;/code&gt; (for instance, more than 5 LLM errors in the last 5 minutes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service Unavailability:&lt;/strong&gt; &lt;code&gt;up{app="llm-agent-app", namespace="default"} == 0&lt;/code&gt; (agent service is down).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical Latency Spike:&lt;/strong&gt; &lt;code&gt;histogram_quantile(0.99, sum by(le) (rate(agent_request_latency_seconds_bucket{app="llm-agent-app", namespace="default"}[5m]))) &amp;gt; 15&lt;/code&gt; (99th percentile request latency exceeds 15 seconds).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Warning Alerts (Slack, Email):&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High Token Consumption Rate:&lt;/strong&gt; &lt;code&gt;sum(increase(llm_output_tokens_total{app="llm-agent-app", namespace="default"}[1h])) &amp;gt; 1000000&lt;/code&gt; (e.g., more than 1 million output tokens in the last hour, indicating potential cost overrun).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Increased Agent Tool Failures:&lt;/strong&gt; &lt;code&gt;sum(increase(agent_tool_calls_total{status="failure", app="llm-agent-app", namespace="default"}[5m])) &amp;gt; 0&lt;/code&gt; (any tool failure in a 5-minute window could warrant investigation).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High GPU Utilization:&lt;/strong&gt; &lt;code&gt;gpu_utilization_percentage{pod="llm-agent-app-..."} &amp;gt; 90&lt;/code&gt; (indicates resource saturation, potentially leading to performance degradation on GPU-accelerated workloads).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Informational Alerts:&lt;/strong&gt; Track trends without immediate action, e.g., daily cost reports.&lt;/li&gt;
&lt;/ul&gt;
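
&lt;p&gt;When choosing thresholds, keep in mind that &lt;code&gt;rate()&lt;/code&gt; reports a per-second average while &lt;code&gt;increase()&lt;/code&gt; reports the approximate total over the window. The small Python sketch below shows the conversion between the two views. It ignores Prometheus's extrapolation at the window edges, and the numbers are arbitrary examples.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def events_in_window(per_second_rate, window_seconds):
    """Approximate increase() from rate(): total events over the window."""
    return per_second_rate * window_seconds


def rate_for_budget(max_events, window_seconds):
    """Per-second rate that corresponds to a budget of events per window."""
    return max_events / window_seconds


FIVE_MINUTES = 5 * 60
ONE_HOUR = 60 * 60

# A rate() of 5 sustained over a 5m window corresponds to about 1500 events.
print(events_in_window(5, FIVE_MINUTES))

# 5 errors per 5 minutes is a much smaller per-second rate.
print(round(rate_for_budget(5, FIVE_MINUTES), 4))

# One million output tokens per hour, expressed as tokens per second.
print(round(rate_for_budget(1_000_000, ONE_HOUR), 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;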

&lt;h3&gt;
  
  
  Troubleshooting Workflow for LLM Observability
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Alert Triggered:&lt;/strong&gt; Receive an alert about high latency or errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard Check:&lt;/strong&gt; Go to your Grafana dashboard. Look at the specific metric that triggered the alert. Correlate with other metrics (e.g., if latency is high, are CPU/memory also high? Are token counts spiking?).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logs Investigation:&lt;/strong&gt; If metrics point to an application-specific issue, jump to Loki. Use a trace ID if you have one (for example, from an error log line); otherwise narrow the search to the time window of the incident. Search for errors, warnings, or specific agent decision steps, and review the full prompt/response pairs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Traces Deep Dive:&lt;/strong&gt; If the issue is complex and spans multiple steps or services (especially for agents), use your tracing backend (Jaeger/Tempo). Find the trace for a representative failing request. Visualize the spans to identify the exact step (LLM call, tool call, external API) that introduced latency or an error. Examine attributes attached to spans for detailed context like LLM model, parameters, or tool call arguments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identify Root Cause:&lt;/strong&gt; Based on the correlated data, pinpoint the root cause: an inefficient prompt, a slow external tool, a bug in agent logic, or an overloaded Kubernetes node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remediate and Verify:&lt;/strong&gt; Implement the fix, then monitor your observability stack to verify the issue is resolved and new alerts are not triggered.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions About LLM Observability on Kubernetes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is LLM observability and why is it important on Kubernetes?
&lt;/h3&gt;

&lt;p&gt;LLM observability is the ability to understand the internal state, performance, cost, and behavior of Large Language Model (LLM) powered applications and AI agents. It's crucial on Kubernetes because traditional monitoring tools don't capture the unique non-deterministic nature, token usage, and complex decision-making processes inherent to LLMs and AI agents, leading to blind spots in performance and cost management.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I monitor LLM costs on Kubernetes?
&lt;/h3&gt;

&lt;p&gt;To monitor LLM costs on Kubernetes, track metrics like &lt;code&gt;llm_input_tokens_total&lt;/code&gt;, &lt;code&gt;llm_output_tokens_total&lt;/code&gt;, and &lt;code&gt;llm_api_calls_total&lt;/code&gt;. These application-level metrics, collected by Prometheus, can then be used in Grafana with PromQL queries to calculate real-time estimated costs based on your LLM provider's pricing for tokens or API calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can OpenTelemetry be used for LLM metrics and traces?
&lt;/h3&gt;

&lt;p&gt;Yes. OpenTelemetry is a widely adopted, vendor-neutral standard for instrumenting applications, including LLM applications, to generate both metrics and traces. It provides APIs and SDKs to create detailed spans for LLM calls and agent steps and to emit custom metrics such as token usage and latency, all of which can be collected and exported by the OpenTelemetry Collector.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do traditional monitoring tools fail for AI agents?
&lt;/h3&gt;

&lt;p&gt;Traditional monitoring tools primarily focus on infrastructure metrics (CPU, memory) and simple request/response codes. They fall short for AI agents because agents behave non-deterministically, follow complex multi-step reasoning processes, incur costs driven by token usage, and can hallucinate or go off-topic even when the HTTP response is successful. Understanding these nuances requires application-level metrics, structured logs, and distributed traces.&lt;/p&gt;

&lt;h3&gt;
  
  
  What key metrics should I track for LLM performance on Kubernetes?
&lt;/h3&gt;

&lt;p&gt;Key metrics for LLM performance on Kubernetes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prompt Processing Latency&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Response Generation Latency&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total Agent Request Latency&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Throughput (Queries Per Second)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool Call Latency (for AI agents)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Input and Output Token Counts&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM API Call Counts&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These provide a comprehensive view of speed, efficiency, and resource consumption.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building robust &lt;strong&gt;LLM observability on Kubernetes&lt;/strong&gt; is not just about extending your existing monitoring stack; it requires a shift in what you measure. You need to look beyond infrastructure metrics and into LLM behavior, token economics, and agent decision-making.&lt;/p&gt;

&lt;p&gt;By leveraging OpenTelemetry for rich instrumentation, Prometheus and Grafana for comprehensive metrics, and Loki with Fluent Bit for structured logging, you can construct a powerful, integrated observability pipeline. This pipeline gives you the visibility needed to understand performance, control costs, and proactively troubleshoot the complex, often non-deterministic world of generative AI on Kubernetes.&lt;/p&gt;

&lt;p&gt;Your next steps should be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Refine your application instrumentation:&lt;/strong&gt; Integrate your actual LLM calls and agent logic with OpenTelemetry and Prometheus client metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment with diverse prompts:&lt;/strong&gt; Send various types of queries to your agent, including edge cases and error-inducing prompts, to see how your observability stack reacts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build custom Grafana dashboards:&lt;/strong&gt; Create dashboards tailored to your specific LLM applications, focusing on the most critical performance, cost, and quality metrics for LLM observability on Kubernetes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement robust alerting:&lt;/strong&gt; Define clear, actionable alerts based on thresholds that matter for your application's reliability and cost efficiency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explore tracing backends:&lt;/strong&gt; Consider deploying Jaeger or Tempo to gain full end-to-end distributed tracing capabilities, which are invaluable for complex AI agent debugging.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The journey to effective LLM observability is iterative. Continuously refine your instrumentation, dashboards, and alerts as your AI agents evolve and your understanding of their behavior deepens.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>llm</category>
      <category>observability</category>
      <category>aiagents</category>
    </item>
  </channel>
</rss>
