<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Diya </title>
    <description>The latest articles on DEV Community by Diya  (@diya_r).</description>
    <link>https://dev.to/diya_r</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2207590%2Fbabb71ca-76e0-4866-a3d5-86bde37a301d.png</url>
      <title>DEV Community: Diya </title>
      <link>https://dev.to/diya_r</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/diya_r"/>
    <language>en</language>
    <item>
      <title>Deployment using all three Kubernetes probes</title>
      <dc:creator>Diya </dc:creator>
      <pubDate>Mon, 25 May 2026 03:28:10 +0000</pubDate>
      <link>https://dev.to/diya_r/deployment-using-all-three-kubernetes-probes-2819</link>
      <guid>https://dev.to/diya_r/deployment-using-all-three-kubernetes-probes-2819</guid>
      <description>&lt;h2&gt;
  
  
  Full Example YAML
&lt;/h2&gt;

&lt;p&gt;Here’s the &lt;code&gt;containers&lt;/code&gt; section of a Deployment that uses all three Kubernetes probes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-api:latest&lt;/span&gt;

    &lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/readyz&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
      &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;

    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/readyz&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

    &lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/healthz&lt;/span&gt;
        &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
      &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
      &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
      &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let’s break down what Kubernetes is actually doing here.&lt;/p&gt;




&lt;h2&gt;
  
  
  startupProbe
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;startupProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/readyz&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
  &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
  &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Kubernetes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Check /readyz every 15 seconds.
Allow 20 failures before killing the container.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calculation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;15 seconds × 20 failures = 300 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So Kubernetes gives the application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5 minutes to fully start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;before deciding:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The application failed to start.”&lt;/p&gt;
&lt;/blockquote&gt;
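
&lt;p&gt;The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name &lt;code&gt;probe_budget_seconds&lt;/code&gt; is hypothetical, and the result is an approximation because it ignores &lt;code&gt;timeoutSeconds&lt;/code&gt; and scheduling jitter.&lt;/p&gt;

```python
def probe_budget_seconds(period_seconds: int = 10,
                         failure_threshold: int = 3,
                         initial_delay_seconds: int = 0) -> int:
    """Approximate how long a probe tolerates failures before Kubernetes acts.

    The default arguments mirror the Kubernetes probe defaults
    (periodSeconds 10, failureThreshold 3, initialDelaySeconds 0).
    """
    return initial_delay_seconds + period_seconds * failure_threshold


# startupProbe from the example: 15 s period, 20 allowed failures
print(probe_budget_seconds(period_seconds=15, failure_threshold=20))  # 300
```

&lt;p&gt;Plugging in the values of any probe gives a rough idea of how patient Kubernetes will be with it.&lt;/p&gt;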

&lt;h3&gt;
  
  
  Default Values
&lt;/h3&gt;

&lt;p&gt;If you define a &lt;code&gt;startupProbe&lt;/code&gt; but omit these fields, Kubernetes uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="na"&gt;successThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which means by default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 seconds × 3 failures = 30 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your application may only get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~30 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;before Kubernetes decides startup failed.&lt;/p&gt;

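&lt;p&gt;This is why slow-starting applications often need a &lt;code&gt;startupProbe&lt;/code&gt; with a larger &lt;code&gt;failureThreshold&lt;/code&gt; or &lt;code&gt;periodSeconds&lt;/code&gt; than the defaults.&lt;/p&gt;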

&lt;h3&gt;
  
  
  Common Real-World Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Java applications&lt;/li&gt;
&lt;li&gt;ML workloads&lt;/li&gt;
&lt;li&gt;applications loading huge caches&lt;/li&gt;
&lt;li&gt;Python/Gunicorn services&lt;/li&gt;
&lt;li&gt;applications waiting for database migrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important part:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A startup probe failure itself is NOT the issue.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The issue happens only when failures continue beyond the threshold.&lt;/p&gt;




&lt;h2&gt;
  
  
  readinessProbe
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/readyz&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
  &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
  &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Kubernetes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wait 5 seconds after container start.
Then check /readyz every 10 seconds.
If it fails 3 consecutive times:
remove the pod from Service traffic.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calculation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 seconds × 3 failures = 30 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the application cannot respond successfully for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;30 continuous seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the pod becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NotReady
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But importantly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The container is NOT restarted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traffic simply stops flowing to it temporarily.&lt;/p&gt;
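
&lt;p&gt;The readiness behaviour described above can be modelled as a small counter of consecutive results. The sketch below is a simplified model, not kubelet code; the function name &lt;code&gt;readiness_states&lt;/code&gt; and the sample sequence are made up for illustration.&lt;/p&gt;

```python
def readiness_states(results, failure_threshold=3, success_threshold=1):
    """Replay probe results (True = success) and return the state after each one.

    Simplified model: the pod flips to NotReady after `failure_threshold`
    consecutive failures and back to Ready after `success_threshold`
    consecutive successes. Nothing is ever restarted here.
    """
    ready = True  # assume the pod was already Ready when we start watching
    failures = successes = 0
    states = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0
            if not ready and successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0
            if ready and failures >= failure_threshold:
                ready = False
        states.append("Ready" if ready else "NotReady")
    return states


# Two failures in a row are tolerated; the third consecutive failure marks
# the pod NotReady, and a single success brings it back.
print(readiness_states([True, False, False, True, False, False, False, True]))
```

&lt;p&gt;Note that a success in between resets the failure count, which is why isolated readiness failures never take the pod out of rotation.&lt;/p&gt;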

&lt;h3&gt;
  
  
  Default Values
&lt;/h3&gt;

&lt;p&gt;If not configured, Kubernetes defaults to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="na"&gt;successThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



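&lt;p&gt;This means that, unless a &lt;code&gt;startupProbe&lt;/code&gt; is holding the other probes back, Kubernetes starts checking almost immediately after the container starts.&lt;/p&gt;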

&lt;p&gt;That can become dangerous for applications that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take time to boot&lt;/li&gt;
&lt;li&gt;warm caches&lt;/li&gt;
&lt;li&gt;establish DB connections&lt;/li&gt;
&lt;li&gt;initialize workers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Important Concept
&lt;/h3&gt;

&lt;p&gt;A readiness failure usually means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Do not send traffic right now."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It does NOT mean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"The application is dead."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This distinction is extremely important in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  livenessProbe
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;livenessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/healthz&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5000&lt;/span&gt;
  &lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt;
  &lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Kubernetes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Wait 30 seconds before starting checks.
Then check /healthz every 20 seconds.
If it fails 3 consecutive times:
restart the container.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calculation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;20 seconds × 3 failures = 60 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If health checks fail continuously for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;60 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kubernetes assumes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The application is unhealthy or stuck.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and restarts the container automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Default Values
&lt;/h3&gt;

&lt;p&gt;Kubernetes defaults:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;initialDelaySeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="na"&gt;periodSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;failureThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="na"&gt;successThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which effectively means:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 seconds × 3 failures = 30 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;of continuous failures before Kubernetes restarts the container.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Mistake
&lt;/h3&gt;

&lt;p&gt;Many teams run liveness probes with an aggressive timeout, often simply by keeping the default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU spikes&lt;/li&gt;
&lt;li&gt;GC pauses&lt;/li&gt;
&lt;li&gt;dependency slowness&lt;/li&gt;
&lt;li&gt;temporary latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the application may briefly respond slowly.&lt;/p&gt;

&lt;p&gt;This can accidentally trigger unnecessary restarts.&lt;/p&gt;
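
&lt;p&gt;To see how a short timeout turns a brief slowdown into a restart, here is a rough simulation. The latencies are invented, and &lt;code&gt;would_restart&lt;/code&gt; is a hypothetical helper that only models the counting, not the kubelet itself.&lt;/p&gt;

```python
def would_restart(latencies_seconds, timeout_seconds=1.0, failure_threshold=3):
    """Return True if a run of slow responses would trigger a liveness restart.

    Each latency stands for one probe attempt; an attempt counts as failed
    when the response takes longer than `timeout_seconds`.
    """
    consecutive = 0
    for latency in latencies_seconds:
        if latency > timeout_seconds:
            consecutive += 1
            if consecutive >= failure_threshold:
                return True
        else:
            consecutive = 0
    return False


# Hypothetical latencies during a GC pause: three slow responses in a row.
gc_pause = [0.2, 1.4, 1.8, 1.3, 0.3]
print(would_restart(gc_pause, timeout_seconds=1.0))  # True
print(would_restart(gc_pause, timeout_seconds=3.0))  # False
```

&lt;p&gt;The application recovers on its own in both runs; only the 1-second timeout would have restarted it.&lt;/p&gt;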




&lt;h2&gt;
  
  
  The Most Important Thing to Understand
&lt;/h2&gt;

&lt;p&gt;Many engineers panic immediately when they see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Readiness probe failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Liveness probe failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But occasional probe failures are expected, and Kubernetes is designed to tolerate them.&lt;/p&gt;

&lt;p&gt;The real question is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Did the failures exceed the threshold?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because Kubernetes only takes action after repeated failures over time.&lt;/p&gt;

&lt;p&gt;That’s why these settings matter so much:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;failureThreshold&lt;/span&gt;
&lt;span class="s"&gt;periodSeconds&lt;/span&gt;
&lt;span class="s"&gt;timeoutSeconds&lt;/span&gt;
&lt;span class="s"&gt;initialDelaySeconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together, they control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how patient Kubernetes should be&lt;/li&gt;
&lt;li&gt;when traffic should stop&lt;/li&gt;
&lt;li&gt;when restarts should happen&lt;/li&gt;
&lt;li&gt;how tolerant the system should be during spikes&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes0odtpz74nygy8duab3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fes0odtpz74nygy8duab3.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Probe&lt;/th&gt;
&lt;th&gt;What Happens on Failure?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;startupProbe&lt;/td&gt;
&lt;td&gt;Container may be killed if startup takes too long&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readinessProbe&lt;/td&gt;
&lt;td&gt;Pod stops receiving traffic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;livenessProbe&lt;/td&gt;
&lt;td&gt;Container gets restarted&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;Kubernetes probes are not meant to punish applications.&lt;/p&gt;

&lt;p&gt;They are safety mechanisms.&lt;/p&gt;

&lt;p&gt;The goal is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avoid sending traffic to unhealthy pods&lt;/li&gt;
&lt;li&gt;restart stuck applications&lt;/li&gt;
&lt;li&gt;allow slow startups safely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you understand probe thresholds, Kubernetes behavior suddenly becomes much easier to debug.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>monitoring</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How Logs Travel From Your EKS Pod to Datadog</title>
      <dc:creator>Diya </dc:creator>
      <pubDate>Mon, 25 May 2026 03:04:13 +0000</pubDate>
      <link>https://dev.to/diya_r/how-logs-travel-from-your-eks-pod-to-datadog-12an</link>
      <guid>https://dev.to/diya_r/how-logs-travel-from-your-eks-pod-to-datadog-12an</guid>
      <description>&lt;p&gt;If you’re running applications on Kubernetes using Amazon EKS and suddenly seeing logs appear in Datadog, you may have wondered:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How did the logs even get there?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your application is running inside a Kubernetes pod.&lt;br&gt;&lt;br&gt;
Datadog is somewhere in the cloud.&lt;br&gt;&lt;br&gt;
Yet somehow every request, every error, and every stack trace magically appears in the Datadog UI.&lt;/p&gt;

&lt;p&gt;At first, it feels invisible.&lt;/p&gt;

&lt;p&gt;But once you understand the observability pipeline, Kubernetes starts making a lot more sense.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Is Datadog?
&lt;/h2&gt;

&lt;p&gt;Datadog is an observability platform.&lt;/p&gt;

&lt;p&gt;It helps engineers monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure&lt;/li&gt;
&lt;li&gt;Kubernetes clusters&lt;/li&gt;
&lt;li&gt;Applications&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Traces&lt;/li&gt;
&lt;li&gt;Security events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of Datadog as a centralized monitoring brain for your systems.&lt;/p&gt;

&lt;p&gt;Instead of SSH-ing into servers and manually checking logs, Datadog collects everything into one searchable place.&lt;/p&gt;

&lt;p&gt;You can search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;service:map-service status:error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and instantly see logs from hundreds of Kubernetes pods.&lt;/p&gt;

&lt;p&gt;You can also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create dashboards&lt;/li&gt;
&lt;li&gt;Set alerts&lt;/li&gt;
&lt;li&gt;Trace requests&lt;/li&gt;
&lt;li&gt;Monitor pod restarts&lt;/li&gt;
&lt;li&gt;Watch CPU and memory usage&lt;/li&gt;
&lt;li&gt;Detect failures in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here’s the important part:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Your application itself usually does NOT directly communicate with Datadog.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That job belongs to the Datadog Agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is the Datadog Agent?
&lt;/h2&gt;

&lt;p&gt;The Datadog Agent is the collector.&lt;/p&gt;

&lt;p&gt;It runs inside your Kubernetes cluster and continuously gathers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Traces&lt;/li&gt;
&lt;li&gt;Kubernetes metadata&lt;/li&gt;
&lt;li&gt;Container information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Kubernetes, the Datadog Agent is usually deployed as a:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DaemonSet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A DaemonSet means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Run one Datadog Agent pod on every Kubernetes node.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So if your EKS cluster has:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;20 worker nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Kubernetes automatically creates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;20 Datadog Agent pods
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each agent watches workloads running on its own node.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Big Question
&lt;/h2&gt;

&lt;p&gt;Here’s what most people wonder:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“My application is inside a pod… so how does Datadog see its logs?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To understand this, we first need to understand how Kubernetes handles logs internally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Your Application Writes Logs
&lt;/h2&gt;

&lt;p&gt;Inside your container, your application usually writes logs to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stdout
stderr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database connection failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or maybe your Gunicorn server prints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;500 Internal Server Error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your app is simply writing text output.&lt;/p&gt;

&lt;p&gt;It doesn’t know anything about Datadog.&lt;/p&gt;

&lt;p&gt;It doesn’t know about dashboards.&lt;/p&gt;

&lt;p&gt;It doesn’t know about observability.&lt;/p&gt;

&lt;p&gt;It is simply talking.&lt;/p&gt;
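
&lt;p&gt;A minimal sketch of an application logging to stdout with Python’s standard &lt;code&gt;logging&lt;/code&gt; module is shown below. The logger name is borrowed from this article’s &lt;code&gt;map-service&lt;/code&gt; example, and the format string is just one reasonable choice.&lt;/p&gt;

```python
import logging
import sys

# Send every log record to stdout; the container runtime captures the stream.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("map-service")  # illustrative logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate lines through the root logger

logger.info("hello world")
logger.error("database connection failed")
```

&lt;p&gt;Nothing here mentions Datadog. The application only writes lines to its standard output.&lt;/p&gt;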




&lt;h2&gt;
  
  
  Step 2: Kubernetes Captures Container Logs
&lt;/h2&gt;

&lt;p&gt;Kubernetes containers run through a container runtime like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;containerd&lt;/li&gt;
&lt;li&gt;Docker Engine (through the cri-dockerd adapter since Kubernetes 1.24)&lt;/li&gt;
&lt;li&gt;CRI-O&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In EKS today, most clusters use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;containerd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime captures container stdout/stderr and stores it as log files on the Kubernetes node.&lt;/p&gt;

&lt;p&gt;Usually under paths like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/var/log/containers/
/var/log/pods/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0h6lt2qlf0jf5bqe7kq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0h6lt2qlf0jf5bqe7kq.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The logs now physically exist on the EC2 worker node.&lt;/p&gt;

&lt;p&gt;This is the key realization:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Your pod logs are not floating magically inside Kubernetes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They become actual files on the node filesystem.&lt;/p&gt;
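
&lt;p&gt;On containerd-based nodes, each line in those files typically follows the CRI logging format: a timestamp, the stream name, a tag marking a full (&lt;code&gt;F&lt;/code&gt;) or partial (&lt;code&gt;P&lt;/code&gt;) line, and the message. Below is a minimal parser assuming that layout; the sample line and the names &lt;code&gt;CriLogLine&lt;/code&gt; and &lt;code&gt;parse_cri_line&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
from typing import NamedTuple


class CriLogLine(NamedTuple):
    timestamp: str
    stream: str   # "stdout" or "stderr"
    tag: str      # "F" = full line, "P" = partial line
    message: str


def parse_cri_line(line: str) -> CriLogLine:
    """Split one line of a CRI-format container log file into its fields."""
    timestamp, stream, tag, message = line.rstrip("\n").split(" ", 3)
    return CriLogLine(timestamp, stream, tag, message)


# Invented sample of what a line in a node-level log file might look like.
sample = "2026-05-25T03:04:13.123456789Z stderr F ERROR database timeout\n"
print(parse_cri_line(sample))
```

&lt;p&gt;The message itself is whatever the application printed; everything before it is added when the runtime writes the file.&lt;/p&gt;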




&lt;h2&gt;
  
  
  Step 3: The Datadog Agent Watches Those Logs
&lt;/h2&gt;

&lt;p&gt;Now the Datadog Agent enters the picture.&lt;/p&gt;

&lt;p&gt;Because the Agent runs on every node, it can monitor container log files locally.&lt;/p&gt;

&lt;p&gt;Conceptually, it does something similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /var/log/containers/&lt;span class="k"&gt;*&lt;/span&gt;.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Agent continuously watches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new logs&lt;/li&gt;
&lt;li&gt;new containers&lt;/li&gt;
&lt;li&gt;restarted pods&lt;/li&gt;
&lt;li&gt;Kubernetes metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whenever your application writes a log like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR database timeout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the Datadog Agent immediately sees it.&lt;/p&gt;
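
&lt;p&gt;The “tail” idea can be sketched in Python by remembering a byte offset and reading only what was appended since. This is not how the Agent is implemented; a real collector also has to handle log rotation, partial lines, and container discovery, none of which is modelled here. The helper name &lt;code&gt;read_new_lines&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines, new_offset) for everything appended since `offset`."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    lines = chunk.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(chunk)


# Demo against a temporary file standing in for a container log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("INFO started\n")
    first, offset = read_new_lines(log_path, 0)

    with open(log_path, "a", encoding="utf-8") as f:
        f.write("ERROR database timeout\n")
    second, offset = read_new_lines(log_path, offset)

print(first)   # ['INFO started']
print(second)  # ['ERROR database timeout']
```

&lt;p&gt;Keeping the offset between reads is what lets a collector pick up only new lines instead of re-sending the whole file.&lt;/p&gt;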




&lt;h2&gt;
  
  
  Step 4: Metadata Enrichment
&lt;/h2&gt;

&lt;p&gt;This is where Datadog becomes powerful.&lt;/p&gt;

&lt;p&gt;The Datadog Agent doesn’t just forward raw text.&lt;/p&gt;

&lt;p&gt;It enriches logs with Kubernetes metadata.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET /orders 500"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pod_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"map-service-12345abc-west"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"map-service"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ip-10-0-10-12"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cluster"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eks-prod"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your logs become searchable.&lt;/p&gt;

&lt;p&gt;You can filter by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pod&lt;/li&gt;
&lt;li&gt;namespace&lt;/li&gt;
&lt;li&gt;service&lt;/li&gt;
&lt;li&gt;cluster&lt;/li&gt;
&lt;li&gt;environment&lt;/li&gt;
&lt;li&gt;container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this enrichment, logs would just be unstructured text with no way to tell which pod, service, or cluster produced them.&lt;/p&gt;
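
&lt;p&gt;As a toy illustration of enrichment, the sketch below attaches the metadata from the example above to a raw message and then filters on it. The helpers &lt;code&gt;enrich&lt;/code&gt; and &lt;code&gt;matches&lt;/code&gt; are hypothetical; the real Agent looks this metadata up from the cluster rather than receiving it as a literal dictionary.&lt;/p&gt;

```python
import json


def enrich(message, metadata):
    """Attach Kubernetes metadata to a raw log message as a flat dict."""
    record = {"message": message}
    record.update(metadata)
    return record


def matches(record, **filters):
    """Return True when every filter key has the given value in the record."""
    return all(record.get(key) == value for key, value in filters.items())


# Values copied from the article's JSON example.
pod_metadata = {
    "pod_name": "map-service-12345abc-west",
    "namespace": "production",
    "service": "map-service",
    "node": "ip-10-0-10-12",
    "cluster": "eks-prod",
}

record = enrich("GET /orders 500", pod_metadata)
print(json.dumps(record, indent=2))
print(matches(record, service="map-service", namespace="production"))  # True
```

&lt;p&gt;Once every record carries the same fields, filtering by pod, namespace, or service becomes a simple lookup.&lt;/p&gt;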




&lt;h2&gt;
  
  
  Step 5: Secure Upload to Datadog Cloud
&lt;/h2&gt;

&lt;p&gt;After enrichment, the Agent securely uploads logs to Datadog’s backend using HTTPS.&lt;/p&gt;

&lt;p&gt;The flow now looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlsarb3m7xd57w1po16s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlsarb3m7xd57w1po16s.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s the full hidden journey.&lt;/p&gt;
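
&lt;p&gt;The upload step might look roughly like the sketch below, which batches records, compresses them, and prepares an HTTPS POST. It only builds the request and never sends it. The URL is a placeholder, &lt;code&gt;build_upload_request&lt;/code&gt; is a hypothetical name, and the Agent’s actual batching and wire format are not covered in this article.&lt;/p&gt;

```python
import gzip
import json
import urllib.request


def build_upload_request(records, url, api_key):
    """Build (but do not send) a gzip-compressed JSON POST for a batch of logs."""
    body = gzip.compress(json.dumps(records).encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "DD-API-KEY": api_key,  # API key header; the value here is a dummy
        },
    )


request = build_upload_request(
    [{"message": "GET /orders 500", "service": "map-service"}],
    url="https://logs.example.invalid/api/v2/logs",  # placeholder, not a real intake URL
    api_key="REDACTED",
)
print(request.get_method(), request.full_url)
```

&lt;p&gt;Sending logs in compressed batches rather than one request per line keeps the network overhead on each node small.&lt;/p&gt;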




&lt;h2&gt;
  
  
  What About Metrics?
&lt;/h2&gt;

&lt;p&gt;Datadog does more than logs.&lt;/p&gt;

&lt;p&gt;The Agent also collects metrics from Kubernetes and from the services running on it.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU usage&lt;/li&gt;
&lt;li&gt;Memory usage&lt;/li&gt;
&lt;li&gt;Pod restarts&lt;/li&gt;
&lt;li&gt;RabbitMQ queue depth&lt;/li&gt;
&lt;li&gt;Redis connections&lt;/li&gt;
&lt;li&gt;Network traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It gathers metrics from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kubelet&lt;/li&gt;
&lt;li&gt;Kubernetes API&lt;/li&gt;
&lt;li&gt;cAdvisor&lt;/li&gt;
&lt;li&gt;integrations&lt;/li&gt;
&lt;li&gt;DogStatsD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Kubernetes Node
   ↓
kubelet metrics
   ↓
Datadog Agent
   ↓
Datadog Cloud
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why you can build dashboards showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pod CPU %&lt;/li&gt;
&lt;li&gt;node memory&lt;/li&gt;
&lt;li&gt;HPA scaling&lt;/li&gt;
&lt;li&gt;container restarts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;in real time.&lt;/p&gt;
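
&lt;p&gt;Application-level metrics such as Redis connections come from integrations, which the Agent can discover through pod annotations. Here is a sketch, assuming a container named &lt;code&gt;redis&lt;/code&gt; listening on the default port. The container name is an assumption and must match the name in your pod spec:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;metadata:
  annotations:
    # "redis" in each key must match the container name in the pod spec
    ad.datadoghq.com/redis.check_names: '["redisdb"]'
    ad.datadoghq.com/redis.init_configs: '[{}]'
    ad.datadoghq.com/redis.instances: '[{"host": "%%host%%", "port": "6379"}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;%%host%%&lt;/code&gt; template variable is filled in by the Agent with the pod’s IP, so the same annotation works wherever the pod gets scheduled.&lt;/p&gt;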




&lt;h2&gt;
  
  
  What About Traces (APM)?
&lt;/h2&gt;

&lt;p&gt;Datadog APM tracks requests flowing across services.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Frontend
   ↓
API Gateway
   ↓
ROS Service
   ↓
PostgreSQL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Datadog can measure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request latency&lt;/li&gt;
&lt;li&gt;database queries&lt;/li&gt;
&lt;li&gt;failed API calls&lt;/li&gt;
&lt;li&gt;slow endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works using tracing libraries like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ddtrace (Python)&lt;/li&gt;
&lt;li&gt;dd-trace (Node.js)&lt;/li&gt;
&lt;li&gt;dd-java-agent (Java)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These libraries send trace data to the Datadog Agent, which listens for traces on port 8126 by default.&lt;/p&gt;

&lt;p&gt;Then the Agent forwards it to Datadog.&lt;/p&gt;
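
&lt;p&gt;How a tracing library finds the Agent is usually a matter of configuration. Here is a minimal sketch for an application container, assuming the Agent runs as a DaemonSet and exposes its trace port on each node. The service and environment names are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;env:
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP   # IP of the node this pod is running on
  - name: DD_TRACE_AGENT_PORT
    value: "8126"                  # default trace port on the Agent
  - name: DD_SERVICE
    value: checkout-api            # hypothetical service name
  - name: DD_ENV
    value: production              # hypothetical environment name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only works if the Agent’s trace port is reachable from the pod, for example through a &lt;code&gt;hostPort&lt;/code&gt; on the Agent DaemonSet.&lt;/p&gt;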




&lt;h2&gt;
  
  
  Why Does the Datadog Agent Need Permissions?
&lt;/h2&gt;

&lt;p&gt;This part surprises many engineers.&lt;/p&gt;

&lt;p&gt;The Datadog Agent often mounts host system paths like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/var/log/containers
/proc
/sys/fs/cgroup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because containers are isolated from the host by default, the Agent cannot see host resources unless they are explicitly mounted into it.&lt;/p&gt;

&lt;p&gt;The Agent needs visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;container logs&lt;/li&gt;
&lt;li&gt;processes&lt;/li&gt;
&lt;li&gt;cgroups&lt;/li&gt;
&lt;li&gt;node metrics&lt;/li&gt;
&lt;/ul&gt;
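
&lt;p&gt;In a manifest, that visibility is granted with &lt;code&gt;hostPath&lt;/code&gt; volumes. Here is a minimal sketch covering the three paths listed above, mounted read-only. The container name and volume names are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;containers:
  - name: agent                      # hypothetical container name
    volumeMounts:
      - name: logcontainers
        mountPath: /var/log/containers
        readOnly: true
      - name: procdir
        mountPath: /host/proc        # host /proc, kept apart from the container's own /proc
        readOnly: true
      - name: cgroups
        mountPath: /host/sys/fs/cgroup
        readOnly: true
volumes:
  - name: logcontainers
    hostPath:
      path: /var/log/containers
  - name: procdir
    hostPath:
      path: /proc
  - name: cgroups
    hostPath:
      path: /sys/fs/cgroup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The files in &lt;code&gt;/var/log/containers&lt;/code&gt; are symlinks into &lt;code&gt;/var/log/pods&lt;/code&gt;, so a real deployment usually mounts that directory as well.&lt;/p&gt;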

&lt;p&gt;Some advanced integrations require even more access.&lt;/p&gt;

&lt;p&gt;For example, the Gunicorn integration may require:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
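&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;hostPID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# pod spec: share the host's process namespace&lt;/span&gt;
&lt;span class="na"&gt;securityContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# set on the Agent container&lt;/span&gt;
  &lt;span class="na"&gt;privileged&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;SYS_PTRACE&lt;/span&gt;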
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;because the Agent sometimes needs to inspect processes outside its own container namespace.&lt;/p&gt;

&lt;p&gt;This is why deploying an observability agent often prompts security discussions between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;platform teams&lt;/li&gt;
&lt;li&gt;DevOps engineers&lt;/li&gt;
&lt;li&gt;security teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deep monitoring often requires elevated access.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Biggest Realization
&lt;/h2&gt;

&lt;p&gt;Your application usually just writes its logs to stdout and stderr as normal.&lt;/p&gt;

&lt;p&gt;The Datadog Agent acts like a collector sitting beside your workloads.&lt;/p&gt;

&lt;p&gt;A simple analogy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application = person speaking
Datadog Agent = microphone
Datadog Cloud = recording studio
Datadog UI = playback/search system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application just talks.&lt;/p&gt;

&lt;p&gt;The Agent listens.&lt;/p&gt;




&lt;p&gt;Once you understand this pipeline, debugging Kubernetes becomes much easier.&lt;/p&gt;

&lt;p&gt;When logs disappear, you know exactly where to investigate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the app writing logs?&lt;/li&gt;
&lt;li&gt;Are the logs going to stdout/stderr rather than to a file inside the container?&lt;/li&gt;
&lt;li&gt;Did the container runtime capture them?&lt;/li&gt;
&lt;li&gt;Is the Datadog Agent healthy?&lt;/li&gt;
&lt;li&gt;Are hostPath mounts correct?&lt;/li&gt;
&lt;li&gt;Is networking blocking uploads?&lt;/li&gt;
&lt;li&gt;Is metadata enrichment failing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Observability suddenly becomes understandable instead of magical.&lt;/p&gt;

&lt;p&gt;And honestly, that’s one of the coolest parts of Kubernetes infrastructure.&lt;/p&gt;

&lt;p&gt;Behind every “simple dashboard” is an entire hidden pipeline quietly moving data across your cluster 24/7.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>👋</title>
      <dc:creator>Diya </dc:creator>
      <pubDate>Mon, 14 Oct 2024 03:03:51 +0000</pubDate>
      <link>https://dev.to/diya_r/-57m6</link>
      <guid>https://dev.to/diya_r/-57m6</guid>
      <description>&lt;p&gt;Hi 👋 Hello&lt;br&gt;
It’s more important than ever for us to keep ourselves busy, constantly learning, and building new skills. Age is just a number; what truly matters is our willingness to embrace new challenges, grow, and share our knowledge with others. Continuous learning not only helps us stay ahead in our careers, but also keeps negative thoughts at bay by focusing our energy on self-improvement and personal growth.&lt;/p&gt;

&lt;p&gt;I’m excited to start my journey!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
