<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: YukiOnodera</title>
    <description>The latest articles on DEV Community by YukiOnodera (@yukionodera).</description>
    <link>https://dev.to/yukionodera</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1233037%2F06e91f94-7604-4892-b9fe-ce75dd21243b.jpeg</url>
      <title>DEV Community: YukiOnodera</title>
      <link>https://dev.to/yukionodera</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yukionodera"/>
    <language>en</language>
    <item>
      <title>Inside Datadog's Log Pipeline: How "Logging without Limits" Actually Works</title>
      <dc:creator>YukiOnodera</dc:creator>
      <pubDate>Mon, 27 Apr 2026 09:54:07 +0000</pubDate>
      <link>https://dev.to/yukionodera/inside-datadogs-log-pipeline-how-logging-without-limits-actually-works-50od</link>
      <guid>https://dev.to/yukionodera/inside-datadogs-log-pipeline-how-logging-without-limits-actually-works-50od</guid>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;

&lt;p&gt;In this post, I want to walk through how Datadog processes logs internally — from raw ingestion all the way to indexed, queryable data.&lt;/p&gt;

&lt;p&gt;If you've spent time clicking around Datadog's log management UI, you've probably noticed something satisfying: raw, messy log lines gradually get enriched and structured as they flow through the pipeline. It's a really elegant design, and once you understand the order in which things happen, it becomes clear why Datadog can offer both cost control and deep observability at the same time. Let me break it down.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This article focuses on the main steps I studied this time around. In practice there are more detailed processes, such as sensitive data scanning, Error Tracking, and Live Tail. For the full picture, please refer to the &lt;a href="https://docs.datadoghq.com/logs/" rel="noopener noreferrer"&gt;official Datadog documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;The Overall Log Processing Flow&lt;/h1&gt;

&lt;p&gt;Datadog's log management is built around a design philosophy called &lt;strong&gt;Logging without Limits™&lt;/strong&gt;, which lets you independently control "ingestion," "storage," and "analysis."&lt;/p&gt;

&lt;p&gt;The high-level flow looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ingest
  ↓
Pipelines (Parse &amp;amp; Enrich)
  ↓
Generate Metrics
  ↓
Exclusion Filters
  ↓
Index
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;Walking Through Each Step&lt;/h1&gt;

&lt;h2&gt;Ingest&lt;/h2&gt;

&lt;p&gt;First, logs are collected into Datadog from a wide variety of sources.&lt;/p&gt;

&lt;p&gt;Datadog offers &lt;strong&gt;over 500 log integrations&lt;/strong&gt;, covering AWS, GCP, Kubernetes, and all kinds of middleware. I was honestly surprised by just how many there are.&lt;/p&gt;
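
&lt;p&gt;As a concrete (if minimal) example, here's a sketch of pushing a single log event through Datadog's HTTP intake API. The v2 logs intake endpoint and the &lt;code&gt;DD-API-KEY&lt;/code&gt; header are from the public docs; the service and message values are made up, and in practice most teams ship logs through the Agent or an integration rather than calling the endpoint directly.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import requests

# Minimal sketch: send one log event to Datadog's HTTP intake API.
# The API key comes from the environment; the log fields are made up.
DD_API_KEY = os.environ["DD_API_KEY"]
INTAKE_URL = "https://http-intake.logs.datadoghq.com/api/v2/logs"

log_events = [{
    "ddsource": "python",   # tells pipelines which parsing rules to apply
    "service": "app",       # hypothetical service name
    "hostname": "db01",
    "message": "Connection timeout: host=db01 duration=5002ms",
}]

resp = requests.post(INTAKE_URL, json=log_events, headers={"DD-API-KEY": DD_API_KEY})
resp.raise_for_status()  # the intake endpoint answers 202 Accepted on success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;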

&lt;h2&gt;Pipelines (Parse &amp;amp; Enrich)&lt;/h2&gt;

&lt;p&gt;Once raw logs are ingested, they pass through &lt;strong&gt;pipelines&lt;/strong&gt; that structure and enrich them with additional context.&lt;/p&gt;

&lt;p&gt;Using processors like the &lt;strong&gt;Grok parser&lt;/strong&gt;, unstructured text logs get broken down into fields, and additional attributes can be attached.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Before&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;parsing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(raw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;log)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="mi"&gt;2024-04-27&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;ERROR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;app&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Connection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;timeout:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;host=db&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;duration=&lt;/span&gt;&lt;span class="mi"&gt;5002&lt;/span&gt;&lt;span class="err"&gt;ms&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;After&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;parsing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(structured)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-04-27T12:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ERROR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Connection timeout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"db01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5002&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Honestly, watching messy, unformatted logs get cleaned up and enriched is the most fun part of the whole pipeline.&lt;/p&gt;
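
&lt;p&gt;To get a feel for what the Grok parser is doing, here's a rough Python equivalent of the transformation above. To be clear, real Datadog Grok rules are configured in the pipeline UI with their own &lt;code&gt;%{matcher:attribute}&lt;/code&gt; syntax; this regex version is only an illustration of the same idea.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import re

# Illustration only: a regex that mimics the Grok parse shown above.
LOG_PATTERN = re.compile(
    r"(?P&lt;date&gt;\d{4}-\d{2}-\d{2}) (?P&lt;time&gt;\d{2}:\d{2}:\d{2}) "
    r"(?P&lt;level&gt;\w+) \[(?P&lt;service&gt;\w+)\] (?P&lt;message&gt;[^:]+): "
    r"host=(?P&lt;host&gt;\S+) duration=(?P&lt;duration_ms&gt;\d+)ms"
)

raw = "2024-04-27 12:00:00 ERROR [app] Connection timeout: host=db01 duration=5002ms"

m = LOG_PATTERN.match(raw)
structured = {
    "timestamp": f"{m['date']}T{m['time']}Z",
    "level": m["level"],
    "service": m["service"],
    "message": m["message"],
    "host": m["host"],
    "duration_ms": int(m["duration_ms"]),
}
print(json.dumps(structured, indent=2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;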

&lt;h2&gt;Generate Metrics&lt;/h2&gt;

&lt;p&gt;This is the most interesting part of the Logging without Limits design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log-based metrics are generated &lt;em&gt;before&lt;/em&gt; exclusion filters run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In other words, even for logs that will later be discarded and never make it to the index, you can still retain statistical information as metrics.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The benefit of this design is that even if you're aggressively dropping logs to keep costs down, you still get reliable metrics on trends, error rates, and the like.&lt;/p&gt;
&lt;/blockquote&gt;
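
&lt;p&gt;As a sketch of what this looks like in practice, the snippet below defines a log-based metric that counts error logs per service via the v2 logs-metrics API. The metric name and query are invented, and the payload shape follows my reading of the API reference, so double-check it against the current docs before relying on it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import requests

# Sketch: define a log-based metric counting error logs per service.
# Because metric generation runs BEFORE exclusion filters, this count
# stays accurate even for logs that are later dropped from the index.
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

payload = {
    "data": {
        "type": "logs_metrics",
        "id": "logs.errors.count",  # hypothetical metric name
        "attributes": {
            "compute": {"aggregation_type": "count"},
            "filter": {"query": "status:error"},
            "group_by": [{"path": "service", "tag_name": "service"}],
        },
    }
}

resp = requests.post(
    "https://api.datadoghq.com/api/v2/logs/metrics",
    json=payload,
    headers=headers,
)
resp.raise_for_status()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;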

&lt;h2&gt;Exclusion Filters&lt;/h2&gt;

&lt;p&gt;After metric generation, &lt;strong&gt;exclusion filters&lt;/strong&gt; decide which logs are &lt;em&gt;not&lt;/em&gt; saved to the index.&lt;/p&gt;

&lt;p&gt;Debug logs, high-volume boilerplate logs, and anything that isn't needed for ongoing search can be dropped here, helping keep indexing costs under control.&lt;/p&gt;
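
&lt;p&gt;Exclusion filters are configured per index. As a rough sketch (the index name is hypothetical and the payload shape follows my reading of the v1 logs indexes API, so verify before use), this drops all debug logs outright and samples out 90% of health-check logs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import requests

# Sketch: set exclusion filters on a "main" index (hypothetical name).
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

index_config = {
    "filter": {"query": "*"},
    "exclusion_filters": [
        {
            "name": "drop-debug-logs",
            "is_enabled": True,
            # sample_rate is the fraction of matching logs to EXCLUDE:
            # 1.0 drops every debug log ...
            "filter": {"query": "status:debug", "sample_rate": 1.0},
        },
        {
            "name": "sample-health-checks",
            "is_enabled": True,
            # ... while 0.9 drops 90% and indexes the remaining 10%.
            "filter": {"query": "@http.url_details.path:\"/health\"", "sample_rate": 0.9},
        },
    ],
}

resp = requests.put(
    "https://api.datadoghq.com/api/v1/logs/config/indexes/main",
    json=index_config,
    headers=headers,
)
resp.raise_for_status()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;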

&lt;h2&gt;Index&lt;/h2&gt;

&lt;p&gt;Logs that pass through the filters are finally stored in the &lt;strong&gt;Index&lt;/strong&gt;. Once a log is indexed, you can use Datadog's UI for facet search and analysis.&lt;/p&gt;
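
&lt;p&gt;Indexed logs are also queryable programmatically, not just in the UI. Here's a sketch using the v2 Logs Search API with the same query syntax as the Log Explorer; the query itself is just an example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

import requests

# Sketch: search indexed logs with the v2 Logs Search API.
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

body = {
    "filter": {
        "query": "service:app status:error",  # example facet query
        "from": "now-15m",
        "to": "now",
    },
    "page": {"limit": 25},
}

resp = requests.post(
    "https://api.datadoghq.com/api/v2/logs/events/search",
    json=body,
    headers=headers,
)
for event in resp.json()["data"]:
    attrs = event["attributes"]
    print(attrs["timestamp"], attrs["message"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;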

&lt;h1&gt;Why This Ordering Matters&lt;/h1&gt;

&lt;p&gt;The key insight in this processing order is the design principle: &lt;strong&gt;"extract metrics before throwing logs away."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Log storage costs balloon quickly, so indexing every single log is rarely realistic. But if you just drop logs, you lose visibility into trends within the discarded data.&lt;/p&gt;

&lt;p&gt;Logging without Limits solves this by placing metric generation &lt;em&gt;before&lt;/em&gt; exclusion filters. You can lower storage costs while still maximizing observability.&lt;/p&gt;
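
&lt;p&gt;The effect of the ordering is easy to see in a toy simulation. The log lines and the filter below are invented, but the structure mirrors the pipeline: counters are updated for every ingested log, and only then does the exclusion filter decide what reaches the index.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

# Toy simulation of the pipeline ordering (invented data).
logs = [
    {"status": "debug", "service": "app"},
    {"status": "error", "service": "app"},
    {"status": "debug", "service": "db"},
    {"status": "info",  "service": "app"},
]

# Step 1: generate metrics from EVERY ingested log.
metrics = Counter(log["status"] for log in logs)

# Step 2: the exclusion filter drops debug logs from the index.
index = [log for log in logs if log["status"] != "debug"]

print(metrics)     # Counter({'debug': 2, 'error': 1, 'info': 1}) -- full visibility
print(len(index))  # 2 -- only non-debug logs incur indexing cost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;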

&lt;h1&gt;Wrap-up&lt;/h1&gt;

&lt;p&gt;Datadog's log pipeline has clearly separated stages: ingest, parse, generate metrics, exclude, and index. The design choice to run metric generation before exclusion filters strikes me as especially important — it's what allows you to balance cost and observability rather than trade one off against the other.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Subscribe for more Datadog &amp;amp; Observability deep-dives.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>datadog</category>
      <category>observability</category>
      <category>logging</category>
    </item>
    <item>
      <title>ECR Costs Had Increased Over 10 Times Without Me Noticing</title>
      <dc:creator>YukiOnodera</dc:creator>
      <pubDate>Fri, 16 Aug 2024 12:10:15 +0000</pubDate>
      <link>https://dev.to/yukionodera/ecr-costs-had-increased-over-10-times-without-me-noticing-15l1</link>
      <guid>https://dev.to/yukionodera/ecr-costs-had-increased-over-10-times-without-me-noticing-15l1</guid>
      <description>&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;The other day, while I was looking through Cost Explorer, I discovered that the cost of ECR had ballooned to &lt;strong&gt;more than 10 times&lt;/strong&gt; what it was a few months ago.&lt;/p&gt;

&lt;h2&gt;Investigation Begins&lt;/h2&gt;

&lt;p&gt;Realizing this was a serious issue, I &lt;strong&gt;immediately began investigating the cause&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;Confirming ECR Pricing&lt;/h3&gt;

&lt;p&gt;I started by &lt;strong&gt;reviewing the pricing structure for ECR&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/ecr/pricing/" rel="noopener noreferrer"&gt;https://aws.amazon.com/ecr/pricing/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ECR pricing has two components: &lt;strong&gt;storage charges based on the amount of image data stored, and charges for data transfer out&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;Reviewing the Invoice&lt;/h2&gt;

&lt;p&gt;Next, I &lt;strong&gt;checked last month’s invoice&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The goal was to &lt;strong&gt;identify whether the increased costs were due to storage or data transfer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At this point I hit a trap: the ECR section of the invoice shows only the storage costs. Data transfer costs are listed under a separate data transfer section, so make sure to check there as well. I initially overlooked this and was left puzzled by the apparent discrepancy.&lt;/p&gt;

&lt;p&gt;In my case, &lt;strong&gt;data transfer costs had skyrocketed&lt;/strong&gt;.&lt;/p&gt;
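
&lt;p&gt;If you'd rather script this check than dig through the console, here's a sketch using the Cost Explorer API via boto3. Grouping by usage type makes storage and data transfer show up as separate rows; the SERVICE value is the one I believe Cost Explorer uses for ECR, so verify it against your own account, and the dates are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

# Sketch: break down ECR spend by usage type so storage and data
# transfer appear as separate line items. Dates are placeholders.
ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-06-01", "End": "2024-08-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={
        "Dimensions": {
            "Key": "SERVICE",
            # Verify this exact value in your own Cost Explorer data.
            "Values": ["Amazon EC2 Container Registry (ECR)"],
        }
    },
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for period in resp["ResultsByTime"]:
    for group in period["Groups"]:
        usage_type = group["Keys"][0]
        cost = group["Metrics"]["UnblendedCost"]["Amount"]
        print(period["TimePeriod"]["Start"], usage_type, cost)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Swapping the &lt;code&gt;GroupBy&lt;/code&gt; key to &lt;code&gt;LINKED_ACCOUNT&lt;/code&gt; turns the same query into the per-account check described in the next section.&lt;/p&gt;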

&lt;h3&gt;Checking AWS Accounts&lt;/h3&gt;

&lt;p&gt;Since I was using AWS Organizations, I &lt;strong&gt;checked which member account was seeing the increased ECR costs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Fortunately, only one account had significantly higher costs, so I was able to quickly pinpoint the source.&lt;/p&gt;

&lt;h3&gt;Reviewing ECR Repositories&lt;/h3&gt;

&lt;p&gt;However, just identifying the AWS account wasn’t enough to solve the problem.&lt;/p&gt;

&lt;p&gt;I decided to open the list of ECR repositories and take a look.&lt;/p&gt;

&lt;p&gt;I noticed that &lt;strong&gt;several repositories had been created around the time costs started increasing&lt;/strong&gt;.&lt;/p&gt;
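
&lt;p&gt;To spot recently created repositories without clicking through the console, a short boto3 sketch like the following works; it just sorts the standard &lt;code&gt;DescribeRepositories&lt;/code&gt; output by creation date.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

# Sketch: list ECR repositories newest-first to spot the ones created
# around the time costs started climbing.
ecr = boto3.client("ecr")

repos = []
for page in ecr.get_paginator("describe_repositories").paginate():
    repos.extend(page["repositories"])

for repo in sorted(repos, key=lambda r: r["createdAt"], reverse=True):
    print(repo["createdAt"].date(), repo["repositoryName"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;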

&lt;h3&gt;The Cause of the Cost Increase&lt;/h3&gt;

&lt;p&gt;After discussing it with team members and digging deeper, we discovered that this was due to a repository that had been migrated from Docker Hub for use in CI.&lt;/p&gt;

&lt;p&gt;CI runs on GitHub Actions for every commit, and since multiple images of considerable size were being pulled on each run, &lt;strong&gt;data transfer costs had surged&lt;/strong&gt;. GitHub-hosted runners live outside AWS, so every one of those pulls counts as data transfer out.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I'm glad I found the cause. In fact, I consider myself &lt;strong&gt;lucky&lt;/strong&gt; to have noticed it through Cost Explorer.&lt;/p&gt;

&lt;p&gt;Cloud costs can sometimes spike unexpectedly, so it’s important to stay vigilant.&lt;/p&gt;

&lt;h2&gt;References&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.docker.com/docker-hub/download-rate-limit/" rel="noopener noreferrer"&gt;https://docs.docker.com/docker-hub/download-rate-limit/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecr</category>
      <category>container</category>
    </item>
  </channel>
</rss>
