<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tencent Cloud -Cloud Log Service</title>
    <description>The latest articles on DEV Community by Tencent Cloud -Cloud Log Service (@tencentcloud-cls).</description>
    <link>https://dev.to/tencentcloud-cls</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3973507%2F9781ff0f-455c-4728-b0f1-03dd09ad55d4.png</url>
      <title>DEV Community: Tencent Cloud -Cloud Log Service</title>
      <link>https://dev.to/tencentcloud-cls</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tencentcloud-cls"/>
    <language>en</language>
    <item>
      <title>Troubleshooting Kubernetes Events with TKE and Tencent Cloud CLS</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Mon, 15 Jun 2026 11:06:29 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/troubleshooting-kubernetes-events-with-tke-and-tencent-cloud-cls-1ncl</link>
      <guid>https://dev.to/tencentcloud-cls/troubleshooting-kubernetes-events-with-tke-and-tencent-cloud-cls-1ncl</guid>
      <description>&lt;h1&gt;
  
  
  Troubleshooting Kubernetes Events with TKE and Tencent Cloud CLS
&lt;/h1&gt;

&lt;p&gt;Cluster problems rarely appear from nowhere. Before a service outage becomes visible, Kubernetes often records smaller state changes: node pressure, Pod scheduling, Pod eviction, and cluster autoscaler decisions.&lt;/p&gt;

&lt;p&gt;Tencent Kubernetes Engine (TKE) can send those Events to Tencent Cloud CLS, where they become searchable logs and dashboard data. This gives operators one central place to answer four questions: what changed, when it changed, which object was involved, and which component reported it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an Event tells you
&lt;/h2&gt;

&lt;p&gt;Kubernetes Events describe state transitions. The useful fields are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;What to look for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Normal&lt;/code&gt;, &lt;code&gt;Warning&lt;/code&gt;, or a custom type.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Involved Object&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pod, Deployment, Node, or another Kubernetes object.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Source&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Component such as Scheduler or Kubelet.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Reason&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Short, machine-readable reason for the state change.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Message&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Detailed explanation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Count&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How many times it happened.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The core flow is: Kubernetes emits a state-change record, CLS stores it as a log event, and the operator filters by object, component, reason, message, count, and timestamp.&lt;/p&gt;
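
&lt;p&gt;As a rough illustration of that flow, the sketch below flattens one Kubernetes Event into the fields from the table. The key names follow the core/v1 Event object; the sample event and the function name &lt;code&gt;flatten_event&lt;/code&gt; are invented for this example and are not part of TKE or CLS.&lt;/p&gt;

```python
# Sketch: flatten a Kubernetes core/v1 Event (as a dict) into the fields
# listed in the table above. The sample event is invented for illustration.

def flatten_event(event):
    """Return the troubleshooting fields of one Event as a flat dict."""
    involved = event.get("involvedObject", {})
    source = event.get("source", {})
    return {
        "type": event.get("type", ""),
        "kind": involved.get("kind", ""),
        "name": involved.get("name", ""),
        "component": source.get("component", ""),
        "reason": event.get("reason", ""),
        "message": event.get("message", ""),
        "count": event.get("count", 1),  # assume 1 when the field is absent
        "last_seen": event.get("lastTimestamp", ""),
    }


# Hypothetical event; real events carry more metadata than shown here.
sample_event = {
    "type": "Warning",
    "reason": "FailedScheduling",
    "message": "0/3 nodes are available: 3 Insufficient cpu.",
    "count": 4,
    "lastTimestamp": "2020-11-25T20:35:40Z",
    "involvedObject": {"kind": "Pod", "name": "web-0"},
    "source": {"component": "default-scheduler"},
}

row = flatten_event(sample_event)
print(row["type"], row["kind"], row["name"], row["reason"], row["count"])
```

&lt;p&gt;Once events are flattened this way, filtering by object, component, or reason becomes a plain field comparison, which is roughly what a search over the stored event logs does.&lt;/p&gt;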

&lt;h2&gt;
  
  
  Open Event Search
&lt;/h2&gt;

&lt;p&gt;In TKE, go to &lt;strong&gt;Cluster Operations -&amp;gt; Event Search&lt;/strong&gt;. CLS provides collection, storage, search, analysis, and dashboards for the event stream.&lt;/p&gt;

&lt;p&gt;Use the overview when you need warning distribution, affected object types, and event trends. Use global search when you already know the component or object name and need a row-level timeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Runbook 1: an abnormal node
&lt;/h2&gt;

&lt;p&gt;Filter by the abnormal node name in the event overview. In this example, the result included a node disk-space warning.&lt;/p&gt;

&lt;p&gt;The timeline showed that on &lt;code&gt;2020-11-25&lt;/code&gt;, node &lt;code&gt;172.16.18.13&lt;/code&gt; became abnormal because disk space was insufficient. Kubelet then tried to evict Pods from the node to reclaim disk space.&lt;/p&gt;

&lt;p&gt;That sequence gives you a clean next step: check node disk usage, eviction thresholds, and workload placement before treating it as a generic application failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Runbook 2: autoscaler expansion
&lt;/h2&gt;

&lt;p&gt;For node pool autoscaling, query the autoscaler component:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;event.source.component:"cluster-autoscaler"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Display these fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;event.reason&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;event.message&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;event.involvedObject.name&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sort by log time descending. The result should work like a compact ledger of autoscaler decisions: workload object, reason, message, and the timestamp of each scaling step.&lt;/p&gt;
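
&lt;p&gt;The same filter, projection, and sort can be sketched offline in Python. The field names mirror the query and display fields above; the &lt;code&gt;time&lt;/code&gt; key, the sample rows, and the function name &lt;code&gt;autoscaler_ledger&lt;/code&gt; are assumptions made for this example, not CLS output.&lt;/p&gt;

```python
# Sketch: rebuild the autoscaler ledger from flattened event logs.
# Field names follow the CLS query above; the sample rows are invented.

def autoscaler_ledger(logs):
    """Keep cluster-autoscaler events, newest first, with the display fields."""
    selected = [
        log for log in logs
        if log.get("event.source.component") == "cluster-autoscaler"
    ]
    # ISO-8601 UTC timestamps in one format sort correctly as plain strings.
    selected.sort(key=lambda log: log["time"], reverse=True)
    return [
        (
            log["time"],
            log.get("event.reason", ""),
            log.get("event.involvedObject.name", ""),
            log.get("event.message", ""),
        )
        for log in selected
    ]


# Hypothetical rows; "time" is an assumed key for the log timestamp.
logs = [
    {
        "time": "2020-11-25T20:35:45Z",
        "event.source.component": "cluster-autoscaler",
        "event.reason": "TriggeredScaleUp",
        "event.involvedObject.name": "pod-a",
        "event.message": "pod triggered scale-up",
    },
    {
        "time": "2020-11-25T20:36:10Z",
        "event.source.component": "kubelet",
        "event.reason": "Pulled",
        "event.involvedObject.name": "pod-a",
        "event.message": "image pulled",
    },
    {
        "time": "2020-11-25T20:40:00Z",
        "event.source.component": "cluster-autoscaler",
        "event.reason": "NotTriggerScaleUp",
        "event.involvedObject.name": "pod-b",
        "event.message": "max node group size reached",
    },
]

ledger = autoscaler_ledger(logs)
for entry in ledger:
    print(entry)
```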

&lt;p&gt;The event stream showed scale-out around &lt;code&gt;2020-11-25 20:35:45&lt;/code&gt;, triggered by three nginx Pods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;nginx-5dbf784b68-tq8rd&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;nginx-5dbf784b68-fpvbx&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;nginx-5dbf784b68-v9jv5&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three nodes were added. Scale-out then stopped because the node pool had reached its maximum node count.&lt;/p&gt;

&lt;h2&gt;
  
  
  Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use Events to understand state changes, not only current state.&lt;/li&gt;
&lt;li&gt;Start with overview dashboards, then filter by object name.&lt;/li&gt;
&lt;li&gt;For node issues, inspect reason, message, source component, and count.&lt;/li&gt;
&lt;li&gt;For autoscaling, query &lt;code&gt;cluster-autoscaler&lt;/code&gt; and reconstruct the event timeline.&lt;/li&gt;
&lt;li&gt;Use metrics and logs after Events point you to the right object and time window.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why not use only &lt;code&gt;kubectl describe&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;kubectl describe&lt;/code&gt; is useful for one object. CLS is better when you need searchable history, dashboards, and cross-object analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the fastest autoscaler query?
&lt;/h3&gt;

&lt;p&gt;Start with &lt;code&gt;event.source.component:"cluster-autoscaler"&lt;/code&gt; and sort by log time descending.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>logging</category>
      <category>devops</category>
      <category>observability</category>
    </item>
    <item>
      <title>Manage Cloud Product Logs from an Architecture View with CLS and Cloud Advisor</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Thu, 11 Jun 2026 07:42:42 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/manage-cloud-product-logs-from-an-architecture-view-with-cls-and-cloud-advisor-39d3</link>
      <guid>https://dev.to/tencentcloud-cls/manage-cloud-product-logs-from-an-architecture-view-with-cls-and-cloud-advisor-39d3</guid>
      <description>&lt;p&gt;In a complex cloud architecture, log troubleshooting usually starts with a resource map: which services are connected, where traffic flows, and which components have logs enabled. Tencent Cloud CLS and Cloud Advisor bring multi-product log management into the Cloud Advisor architecture view.&lt;/p&gt;

&lt;p&gt;The integration combines three capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unified cloud-product log access management;&lt;/li&gt;
&lt;li&gt;real-time cloud-product log search and analysis;&lt;/li&gt;
&lt;li&gt;out-of-the-box operational dashboards for cloud products.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key idea is that logs are no longer managed only from a separate log console. Operators can inspect log status, search logs, and open dashboards from the same architecture view they use to understand cloud-resource relationships.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp7sudya50y3rkw9068j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp7sudya50y3rkw9068j.png" alt=" " width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Cloud Advisor architecture interface keeps the resource topology visible while log status and summary metrics appear on the side. This creates a global operations view: resource topology on the left, log visibility and operational indicators on the right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability 1: unified log access management
&lt;/h2&gt;

&lt;p&gt;Cloud Advisor can show whether cloud-product instances have log delivery enabled, and it can support batch enabling or disabling logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb7an1n8kdx7orslw39d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb7an1n8kdx7orslw39d.png" alt=" " width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The operation path is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;enter the Cloud Advisor architecture view;&lt;/li&gt;
&lt;li&gt;click the log-service plugin;&lt;/li&gt;
&lt;li&gt;choose the cloud product;&lt;/li&gt;
&lt;li&gt;open &lt;code&gt;Access Management&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;review the current log-delivery status of product instances.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The resource topology stays on the left, while a table-style management panel appears on the right. Operators can understand both context and configuration status at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability 2: query product logs from the architecture view
&lt;/h2&gt;

&lt;p&gt;The integration also supports direct log search. Operators can query cloud-product logs by key fields and time range, then use the results to locate failures, trace access behavior, or monitor runtime status.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi09mxjf2k8gfk8konjtg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi09mxjf2k8gfk8konjtg.png" alt=" " width="799" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open the log-search module from the log-service plugin, choose the cloud product, enter a query in the search box, and execute analysis. The chart and log list remain tied to the selected resource context, which reduces the need to switch between architecture diagrams and log consoles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Capability 3: open out-of-the-box dashboards
&lt;/h2&gt;

&lt;p&gt;Cloud Advisor can expose dedicated dashboards for cloud products. These dashboards can show performance monitoring, usage trends, anomaly detection, and other analysis results without extra manual configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1l1wa81k1yqjb2hu4xm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1l1wa81k1yqjb2hu4xm.png" alt=" " width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard view places summary cards and circular charts alongside the architecture view. After choosing &lt;code&gt;Log Service -&amp;gt; Cloud Product -&amp;gt; Dashboard&lt;/code&gt;, operators can inspect product-specific log analysis without building the dashboard from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported products and log types
&lt;/h2&gt;

&lt;p&gt;Nine cloud products are currently available through Cloud Advisor log access and management:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Log type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Content Delivery Network CDN&lt;/td&gt;
&lt;td&gt;Domain access logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Load Balancer CLB&lt;/td&gt;
&lt;td&gt;Load-balancer access logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Object Storage COS&lt;/td&gt;
&lt;td&gt;Bucket access logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tencent Kubernetes Engine TKE&lt;/td&gt;
&lt;td&gt;Container business logs, cluster audit logs, and cluster event logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elastic MapReduce&lt;/td&gt;
&lt;td&gt;Component runtime logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TencentDB&lt;/td&gt;
&lt;td&gt;Slow logs and error logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Video on Demand VOD&lt;/td&gt;
&lt;td&gt;Access logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud File Storage CFS&lt;/td&gt;
&lt;td&gt;Audit logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web Application Firewall WAF&lt;/td&gt;
&lt;td&gt;Access logs and attack logs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CLS also supports one-click collection and fast analysis for more than 60 cloud-product logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting workflow
&lt;/h2&gt;

&lt;p&gt;A practical troubleshooting workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Cloud Advisor to inspect the cloud architecture.&lt;/li&gt;
&lt;li&gt;Use the CLS log-service plugin to check which resources have log delivery enabled.&lt;/li&gt;
&lt;li&gt;Batch enable logs for missing resources when needed.&lt;/li&gt;
&lt;li&gt;Search logs directly from the architecture view using product fields and time filters.&lt;/li&gt;
&lt;li&gt;Open the prebuilt product dashboard to review performance, usage, and anomaly patterns.&lt;/li&gt;
&lt;li&gt;Use the resource topology to connect log findings back to upstream and downstream dependencies.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why this improves cloud operations
&lt;/h2&gt;

&lt;p&gt;The integration is valuable because it joins three layers that are often separated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the architecture view, which explains relationships;&lt;/li&gt;
&lt;li&gt;the log status view, which explains whether evidence is being collected;&lt;/li&gt;
&lt;li&gt;the log analysis view, which explains what happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For platform teams, that means faster global inspection, fewer console switches, and a clearer path from resource topology to evidence-level troubleshooting. The integration creates one-stop cloud-product log control and analysis, with future expansion planned for more product integrations and prebuilt log-alert capabilities.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>logging</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>How Beike Migrated a Large-Scale Observability Platform to CLS</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Thu, 11 Jun 2026 07:20:12 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/how-beike-migrated-a-large-scale-observability-platform-to-cls-47d8</link>
      <guid>https://dev.to/tencentcloud-cls/how-beike-migrated-a-large-scale-observability-platform-to-cls-47d8</guid>
      <description>&lt;p&gt;Beike operates at a scale where observability is not a dashboard convenience. It is an operations requirement. Beike migrated from self-built operations systems to a new cloud-based observability platform with Tencent Cloud CLS.&lt;/p&gt;

&lt;p&gt;The migration problem had three parts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Original constraint&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Low data linkage&lt;/td&gt;
&lt;td&gt;Logs, monitoring, tracing, and other observability data existed in many old systems, with limited connection between systems.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance pressure&lt;/td&gt;
&lt;td&gt;During daily settlement, write volume could increase by more than 10x. Large business lines already wrote more than 10 billion records per day, and broad queries often timed out.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data was hard to use&lt;/td&gt;
&lt;td&gt;Self-built systems lacked systematic display, consistent formatting, aggregation functions such as IP geolocation, and convenient sharing for dashboard results.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The goal was to build a unified, high-performance, reliable observability platform without intrusive changes to business logic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8x17nbqd0f6xuflq4hs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8x17nbqd0f6xuflq4hs.png" alt=" " width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The platform diagram presents a full-stack observability architecture. At the top are data sources such as logs, tracing, metrics, business data, and cloud products. In the middle are data collection, data processing, storage, analysis, dashboards, and alerting. On the output side, the platform supports data sharing, operational dashboards, and AI analysis. This is not only a log search migration; it is a unification of operations data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data ingestion: reduce delay while keeping existing collection logic
&lt;/h2&gt;

&lt;p&gt;The first pain point was write delay. During settlement peaks, delayed reporting was unacceptable because teams needed same-day data for verification and incident response.&lt;/p&gt;

&lt;p&gt;The first assumption was that expanding cloud resources would solve the delay, but the effect was limited. Further analysis by the CLS and Beike teams found that the bottleneck was mainly in the &lt;code&gt;rdkafka&lt;/code&gt; component used by the Fluentd Kafka output. Tuning &lt;code&gt;rdkafka&lt;/code&gt; alone could not meet Beike's scale.&lt;/p&gt;

&lt;p&gt;The CLS team then developed a Fluentd output plugin and published it to the community. Data-reporting delay dropped from more than ten minutes to within one minute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tqs78c5y7kc0p7woar4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8tqs78c5y7kc0p7woar4.png" alt=" " width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Peak write throughput reaches around &lt;code&gt;300 GB/min&lt;/code&gt;. This is the scale context for the ingestion redesign: the platform needed to absorb traffic bursts rather than only handle average write volume.&lt;/p&gt;
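
&lt;p&gt;To put that figure in more familiar units, here is a small back-of-the-envelope conversion. Only the 300 GB/min value comes from the case; the 10x multiplier used for the implied normal rate is an assumption that borrows the "more than 10x" settlement surge described earlier, so treat that last number as illustrative.&lt;/p&gt;

```python
# Sketch: unit conversions for the quoted peak of about 300 GB/min.
PEAK_GB_PER_MIN = 300

peak_gb_per_sec = PEAK_GB_PER_MIN / 60          # GB per second
peak_tb_per_hour = PEAK_GB_PER_MIN * 60 / 1000  # TB per hour, decimal units

# Assumption for illustration only: the peak is 10x the normal rate.
ASSUMED_PEAK_MULTIPLIER = 10
implied_normal_gb_per_min = PEAK_GB_PER_MIN / ASSUMED_PEAK_MULTIPLIER

print(peak_gb_per_sec, peak_tb_per_hour, implied_normal_gb_per_min)
```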

&lt;h2&gt;
  
  
  Multi-source ingestion without replacing every collector
&lt;/h2&gt;

&lt;p&gt;Beike's environment included Prometheus-based metrics, SkyWalking-based tracing, and mixed ES/Loki-style log systems for network, business, security, and other logs. Most environments had already moved to containers, and Fluentd was widely used for log collection, but each business department had its own collection logic.&lt;/p&gt;

&lt;p&gt;The easiest migration path was to keep the existing collection method and change the target endpoint where possible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F658w3etnjuihmqezzu65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F658w3etnjuihmqezzu65.png" alt=" " width="800" height="773"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture uses five ingestion lanes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;business logs collected by &lt;code&gt;fluentd&lt;/code&gt; are written through the Kafka protocol;&lt;/li&gt;
&lt;li&gt;security logs collected by &lt;code&gt;winlogbeat&lt;/code&gt; are written through the Kafka protocol;&lt;/li&gt;
&lt;li&gt;tracing data from &lt;code&gt;skywalking&lt;/code&gt; is written through an API path;&lt;/li&gt;
&lt;li&gt;TKE audit logs are collected by &lt;code&gt;loglistener&lt;/code&gt; through an agent path;&lt;/li&gt;
&lt;li&gt;metrics written through SDKs are ingested as cloud-product log data.&lt;/li&gt;
&lt;/ul&gt;
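
&lt;p&gt;The five lanes can be summarized as a small lookup table. The collector names and paths are taken from the list above; the path labels, the dictionary layout, and the function name &lt;code&gt;collectors_by_path&lt;/code&gt; are illustrative choices, not a CLS API or configuration format.&lt;/p&gt;

```python
# Sketch: the five ingestion lanes as a lookup table.
# Keys are collectors; values are (data type, ingestion path).
INGESTION_LANES = {
    "fluentd": ("business logs", "kafka-protocol"),
    "winlogbeat": ("security logs", "kafka-protocol"),
    "skywalking": ("tracing data", "api"),
    "loglistener": ("TKE audit logs", "agent"),
    "sdk": ("metrics", "sdk"),
}


def collectors_by_path(lanes):
    """Group collector names by the ingestion path they use."""
    grouped = {}
    for collector, (_data_type, path) in lanes.items():
        grouped.setdefault(path, []).append(collector)
    return grouped


grouped = collectors_by_path(INGESTION_LANES)
for path, collectors in grouped.items():
    print(path, collectors)
```

&lt;p&gt;Grouping by path makes the low-intrusion point visible: two of the five collectors keep writing over the Kafka protocol, so only their target endpoint needs to change.&lt;/p&gt;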

&lt;p&gt;This explains why the migration was low-intrusion: teams could preserve much of the existing collection stack while moving storage, search, and analysis into CLS.&lt;/p&gt;

&lt;p&gt;Beike also configured traffic-change alerts for key business modules so that traffic shifts could be detected before they escalated into incidents that were harder to resolve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data processing: structure raw logs before storage
&lt;/h2&gt;

&lt;p&gt;Beike had many business departments, which meant log formats were inconsistent. A central parser would not be enough; different business lines needed configurable parsing rules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5vbt5q3aeyzzmr000jn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5vbt5q3aeyzzmr000jn.png" alt=" " width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CLS data-processing canvas supports visual processing before logs are stored in a topic. In this example, business logs are first split by delimiters and then fields are extracted with regular expressions. The displayed data is simulated.&lt;/p&gt;
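
&lt;p&gt;The two-step idea (split by delimiter first, then extract fields with a regular expression) can be sketched in plain Python. This is not the CLS processing syntax; the log line, the pipe delimiter, the field names, and the function name &lt;code&gt;parse_line&lt;/code&gt; are all invented for the example.&lt;/p&gt;

```python
import re

# Invented sample line: pipe-delimited, with a free-text last column.
RAW_LINE = "2024-05-01T10:00:00Z|order-service|INFO|user=1001 cost=35ms status=200"

# Unnamed groups keep the pattern simple; names are attached afterwards.
DETAIL_PATTERN = re.compile(r"user=(\d+) cost=(\d+)ms status=(\d+)")
DETAIL_FIELDS = ("user", "cost_ms", "status")


def parse_line(line):
    """Split a raw line into columns, then extract fields from the detail."""
    # Step 1: split on the delimiter into top-level columns.
    timestamp, service, level, detail = line.split("|", 3)
    record = {"timestamp": timestamp, "service": service, "level": level}
    # Step 2: extract fields from the free-text column with a regex.
    match = DETAIL_PATTERN.search(detail)
    if match:
        record.update(zip(DETAIL_FIELDS, match.groups()))
    return record


record = parse_line(RAW_LINE)
print(record)
```

&lt;p&gt;Extracted values stay as strings here. A real pipeline would likely cast numeric fields before storage so they can be aggregated.&lt;/p&gt;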

&lt;h2&gt;
  
  
  Data analysis: make massive logs searchable and cheaper to retain
&lt;/h2&gt;

&lt;p&gt;Two related problems appear at this scale: some logs must be stored for a long time due to compliance, while full-volume aggregation over very large datasets hurts analysis efficiency.&lt;/p&gt;

&lt;p&gt;The solution combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid storage&lt;/strong&gt;: short-term hot data supports analysis, and long-term cold data can move to low-frequency storage and remain queryable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled SQL&lt;/strong&gt;: complex raw logs are aggregated into business-level metrics and saved for long-term monitoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Beike security logs, Windows event logs from employee office environments were collected into CLS. The security team configured more than one thousand SQL rules to aggregate by rule name, alert level, and host name. Scheduled SQL summarized results every minute, reducing complex logs into the indicators the business cared about.&lt;/p&gt;
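
&lt;p&gt;Conceptually, each of those rules is a grouped count over a one-minute window. The sketch below shows that aggregation in Python rather than in CLS SQL; the log keys (&lt;code&gt;time&lt;/code&gt;, &lt;code&gt;rule&lt;/code&gt;, &lt;code&gt;level&lt;/code&gt;, &lt;code&gt;host&lt;/code&gt;), the sample records, and the function name are assumptions made for illustration.&lt;/p&gt;

```python
from collections import Counter


def aggregate_per_minute(logs):
    """Count logs per (minute, rule name, alert level, host name)."""
    counts = Counter()
    for log in logs:
        minute = log["time"][:16]  # keep "YYYY-MM-DDTHH:MM"
        counts[(minute, log["rule"], log["level"], log["host"])] += 1
    return counts


# Invented records standing in for Windows event logs.
logs = [
    {"time": "2024-05-01T10:00:05Z", "rule": "failed-logon", "level": "high", "host": "pc-01"},
    {"time": "2024-05-01T10:00:40Z", "rule": "failed-logon", "level": "high", "host": "pc-01"},
    {"time": "2024-05-01T10:01:02Z", "rule": "failed-logon", "level": "high", "host": "pc-01"},
    {"time": "2024-05-01T10:00:10Z", "rule": "new-service", "level": "low", "host": "pc-02"},
]

summary = aggregate_per_minute(logs)
for key, count in sorted(summary.items()):
    print(key, count)
```

&lt;p&gt;Storing only these per-minute rows is what makes long-term monitoring cheaper than re-scanning the raw logs for every chart.&lt;/p&gt;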

&lt;p&gt;After switching to CLS, real-time retrieval over more than &lt;strong&gt;50 billion&lt;/strong&gt; log records averaged only &lt;strong&gt;10 seconds&lt;/strong&gt;, and retrieval efficiency improved by &lt;strong&gt;6x+&lt;/strong&gt; compared with the original system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84sgz02gsi53o5z8fhpl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84sgz02gsi53o5z8fhpl.png" alt=" " width="800" height="896"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The operational view combines cards, charts, and log records: high-level indicators for scanability, charts for trend review, and raw records for drill-down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Result display: dashboards and DataSight sharing
&lt;/h2&gt;

&lt;p&gt;Before migration, Beike used open-source display components such as Grafana. Those systems had fixed presentation forms, required complex configuration, and were not convenient for sharing inside domestic office workflows.&lt;/p&gt;

&lt;p&gt;After data was collected into CLS, Beike could configure multiple dashboards in the product console and share them to PC or mobile through the independent DataSight console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2kqq42eiohybci61egz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2kqq42eiohybci61egz.png" alt=" " width="800" height="704"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This dashboard contains multi-dimensional charts such as traffic trend, distribution, and summary indicators. The displayed data is simulated, but the workflow is the real point: business teams can monitor operations through reusable dashboards instead of repeated ad hoc searches.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5m19alt8n9c0k499dy3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq5m19alt8n9c0k499dy3.png" alt=" " width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The summary visual reinforces the platform's role across real-time network dashboards, operations dashboards, multi-end sharing, and reporting. It connects the technical migration to daily operations usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Access control and smooth user migration
&lt;/h2&gt;

&lt;p&gt;Beike already had more than one thousand independent R&amp;amp;D users in its internal operations platform, with permission boundaries by business area. Creating Tencent Cloud accounts for everyone was unrealistic.&lt;/p&gt;

&lt;p&gt;CLS DataSight solved this through an embedded, independent console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it can be embedded into the existing internal system;&lt;/li&gt;
&lt;li&gt;it supports internal and external network access modes;&lt;/li&gt;
&lt;li&gt;it provides an independent log entry and customizable account-password login;&lt;/li&gt;
&lt;li&gt;it can connect to the user's LDAP system and inherit existing permission logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reported results
&lt;/h2&gt;

&lt;p&gt;The migration outcomes are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more than one thousand business sections were connected to CLS in one person-day;&lt;/li&gt;
&lt;li&gt;old and new systems switched smoothly without changing user habits;&lt;/li&gt;
&lt;li&gt;under 10x peak write traffic, reporting delay dropped from more than ten minutes to minute-level latency;&lt;/li&gt;
&lt;li&gt;overall business efficiency improved by &lt;strong&gt;20x&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;retrieval over tens of billions of logs moved from minute-level to second-level;&lt;/li&gt;
&lt;li&gt;retrieval efficiency improved by &lt;strong&gt;6x+&lt;/strong&gt;;&lt;/li&gt;
&lt;li&gt;dashboards and traffic-change alerts made operations more visible and proactive.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reusable migration pattern
&lt;/h2&gt;

&lt;p&gt;The Beike case suggests a practical sequence for large observability migrations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;identify whether the true bottleneck is storage, query, collector output, or parsing;&lt;/li&gt;
&lt;li&gt;preserve existing collection protocols where possible;&lt;/li&gt;
&lt;li&gt;route logs, tracing, audit, and metrics into one analysis platform;&lt;/li&gt;
&lt;li&gt;structure logs before storage through visual processing rules;&lt;/li&gt;
&lt;li&gt;use scheduled SQL to turn massive raw logs into long-lived metrics;&lt;/li&gt;
&lt;li&gt;separate hot and cold storage to balance cost and query requirements;&lt;/li&gt;
&lt;li&gt;expose dashboards through an access model that matches the organization's existing identity system.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>observability</category>
      <category>logging</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>Detect Malicious Source IPs in CLS Logs with Tencent Security Intelligence</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Thu, 11 Jun 2026 06:49:12 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/detect-malicious-source-ips-in-cls-logs-with-tencent-security-intelligence-285n</link>
      <guid>https://dev.to/tencentcloud-cls/detect-malicious-source-ips-in-cls-logs-with-tencent-security-intelligence-285n</guid>
      <description>&lt;p&gt;Access logs often contain the earliest evidence of attacks. The problem is that an IP address by itself is not enough. Operators need to know whether that source has been associated with attacks, exploitation, web attacks, brute force, or other malicious behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Threat IP Detection&lt;/strong&gt; in Tencent Cloud CLS, jointly released with Tencent Security Keen Lab, is based on Tencent Security threat intelligence from &lt;code&gt;https://tix.qq.com/&lt;/code&gt;. CLS analyzes source IPs in access logs, identifies malicious IPs, and links the result back to business access logs so teams can assess and block risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intelligence source and detection scope
&lt;/h2&gt;

&lt;p&gt;The intelligence library contains &lt;strong&gt;300 million+ security intelligence records&lt;/strong&gt; and processes &lt;strong&gt;more than 3 trillion threat-data records per day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After the feature is enabled, CLS automatically analyzes IPs in logs and identifies malicious categories including:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threat category&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Network attack&lt;/td&gt;
&lt;td&gt;Attacks against information systems, infrastructure, computer networks, or personal devices.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exploit&lt;/td&gt;
&lt;td&gt;Abuse of software vulnerabilities to access or damage a system without authorization.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web attack&lt;/td&gt;
&lt;td&gt;Examples include XSS, CSRF, and SQL injection.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brute force&lt;/td&gt;
&lt;td&gt;Attempts to gain account access through repeated password or credential guessing.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When a malicious IP is detected, the system provides its threat level, its threat classification tags, and the related access logs from the current business system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feywrtkc8fmyzcmqpwncm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feywrtkc8fmyzcmqpwncm.png" alt=" " width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The detection dashboard turns threat intelligence into operational context. The visible layout combines summary counts, trend charts, a distribution chart, and a table of detected IPs. Instead of sending operators to a separate intelligence system first, the CLS view starts from business logs and then enriches suspicious sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ng2gdln7tv5n1vmvny1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ng2gdln7tv5n1vmvny1.png" alt=" " width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The threat profile provides verdict, threat tags, sample records, geographic information, ASN, operator, visit count, and associated samples. In the example shown, the IP is marked as malicious and carries multiple labels, such as malicious sample or bot-related risk. In an investigation workflow, this helps decide whether to block, rate-limit, or keep monitoring that IP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Blocking example with CLB
&lt;/h2&gt;

&lt;p&gt;Cloud Load Balancer (CLB) provides a clear blocking example. After identifying a malicious IP, operators can bind a security group to the load balancer, or update an existing one, to deny that IP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvd9bca14zed6ipke990g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvd9bca14zed6ipke990g.png" alt=" " width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CLB console supports binding a security group to the load balancer. After the detection result identifies a risky source, bind a security group and add the malicious IP to a deny rule.&lt;/p&gt;
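
&lt;p&gt;As a rough sketch of the step from detection result to deny rule, the snippet below turns a list of detected source IPs into deny-rule entries. The rule schema (&lt;code&gt;source&lt;/code&gt;, &lt;code&gt;protocol&lt;/code&gt;, &lt;code&gt;port&lt;/code&gt;, &lt;code&gt;action&lt;/code&gt;), the &lt;code&gt;build_deny_rules&lt;/code&gt; helper, and the protected network list are assumptions for illustration, not the CLB or security group API.&lt;/p&gt;

```python
import ipaddress

# Hypothetical networks that must never be denied (for example, internal ranges).
PROTECTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def build_deny_rules(detected_ips):
    """Turn detected source IPs into deny-rule entries (illustrative schema)."""
    rules = []
    seen = set()
    for raw in detected_ips:
        try:
            ip = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip values that are not valid IP addresses
        if ip in seen or any(ip in net for net in PROTECTED_NETWORKS):
            continue  # drop duplicates and protected sources
        seen.add(ip)
        rules.append({
            "source": f"{ip}/{ip.max_prefixlen}",  # single-host CIDR
            "protocol": "ALL",
            "port": "ALL",
            "action": "deny",
        })
    return rules


# Documentation-range addresses used as sample input.
rules = build_deny_rules(["203.0.113.7", "10.0.0.5", "not-an-ip", "203.0.113.7"])
for rule in rules:
    print(rule)
```

&lt;p&gt;Validating and deduplicating before building rules keeps malformed log values and internal addresses out of the deny list. The resulting entries would still need to be translated into whatever rule format the security group actually expects.&lt;/p&gt;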

&lt;h2&gt;
  
  
  Applicable log scenarios
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfc3hq3hzl5jdroi14y3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfc3hq3hzl5jdroi14y3.png" alt=" " width="799" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Threat IP Detection can analyze several cloud-product access-log sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CLB&lt;/code&gt; access logs;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;COS&lt;/code&gt; access logs;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CDN&lt;/code&gt; access logs;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EdgeOne&lt;/code&gt; access logs;&lt;/li&gt;
&lt;li&gt;cloud-native &lt;code&gt;API Gateway&lt;/code&gt; logs;&lt;/li&gt;
&lt;li&gt;and other access-log sources.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Four usage scenarios are especially relevant:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;How the detection helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud-service access security&lt;/td&gt;
&lt;td&gt;Detect malicious IP access to CLB, COS, CDN, EdgeOne, API Gateway, and similar services.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web application security&lt;/td&gt;
&lt;td&gt;Discover malicious IPs visiting websites.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API security&lt;/td&gt;
&lt;td&gt;Identify abusive IP requests and reduce API misuse.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security audit&lt;/td&gt;
&lt;td&gt;Analyze internal traffic and operation logs for abnormal behavior.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Enable Threat IP Detection in CLS
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dx9vrb0bw4ykvzycv09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dx9vrb0bw4ykvzycv09.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To enable the feature, log in to the CLS console, open the cloud product center, and click &lt;code&gt;Tencent Security | Threat IP Detection&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lmhre4ptix0sp8a0fsg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7lmhre4ptix0sp8a0fsg.png" alt=" " width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The configuration dialog asks for the log topic and the IP field to analyze. The minimal setup is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;choose the CLS log topic that contains the access logs;&lt;/li&gt;
&lt;li&gt;select the field that stores the source IP;&lt;/li&gt;
&lt;li&gt;confirm the configuration;&lt;/li&gt;
&lt;li&gt;review detected malicious IPs and linked access logs;&lt;/li&gt;
&lt;li&gt;configure an alert policy if teams need proactive notification.&lt;/li&gt;
&lt;/ol&gt;
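
&lt;p&gt;When it is unclear which field stores the source IP, a quick check on a few sample logs can help before filling in the dialog. The sketch below is a local helper, not a CLS feature; the field names such as &lt;code&gt;remote_addr&lt;/code&gt; and the sample records are hypothetical.&lt;/p&gt;

```python
import ipaddress


def ip_ratio_by_field(sample_logs):
    """For each field, return the fraction of sampled values that parse as an IP address."""
    seen = {}
    hits = {}
    for log in sample_logs:
        for field, value in log.items():
            seen[field] = seen.get(field, 0) + 1
            try:
                ipaddress.ip_address(str(value).strip())
            except ValueError:
                continue  # not an IP address, count it as a miss
            hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / count for field, count in seen.items()}


# Hypothetical access-log samples.
sample_logs = [
    {"remote_addr": "203.0.113.7", "status": "200", "request_time": "0.012"},
    {"remote_addr": "198.51.100.23", "status": "404", "request_time": "1.5"},
]
ratios = ip_ratio_by_field(sample_logs)
best_field = max(ratios, key=ratios.get)
print(best_field, ratios[best_field])
```

&lt;p&gt;The field with the highest ratio is the most likely candidate for the IP field in step 2.&lt;/p&gt;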

&lt;h2&gt;
  
  
  Why this is useful in operations
&lt;/h2&gt;

&lt;p&gt;The capability has three operational advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time detection&lt;/strong&gt;: logs do not need preprocessing before analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive alerting&lt;/strong&gt;: alert policies can notify users when a malicious IP is found.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security collaboration&lt;/strong&gt;: results can work with security groups, firewalls, WAF, and similar controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, the strongest workflow is a closed loop: detect a malicious source from logs, inspect its threat-intelligence profile, review which business endpoints it touched, trigger an alert when needed, and block or mitigate through the relevant security product.&lt;/p&gt;

</description>
      <category>security</category>
      <category>logging</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>Deliver Tencent Cloud CLS Logs to DLC for Spark-Based Analysis</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Thu, 11 Jun 2026 03:27:39 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/deliver-tencent-cloud-cls-logs-to-dlc-for-spark-based-analysis-237d</link>
      <guid>https://dev.to/tencentcloud-cls/deliver-tencent-cloud-cls-logs-to-dlc-for-spark-based-analysis-237d</guid>
      <description>&lt;p&gt;Tencent Cloud CLS already supports log delivery to CKafka and COS. Another delivery target is now available: &lt;strong&gt;DLC&lt;/strong&gt;, Tencent Cloud Data Lake Compute. With this path, logs stored in CLS can be delivered directly into DLC so teams can process and analyze them with Spark.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff53xascs5tz6b5i59ohf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff53xascs5tz6b5i59ohf.png" alt=" " width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A CLS log topic can feed three downstream delivery paths: &lt;code&gt;Data Lake Compute DLC&lt;/code&gt;, &lt;code&gt;Message Queue CKafka&lt;/code&gt;, and &lt;code&gt;Object Storage COS&lt;/code&gt;. DLC is the big-data analysis target to choose when the next step is Spark processing, streaming analysis, machine learning, or graph-style computation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why deliver logs to DLC?
&lt;/h2&gt;

&lt;p&gt;DLC provides two advantages compared with traditional SQL-only processing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Real-time stream processing&lt;/strong&gt;: Spark Streaming can be used for real-time analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Spark libraries&lt;/strong&gt;: Spark includes MLlib for machine learning and GraphX for graph computation. Graph algorithms can support workloads such as relationship analysis in social-network data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This makes the CLS-to-DLC path useful when logs are no longer just operational evidence. They become an input dataset for large-scale analysis pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: open Deliver to DLC from the CLS log topic
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcv9d8x1dyn101r3rn9vq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcv9d8x1dyn101r3rn9vq.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From the CLS log topic page, open &lt;code&gt;Deliver to DLC&lt;/code&gt; in the left navigation. This starts the delivery-task configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: choose the DLC database and table
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh3vrjfzhwh7b073zhy6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh3vrjfzhwh7b073zhy6.png" alt=" " width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choose the region, DLC database, and target table. This creates the destination binding between the CLS topic and the DLC table.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: map CLS fields to DLC table fields
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmk7pni7avlkgw9z3z3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkmk7pni7avlkgw9z3z3i.png" alt=" " width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Field mapping is the most important operational step. Multiple data types are supported. If a CLS log field and a DLC table field use the same name, mapping can be automatic. If field names differ, manually enter the CLS log-field name and map it to the DLC field.&lt;/p&gt;

&lt;p&gt;In practical terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use automatic mapping for same-name fields;&lt;/li&gt;
&lt;li&gt;use manual mapping for renamed fields;&lt;/li&gt;
&lt;li&gt;review data types before confirming the task;&lt;/li&gt;
&lt;li&gt;use the DLC data-type documentation when a field requires type alignment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The DLC data-type documentation is available at &lt;code&gt;https://cloud.tencent.com/document/product/1342/96174&lt;/code&gt;.&lt;/p&gt;
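
&lt;p&gt;The mapping rule described above can be sketched as follows. This is an illustration of the logic only, not the console's implementation: the &lt;code&gt;build_field_mapping&lt;/code&gt; helper and the field and column names are hypothetical.&lt;/p&gt;

```python
def build_field_mapping(cls_fields, dlc_columns, manual=None):
    """Map DLC columns to CLS log fields: same-name fields automatically, the rest manually."""
    manual = dict(manual or {})  # DLC column name mapped to a CLS field name entered by hand
    mapping = {}
    unmapped = []
    for column in dlc_columns:
        if column in manual:
            mapping[column] = manual[column]  # manual mapping for renamed fields
        elif column in cls_fields:
            mapping[column] = column          # automatic same-name mapping
        else:
            unmapped.append(column)           # needs attention before confirming the task
    return mapping, unmapped


# Hypothetical CLS log fields and DLC table columns.
cls_fields = ["remote_addr", "status", "request_time"]
dlc_columns = ["client_ip", "status", "request_time", "region"]
mapping, unmapped = build_field_mapping(
    cls_fields, dlc_columns, manual={"client_ip": "remote_addr"}
)
print(mapping)
print(unmapped)
```

&lt;p&gt;Listing the unmapped columns separately makes it easier to spot fields that still need a manual entry or a type review.&lt;/p&gt;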

&lt;h2&gt;
  
  
  Step 4: configure partition-field mapping
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffg9ik63o2l34arey76vg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffg9ik63o2l34arey76vg.png" alt=" " width="799" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Partition-field mapping supports three options:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Partition strategy&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time partition&lt;/td&gt;
&lt;td&gt;Use the CLS log time field for partition mapping.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Other field partition&lt;/td&gt;
&lt;td&gt;Select the corresponding log field and map it to a DLC partition field.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No partition mapping&lt;/td&gt;
&lt;td&gt;Disable the partition-mapping switch when partition mapping is not required.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After the field and partition configuration is complete, click &lt;code&gt;Confirm&lt;/code&gt; to create the delivery task.&lt;/p&gt;
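
&lt;p&gt;To make the three partition options concrete, the sketch below derives a partition value for a single log record. It assumes the log time is a Unix timestamp in seconds stored under a hypothetical &lt;code&gt;log_time&lt;/code&gt; key, and that time partitions use a UTC day format; the actual partition format is set by the DLC table definition.&lt;/p&gt;

```python
from datetime import datetime, timezone


def partition_value(log, strategy="time", field=None, fmt="%Y%m%d"):
    """Return the partition value for one log record, or None when mapping is disabled."""
    if strategy == "none":
        return None  # partition mapping switched off
    if strategy == "time":
        # Assumption: "log_time" holds a Unix timestamp in seconds.
        moment = datetime.fromtimestamp(log["log_time"], tz=timezone.utc)
        return moment.strftime(fmt)
    if strategy == "field":
        return str(log[field])  # partition by another log field
    raise ValueError(f"unknown partition strategy: {strategy}")


# Hypothetical log record.
log = {"log_time": 1700000000, "region": "ap-guangzhou"}
print(partition_value(log))
print(partition_value(log, strategy="field", field="region"))
print(partition_value(log, strategy="none"))
```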

&lt;h2&gt;
  
  
  When this pattern is useful
&lt;/h2&gt;

&lt;p&gt;Use CLS-to-DLC delivery when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;log data must feed Spark jobs;&lt;/li&gt;
&lt;li&gt;real-time stream processing is needed with Spark Streaming;&lt;/li&gt;
&lt;li&gt;teams want to run MLlib-based analysis on operational logs;&lt;/li&gt;
&lt;li&gt;logs need to join a broader data-lake workflow;&lt;/li&gt;
&lt;li&gt;graph processing, such as relationship analysis, is part of the downstream workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For lighter asynchronous processing or event streaming, CKafka may still be the better target. For archiving or object-based retention, COS remains the natural delivery target. The value of the DLC path is that the log stream becomes directly available to a Spark-oriented analysis environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source note: Splunk delivery preview
&lt;/h2&gt;

&lt;p&gt;A future &lt;code&gt;Deliver to Splunk&lt;/code&gt; capability is planned for early June. Splunk will become another destination for log management and analysis, giving teams more choices for downstream log processing.&lt;/p&gt;

</description>
      <category>logging</category>
      <category>spark</category>
      <category>datalake</category>
      <category>cloud</category>
    </item>
    <item>
      <title>What an Intelligent Observability Maturity Model Means for Cloud Operations</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Thu, 11 Jun 2026 02:39:16 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/what-an-intelligent-observability-maturity-model-means-for-cloud-operations-32hi</link>
      <guid>https://dev.to/tencentcloud-cls/what-an-intelligent-observability-maturity-model-means-for-cloud-operations-32hi</guid>
      <description>&lt;p&gt;Cloud observability is becoming harder because cloud systems are no longer static. Microservices, dynamic topology, cross-team dependencies, and rapidly growing telemetry volume all make traditional operations less predictable. Intelligent technologies, including large models, can help process large-scale observability data and accelerate incident discovery and resolution.&lt;/p&gt;

&lt;p&gt;At the Cloud AI Compute Ignite Forum of the Global Digital Economy Conference, the Cloud Computing Intelligent Observability Capability Maturity Model standard was officially released. The standard is led by the China Academy of Information and Communications Technology, initiated by China Mobile Cloud, and approved by the CCSA TC1 WG5 cloud computing working group.&lt;/p&gt;

&lt;p&gt;This launch defines the overall development direction for cloud operations. Intelligent observability is positioned as a complete capability model, rather than merely a set of tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the standard tries to define
&lt;/h2&gt;

&lt;p&gt;The standard defines key concepts, assessment dimensions, capability levels, and implementation paths for intelligent observability in cloud environments. Its goal is to guide organizations that want to apply intelligent methods to improve cloud-system observability.&lt;/p&gt;

&lt;p&gt;The standard covers two major areas:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Observability capability&lt;/td&gt;
&lt;td&gt;Platform planning, resource design, correlation analysis, data standardization, alert-effectiveness design, data security, observed-object design, metric and threshold design, process design, daily operations, visualization, data validation, and data management.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intelligent capability&lt;/td&gt;
&lt;td&gt;Intelligent data analysis, log analysis, intelligent alert baseline, alert convergence, anomaly detection, trend prediction, root-cause analysis, intelligent optimization suggestions, natural-language interaction, tool calling, memory management, and self-reflection.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The standard contains &lt;strong&gt;6 capability domains, 24 capability items, and more than 200 capability indicators&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxma38rekbsfzc7hf128m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxma38rekbsfzc7hf128m.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model separates capabilities into two layers. The upper layer is &lt;code&gt;intelligent capability&lt;/code&gt;. One side focuses on scenario applications: intelligent data analysis, log analysis, alert baselines, alert convergence, anomaly detection, trend prediction, root-cause analysis, and optimization recommendations. The other side is an "observability agent": natural-language interaction, tool calling, memory management, and self-reflection.&lt;/p&gt;

&lt;p&gt;The lower layer is &lt;code&gt;observability capability&lt;/code&gt;. It begins with planning and design, then moves into daily operations and data management. The data-management section explicitly includes collection, storage, and processing. On the right side, the model ties everything to continuous operations optimization, including platform operations, alert operations, and standardized IT-process operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters to platform teams
&lt;/h2&gt;

&lt;p&gt;The model suggests a practical maturity path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;first make telemetry reliable and standardized;&lt;/li&gt;
&lt;li&gt;then make data searchable, visual, and alertable;&lt;/li&gt;
&lt;li&gt;then apply intelligent analysis to logs, anomalies, baselines, trends, and root causes;&lt;/li&gt;
&lt;li&gt;finally connect the platform to continuous operational improvement.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That order matters. Large-model-based troubleshooting is much less useful when the underlying log, metric, tracing, alerting, and data-governance layers are inconsistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where CLS fits in the maturity model
&lt;/h2&gt;

&lt;p&gt;Tencent Cloud CLS is one of the core participating products in the standard work. CLS representatives joined multiple discussions with experts from China Mobile Cloud, ZTE, and other cloud vendors and companies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn97cg5a4fomqwhys06c2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn97cg5a4fomqwhys06c2.png" alt=" " width="800" height="867"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CLS capability map connects the maturity model to a concrete platform architecture. On the left, data comes from endpoints, online and offline systems, open-source ecosystems, applications, and cloud-product ecosystems. The diagram includes sources such as iOS, Android, webpages, Windows, servers, IDC, Tencent Cloud, AWS, Beats, Log4j, Kubernetes, VictoriaMetrics, Logstash, Fluentd, Logback, OpenTelemetry, syslog, MySQL, Windows events, CVM, TKE, SCF, EKS, CDN, CLB, COS, Oceanus, TDMQ, and cloud development services.&lt;/p&gt;

&lt;p&gt;In the center, CLS provides collection and ingestion through LogListener, Kafka protocol, Prometheus protocol, API, and SDK. It then supports dashboards, charts, alert customization, alert suppression, alert grouping, data processing with 90+ functions, CQL/KQL-compatible search, SQL analysis with 300+ functions, correlation analysis, PromQL, low-frequency log storage, standard log storage, timed SQL, and metric storage.&lt;/p&gt;

&lt;p&gt;Outputs include visualization through DataSight and Grafana; alert channels such as Enterprise WeChat, DingTalk, Feishu, WeChat, email, SMS, custom callbacks, and phone calls; consumption through SCF, Oceanus, Kafka, Spark, Hive, Flink, ClickHouse, and Elasticsearch; and delivery to COS and CKafka.&lt;/p&gt;

&lt;h2&gt;
  
  
  User examples
&lt;/h2&gt;

&lt;p&gt;Three customer examples show how this capability set is used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NIO used CLS security monitoring capabilities for millisecond-level security monitoring, tagging, and desensitization, building an overall security observability platform for log data.&lt;/li&gt;
&lt;li&gt;Beike used CLS search and analysis capabilities to build a new unified observability platform and improve overall business efficiency.&lt;/li&gt;
&lt;li&gt;Lebo used the CLS collection ecosystem for multi-terminal one-stop data collection and reporting, improving full-link observability and supporting user-experience optimization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical takeaway
&lt;/h2&gt;

&lt;p&gt;For cloud teams, the maturity model is useful because it converts "make observability intelligent" into a capability checklist. A mature platform should not only collect logs and metrics. It should standardize data, support analysis and visualization, provide alert governance, preserve data securely, connect to downstream processing systems, and gradually add intelligent analysis such as anomaly detection, root-cause analysis, tool calling, and natural-language operations.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>cloud</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>AI Agent Observability with OpenClaw: Sessions, Tool Calls, Latency, Errors, and Token Cost in Tencent Cloud CLS</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Wed, 10 Jun 2026 13:14:23 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/monitor-openclaw-cost-operations-sessions-and-security-with-tencent-cloud-cls-5gif</link>
      <guid>https://dev.to/tencentcloud-cls/monitor-openclaw-cost-operations-sessions-and-security-with-tencent-cloud-cls-5gif</guid>
      <description>&lt;p&gt;AI agents are difficult to operate when their behavior is spread across sessions, token usage, operations, model activity, queues, logs, and security-sensitive actions. A cost spike, slow response, repeated failed operation, or risky command is hard to explain unless each agent session can be connected to cost, latency, errors, operation records, and raw logs.&lt;/p&gt;

&lt;p&gt;This guide explains how to use OpenClaw Usage Insights with Tencent Cloud Log Service (CLS) to monitor AI agent cost, operations, sessions, security risks, and log evidence. It focuses on the signals, dashboards, onboarding path, and troubleshooting workflow that help developers and operators understand what happened inside an OpenClaw agent system and where CLS fits as a managed log service for search, analysis, dashboards, and operational visibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use this pattern
&lt;/h2&gt;

&lt;p&gt;Use this pattern when an OpenClaw-based AI agent system needs more than basic application logs. Typical signs include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token usage or cost increases without a clear session or operation owner;&lt;/li&gt;
&lt;li&gt;agent sessions are difficult to reconstruct after a user complaint;&lt;/li&gt;
&lt;li&gt;operations become slower, fail more often, or create queue backlog;&lt;/li&gt;
&lt;li&gt;operators need to compare cost, latency, errors, sessions, and model usage over time;&lt;/li&gt;
&lt;li&gt;security-sensitive commands or file access need audit records;&lt;/li&gt;
&lt;li&gt;dashboards show an anomaly, but engineers still need raw logs for root cause analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenClaw Usage Insights is built on Tencent Cloud Log Service (CLS). After OpenClaw runtime data is connected to CLS, the system provides prebuilt views for cost governance, operations monitoring, session management, session detail analysis, security audit, and raw log search.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI agent observability signals to collect
&lt;/h2&gt;

&lt;p&gt;Before reviewing dashboards, make sure the logs can connect agent behavior to cost, operations, sessions, and security. The exact field names can follow your application schema, but each event should preserve enough context for CLS log search and dashboard analysis.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal category&lt;/th&gt;
&lt;th&gt;What it helps explain&lt;/th&gt;
&lt;th&gt;Useful fields or dimensions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Session context&lt;/td&gt;
&lt;td&gt;Which session produced an interaction and how the session evolved.&lt;/td&gt;
&lt;td&gt;session identifier, server instance, start time, end time, message count, average turns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost and token usage&lt;/td&gt;
&lt;td&gt;Which sessions, messages, or usage patterns drive token spend.&lt;/td&gt;
&lt;td&gt;total cost, total token usage, average session cost, single-message cost, cost distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operation activity&lt;/td&gt;
&lt;td&gt;What the agent or platform did during execution.&lt;/td&gt;
&lt;td&gt;operation name, command, status, duration, tool invocation count, card distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency and reliability&lt;/td&gt;
&lt;td&gt;Where execution becomes slow or unstable.&lt;/td&gt;
&lt;td&gt;queue backlog, response degradation, execution latency, P95 latency, error growth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session detail&lt;/td&gt;
&lt;td&gt;What happened inside one conversation or task.&lt;/td&gt;
&lt;td&gt;session content, per-turn detail, token usage, problem checks, prompt optimization clues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security and risk&lt;/td&gt;
&lt;td&gt;Whether the agent performed sensitive or high-risk actions.&lt;/td&gt;
&lt;td&gt;high-risk session count, high-risk command execution, sensitive-file access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw log context&lt;/td&gt;
&lt;td&gt;How engineers verify the original event behind a dashboard trend.&lt;/td&gt;
&lt;td&gt;timestamp, instance, filter condition, query statement, raw log content, statistical result&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The important design point is traceability. A dashboard can tell you that cost increased or latency degraded; the log context should let you filter back to the related instance, session, operation, command, or event record.&lt;/p&gt;
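
&lt;p&gt;As a minimal sketch of that drill-down, the snippet below filters a set of log events back to one session and summarizes it. The event schema (&lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;session_id&lt;/code&gt;, &lt;code&gt;operation&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;tokens&lt;/code&gt;) is an assumption for illustration; real field names follow your own application schema.&lt;/p&gt;

```python
def trace_session(events, session_id):
    """Return the events of one session in time order."""
    related = [e for e in events if e.get("session_id") == session_id]
    return sorted(related, key=lambda e: e["timestamp"])


def summarize(events):
    """Count events, errors, and tokens for a list of events."""
    return {
        "events": len(events),
        "errors": sum(1 for e in events if e.get("status") == "error"),
        "total_tokens": sum(e.get("tokens", 0) for e in events),
    }


# Hypothetical events, deliberately out of order.
events = [
    {"timestamp": 1700000030, "session_id": "s-2", "operation": "read_file", "status": "ok", "tokens": 120},
    {"timestamp": 1700000020, "session_id": "s-1", "operation": "run_command", "status": "error", "tokens": 300},
    {"timestamp": 1700000010, "session_id": "s-1", "operation": "plan", "status": "ok", "tokens": 500},
]
timeline = trace_session(events, "s-1")
print([e["operation"] for e in timeline])
print(summarize(timeline))
```

&lt;p&gt;This only works if every event carries the session identifier, which is why the session context fields in the table above matter most.&lt;/p&gt;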

&lt;h2&gt;
  
  
  OpenClaw Usage Insights and Tencent Cloud CLS workflow
&lt;/h2&gt;

&lt;p&gt;The onboarding flow has three prerequisites:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;OpenClaw is installed and running.&lt;/li&gt;
&lt;li&gt;Tencent Cloud CLS is activated.&lt;/li&gt;
&lt;li&gt;A Tencent Cloud API key is available, including &lt;code&gt;SecretId&lt;/code&gt; and &lt;code&gt;SecretKey&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After the prerequisites are ready, operators open the OpenClaw entry in the CLS Application Center and connect the machines where OpenClaw is running. The workflow supports two deployment paths:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deployment path&lt;/th&gt;
&lt;th&gt;How it works&lt;/th&gt;
&lt;th&gt;When to use it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tencent Cloud CVM or Lighthouse&lt;/td&gt;
&lt;td&gt;Select uncollected server instances, enter &lt;code&gt;SecretId&lt;/code&gt; and &lt;code&gt;SecretKey&lt;/code&gt;, then let the console complete the installation.&lt;/td&gt;
&lt;td&gt;Use this when OpenClaw runs on Tencent Cloud-hosted machines.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-managed server&lt;/td&gt;
&lt;td&gt;Select the region, enter the API credentials, copy the generated command, and run it on the target server.&lt;/td&gt;
&lt;td&gt;Use this when OpenClaw runs outside Tencent Cloud infrastructure.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After connection, the access-management list becomes the operational inventory. It shows which OpenClaw machines are connected and available for dashboards and log search. From there, operators can select a server instance and open the prebuilt dashboard set.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost monitoring for AI agent sessions
&lt;/h2&gt;

&lt;p&gt;Token cost is one of the first signals teams notice, but total cost alone is not enough for troubleshooting. An OpenClaw operator needs to know whether spend is rising across all sessions, concentrated in a few sessions, caused by a specific interaction pattern, or tied to a small group of messages.&lt;/p&gt;

&lt;p&gt;The cost governance dashboard helps break the problem down:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost view&lt;/th&gt;
&lt;th&gt;What to check&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total cost&lt;/td&gt;
&lt;td&gt;Overall spend trend for the selected OpenClaw instance.&lt;/td&gt;
&lt;td&gt;Confirms whether cost is actually increasing in the observed time range.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total token usage&lt;/td&gt;
&lt;td&gt;Token consumption trend and total token volume.&lt;/td&gt;
&lt;td&gt;Separates token growth from other operational symptoms.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average session cost&lt;/td&gt;
&lt;td&gt;Typical cost per session.&lt;/td&gt;
&lt;td&gt;Helps identify whether normal sessions became more expensive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost distribution&lt;/td&gt;
&lt;td&gt;Cost by session, message, or visible usage dimension.&lt;/td&gt;
&lt;td&gt;Finds high-cost sessions or interaction patterns that deserve inspection.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-message cost&lt;/td&gt;
&lt;td&gt;Cost at a more granular interaction level.&lt;/td&gt;
&lt;td&gt;Helps narrow a session-level spike to a specific turn or message.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A practical investigation usually starts with a cost trend, then moves to high-cost sessions, then opens session detail or raw log search to verify what actually happened.&lt;/p&gt;
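
&lt;p&gt;The drill-down from total cost to sessions to single messages can be sketched in a few lines of Python. This is only an illustration: the record layout and the field names &lt;code&gt;session_id&lt;/code&gt;, &lt;code&gt;message_id&lt;/code&gt;, and &lt;code&gt;cost&lt;/code&gt; are assumptions, not the OpenClaw or CLS schema.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical usage records; real field names depend on the log schema.
records = [
    {"session_id": "s-1", "message_id": "m-1", "cost": 0.02},
    {"session_id": "s-1", "message_id": "m-2", "cost": 0.03},
    {"session_id": "s-2", "message_id": "m-3", "cost": 0.90},
    {"session_id": "s-3", "message_id": "m-4", "cost": 0.05},
]


def cost_by_session(rows):
    """Sum cost per session and return sessions sorted by spend, highest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["session_id"]] += row["cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def costliest_message(rows, session_id):
    """Return the single most expensive message inside one session."""
    in_session = [row for row in rows if row["session_id"] == session_id]
    return max(in_session, key=lambda row: row["cost"])


ranking = cost_by_session(records)
top_session = ranking[0][0]
print(ranking)
print(costliest_message(records, top_session))
```

&lt;p&gt;The ranking plays the role of the cost distribution view, and the per-message lookup plays the role of the single-message cost view.&lt;/p&gt;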

&lt;h2&gt;
  
  
  Operations monitoring for latency, failures, and abnormal activity
&lt;/h2&gt;

&lt;p&gt;AI agent reliability is not only about final answers. Operators also need to watch the runtime path: message processing, queue behavior, response time, execution latency, error growth, and repeated abnormal activity.&lt;/p&gt;

&lt;p&gt;The operations monitoring dashboard is useful when the symptom is operational rather than financial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queue backlog indicates that work is waiting longer than expected;&lt;/li&gt;
&lt;li&gt;response degradation suggests that users may experience slower answers;&lt;/li&gt;
&lt;li&gt;error growth points to instability in the agent workflow or runtime path;&lt;/li&gt;
&lt;li&gt;P95 execution latency helps expose slow-tail behavior that average latency can hide;&lt;/li&gt;
&lt;li&gt;card distribution, log series, and runtime metrics help operators compare behavior across time windows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When latency or errors rise, the dashboard should be treated as the starting point. The next step is to filter the related raw logs by instance, session, time range, condition, or query statement so the team can inspect the original event records.&lt;/p&gt;
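
&lt;p&gt;A small Python sketch shows why P95 execution latency exposes slow-tail behavior that an average can understate. The latency values are made up for illustration, and the nearest-rank method used here is one common percentile definition, not necessarily the one the dashboard applies.&lt;/p&gt;

```python
import statistics

# Hypothetical execution latencies in milliseconds for one time window:
# eighteen fast requests and two very slow ones.
latencies_ms = [120] * 18 + [4000, 5000]


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct percent of samples at or below it."""
    ordered = sorted(values)
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p95_ms = percentile_nearest_rank(latencies_ms, 95)
print(mean_ms, median_ms, p95_ms)
```

&lt;p&gt;In this sample the median stays at the fast value and the mean lands far below the slow requests, while P95 reports one of the slow requests directly.&lt;/p&gt;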

&lt;h2&gt;
  
  
  Session analysis for reconstructing agent behavior
&lt;/h2&gt;

&lt;p&gt;Session management is the bridge between user-facing behavior and system-level signals. A session view helps answer questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many sessions are active or historical in the selected scope?&lt;/li&gt;
&lt;li&gt;How many turns does a typical session contain?&lt;/li&gt;
&lt;li&gt;Which sessions contain frequent tool invocations or unusual interaction patterns?&lt;/li&gt;
&lt;li&gt;Which channels or models are involved in the observed usage?&lt;/li&gt;
&lt;li&gt;Which session should be opened when investigating cost, latency, errors, or risky actions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The session detail dashboard adds a more focused troubleshooting layer. Operators can open a session from the session overview by selecting a session identifier or session content row. They can also open the session-detail dashboard directly and filter by server instance and session ID.&lt;/p&gt;

&lt;p&gt;For incident review, this matters because a single user complaint or abnormal cost event is rarely explained by one aggregate chart. The session detail view lets teams reconstruct the interaction path, inspect per-turn details, review token usage, check problem indicators, and identify prompt optimization clues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security audit for risky operations
&lt;/h2&gt;

&lt;p&gt;AI agent systems can execute commands, touch files, and perform actions that need review. The security audit view focuses on security-sensitive behavior rather than normal product usage.&lt;/p&gt;

&lt;p&gt;Use the security audit dashboard to check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;high-risk sessions that need review;&lt;/li&gt;
&lt;li&gt;high-risk command execution;&lt;/li&gt;
&lt;li&gt;sensitive-file access;&lt;/li&gt;
&lt;li&gt;whether a risky action can be connected back to a session or operation;&lt;/li&gt;
&lt;li&gt;whether the original log record supports the dashboard-level security signal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful when a team needs an audit trail. The goal is not only to count risky events, but to connect each event to enough context for review: which session it appeared in, what operation or command was involved, and what raw log evidence is available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Raw log search for root cause analysis
&lt;/h2&gt;

&lt;p&gt;Dashboards are good for trends and outliers. Raw log search is where the team verifies the actual event.&lt;/p&gt;

&lt;p&gt;Inside the OpenClaw application page in CLS, operators can open Log Search, select a server instance, add filter conditions, or use AI-assisted query statement generation. The result keeps raw logs and statistical analysis together, which supports a practical investigation loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Notice a cost, latency, error, session, or security anomaly in a dashboard.&lt;/li&gt;
&lt;li&gt;Identify the related instance, session, time range, operation, command, or condition.&lt;/li&gt;
&lt;li&gt;Open log search and filter for the relevant records.&lt;/li&gt;
&lt;li&gt;Compare raw events with the dashboard trend.&lt;/li&gt;
&lt;li&gt;Decide whether the issue is a cost pattern, runtime failure, slow operation, risky action, or session-specific behavior.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This log-first evidence path is what makes the dashboards actionable. Without raw records, a chart can show that something changed, but it cannot prove why.&lt;/p&gt;
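
&lt;p&gt;Steps 2 and 3 of the loop amount to narrowing raw records by instance, time range, and extra conditions. The following Python sketch illustrates that narrowing on in-memory records. The field names &lt;code&gt;ts&lt;/code&gt;, &lt;code&gt;instance&lt;/code&gt;, &lt;code&gt;session_id&lt;/code&gt;, and &lt;code&gt;status&lt;/code&gt; are hypothetical, and the code does not call any CLS API.&lt;/p&gt;

```python
from datetime import datetime, timezone

# Hypothetical raw records; real field names depend on the log schema in use.
raw_logs = [
    {"ts": "2026-06-15T10:00:05+00:00", "instance": "ins-a", "session_id": "s-1", "status": "ok"},
    {"ts": "2026-06-15T10:02:40+00:00", "instance": "ins-a", "session_id": "s-2", "status": "error"},
    {"ts": "2026-06-15T10:03:10+00:00", "instance": "ins-b", "session_id": "s-9", "status": "error"},
    {"ts": "2026-06-15T11:30:00+00:00", "instance": "ins-a", "session_id": "s-2", "status": "error"},
]


def filter_logs(rows, instance, start, end, **conditions):
    """Keep rows for one instance inside [start, end] that match every extra condition."""
    selected = []
    for row in rows:
        when = datetime.fromisoformat(row["ts"])
        if row["instance"] != instance:
            continue
        if not (when >= start and end >= when):
            continue
        if all(row.get(key) == value for key, value in conditions.items()):
            selected.append(row)
    return selected


window_start = datetime(2026, 6, 15, 10, 0, tzinfo=timezone.utc)
window_end = datetime(2026, 6, 15, 10, 30, tzinfo=timezone.utc)
matches = filter_logs(raw_logs, "ins-a", window_start, window_end, status="error")
print(matches)
```

&lt;p&gt;The remaining records are what an operator would compare against the dashboard trend in step 4.&lt;/p&gt;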

&lt;h2&gt;
  
  
  Troubleshooting flow: from symptom to source log
&lt;/h2&gt;

&lt;p&gt;Use the dashboards and log search together instead of treating any single view as the final answer.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;First check&lt;/th&gt;
&lt;th&gt;Next step in CLS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Token cost increases&lt;/td&gt;
&lt;td&gt;Review total cost, total token usage, average session cost, and high-cost sessions.&lt;/td&gt;
&lt;td&gt;Filter logs by instance, session, message, time range, or visible cost dimension.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent responses become slow&lt;/td&gt;
&lt;td&gt;Check queue backlog, response degradation, and P95 execution latency.&lt;/td&gt;
&lt;td&gt;Compare operation records and raw logs in the affected time window.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Errors increase&lt;/td&gt;
&lt;td&gt;Review error growth and related runtime metrics.&lt;/td&gt;
&lt;td&gt;Search raw logs for the related condition, status, or event records.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A user reports an abnormal session&lt;/td&gt;
&lt;td&gt;Open session management, then drill into the related session detail.&lt;/td&gt;
&lt;td&gt;Reconstruct the session in order and inspect per-turn cost, operations, and problem checks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A risky action appears&lt;/td&gt;
&lt;td&gt;Check security audit records for high-risk sessions, commands, or sensitive-file access.&lt;/td&gt;
&lt;td&gt;Inspect the linked session or log records to verify the event context.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dashboard trend is unclear&lt;/td&gt;
&lt;td&gt;Identify the instance and time range behind the trend.&lt;/td&gt;
&lt;td&gt;Use log search with conditions or AI-assisted query statements to inspect raw records.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Common pitfalls
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Looking only at total cost without breaking it down by session, message, or usage pattern.&lt;/li&gt;
&lt;li&gt;Treating a dashboard trend as the final answer without checking raw logs.&lt;/li&gt;
&lt;li&gt;Connecting OpenClaw machines but not confirming that the access-management list shows the expected instances.&lt;/li&gt;
&lt;li&gt;Reviewing session volume without drilling into session details for abnormal behavior.&lt;/li&gt;
&lt;li&gt;Counting risky operations without preserving enough context for audit review.&lt;/li&gt;
&lt;li&gt;Ignoring P95 latency and queue backlog when users report slow responses.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What should I check when AI agent token cost suddenly increases?
&lt;/h3&gt;

&lt;p&gt;Start with total cost and total token usage, then review average session cost and cost distribution. If a small number of sessions or messages account for the increase, open session detail and raw log search to verify what happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can I trace what happened inside one OpenClaw agent session?
&lt;/h3&gt;

&lt;p&gt;Use session management to locate the relevant session, then open the session detail view by session identifier or by filtering for the server instance and session ID. Review the interaction path, per-turn details, token usage, problem checks, and related log records.&lt;/p&gt;

&lt;h3&gt;
  
  
  What logs are useful for AI agent observability?
&lt;/h3&gt;

&lt;p&gt;Useful logs connect session context, token usage, cost, operations, latency, errors, risky commands, sensitive-file access, and raw event records. The exact schema can vary, but the records should let operators move from a dashboard trend back to the original event.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do dashboards still need raw log search?
&lt;/h3&gt;

&lt;p&gt;Dashboards summarize cost, operations, sessions, and security signals. Raw log search provides the evidence layer. When investigating cost spikes, latency degradation, error growth, or risky actions, raw logs help verify the cause behind the trend.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I use Tencent Cloud Log Service for OpenClaw monitoring?
&lt;/h3&gt;

&lt;p&gt;Use Tencent Cloud Log Service (CLS) when OpenClaw operations need searchable logs, cost governance dashboards, runtime monitoring, session analysis, security audit views, and raw-log troubleshooting in one managed log service.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I investigate slow or failed OpenClaw operations?
&lt;/h3&gt;

&lt;p&gt;Start with operations monitoring. Check queue backlog, response degradation, execution latency, P95 latency, and error growth. Then use CLS log search to inspect the affected instance and time range so the team can review the original records.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final checklist
&lt;/h2&gt;

&lt;p&gt;Before relying on OpenClaw Usage Insights for production monitoring, verify that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw is running on the target machine;&lt;/li&gt;
&lt;li&gt;Tencent Cloud CLS has been activated;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SecretId&lt;/code&gt; and &lt;code&gt;SecretKey&lt;/code&gt; are available for onboarding;&lt;/li&gt;
&lt;li&gt;Tencent Cloud-hosted or self-managed servers are connected through the correct path;&lt;/li&gt;
&lt;li&gt;the access-management list shows the expected OpenClaw instances;&lt;/li&gt;
&lt;li&gt;cost, operations, session, session detail, and security audit dashboards are populated;&lt;/li&gt;
&lt;li&gt;raw log search can filter the records needed for investigation;&lt;/li&gt;
&lt;li&gt;cost, latency, error, session, and security reviews can move from dashboard trend to raw event evidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For AI agent operations, the most useful observability system is not only a set of charts. It is a path from symptom to session, from session to operation, and from operation to raw log evidence. OpenClaw Usage Insights and Tencent Cloud CLS provide that path for teams that need cost control, runtime monitoring, session reconstruction, security audit, and practical troubleshooting.&lt;/p&gt;

</description>
      <category>aiobservability</category>
      <category>llm</category>
      <category>observability</category>
      <category>logging</category>
    </item>
    <item>
      <title>Collect Logs from a Self-Managed Kubernetes Cluster into Tencent Cloud CLS</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Wed, 10 Jun 2026 08:36:08 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/collect-logs-from-a-self-managed-kubernetes-cluster-into-tencent-cloud-cls-4m3k</link>
      <guid>https://dev.to/tencentcloud-cls/collect-logs-from-a-self-managed-kubernetes-cluster-into-tencent-cloud-cls-4m3k</guid>
      <description>&lt;p&gt;Self-managed Kubernetes clusters do not automatically inherit the console-driven log collection experience of managed TKE clusters. The source article explains how Tencent Cloud CLS can collect logs from a self-managed Kubernetes cluster by using a Kubernetes CRD named &lt;code&gt;LogConfig&lt;/code&gt; and three components: &lt;code&gt;Log-Provisioner&lt;/code&gt;, &lt;code&gt;Log-Agent&lt;/code&gt;, and &lt;code&gt;LogListener&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For TKE users, the source points to the TKE log-collection document and console path. This article focuses on the self-managed Kubernetes path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;The source article lists five prerequisites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes cluster version &lt;code&gt;1.10&lt;/code&gt; or later;&lt;/li&gt;
&lt;li&gt;CLS enabled, with a logset and log topic already created;&lt;/li&gt;
&lt;li&gt;the CLS topic ID, &lt;code&gt;topicId&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;the region endpoint for the log topic, &lt;code&gt;CLS_HOST&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Tencent Cloud API credentials required for CLS-side authentication: &lt;code&gt;TmpSecretId&lt;/code&gt; and &lt;code&gt;TmpSecretKey&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How the collection architecture works
&lt;/h2&gt;

&lt;p&gt;The Kubernetes deployment includes one custom resource and three runtime components.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LogConfig&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defines where logs are collected from, how they are parsed, and which CLS log topic receives them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Log-Provisioner&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Synchronizes the log collection configuration defined in &lt;code&gt;LogConfig&lt;/code&gt; to the CLS side.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Log-Agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Watches &lt;code&gt;LogConfig&lt;/code&gt; and container changes on nodes, then calculates the real host-machine path of container log files.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LogListener&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Collects matching log files from the host path, parses them, and uploads them to CLS.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Deployment flow
&lt;/h2&gt;

&lt;p&gt;The source article uses this sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define the &lt;code&gt;LogConfig&lt;/code&gt; resource type with a CRD.&lt;/li&gt;
&lt;li&gt;Define a &lt;code&gt;LogConfig&lt;/code&gt; object.&lt;/li&gt;
&lt;li&gt;Create the &lt;code&gt;LogConfig&lt;/code&gt; object.&lt;/li&gt;
&lt;li&gt;Configure the CLS authentication &lt;code&gt;ConfigMap&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Deploy &lt;code&gt;Log-Provisioner&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Deploy &lt;code&gt;Log-Agent&lt;/code&gt; and &lt;code&gt;LogListener&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Search the collected logs in the CLS console.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 1: define the LogConfig CRD
&lt;/h2&gt;

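&lt;p&gt;The commands below use &lt;code&gt;/usr/local/&lt;/code&gt; on the master node as the working directory, so the downloaded file lands at the path that &lt;code&gt;kubectl&lt;/code&gt; reads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Download into /usr/local so the path passed to kubectl exists.&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /usr/local
wget https://mirrors.tencent.com/install/cls/k8s/CRD.yaml
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; /usr/local/CRD.yaml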
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: define the LogConfig object
&lt;/h2&gt;

&lt;p&gt;Download the sample declaration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://mirrors.tencent.com/install/cls/k8s/LogConfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The source explains that &lt;code&gt;LogConfig.yaml&lt;/code&gt; has two main parts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;clsDetail&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defines the log parsing format and target CLS &lt;code&gt;topicId&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inputDetail&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defines the log source: where the logs are collected from.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Replace &lt;code&gt;clsDetail.topicId&lt;/code&gt; with the real topic ID created in CLS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported parsing formats
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Single-line full text
&lt;/h3&gt;

&lt;p&gt;Use this when one line is one complete log entry. CLS stores the line in &lt;code&gt;__CONTENT__&lt;/code&gt; and does not extract fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clsDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;topicId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xxxxxx-xx-xx-xx-xxxxxxxx&lt;/span&gt;
    &lt;span class="na"&gt;logType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;minimalist_log&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example collected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;__CONTENT__: Tue Jan 22 12:08:15 CST 2019 Installed: libjpeg-turbo-static-1.2.90-6.el7.x86_64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-line full text
&lt;/h3&gt;

&lt;p&gt;Use this for logs such as Java stack traces. The source uses a beginning-line regex so a timestamped line starts a new log event, and later stack-trace lines are appended to the current event.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clsDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;topicId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xxxxxx-xx-xx-xx-xxxxxxxx&lt;/span&gt;
    &lt;span class="na"&gt;logType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;multiline_log&lt;/span&gt;
    &lt;span class="na"&gt;extractRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;beginningRegex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s.+'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
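
&lt;p&gt;To make the grouping behavior concrete, here is a minimal Python sketch that applies the same beginning-line pattern to a few hypothetical Java-style log lines. It assumes the pattern must match a whole line to start a new event. It illustrates the idea only and is not the LogListener implementation.&lt;/p&gt;

```python
import re

# Beginning-line regex copied from the multiline_log example.
BEGINNING = re.compile(r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s.+")

# Hypothetical Java-style log lines.
lines = [
    "2019-12-15 17:13:06,043 [main] ERROR demo.Main - request failed",
    "java.lang.IllegalStateException: boom",
    "    at demo.Main.run(Main.java:42)",
    "2019-12-15 17:13:07,101 [main] INFO demo.Main - retrying",
]


def group_events(raw_lines):
    """Start a new event on each line that matches BEGINNING; append other lines to the current event."""
    events = []
    for line in raw_lines:
        if BEGINNING.fullmatch(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]


events = group_events(lines)
print(len(events))
print(events[0])
```

&lt;p&gt;The two stack-trace lines stay attached to the error line above them, and the next timestamped line opens a new event.&lt;/p&gt;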



&lt;h3&gt;
  
  
  Single-line full regex
&lt;/h3&gt;

&lt;p&gt;Use this when a complete single-line log should be parsed into multiple key-value fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clsDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;topicId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xxxxxx-xx-xx-xx-xxxxxxxx&lt;/span&gt;
    &lt;span class="na"&gt;logType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fullregex_log&lt;/span&gt;
    &lt;span class="na"&gt;extractRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;logRegex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*'&lt;/span&gt;
      &lt;span class="na"&gt;beginningRegex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*'&lt;/span&gt;
      &lt;span class="na"&gt;keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;remote_addr&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;time_local&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;request_method&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;request_url&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http_protocol&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http_host&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;status&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;request_length&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;body_bytes_sent&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http_referer&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;http_user_agent&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;request_time&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;upstream_response_time&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
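
&lt;p&gt;The mapping from capture groups to &lt;code&gt;keys&lt;/code&gt; is positional: the first group fills the first key, and so on. The Python sketch below checks the &lt;code&gt;logRegex&lt;/code&gt; above against a hypothetical access-log line written to fit the pattern. Python's &lt;code&gt;re&lt;/code&gt; module stands in for whatever regex engine the collector uses, so treat the result as a sanity check rather than a guarantee.&lt;/p&gt;

```python
import re

# logRegex and keys copied from the fullregex_log example.
LOG_REGEX = re.compile(
    r'(\S+)[^\[]+(\[[^:]+:\d+:\d+:\d+\s\S+)\s"(\w+)\s(\S+)\s([^"]+)"'
    r'\s(\S+)\s(\d+)\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s+(\S+)\s(\S+).*'
)
KEYS = [
    "remote_addr", "time_local", "request_method", "request_url",
    "http_protocol", "http_host", "status", "request_length",
    "body_bytes_sent", "http_referer", "http_user_agent",
    "request_time", "upstream_response_time",
]

# Hypothetical access-log line shaped to fit the pattern.
line = (
    '10.135.46.111 - - [22/Jan/2019:19:19:30 +0800] "GET /my/course/1 HTTP/1.1" '
    '127.0.0.1 200 782 9703 "http://127.0.0.1/course/explore" '
    '"Mozilla/5.0 (X11; Linux x86_64)" 0.354 0.355'
)


def extract_fields(text):
    """Return a key-to-value dict for a matching line, or None when the line does not match."""
    match = LOG_REGEX.fullmatch(text)
    if match is None:
        return None
    return dict(zip(KEYS, match.groups()))


fields = extract_fields(line)
for key in KEYS:
    print(key, "=", fields[key])
```

&lt;p&gt;Testing a pattern this way before applying the &lt;code&gt;LogConfig&lt;/code&gt; helps catch a mismatch between the number of capture groups and the number of keys.&lt;/p&gt;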



&lt;h3&gt;
  
  
  Multi-line full regex
&lt;/h3&gt;

&lt;p&gt;Use this when one structured log event spans multiple lines and fields still need to be extracted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clsDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;topicId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xxxxxx-xx-xx-xx-xxxxxxxx&lt;/span&gt;
    &lt;span class="na"&gt;logType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;multiline_fullregex_log&lt;/span&gt;
    &lt;span class="na"&gt;extractRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;beginningRegex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\[\d+-\d+-\w+:\d+:\d+,\d+\]\s\[\w+\]\s.*'&lt;/span&gt;
      &lt;span class="na"&gt;logRegex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\[(\d+-\d+-\w+:\d+:\d+,\d+)\]\s\[(\w+)\]\s(.*)'&lt;/span&gt;
      &lt;span class="na"&gt;keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;time&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;level&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;msg&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
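
&lt;p&gt;The two patterns do different jobs: &lt;code&gt;beginningRegex&lt;/code&gt; decides where an event starts, and &lt;code&gt;logRegex&lt;/code&gt; extracts fields from the assembled event. The Python sketch below combines both on hypothetical log lines. The &lt;code&gt;re.DOTALL&lt;/code&gt; flag is an assumption made here so that the last group can span line breaks; how the collector's regex engine treats newlines is not stated in the source.&lt;/p&gt;

```python
import re

# Patterns and keys copied from the multiline_fullregex_log example.
BEGINNING = re.compile(r"\[\d+-\d+-\w+:\d+:\d+,\d+\]\s\[\w+\]\s.*")
# Assumption: DOTALL lets the final group capture the continuation lines.
LOG_REGEX = re.compile(r"\[(\d+-\d+-\w+:\d+:\d+,\d+)\]\s\[(\w+)\]\s(.*)", re.DOTALL)
KEYS = ["time", "level", "msg"]

# Hypothetical log lines: one event with a stack-trace line, then a second event.
lines = [
    "[2018-10-01T10:30:01,000] [INFO] java.lang.Exception: exception happened",
    "    at TestPrintStackTrace.f(TestPrintStackTrace.java:3)",
    "[2018-10-01T10:30:02,000] [WARN] retry scheduled",
]


def parse_events(raw_lines):
    """Group lines into events by BEGINNING, then extract fields from each event with LOG_REGEX."""
    events = []
    for line in raw_lines:
        if BEGINNING.fullmatch(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    parsed = []
    for event in events:
        match = LOG_REGEX.fullmatch(event)
        if match:
            parsed.append(dict(zip(KEYS, match.groups())))
    return parsed


parsed = parse_events(lines)
print(parsed)
```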



&lt;h3&gt;
  
  
  JSON logs
&lt;/h3&gt;

&lt;p&gt;For JSON logs, CLS extracts first-level keys as fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clsDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;topicId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xxxxxx-xx-xx-xx-xxxxxxxx&lt;/span&gt;
    &lt;span class="na"&gt;logType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;json_log&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
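
&lt;p&gt;As a rough illustration of first-level extraction, the Python sketch below turns each top-level key of a hypothetical JSON log line into a field. Keeping nested objects as compact JSON text is an assumption made for this example; the source does not describe how CLS stores nested values.&lt;/p&gt;

```python
import json


def first_level_fields(line):
    """Parse one JSON log line; each top-level key becomes a field name."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    fields = {}
    for key, value in record.items():
        # Assumption: nested values are kept as compact JSON text, not flattened.
        if isinstance(value, (dict, list)):
            fields[key] = json.dumps(value, separators=(",", ":"))
        else:
            fields[key] = value
    return fields


# Hypothetical JSON log line.
line = '{"remote_ip": "10.135.46.111", "status": 200, "request": {"method": "GET", "path": "/"}}'
fields = first_level_fields(line)
print(sorted(fields))
```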



&lt;h3&gt;
  
  
  Delimiter logs
&lt;/h3&gt;

&lt;p&gt;For delimiter logs, define the delimiter and the keys that map to each segment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clsDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;topicId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xxxxxx-xx-xx-xx-xxxxxxxx&lt;/span&gt;
    &lt;span class="na"&gt;logType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;delimiter_log&lt;/span&gt;
    &lt;span class="na"&gt;extractRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;delimiter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:::'&lt;/span&gt;
      &lt;span class="na"&gt;keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;IP&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;time&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;request&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;host&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;status&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;length&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;bytes&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;referer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
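
&lt;p&gt;Delimiter parsing is positional as well: the line is split on the delimiter and each segment is paired with the key at the same index. Here is a minimal Python sketch using the delimiter and keys above and a hypothetical log line. Returning &lt;code&gt;None&lt;/code&gt; when the segment count differs from the key count is a choice made for this example, not documented collector behavior.&lt;/p&gt;

```python
# Delimiter and keys copied from the delimiter_log example.
DELIMITER = ":::"
KEYS = ["IP", "time", "request", "host", "status", "length", "bytes", "referer"]


def split_fields(line):
    """Split a line on DELIMITER and pair each segment with the key at the same position."""
    parts = line.split(DELIMITER)
    if len(parts) != len(KEYS):
        # Assumption for this sketch: a segment-count mismatch is reported as no result.
        return None
    return dict(zip(KEYS, parts))


# Hypothetical log line with eight segments.
line = (
    "10.20.20.10:::[22/Jan/2019:14:49:45 +0800]:::GET /online/sample HTTP/1.1"
    ":::127.0.0.1:::200:::647:::35:::http://127.0.0.1/"
)
fields = split_fields(line)
print(fields)
```

&lt;p&gt;Single colons inside the timestamp and the URL are left alone because only the full three-character delimiter splits the line.&lt;/p&gt;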



&lt;h2&gt;
  
  
  Supported Kubernetes log sources
&lt;/h2&gt;

&lt;p&gt;The source article gives three source types.&lt;/p&gt;

&lt;h3&gt;
  
  
  Container stdout
&lt;/h3&gt;

&lt;p&gt;Collect all container stdout logs in the &lt;code&gt;default&lt;/code&gt; namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;container_stdout&lt;/span&gt;
    &lt;span class="na"&gt;containerStdout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
      &lt;span class="na"&gt;allContainers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Collect stdout from the &lt;code&gt;ingress-gateway&lt;/code&gt; deployment in the &lt;code&gt;production&lt;/code&gt; namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;container_stdout&lt;/span&gt;
    &lt;span class="na"&gt;containerStdout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;allContainers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="na"&gt;workloads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-gateway&lt;/span&gt;
          &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deployment&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Collect stdout from pods labeled &lt;code&gt;k8s-app=nginx&lt;/code&gt; in the &lt;code&gt;production&lt;/code&gt; namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;container_stdout&lt;/span&gt;
    &lt;span class="na"&gt;containerStdout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
      &lt;span class="na"&gt;allContainers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="na"&gt;includeLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;k8s-app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Container files
&lt;/h3&gt;

&lt;p&gt;Collect &lt;code&gt;/data/nginx/log/access.log&lt;/code&gt; from the &lt;code&gt;nginx&lt;/code&gt; container in the &lt;code&gt;ingress-gateway&lt;/code&gt; deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;topicId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xxxxxx-xx-xx-xx-xxxxxxxx&lt;/span&gt;
  &lt;span class="na"&gt;inputDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;container_file&lt;/span&gt;
    &lt;span class="na"&gt;containerFile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
      &lt;span class="na"&gt;workload&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-gateway&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deployment&lt;/span&gt;
      &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;logPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data/nginx/log&lt;/span&gt;
      &lt;span class="na"&gt;filePattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;access.log&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Collect the same file path from pods with label &lt;code&gt;k8s-app=ingress-gateway&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;container_file&lt;/span&gt;
    &lt;span class="na"&gt;containerFile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
      &lt;span class="na"&gt;includeLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;k8s-app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ingress-gateway&lt;/span&gt;
      &lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
      &lt;span class="na"&gt;logPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data/nginx/log&lt;/span&gt;
      &lt;span class="na"&gt;filePattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;access.log&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Host files
&lt;/h3&gt;

&lt;p&gt;Collect every &lt;code&gt;.log&lt;/code&gt; file under &lt;code&gt;/data&lt;/code&gt; on the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cls.cloud.tencent.com/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LogConfig&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputDetail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host_file&lt;/span&gt;
    &lt;span class="na"&gt;hostFile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;logPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data&lt;/span&gt;
      &lt;span class="na"&gt;filePattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*.log'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
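&lt;p&gt;As a rough illustration of what a &lt;code&gt;*.log&lt;/code&gt; pattern selects, the sketch below runs Python's &lt;code&gt;fnmatch&lt;/code&gt; module over a handful of made-up file names. It assumes case-sensitive, shell-style wildcard semantics. The source does not state LogListener's exact matching rules, so treat this as an approximation rather than a description of the collector.&lt;/p&gt;

```python
from fnmatch import fnmatchcase

# Hypothetical file names under /data; none of these come from the source.
candidates = ["app.log", "access.log", "app.log.1", "notes.txt", "error.LOG"]

# Assumption: filePattern behaves like a case-sensitive shell wildcard
# that must match the whole file name.
file_pattern = "*.log"

matched = [name for name in candidates if fnmatchcase(name, file_pattern)]
print(matched)
```

&lt;p&gt;Under that assumption, rotated files such as &lt;code&gt;app.log.1&lt;/code&gt; are not selected, which is worth checking against the real collector before relying on the pattern.&lt;/p&gt;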



&lt;h2&gt;
  
  
  Step 3: create the LogConfig object
&lt;/h2&gt;

&lt;p&gt;After editing &lt;code&gt;LogConfig.yaml&lt;/code&gt;, create the object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; /usr/local/LogConfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: configure CLS authentication
&lt;/h2&gt;

&lt;p&gt;Download the sample &lt;code&gt;ConfigMap&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://mirrors.tencent.com/install/cls/k8s/ConfigMap.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;TmpSecretId&lt;/code&gt; and &lt;code&gt;TmpSecretKey&lt;/code&gt; to the API key ID and API key value used for CLS authentication. Then create the &lt;code&gt;ConfigMap&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; /usr/local/ConfigMap.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: deploy Log-Provisioner
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Log-Provisioner&lt;/code&gt; discovers and watches the log topic ID, collection rule, and file path in &lt;code&gt;LogConfig&lt;/code&gt;, then synchronizes that configuration to CLS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://mirrors.tencent.com/install/cls/k8s/Log-Provisioner.yaml
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; /usr/local/Log-Provisioner.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After downloading &lt;code&gt;Log-Provisioner.yaml&lt;/code&gt; and before running &lt;code&gt;kubectl create&lt;/code&gt;, set the &lt;code&gt;CLS_HOST&lt;/code&gt; environment variable in the file to the endpoint of the region that hosts the target CLS topic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: deploy Log-Agent and LogListener
&lt;/h2&gt;

&lt;p&gt;The source assigns the two components separate responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Log-Agent&lt;/code&gt; pulls log-source information from &lt;code&gt;LogConfig&lt;/code&gt; and calculates the absolute host path for container logs.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LogListener&lt;/code&gt; collects and parses files from that host path, then uploads them to CLS.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://mirrors.tencent.com/install/cls/k8s/Log-Agent.yaml
kubectl create &lt;span class="nt"&gt;-f&lt;/span&gt; /usr/local/Log-Agent.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the host Docker root is not &lt;code&gt;/var/lib/docker&lt;/code&gt;, update the &lt;code&gt;Log-Agent.yaml&lt;/code&gt; volume mapping. The source screenshot shows &lt;code&gt;/data/docker&lt;/code&gt; mounted into the container as an example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hodi0axbeh6i158fww4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7hodi0axbeh6i158fww4.png" alt=" " width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The highlighted YAML in the screenshot means that when Docker data lives under &lt;code&gt;/data/docker&lt;/code&gt; on the host, that path must be mounted into the Log-Agent container so the agent can map container log files back to their real host locations.&lt;/p&gt;
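&lt;p&gt;To see why the Docker root matters, the following sketch computes where a container's stdout log would sit on the host. It assumes Docker's default &lt;code&gt;json-file&lt;/code&gt; logging driver layout; the helper name &lt;code&gt;container_log_path&lt;/code&gt; and the container ID are hypothetical, and this is not Log-Agent's actual implementation.&lt;/p&gt;

```python
from pathlib import PurePosixPath


def container_log_path(docker_root: str, container_id: str) -> PurePosixPath:
    """Host path of a container's stdout log under the json-file logging driver."""
    return (
        PurePosixPath(docker_root)
        / "containers"
        / container_id
        / f"{container_id}-json.log"
    )


# Hypothetical container ID, shortened for readability.
container_id = "3f4e9a1b2c5d"

default_path = container_log_path("/var/lib/docker", container_id)
custom_path = container_log_path("/data/docker", container_id)
print(default_path)
print(custom_path)
```

&lt;p&gt;If the agent only sees &lt;code&gt;/var/lib/docker&lt;/code&gt; while the files actually live under &lt;code&gt;/data/docker&lt;/code&gt;, the computed path points at nothing, which is why the volume mapping has to follow the real Docker root.&lt;/p&gt;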

&lt;h2&gt;
  
  
  Step 7: verify logs in the CLS console
&lt;/h2&gt;

&lt;p&gt;Once the CRD, the &lt;code&gt;LogConfig&lt;/code&gt; object, the authentication &lt;code&gt;ConfigMap&lt;/code&gt;, &lt;code&gt;Log-Provisioner&lt;/code&gt;, &lt;code&gt;Log-Agent&lt;/code&gt;, and &lt;code&gt;LogListener&lt;/code&gt; are all in place, open the CLS log search page for the target topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ufci3p2lp83cimotxrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ufci3p2lp83cimotxrb.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot shows collected Kubernetes logs in the CLS console search view. The top area is a histogram of log volume over time, and the lower area lists the matching raw log entries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Confirm that the Kubernetes version is &lt;code&gt;1.10&lt;/code&gt; or later.&lt;/li&gt;
&lt;li&gt;Create a CLS logset and log topic, then record &lt;code&gt;topicId&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Find the right regional &lt;code&gt;CLS_HOST&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Prepare &lt;code&gt;TmpSecretId&lt;/code&gt; and &lt;code&gt;TmpSecretKey&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Create the &lt;code&gt;LogConfig&lt;/code&gt; CRD.&lt;/li&gt;
&lt;li&gt;Choose a parsing format: single-line text, multi-line text, full regex, multi-line full regex, JSON, or delimiter.&lt;/li&gt;
&lt;li&gt;Choose the log source: container stdout, container file, or host file.&lt;/li&gt;
&lt;li&gt;Apply &lt;code&gt;LogConfig&lt;/code&gt;, &lt;code&gt;ConfigMap&lt;/code&gt;, &lt;code&gt;Log-Provisioner&lt;/code&gt;, and &lt;code&gt;Log-Agent&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If Docker is not rooted at &lt;code&gt;/var/lib/docker&lt;/code&gt;, mount the actual Docker root path into the agent.&lt;/li&gt;
&lt;li&gt;Verify collected logs in CLS search.&lt;/li&gt;
&lt;/ul&gt;
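&lt;p&gt;The first checklist item is easy to get wrong when versions are compared as strings, because &lt;code&gt;"1.9"&lt;/code&gt; sorts after &lt;code&gt;"1.10"&lt;/code&gt;. A minimal sketch of a numeric comparison follows; the function names and the sample version strings are made up for illustration.&lt;/p&gt;

```python
import re


def major_minor(version: str) -> tuple:
    """Extract (major, minor) as integers from strings such as 'v1.24.3'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple = (1, 10)) -> bool:
    """Compare numerically so that 1.9 is correctly treated as older than 1.10."""
    return major_minor(version) >= minimum


# Hypothetical version strings.
print(meets_minimum("v1.24.3"))  # newer than the minimum
print(meets_minimum("1.10"))     # exactly the minimum
print(meets_minimum("1.9"))      # older, although "1.9" sorts after "1.10" as text
```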

</description>
      <category>kubernetes</category>
      <category>logging</category>
      <category>devops</category>
      <category>observability</category>
    </item>
    <item>
      <title>Monitor CDN Performance with Real-Time CLS Log Analysis</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Wed, 10 Jun 2026 08:10:13 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/monitor-cdn-performance-with-real-time-cls-log-analysis-1om4</link>
      <guid>https://dev.to/tencentcloud-cls/monitor-cdn-performance-with-real-time-cls-log-analysis-1om4</guid>
      <description>&lt;p&gt;A CDN is a performance layer, but its logs are also an operations dataset. Every request can reveal latency, cache behavior, response code, client distribution, traffic volume, and download speed. The source article explains how Tencent Cloud CDN logs can be delivered into Tencent Cloud CLS and analyzed in real time.&lt;/p&gt;

&lt;p&gt;The original problem is familiar: CDN providers expose basic metrics such as request count and bandwidth, but these default metrics are not enough for customized troubleshooting. Teams often download raw CDN logs for offline analysis instead. The source article names two drawbacks of that approach: it adds operations and development cost, and the data is not truly real time, since delays of more than half an hour are common in offline workflows.&lt;/p&gt;

&lt;p&gt;The CDN-to-CLS path is designed for interactive analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one-click log delivery;&lt;/li&gt;
&lt;li&gt;second-level analysis for very large log volumes;&lt;/li&gt;
&lt;li&gt;real-time dashboard visualization;&lt;/li&gt;
&lt;li&gt;one-minute real-time alerting.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CDN log fields that matter
&lt;/h2&gt;

&lt;p&gt;The source article lists the CDN log schema. The key fields are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;CLS type&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Tencent Cloud account APPID.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;client_ip&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Client IP address.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;file_size&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;File size.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Cache HIT or MISS. Edge-node and parent-node hits are both marked as HIT.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;host&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Domain name.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http_code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;HTTP status code.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;isp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Carrier or ISP.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;method&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;HTTP method.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;param&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;URL parameters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;proto&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;HTTP protocol identifier.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prov&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Carrier province.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;referer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;HTTP referer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request_range&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Range request parameter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request_time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Response time in milliseconds, from node receiving the request to completing response delivery to the client.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request_port&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Client-to-CDN-node connection port, or &lt;code&gt;-&lt;/code&gt; if unavailable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;rsp_size&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Response bytes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Request time as a UNIX timestamp in seconds.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ua&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;User-Agent.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;url&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Request path.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;uuid&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Unique request identifier.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;version&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;CDN real-time log version.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Scenario 1: alert when CDN latency exceeds a threshold
&lt;/h2&gt;

&lt;p&gt;The source recommends percentiles instead of simple averages or individual samples. Averages can hide a small but important set of slow requests, while individual samples are too noisy. The example computes average latency, P50, and P99 per five-minute bucket. A one-day window contains 288 such buckets, so the &lt;code&gt;LIMIT 1440&lt;/code&gt; in the query acts only as an upper bound on the number of returned rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;approx_percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;approx_percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;time_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__TIMESTAMP__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'5m'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'%Y-%m-%d %H:%i:%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1440&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cdo92uvcnps1dr16ch5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cdo92uvcnps1dr16ch5.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The chart in this screenshot, labeled in Chinese, compares average latency, P50, and P99 over time. The operational value is that P99 reveals the long-tail experience even when the average line looks acceptable.&lt;/p&gt;
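&lt;p&gt;The gap between average and P99 is easy to reproduce with a few numbers. The sketch below uses made-up &lt;code&gt;request_time&lt;/code&gt; samples and an exact nearest-rank percentile. CLS's &lt;code&gt;approx_percentile&lt;/code&gt; is an approximation, so its results may differ slightly from this calculation.&lt;/p&gt;

```python
import math
from statistics import mean


def nearest_rank_percentile(values, percent: int):
    """Exact nearest-rank percentile for an integer percent between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up request_time samples in milliseconds: 98 fast requests and 2 slow ones.
request_times = [20] * 98 + [900, 1500]

avg = mean(request_times)
p50 = nearest_rank_percentile(request_times, 50)
p99 = nearest_rank_percentile(request_times, 99)
print(avg, p50, p99)
```

&lt;p&gt;With these numbers the average is 43.6 ms and the median is 20 ms, both comfortably under a 100 ms threshold, while P99 is 900 ms. An alert on the average would stay silent, whereas an alert on P99 would fire.&lt;/p&gt;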

&lt;p&gt;The alert condition in the source is based on P99 latency greater than &lt;code&gt;100 ms&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;approx_percentile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;p99&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmrkixsom5bv5xnfkgxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmrkixsom5bv5xnfkgxz.png" alt=" " width="799" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot is the alert-condition configuration. In English, the rule computes &lt;code&gt;p99&lt;/code&gt; from &lt;code&gt;request_time&lt;/code&gt; and triggers when the configured condition, such as P99 greater than 100 ms, is met.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslnqzxtkxiseimdbzwdq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslnqzxtkxiseimdbzwdq.png" alt=" " width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This image shows multidimensional analysis settings. The source says the alert message should display affected &lt;code&gt;host&lt;/code&gt;, &lt;code&gt;url&lt;/code&gt;, and &lt;code&gt;client_ip&lt;/code&gt;, so developers can quickly determine which domain, path, and client segment are involved.&lt;/p&gt;

&lt;p&gt;Once the alert fires, the key information can be delivered immediately through channels such as WeChat, Enterprise WeChat, or SMS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario 2: alert when resource access errors spike
&lt;/h2&gt;

&lt;p&gt;The source's second alert scenario is error-count growth. If page-access errors suddenly increase, the backend server may be failing or the service may be overloaded.&lt;/p&gt;

&lt;p&gt;The source compares the latest one-minute error count with the previous one-minute count. Latest minute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'minute'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__TIMESTAMP__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;errct&lt;/span&gt;
      &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;http_code&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;
      &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;
      &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
      &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Previous minute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;date_trunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'minute'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__TIMESTAMP__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;errct&lt;/span&gt;
      &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;http_code&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;
      &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;
      &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
      &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;ASC&lt;/span&gt;
  &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trigger expression from the source is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$2.errct - $1.errct &amp;gt; 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



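&lt;p&gt;The alert policy compares the two query results. &lt;code&gt;$2.errct&lt;/code&gt; is the latest minute's error count and &lt;code&gt;$1.errct&lt;/code&gt; is the previous minute's. Assuming &lt;code&gt;$1&lt;/code&gt; and &lt;code&gt;$2&lt;/code&gt; refer to the first and second query statements in the policy, the previous-minute query has to be registered first and the latest-minute query second. The alert fires when the increase exceeds the selected threshold, 100 in this example.&lt;/p&gt;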
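&lt;p&gt;The trigger logic itself is a plain difference check. The sketch below mirrors it with ordinary integers so the boundary behavior is visible; the function name and the sample counts are made up, and the real evaluation happens inside the CLS alert policy.&lt;/p&gt;

```python
def error_spike(previous_errct: int, latest_errct: int, threshold: int = 100) -> bool:
    """Return True when the minute-over-minute increase exceeds the threshold."""
    return latest_errct - previous_errct > threshold


# Made-up per-minute error counts.
print(error_spike(previous_errct=40, latest_errct=180))   # growth of 140
print(error_spike(previous_errct=40, latest_errct=140))   # growth of exactly 100
print(error_spike(previous_errct=500, latest_errct=520))  # high but stable
```

&lt;p&gt;Because the comparison is strict, a growth of exactly 100 does not fire, and a consistently high error count does not fire either. The rule detects sudden growth, not a sustained error level, so a separate absolute-count alert may be worth considering.&lt;/p&gt;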

&lt;h2&gt;
  
  
  Build CDN quality and performance dashboards
&lt;/h2&gt;

&lt;p&gt;The source article then turns CDN logs into dashboard metrics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Health score
&lt;/h3&gt;

&lt;p&gt;Health is defined as the percentage of requests whose &lt;code&gt;http_code&lt;/code&gt; is below 500:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;http_code&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;double&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"health"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A value at or near 100 on this panel means that all or nearly all requests in the selected time range returned HTTP status codes below 500.&lt;/p&gt;
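&lt;p&gt;The health calculation can be reproduced outside CLS as a sanity check. The sketch below applies the same definition to a made-up list of status codes; the function name is hypothetical, and the empty-window handling is an assumption, since the source does not say what the panel shows when there are no requests.&lt;/p&gt;

```python
def health_percent(http_codes):
    """Percentage of requests with a status code below 500, rounded to one decimal."""
    if not http_codes:
        return None  # no requests in the window, so the ratio is undefined
    healthy = sum(1 for code in http_codes if 500 > code)
    return round(healthy / len(http_codes) * 100, 1)


# Made-up status codes: 4xx responses still count as healthy under this definition.
codes = [200] * 990 + [404] * 7 + [502] * 3
print(health_percent(codes))
```

&lt;p&gt;Note that client errors such as 404 do not lower this score. Only 5xx responses do, so the metric tracks server-side failures, not overall request success.&lt;/p&gt;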

&lt;h3&gt;
  
  
  Cache hit rate
&lt;/h3&gt;

&lt;p&gt;Cache hit rate is calculated only over responses whose &lt;code&gt;http_code&lt;/code&gt; is below 400:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;http_code&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;hit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'hit'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;double&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"cache hit rate"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This panel helps operators see whether traffic is being served from CDN cache or falling back to origin paths.&lt;/p&gt;

&lt;h3&gt;
  
  
  Average download speed
&lt;/h3&gt;

&lt;p&gt;Average download speed is total downloaded data divided by total request time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rsp_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_time&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"average download speed (KB/s)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query converts &lt;code&gt;rsp_size&lt;/code&gt; from bytes to KB and &lt;code&gt;request_time&lt;/code&gt; from milliseconds to seconds, so the result is in KB per second.&lt;/p&gt;
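
&lt;p&gt;The unit conversion can be checked with a short Python sketch. It assumes each record is a pair of response size in bytes and request time in milliseconds, matching the units implied by the query. The function name and the sample records are hypothetical.&lt;/p&gt;

```python
def average_download_speed_kb_per_s(records):
    """records: list of (rsp_size_bytes, request_time_ms) pairs."""
    total_kb = sum(size_bytes / 1024.0 for size_bytes, _ in records)
    total_seconds = sum(time_ms / 1000.0 for _, time_ms in records)
    if total_seconds == 0:
        return None  # avoid dividing by zero when no time was recorded
    # Total volume over total time, not the mean of per-request speeds.
    return total_kb / total_seconds


# Hypothetical sample: 4 KB transferred in 2 seconds overall.
records = [(1024, 500), (3072, 1500)]
print(average_download_speed_kb_per_s(records))
```

&lt;p&gt;Dividing the two totals weights every request by its duration, so a few very fast small responses do not dominate the figure the way they would in an average of per-request speeds.&lt;/p&gt;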

&lt;h3&gt;
  
  
  ISP-level download analytics
&lt;/h3&gt;

&lt;p&gt;The source uses &lt;code&gt;ip_to_provider(client_ip)&lt;/code&gt; to map client IPs to carriers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ip_to_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_ip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;isp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rsp_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"download speed (KB/s)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rsp_size&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"total download volume (MB)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;isp&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



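&lt;p&gt;For each ISP, the query returns the request count, the total downloaded traffic, and a computed download speed, which helps compare CDN quality across carriers. The speed expression divides bytes by milliseconds, which only approximates KB/s (1 byte/ms is 1000 bytes/s), and the &lt;code&gt;+ 1&lt;/code&gt; in the denominator avoids division by zero.&lt;/p&gt;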

&lt;h3&gt;
  
  
  Latency distribution buckets
&lt;/h3&gt;

&lt;p&gt;The source groups requests into custom latency windows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="k"&gt;CASE&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;request_time&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'~5s'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;request_time&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;6000&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'5s~6s'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;request_time&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;7000&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'6s~7s'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;request_time&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'7~8s'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;request_time&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'8~10s'&lt;/span&gt;
    &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;request_time&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;15000&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'10~15s'&lt;/span&gt;
    &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'15s~'&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of a single average, the panel shows how many requests fall into each duration range.&lt;/p&gt;
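
&lt;p&gt;The bucketing logic of the &lt;code&gt;CASE&lt;/code&gt; expression can be sketched in Python as follows. The labels and thresholds are copied from the query above; the function names and the use of &lt;code&gt;bisect&lt;/code&gt; are illustrative choices, not something the source prescribes.&lt;/p&gt;

```python
from bisect import bisect_right
from collections import Counter

# Exclusive upper bounds in milliseconds, in the same order as the WHEN clauses.
UPPER_BOUNDS_MS = [5000, 6000, 7000, 8000, 10000, 15000]
LABELS = ['~5s', '5s~6s', '6s~7s', '7~8s', '8~10s', '10~15s', '15s~']


def latency_bucket(request_time_ms):
    """Return the label of the first bucket whose upper bound exceeds the value."""
    # bisect_right counts the bounds that are not greater than the value, so a
    # request of exactly 5000 ms lands in '5s~6s', as it does in the SQL.
    return LABELS[bisect_right(UPPER_BOUNDS_MS, request_time_ms)]


def count_by_bucket(request_times_ms):
    """Equivalent of GROUP BY latency with count(*)."""
    return Counter(latency_bucket(t) for t in request_times_ms)


print(count_by_bucket([100, 5500, 5999, 16000]))
```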

&lt;h2&gt;
  
  
  Practical monitoring plan
&lt;/h2&gt;

&lt;p&gt;Start with three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Latency alerting&lt;/strong&gt;: use P99 request latency and include affected &lt;code&gt;host&lt;/code&gt;, &lt;code&gt;url&lt;/code&gt;, and &lt;code&gt;client_ip&lt;/code&gt; in the alert message.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error-growth alerting&lt;/strong&gt;: compare the latest one-minute &lt;code&gt;http_code &amp;gt;= 400&lt;/code&gt; count with the previous minute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance dashboards&lt;/strong&gt;: track health, cache hit rate, average download speed, ISP-level performance, and latency distribution.&lt;/li&gt;
&lt;/ol&gt;
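
&lt;p&gt;As a rough illustration of the error-growth comparison in the second layer, the following Python sketch compares two one-minute error counts. The function name and the way a zero baseline is handled are assumptions for illustration; the source does not specify the exact alert condition.&lt;/p&gt;

```python
def error_growth(previous_count, latest_count):
    """Return (absolute growth, relative growth) between two one-minute error counts."""
    delta = latest_count - previous_count
    # A relative figure is undefined when the previous minute had no errors.
    ratio = None if previous_count == 0 else delta / previous_count
    return delta, ratio


# Hypothetical counts of http_code 400-and-above responses in two consecutive minutes.
print(error_growth(20, 50))
```

&lt;p&gt;Returning both values lets an alert rule combine them, for example requiring a minimum absolute increase so that a jump from one error to three does not page anyone.&lt;/p&gt;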

&lt;p&gt;This source-backed setup turns CDN access logs into an operations console: first alert on the abnormal condition, then use the same CLS dataset to explain which domain, path, ISP, client segment, or cache behavior is responsible.&lt;/p&gt;

</description>
      <category>cdn</category>
      <category>logging</category>
      <category>observability</category>
      <category>sql</category>
    </item>
    <item>
      <title>Analyze CLB Access Logs in Tencent Cloud CLS</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Wed, 10 Jun 2026 07:37:22 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/analyze-clb-access-logs-in-tencent-cloud-cls-2f8i</link>
      <guid>https://dev.to/tencentcloud-cls/analyze-clb-access-logs-in-tencent-cloud-cls-2f8i</guid>
      <description>&lt;p&gt;Cloud Load Balancer access logs answer a question that ordinary application logs often cannot: what happened between the client, the load balancer, and the real server?&lt;/p&gt;

&lt;p&gt;The source article focuses on Layer 7 CLB access logs and shows how to send them into Tencent Cloud CLS for search, SQL analysis, dashboards, and alerts. It is especially useful when a small number of requests fail under high QPS, when backend servers do not see a request, or when application-side &lt;code&gt;response_time&lt;/code&gt; looks normal but users still report slow requests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common CLB troubleshooting questions
&lt;/h2&gt;

&lt;p&gt;The original article groups CLB log use cases into troubleshooting and statistical analysis.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Source-backed question&lt;/th&gt;
&lt;th&gt;Why CLB logs help&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Exception localization&lt;/td&gt;
&lt;td&gt;Under high QPS, a few client requests fail and the real server does not receive them. Did the load balancer receive the requests?&lt;/td&gt;
&lt;td&gt;CLB access logs show the load-balancer-side request record.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency diagnosis&lt;/td&gt;
&lt;td&gt;End users report slow requests, while the real server's &lt;code&gt;response_time&lt;/code&gt; is normal. Where was time spent?&lt;/td&gt;
&lt;td&gt;CLB logs expose &lt;code&gt;request_time&lt;/code&gt;, &lt;code&gt;upstream_response_time&lt;/code&gt;, &lt;code&gt;upstream_connect_time&lt;/code&gt;, and &lt;code&gt;upstream_header_time&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 7 incident scope&lt;/td&gt;
&lt;td&gt;Internal Layer 7 requests fail during a time window. Which part of the path is abnormal?&lt;/td&gt;
&lt;td&gt;Logs can be filtered by CLB VIP, listener port, server name, upstream address, status code, and request.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol analysis&lt;/td&gt;
&lt;td&gt;HTTP/2 is enabled, but teams need to know whether it is actually being used.&lt;/td&gt;
&lt;td&gt;The &lt;code&gt;protocol_type&lt;/code&gt; and &lt;code&gt;server_protocol&lt;/code&gt; fields can be analyzed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Traffic distribution&lt;/td&gt;
&lt;td&gt;Core domains are distributed across different CLB instances. What is the request share by instance?&lt;/td&gt;
&lt;td&gt;Queries can group by &lt;code&gt;server_addr&lt;/code&gt;, &lt;code&gt;server_name&lt;/code&gt;, &lt;code&gt;http_host&lt;/code&gt;, or other dimensions.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Onboarding path 1: enable logs for one Layer 7 instance
&lt;/h2&gt;

&lt;p&gt;For a single CLB instance, the source article uses this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select the target Layer 7 CLB instance.&lt;/li&gt;
&lt;li&gt;Click the edit icon.&lt;/li&gt;
&lt;li&gt;Enable the access-log switch.&lt;/li&gt;
&lt;li&gt;Select the target CLS logset and log topic.&lt;/li&gt;
&lt;li&gt;If no suitable logset or topic exists, create one from the access-log page.&lt;/li&gt;
&lt;li&gt;Submit the configuration.&lt;/li&gt;
&lt;li&gt;Open the target log topic and edit the index.&lt;/li&gt;
&lt;li&gt;When logs arrive, use automatic index configuration and enable statistics for analysis fields.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjc8us08unbufh9jyo9va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjc8us08unbufh9jyo9va.png" alt=" " width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Chinese UI in this screenshot shows the CLB instance detail area. The highlighted control is the edit entry for configuring Layer 7 access logs on that specific instance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdbfbak9tnm6tbdsrvfe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdbfbak9tnm6tbdsrvfe.png" alt=" " width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This modal translates to: &lt;code&gt;Enable CLS log service&lt;/code&gt; for the current CLB instance. The operator turns on the switch and confirms the setting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikli04asxev4giu8znlr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikli04asxev4giu8znlr.png" alt=" " width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This configuration page asks where the access logs should be delivered. In English: choose the logset, choose the log topic, and save. If the target does not exist yet, the source article points operators to the access-log page to create it first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdubaqn8fl96pigop5v92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdubaqn8fl96pigop5v92.png" alt=" " width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After access logs start arriving, go to the CLS log topic and open index editing. Index configuration is required before the fields can be searched and analyzed efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd43841zrb2q9ahlapznd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd43841zrb2q9ahlapznd.png" alt=" " width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The source recommends enabling statistics for all relevant fields. In the screenshot, the operator uses automatic field detection and enables the statistics switch so fields can be used in SQL aggregation and dashboard panels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Onboarding path 2: batch access through the dedicated CLB logset
&lt;/h2&gt;

&lt;p&gt;The source article also describes a batch onboarding path that uses a dedicated &lt;code&gt;clblog&lt;/code&gt; logset.&lt;/p&gt;

&lt;p&gt;Important source note: at the time described by the article, batch onboarding requires CLB product allowlist access before the entry is visible.&lt;/p&gt;

&lt;p&gt;Recommended topic design from the source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;separate topics by technical layer, such as HTTP layer, cache layer, or data layer;&lt;/li&gt;
&lt;li&gt;or separate topics by business dimension, such as finance, main site, or order business;&lt;/li&gt;
&lt;li&gt;remember that CLS can also act as a pipeline, so different topics may later be delivered to COS, CKafka, SCF, or other processing paths for archiving and downstream handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvizi7d7ngzw3plyjlwg6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvizi7d7ngzw3plyjlwg6.png" alt=" " width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The highlighted Chinese label is &lt;code&gt;Access Logs&lt;/code&gt;. This is the batch configuration entry that opens the dedicated CLB logset setup page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xht5vz7b7kdh2w5t5le.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xht5vz7b7kdh2w5t5le.png" alt=" " width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The UI text means: the CLB logset name is fixed, so the operator mainly chooses retention and creates a log topic. The source says the topic should be named according to the real business grouping.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frli0je8nshvv87kn3nsv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frli0je8nshvv87kn3nsv.png" alt=" " width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This screenshot shows selecting CLB instances for the new topic. In English: choose the target load balancers, add them to the topic, and save the batch relationship.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj8io48nn7bdr0hpab5f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwj8io48nn7bdr0hpab5f.png" alt=" " width="799" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final setup step is to save. The source article says the configuration takes about &lt;code&gt;5-10 minutes&lt;/code&gt; to become effective. After that, use the same index and statistics settings as the single-instance path.&lt;/p&gt;

&lt;h2&gt;
  
  
  CLB access-log field reference
&lt;/h2&gt;

&lt;p&gt;The original article lists the CLB log variables. The most operationally useful fields are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stgw_request_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Request ID.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;time_local&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Access time and timezone, such as &lt;code&gt;01/Jul/2019:11:11:00 +0800&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;protocol_type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Protocol type: HTTP, HTTPS, SPDY, HTTP2, WS, or WSS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server_addr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;CLB VIP.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server_port&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;CLB VPort, meaning listener port.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server_name&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Rule &lt;code&gt;server_name&lt;/code&gt;, the domain configured in the CLB listener.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;remote_addr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Client IP address.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;remote_port&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Client port.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Status code returned from CLB to the client.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;upstream_addr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Real server address.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;upstream_status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Status code returned from the real server to CLB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Request line, including method, path, and protocol.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request_length&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Bytes received from the client.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bytes_sent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;Bytes sent to the client.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http_host&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Request domain from the HTTP &lt;code&gt;Host&lt;/code&gt; header.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http_user_agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;User-Agent header.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http_referer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Referer header, the page the request came from.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;request_time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;double&lt;/td&gt;
&lt;td&gt;Total processing time from the first byte received from the client to the last byte sent back.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;upstream_response_time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;double&lt;/td&gt;
&lt;td&gt;Time spent on the backend request, from connecting to the real server until the response is fully received.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;upstream_connect_time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;double&lt;/td&gt;
&lt;td&gt;TCP connection time to the real server.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;upstream_header_time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;double&lt;/td&gt;
&lt;td&gt;Time from connecting to the real server until the response header is fully received.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tcpinfo_rtt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;TCP RTT.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ssl_handshake_time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;double&lt;/td&gt;
&lt;td&gt;SSL handshake time.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ssl_cipher&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;SSL cipher suite.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ssl_protocol&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;SSL protocol version.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vip_vpcid&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;long&lt;/td&gt;
&lt;td&gt;VPC ID of the CLB VIP. For public CLB, the value is &lt;code&gt;-1&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;uri&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;Resource identifier.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;server_protocol&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;CLB protocol.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Search examples from the source
&lt;/h2&gt;

&lt;p&gt;Find requests that match a specific request line and whose &lt;code&gt;request_time&lt;/code&gt; exceeds a threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;request:"HEAD /aaa/ HTTP/1.1" AND request_time:&amp;gt;0.005
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find 4xx requests for a specific real server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;status:[400 TO 500} AND upstream_addr:"10.0.1.12:80"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Build dashboard panels from CLB logs
&lt;/h2&gt;

&lt;p&gt;The source article gives three dashboard query patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time dashboard: average request duration by CLB instance
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;HISTOGRAM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__TIMESTAMP__&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"average request duration per CLB instance"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;server_addr&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;server_addr&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This panel is used to observe website response time in real time and identify which CLB instance is slowing down.&lt;/p&gt;
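
&lt;p&gt;The grouping that this query performs can be sketched in Python. The sketch assumes each record carries a Unix timestamp in seconds, a CLB VIP, and a request time; the addresses and the function name are hypothetical, and real CLS timestamps may use a different unit.&lt;/p&gt;

```python
from collections import defaultdict


def average_request_time_per_minute(records):
    """records: iterable of (unix_seconds, server_addr, request_time) tuples."""
    sums = defaultdict(lambda: [0.0, 0])  # (minute, addr) maps to [total time, count]
    for ts, addr, request_time in records:
        minute = ts - ts % 60  # floor the timestamp to the start of its minute
        bucket = sums[(minute, addr)]
        bucket[0] += request_time
        bucket[1] += 1
    # Average per (minute, addr) pair, like AVG(request_time) with GROUP BY dt, server_addr.
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical records for two CLB VIPs.
records = [
    (60, '10.0.0.1', 0.25),
    (90, '10.0.0.1', 0.75),
    (125, '10.0.0.1', 1.0),
    (61, '10.0.0.2', 0.5),
]
print(average_request_time_per_minute(records))
```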

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3czw4j4m6zcxak65pkcm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3czw4j4m6zcxak65pkcm.png" alt=" " width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot shows a CLS analysis dashboard. In English, the top area is the time-series result, and the bottom area keeps log details available for drill-down after a spike is found.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1japheusmegqotvti0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1japheusmegqotvti0x.png" alt=" " width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The highlighted Chinese operation is &lt;code&gt;Add to dashboard&lt;/code&gt;. After running an analysis query and selecting a chart type, the operator saves the chart into a reusable dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9trb0ywrjk2it15bb1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw9trb0ywrjk2it15bb1n.png" alt=" " width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This screenshot is the time dashboard. It translates to: compare &lt;code&gt;request_time&lt;/code&gt; by &lt;code&gt;server_addr&lt;/code&gt; in one-minute buckets, so operators can quickly see whether latency is isolated to one CLB instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capacity dashboard: request count by real server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;HISTOGRAM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__TIMESTAMP__&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nv"&gt;"requests per minute to each real server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;upstream_addr&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;upstream_addr&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This panel checks backend capacity distribution. If one &lt;code&gt;upstream_addr&lt;/code&gt; receives an unexpected share of requests, the team can review CLB rules or backend health.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9rsj9og1ejkpagbk8ra.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9rsj9og1ejkpagbk8ra.png" alt=" " width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot shows multiple line charts. In English, each line represents a real server address and the count of requests it receives per minute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Status-code dashboard: request count by CLB status
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;HISTOGRAM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__TIMESTAMP__&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This panel tracks service health by status code. It is useful for separating client errors, backend errors, and normal traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbe3k07gogejahxklq3z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbe3k07gogejahxklq3z.png" alt=" " width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot combines bar and pie-style views. In English, it is grouping request volume by &lt;code&gt;status&lt;/code&gt; so operators can see whether error codes are rising.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxai1s90f85qwqccv18pg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxai1s90f85qwqccv18pg.png" alt=" " width="800" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dashboard screenshot shows the outcome of the previous steps: CLB access logs become operational panels rather than one-off search results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add real-time alerts from search-analysis results
&lt;/h2&gt;

&lt;p&gt;The source article closes with real-time alerting. CLS can create alert rules from flexible search-analysis queries, attach alert policies, and notify teams through channels such as WeChat, Enterprise WeChat, or a webhook.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rijl12nszz2qedh0jhw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2rijl12nszz2qedh0jhw.png" alt=" " width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This screenshot translates to: define the alert query, set the trigger condition, configure scheduling and notification, and route the alert to the selected receiver. For CLB logs, useful conditions include abnormal status-code count, high &lt;code&gt;request_time&lt;/code&gt;, or unexpected changes in backend traffic distribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical CLB log playbook
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Enable CLB access logs either per instance or through batch onboarding.&lt;/li&gt;
&lt;li&gt;Configure the CLS index and enable statistics for fields used in aggregation.&lt;/li&gt;
&lt;li&gt;Start with search examples for URL latency and real-server 4xx requests.&lt;/li&gt;
&lt;li&gt;Build dashboards around &lt;code&gt;request_time&lt;/code&gt;, &lt;code&gt;upstream_addr&lt;/code&gt;, and &lt;code&gt;status&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Add alerts for latency spikes, backend error growth, and abnormal request distribution.&lt;/li&gt;
&lt;li&gt;Keep the raw log view near charts so every spike can be investigated with the exact request context.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>logging</category>
      <category>devops</category>
      <category>cloud</category>
      <category>observability</category>
    </item>
    <item>
      <title>Unify Tencent Cloud CLS Alert Notifications with Observability Templates</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Wed, 10 Jun 2026 07:20:26 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/unify-tencent-cloud-cls-alert-notifications-with-observability-templates-1iok</link>
      <guid>https://dev.to/tencentcloud-cls/unify-tencent-cloud-cls-alert-notifications-with-observability-templates-1iok</guid>
      <description>&lt;p&gt;Log alerts rarely live alone. In real operations, log monitoring, cloud product monitoring, application monitoring, and endpoint monitoring often need to notify the same teams. If every product maintains its own recipients, channels, rotations, and callbacks, alert operations become duplicated and easy to miss.&lt;/p&gt;

&lt;p&gt;The source article introduces a new CLS capability: CLS alerts can now send notifications through Tencent Cloud Observability Platform notification templates. The practical change is simple but useful: CLS alert policies can reuse the same notification policy layer as cloud product monitoring, APM, and endpoint performance monitoring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs2mwzwv5uk85he9s0xi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs2mwzwv5uk85he9s0xi.png" alt=" " width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why unify alert notification templates?
&lt;/h2&gt;

&lt;p&gt;The original article frames the problem as alert fragmentation. Separate notification settings across products can create three operational issues:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;What happens in practice&lt;/th&gt;
&lt;th&gt;What the new path changes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Repeated maintenance&lt;/td&gt;
&lt;td&gt;Teams configure recipients and channels in several products&lt;/td&gt;
&lt;td&gt;CLS can reuse Observability Platform templates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scattered alerts&lt;/td&gt;
&lt;td&gt;Log alerts and cloud-resource alerts are reviewed in different places&lt;/td&gt;
&lt;td&gt;Notification strategy becomes more consistent across products&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Missed escalation&lt;/td&gt;
&lt;td&gt;A channel or rotation may be updated in one product but not another&lt;/td&gt;
&lt;td&gt;Duty schedules, phone rotation, and callbacks can be managed centrally&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Capability map from the source article
&lt;/h2&gt;

&lt;p&gt;The source article highlights three capabilities.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Source-backed detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Unified configuration&lt;/td&gt;
&lt;td&gt;CLS alert policies can directly reuse Observability Platform notification templates. The same notification strategy can be shared with cloud product monitoring, APM, and endpoint performance monitoring alerts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-channel notification&lt;/td&gt;
&lt;td&gt;Supported channels include SMS, email, phone calls, WeChat, Enterprise WeChat, DingTalk, Feishu, Slack, PagerDuty, Teams, and custom callback APIs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advanced alert handling&lt;/td&gt;
&lt;td&gt;The Observability Platform can provide duty schedules, phone notification rotation, and alert-message delivery to SCF. The source also mentions future support for alert convergence.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Configure CLS to use an Observability Platform template
&lt;/h2&gt;

&lt;p&gt;The usage flow is short:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In a CLS alert policy, choose &lt;code&gt;Observability Platform notification template&lt;/code&gt; as the notification method.&lt;/li&gt;
&lt;li&gt;Select an existing notification template that was already created in the Tencent Cloud Observability Platform.&lt;/li&gt;
&lt;li&gt;If no suitable template exists, create a new template.&lt;/li&gt;
&lt;li&gt;After alerts are delivered, review CLS alert history through &lt;code&gt;Alert Governance&lt;/code&gt; in the Observability Platform.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50e7w1hnsd52d0jl09yv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50e7w1hnsd52d0jl09yv.png" alt=" " width="799" height="524"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot above shows the CLS alert-policy configuration page. The highlighted area is the notification method. Instead of configuring a standalone CLS-only receiver, the policy uses the Observability Platform template type.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmy4xn1k64n6jv6vnco58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmy4xn1k64n6jv6vnco58.png" alt=" " width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This screenshot shows the template-selection step. In English, the operator is choosing a previously created notification template from the Observability Platform and applying it to the CLS alert policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5xupcyooychg4ssqjpa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5xupcyooychg4ssqjpa.png" alt=" " width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the existing list does not contain the right template, the dialog allows the operator to create one. The practical translation of this step is: define the receiver policy once, then attach it to CLS alerts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsu7e4tr36oqwtpu05434.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsu7e4tr36oqwtpu05434.png" alt=" " width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final screenshot shows the Observability Platform's &lt;code&gt;Alert Governance&lt;/code&gt; area. The source article says CLS alert history can be reviewed there, which gives operators one place to trace notification events after alerts fire.&lt;/p&gt;

&lt;h2&gt;
  
  
  When this is the right pattern
&lt;/h2&gt;

&lt;p&gt;Use this integration when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;log alerts and cloud-resource alerts should notify the same team;&lt;/li&gt;
&lt;li&gt;a team already manages duty schedules or notification rotations in the Observability Platform;&lt;/li&gt;
&lt;li&gt;CLS alerts need channels such as Enterprise WeChat, Slack, PagerDuty, Teams, or custom callbacks;&lt;/li&gt;
&lt;li&gt;operations teams want alert history and governance to be reviewed in a central alert console.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep standalone CLS notification settings only when a log alert has an intentionally isolated audience or an independent callback path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create or identify the Observability Platform notification template first.&lt;/li&gt;
&lt;li&gt;In the CLS alert policy, set the notification method to the Observability Platform template option.&lt;/li&gt;
&lt;li&gt;Select the existing template or create a new one from the policy configuration flow.&lt;/li&gt;
&lt;li&gt;Test that the expected channel receives the alert.&lt;/li&gt;
&lt;li&gt;Use Alert Governance to review historical CLS alert events after delivery.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What problem does this solve for CLS alerting?
&lt;/h3&gt;

&lt;p&gt;It reduces duplicated notification configuration and makes CLS alert delivery part of the same alert-management layer used by other Tencent Cloud observability products.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which notification channels does the source article list?
&lt;/h3&gt;

&lt;p&gt;The source lists SMS, email, phone calls, WeChat, Enterprise WeChat, DingTalk, Feishu, Slack, PagerDuty, Teams, and custom callback APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this replace alert rules?
&lt;/h3&gt;

&lt;p&gt;No. The source article describes notification delivery and governance. CLS still owns the log alert policy and alert condition; the Observability Platform template owns the notification route.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>logging</category>
      <category>alerts</category>
      <category>devops</category>
    </item>
    <item>
      <title>How Tencent Cloud CLS Optimized Lucene for Time-Series Log Search</title>
      <dc:creator>Tencent Cloud -Cloud Log Service</dc:creator>
      <pubDate>Wed, 10 Jun 2026 07:08:51 +0000</pubDate>
      <link>https://dev.to/tencentcloud-cls/how-tencent-cloud-cls-optimized-lucene-for-time-series-log-search-1di8</link>
      <guid>https://dev.to/tencentcloud-cls/how-tencent-cloud-cls-optimized-lucene-for-time-series-log-search-1di8</guid>
      <description>&lt;p&gt;Log search looks like text search until every query includes a time range. That time predicate changes the problem. In a high-volume log platform, timestamps are high-cardinality values, and scanning timestamp ranges can dominate query latency.&lt;/p&gt;

&lt;p&gt;The source article describes Tencent Cloud CLS's time-series search engine, which is built on top of Lucene. The work was accepted by VLDB 2022 as the paper &lt;code&gt;TencentCLS: The Cloud Log Service with High Query Performances&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;According to the source article, the time-series search engine achieved nearly 40x improvement over a traditional search engine in massive log retrieval. It also reports 38x improvement for head queries, 24x for tail queries, and 7.6x for histogram queries in the paper-related experiment tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furakh8z1a2lrug9qldns.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Furakh8z1a2lrug9qldns.png" alt=" " width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cog0k7exwiq6si964yl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cog0k7exwiq6si964yl.png" alt=" " width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wmf5cttv6kwoig1yhz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wmf5cttv6kwoig1yhz7.png" alt=" " width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why timestamp range search is hard in Lucene-style indexes
&lt;/h2&gt;

&lt;p&gt;The source article starts with a typical log record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2021-09-28 10:10:39T1234] [ip=192.168.1.1]
XXXXXXXX
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A log platform indexes the timestamp, attributes such as &lt;code&gt;ip&lt;/code&gt;, and tokenized text. A typical query specifies a time range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;Select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;xxxx_index&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xxxx&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;timestmap&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="n"&gt;xxxx&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;timestmap&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;09&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;29&lt;/span&gt; &lt;span class="n"&gt;xxxx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lucene is strong at text search, but the article points out that timestamp range search is a high-cardinality numeric range problem. If a timestamp is stored at millisecond precision, one day contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;24 * 60 * 60 * 1000 = 86,400,000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;possible timestamp values. At microsecond precision, the possible values are another 1000x larger.&lt;/p&gt;

&lt;p&gt;In an inverted index, a timestamp key maps to a posting list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;timestamp -&amp;gt; [docid1, docid2]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An exact timestamp lookup is efficient: the source article describes normal search as &lt;code&gt;O(log(n))&lt;/code&gt;. But a one-day timestamp range may require scanning a massive number of timestamp terms. The article describes the high-cardinality range search complexity as &lt;code&gt;O(n)&lt;/code&gt;, where &lt;code&gt;n&lt;/code&gt; is the number of index terms.&lt;/p&gt;

&lt;p&gt;The source gives a concrete scale example: in a 10-billion-log index, the observed timestamp-index data can be around 30GB. Reading that at 100MB/s would take about 300 seconds just to load the index data.&lt;/p&gt;
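&lt;p&gt;These scale figures are easy to reproduce. The Python sketch below recomputes them; the function names are illustrative only, and the read-time estimate assumes decimal units (1 GB = 1000 MB).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;SECONDS_PER_DAY = 24 * 60 * 60


def values_per_day(ticks_per_second):
    """Return how many distinct timestamps one day can hold at a given precision."""
    return SECONDS_PER_DAY * ticks_per_second


def seconds_to_read(size_mb, mb_per_second):
    """Return the time needed to read size_mb sequentially at mb_per_second."""
    return size_mb / mb_per_second


millisecond_values = values_per_day(1_000)       # millisecond precision
microsecond_values = values_per_day(1_000_000)   # microsecond precision

# Assumption: 30 GB is treated as 30 * 1000 MB.
load_seconds = seconds_to_read(30 * 1000, 100)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;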

&lt;h2&gt;
  
  
  Optimization 1: order logs by timestamp
&lt;/h2&gt;

&lt;p&gt;The central design shift is to organize log data by timestamp order. In the old layout, timestamps are unordered, so the engine must handle many timestamp index terms for a range query. In the time-ordered layout, a time range can be reduced to endpoint handling.&lt;/p&gt;

&lt;p&gt;The source article states that this reduces the timestamp terms handled from hundreds of thousands or hundreds of millions down to two endpoints.&lt;/p&gt;
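&lt;p&gt;The effect of time ordering can be sketched in a few lines of Python. This is a minimal illustration, not CLS's implementation: it assumes doc IDs are simply positions in a timestamp-sorted list, and &lt;code&gt;range_to_doc_ids&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from bisect import bisect_left, bisect_right


def range_to_doc_ids(sorted_timestamps, start, end):
    """Map an inclusive time range to a half-open span of doc IDs [lo, hi)."""
    lo = bisect_left(sorted_timestamps, start)   # first log at or after start
    hi = bisect_right(sorted_timestamps, end)    # one past the last log at or before end
    return lo, hi


# Doc ID = position in the list, because logs are stored in timestamp order.
timestamps = [100, 105, 105, 110, 120, 130, 130, 145]
lo, hi = range_to_doc_ids(timestamps, 105, 130)
matching_doc_ids = range(lo, hi)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only the two endpoints are searched. Every doc ID between them falls inside the time range, so no per-timestamp term has to be visited.&lt;/p&gt;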

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funouve8813p4kxtz050p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funouve8813p4kxtz050p.png" alt=" " width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7sd766m6uk3g0ewp5zcg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7sd766m6uk3g0ewp5zcg.png" alt=" " width="800" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 2: add a secondary index for disk access
&lt;/h2&gt;

&lt;p&gt;Simple binary search works well in memory, but the source article notes that it causes scattered disk access when applied to ordered column data. The solution is a secondary index that reduces disk access from dozens of operations to three.&lt;/p&gt;
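&lt;p&gt;The summary does not describe the secondary index's exact layout, so the following Python sketch shows one generic way such an index can work: a small sparse index holding the first timestamp of each block. The class name, the block size, and the read counter are all assumptions made for illustration, and the column is assumed to be non-empty.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from bisect import bisect_left

BLOCK_SIZE = 4  # hypothetical; a real block would hold far more values


class BlockedColumn:
    """Sorted timestamp column split into fixed-size blocks, as if stored on disk."""

    def __init__(self, sorted_timestamps):
        self.blocks = [
            sorted_timestamps[i:i + BLOCK_SIZE]
            for i in range(0, len(sorted_timestamps), BLOCK_SIZE)
        ]
        # Secondary index: first timestamp of every block, small enough to keep in memory.
        self.first_values = [block[0] for block in self.blocks]
        self.block_reads = 0

    def read_block(self, block_no):
        self.block_reads += 1  # stands in for one disk access
        return self.blocks[block_no]

    def lower_bound(self, target):
        """Return the position of the first timestamp that is not below target."""
        # Pick the last block whose first value is strictly below target.
        block_no = max(bisect_left(self.first_values, target) - 1, 0)
        block = self.read_block(block_no)
        return block_no * BLOCK_SIZE + bisect_left(block, target)


column = BlockedColumn([100, 105, 105, 110, 120, 130, 130, 145])
position = column.lower_bound(120)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The binary search over &lt;code&gt;first_values&lt;/code&gt; happens in memory, so each lookup touches a single block instead of jumping across the column once per binary-search step.&lt;/p&gt;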

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmt0yspesit0nj1lwnquy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmt0yspesit0nj1lwnquy.png" alt=" " width="799" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccu8uqu03g3ymnh1ozap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccu8uqu03g3ymnh1ozap.png" alt=" " width="799" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxa5u6mhov8o1fosryvey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxa5u6mhov8o1fosryvey.png" alt=" " width="800" height="203"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 3: make reverse search fast
&lt;/h2&gt;

&lt;p&gt;The source article says the original underlying iterators only supported one-way iteration. That is a problem for reverse chronological search: if the target data sits at the tail of a timestamp-ordered sequence, a one-way iterator must traverse all previous data first.&lt;/p&gt;

&lt;p&gt;CLS solves this with a reverse binary-search algorithm built on top of the one-way iterator. The article reports that the iteration count drops from tens of thousands or hundreds of thousands to dozens, and the complexity changes from &lt;code&gt;O(n)&lt;/code&gt; to &lt;code&gt;O(log n * log n)&lt;/code&gt;.&lt;/p&gt;
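&lt;p&gt;The summary does not give the algorithm itself. The Python sketch below shows one plausible reading, under stated assumptions: the iterator can only move forward, each &lt;code&gt;advance&lt;/code&gt; call is cheap (modelled here with a binary search), and a fresh iterator can be opened for every probe. &lt;code&gt;ForwardIterator&lt;/code&gt; and &lt;code&gt;last_match&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from bisect import bisect_left

NO_MORE_DOCS = None  # sentinel returned when the iterator is exhausted


class ForwardIterator:
    """One-way iterator over sorted matching doc IDs; it can never move backwards."""

    def __init__(self, sorted_doc_ids):
        self._doc_ids = sorted_doc_ids
        self._pos = 0

    def advance(self, target):
        """Move forward to the first doc ID that is not below target."""
        self._pos = bisect_left(self._doc_ids, target, self._pos)
        if self._pos == len(self._doc_ids):
            return NO_MORE_DOCS
        return self._doc_ids[self._pos]


def last_match(new_iterator, max_doc):
    """Return the largest matching doc ID below max_doc, or None if nothing matches."""

    def has_match_from(doc_id):
        # Each probe opens a fresh forward iterator and advances once.
        return new_iterator().advance(doc_id) is not NO_MORE_DOCS

    if not has_match_from(0):
        return None
    lo, hi = 0, max_doc  # invariant: a match exists from lo onward, none from hi onward
    while hi - lo != 1:
        mid = (lo + hi) // 2
        if has_match_from(mid):
            lo = mid
        else:
            hi = mid
    return lo


matches = [3, 9, 40, 41]
newest = last_match(lambda: ForwardIterator(matches), 64)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this sketch the outer loop makes a logarithmic number of probes and each probe costs one logarithmic &lt;code&gt;advance&lt;/code&gt;, which is one way to arrive at the reported complexity without walking every earlier document.&lt;/p&gt;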

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tvlek1iwfnp9ozmcl50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tvlek1iwfnp9ozmcl50.png" alt=" " width="798" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimization 4: compute histograms from bucket boundaries
&lt;/h2&gt;

&lt;p&gt;Histograms are among the most common log-analysis operations. The source article says the original system computed histograms by reading the timestamp back for every matching log, which produced tens of thousands or hundreds of thousands of back-table lookups (fetches of stored values for individual matches).&lt;/p&gt;

&lt;p&gt;The optimized approach uses bucket boundaries to determine log-ID ranges. Instead of fetching timestamps for every matched log, the engine performs a few index accesses to find boundaries, then assigns internal points by comparing them with the bucket limits. The secondary index is also used here to reduce disk access for boundary lookups.&lt;/p&gt;
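&lt;p&gt;The boundary-based approach can be illustrated with a small Python sketch. It assumes doc IDs are positions in a timestamp-sorted list and that the time range divides evenly into buckets; &lt;code&gt;histogram&lt;/code&gt; and its parameters are hypothetical names, not CLS APIs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from bisect import bisect_left, bisect_right


def histogram(sorted_timestamps, matched_doc_ids, start, end, width):
    """Count matched logs per bucket without reading one timestamp per match."""
    # Bucket edges: start, start + width, ..., end.
    edges = list(range(start, end + width, width))
    # One index lookup per edge: doc ID of the first log at or after that edge.
    boundary_doc_ids = [bisect_left(sorted_timestamps, edge) for edge in edges]
    counts = [0] * (len(edges) - 1)
    for doc_id in matched_doc_ids:
        # Compare the doc ID with the boundaries; no timestamp is fetched here.
        bucket = bisect_right(boundary_doc_ids, doc_id) - 1
        if bucket in range(len(counts)):
            counts[bucket] += 1
    return counts


timestamps = [100, 105, 105, 110, 120, 130, 130, 145]
matched = [1, 3, 4, 6, 7]  # doc IDs returned by some text query
counts = histogram(timestamps, matched, 100, 160, 20)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The number of timestamp lookups depends on the number of buckets rather than on the number of matching logs, which is the point of the optimization.&lt;/p&gt;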

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1s9wubhpac0lmreorca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1s9wubhpac0lmreorca.png" alt=" " width="799" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9q8v24hlgcoy0w2hb0gg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9q8v24hlgcoy0w2hb0gg.png" alt=" " width="799" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Reported performance results
&lt;/h2&gt;

&lt;p&gt;The source article reports several performance results:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test context&lt;/th&gt;
&lt;th&gt;Source-reported result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Paper experiment, head query&lt;/td&gt;
&lt;td&gt;38x improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paper experiment, tail query&lt;/td&gt;
&lt;td&gt;24x improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Paper experiment, histogram query&lt;/td&gt;
&lt;td&gt;7.6x improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Offline prototype test on 8 million rows, 100 concurrent requests&lt;/td&gt;
&lt;td&gt;50x response improvement, &lt;code&gt;1.059s&lt;/code&gt; vs &lt;code&gt;56.9s&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrency under sub-second response target&lt;/td&gt;
&lt;td&gt;90+ vs 4, a 20x improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Online testing with writes present&lt;/td&gt;
&lt;td&gt;Core operations were more than 10x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold-data scenario&lt;/td&gt;
&lt;td&gt;Core operation response speed improved by 10x+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09g55h2wm2xy27e9pmql.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F09g55h2wm2xy27e9pmql.png" alt=" " width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4m9zyyjc8jg0c2x8528.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4m9zyyjc8jg0c2x8528.png" alt=" " width="799" height="545"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The article also notes that IO jitter had to be optimized before online testing. A 2-3 second long-tail jitter is barely visible when the original query takes more than 10 seconds, but it severely distorts results when the optimized query runs in hundreds of milliseconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison with a Lucene-based cloud log service
&lt;/h2&gt;

&lt;p&gt;The source article compares CLS with another cloud log service in a one-billion-row scenario. It explains the difference through timestamp granularity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System design&lt;/th&gt;
&lt;th&gt;Timestamp index implication from the source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Minute-level index&lt;/td&gt;
&lt;td&gt;One day has &lt;code&gt;24 * 60 = 1440&lt;/code&gt; index terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLS microsecond-level index&lt;/td&gt;
&lt;td&gt;One day can theoretically have &lt;code&gt;24 * 60 * 60 * 1000 * 1000 = 86,400,000,000&lt;/code&gt; timestamp values&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
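
&lt;p&gt;The arithmetic in the table can be reproduced with a few lines of Python. This is only a sketch of the upper bound on distinct timestamp values per day at each granularity; the variable names are made up for illustration.&lt;/p&gt;

```python
# Count how many distinct timestamp values one day can hold at each granularity.
SECONDS_PER_DAY = 24 * 60 * 60

# Number of timestamp units in one second for each granularity.
UNITS_PER_SECOND = {
    "second": 1,
    "millisecond": 1_000,
    "microsecond": 1_000_000,
}

minute_terms = 24 * 60
values_per_day = {
    name: SECONDS_PER_DAY * units for name, units in UNITS_PER_SECOND.items()
}

print(f"{'minute:':13}{minute_terms:,}")
for name, count in values_per_day.items():
    label = name + ":"
    print(f"{label:13}{count:,}")

# How many times more distinct values microseconds allow than minutes.
ratio = values_per_day["microsecond"] // minute_terms
print(f"microsecond / minute = {ratio:,}x")
```

&lt;p&gt;The ratio works out to 60,000,000, which is why a range query that enumerates index terms becomes impractical at microsecond granularity.&lt;/p&gt;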

&lt;p&gt;The source states that CLS previously used millisecond timestamps and moved to microsecond timestamps after the new index went online. It credits the time-series index with letting CLS search on such a high-cardinality timestamp field without losing performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engineering takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Log search is not only text search; the time-range predicate can dominate the query plan.&lt;/li&gt;
&lt;li&gt;High-cardinality timestamp fields are expensive when a range query must scan many index terms.&lt;/li&gt;
&lt;li&gt;Ordering logs by timestamp changes range search from many-term processing to endpoint processing.&lt;/li&gt;
&lt;li&gt;Secondary indexes matter when a theoretically efficient binary search would otherwise produce scattered disk reads.&lt;/li&gt;
&lt;li&gt;Reverse chronological queries and histograms need specialized handling, because they are common in real log troubleshooting.&lt;/li&gt;
&lt;li&gt;The source article's reported gains come from combining data layout, secondary indexing, reverse access, histogram boundary lookup, and IO jitter optimization.&lt;/li&gt;
&lt;/ul&gt;
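
&lt;p&gt;The endpoint-processing and histogram takeaways can be sketched with Python's standard &lt;code&gt;bisect&lt;/code&gt; module. This is a minimal in-memory illustration, not the CLS implementation: the &lt;code&gt;timestamps&lt;/code&gt; list and the &lt;code&gt;range_bounds&lt;/code&gt; and &lt;code&gt;histogram&lt;/code&gt; helpers are hypothetical, and a real engine works on on-disk data with a secondary index to avoid scattered reads.&lt;/p&gt;

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for a segment whose logs are stored in
# timestamp order. A real engine would keep this data on disk.
timestamps = [100, 105, 105, 230, 310, 310, 450, 520, 610, 790]


def range_bounds(sorted_ts, start, end):
    """Locate the half-open range [start, end) with two binary searches."""
    return bisect_left(sorted_ts, start), bisect_left(sorted_ts, end)


def histogram(sorted_ts, start, end, bucket_width):
    """Count logs per bucket by looking up bucket boundaries only."""
    edges = list(range(start, end, bucket_width)) + [end]
    positions = [bisect_left(sorted_ts, edge) for edge in edges]
    return [right - left for left, right in zip(positions, positions[1:])]


lo, hi = range_bounds(timestamps, 105, 520)
print(timestamps[lo:hi])                      # matching rows, oldest first
print(timestamps[lo:hi][::-1])                # same rows, newest first
print(histogram(timestamps, 100, 800, 200))   # counts per 200-unit bucket
```

&lt;p&gt;Only the two endpoints of the range are searched, and each histogram bucket costs one boundary lookup rather than a scan of the rows inside it. The work therefore depends on the number of boundaries, not on how many distinct timestamp values fall inside the range.&lt;/p&gt;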

</description>
      <category>lucene</category>
      <category>logging</category>
      <category>database</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
