Tencent Cloud -Cloud Log Service

Posted on Jun 15

Troubleshooting Kubernetes Events with TKE and Tencent Cloud CLS

#kubernetes #logging #devops #observability


Cluster problems rarely appear from nowhere. Before a service outage becomes visible, Kubernetes often records smaller state changes: node pressure, Pod scheduling, Pod eviction, and cluster autoscaler decisions.

Tencent Kubernetes Engine (TKE) can ship those Events to Tencent Cloud Log Service (CLS), where they become searchable logs and dashboard data. This gives operators one central place to answer what changed, when it changed, which object was involved, and which component reported it.

What an Event tells you

Kubernetes Events describe state transitions. The useful fields are:

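Field	What to look for
`Type`	`Normal` or `Warning`; the API allows new types to be added later.
`Involved Object`	The Pod, Deployment, Node, or other Kubernetes object the Event is about.
`Source`	The reporting component, such as the scheduler or the kubelet.
`Reason`	Short, machine-readable reason in CamelCase, such as `Evicted` or `FailedScheduling`.
`Message`	Human-readable explanation of the transition.
`Count`	How many times the same Event has occurred.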

The core flow is: Kubernetes emits a state-change record, CLS stores it as a log event, and the operator filters by object, component, reason, message, count, and timestamp.
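To make the filtering step concrete, here is a minimal Python sketch. It assumes the events have already been exported from CLS and parsed into records; `K8sEvent` and `filter_events` are hypothetical names, and the fields mirror the table above rather than any documented CLS schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class K8sEvent:
    """Hypothetical record for one Event after export from CLS."""
    type: str            # "Normal" or "Warning"
    kind: str            # involved object kind, e.g. "Node" or "Pod"
    name: str            # involved object name
    component: str       # source component, e.g. "kubelet"
    reason: str          # short CamelCase reason
    message: str         # human-readable detail
    count: int           # how many times the Event occurred
    timestamp: datetime  # log time


def filter_events(
    events: Iterable[K8sEvent],
    event_type: Optional[str] = None,
    name: Optional[str] = None,
    component: Optional[str] = None,
    reason: Optional[str] = None,
    message_contains: Optional[str] = None,
) -> List[K8sEvent]:
    """Keep events matching every criterion that is set, oldest first."""
    matched = [
        e for e in events
        if (event_type is None or e.type == event_type)
        and (name is None or e.name == name)
        and (component is None or e.component == component)
        and (reason is None or e.reason == reason)
        and (message_contains is None or message_contains in e.message)
    ]
    # Oldest first, so the result reads as a timeline.
    return sorted(matched, key=lambda e: e.timestamp)
```

Leaving a criterion as `None` means "don't filter on it", which mirrors adding or removing one condition in a search box.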

Open Event Search

In TKE, go to Cluster Operations -> Event Search. CLS provides collection, storage, search, analysis, and dashboards for the event stream.

Use the overview when you need warning distribution, affected object types, and event trends. Use global search when you already know the component or object name and need a row-level timeline.
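The overview aggregations can be approximated offline on exported events. This sketch assumes each event is a flat Python dict with hypothetical keys `type`, `kind`, `reason`, and `timestamp`; it counts Warning events per object kind and reason, and buckets all events by hour as a rough trend line.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, List, Tuple

# Hypothetical flat shape for an exported Event.
Event = Dict[str, Any]


def warning_distribution(events: List[Event]) -> Counter:
    """Count Warning events per (involved object kind, reason)."""
    return Counter(
        (e["kind"], e["reason"]) for e in events if e["type"] == "Warning"
    )


def hourly_trend(events: List[Event]) -> List[Tuple[datetime, int]]:
    """Bucket events by the hour they were logged, oldest bucket first."""
    buckets = Counter(
        e["timestamp"].replace(minute=0, second=0, microsecond=0) for e in events
    )
    return sorted(buckets.items())
```

Both functions count log rows, not the Event's own `count` field; weighting by `count` instead is a reasonable alternative if repeated Events are collapsed into one row.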

Runbook 1: an abnormal node

Filter by the abnormal node name in the event overview. In this example, the result included a node disk-space warning.

The timeline showed that on 2020-11-25, node 172.16.18.13 became abnormal because disk space was insufficient. Kubelet then tried to evict Pods from the node to reclaim disk space.

That sequence gives you a clean next step: check node disk usage, eviction thresholds, and workload placement before treating it as a generic application failure.
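The node runbook can also be written as a check: find the disk-pressure Events on the node, then the Pod evictions reported by that node's kubelet from the first of them onward. This is a sketch under several assumptions: the reason strings `NodeHasDiskPressure`, `EvictionThresholdMet`, and `Evicted` follow upstream kubelet naming and may differ in your cluster, the reporting node is read from the Event's `source.host` field, and `Ev` and `disk_pressure_evictions` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

# Assumed kubelet reason strings; verify against the Events in your own cluster.
DISK_PRESSURE_REASONS = {"NodeHasDiskPressure", "EvictionThresholdMet"}
EVICTION_REASON = "Evicted"


@dataclass
class Ev:
    kind: str           # involved object kind
    name: str           # involved object name
    component: str      # source component
    host: str           # source host: the node that reported the Event (assumed populated)
    reason: str
    timestamp: datetime


def disk_pressure_evictions(events: List[Ev], node: str) -> Tuple[List[Ev], List[Ev]]:
    """Return (disk-pressure Events on `node`, Pod evictions its kubelet reported
    at or after the first disk-pressure Event), both sorted by time."""
    pressure = sorted(
        (e for e in events
         if e.kind == "Node" and e.name == node and e.reason in DISK_PRESSURE_REASONS),
        key=lambda e: e.timestamp,
    )
    if not pressure:
        return [], []
    start = pressure[0].timestamp
    evictions = sorted(
        (e for e in events
         if e.kind == "Pod" and e.component == "kubelet" and e.host == node
         and e.reason == EVICTION_REASON and e.timestamp >= start),
        key=lambda e: e.timestamp,
    )
    return pressure, evictions
```

An empty second list with a non-empty first one suggests the node reported pressure but had not evicted anything yet, which points at thresholds rather than workloads.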

Runbook 2: autoscaler expansion

For node pool autoscaling, query the autoscaler component:

event.source.component:"cluster-autoscaler"

Display these fields:

event.reason
event.message
event.involvedObject.name

Sort by log time descending. The result should work like a compact ledger of autoscaler decisions: workload object, reason, message, and the timestamp of each scaling step.
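The same query can be reproduced against exported log records, which is handy for scripting the ledger or comparing two incidents. A minimal sketch, assuming each record is a JSON-like dict that nests the Event under an `event` key (as the `event.source.component` field name suggests) and carries a sortable timestamp under a hypothetical `log_time` key:

```python
from typing import Any, Dict, List, Tuple

LogRecord = Dict[str, Any]

# The three display fields from the query above.
FIELDS = ("event.reason", "event.message", "event.involvedObject.name")


def get_path(record: LogRecord, dotted: str) -> Any:
    """Resolve a dotted field name such as 'event.source.component'; None if absent."""
    value: Any = record
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def autoscaler_ledger(records: List[LogRecord]) -> List[Tuple[Any, ...]]:
    """Keep cluster-autoscaler Events and project (log_time, reason, message,
    object name), newest first. 'log_time' is an assumed key."""
    hits = [r for r in records
            if get_path(r, "event.source.component") == "cluster-autoscaler"]
    hits.sort(key=lambda r: r["log_time"], reverse=True)
    return [(r["log_time"], *(get_path(r, f) for f in FIELDS)) for r in hits]
```

Because rows come back newest first, the first row is the most recent autoscaler decision, matching the descending sort in the console.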

The event stream showed scale-out around 2020-11-25 20:35:45, triggered by three nginx Pods:

nginx-5dbf784b68-tq8rd
nginx-5dbf784b68-fpvbx
nginx-5dbf784b68-v9jv5

Three nodes were added. No further scale-out followed, because the node pool had reached its maximum node count.
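To separate the Pods that triggered a scale-out from the Pods that were blocked by the pool limit, you can group autoscaler Pod Events by reason. The sketch assumes the upstream cluster-autoscaler reasons `TriggeredScaleUp` and `NotTriggerScaleUp`, and that a blocked Pod's message contains the text `max node group size reached`; check these strings against your own Events before relying on them. The dict keys `reason`, `pod`, and `message` are hypothetical.

```python
from typing import Dict, List

# Assumed cluster-autoscaler Pod Event reasons (upstream naming); verify in your cluster.
TRIGGERED = "TriggeredScaleUp"
NOT_TRIGGERED = "NotTriggerScaleUp"
# Assumed message fragment for Pods blocked by the node pool's maximum size.
MAX_SIZE_HINT = "max node group size reached"


def summarize_scale_up(events: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Split Pod names into those that triggered scale-up and those blocked at max size.

    Repeated Events for the same Pod are reported once, in first-seen order.
    """
    summary: Dict[str, List[str]] = {"triggered": [], "blocked_at_max": []}
    for e in events:
        if e["reason"] == TRIGGERED:
            bucket = summary["triggered"]
        elif e["reason"] == NOT_TRIGGERED and MAX_SIZE_HINT in e["message"]:
            bucket = summary["blocked_at_max"]
        else:
            continue  # unrelated autoscaler Event
        if e["pod"] not in bucket:
            bucket.append(e["pod"])
    return summary
```

A non-empty `blocked_at_max` list is the event-level signature of a pool with no room left to grow.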

Checklist

Use Events to understand state changes, not only current state.
Start with overview dashboards, then filter by object name.
For node issues, inspect reason, message, source component, and count.
For autoscaling, query cluster-autoscaler and reconstruct the event timeline.
Use metrics and logs after Events point you to the right object and time window.
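Once Events have pointed you at an object, the time window for follow-up metric and log queries can be derived from the matching Event timestamps. A small sketch; `investigation_window` is a hypothetical helper, and the 15-minute padding is an arbitrary default to widen as needed.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def investigation_window(
    timestamps: Iterable[datetime], padding: timedelta = timedelta(minutes=15)
) -> Tuple[datetime, datetime]:
    """Return (start, end) covering every Event time, widened by `padding` on each side."""
    ts = list(timestamps)
    if not ts:
        raise ValueError("need at least one event timestamp")
    return min(ts) - padding, max(ts) + padding
```

Padding both sides catches the lead-up to the first Event and any delayed effects after the last one.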

FAQ

Why not only use `kubectl describe`?

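`kubectl describe` is useful for one object, and it only shows Events the API server still retains, which by default is one hour (set by the kube-apiserver `--event-ttl` flag). CLS is better when you need searchable history, dashboards, and cross-object analysis.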

What is the fastest autoscaler query?

Start with event.source.component:"cluster-autoscaler" and sort by log time descending.
