Falolu Olaitan

Why Your AKS Pods Keep Getting OOMKilled Even When CPU Looks Fine

Introduction

One of the most misleading situations in Kubernetes is when a pod keeps restarting because of an OOMKilled event while CPU utilization looks perfectly healthy.

I have seen engineers spend hours investigating CPU throttling, autoscaling, node capacity, and even networking, only to discover later that memory was the actual problem.

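The reality is that Kubernetes treats CPU and memory very differently. CPU is a compressible resource: a container that hits its CPU limit is throttled and simply runs slower. Memory is incompressible: it cannot be throttled, so once a container tries to use more than its limit, the kernel terminates it.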

Understanding why this happens is critical for running production workloads reliably.


Understanding OOMKilled

OOM stands for Out Of Memory.

When a container tries to use more memory than its limit, which is enforced through Linux cgroups, the kernel invokes the Out Of Memory Killer, which terminates a process inside that container.

From Kubernetes' perspective, the container exits unexpectedly and the pod enters a restart cycle.

You will typically see something similar to:

kubectl describe pod payment-api-5f4d7d8d9f-xqk2r

Output:

Last State: Terminated
Reason: OOMKilled
Exit Code: 137

Exit code 137 is 128 + 9, which means the process was killed with SIGKILL. Together with Reason: OOMKilled, it is usually the first indication that memory exhaustion caused the restart.
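To make the exit code less magical, here is a minimal sketch of the shell convention behind it: an exit status above 128 is normally 128 plus the number of the signal that killed the process. The function name `describe_exit_code` is made up for this illustration, and the signal names come from the platform Python runs on (Linux is assumed).

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Explain an exit code using the 128 + signal-number convention."""
    if exit_code > 128:
        signum = exit_code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {exit_code}"


print(describe_exit_code(137))
```

Keep in mind that SIGKILL alone does not prove an OOM kill. The Reason field in the pod status is what confirms it.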


Why CPU Looks Healthy

Many teams monitor CPU aggressively while paying little attention to memory consumption.

Consider this example:

resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 1Gi

Application metrics show:

CPU Usage: 120m
Memory Usage: 1.1Gi

CPU usage is well under its limit, so CPU dashboards look healthy.

However, the application needs more memory than the 1Gi limit allows.

The container is killed as soon as it tries to allocate past that limit.

The result is:

CPU Fine
Memory Exhausted
Container Killed

This is why relying solely on CPU dashboards often leads engineers in the wrong direction.
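The numbers from the example above can be compared against their limits with a few lines of Python. This is only a sketch: `parse_memory`, `parse_cpu` and `percent_of_limit` are hypothetical helpers written for this post, and they handle just the common Kubernetes quantity suffixes.

```python
from decimal import Decimal

# Common Kubernetes memory suffixes (binary and decimal)
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> Decimal:
    """Convert a quantity such as '512Mi' or '1.1Gi' to bytes."""
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _MEMORY_UNITS[suffix]
    return Decimal(quantity)  # no suffix: plain bytes


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '250m' or '2' to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def percent_of_limit(usage: Decimal, limit: Decimal) -> float:
    """Usage expressed as a percentage of the configured limit."""
    return float(usage / limit * 100)


cpu_pct = percent_of_limit(parse_cpu("120m"), parse_cpu("500m"))
mem_pct = percent_of_limit(parse_memory("1.1Gi"), parse_memory("1Gi"))
print(f"CPU at {cpu_pct:.0f}% of limit, memory at {mem_pct:.0f}% of limit")
```

Putting both percentages side by side on the same dashboard makes it much harder to miss a memory problem hiding behind a healthy CPU graph.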


Requests and Limits Are Not the Same Thing

One of the most common misunderstandings in Kubernetes is confusing requests with limits.

Requests

Requests determine scheduling.

requests:
  memory: 512Mi

Kubernetes uses this value when deciding where to place the pod.

Limits

Limits determine maximum consumption.

limits:
  memory: 1Gi

Once the container tries to exceed this value, the kernel's OOM killer terminates it and Kubernetes reports the container as OOMKilled.

Think of requests as reservation and limits as a hard wall.

Cross the wall and the container dies.
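A simplified way to picture the difference: placement looks at the sum of requests on a node, while limits play no part in that decision. The sketch below is not the real scheduler, which weighs many more factors. The function `fits_on_node` and the 4096 MiB node size are assumptions made up for illustration.

```python
def fits_on_node(allocatable_mib: int, scheduled_requests_mib: list[int],
                 new_request_mib: int) -> bool:
    """Placement compares requests with allocatable memory, not limits or usage."""
    return sum(scheduled_requests_mib) + new_request_mib <= allocatable_mib


# Hypothetical node with 4096 MiB allocatable, already running seven pods
# that each request 512 MiB and are limited to 1024 MiB.
existing_requests = [512] * 7

print(fits_on_node(4096, existing_requests, 512))   # an eighth pod still fits
print(fits_on_node(4096, existing_requests, 1024))  # a bigger request does not

# The limits of eight such pods add up to twice the node's memory.
total_limits = 8 * 1024
print(total_limits > 4096)
```

This is why a node can be fully "booked" by requests and still run out of memory: limits can add up to more than the node has.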


How to Confirm an OOMKill

Start with:

kubectl get pods

You may see:

CrashLoopBackOff

Then inspect the pod:

kubectl describe pod <pod-name>

Look for:

Reason: OOMKilled

You can also check previous logs:

kubectl logs <pod-name> --previous

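This is useful because the container has usually restarted already, and --previous returns the logs of the terminated instance instead of the new one. Because an OOM kill arrives as SIGKILL, the process gets no chance to log an error, so these logs often just stop abruptly.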
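The same check can be scripted against the JSON form of the pod (`kubectl get pod <pod-name> -o json`). Below is a minimal sketch, assuming the pod object has already been loaded into a dictionary. The helper name `oom_killed_containers` and the sample data are made up; the field names follow the pod status structure.

```python
def oom_killed_containers(pod: dict) -> list[dict]:
    """Return containers whose last termination reason was OOMKilled."""
    hits = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            hits.append({
                "container": status["name"],
                "exit_code": terminated.get("exitCode"),
                "restarts": status.get("restartCount", 0),
            })
    return hits


# Trimmed, made-up sample of what the pod JSON can look like
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "payment-api",
                "restartCount": 7,
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            },
            {"name": "sidecar", "restartCount": 0, "lastState": {}},
        ]
    }
}

print(oom_killed_containers(sample_pod))
```

In a real script you would replace `sample_pod` with `json.load(sys.stdin)` and pipe the kubectl output into it.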


Investigating Memory Consumption

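Check actual consumption (add --containers for a per-container breakdown, since limits apply to each container rather than to the pod as a whole):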

kubectl top pod

Example:

NAME                 CPU     MEMORY
payment-api          90m     1050Mi

If the limit is:

memory: 1024Mi

usage is already at the limit, and the container will be terminated as soon as it tries to allocate more.

Also inspect node utilization:

kubectl top node

This helps determine whether the issue is isolated to the workload or affecting the entire node.


Common Causes of OOMKilled Events

Memory Leaks

Applications continuously allocate memory but never release it.

Typical examples:

  • Unclosed database connections
  • Large object caching
  • Static collections
  • Long-running background workers

The memory graph steadily increases until the limit is reached.
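One rough way to tell a leak from normal fluctuation is to fit a straight line through the memory samples and look at the slope. The sketch below uses a plain least-squares fit. The function name `growth_rate` and both sample series are invented for illustration, and a real analysis would need far more data points.

```python
def growth_rate(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hours, MiB) samples, in MiB per hour."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    covariance = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    return covariance / variance


# Hypothetical hourly samples: one workload climbing, one hovering
leaking = [(0, 400), (1, 450), (2, 500), (3, 550), (4, 600)]
stable = [(0, 400), (1, 410), (2, 395), (3, 405), (4, 400)]

print(f"leaking: {growth_rate(leaking):+.1f} MiB/h")
print(f"stable:  {growth_rate(stable):+.1f} MiB/h")
```

A slope that stays clearly positive across restarts and traffic levels is a strong hint that something is never being released.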


Large Payload Processing

Applications processing large files often experience memory spikes.

Examples:

  • PDF generation
  • Image manipulation
  • Bulk imports
  • Report generation

The workload may run successfully hundreds of times before encountering a payload large enough to trigger an OOMKill.
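Where the processing allows it, reading the payload in fixed-size chunks keeps peak memory independent of file size. Here is a small sketch of the idea using a checksum as a stand-in for real processing. The function `sha256_streaming` and the 1 MiB chunk size are assumptions for this example, not a recommendation for every workload.

```python
import hashlib
import os
import tempfile


def sha256_streaming(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file while holding at most one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a throwaway file a little over 3 MiB
payload = os.urandom(3 * 1024 * 1024 + 5)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

print(sha256_streaming(tmp.name))
os.remove(tmp.name)
```

Not every task can be streamed like this, but when it can, the largest payload stops being the thing that decides your memory limit.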


Incorrect Limits

Sometimes the application simply requires more memory than allocated.

For example:

limits:
  memory: 512Mi

while production usage averages:

750Mi

In this case Kubernetes is behaving exactly as configured.

The configuration is wrong.


.NET Applications

Many modern .NET applications can consume significant memory under load.

Common contributors include:

  • Large object heap growth
  • Heavy caching
  • Excessive serialization
  • Background processing

The application may perform perfectly in development but fail under production traffic.


Why Increasing Memory Is Not Always the Fix

The immediate reaction is usually:

limits:
  memory: 2Gi

Problem solved.

Or maybe not.

If a memory leak exists, the application will eventually consume:

2Gi
3Gi
4Gi

and fail again.

Increasing limits without understanding consumption patterns only delays the problem.

Always determine whether memory growth is expected or abnormal.
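The arithmetic is worth spelling out. With a steady leak, a bigger limit only buys time in proportion to the extra headroom. The sketch below assumes a constant leak rate, which real applications rarely have. The function `hours_until_limit`, the 600 MiB starting point and the 50 MiB/h rate are made-up numbers.

```python
def hours_until_limit(current_mib: float, limit_mib: float,
                      leak_mib_per_hour: float) -> float:
    """Hours before steady growth reaches the limit."""
    if leak_mib_per_hour <= 0:
        return float("inf")  # no growth, the limit is never reached
    return max(limit_mib - current_mib, 0) / leak_mib_per_hour


# Hypothetical workload: 600 MiB in use, leaking 50 MiB per hour
for limit_mib in (2048, 3072, 4096):
    hours = hours_until_limit(600, limit_mib, 50)
    print(f"limit {limit_mib} MiB -> next OOMKill in about {hours:.1f} h")
```

Every extra gigabyte in this example moves the next restart by roughly twenty hours, and the leak is still there.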


Monitoring OOMKills in AKS

Container Insights provides visibility into:

  • Memory trends
  • Pod restarts
  • Node pressure
  • Container consumption

Useful Kusto query:

KubePodInventory
| where ContainerStatusReason == "OOMKilled"
| project TimeGenerated, Namespace, PodName = Name, ContainerName
| order by TimeGenerated desc

This helps identify recurring offenders before they become production incidents.


Preventing OOMKilled Events

Right-Size Resources

Avoid guessing.

Measure actual workload consumption.

Use production metrics to determine realistic values.
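As a starting point, requests and limits can be derived from observed samples rather than intuition. The heuristic below (request at the 95th percentile, limit at the observed peak plus 20% headroom) is only one possible rule of thumb and is an assumption of this sketch, not an official recommendation. The sample values are invented.

```python
import math


def percentile(values: list[int], pct: int) -> int:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max((pct * len(ordered) + 99) // 100, 1)  # integer ceiling
    return ordered[rank - 1]


def suggest_memory(samples_mib: list[int], headroom_pct: int = 20) -> dict:
    """Request near typical high usage, limit above the observed peak."""
    peak = max(samples_mib)
    return {
        "request_mib": percentile(samples_mib, 95),
        "limit_mib": math.ceil(peak * (100 + headroom_pct) / 100),
    }


# Hypothetical production samples in MiB
samples = [600, 610, 620, 630, 640, 650, 660, 670, 680, 690,
           700, 710, 720, 730, 740, 750, 760, 770, 780, 900]

print(suggest_memory(samples))
```

Whatever rule you pick, revisit the numbers after each significant release, because memory profiles change along with the code.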


Configure Horizontal Pod Autoscaler

Scaling based on memory can help distribute workload.

Example:

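averageUtilization: 70

In the autoscaling/v2 API this field sits in the target block of a memory resource metric, next to type: Utilization. The older targetAverageUtilization spelling belonged to the removed autoscaling/v2beta1 API.

However, remember that autoscaling cannot fix memory leaks.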
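The scaling decision itself follows the formula given in the Kubernetes documentation: desired replicas = ceil(current replicas × current value / target value), with changes inside a tolerance band (0.1 by default) ignored. Here is a minimal sketch of that calculation. The function name `desired_replicas` is made up, and the real controller also handles unready pods, missing metrics and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Replica count from the HPA ratio formula, with a tolerance band."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, no change
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 90, 70))  # above target: scale out
print(desired_replicas(4, 72, 70))  # within tolerance: unchanged
print(desired_replicas(4, 35, 70))  # well below target: scale in
```

Note that utilization here is measured against the memory request, not the limit, so a pod can sit at a comfortable percentage of its request and still be close to its limit.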


Implement Resource Governance

Every workload should define both requests and limits, for example with the memory values used earlier:

resources:
  requests:
    memory: 512Mi
  limits:
    memory: 1Gi

Running without limits can allow a single application to consume excessive node memory and affect other workloads.


Perform Load Testing

Many memory-related issues only appear under production-like traffic.

Load testing reveals:

  • Memory spikes
  • Allocation patterns
  • Scaling behaviour

before customers encounter them.


Final Thoughts

When a pod is OOMKilled, Kubernetes is usually not the problem.

The platform is enforcing the limits you defined.

The real challenge is understanding why the application exceeded those limits.

Before increasing memory allocations, determine whether the issue is caused by workload growth, configuration mistakes, or application behaviour.

The most effective troubleshooting process is simple:

  1. Confirm the OOMKilled event.
  2. Measure actual memory consumption.
  3. Compare usage against configured limits.
  4. Identify memory growth patterns.
  5. Fix the root cause before increasing resources.

In production Kubernetes environments, memory issues are often harder to diagnose than CPU issues, but they are also among the most common causes of unexpected application restarts. Understanding how Kubernetes manages memory is one of the most valuable skills a platform engineer can develop.
