Riya Mittal

Kubernetes Admission Controllers Block Oversized Pods Before They Drain Your Budget

A pod with no CPU limit can consume every core on a 32-core node. It will pass your linter, pass your code review, and pass your CI pipeline. The first time you see it is on the cloud bill, three weeks after it deployed. Admission controllers fix this at the source.

OPA Gatekeeper and Kyverno sit inside the Kubernetes API server request path. They evaluate every create and update request against a set of policies before the object reaches etcd. A pod that violates a policy never gets scheduled. No compute consumed, no overspend, no post-incident cleanup.
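Both engines hook into this path through the standard admission registration API. You never write this object by hand when using Gatekeeper or Kyverno — each installs its own — but a minimal sketch shows what they register under the hood (the names, namespace, and path here are illustrative, not the engines' actual values):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: cost-policy-webhook          # illustrative name
webhooks:
  - name: pods.cost-policy.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail              # reject the request if the webhook is down
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
    clientConfig:
      service:
        namespace: policy-system     # hypothetical namespace
        name: cost-policy-svc        # hypothetical service
        path: /validate
```

The `rules` block is why a violating pod never reaches etcd: every CREATE and UPDATE on pods is routed to the policy service before persistence.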

The Pod That Ate Your Budget Passed Every Code Review

Cost problems in Kubernetes enter through three gaps: missing resource limits, missing cost allocation labels, and unpinned image tags. None of these trigger a compilation error. None fail a unit test. All three show up in your FinOps review.

Missing CPU and memory limits are the most expensive gap. A pod without a CPU limit runs in the Burstable or BestEffort QoS class, meaning the kubelet enforces no ceiling on how much CPU it can consume and its neighbors get no isolation guarantee. During a traffic spike, that pod expands to fill available capacity. We measured a single misconfigured batch job consuming 28 of 32 cores on a shared node for six hours, costing $14,000 in a single incident on a cluster that was otherwise well managed.
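A sketch of what this gap looks like in a manifest — requests are set, so the pod lands in the Burstable QoS class, but with no `limits` block the container has no CPU ceiling (names and registry are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job                     # illustrative name
spec:
  containers:
    - name: worker
      image: registry.example.com/batch:latest
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        # no limits block: nothing stops this container from
        # expanding to fill every free core on the node
```

Nothing in this spec is invalid, which is exactly why linters, reviews, and CI all pass it through.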


Missing cost labels compound over time. Without team, cost-center, and environment labels on every workload, 40 to 60% of your Kubernetes spend becomes unattributable. Chargeback and showback reporting breaks down when the underlying objects lack ownership metadata. Six months of unlabeled pods means six months of spend that cannot be allocated to a team budget or a product line.

Unpinned image tags introduce a different risk. Images tagged latest bypass reproducible build pipelines. The image running in production today may not be the image that runs after the next node restart. Snyk's 2023 container report found that 1 in 4 latest-tagged production images contained at least one unpatched critical CVE, because teams had no mechanism to detect when the base image changed under them.

What Admission Controllers Actually Intercept

Kubernetes has two admission webhook types. Mutating webhooks run first and can modify the incoming object. Validating webhooks run second and can only approve or reject. For cost governance, you use both.

A mutating webhook injects default resource requests when a developer omits them. This is the safe fallback: instead of rejecting a pod with no resource spec, you inject a sane default and let it through. The validating webhook then checks that the injected or explicitly set values fall within policy bounds.
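In Kyverno, this default-injection pattern is a `mutate` rule. A minimal sketch, assuming defaults of 100m CPU and 128Mi memory (policy name and default values are illustrative — the `+()` anchor adds a field only when it is missing, so explicit requests are never overwritten):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-resources        # illustrative name
spec:
  rules:
    - name: default-requests
      match:
        any:
          - resources:
              kinds: ["Pod"]
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              - (name): "*"            # conditional anchor: apply to every container
                resources:
                  requests:
                    +(cpu): "100m"     # +() injects only if the field is absent
                    +(memory): "128Mi"
```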

The sequence matters. Mutating before validating means developers with missing specs get defaults, not rejections. Developers who explicitly request 64 CPU cores get a rejection with a clear error message explaining the limit. This distinction reduces noise tickets while still enforcing ceilings.

Admission webhook latency is under 10ms for most policies at production scale. After a pod starts, the webhook has zero runtime overhead. The cost checkpoint runs once at admission, not on every pod heartbeat.

Three Policies That Pay for Themselves

These three policies cover the most common sources of Kubernetes cost waste. Each can be implemented in OPA Gatekeeper or Kyverno. Kyverno requires 60 to 70% fewer lines of configuration for the same rule, making it faster to adopt for teams new to policy engines.

| Policy | What It Blocks | Cost Impact Per Violation | Implementation Effort |
| --- | --- | --- | --- |
| Resource limit ceiling | CPU requests above 4 cores, memory above 8Gi per container | $300-$2,000/month per violation | Low |
| Required cost labels | Pods missing team, cost-center, environment labels | Unattributable spend, chargeback failure | Low |
| No latest image tag | Containers using unpinned or :latest tags | Audit and remediation cost, CVE exposure | Low |

Resource limit ceiling. Set the ceiling at 4x your p99 observed usage for the workload type. For a typical API service with p99 CPU usage of 0.5 cores, the ceiling is 2 cores. This blocks outlier requests without rejecting legitimate high-memory workloads like Spark jobs, which you handle with a separate policy namespace. Right-sizing EKS node groups and admission ceiling policies work together: the ceiling prevents individual pods from defeating the right-sizing work at the node level.
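A Kyverno sketch of the ceiling policy, using the article's 4-core/8Gi bounds (Kyverno validate patterns accept `<=` operators on resource quantities; the policy name and message link are illustrative). Note that this pattern also rejects containers with no `limits` block at all, which pairs with the mutating default-injection rule described earlier:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: resource-limit-ceiling       # illustrative name
spec:
  validationFailureAction: Enforce
  rules:
    - name: cap-cpu-and-memory
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: >-
          CPU limit must be <=4 cores and memory <=8Gi per container.
          High-memory workloads belong in the exception namespace.
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "<=4"
                    memory: "<=8Gi"
```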

Required cost labels. The policy rejects any pod that does not carry all three labels: team, cost-center, and environment. The error message should include a link to the label documentation and the onboarding guide. Teams that implement tag governance at discovery time rather than at cleanup time reduce unattributed spend by 40% within 90 days.
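A Kyverno sketch of the label policy — the `"?*"` pattern requires the label to exist with a non-empty value (the policy name and documentation URL are hypothetical placeholders):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels          # illustrative name
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-ownership-labels
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: >-
          Pods must carry team, cost-center, and environment labels.
          See the platform label guide for valid values.
        pattern:
          metadata:
            labels:
              team: "?*"             # "?*" = present with a non-empty value
              cost-center: "?*"
              environment: "?*"
```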

No latest image tag. The policy checks the image field of each container spec and rejects any value ending in :latest or containing no tag at all. Untagged images default to latest in most container runtimes. The fix for developers is one line: pin the image to a SHA256 digest or a versioned tag. Cloud governance RBAC tooling enforces who can override this policy in specific namespaces for legitimate use cases.
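A Kyverno sketch covering both cases — untagged images and explicit `:latest` — split into two rules so each rejection carries a precise message (this mirrors the disallow-latest-tag policy pattern from the Kyverno policy library; the policy name is illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag          # illustrative name
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "An explicit image tag or digest is required."
        pattern:
          spec:
            containers:
              - image: "*:*"           # image must contain a tag or digest separator
    - name: reject-latest-tag
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "The ':latest' tag is not allowed; pin a version or SHA256 digest."
        pattern:
          spec:
            containers:
              - image: "!*:latest"     # negated pattern: reject any :latest image
```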


Rollout Without Breaking Production

Deploying admission policies to a running cluster requires a phased rollout. Skipping phases is how platform teams create P1 incidents.

The Deploy-Time Cost Governance rollout has three phases: audit, warn, enforce.


In audit mode, the policy runs but never rejects. Every violation is logged to the policy engine's audit log. Run audit mode for two weeks. At the end of week two, you have a complete list of every object in the cluster that would be rejected under enforcement. This is your blast radius.

In warn mode, the API server admits the object but annotates it with the policy violation. Developers see the warning in their deployment output. Most teams fix violations proactively when the warning appears, before enforcement starts. CPU throttling patterns surface in this phase for workloads that were previously unconstrained.

In enforce mode, violations are rejected. The error message must include the policy name, the specific violation, and a link to the fix. A rejection with a clear error message takes a developer 5 minutes to fix. A rejection with a cryptic error message creates a support ticket.
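In Gatekeeper, the phase is a single field on the constraint. A sketch assuming the `K8sRequiredLabels` constraint template from the Gatekeeper demo library is installed (the constraint name is illustrative) — the rollout is three edits to `enforcementAction`, nothing more:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-cost-labels          # illustrative name
spec:
  enforcementAction: dryrun          # phase 1: audit only
  # enforcementAction: warn          # phase 2: admit, but surface the violation
  # enforcementAction: deny          # phase 3: enforce
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    labels: ["team", "cost-center", "environment"]
```

Kyverno's equivalent knob is `validationFailureAction`, flipped from `Audit` to `Enforce` on the ClusterPolicy.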

Measuring the Financial Return

The Deploy-Time Cost Governance Scorecard tracks three numbers before and 90 days after enforcement begins.

| Metric | Baseline (Pre-Enforcement) | 90-Day Target |
| --- | --- | --- |
| Unattributed Kubernetes spend | 45-60% of total | Under 15% |
| Workloads exceeding resource ceiling | 8-12% of pods | Under 1% |
| Workloads using latest image tag | 15-25% of containers | Under 2% |
| Wasted compute (idle reserved capacity) | Measured at baseline | 23-37% reduction |

The unattributed spend metric is the most important for FinOps teams. Before enforcement, label violations accumulate silently. After enforcement, every new workload carries ownership metadata, and the unattributed percentage drops steadily as old unlabeled workloads are replaced or updated.

Wasted compute reduction averages 23% within 90 days across clusters that enforce resource ceilings. The mechanism is direct: pods that previously consumed 8 cores with no limit now run within a 4-core ceiling, releasing capacity that the autoscaler no longer needs to provision. Autonomous cloud cost remediation can act on these signals automatically once the policy layer provides clean, labeled cost data.

The ceiling policy works because it forces the conversation about resource requirements to happen before deployment rather than during incident response. A developer who requests 16 cores for a new service has to justify it to the platform team at review time, not to the finance team three months later when the bill arrives.
