Ivan Porta for Todea

Posted on May 19 • Originally published at todea.co.kr

Kubecost Explained: Kubernetes FinOps That Moves the Bill

#kubecost #finops #kubernetes #sre

Most platform teams are familiar with Kubernetes costs. The monthly cloud bill arrives, finance asks why it’s higher, and engineering can only respond with “more workloads.” This gap between what the bill shows and what platform teams can explain is exactly what FinOps aims to close. The real question is whether you need an enterprise platform for this. For most teams, the answer is no. OpenCost and Kubecost can give platform teams the visibility they need, as long as the tool is paired with an operating cadence.

The pressure is real

Kubernetes accounting is no longer just about a single cluster or a single cloud. Most teams now manage fleets of clusters, often across multiple cloud providers, and sometimes combine on-premises control planes with managed services. Containers move between nodes, nodes move between zones, and the same workload might run in several regions to meet latency or compliance needs, all of which makes cost attribution even harder.

Industry data has shown the same problem for years. Back in 2021, a CNCF FinOps survey found that most teams couldn’t reliably measure their Kubernetes spending, with over-provisioning and lack of accountability as the main issues. A 2025 fleet-telemetry benchmark told the same story: using real cluster data from over 2,100 organizations on AWS, GCP, and Azure, it showed average CPU utilization at 10% and memory utilization at 23%. These numbers come from production telemetry, not just survey responses. The problem hasn’t changed in four years; if anything, it’s become clearer as fleets have grown.

The real issue isn’t that Kubernetes is too expensive. It’s that most teams don’t have enough visibility into what they’re spending.

What Kubecost actually is

Kubecost is a Kubernetes platform for cost allocation, optimization, and governance. Acquired by IBM in 2024, it offers both open-source and enterprise versions. At its core, it sits on top of OpenCost and runs as an in-cluster agent stack with several microservices: data collectors, a cloud-cost ingestor, a forecasting service, a per-node network-costs DaemonSet, and a fast ClickHouse-backed aggregator. By combining Kubernetes telemetry with cloud-provider billing data, Kubecost provides detailed cost allocation views, and accuracy improves when cloud billing integrations are configured and reconciled.

The platform is built around three primary pillars:

Cost allocation rolls spend up by namespace, label, service, workload, or Collection. Collections are a grouping concept added in Kubecost v3 that bundles Kubernetes and external cloud-side costs into a single deduplicated unit. Costs are tracked across CPU, memory, persistent volumes, GPUs, and network traffic.
Optimization recommendations suggest right-sized requests based on real usage, propose cheaper node types, and let teams configure quantile-based controls instead of accepting a one-size default. Recommendations can be archived for historical reference and exported as CSV or PDF.
Alerts and governance ship as configurable budget actions, scheduled reports, and Slack/email notifications. Alerts live next to the budget they belong to rather than as a separate alerting subsystem.

How a Kubecost install flows

Cost data collection begins inside your clusters. When it starts, the FinOps agent sets up a watch on the Kubernetes API for the pricing ConfigMap, so any custom pricing rules can be applied without a restart. It also resolves node pricing per node (falling back to a computed value when the cloud provider's price is unavailable). It then collects metrics from each Network Costs DaemonSet pod, creates a new binary snapshot, and, if configured, writes it to external storage such as Azure Blob Storage or AWS S3.

The Network Costs DaemonSet subscribes to the kernel's conntrack table via a netlink socket, parses each flow's per-direction byte and packet counters, and maintains an in-memory map of pod, node, service, and endpoint state via Kubernetes API watches. It uses this map to link observed connections to specific workloads.

While these agents send internal snapshots to shared storage, the Cloud Cost Ingestor manages external financial data. It runs on a schedule, connects to cloud provider billing exports, pulls daily CSVs, and backfills historical data. Because cloud providers release billing data with a several-hour delay, Kubecost reconciles cluster data with cloud billing after a short wait. This means the most recent day or two of cost data is only an estimate, while older data is fully reconciled (assuming a working cloud-billing integration).
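The practical consequence of the reconciliation delay can be sketched as a small helper that labels each day of cost data. This is an illustration only: the two-day lag is an assumption taken from the "day or two" above, and `cost_status` is a hypothetical name, not a Kubecost API.

```python
from datetime import date, timedelta

# Assumption: treat the most recent two days as unreconciled estimates.
RECONCILIATION_LAG = timedelta(days=2)

def cost_status(cost_day: date, today: date) -> str:
    """Label a day of cost data as 'estimated' or 'reconciled'."""
    if today - cost_day <= RECONCILIATION_LAG:
        return "estimated"
    return "reconciled"

today = date(2025, 5, 19)
print(cost_status(date(2025, 5, 18), today))  # recent day, still an estimate
print(cost_status(date(2025, 5, 10), today))  # older day, reconciled
```

A report built on this kind of flag can show recent days with a visual caveat instead of hiding them.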

The Aggregator is the main engine and uses an embedded ClickHouse database. In a multi-cluster deployment, it fans in snapshots from several agent clusters. It checks multiple ConfigMaps for configuration and falls back to defaults when none are present. It ingests agent snapshots and, when configured, external billing CSVs, then drives them through a multi-stage SQL pipeline that reconciles and de-duplicates overlapping costs and produces the final cost tables that other microservices consume. The Aggregator also manages data retention by setting per-table, per-resolution TTLs in ClickHouse, so fine-grained windows expire within days while rollups are kept for weeks or months.

Finally, the Forecasting Service uses this data to generate cost forecasts, while the Cluster Controller uses the Aggregator’s optimization insights to take actions, such as applying right-sizing recommendations directly in the cluster.

Allocate to a real owner first

Kubecost spreads node costs across the pods running on each node, typically weighted by resource requests, and rolls the result up by namespace, label, service, workload, or Collection, covering CPU, memory, PV, GPU, and network.
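To make the idea concrete, here is a minimal sketch of request-weighted allocation for a single node. It is a deliberate simplification and not Kubecost's actual algorithm: it weights by CPU requests only, and the function name, the `__idle__` key, and the sample prices are all hypothetical.

```python
from collections import defaultdict

def allocate_node_cost(node_hourly_cost, allocatable_mcpu, pods):
    """Split one node's hourly cost across namespaces by CPU request.

    pods: list of (namespace, cpu_request_millicores) tuples.
    Capacity that no pod requested is reported under "__idle__".
    """
    by_namespace = defaultdict(float)
    requested = 0
    for namespace, request_mcpu in pods:
        by_namespace[namespace] += node_hourly_cost * request_mcpu / allocatable_mcpu
        requested += request_mcpu
    unrequested = max(allocatable_mcpu - requested, 0)
    by_namespace["__idle__"] = node_hourly_cost * unrequested / allocatable_mcpu
    return dict(by_namespace)

# A $0.40/hour node with 4000 allocatable millicores and three pods.
costs = allocate_node_cost(
    0.40, 4000, [("checkout", 1000), ("checkout", 500), ("search", 500)]
)
for namespace, cost in costs.items():
    print(f"{namespace}: ${cost:.2f}/hour")
```

In this example half of the node's cost lands in the idle bucket, which is the number a namespace owner cannot see from the cloud bill alone.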

What makes this approach effective is good organization, not just technical setup. Each namespace should match a real owner, like a team, product, or department. When this mapping is in place, the allocation view shows which team is responsible for each cost, without needing a spreadsheet. Teams that already use labels like team, cost-center, or product can use these for the same purpose, and Collections help make label-based views easy to use.

Labels can change over time, namespaces multiply, and someone will eventually deploy into default. Reviewing the unlabeled bucket each week helps keep the data accurate and useful.

Right-size requests against actual usage

This is the lever that moves the bill the most. Often, developers set CPU and memory requests defensively, never revisit them, and the gap between request and use shows up directly on the invoice. The 10%-CPU / 23%-memory benchmark cited above is a useful reference point when finance asks for a number.

A pattern that reliably finds savings: plot requested vs. actual CPU and memory per workload over a few weeks, then walk each workload's request down to what it actually uses. One practical case from a service mesh deployment had proxy sidecars set to 100 millicores each. The node could host roughly 200 pods on paper, but the scheduler exhausted allocatable CPU at around 90 pods because every pod carried a 100-millicore sidecar request on top of its own. After the request-rightsizing pass, pod density per node tripled with no application change and no node fleet change.

Kubecost 3.0 makes this loop tighter. Container Request Sizing Insights show usage visualizations directly in the UI, recommendations can be archived, CSV/PDF exports include labels, and quantile-based controls let you set tighter recommendation percentiles for predictable services and looser ones for bursty workloads. The enterprise tier adds an Automated Container Request Sizing UI that operates across clusters with custom profiles, suspension controls, audit history, and a comparison between recommended and realized savings; the open-source tier gets a free allowance up to 250 cores on EKS primary clusters.

Using a percentile-based recommendation policy is usually the most effective approach in production.

Set the CPU request to the 90th percentile of actual CPU usage from the past week or month, and then add a safety margin. The Kubernetes VPA default is about 15% for CPU. Because CPU is time-shared, the kernel lets bursty workloads use extra capacity when it is available. Adding more padding for rare spikes usually just increases requests without much benefit.
Set the memory request to about the 90th percentile of peak usage, and then add a safety margin. Memory is not time-shared, so going over the limit can cause OOM kills instead of graceful degradation. The VPA default is about 20% for memory.
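The percentile-plus-margin policy can be sketched in a few lines. This is an illustration under stated assumptions, not the VPA or Kubecost implementation: it uses a simple nearest-rank percentile, integer samples (millicores or MiB), and the 15% and 20% margins quoted above; the function names and sample data are hypothetical.

```python
def percentile(samples, q):
    """Nearest-rank percentile of integer samples; q is a whole number 1-100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceiling of q% of n
    return ordered[rank - 1]

def recommend_request(samples, q=90, margin_pct=15):
    """Percentile of observed usage plus a safety margin, rounded up."""
    base = percentile(samples, q)
    return -(-base * (100 + margin_pct) // 100)  # integer ceiling

# Hypothetical usage samples: CPU in millicores, memory peaks in MiB.
cpu_usage = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
mem_peaks = [300, 320, 310, 400, 350, 330, 340, 360, 380, 900]

print(recommend_request(cpu_usage, q=90, margin_pct=15))  # CPU request in millicores
print(recommend_request(mem_peaks, q=90, margin_pct=20))  # memory request in MiB
```

Note that the single 900 MiB outlier does not drive the memory recommendation. That is the point of a percentile policy, and also the reason to review workloads with rare but legitimate spikes before applying it.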

These settings decide the QoS class. A pod is considered Guaranteed only if every container has CPU and memory requests set equal to their limits. If this is not the case for any container, the whole pod is treated as Burstable, or as BestEffort if no container sets any requests or limits. Requests sized from usage, with limits above them, put a pod in Burstable, which works well for most workloads. Reserve the fully Guaranteed class for critical workloads with strict latency SLAs. In those cases, you might waste some headroom, but you get the best eviction protection and, if needed, exclusive CPU pinning.
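The classification rules above can be written out as a short function. This is a simplified sketch, not the kubelet's code: it compares quantities as plain strings (so "500m" and "0.5" would not match), ignores init containers, and assumes unset requests default to the limit. The dict layout and function name are hypothetical.

```python
RESOURCES = ("cpu", "memory")

def qos_class(containers):
    """Classify a pod from its containers' requests and limits.

    containers: list of dicts with optional "requests" and "limits" dicts.
    """
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    for c in containers:
        limits = c.get("limits", {})
        requests = c.get("requests", {})
        for resource in RESOURCES:
            if resource not in limits:
                return "Burstable"
            # An unset request is treated as equal to the limit.
            if requests.get(resource, limits[resource]) != limits[resource]:
                return "Burstable"
    return "Guaranteed"

app = {
    "requests": {"cpu": "500m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "256Mi"},
}
sidecar_without_resources = {}

print(qos_class([app]))                             # Guaranteed
print(qos_class([app, sidecar_without_resources]))  # Burstable
```

The second call shows why injected sidecars matter: one container without matching requests and limits moves the whole pod out of Guaranteed.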

If you want additional feedback before automating any of this, you can run VPA in recommendation-only mode for several weeks, or use the open-source KRR tool. Comparing recommendations across KRR, VPA in recommendation mode, and Kubecost is more reliable than trusting any single tool's number.

Capacity-versus-request as the North Star metric

The ratio that tells you most about cluster efficiency is total pod requests ÷ total node allocatable capacity, across CPU and memory: how much of what you pay for is even claimed by a pod. It is also what Kubecost's request right-sizing is built on. Kubecost ships with several targets against which recommendations are computed: Production 0.65, Development 0.80, High Availability 0.50 (Cluster Right-Sizing API). Below the target, you carry capacity nothing asks for; above it, you've spent the headroom that cluster class should keep. Kubecost also picks the utilization it sizes against by context: on a one-day window, development uses the trending 85th percentile, production the 98th, and HA the 99.9th, while longer windows use maximum usage. The often-quoted "85th percentile" is just the development one-day default, not a universal setting.
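As a rough illustration, the ratio and its comparison against a profile target could be computed as follows. The target values are the ones quoted above; the 0.05 tolerance band, the function names, and the sample numbers are assumptions made for this sketch and are not part of Kubecost.

```python
# Targets as quoted in the text.
TARGETS = {"production": 0.65, "development": 0.80, "high-availability": 0.50}

def request_ratio(pod_requests, node_allocatable):
    """Total pod requests divided by total node allocatable capacity."""
    return sum(pod_requests) / sum(node_allocatable)

def compare_to_target(ratio, profile, tolerance=0.05):
    """Describe where a ratio sits relative to the profile target.

    The tolerance band is a hypothetical choice, not a Kubecost setting.
    """
    target = TARGETS[profile]
    if ratio < target - tolerance:
        return "below target: paying for capacity nothing requests"
    if ratio > target + tolerance:
        return "above target: headroom for this cluster class is spent"
    return "on target"

# Two nodes with 4000 allocatable millicores each; pods request 4000 in total.
ratio = request_ratio([2000, 1500, 500], [4000, 4000])
print(f"{ratio:.2f}", compare_to_target(ratio, "production"))
```

The same calculation applies to memory; tracking the two resources separately avoids one hiding the other.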

The ratio is what you watch; an autoscaler (Karpenter, Cluster Autoscaler) moves it — but only if requests are honest, which is why Kubecost's request right-sizing sits upstream of any autoscaling story. The autoscaler reacts to requests; Kubecost tells you whether they reflect reality.

Recent days are directional by design. Reconciliation needs a full day of billing data, so for a roughly 48-hour window costs stay at public on-demand pricing unless a node is provably not on-demand; Spot is accurate sooner only through a separately configured AWS Spot data feed (Cloud Billing Integrations). Read efficiency, which does not depend on reconciled pricing, separately from cost.

Node Group Sizing (formerly Cluster Right-Sizing, rebuilt in v3.0) turns this into an action: it analyzes in-cluster CPU, RAM, and GPU utilization against node capacity over a configurable window and recommends, per node group, changing the node count or switching the instance type. It runs from a preset profile or a custom metric (usage.max/p95/p85/avg or request.max/avg) with a target-utilization threshold per resource, never below average requested resources. It detects node groups by each provider's standard label, so it works across EKS, AKS, and GKE without setup (v3.x docs).

Find the always-on workloads that don't need to be

Once you’ve handled allocation and right-sizing, look at workloads that run all day, every day, even when they don’t have to. In one platform team’s review, 31% of workloads used less than 25% CPU for almost the entire day, yet Kubernetes costs still went up by about 18% over the year. This happened because engineers spent a lot of time tuning capacity and dealing with alerts, and because each team set up its own autoscaling rules differently.

The triage falls into three buckets. Production services that are genuinely over-spec’d belong in the right-sizing loop above. Non-production environments (dev, integration, demo) rarely need to run on weekends or overnight; a scheduled scale-to-zero is the highest-ROI change in this category. Batch and stateless workloads with retry tolerance are candidates for Spot instances, which trade a discount of up to roughly 90% for the risk of interruption on 30 seconds to two minutes of notice.
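A quick back-of-the-envelope calculation shows why scheduled scale-to-zero ranks so high for non-production environments. The schedule below (12 hours a day, weekdays only) is a hypothetical example, and the saving only materializes if the underlying nodes are actually released while the workloads are scaled down.

```python
HOURS_PER_WEEK = 7 * 24  # 168

def scheduled_savings_fraction(hours_per_day, days_per_week):
    """Fraction of always-on hours avoided by running only on a schedule."""
    running_hours = hours_per_day * days_per_week
    return 1 - running_hours / HOURS_PER_WEEK

# A dev environment that runs 12 hours a day, Monday to Friday.
fraction = scheduled_savings_fraction(hours_per_day=12, days_per_week=5)
print(f"{fraction:.0%} of always-on hours avoided")
```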

Kubecost helps you find underused workloads. With Kubecost 3.0’s Advanced Filters, you can quickly sort workloads by namespace, label, or service using AND/OR conditions right in the UI, instead of having to do it elsewhere.

Track commitment coverage and utilization separately

Reserved capacity and savings commitments are common sources of unnecessary cloud costs. Teams often either ignore them and pay full on-demand prices, or buy them and forget to check if they are being used, leaving discounts unused on resources that are no longer needed. There are two important metrics to watch, and they are easy to mix up:

Coverage means the portion of your regular usage that is protected by a commitment.
Utilization is how much of your commitment you actually use.
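Because the two metrics share a numerator, they are easy to confuse. The sketch below keeps them apart; the function names and the dollar-per-hour figures are hypothetical, and cloud providers may define the inputs slightly differently in their own reports.

```python
def coverage(commitment_used, eligible_usage):
    """Share of regular (commitment-eligible) usage paid through a commitment."""
    return commitment_used / eligible_usage

def utilization(commitment_used, commitment_total):
    """Share of the purchased commitment that is actually consumed."""
    return commitment_used / commitment_total

# Hypothetical hour: $100 of eligible usage, a $70 commitment, $63 of it consumed.
eligible_usage = 100.0
commitment_total = 70.0
commitment_used = 63.0

print(f"coverage:    {coverage(commitment_used, eligible_usage):.0%}")
print(f"utilization: {utilization(commitment_used, commitment_total):.0%}")
```

Low coverage with high utilization suggests buying more commitment; high coverage with low utilization means part of the commitment is paying for resources that no longer exist.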

Each cloud provider has different tools, but they all fit into three main types, and the calculations work the same way everywhere:

Mechanism	Typical max discount vs on-demand	Commitment
Flexible spend commitment	~60–66%	1 or 3 yr; hourly $ commitment; applies across families/regions
Instance-specific reservation	~55–72%	1 or 3 yr; locked to a region + instance family/SKU
Spot / preemptible	up to ~90%	none; interruption notice from ~30 sec to ~2 min

A good approach is to aim for commitment utilization between 80% and 95%, instead of trying to reach 100%. Going for 100% leaves no room for normal changes, like removing unused instances, changing instance types, or handling a drop in traffic. It may look efficient in a quarterly review, but it can cause problems day-to-day. For coverage, aiming for 60% to 75% is reasonable. This range is high enough to get a good discount, but low enough to allow for changes each quarter. These ranges are based on practical experience, not rules set by the cloud provider.

With Kubecost, costs are first estimated using public on-demand cloud provider prices until the actual cloud bill is ready. When the bill becomes available, usually within about 48 hours, Kubecost updates its estimates with the real costs. This update includes Reserved Instances, Savings Plans, committed-use discounts, and Spot pricing, along with any special rates you might have, such as Enterprise Discount Programs.

When to use a commercial FinOps platform instead

Most teams should start with the open-source chart. You can look at the commercial tiers later, once your needs grow.

Capability / Feature	Open-source Kubecost	Commercial FinOps platform
Cost allocation (namespace, label, workload)	✓	✓
Optimization recommendations (right-sizing)	✓ (manual application)	✓ + automated application across clusters
Cloud-billing reconciliation	✓ (basic)	✓ + EDP / RI / custom-discount aware
Multi-cluster aggregation	Manual / federation	✓ (built-in)
SSO, RBAC, audit log	Limited	✓
History retention	Limited by your storage layer	Long-term, vendor-managed
Collections (cloud + K8s dedup)	✓ (3.x)	✓
Automated Container Request Sizing UI	✕ (free tier limited)	✓
Quantile-based recommendation controls	✓ (3.x)	✓
Advanced filters (AND/OR)	✓ (3.x)	✓
Support / SLA	Community	Vendor SLA

If you just need per-namespace allocation, basic recommendations, and Slack alerts for a few clusters, the open-source version is enough. But if you manage many clusters across different clouds, need automated fixes, want vendor-managed history, or need to give your finance team detailed, reconciled discount numbers, then a commercial platform is worth it. The decision should be based on these needs, not just on how the dashboard looks.

Operational reality

ClickHouse and a unified agent replace the old stack. In v3, the 2.x DuckDB store is replaced with a ClickHouse database. This change makes allocation and cloud-cost API queries much faster and more reliable at scale. It also removes the need for Prometheus, which cuts down on memory use and makes deployment easier, while still providing OpenCost-standard metrics.
History is a deliberate choice, not a default. Whatever the storage backend, the retention window is the upper bound on the period-over-period reporting you can produce. Monthly reporting requires at least 30 days; year-over-year requires a year. Tier cold data to object storage once on-cluster retention costs more than the engineering time needed to set up the tiered pipeline.
Reconciliation lag is structural. The 24–48 hour billing-reconciliation delay is a property of cloud-provider billing exports, not of Kubecost. Build the operating model around it: argue about last week, not yesterday.
Multi-cluster needs a story. Open-source Kubecost can federate across clusters, but the experience is rougher than the commercial multi-cluster aggregator. Beyond five or six clusters, decide early whether to run per-cluster Kubecost and aggregate externally — into your own warehouse, for example — or pay for the commercial multi-cluster path. Either is defensible; drifting between the two is not.
The EKS add-on offers a quick way to get started. The Kubecost v3 free tier has a $100k USD spend limit over 30 days, while the Amazon EKS optimized Kubecost bundle is listed by AWS as exempt from that spend limit.
The operating model is the deliverable. If a team installs Kubecost but does not set up regular reviews, they will drift just like a team without any FinOps tools. The standard approach is to have a small FinOps group, such as a platform engineer, a finance analyst, and an SRE on rotation, meet each week to review the capacity-versus-request ratio, identify the most over-provisioned workloads, and check any namespace with a significant change in monthly cost. For smaller teams, a 30-minute review every two weeks with the platform engineer and CTO can achieve the same results.

A practical recommendation

If you are considering a FinOps approach for a Kubernetes platform and do not have a contractual obligation to choose a commercial product, start by piloting open-source Kubecost 3.x. Installation can be completed in an afternoon. Assign at least one namespace to a designated owner, provide a request-versus-usage dashboard to one team for two weeks, and share the capacity-versus-request ratio in a channel visible to the platform team. If regular reviews of these metrics become routine, you have achieved FinOps. If not, adopting a commercial platform will not resolve the underlying issues.
