What is OpenCost?
OpenCost is an open-source CNCF project that provides real-time cost monitoring for Kubernetes clusters. It breaks down your cloud bill by namespace, deployment, pod, and even container — showing exactly where your money goes.
Why OpenCost?
- 100% free and open-source — CNCF sandbox project
- Real-time cost allocation — not monthly bills, but live spending
- Multi-cloud — AWS, GCP, Azure, on-prem pricing
- Namespace-level — chargeback per team/service
- Prometheus integration — export cost metrics alongside performance data
- No vendor lock-in — unlike Kubecost Pro or CloudHealth
Quick Start
# Add the OpenCost Helm repository, then install the chart
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace \
  --set opencost.prometheus.internal.enabled=true
# Port-forward the API (9003) and the UI (9090)
kubectl port-forward -n opencost svc/opencost 9003:9003 9090:9090
# Open http://localhost:9090 for the UI; the API listens on http://localhost:9003
Query Costs via API
# Get cost allocation by namespace (last 24h)
curl -s 'http://localhost:9003/allocation/compute?window=24h&aggregate=namespace' | jq '.data[0]'
# Get cost by deployment (last 7 days)
curl -s 'http://localhost:9003/allocation/compute?window=7d&aggregate=deployment' | jq '.data'
# Get cost by label (e.g., team) over the last 30 days
curl -s 'http://localhost:9003/allocation/compute?window=30d&aggregate=label:team' | jq '.data'
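The same queries can be issued from a script. The following is a minimal Python sketch, not an official client: `BASE_URL`, `build_allocation_url`, and `fetch_allocations` are names made up for illustration, and the base URL is an assumption that has to match whichever local port you forwarded to the OpenCost API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the OpenCost API is reachable here through a local port-forward.
# Adjust the port to match your own kubectl port-forward command.
BASE_URL = "http://localhost:9003"


def build_allocation_url(window, aggregate, base_url=BASE_URL):
    """Return the /allocation/compute URL for a time window and aggregation key."""
    # urlencode escapes characters such as the colon in "label:team".
    query = urllib.parse.urlencode({"window": window, "aggregate": aggregate})
    return f"{base_url}/allocation/compute?{query}"


def fetch_allocations(window, aggregate, base_url=BASE_URL):
    """Call the API and return the decoded "data" list from the JSON response."""
    url = build_allocation_url(window, aggregate, base_url)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["data"]
```

Calling `fetch_allocations("24h", "namespace")` mirrors the first curl command; building the URL in a separate function keeps the query encoding testable without a running cluster.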
API Response Example
{
  "production": {
    "name": "production",
    "cpuCost": 45.23,
    "gpuCost": 0,
    "ramCost": 28.67,
    "pvCost": 12.40,
    "networkCost": 5.30,
    "totalCost": 91.60,
    "cpuEfficiency": 0.34,
    "ramEfficiency": 0.52
  }
}
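To show how this response can be consumed, here is a small Python sketch that ranks allocations by `totalCost` and checks that the listed cost components add up to the total. The sample data is copied from the example above; `COST_FIELDS` and `summarize` are illustrative names, and a real response may carry additional cost fields that this sketch ignores.

```python
# Sample allocation set copied from the response example in this article.
SAMPLE = {
    "production": {
        "name": "production",
        "cpuCost": 45.23,
        "gpuCost": 0,
        "ramCost": 28.67,
        "pvCost": 12.40,
        "networkCost": 5.30,
        "totalCost": 91.60,
        "cpuEfficiency": 0.34,
        "ramEfficiency": 0.52,
    }
}

# Assumption: only the component fields shown in the example are summed.
COST_FIELDS = ("cpuCost", "gpuCost", "ramCost", "pvCost", "networkCost")


def summarize(allocations):
    """Return one row per allocation, most expensive first."""
    rows = []
    for name, alloc in allocations.items():
        component_sum = round(sum(alloc.get(f, 0.0) for f in COST_FIELDS), 2)
        rows.append({
            "name": name,
            "totalCost": alloc["totalCost"],
            "componentSum": component_sum,
            # True when the components account for the reported total.
            "consistent": abs(component_sum - alloc["totalCost"]) < 0.01,
        })
    return sorted(rows, key=lambda row: row["totalCost"], reverse=True)


for row in summarize(SAMPLE):
    print(row["name"], row["totalCost"], row["consistent"])
```

For the sample, the components sum to 91.60, matching `totalCost`.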
Prometheus Metrics
# Grafana dashboard queries
# Metric names are illustrative; confirm them against the /metrics endpoint of your OpenCost version
# Total cluster cost, grouped by cluster
sum(opencost_cluster_cost_total) by (cluster)
# Cost by namespace
sum(opencost_allocation_cost_total) by (namespace)
# CPU efficiency (actual usage vs requested)
opencost_allocation_cpu_usage / opencost_allocation_cpu_request
# Idle resource cost (money spent on unused CPU and RAM)
sum(opencost_allocation_cpu_idle_cost + opencost_allocation_ram_idle_cost)
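The relationship between efficiency and idle cost can be approximated from the API fields alone. This Python sketch assumes, as a simplification that is not taken from OpenCost itself, that idle cost is roughly cost multiplied by (1 - efficiency); `idle_cost` and `wasted_spend` are hypothetical helper names.

```python
def idle_cost(cost, efficiency):
    """Estimate the share of a cost that is not backed by actual usage."""
    # Clamp to [0, 1]: usage can exceed requests, which would give efficiency > 1.
    efficiency = min(max(efficiency, 0.0), 1.0)
    return cost * (1.0 - efficiency)


def wasted_spend(alloc):
    """Sum the estimated idle CPU and RAM cost for one allocation, in dollars."""
    cpu_idle = idle_cost(alloc["cpuCost"], alloc["cpuEfficiency"])
    ram_idle = idle_cost(alloc["ramCost"], alloc["ramEfficiency"])
    return round(cpu_idle + ram_idle, 2)


# Figures from the "production" allocation in the response example.
production = {
    "cpuCost": 45.23,
    "cpuEfficiency": 0.34,
    "ramCost": 28.67,
    "ramEfficiency": 0.52,
}
print(wasted_spend(production))
```

With the example figures, about 29.85 of the CPU cost and 13.76 of the RAM cost are idle under this approximation, so close to half of the namespace's compute spend buys unused capacity.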
Set Up Alerts for Cost Spikes
# PrometheusRule for cost alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cost-alerts
spec:
  groups:
    - name: cost-alerts
      rules:
        - alert: NamespaceCostSpike
          # increase() over 1h gives dollars spent in the last hour; * 24 projects it to a day
          expr: |
            sum by (namespace) (increase(opencost_allocation_cost_total[1h])) * 24
              > 100
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} spending >$100/day"
        - alert: LowCPUEfficiency
          expr: |
            opencost_allocation_cpu_usage / opencost_allocation_cpu_request < 0.1
          for: 2h
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} using <10% of requested CPU"
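The NamespaceCostSpike rule compares a projected daily spend with a $100 threshold. The arithmetic behind that projection can be sketched in Python, assuming the cost metric behaves like a cumulative dollar counter sampled at the start and end of a window; `projected_daily_cost` and `should_alert` are illustrative names, not part of any OpenCost or Prometheus API.

```python
def projected_daily_cost(cost_start, cost_end, window_hours=1.0):
    """Extrapolate the spend observed over a window to a 24-hour figure."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    # A counter reset would make the difference negative; treat that as zero spend.
    increase = max(cost_end - cost_start, 0.0)
    return increase / window_hours * 24


def should_alert(cost_start, cost_end, threshold_per_day=100.0, window_hours=1.0):
    """Return True when the projected daily spend exceeds the threshold."""
    return projected_daily_cost(cost_start, cost_end, window_hours) > threshold_per_day


# $5 spent in one hour projects to $120/day, above the $100 threshold.
print(should_alert(10.0, 15.0))
# $2 spent in one hour projects to $48/day, below the threshold.
print(should_alert(10.0, 12.0))
```

A one-hour window reacts quickly but also amplifies short bursts 24 times, which is why the rule pairs it with a `for: 30m` hold before firing.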
OpenCost vs Alternatives
| Feature | OpenCost | Kubecost | CloudHealth | Spot.io |
|---|---|---|---|---|
| Pricing | Free | Free/Pro | Enterprise | Enterprise |
| Open source | Yes (CNCF) | Partial | No | No |
| Real-time | Yes | Yes | Hourly | Hourly |
| Multi-cloud | Yes | Yes | Yes | Yes |
| API | REST | REST | REST | REST |
| GPU costs | Yes | Pro only | Limited | No |
Real-World Impact
A SaaS company discovered through OpenCost that its staging namespace cost $2,100/month, almost as much as production. Investigation revealed 15 forgotten load tests still running and 8 dev deployments with production-sized resource requests. After cleanup, staging costs dropped to $340/month, a saving of about $21K/year.
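The annual figure follows directly from the two monthly costs. A short Python check, with `annual_savings` as an illustrative helper name:

```python
def annual_savings(before_monthly, after_monthly):
    """Return the yearly saving implied by a change in monthly cost."""
    return (before_monthly - after_monthly) * 12


# Staging dropped from $2,100/month to $340/month.
print(annual_savings(2100, 340))  # 21120, i.e. roughly $21K/year
```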
Overspending on Kubernetes? I help teams implement cost monitoring and right-sizing. Contact spinov001@gmail.com or explore my automation tools on Apify.
Top comments (3)
Alex, this is useful. One edge we keep seeing in tenant chargeback audits is that labels are present but retry hops silently rewrite originator identity, so totals look right while ownership is wrong.
Concrete pass/fail criterion we now use before calling attribution finance-safe:
PASS: every billed allocation row can be joined to an immutable attribution envelope (tenant_id, originator_id, workflow_id, operation_id, issuance_id) across async retries, with zero orphaned joins in a 24h sample.
FAIL: any charged row is missing one of those keys or changes originator_id after a retry.
Have you seen OpenCost API users add this retry-lineage gate, or are teams still treating namespace or label allocation as sufficient?
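The PASS/FAIL criterion in this comment can be sketched as a join check. This is a hypothetical Python illustration: the five key names come from the comment, but the row and envelope shapes, `ENVELOPE_KEYS`, and `audit_rows` are assumptions, and none of these fields appear in the OpenCost API response shown in the article.

```python
ENVELOPE_KEYS = ("tenant_id", "originator_id", "workflow_id", "operation_id", "issuance_id")


def audit_rows(rows, envelopes):
    """Return (row_index, reason) pairs for rows that fail the criterion.

    rows: billed allocation rows as dicts (hypothetical shape).
    envelopes: dict mapping issuance_id to the envelope recorded at issuance.
    An empty result means PASS for the sample.
    """
    failures = []
    for index, row in enumerate(rows):
        missing = [key for key in ENVELOPE_KEYS if not row.get(key)]
        if missing:
            failures.append((index, "missing keys: " + ", ".join(missing)))
            continue
        envelope = envelopes.get(row["issuance_id"])
        if envelope is None:
            failures.append((index, "orphaned join"))
            continue
        # A retry must not change who originated the work.
        if envelope["originator_id"] != row["originator_id"]:
            failures.append((index, "originator_id changed"))
    return failures


envelope = {"tenant_id": "t1", "originator_id": "alice", "workflow_id": "w1",
            "operation_id": "op1", "issuance_id": "i1"}
good_row = dict(envelope, totalCost=1.25)
rewritten_row = dict(envelope, originator_id="retry-worker", totalCost=0.75)
print(audit_rows([good_row, rewritten_row], {"i1": envelope}))
```

Here the second row fails because its `originator_id` differs from the envelope recorded at issuance, which is the silent-rewrite case the comment describes.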
Alex, implementation question on the "shows exactly where your money goes" claim: when an async retry hop rewrites caller context, where do you enforce originator identity integrity before any destructive cost write/backfill?
We found label-complete rows can still mis-attribute ownership unless each billed allocation row is joinable to an immutable envelope (tenant_id, originator_id, workflow_id, operation_id, issuance_id) with HMAC verification at the destructive call-site. Have you validated that join in your OpenCost setup, or is it an audit caveat?
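As one possible reading of "HMAC verification at the destructive call-site", the envelope can be signed when it is issued and verified before any cost write. This Python sketch uses only the standard `hmac`, `hashlib`, and `json` modules; the canonical JSON encoding, the key handling, and the function names are assumptions rather than anything OpenCost provides.

```python
import hashlib
import hmac
import json


def sign_envelope(envelope, key):
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    # Sorted keys and fixed separators make the encoding independent of dict order.
    payload = json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_envelope(envelope, signature, key):
    """Return True only if the envelope still matches its signature."""
    expected = sign_envelope(envelope, key)
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(expected, signature)


# Hypothetical key; in practice it would come from a secret store.
SECRET = b"example-signing-key"
issued = {"tenant_id": "t1", "originator_id": "alice", "workflow_id": "w1",
          "operation_id": "op1", "issuance_id": "i1"}
signature = sign_envelope(issued, SECRET)

tampered = dict(issued, originator_id="retry-worker")
print(verify_envelope(issued, signature, SECRET))
print(verify_envelope(tampered, signature, SECRET))
```

Verification fails for the tampered copy, so a retry hop that rewrites `originator_id` would be rejected before the write rather than found later in an audit.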
Source check from OpenCost issue #3211 (shaunster666 + patsevanton): both report "did not find allocations for asset key ... pvc-*" while the PVC is still Bound via kubectl output.
Question for practitioners here: when you see this pattern, is it usually an ingest/data-shape mismatch (labels/joins missing at allocation time), or an allocator-side resolver gap for PVC asset-key -> workload linkage?
I am trying to separate data-shape drift from a real attribution bug before treating tenant chargeback outputs as reliable.