John Ajera

EKS Metrics: Amazon Managed Prometheus vs Self-Managed Prometheus


Once your cluster is running workloads, you need a metrics backend: something that scrapes (or receives) time series, stores them, and powers dashboards and alerts. On AWS the fork is usually Amazon Managed Service for Prometheus (AMP)—a managed, Prometheus-compatible store—or self-managed Prometheus in the cluster (Helm chart, operator, or agent + remote storage).

This article is a practical decision guide for that choice on EKS, especially when you are not migrating a decade of PromQL dashboards and recording rules. It covers what each path optimizes for, how cost shows up on the invoice versus in engineer time, how alerting differs, and a short rubric for defaulting to AMP, self-managed, or a hybrid (remote_write).


1. Overview

This guide helps you decide, for a new or early EKS observability stack:

  • What AMP and self-managed Prometheus each own (ingest, storage, query, alerting)
  • How to compare cost at a high level (ingestion, storage, queries, and cluster resources)
  • How to measure or estimate active series and ingestion before you pick a tier
  • What you still run in-cluster either way (exporters, scrape config, ADOT)
  • When a hybrid (Prometheus in-cluster → AMP via remote_write) is the least painful path

2. Prerequisites


3. Name the two starting paths

Both paths speak Prometheus (PromQL, exposition format, alert rule semantics). The split is who runs the TSDB and query API.

Side-by-side

| | Amazon Managed Prometheus (AMP) | Self-managed Prometheus (in-cluster) |
| --- | --- | --- |
| AWS runs | Ingestion pipeline, durable TSDB, query plane (serverless-style ops) | (none) |
| You run | Collectors/agents, scrape or receive config, IAM, workspace wiring | Prometheus server (often StatefulSet), PVCs, upgrades, HA, backups |
| Typical ingest | AWS Distro for OpenTelemetry (ADOT) collector, Prometheus agent mode, or remote_write from an in-cluster server | In-cluster scrape of ServiceMonitors, Pod annotations, static targets |
| Alerting | AMP managed Alertmanager and/or route to SNS; Grafana alerting is a separate choice | Alertmanager you deploy (often same Helm release or a sibling chart) |
| Dashboards | Often Amazon Managed Grafana or self-hosted Grafana with AMP as datasource | Grafana (or similar) pointing at in-cluster Prometheus Service |
| Multi-cluster | Natural fit: one workspace per env/region or federation patterns with less TSDB ops | Per-cluster Prometheus + optional Thanos/Mimir if you outgrow one server |

AMP — responsibility flow

+-------------------------------+
| EKS: workloads + exporters    |
| (node-exporter, kube-state,   |
|  app /metrics, ADOT collector)|
+-------------------------------+
              |
              | remote_write / ADOT pipeline
              v
+-------------------------------+
| AWS: AMP workspace (TSDB +    |
| PromQL query API)             |
+-------------------------------+
              |
              v
+-------------------------------+
| You: Grafana / AMP Alertmgr / |
| SNS routes, dashboards, SLOs  |
+-------------------------------+

Self-managed — responsibility flow

+-------------------------------+
| EKS: Prometheus server        |
| (scrape+TSDB on PVC/emptyDir) |
+-------------------------------+
              |
              v
+-------------------------------+
| EKS: Alertmanager (optional)  |
| receivers → Slack/PagerDuty   |
+-------------------------------+
              |
              v
+-------------------------------+
| You: Grafana, rule lifecycle, |
| upgrades, capacity, backups   |
+-------------------------------+

Same for both: You still own what gets scraped, label cardinality, RBAC for the UI, and runbooks when alerts fire. Picking AMP does not remove the need for good metric and alert design.


4. Cost at a glance

Pricing moves; verify against AMP pricing and your EC2/EBS bill for self-managed. Think in three buckets: ingestion, storage/retention, queries (AMP) versus compute + disk + people (self-managed).

AMP (managed)

| Cost driver | What drives it up |
| --- | --- |
| Ingested samples | Short scrape intervals, high-cardinality labels, many targets |
| Storage | Long retention, high churn, many series |
| Queries | Heavy Grafana dashboards, ad-hoc PromQL, recording rules evaluated in AMP |

AMP can be cheap at small scale and expensive when cardinality explodes (unbounded pod labels, high-cardinality HTTP paths, per-UUID labels). Cost guardrails (sampling, relabel drops, allowed metric lists) matter more than on a single dev cluster where you only notice disk growth.

Self-managed (in-cluster)

| Cost driver | What drives it up |
| --- | --- |
| EC2 / Karpenter nodes | CPU/memory for Prometheus replicas and rule evaluation |
| EBS (or equivalent) | TSDB size × retention; compactions and WAL |
| Engineer time | Chart upgrades, PVC expansion, HA drills, backup/restore testing |

A minimal HA self-managed stack (two Prometheus replicas, anti-affinity, 20–50 Gi PVCs each, small Alertmanager) is often tens of dollars per month in AWS resources for a small cluster—before counting on-call and upgrade work. The invoice line is predictable; the hidden cost is operational.
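To sanity-check the PVC sizes mentioned above, the Prometheus storage documentation gives a rule of thumb: disk needed is roughly retention seconds × ingested samples per second × bytes per sample, at around 1 to 2 bytes per sample after compression. The sketch below only wraps that arithmetic. The function name, the 15-day retention, and the 2-byte figure are illustrative assumptions, and WAL or compaction headroom is not included.

```python
def tsdb_disk_bytes(active_series: int,
                    scrape_interval_s: float,
                    retention_days: float,
                    bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB disk need: retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical cluster: 25,000 active series, 30s scrape, 15 days of retention.
    need = tsdb_disk_bytes(25_000, 30, 15)
    print(f"~{need / 1e9:.1f} GB per replica, before WAL and compaction headroom")
```

Each HA replica keeps its own full copy, so the per-replica figure is what a single PVC has to hold.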

Bottom line: AMP trades variable, usage-based spend for less TSDB operations. Self-managed trades fixed-ish infra cost for more control and more chores. Neither removes the need to design metrics carefully.


5. How to check and estimate your usage

Cost conversations are easier when you separate active series (how many distinct time series the TSDB holds) from ingestion rate (how many samples per second you push—what AMP bills most directly).

What counts as one series

A time series is one metric name plus one unique label set. Example: http_requests_total{method="GET",status="200",pod="abc"} is one series. More pods, paths, or IDs in labels → more series.
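A small sketch can make the "name plus label set" rule concrete. It counts distinct series in a made-up scrape; the helper names and the sample data are hypothetical, not part of any Prometheus client library.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """A series is identified by its metric name plus its full label set."""
    return (name, frozenset(labels.items()))


def count_series(samples: Iterable[Tuple[str, Dict[str, str]]]) -> int:
    """Count distinct series among (metric_name, labels) pairs."""
    seen: Set[SeriesKey] = set()
    for name, labels in samples:
        seen.add(series_key(name, labels))
    return len(seen)


if __name__ == "__main__":
    # Hypothetical scrape output: 3 pods x 2 methods x 2 status codes.
    samples = [
        ("http_requests_total", {"method": m, "status": s, "pod": p})
        for p in ("abc", "def", "ghi")
        for m in ("GET", "POST")
        for s in ("200", "500")
    ]
    print(count_series(samples))  # 12 distinct series from a single metric name
```

Label order does not matter, only the set of name/value pairs, which is why every extra label value multiplies the count.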

Rough scale tiers (planning only)

These are order-of-magnitude guides, not limits:

| Tier | Active series (ballpark) | Typical EKS picture |
| --- | --- | --- |
| Small | ~1k–5k | Minimal scrape (little kube-state-metrics, no full cAdvisor), or aggressive metric_relabel_configs drops |
| Medium | ~10k–30k | node-exporter + kube-state-metrics + kubelet/cAdvisor + apiserver on a small cluster (handful of nodes, tens of pods) |
| Large | 50k+ | Full default chart scrape, many microservices, per-path HTTP metrics, or duplicate exporters |

A standard platform scrape (kube-state-metrics, kubernetes-nodes-cadvisor, apiserver, node-exporter) on a small cluster often lands above 10k even when it still feels “small” operationally—measure rather than assuming the 1k tier.

Turn series into ingestion (AMP-style math)

At a steady scrape interval, each active series produces about one sample per scrape:

samples_per_second ≈ active_series ÷ scrape_interval_seconds

Examples at 30s scrape:

| Active series | ~samples/s | ~samples/month (30 days) |
| --- | --- | --- |
| 1,000 | ~33 | ~86M |
| 10,000 | ~333 | ~864M |
| 25,000 | ~833 | ~2.2B |
| 50,000 | ~1,667 | ~4.3B |

Use 86,400 × 30 seconds per month for back-of-envelope planning. If jobs use different intervals, sum per job or use a weighted average.
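The same arithmetic, including the "sum per job" case for mixed scrape intervals, fits in a few lines. This is a planning sketch only; the function names and the mixed-interval example are hypothetical.

```python
from typing import List, Tuple

SECONDS_PER_MONTH = 86_400 * 30  # 30-day month for back-of-envelope planning


def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Each active series yields about one sample per scrape."""
    return active_series / scrape_interval_s


def samples_per_month(jobs: List[Tuple[int, float]]) -> float:
    """Sum per-job ingestion; each job is (active_series, scrape_interval_s)."""
    return sum(samples_per_second(n, i) for n, i in jobs) * SECONDS_PER_MONTH


if __name__ == "__main__":
    for series in (1_000, 10_000, 25_000, 50_000):
        monthly = samples_per_month([(series, 30)])
        print(f"{series:>6} series -> {monthly / 1e6:,.0f}M samples/month")
    # Hypothetical mixed intervals: 20k series at 30s plus 5k series at 15s.
    mixed = samples_per_month([(20_000, 30), (5_000, 15)])
    print(f"mixed -> {mixed / 1e9:.2f}B samples/month")
```

Halving the scrape interval doubles ingestion for that job, so the 5k series at 15s cost as much as 10k series at 30s.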

Back-of-envelope before deploy

List scrape jobs and estimate series per target:

| Source | How to estimate |
| --- | --- |
| node-exporter | ~400–1,000 series × node count (disks and NICs add labels) |
| kube-state-metrics | ~20–40 series × pod count, plus Deployments, PVCs, HPA, PDB, … |
| kubelet / cAdvisor (kubernetes-nodes-cadvisor) | Often the largest bucket; scales with containers, not just nodes |
| apiserver | Often thousands of series (histogram buckets and verb/resource labels) |
| Application /metrics | Highly variable; histograms and high-cardinality labels dominate |

active_series ≈ Σ (targets × series_per_target)   # plus a little for Prometheus self-metrics

Churn (pods created and destroyed) affects AMP storage more than steady active series; dynamic fleets need headroom.
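The sum over scrape jobs can be sketched as a small inventory. Every number below is a hypothetical planning figure for an imagined 5-node, 60-pod cluster (roughly the midpoints of the ranges in the table, plus guesses for cAdvisor, application metrics, and Prometheus self-metrics), not a measurement.

```python
from typing import Dict, Tuple

# job -> (target_count, assumed series per target); all values are hypothetical.
INVENTORY: Dict[str, Tuple[int, int]] = {
    "node-exporter": (5, 700),        # per node
    "kube-state-metrics": (60, 30),   # per pod; other object kinds add more
    "cadvisor": (120, 50),            # per container, an assumed figure
    "apiserver": (1, 3000),
    "app-metrics": (20, 150),         # per application pod, highly variable
}


def estimate_active_series(inventory: Dict[str, Tuple[int, int]],
                           self_metrics: int = 1000) -> int:
    """active_series ~= sum(targets * series_per_target) + Prometheus self-metrics."""
    return sum(targets * per_target for targets, per_target in inventory.values()) + self_metrics


if __name__ == "__main__":
    ranked = sorted(INVENTORY.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
    for job, (targets, per_target) in ranked:
        print(f"{job:<20} {targets * per_target:>7}")
    print(f"{'total (estimated)':<20} {estimate_active_series(INVENTORY):>7}")
```

With these assumed inputs the estimate lands in the medium tier, which is consistent with the warning that a standard platform scrape rarely fits the 1k tier.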

How to check (self-managed Prometheus already running)

Port-forward to the Prometheus server (adjust namespace and service name to match your install):

kubectl -n prometheus port-forward svc/prometheus-server 9090:80

Headline count — TSDB stats API:

curl -s http://127.0.0.1:9090/api/v1/status/tsdb | jq '.data.headStats.numSeries'

Same value in the UI: Status → TSDB Status. The metric prometheus_tsdb_head_series tracks it over time.

Top metrics by cardinality (can be expensive on very large TSDBs—use in non-prod or off-peak):

curl -sG 'http://127.0.0.1:9090/api/v1/query' \
  --data-urlencode 'query=topk(20, count by (__name__) ({__name__=~".+"}))' \
  | jq '.data.result[] | {metric: .metric.__name__, series: .value[1]}'

Top scrape jobs (find what to trim):

curl -sG 'http://127.0.0.1:9090/api/v1/query' \
  --data-urlencode 'query=topk(10, count by (job) ({__name__=~".+"}))' \
  | jq '.data.result[] | {job: .metric.job, series: .value[1]}'

In the UI: Status → Targets shows per-target health, last scrape time, and scrape duration. For per-target sample counts, query the scrape_samples_scraped metric; both are useful when a single job spikes.
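If you would rather script these checks than chain curl and jq, the same two API calls can be wrapped in a short standard-library script. This is a sketch under assumptions: it expects the port-forward on 127.0.0.1:9090 and a recent Prometheus whose TSDB status response nests the head series count under headStats. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Optional, Tuple

BASE_URL = "http://127.0.0.1:9090"  # assumes a local port-forward to the Prometheus server


def get_json(path: str, params: Optional[Dict[str, str]] = None) -> dict:
    """GET a Prometheus HTTP API path and decode the JSON body."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def head_series(tsdb_status: dict) -> int:
    """Extract the head series count from an /api/v1/status/tsdb response."""
    return int(tsdb_status["data"]["headStats"]["numSeries"])


def series_by_label(query_result: dict, label: str) -> List[Tuple[str, int]]:
    """Turn an instant-vector response into (label_value, series_count), largest first."""
    rows = [(r["metric"].get(label, ""), int(r["value"][1]))
            for r in query_result["data"]["result"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    print("head series:", head_series(get_json("/api/v1/status/tsdb")))
    top_jobs = get_json("/api/v1/query",
                        {"query": 'topk(10, count by (job) ({__name__=~".+"}))'})
    for job, count in series_by_label(top_jobs, "job"):
        print(f"{count:>8}  {job}")
```

Keeping the parsing separate from the HTTP call makes it easy to run the same functions against saved responses from several clusters.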

How to check (AMP)

  • CloudWatch metrics on the AMP workspace (ingestion and active series—see Monitor AMP).
  • AWS console → AMP workspace usage views for growth over time.
  • If you use ADOT or remote_write, the sender (collector or in-cluster Prometheus) still exposes scrape stats—debug cardinality at the source before it hits the workspace.

What usually blows up series count

  1. High-cardinality labels: pod, url, trace_id, or an unbounded user_id on high-volume metrics.
  2. Duplicate scrape: the EKS add-on node-exporter plus a Helm-installed node-exporter, or two Prometheus replicas that each scrape and forward everything when AMP only needs one copy.
  3. Histograms — each bucket, plus _sum and _count, is multiple series per logical metric.
  4. Full cAdvisor / apiserver defaults with no relabel drops.
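The histogram item in the list above is easy to underestimate, so here is a rough upper-bound calculation. It assumes every combination of label values actually occurs (a full cross product), which real workloads rarely reach; the bucket count and label cardinalities are made-up inputs.

```python
from math import prod
from typing import Dict


def histogram_series(finite_buckets: int, label_cardinality: Dict[str, int]) -> int:
    """Upper bound on series produced by one histogram metric.

    Each label combination gets one _bucket series per finite bucket,
    one more for le="+Inf", plus one _sum and one _count series.
    """
    per_combination = finite_buckets + 1 + 2
    combinations = prod(label_cardinality.values())  # empty label set -> 1
    return per_combination * combinations


if __name__ == "__main__":
    # Hypothetical request-duration histogram with 10 finite buckets.
    base = {"method": 4, "status": 5, "path": 50}
    print(histogram_series(10, base))                 # method x status x path
    print(histogram_series(10, {**base, "pod": 20}))  # same metric with a pod label added
```

Adding one label with 20 values multiplies the whole total by 20, which is why trimming labels on histograms usually pays off before trimming anything else.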

After you have numSeries and the top two job values, you can map yourself to the small / medium / large table above and plug numbers into the samples per month formula for AMP pricing.
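To turn a measured numSeries into an ingestion estimate, a tiered calculation along these lines can help. It assumes ingestion is priced per block of 10 million samples with volume tiers. The tier sizes and prices below are placeholders, not AWS prices, so copy the real values for your region from the AMP pricing page. Storage and query charges are left out.

```python
from typing import List, Tuple

SECONDS_PER_MONTH = 86_400 * 30

# (samples covered by the tier, price per 10 million samples).
# PLACEHOLDER values only; replace with the current AMP pricing for your region.
PLACEHOLDER_TIERS: List[Tuple[float, float]] = [
    (2e9, 1.00),
    (float("inf"), 0.50),
]


def monthly_samples(num_series: int, scrape_interval_s: float) -> float:
    """samples/month ~= active series / scrape interval * seconds per month."""
    return num_series / scrape_interval_s * SECONDS_PER_MONTH


def ingestion_cost(samples: float,
                   tiers: List[Tuple[float, float]] = PLACEHOLDER_TIERS) -> float:
    """Walk the tiers in order, charging each slice of samples at that tier's rate."""
    remaining = samples
    cost = 0.0
    for tier_size, price_per_10m in tiers:
        in_tier = min(remaining, tier_size)
        cost += in_tier / 1e7 * price_per_10m
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost


if __name__ == "__main__":
    for series in (10_000, 50_000):
        samples = monthly_samples(series, 30)
        print(f"{series} series: {samples / 1e9:.2f}B samples, "
              f"placeholder cost {ingestion_cost(samples):.2f}")
```

Because the rates are inputs, the same function works for comparing a trimmed scrape config against the current one once real prices are plugged in.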


6. Who runs what—and what the first year feels like

Where the work lands

| Area | AMP | Self-managed |
| --- | --- | --- |
| TSDB durability & scaling | AWS | You (PVC size, retention flags, compaction behavior) |
| Prometheus version upgrades | Managed compatibility window | You (Helm chart / operator upgrades, CRD drift) |
| Scrape discovery | You (collector config, EKS receivers, ServiceMonitor CRDs) | You (same; often more familiar with prometheus.yml in-cluster) |
| Recording / alerting rules | AMP rule groups or in-cluster evaluation + remote_write | Native serverFiles / PrometheusRule CRDs in Git |
| Long-term retention / global view | AMP + optional export; or Mimir/Thanos later | Add Thanos/Mimir/Cortex when one server is not enough |
| UI for debugging | Grafana → AMP; limited “SSH into Prometheus” | kubectl port-forward to :9090; fast for platform engineers |

AMP in practice

  • Less toil on TSDB HA, backups, and “disk full” pages for the metrics store
  • Strong fit for org-wide standards and IAM-bound workspaces
  • Watch ingestion billing and cardinality; use ADOT/Prometheus relabeling deliberately
  • Alertmanager behavior is AWS-shaped—read AMP Alertmanager docs before assuming OSS Alertmanager feature parity

Self-managed in practice

  • Full control of scrape config, rule files, federation, and “break glass” PromQL on localhost
  • Familiar path: prometheus-community/prometheus Helm chart, GitOps overlays per cluster
  • Recurring work: chart bumps with Kubernetes upgrades, PVC growth, proving Alertmanager HA, securing the Prometheus UI (it has no built-in auth—use network policy, private ingress, or an auth proxy)
  • EKS add-ons can cover node-exporter (managed addon) while you keep a narrow Prometheus release for the server and rules—avoid duplicating the same metrics twice

7. EKS-specific wiring (both paths)

Neither option removes in-cluster collection:

| Component | Typical role |
| --- | --- |
| Metrics server | HPA CPU/memory; not a Prometheus replacement |
| prometheus-node-exporter | Node/host metrics (DaemonSet or EKS managed addon) |
| kube-state-metrics | Kubernetes object metrics (Deployment, Pod, PVC, …) |
| CoreDNS / kubelet / apiserver | Cluster health; scrape config or ADOT receivers |
| Application /metrics | Your SLOs and business metrics |

AMP path: deploy ADOT (or Prometheus in agent mode) with remote_write to the AMP workspace endpoint; use EKS Pod Identity or IRSA for SigV4 authentication.

Self-managed path: enable targets in Helm values or the Prometheus Operator; pin kube-state-metrics and exporters intentionally so you do not scrape the same series twice.

Hybrid (common): run a small in-cluster Prometheus for fast local debugging and federation-style rules, and remote_write only aggregates or critical series to AMP for org dashboards and long retention. You pay for the extra complexity once, but you do not pay double storage for every raw sample.
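In Prometheus, the forwarding filter for this pattern is typically expressed as write_relabel_configs on the remote_write entry, with a keep action matched against __name__. The sketch below only simulates that selection so you can test an allowlist against a list of series before touching config. The regex and the sample series are hypothetical, and it relies on the fact that Prometheus anchors relabel regexes at both ends.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical allowlist: `up` plus recording-rule outputs named level:metric:operation.
KEEP_NAME_REGEX = r"up|[a-z_]+:[a-z_]+:[a-z0-9_]+"


def forwarded(series: Iterable[Dict[str, str]],
              keep_regex: str = KEEP_NAME_REGEX) -> List[Dict[str, str]]:
    """Mimic a `keep` relabel action on __name__ (full-string match, as Prometheus anchors it)."""
    pattern = re.compile(keep_regex)
    return [s for s in series if pattern.fullmatch(s.get("__name__", ""))]


if __name__ == "__main__":
    local = [
        {"__name__": "up", "job": "node-exporter"},
        {"__name__": "job:http_requests:rate5m", "job": "api"},
        {"__name__": "http_requests_total", "pod": "abc", "path": "/v1/items"},
        {"__name__": "upstream_requests_total", "pod": "abc"},
    ]
    kept = forwarded(local)
    print(f"{len(kept)} of {len(local)} series would be sent to AMP")
    for s in kept:
        print("  ", s["__name__"])
```

Raw, high-cardinality series stay in the short-retention local TSDB, while only the aggregates count toward AMP ingestion.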


8. Alerting and Grafana

| Topic | AMP | Self-managed |
| --- | --- | --- |
| Rule evaluation | AMP rule groups and/or in-cluster Prometheus firing → AMP | Prometheus alerting_rules.yml / PrometheusRule CRDs |
| Alert routing | AMP Alertmanager, SNS, EventBridge; fewer “random webhook” examples in AWS docs | Alertmanager receivers (Slack, PagerDuty, Opsgenie) in Git, with secrets via External Secrets |
| Grafana | Managed Grafana with AMP datasource is the path of least resistance | Platform Grafana in-cluster; datasource = Prometheus Service DNS |
| Double paging | Risk if both AMP rules and Grafana unified alerting fire on the same metric | Risk if both Prometheus and Grafana own the same alerts; pick one owner |

9. Decision rubric (greenfield)

Lean toward AMP when:

  • You want no StatefulSet TSDB to babysit and are fine with usage-based metrics cost
  • Centralized observability across many accounts/clusters matters soon
  • You will standardize on ADOT / AWS observability patterns and Managed Grafana
  • The team is small and should not own Prometheus compaction, PVC resize, and version skew

Lean toward self-managed Prometheus when:

  • You need maximum control (custom scrape hooks, exotic service discovery, air-gapped patterns)
  • Most engineers live in port-forward PromQL and Git-managed prometheus.yml / rules
  • Predictable infra cost matters more than eliminating ops (small cluster, disciplined cardinality)
  • You already run kube-prometheus-stack patterns and want one chart to own rules + Alertmanager

Lean toward hybrid when:

  • You need in-cluster debugging and org-wide long retention or dashboards in AMP
  • You are migrating: start self-managed, remote_write to AMP, cut over Grafana datasources, then shrink in-cluster retention

Default suggestion for a greenfield EKS platform team: if nobody wants to own TSDB operations, start with AMP + ADOT and Managed Grafana. If the team is building GitOps muscle on platform charts and wants the fastest “see /targets and /rules locally” loop, self-managed with a narrow chart (server + optional Alertmanager, exporters wired deliberately) is a solid teaching path. Revisit the choice when cardinality, retention, or multi-cluster pain appears.


10. Troubleshooting: common misconceptions

  • “AMP means we don’t run anything in the cluster.” You still run collectors/exporters and own cardinality.
  • “Self-managed is always cheaper.” EBS + replicas can be modest; engineer time and incident cost often dominate.
  • “We installed the prometheus-node-exporter EKS addon, so we have Prometheus.” The addon is node metrics, not the TSDB server.
  • “Grafana alerting replaces Prometheus/Alertmanager.” It can—but two owners for the same alert is how you get double pages.
  • “remote_write is free duplication.” It is not; you pay network, ingest, and often double evaluation unless you design what gets forwarded.

11. References
