DEV Community

Alex Spinov

Thanos Has a Free API: Highly Available Prometheus with Long-Term Storage

Thanos extends Prometheus with effectively unlimited retention, a global query view across clusters, and downsampling. It turns multiple Prometheus instances into a single, highly available metrics system.

What Is Thanos?

Thanos is a CNCF incubating project that adds HA, long-term storage, and global querying to Prometheus. It uses object storage (S3, GCS, Azure) for cost-effective retention of years of metrics data.

Key Features:

  • Global query across Prometheus instances
  • Unlimited metric retention via object storage
  • Automatic downsampling (5m, 1h)
  • Deduplication of HA Prometheus pairs
  • Compatible with PromQL and Grafana
  • Compaction and cleanup
  • Rule evaluation
  • Store API (gRPC)
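
To make the deduplication bullet concrete, here is a minimal sketch of the idea in Python. Two HA replicas scrape the same targets and differ only in a replica label, so dropping that label lets their series merge. The label name `replica`, the helper `dedup`, and the sample data are assumptions for illustration, and the merge is a simplification: Thanos Query's real algorithm follows one replica and switches over on gaps rather than taking a per-timestamp union.

```python
from collections import defaultdict

REPLICA_LABEL = "replica"  # assumed name; Thanos Query takes it via --query.replica-label

def dedup(series_list, replica_label=REPLICA_LABEL):
    """Merge series that are identical once the replica label is dropped.

    Each series is (labels_dict, [(timestamp, value), ...]).
    For every timestamp, the first replica that has a sample wins.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}

# Hypothetical data: replica "a" missed the scrape at t=30, replica "b" missed t=15
series = [
    ({"__name__": "up", "job": "api", "replica": "a"}, [(0, 1.0), (15, 1.0)]),
    ({"__name__": "up", "job": "api", "replica": "b"}, [(0, 1.0), (30, 1.0)]),
]
result = dedup(series)
for key, points in result.items():
    print(dict(key), points)
```

The two replica series collapse into one series without gaps, which is the behavior a Grafana dashboard relies on when one Prometheus of an HA pair restarts.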

Architecture Components

# Thanos Sidecar (runs alongside Prometheus)
thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yml

# Thanos Query (global query layer)
# Recent releases use --endpoint; older ones used the now-deprecated --store
thanos query \
  --endpoint=sidecar-1:10901 \
  --endpoint=sidecar-2:10901 \
  --endpoint=store-gateway:10901

# Thanos Store (serves data from object storage)
thanos store \
  --data-dir=/tmp/thanos-store \
  --objstore.config-file=bucket.yml

# Thanos Compactor (downsampling + compaction)
thanos compact \
  --data-dir=/tmp/thanos-compact \
  --objstore.config-file=bucket.yml \
  --retention.resolution-raw=90d \
  --retention.resolution-5m=1y \
  --retention.resolution-1h=3y
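
The compactor's downsampling replaces raw samples with per-window aggregates so that long-range queries touch far less data. The following is a rough sketch of that idea, assuming gauge samples and fixed 5-minute windows aligned to the epoch. The function name `downsample` and the sample data are hypothetical. Real Thanos works on TSDB chunks and also keeps a counter aggregate.

```python
def downsample(samples, window_s=300):
    """Aggregate (timestamp_seconds, value) samples into fixed windows.

    Returns {window_start: {"count", "sum", "min", "max"}}; an average
    can be derived at query time as sum / count.
    """
    windows = {}
    for ts, value in sorted(samples):
        start = ts - (ts % window_s)  # align the sample to its window start
        agg = windows.setdefault(
            start, {"count": 0, "sum": 0.0, "min": value, "max": value}
        )
        agg["count"] += 1
        agg["sum"] += value
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    return windows

# Hypothetical raw samples scraped every 60 s for 10 minutes (values 0.0 .. 9.0)
raw = [(t, float(t // 60)) for t in range(0, 600, 60)]
aggregates = downsample(raw)
for start, agg in aggregates.items():
    print(start, agg, "avg:", agg["sum"] / agg["count"])
```

Ten raw samples shrink to two aggregate records here. Keeping min, max, sum, and count separately is what lets functions such as `max_over_time` or `avg_over_time` still give sensible answers on downsampled data.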

Object Storage Config

# bucket.yml
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  region: us-east-1
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
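
Committing keys in bucket.yml is risky outside a demo. One option, sketched here, is to render the same file from environment variables at deploy time. The helper `render_bucket_config` and the `THANOS_S3_*` variable names are hypothetical; `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_REGION` follow the usual AWS naming. The example feeds in a plain dict so it runs without real credentials.

```python
import json

def render_bucket_config(env):
    """Render a Thanos S3 objstore config from a mapping of environment variables."""
    fields = {
        "bucket": env["THANOS_S3_BUCKET"],  # hypothetical variable name
        "endpoint": env.get("THANOS_S3_ENDPOINT", "s3.amazonaws.com"),
        "region": env.get("AWS_REGION", "us-east-1"),
        "access_key": env["AWS_ACCESS_KEY_ID"],
        "secret_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    lines = ["type: S3", "config:"]
    # json.dumps yields double-quoted strings, which are also valid YAML scalars
    lines += [f"  {key}: {json.dumps(value)}" for key, value in fields.items()]
    return "\n".join(lines) + "\n"

# Stand-in for os.environ, using the AWS documentation example keys
example_env = {
    "THANOS_S3_BUCKET": "thanos-metrics",
    "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
}
config_text = render_bucket_config(example_env)
print(config_text)
```

In a real deployment you would pass `os.environ` instead of `example_env` and write the result to the path given to `--objstore.config-file`.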

Thanos Query API (PromQL compatible)

import requests

# Adjust host and port to your deployment; a bare `thanos query` serves HTTP on 10902 by default
THANOS = "http://thanos-query:9090/api/v1"

def get(path, **params):
    # Fail fast on HTTP errors instead of parsing an error body
    resp = requests.get(f"{THANOS}/{path}", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Instant query across ALL clusters
result = get("query", query='sum(rate(http_requests_total[5m])) by (cluster)')
for r in result["data"]["result"]:
    # Series without a cluster label would otherwise raise KeyError
    cluster = r["metric"].get("cluster", "<none>")
    print(f"Cluster {cluster}: {float(r['value'][1]):.2f} rps")

# Range query with long-term data
result = get("query_range", query="avg_over_time(up[1h])",
             start="2025-01-01T00:00:00Z", end="2026-03-29T00:00:00Z", step="24h")
series = result["data"]["result"]
# Guard against an empty result before indexing the first series
print(f"Data points: {len(series[0]['values']) if series else 0}")

# Get all metric names
metrics = get("label/__name__/values")
print(f"Total metrics: {len(metrics['data'])}")

# Check stores (the response groups endpoints by component type)
stores = get("stores")
for kind, endpoints in stores.get("data", {}).items():
    for store in endpoints:
        print(f"{kind}: {store['name']}, Min: {store['minTime']}, Max: {store['maxTime']}")
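
Beyond the Prometheus-compatible parameters, Thanos Query accepts a few of its own on the query endpoints, including `dedup`, `partial_response`, and `max_source_resolution`. The sketch below only builds the request URL, so it runs without a live cluster. The helper name `thanos_params` is hypothetical and the host is the same assumed `thanos-query:9090` as in the examples above.

```python
from urllib.parse import urlencode

def thanos_params(query, dedup=True, partial_response=False,
                  max_source_resolution="auto"):
    """Build query-string parameters for a Thanos Query /api/v1/query call."""
    return {
        "query": query,
        # true: merge series from HA replicas into one
        "dedup": str(dedup).lower(),
        # false: return an error rather than partial data if a store is unavailable
        "partial_response": str(partial_response).lower(),
        # e.g. "0s" (raw only), "5m", "1h", or "auto"
        "max_source_resolution": max_source_resolution,
    }

params = thanos_params("sum(up) by (cluster)", max_source_resolution="1h")
url = "http://thanos-query:9090/api/v1/query?" + urlencode(params)
print(url)
```

Allowing a coarser `max_source_resolution` is mainly useful for range queries that span months, where reading downsampled blocks is much cheaper than reading raw ones.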

Kubernetes Deployment

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install thanos bitnami/thanos -n monitoring --create-namespace \
  --set query.enabled=true \
  --set storegateway.enabled=true \
  --set compactor.enabled=true \
  --set-file objstoreConfig=bucket.yml

Thanos vs Alternatives

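| Feature | Thanos | Cortex/Mimir | VictoriaMetrics |
|---|---|---|---|
| Architecture | Sidecar | Write path | Standalone |
| Storage | Object store | Object store | Local + remote |
| PromQL | Full | Full | MetricsQL + PromQL |
| Dedup | Built-in | Write-time | Built-in (-dedup.minScrapeInterval) |
| Downsampling | 5m, 1h | No | Enterprise only |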

