Thanos Has a Free API: Highly Available Prometheus with Long-Term Storage

#monitoring #prometheus #devops #cloudnative

Thanos extends Prometheus with long-term retention in object storage, a global query view across clusters, and downsampling. It turns multiple Prometheus instances into a single, highly available metrics system.

What Is Thanos?

Thanos is a CNCF incubating project that adds HA, long-term storage, and global querying to Prometheus. It uses object storage (S3, GCS, Azure) for cost-effective retention of years of metrics data.

Key Features:

Global query across Prometheus instances
Unlimited metric retention via object storage
Automatic downsampling (5m, 1h)
Deduplication of HA Prometheus pairs
Compatible with PromQL and Grafana
Compaction and cleanup
Rule evaluation
Store API (gRPC)
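
To make the deduplication feature concrete, here is a deliberately simplified sketch of merging two HA replicas into one series. It assumes the replicas differ only by a `replica` label (a hypothetical label name; use whatever external label your Prometheus pairs set) and that both replicas report identical timestamps, which real scrapes rarely do. Thanos's actual query-time deduplication is more involved, so treat this as an illustration of the idea only.

```python
from collections import defaultdict


def deduplicate(series_list, replica_label="replica"):
    """Merge series that are identical apart from the replica label.

    Each series is a dict: {"labels": {...}, "samples": [(timestamp, value), ...]}.
    Simplified illustration only; not Thanos's actual algorithm.
    """
    merged = defaultdict(dict)
    for series in series_list:
        # Identity of a series once the replica label is ignored
        key = tuple(sorted(
            (k, v) for k, v in series["labels"].items() if k != replica_label
        ))
        for ts, value in series["samples"]:
            # Keep the first value seen for each timestamp
            merged[key].setdefault(ts, value)
    return [
        {"labels": dict(key), "samples": sorted(samples.items())}
        for key, samples in merged.items()
    ]


replica_a = {"labels": {"job": "api", "replica": "a"}, "samples": [(0, 1.0), (15, 2.0)]}
replica_b = {"labels": {"job": "api", "replica": "b"}, "samples": [(15, 2.0), (30, 3.0)]}
result = deduplicate([replica_a, replica_b])
print(result)
```

The two replicas collapse into one `job="api"` series, and the gap in either replica is filled by the other.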

Architecture Components

# Thanos Sidecar (runs alongside Prometheus)
thanos sidecar \
  --tsdb.path=/prometheus \
  --prometheus.url=http://localhost:9090 \
  --objstore.config-file=bucket.yml

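# Thanos Query (global query layer)
# --endpoint replaces the deprecated --store flag in recent Thanos releases.
# --query.replica-label enables HA deduplication; "replica" is an assumed external label name.
thanos query \
  --query.replica-label=replica \
  --endpoint=sidecar-1:10901 \
  --endpoint=sidecar-2:10901 \
  --endpoint=store-gateway:10901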

# Thanos Store (serves data from object storage)
thanos store \
  --data-dir=/tmp/thanos-store \
  --objstore.config-file=bucket.yml

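# Thanos Compactor (downsampling + compaction); run only one instance per bucket
# --wait keeps the process running instead of exiting after a single pass
thanos compact \
  --data-dir=/tmp/thanos-compact \
  --objstore.config-file=bucket.yml \
  --retention.resolution-raw=90d \
  --retention.resolution-5m=1y \
  --retention.resolution-1h=3y \
  --wait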
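
To get a feel for what the compactor's retention flags mean per series, the sketch below counts the sample slots kept at each resolution. The 15-second scrape interval is an assumption (it is not stated in this article), and a year is approximated as 365 days.

```python
from datetime import timedelta

SCRAPE_INTERVAL = timedelta(seconds=15)  # assumed; not from the article

# (resolution, spacing between samples, retention) taken from the compactor flags
TIERS = [
    ("raw", SCRAPE_INTERVAL, timedelta(days=90)),
    ("5m", timedelta(minutes=5), timedelta(days=365)),
    ("1h", timedelta(hours=1), timedelta(days=3 * 365)),
]


def samples_retained(spacing, retention):
    """Number of sample slots one series occupies over the retention window."""
    return retention // spacing


counts = {name: samples_retained(spacing, retention) for name, spacing, retention in TIERS}
for name, count in counts.items():
    print(f"{name}: {count} samples per series")
```

This is a count of points, not a size estimate: each downsampled point stores several aggregates (such as min, max, sum, and count) rather than a single value.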

Object Storage Config

# bucket.yml
type: S3
config:
  bucket: thanos-metrics
  endpoint: s3.amazonaws.com
  region: us-east-1
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
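
The keys above are inline placeholders. As one possible alternative, the sketch below renders the same bucket.yml from environment variables so that real credentials stay out of version control. The variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` follow the usual AWS convention, and the helper name `render_bucket_config` is hypothetical.

```python
import json
import os


def render_bucket_config(env=os.environ):
    """Return bucket.yml content with credentials taken from the environment."""
    access_key = env["AWS_ACCESS_KEY_ID"]
    secret_key = env["AWS_SECRET_ACCESS_KEY"]
    lines = [
        "type: S3",
        "config:",
        "  bucket: thanos-metrics",
        "  endpoint: s3.amazonaws.com",
        "  region: us-east-1",
        # json.dumps yields a double-quoted string, which is also valid YAML
        f"  access_key: {json.dumps(access_key)}",
        f"  secret_key: {json.dumps(secret_key)}",
    ]
    return "\n".join(lines) + "\n"


# Example with the documentation placeholder keys instead of real credentials
example_env = {
    "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
}
config_text = render_bucket_config(example_env)
print(config_text)
```

In a real deployment you would call `render_bucket_config()` with no arguments and write the result to the path passed to `--objstore.config-file`.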

Thanos Query API (PromQL compatible)

import requests

# Thanos Query serves the Prometheus HTTP API; adjust host and port to your deployment
THANOS = "http://thanos-query:9090/api/v1"


def get(path, **params):
    """GET an API path and return the decoded JSON body, raising on HTTP errors."""
    resp = requests.get(f"{THANOS}/{path}", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()


# Instant query across ALL clusters
result = get("query", query="sum(rate(http_requests_total[5m])) by (cluster)")
for r in result["data"]["result"]:
    cluster = r["metric"].get("cluster", "<none>")  # a series may lack the label
    print(f"Cluster {cluster}: {float(r['value'][1]):.2f} rps")

# Range query with long-term data
result = get(
    "query_range",
    query="avg_over_time(up[1h])",
    start="2025-01-01T00:00:00Z",
    end="2026-03-29T00:00:00Z",
    step="24h",
)
series = result["data"]["result"]
if series:  # empty when no series match the time range
    print(f"Data points in first series: {len(series[0]['values'])}")

# Get all metric names
metrics = get("label/__name__/values")
print(f"Total metrics: {len(metrics['data'])}")

# Check stores: the response groups endpoints by component type (sidecar, store, ...)
stores = get("stores")
for component, endpoints in stores.get("data", {}).items():
    for store in endpoints:
        print(f"{component}: {store['name']}, Min: {store['minTime']}, Max: {store['maxTime']}")
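
Beyond the standard Prometheus parameters, Thanos Query accepts a few of its own, such as `dedup`, `partial_response`, and `max_source_resolution`. The sketch below builds such a parameter set and checks a response envelope before reading it. The helper names are hypothetical, and the sample payload is hand-written for illustration, not real output.

```python
def thanos_params(query, dedup=True, partial_response=False, max_source_resolution=None):
    """Build query-string parameters for a Thanos instant query."""
    params = {
        "query": query,
        "dedup": "true" if dedup else "false",
        "partial_response": "true" if partial_response else "false",
    }
    if max_source_resolution is not None:
        # For example "5m" or "1h" to allow downsampled data to be used
        params["max_source_resolution"] = max_source_resolution
    return params


def vector_to_dict(payload, label):
    """Map an instant-vector response to {label value: float sample}."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    return {
        item["metric"].get(label, ""): float(item["value"][1])
        for item in data["result"]
    }


# Hand-written payload in the Prometheus response shape (illustrative values)
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"cluster": "eu"}, "value": [1700000000, "12.5"]},
            {"metric": {"cluster": "us"}, "value": [1700000000, "7.25"]},
        ],
    },
}

params = thanos_params("sum(rate(http_requests_total[5m])) by (cluster)",
                       max_source_resolution="5m")
rates = vector_to_dict(sample_payload, "cluster")
print(params)
print(rates)
```

The `params` dict can be passed straight to `requests.get(f"{THANOS}/query", params=params)`, and the decoded JSON body can then go through `vector_to_dict`.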

Kubernetes Deployment

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install thanos bitnami/thanos -n monitoring --create-namespace \
  --set query.enabled=true \
  --set storegateway.enabled=true \
  --set compactor.enabled=true \
  --set-file objstoreConfig=bucket.yml

Thanos vs Alternatives

Feature	Thanos	Cortex/Mimir	VictoriaMetrics
Architecture	Sidecar	Write path	Standalone
Storage	Object store	Object store	Local + remote
PromQL	Full	Full	MetricsQL + PromQL
Dedup	Built-in	Write-time	Built-in (-dedup.minScrapeInterval)
Downsampling	5m, 1h	No	Enterprise only
