Ashok Nagaraj

Pushing application metrics to otel-collector

Prometheus uses a pull model to collect metrics from targets: it actively queries (scrapes) each target for its metrics at regular intervals. The pull model is simple to implement and can be used to monitor a wide variety of targets. However, it can also be inefficient, as Prometheus may have to query targets that do not have any new metrics to report.
Although it is less popular, Prometheus also supports a push model through the Pushgateway. The push model works out of the box, without service-discovery challenges or changes to the Prometheus configuration, but it can be more complex to implement and more difficult to scale.

OpenTelemetry (the new kid on the block) supports both pull and push models for collecting metrics. The main difference between the two models in OpenTelemetry is that the push model uses a standardized protocol, the OpenTelemetry Protocol (OTLP), which the OpenTelemetry Collector accepts over gRPC or HTTP. This makes it easier to integrate OpenTelemetry with a variety of monitoring systems.

Comparison

| Feature | Pull model | Push model |
| --- | --- | --- |
| Implementation | Prometheus scrapes metrics from the target at regular intervals. | The target sends metrics to the OpenTelemetry Collector at regular intervals. |
| Protocol | Prometheus scrapes its own text-based exposition format over HTTP. | OTLP (the OpenTelemetry Protocol) is a standardized protocol that can be used with a variety of monitoring systems. |
| Ease of implementation | The pull model is simpler to implement. | The push model is more complex to implement, but it can be more efficient for targets that do not change frequently. |
| Scalability | The pull model can be more scalable, as Prometheus can scale out to handle more targets. | The push model can be less scalable, as the OpenTelemetry Collector may need to be scaled to handle more targets. |
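The practical difference between the two models is who initiates the transfer. The following is a toy sketch only: it uses no Prometheus or OpenTelemetry code, and the class names (`PullTarget`, `PushTarget`, `Collector`) are hypothetical stand-ins for illustration.

```python
# Toy model of pull vs push; none of these classes belong to a real library.


class PullTarget:
    """Keeps its own state and waits for the monitoring system to scrape it."""

    def __init__(self):
        self.requests_total = 0

    def handle_request(self):
        self.requests_total += 1

    def scrape(self):
        # Called BY the monitoring system, on the monitoring system's schedule.
        return {"requests_total": self.requests_total}


class Collector:
    """Stands in for a push receiver, such as an OTLP endpoint."""

    def __init__(self):
        self.received = []

    def receive(self, payload):
        self.received.append(payload)


class PushTarget:
    """Knows where the collector is and sends its metrics there."""

    def __init__(self, collector):
        self.collector = collector
        self.requests_total = 0

    def handle_request(self):
        self.requests_total += 1

    def export(self):
        # Called BY the application itself, normally from a periodic timer.
        self.collector.receive({"requests_total": self.requests_total})


# Pull: the monitoring side must know the target and asks for its metrics.
pull_target = PullTarget()
pull_target.handle_request()
pull_target.handle_request()
scraped = pull_target.scrape()

# Push: the target must know the collector and decides when to send.
collector = Collector()
push_target = PushTarget(collector)
push_target.handle_request()
push_target.export()
```

In the pull case the monitoring system needs to discover every target; in the push case every target needs the collector's address, which is the trade-off the table describes.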
How to use OpenTelemetry APIs to push metrics to otel-collector

Let us set up otel-collector to accept metrics over OTLP and forward them through the prometheusremotewrite exporter:

❯ cat otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  logging:
    verbosity: detailed
  prometheusremotewrite:
    endpoint: "https://demo-mimir:9009/api/v1/push"
    external_labels:
      foo: bar
      env: demo

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, prometheusremotewrite]


# Note: we use opentelemetry-collector-contrib as opentelemetry-collector doesn't support prometheusremotewrite yet
❯ cat docker-compose.yaml

...

otel-collector:
  image: otel/opentelemetry-collector-contrib
  volumes:
    - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
  ports:
    - 1888:1888 # pprof extension
    - 8888:8888 # Prometheus metrics exposed by the collector
    - 8889:8889 # Prometheus exporter metrics
    - 13133:13133 # health_check extension
    - 4317:4317 # OTLP gRPC receiver
    - 4318:4318 # OTLP http receiver
    - 55679:55679 # zpages extension

...


The OpenTelemetry metrics API is extensive and has a hierarchy of components to create, group, process and export metrics. It also supports two kinds of metrics (instruments, in OTel terms): synchronous and asynchronous. The OpenTelemetry metrics documentation covers both in detail.

The code to instrument is a simple Flask application that rolls dice:
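To make the synchronous/asynchronous distinction concrete, here is a minimal toy model. It is not the OpenTelemetry SDK: `ToyCounter`, `ToyObservableGauge` and `collect` are hypothetical names used for illustration. In the real Python API the corresponding factory methods are `create_counter` and `create_observable_gauge`.

```python
from typing import Callable, Dict, List


class ToyCounter:
    """Synchronous instrument: the application calls add() when the event happens."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value = 0

    def add(self, amount: int) -> None:
        self.value += amount


class ToyObservableGauge:
    """Asynchronous instrument: the reader invokes the callback at collection time."""

    def __init__(self, name: str, callback: Callable[[], float]) -> None:
        self.name = name
        self.callback = callback

    def observe(self) -> float:
        return self.callback()


def collect(counters: List[ToyCounter], gauges: List[ToyObservableGauge]) -> Dict[str, float]:
    """Hypothetical reader: snapshots counters and polls gauges."""
    snapshot: Dict[str, float] = {c.name: c.value for c in counters}
    snapshot.update({g.name: g.observe() for g in gauges})
    return snapshot


queue: List[str] = []
calls = ToyCounter("api_calls")
depth = ToyObservableGauge("queue_depth", lambda: len(queue))

calls.add(1)                       # recorded inline, at the moment of the call
queue.extend(["job-1", "job-2"])
first = collect([calls], [depth])  # the gauge callback runs here, not earlier

queue.pop()
second = collect([calls], [depth])  # the gauge reflects the state at this collection
```

The counter only changes when the application calls `add()`, while the gauge value is whatever the callback reports each time the reader collects.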

import random
import time
from flask import Flask, request, jsonify

app = Flask(__name__)

def do_roll():
  time.sleep(random.randint(1, 3))
  return random.randint(1, 8)

@app.route('/roll')
def roll():
  roll = do_roll()
  if roll > 6:
    return jsonify({'roll': roll, 'error': 'out of bounds'}), 500
  return jsonify({'roll': roll})


@app.route('/ping')
def ping():
  message = 'pong'
  return jsonify({'message': message})


if __name__ == '__main__':
  app.run(host='0.0.0.0', port=15000, debug=True)

Once we add instrumentation, the code looks like this:

import random
import time
from functools import wraps

from flask import Flask, request, jsonify, make_response

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_NAMESPACE, SERVICE_VERSION, Resource

# Service name is required for most backends
resource = Resource(attributes={
    SERVICE_NAME: "dice-roller",
    SERVICE_NAMESPACE: "devx",
    SERVICE_VERSION: "1.0.0"
})

COLLECTOR_ENDPOINT = "http://localhost:4317"
INTERVAL_SEC = 10

# Boilerplate initialization; the reader expects the export interval in milliseconds
metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=COLLECTOR_ENDPOINT),
    export_interval_millis=INTERVAL_SEC * 1000,
)
provider = MeterProvider(metric_readers=[metric_reader], resource=resource)

# Sets the global default meter provider
metrics.set_meter_provider(provider)

# Creates a meter from the global meter provider
meter = metrics.get_meter("dice-roller", "1.0.0")

# Add instruments
calls = meter.create_counter(name='api_calls')
# Kept as an up-down counter to match the collector log shown later;
# a histogram is usually a better fit for latencies
duration = meter.create_up_down_counter(name='api_duration')
errors = meter.create_counter(name='api_errors')
size = meter.create_histogram(name='response_size')

app = Flask(__name__)

# decorator to add metrics to a view function
def add_metrics(func):
    @wraps(func)  # keep the original function name so Flask endpoints stay unique
    def wrapper(*args, **kwargs):
        attributes = {'path': request.path}
        calls.add(1, attributes)

        start_time = time.time()
        response = make_response(func(*args, **kwargs))
        end_time = time.time()

        duration.add(end_time - start_time, attributes)
        # Histograms take measurements through record(), not add()
        size.record(len(response.get_data()), attributes)

        return response
    return wrapper


def do_roll():
    time.sleep(random.randint(1, 3))
    return random.randint(1, 8)


# app.route must be the outermost decorator so Flask registers the instrumented wrapper
@app.route('/roll')
@add_metrics
def roll():
    roll = do_roll()
    if roll > 6:
        errors.add(1, {'path': '/roll'})
    return jsonify({'roll': roll})


@app.route('/ping')
@add_metrics
def ping():
    message = 'pong'
    return jsonify({'message': message})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=15000, debug=True)
In the instrumented version:
  1. We create a meter provider with the OTLP exporter, wrapped in a periodic reader. This exporter sends metrics to the endpoint http://localhost:4317.
  2. We create a meter from the meter provider. This is where we create our instruments.
  3. We create our instruments: a counter for the number of API calls, an up-down counter for the duration of the API calls, a counter for the number of errors, and a histogram for the size of the response. An up-down counter only accumulates a running sum of durations, so a histogram is usually the better choice when latency distributions matter.
  4. We create a decorator to add metrics to our functions. This decorator increments the number of API calls, measures the duration of the function, and measures the size of the response.
  5. We create the function that rolls the dice. We add a random delay to simulate a real API call.
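Decorator order matters when a metrics decorator is combined with a route decorator: the route decorator registers whatever function it receives, so it has to be applied last (written on top) for the instrumented wrapper to be the one that gets called. The sketch below illustrates this with a toy `route` registry; `routes`, `route` and `call_counts` are hypothetical stand-ins, not Flask internals.

```python
from functools import wraps

routes = {}       # hypothetical stand-in for a web framework's URL map
call_counts = {}  # hypothetical stand-in for a counter instrument


def route(path):
    """Registers the function it receives under the given path."""
    def register(func):
        routes[path] = func
        return func
    return register


def add_metrics(func):
    """Counts calls before delegating to the wrapped function."""
    @wraps(func)  # preserves func.__name__ on the wrapper
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] = call_counts.get(func.__name__, 0) + 1
        return func(*args, **kwargs)
    return wrapper


@add_metrics       # applied last: wraps a function that was ALREADY registered
@route("/wrong")
def wrong_order():
    return "pong"


@route("/right")   # applied last: registers the instrumented wrapper
@add_metrics
def right_order():
    return "pong"


# Dispatch through the registry, the way a framework would.
routes["/wrong"]()
routes["/right"]()
```

After both calls, only `right_order` has been counted, because the registry entry for `/wrong` still points at the undecorated function.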

To verify, check the otel-collector log for the pushed metrics:

...
otel-collector  | Descriptor:
otel-collector  |      -> Name: api_calls
otel-collector  |      -> Description:
otel-collector  |      -> Unit:
otel-collector  |      -> DataType: Sum
otel-collector  |      -> IsMonotonic: true
otel-collector  |      -> AggregationTemporality: Cumulative
otel-collector  | NumberDataPoints #0
otel-collector  | Data point attributes:
otel-collector  |      -> path: Str(/roll)
otel-collector  | StartTimestamp: 2023-08-15 16:01:34.465746 +0000 UTC
otel-collector  | Timestamp: 2023-08-15 16:01:34.466164 +0000 UTC
otel-collector  | Value: 1
otel-collector  | Metric #1
otel-collector  | Descriptor:
otel-collector  |      -> Name: api_duration
otel-collector  |      -> Description:
otel-collector  |      -> Unit:
otel-collector  |      -> DataType: Sum
otel-collector  |      -> IsMonotonic: false
otel-collector  |      -> AggregationTemporality: Cumulative
otel-collector  | NumberDataPoints #0
otel-collector  | Data point attributes:
otel-collector  |      -> path: Str(/roll)
otel-collector  | StartTimestamp: 2023-08-15 16:01:34.465789 +0000 UTC
...
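Since the detailed log output is verbose, a small helper can condense it to one entry per metric. This is a minimal sketch that assumes the log lines look like the excerpt above; `summarize` and `SAMPLE_LOG` are hypothetical names, and the sample is a shortened copy of that excerpt.

```python
import re

# Shortened copy of the collector output shown above.
SAMPLE_LOG = """\
otel-collector  | Descriptor:
otel-collector  |      -> Name: api_calls
otel-collector  |      -> DataType: Sum
otel-collector  |      -> IsMonotonic: true
otel-collector  | Data point attributes:
otel-collector  |      -> path: Str(/roll)
otel-collector  | Descriptor:
otel-collector  |      -> Name: api_duration
otel-collector  |      -> DataType: Sum
otel-collector  |      -> IsMonotonic: false
"""

# Only the descriptor fields of interest; other "->" lines are ignored.
FIELD = re.compile(r"->\s*(Name|DataType|IsMonotonic):\s*(\S+)")


def summarize(log_text):
    """Return one dict per metric descriptor found in the log text."""
    found = []
    for line in log_text.splitlines():
        match = FIELD.search(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Name":
            found.append({"Name": value})  # a Name line starts a new metric
        elif found:
            found[-1][key] = value         # attach the field to the latest metric
    return found


summary = summarize(SAMPLE_LOG)
for entry in summary:
    print(entry)
```

A monotonic `Sum` corresponds to a counter and a non-monotonic `Sum` to an up-down counter, which gives a quick way to confirm that each instrument arrived with the expected type.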

The same can be verified in Grafana, connected to the Mimir instance that the otel-collector writes to through the prometheusremotewrite exporter.
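Outside Grafana, the data can also be checked through Mimir's Prometheus-compatible query API. The sketch below only builds the request URL and sends nothing over the network. It assumes the Mimir host and port from the collector config, Mimir's default `/prometheus` HTTP prefix, and that the counter is stored as `api_calls_total`; the stored name depends on how the exporter translates metric names, so treat it as an assumption and adjust it to what your setup shows.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

MIMIR_BASE = "https://demo-mimir:9009"   # host and port taken from the collector config
QUERY_PATH = "/prometheus/api/v1/query"  # assumes Mimir's default Prometheus HTTP prefix

# Assumed metric name; env="demo" comes from external_labels in the collector config.
PROMQL = 'sum by (path) (api_calls_total{env="demo"})'


def build_query_url(promql, base=MIMIR_BASE):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base}{QUERY_PATH}?{urlencode({'query': promql})}"


url = build_query_url(PROMQL)

# Round-trip check: decoding the query string gives back the original expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

The resulting URL can be requested with any HTTP client, or the same PromQL expression can be pasted into the Grafana Explore view.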


