Motivation
Recently I had a simple task: create sample data to simulate multiple users sending requests to a server. I wrote a small program that loops and creates records in the database. Every second it logs the number of records saved within that second (ops/s), so when running locally with a single instance it was easy to know the total number of ops. After deploying to k8s, I wanted to increase the number of pods to create more stress, but then it became difficult to count the total_ops across all of these pods.
What I have are logs like this:
{
"node": "xxx",
"ops_per_second": 10,
"avg_latency_ns": 2
}
{
"node": "yyy",
"ops_per_second": 10,
"avg_latency_ns": 4
}
What I want:
{
"total_ops": 20
}
Solution
To get such a result, I need to stream the logs, parse them, and accumulate their values. Writing a script for this wouldn't be too hard, but while researching data processing tools I came across Benthos.
What is Benthos?
Benthos is a stream processor that can receive data from various sources, transform it with the flexible and powerful Bloblang language, and output the results to various destinations. It is incredibly easy to install and use, and can run locally with a single command or scale automatically on k8s or in the cloud (e.g. AWS Lambda).
I will write a more detailed introduction to Benthos later. In this post, I will only use it for a simple use case to get familiar with it: using stdin (or a subprocess) as input, and exposing metrics in Prometheus format.
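To get a feel for the overall shape of a config before going further, here is a minimal sketch (assuming a recent Benthos version) that reads lines from stdin, passes each one through a Bloblang mapping, and writes it back to stdout. The identity mapping in the middle is purely illustrative; the three top-level sections are the standard structure.
input:
  stdin: {}

pipeline:
  processors:
    # Identity mapping, shown only to illustrate where Bloblang fits
    - bloblang: |
        root = this

output:
  stdout: {}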
Installation
The simplest way is to use a script:
curl -Lsf https://sh.benthos.dev | bash
or Docker:
docker pull jeffail/benthos
docker run --rm -v /path/to/your/config.yaml:/benthos.yaml jeffail/benthos
or Homebrew:
brew install benthos
Configuration
We need to define two components:
- Input
- Processor
Input
Our input consists of logs from pods on k8s; we use kubectl to stream logs from the pods with a given label:
kubectl logs -l app=my-app -f
We can pipe this output into Benthos:
kubectl logs -l app=my-app -f | benthos -c config.yaml
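With this approach, the Benthos side of config.yaml only needs to declare stdin as its input; the processors added later in this post stay the same. A minimal sketch:
input:
  # Read the log lines that kubectl pipes into Benthos
  stdin: {}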
But we can also have Benthos run the kubectl command itself, using the subprocess input type:
input:
  subprocess:
    name: "kubectl"
    args:
      - "logs"
      - "-l"
      - "app=my-app"
      - "-f"
      - "--max-log-requests"
      - "100"
After this step, you can run the following command to get the log stream:
benthos -c benthos.yaml
Example output:
INFO Running main config from specified file @service=benthos path=./benthos.yaml
INFO Listening for HTTP requests at: http://0.0.0.0:4195 @service=benthos
INFO Launching a benthos instance, use CTRL+C to close @service=benthos
{"node": "my-app-7b9bd88d65-xrvfc" ,"ops_per_second": 0, "avg_latency_ms": 0}
{"node": "my-app-7b9bd88d65-flqx7" ,"ops_per_second": 1, "avg_latency_ms": 1000}
{"node": "my-app-7b9bd88d65-6gt9t" ,"ops_per_second": 7, "avg_latency_ms": 142}
{"node": "my-app-7b9bd88d65-hjtcj" ,"ops_per_second": 0, "avg_latency_ms": 0}
Aggregate
Now we need to collect the fields ops_per_second and avg_latency_ms, as well as count total_ops. Fortunately, Benthos has components for collecting metrics:
pipeline:
  processors:
    - metric:
        name: total_ops
        type: counter_by
        value: ${!json("ops_per_second")}
counter_by will increase the counter by the value obtained from value.
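The config above only counts total_ops. As a sketch of how the latency field could also be exposed (the gauge metric type and the per-node label here are my addition, not part of the original setup), the pipeline could be extended like this:
pipeline:
  processors:
    # Sum ops_per_second from every log line into a single counter
    - metric:
        name: total_ops
        type: counter_by
        value: ${!json("ops_per_second")}
    # Report the latest latency of each pod as a per-node gauge
    - metric:
        name: avg_latency_ms
        type: gauge
        labels:
          node: ${!json("node")}
        value: ${!json("avg_latency_ms")}
For the rest of this post, though, the simple counter alone is enough.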
benthos -c benthos.yaml
The metrics won't show up in the output, but they are exposed at http://0.0.0.0:4195/metrics:
...
total_ops{label="",path="root.pipeline.processors.0"} 348
...
So we have the total number of ops created.
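Since this endpoint serves plain Prometheus text, you can also check the value from a terminal (assuming Benthos is running locally on its default port):
curl -s http://0.0.0.0:4195/metrics | grep total_ops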
Conclusion
However, this method only provides an approximate value, since it only counts log messages from the moment Benthos starts streaming, not from when the pods were created. Nevertheless, it meets my simple needs. It could be improved by deploying Benthos as a continuously running service, caching the metrics in a database, and automatically refreshing when pods change. We may return to this in the future.