AI Ocean

Simple Aggregate Logs with Benthos

Motivation

Recently, I had a simple task: generate sample data that simulates multiple users sending requests to a server. I wrote a small program that loops and creates records in the database. Every second, it logs the number of records saved during that second (ops/s). With a single instance running locally, the total number of ops was easy to track. After deploying to k8s, I wanted to increase the number of pods to generate more load, but it became difficult to count the total ops across all of those pods.

What I have are logs like this:

{
  "node": "xxx",
  "ops_per_second": 10,
  "avg_latency_ms": 2
}
{
  "node": "yyy",
  "ops_per_second": 10,
  "avg_latency_ms": 4
}

What I want:

{
  "total_ops": 20
}

Solution

To get that result, I need to stream the logs, parse them, and accumulate their values. Writing a script for this wouldn't be too hard, but while researching data processing tools I came across Benthos.
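
For comparison, the script route could look roughly like the shell sketch below. It assumes the log format shown earlier, with ops_per_second always a non-negative integer. The name `sum_ops` is a hypothetical helper, and the textual match is a shortcut rather than real JSON parsing.

```shell
# sum_ops: read log records from stdin and print the sum of all
# "ops_per_second" values. Hypothetical helper, for illustration only.
# It matches the field textually instead of parsing JSON, so it accepts
# both single-line and pretty-printed records.
sum_ops() {
  grep -oE '"ops_per_second"[[:space:]]*:[[:space:]]*[0-9]+' |
    awk -F: '{ total += $2 } END { print total + 0 }'
}
```

It could be fed from a finite log dump, for example `kubectl logs -l app=my-app | sum_ops`. With `-f` the stream never ends, so the total would never be printed, which is one reason a tool that exposes a live counter is more convenient.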

What is Benthos?

Benthos is a stream processor that reads data from a variety of sources, transforms it with its mapping language Bloblang, and writes the result to a variety of destinations. It is easy to install and can run locally with a single command, scale on k8s, or run in the cloud (for example on AWS Lambda).

I will write a more detailed introduction to Benthos later. In this post, I only use it for a simple case to get familiar with it: taking logs from stdin (or a subprocess) as input and exposing metrics in Prometheus format.

Installation

The simplest way is to use a script:

curl -Lsf https://sh.benthos.dev | bash

or Docker:

docker pull jeffail/benthos
docker run --rm -v /path/to/your/config.yaml:/benthos.yaml jeffail/benthos

or Homebrew:

brew install benthos

Configuration

We need to define two components:

  1. Input
  2. Processor

Input

Our input is the logs from pods on k8s. We use kubectl to stream logs from all pods with a given label:

kubectl logs -l app=my-app -f

We can pipe this output into Benthos:

kubectl logs -l app=my-app -f | benthos -c config.yaml
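
For the piped variant, config.yaml needs an input that reads from stdin. The sketch below writes such a file, assuming the `stdin` input with its default settings is enough for line-delimited logs. The name `write_stdin_config` is a hypothetical helper, not part of Benthos.

```shell
# write_stdin_config: write a minimal Benthos config to the path given
# as $1. Hypothetical helper, for illustration only.
# The config only declares a stdin input with default settings
# (an assumption); everything else is left to Benthos defaults.
write_stdin_config() {
  cat > "$1" <<'EOF'
input:
  stdin: {}
EOF
}
```

After `write_stdin_config config.yaml`, the piped command above should pick up the log lines from kubectl.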

But we can also run the kubectl command from within Benthos by using the subprocess input type:

input:
  subprocess:
    name: "kubectl"
    args:
      - "logs"
      - "-l"
      - "app=my-app"
      - "-f"
      - "--max-log-requests"
      - "100"

With this input saved in benthos.yaml, run the following command to get the log stream:

benthos -c benthos.yaml

Example output:

INFO Running main config from specified file       @service=benthos path=./benthos.yaml
INFO Listening for HTTP requests at: http://0.0.0.0:4195  @service=benthos
INFO Launching a benthos instance, use CTRL+C to close  @service=benthos
{"node": "my-app-7b9bd88d65-xrvfc" ,"ops_per_second": 0, "avg_latency_ms": 0}
{"node": "my-app-7b9bd88d65-flqx7" ,"ops_per_second": 1, "avg_latency_ms": 1000}
{"node": "my-app-7b9bd88d65-6gt9t" ,"ops_per_second": 7, "avg_latency_ms": 142}
{"node": "my-app-7b9bd88d65-hjtcj" ,"ops_per_second": 0, "avg_latency_ms": 0}

Aggregate

Now we need to read fields such as ops_per_second and avg_latency_ms from each message and accumulate total_ops. Fortunately, Benthos has a metric processor for this:

pipeline:
  processors:
    - metric:
        name: total_ops
        type: counter_by
        value: ${!json("ops_per_second")}

counter_by increases the counter by the number given in value, which here is the ops_per_second field of each log message. Run Benthos again with the updated config:

benthos -c benthos.yaml

The metrics won't show up in the output, but they are exposed at http://0.0.0.0:4195/metrics:

...
total_ops{label="",path="root.pipeline.processors.0"} 348
...
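
To read that number without opening the page, the metrics text can be filtered on the command line. The sketch below assumes each sample line has the form shown above (metric name, optional labels without spaces, then the value as the last field). The name `metric_value` is a hypothetical helper.

```shell
# metric_value: read Prometheus text-format metrics from stdin and print
# the value of every sample whose metric name equals $1.
# Hypothetical helper, for illustration only. It assumes label values
# contain no spaces and that no timestamp follows the value.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP/TYPE comment lines
    {
      n = $1
      sub(/[{].*/, "", n)          # drop the {label="..."} part
      if (n == name) print $NF     # the value is the last field
    }'
}
```

Assuming Benthos runs locally on its default port, `curl -s http://localhost:4195/metrics | metric_value total_ops` would print the current counter.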

This gives us the total number of ops counted since Benthos started.

Conclusion

However, this method only gives an approximate value, because it only counts log messages from the moment Benthos starts, not from when the pods were created. It still meets my simple needs. It could be improved by deploying Benthos as a long-running service, persisting metrics in a database, and refreshing automatically when pods change. We may return to this issue in the future.