Motivation
Recently I had a simple task: create sample data to simulate multiple users sending requests to a server. I wrote a small program that loops and creates records in the database. Every second it logs the number of records saved within that second (ops/s), so when running locally with a single instance it was easy to know the total number of ops. After deploying to k8s, I wanted to increase the number of pods to create more stress, but then it became difficult to count the total_ops across all of these pods.
What I have are logs like this:
{
"node": "xxx",
"ops_per_second": 10,
"avg_latency_ns": 2
}
{
"node": "yyy",
"ops_per_second": 10,
"avg_latency_ns": 4
}
What I want:
{
"total_ops": 20
}
Solution
To get such a result, I need to stream the logs, parse them, and accumulate their values. Writing a script for this wouldn't be too hard, but while researching data processing tools I came across Benthos.
What is Benthos?
Benthos is a stream processor that can receive data from various sources, transform it with the flexible and powerful Bloblang language, and output the results to various destinations. It is incredibly easy to install and use, and can run locally with a single command or scale automatically on k8s or in the cloud (e.g. AWS Lambda).
I will write a more detailed introduction to Benthos later. In this post, I will only use it for a simple use case to get familiar with it: using stdin (or a subprocess) as input, and exposing metrics in Prometheus format.
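To get a feel for the overall shape of a config before going further, here is a minimal sketch (assuming a recent Benthos version) that reads lines from stdin, passes each one through a Bloblang mapping, and writes it back to stdout. The identity mapping in the middle is purely illustrative; the three top-level sections are the standard structure.
input:
  stdin: {}

pipeline:
  processors:
    # Identity mapping, shown only to illustrate where Bloblang fits
    - bloblang: |
        root = this

output:
  stdout: {}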
Installation
The simplest way is to use a script:
curl -Lsf https://sh.benthos.dev | bash
or Docker:
docker pull jeffail/benthos
docker run --rm -v /path/to/your/config.yaml:/benthos.yaml jeffail/benthos
or Homebrew:
brew install benthos
Configuration
We need to define two components:
- Input
- Processor
Input
Our input consists of logs from pods on k8s; we use kubectl to stream logs from the pods with a given label:
kubectl logs -l app=my-app -f
We can pipe this output into Benthos:
kubectl logs -l app=my-app -f | benthos -c config.yaml
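With this approach, the Benthos side of config.yaml only needs to declare stdin as its input; the processors added later in this post stay the same. A minimal sketch:
input:
  # Read the log lines that kubectl pipes into Benthos
  stdin: {}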
But we can also have Benthos run the kubectl command itself, using the subprocess input type:
input:
  subprocess:
    name: "kubectl"
    args:
      - "logs"
      - "-l"
      - "app=my-app"
      - "-f"
      - "--max-log-requests"
      - "100"
After this step, you can run the following command to get the log stream:
benthos -c benthos.yaml
Example output:
INFO Running main config from specified file @service=benthos path=./benthos.yaml
INFO Listening for HTTP requests at: http://0.0.0.0:4195 @service=benthos
INFO Launching a benthos instance, use CTRL+C to close @service=benthos
{"node": "my-app-7b9bd88d65-xrvfc" ,"ops_per_second": 0, "avg_latency_ms": 0}
{"node": "my-app-7b9bd88d65-flqx7" ,"ops_per_second": 1, "avg_latency_ms": 1000}
{"node": "my-app-7b9bd88d65-6gt9t" ,"ops_per_second": 7, "avg_latency_ms": 142}
{"node": "my-app-7b9bd88d65-hjtcj" ,"ops_per_second": 0, "avg_latency_ms": 0}
Aggregate
Now we need to collect the fields ops_per_second and avg_latency_ms, as well as count total_ops. Fortunately, Benthos has components for collecting metrics:
pipeline:
  processors:
    - metric:
        name: total_ops
        type: counter_by
        value: ${!json("ops_per_second")}
counter_by will increase the counter by the value obtained from value.
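The config above only counts total_ops. As a sketch of how the latency field could also be exposed (the gauge metric type and the per-node label here are my addition, not part of the original setup), the pipeline could be extended like this:
pipeline:
  processors:
    # Sum ops_per_second from every log line into a single counter
    - metric:
        name: total_ops
        type: counter_by
        value: ${!json("ops_per_second")}
    # Report the latest latency of each pod as a per-node gauge
    - metric:
        name: avg_latency_ms
        type: gauge
        labels:
          node: ${!json("node")}
        value: ${!json("avg_latency_ms")}
For the rest of this post, though, the simple counter alone is enough.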
benthos -c benthos.yaml
The metrics won't show up in the output, but they are exposed at http://0.0.0.0:4195/metrics:
...
total_ops{label="",path="root.pipeline.processors.0"} 348
...
So we have the total number of ops created.
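Since this endpoint serves plain Prometheus text, you can also check the value from a terminal (assuming Benthos is running locally on its default port):
curl -s http://0.0.0.0:4195/metrics | grep total_ops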
Conclusion
However, this method only provides an approximate value, since it only counts log messages from the moment Benthos starts streaming, not from when the pods were created. Nevertheless, it meets my simple needs. It could be improved by deploying Benthos as a continuously running service, caching the metrics in a database, and automatically refreshing when pods change. We may return to this in the future.