Preface
There is a program that processes events (metrics, logs, traces) from different sources (files, scripts, scraping, HTTP servers, syslog, etc.) into different sinks (Prometheus exporter, remote HTTP, files, etc.), transforming them in different ways along the way.
It's called Vector. And it is amazing in every way but one: there is no tutorial for VRL. There are docs on transforms, sinks, and sources; there is a language reference; but there is no proper tutorial for the main feature of that program, the built-in transformation language called Vector Remap Language.
I found myself in a situation where I needed to write a complicated transformation (~0.05 kloc), and to do so I learned VRL for real.
Because I found no introductions to VRL, I decided to write my own.
Minimal required onboarding into Vector
While I won't talk much about other transforms, sinks, and sources, I need to give enough background about them to use VRL.
Vector has sources, transforms, and sinks. Events flow from sources, through transforms, into sinks. Events are always moving forward. They can be "fanned out" into multiple transforms/sinks or can come from multiple sources/transforms, but they never create a loop (never go back). It's called a DAG (directed acyclic graph).
Each transform or sink has a list of "inputs" to previous sources or transforms.
This is an over-engineered cat command. It reads data from the stdin source and writes it to the stdout sink:
sources:
  stdin:
    type: "stdin"
sinks:
  stdout:
    type: "console"
    inputs: ["stdin"]
    encoding:
      codec: "text"
To run it, we run vector --config example.yaml.
Everything we put into stdin gets output to stdout (but it will be normalized in terms of Unicode characters, so it's not a true cat program).
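The "fan-out" mentioned earlier can be sketched with the same building blocks. The config below is only an illustration and is not used later in this tutorial: it keeps the stdin source and the console sink type from the example above, and the sink names as_text and as_json are made up for this sketch. Both sinks list the same source in their inputs, so each event should be delivered to both of them.

```yaml
# Fan-out sketch: one source feeding two sinks.
# The sink names "as_text" and "as_json" are hypothetical.
sources:
  stdin:
    type: "stdin"
sinks:
  as_text:
    type: "console"
    inputs: ["stdin"]   # first consumer of the stdin source
    encoding:
      codec: "text"
  as_json:
    type: "console"
    inputs: ["stdin"]   # second consumer of the same source
    encoding:
      codec: "json"
```

With this config, every typed line should show up twice on stdout: once as plain text and once as a JSON object.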
This is almost enough for our goals. One more thing: we will create a remap transform in the middle. A VRL description is coming, but for now just believe me that empty text is a valid VRL program: it changes nothing, and the incoming event is passed on as is.
This is our tutorial setup; we will use it to learn VRL. The actual VRL program is written inside the source stanza of the our_example transform.
sources:
  stdin:
    type: "stdin"
transforms:
  our_example:
    inputs: ["stdin"]
    type: remap
    source: ""
sinks:
  stdout:
    type: "console"
    inputs: ["our_example"]
    encoding:
      codec: "json"
This creates a very simple DAG: stdin->our_example->stdout. In this series of articles we will focus on the source part of our_example. This is where VRL happens. (Note: you can have multiple remap transforms, and each will have its own VRL program.)
Note: we changed the codec for sink stdout to json; this will help us see what exactly we have done.
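To illustrate the remark that you can have multiple remap transforms, here is a sketch of two of them chained one after another. The transform names set_message and drop_host are invented for this example, and the two one-line VRL programs are borrowed from later in this chapter, so don't worry about their meaning yet. The point is only that each transform carries its own source and names the previous step in its inputs.

```yaml
# Sketch: stdin -> set_message -> drop_host -> stdout
# The transform names "set_message" and "drop_host" are hypothetical.
sources:
  stdin:
    type: "stdin"
transforms:
  set_message:
    inputs: ["stdin"]
    type: remap
    source: |
      .message = "Hello World"
  drop_host:
    inputs: ["set_message"]   # consumes the output of the first remap
    type: remap
    source: |
      del(.host)
sinks:
  stdout:
    type: "console"
    inputs: ["drop_host"]
    encoding:
      codec: "json"
```

Splitting work across transforms like this is a design choice, not a requirement; the same two expressions could also live in a single VRL program.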
What is an 'event'?
If you run the previous program, you will see this as output (plus some log info from Vector itself; ignore it). I split the output across several lines for readability, but you will see it as a single line if you run it yourself.
{
"host":"shuttle1",
"message":"test",
"source_type":"stdin",
"timestamp":"1985-10-09T17:43:54.287916009Z"
}
I typed test into stdin and got this output. This is an event, and it has type log (we will talk about logs vs. metrics a bit later).
Note that I typed only test, but we got a lot of metadata (hostname, timestamp, source information) along with the message, the stuff I typed.
Events come from sources. If they pass through a transform of type remap, the VRL program from that transform's source will run. A single event contains either a log line, or a sample for a series, or a trace. VRL programs never run by themselves; they always require an event to come from a source.
To spare us long listings, I won't show the full config for Vector anymore, only the content of the source (the VRL program).
Hello world
Because we can't just "run" a VRL program, we need to pass something into stdin. To make it easier, I run it like this:
echo 'test'|vector --config example.yaml
or even like this
echo 'test'|vector --config example.yaml -q
(the latter removes most of the logs from output)
So, let's write a hello world program. (Note: I show only the source content for the transform our_example.)
.message = "Hello World"
Note: YAML allows writing multiline strings, so my transform looks like this:
transforms:
  our_example:
    inputs: ["stdin"]
    type: remap
    source: |
      .message = "Hello World"
(just keep indentation correct and you may pretend YAML does not exist)
If we run it with a command line (see above), we get...
As expected:
{
"host": "shuttle1",
"message": "Hello World",
"source_type": "stdin",
"timestamp": "1985-10-09T18:02:33.317302641Z"
}
Now, let's run Vector without the echo | in front:
vector --config example.yaml -q|jq .
and type two lines of some text (in my case test and no!):
test
{
"host": "shuttle1",
"message": "Hello World",
"source_type": "stdin",
"timestamp": "1985-10-09T18:03:51.444177436Z"
}
no!
{
"host": "shuttle1",
"message": "Hello World",
"source_type": "stdin",
"timestamp": "1985-10-09T18:03:52.829102974Z"
}
As you can see, every input line caused a new run of our VRL program, which replaced the message with "Hello World".
Note: we kept our metadata intact. And we got a timestamp.
Let's clean it up a bit:
.message = "Hello World"
del(.timestamp)
del(.host)
del(.source_type)
Now, if we type two lines, we get only the message:
test
{
"message": "Hello World"
}
test
{
"message": "Hello World"
}
We will learn about VRL in the next chapter. Also, the thing I did (removing almost all metadata) is not recommended; it can break many sinks. We do it here just to get a taste of how VRL looks.
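If you want a gentler variant of the cleanup, here is a sketch that assumes you only want to drop host and source_type while keeping the timestamp. It is just the transform stanza of the tutorial config; the sources and sinks stay as before. Whether the remaining fields are enough for a particular sink depends on that sink, so treat this as an illustration rather than a recommendation.

```yaml
# Fragment of the tutorial config: only the transform is shown.
transforms:
  our_example:
    inputs: ["stdin"]
    type: remap
    source: |
      .message = "Hello World"
      del(.host)
      del(.source_type)
```

With this program, each output event should contain just the message and timestamp fields.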
Basics of the basics of VRL
As you can see, a VRL program consists of zero or more expressions. No ';' is needed at the end of a line, and spaces are not important. When we call a function, its arguments go in parentheses, strings are in quotes, and = is the assignment operator. There is something fishy about the . at the beginning of things, but that's the topic for the next chapter.
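The syntax points above can be put side by side in one annotated sketch. It reuses the our_example transform and the two expressions already seen in this chapter. The only new element is the comment lines: VRL treats text after # as a comment, and because they sit inside the YAML block scalar, they are passed to VRL rather than being read as YAML comments.

```yaml
# Fragment of the tutorial config: only the transform is shown.
transforms:
  our_example:
    inputs: ["stdin"]
    type: remap
    source: |
      # Assignment: '=' stores the quoted string in .message.
      .message = "Hello World"
      # Function call: the argument goes inside the round brackets.
      del(.host)
```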
This is the end of Chapter 1. Chapter 2 will follow soon.