Logstash:
Traditionally, Logstash has been used to process logs from applications and send them to Elasticsearch, hence the name. That’s still a popular use case, but Logstash has evolved into a more general-purpose tool: at its core, Logstash is a data processing pipeline.
The data that Logstash receives is handled as events, which can be anything you choose: log file entries, e-commerce orders, customers, chat messages, and so on. These events are processed by Logstash and shipped off to one or more destinations.
Examples of destinations include Elasticsearch, a Kafka queue, an e-mail message, or an HTTP endpoint.
A Logstash pipeline consists of three stages: an input stage, a filter stage, and an output stage. Each stage makes use of one or more plugins to do its task.
- Input stage: The input stage is how Logstash receives data. An input plugin could be a file, so that Logstash reads events from a file; it could also be an HTTP endpoint, a relational database, or even a Kafka queue that Logstash listens to.
- Filter stage: The filter stage is all about how Logstash processes the events received from the input plugins. Here we can parse CSV, XML, or JSON (see the sketch right after this list). We can also enrich the data, for example by looking up an IP address and resolving its geographical location, or by looking up data in a relational database.
- Output stage: An output plugin defines where we send the processed events. Formally, those destinations are called stashes. A stash can be a database, a file, an Elasticsearch instance, a Kafka queue, and so on.
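As an example of what the filter stage can do, here is a minimal sketch of a csv filter; the column names are made up for illustration and would depend on your data:

filter {
  csv {
    # split each event on commas
    separator => ","
    # hypothetical column names for an e-commerce orders feed
    columns => ["order_id", "customer", "amount"]
  }
}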
So, Logstash receives events from one or more input plugins at the input stage, processes them at the filter stage, and sends them to one or more stashes at the output stage.
You can run multiple pipelines within the same Logstash instance if you want to, and Logstash is horizontally scalable.
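When running several pipelines, each one gets its own entry in pipelines.yml; the pipeline ids and config paths below are placeholders rather than a prescribed layout:

# pipelines.yml
- pipeline.id: access_logs
  path.config: "/etc/logstash/conf.d/access.conf"
- pipeline.id: orders
  path.config: "/etc/logstash/conf.d/orders.conf"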
A Logstash pipeline is defined in Logstash’s own configuration format, which looks somewhat similar to JSON. Technically, it’s more than a markup language, as we can also add conditional statements and make a Logstash pipeline dynamic.
A Sample Logstash Pipeline Configuration:
input {
  file {
    path => "/path/to/your/logfile.log"
  }
}

filter {
  if [request] in ["/robots.txt"] {
    drop {}
  }
}

output {
  file {
    path => "%{type}_%{+yyyy_MM_dd}.log"
  }
}
Let us consider a basic use case of Logstash before moving on to the other components of our ELK stack. Suppose that we want to process access logs from a web server.
We can configure Logstash to read the log file line by line and treat each line as a separate event. This is easily done with the input plugin named “file,” although there is a handy family of tools named Beats (Filebeat in particular) that is better suited for this task.
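As a rough sketch, a file input that reads an access log line by line could look like the following; the log path is hypothetical:

input {
  file {
    # hypothetical path to the web server's access log
    path => "/var/log/nginx/access.log"
    # read the file from the top the first time Logstash sees it
    start_position => "beginning"
  }
}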
Once Logstash has received a line, it can process it further. Technically, a line is just a string, and we need to parse that string so that we can extract valuable information from it, such as the status code, the request path, the IP address, and so on.
We can do so by writing a “grok” pattern, which is somewhat similar to a regular expression, to match pieces of information and save them into fields. Now suppose our “stash” is Elasticsearch: we can easily send the processed events, with their parsed fields, to Elasticsearch as JSON documents.
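As a minimal sketch of this access-log pipeline, the filter and output sections could look like the following; the built-in COMBINEDAPACHELOG grok pattern, the geoip enrichment, and the Elasticsearch host and index name are assumptions about the setup rather than the author's exact configuration:

filter {
  grok {
    # parse a combined-format access log line into named fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    # enrich the event with the geographical location of the client IP
    source => "clientip"
  }
}

output {
  elasticsearch {
    # assumed local Elasticsearch instance and daily index naming scheme
    hosts => ["http://localhost:9200"]
    index => "access-logs-%{+YYYY.MM.dd}"
  }
}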