Ahmed M Raihan

Posted on Sep 6, 2020

Elasticsearch: Use ELK Stack to analyze logs for Feature Usage Insights

#elasticsearch #elk #loganalytics

Excerpt

Once you implement a feature into your application, you need to get an insight on feature usage. In this post, my intention is to guide you through the steps for achieving this using ELK (Elasticsearch, Logstash, Kibana) Stack.

Planning: Data Preparation

Regardless of underlying platform, we should have curated data ready on our hand when we want to analyze. Of the biggest challenge in Data Engineering (DE) or Data Science (DS) is to curate data from raw/source data.
For example, assume an analytical question is: Which browser is most popular among our users?. To analyze, you have 2 option to consider:

Raw NGinx log which looks like this: 123.123.123.123 - - [21/Jul/2007:16:37:24 +0100] "GET /feature1/a2e1a325-938a-41e2-a0ef-517330f398b1.html HTTP/1.1" 404 999 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20090907 Firefox/60.0".
Parsed and indexed log into Elasticsearch which looks like this: {"Browser": "Firefox 60.0", ", "AccessDate": "21/Jul/2007:16:37:24 +0100" }

The parsed log is better. You can write SQL on it and get desired result of the analytical question. However, it is worth to point out that parsed log cannot be created magically. For this we have take some additional steps.

Planning: Simulation

We will simulate real world usage via Artillery.io scenario. If you are not familiar with it, no issue. Just look into the configuration yml file and it is self-explanatory.
First, we will call the scenario with 100 users with 80/20 split between Feature1 and Feature2. Then, we will replicate this with 50/50 split. So, our end result should show that Feature1 is used ~65% times and Feature2 is used ~35%.

Setup preparation - Application Server

Docker is my preferred way to work. For this post, there is no necessity to setup an application to demo. We will just setup Nginx and call http://example.com/feature1 and http://example.com/feature2 urls to mimic actual call to Feature1 and Feature2.

Setup preparation - Docker images

We will use Docker for preparing environment. In the final setup, we will run everything in docker-compose so that 1 container can communicate with another. However, before going there, we will manually pull the images to speed up the process. You will get the elasticsearch docker images here and nginx image in Dockerhub.

Setup preparation - Backend configuration

Let us mention technologies and major configuration(s) required:

Nginx: Where to write files? We are not routing anything for this post, so primary configuration can be deferred.
Elasticsearch: Since we are serving from single-server in this post, our configuration has to reflect that
Logstash: We need to configure input location, grok pattern for extracting information from log. Example from here:

input {
  file {
    type => "nginx"
    start_position => "beginning"
    path => [ "/var/log/nginx/*.log" ]
  }
}

filter {
  if [type] == "nginx" {
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => { "message" => "%{IPORHOST:remote_addr} - %{USERNAME:remote_user} \[%{HTTPDATE:time_local}\] \"%{DATA:request}\" %{INT:status} %{NUMBER:bytes_sent} \"%{DATA:http_referer}\" \"%{DATA:http_user_agent}\"" }
      remove_tag => [ "_grokparsefailure" ]
      add_tag => [ "nginx_access" ]
    }

    geoip {
      source => "remote_addr"
    }
  }
}
output {
   elasticsearch { 
      hosts => ["elasticsearch:9200"] 
   }
}

Kibana: Elasticsearch location has to be mentioned in configuration

Initiate setup via docker-compose

The docker compose file with required technologies is as followed:

version: "3"

services:
  elasticsearch:
    build: ./elk_elasticsearch
    ports:
      - "9200:9200"
      - "9300:9300" 
    environment:
      - "discovery.type=single-node" 
      - "http.cors.enabled=true" 
      - "http.cors.allow-origin=*" 
      - "http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization"
      - "http.cors.allow-credentials=true" 

  logstash:
    build: ./elk_logstash
    volumes:
      - ./cmn/:/trylogs
    ports:
      - "9600:9600"

  kibana:
    build: ./elk_kibana
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    ports:
      - "5601:5601"

Test environment setup

I prefer to test with actual domain name instead of localhost. So, please open your hosts file and add the following:
127.0.0.1 http://url.local

Go to http://url.local:80/ --> Nginx test page should appear
Go to http://url.local:9200/ --> Default page from Elasticsearch should appear
Go to http://url.local:5601/ --> Kibana dashboard should appear

Afterwards, setup test-elk.yml file with the following content:

config:
  target: http://url.local/
  phases:
    - duration: 300
      arrivalRate: 10
  defaults:
    headers:
      x-api-key: "{{ $processEnvironment.SERVICE_API_KEY }}"
scenarios:
  - flow:
      - get:
          url: "/?param1=valu1"
      - get:
          url: "/?param2=valu2"

Run simulation and observe Kibana dashboard

First, start artillery.io simulation using command artillery run test-elk.yml. The logstash configuration should be picking logs automatically and send to Elasticsearch. Lastly, Open kibana and create index pattern for the logs. Lastly, generate a dashboard (i.e. pie chart) like this:

Conclusion and Homework

If you have read so far, thank you :-) Now it is time for you to try additional things as homework to validate that you have understood the concept and ok with deployment approach.
I would recommend following tasks:

In elasticsearch, I have collected only browser version. Think of other information (e.g. geocode) from the log and enhance
Within Logstash, I have inserted the pattern inline. Try to use Grok debugger for additional usage.
I have used pie chart as Kibana visualization object. What if I want to see a dashboard with multiple visualization for multiple analytics questions? Design this.

Timescale – the developer's data platform for modern apps, built on PostgreSQL

Timescale Cloud is PostgreSQL optimized for speed, scale, and performance. Over 3 million IoT, AI, crypto, and dev tool apps are powered by Timescale. Try it free today! No credit card required.

Try free

DEV Community

Elasticsearch: Use ELK Stack to analyze logs for Feature Usage Insights

Excerpt

Planning: Data Preparation

Planning: Simulation

Setup preparation - Application Server

Setup preparation - Docker images

Setup preparation - Backend configuration

Initiate setup via docker-compose

Test environment setup

Run simulation and observe Kibana dashboard

Conclusion and Homework

Timescale – the developer's data platform for modern apps, built on PostgreSQL

Top comments (0)

Read next

Compensating Open Source Maintainers: A Necessary Shift

Elevate Your Software: The Importance of Coding Best Practices

Navigating Contributor License Agreements (CLAs) in Open Source

The Ethical Imperative: Coding Ethical Practices in Software Development