James Rivers

Posted on May 18

Kubernetes Logging Architecture in 2025: Fluent Bit vs Vector vs Logstash (With Real Configs)

#kubernetes #devops #logging #sre


After working with 50+ Kubernetes clusters in production, I've seen teams make the same architectural mistakes with logging. The wrong choice at the collector layer costs you 3x more in compute and 10x more in operational pain.

This post breaks down the three main collectors I've deployed in anger, with real config snippets and the gotchas nobody documents.

The Three-Layer Problem

Kubernetes logging has three distinct concerns that teams conflate:

Collection — Reading from container stdout/stderr (CRI-O or containerd format)
Processing — Parsing, filtering, enriching (adding pod labels, stripping noise)
Shipping — Sending to your aggregator (Elasticsearch, Loki, Datadog, Grafana Cloud)

Pick the wrong tool at layer 1 and you'll be fighting cardinality explosions and dropped multiline stacktraces for months.
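To make the separation concrete, here is a minimal Python sketch that treats the three layers as three functions. It is illustrative only: the sample log lines, the `healthz` noise rule, the `pod_labels` argument, and the JSON batch shape are all assumptions for the example, not what any of the collectors below do internally.

```python
import json
from typing import Dict, List, Optional


def collect(raw_line: str) -> Dict[str, str]:
    """Collection: split one CRI-formatted line into its four fields."""
    timestamp, stream, flag, message = raw_line.rstrip("\n").split(" ", 3)
    return {"time": timestamp, "stream": stream, "flag": flag, "message": message}


def process(event: Dict[str, str], pod_labels: Dict[str, str]) -> Optional[dict]:
    """Processing: drop noise, then enrich with (hypothetical) pod labels."""
    if "/healthz" in event["message"]:
        return None  # assumed noise rule for this example
    return {**event, "labels": dict(pod_labels)}


def ship(events: List[dict]) -> str:
    """Shipping: serialize a batch; a real shipper would send this somewhere."""
    return json.dumps(events)


def run_pipeline(raw_lines: List[str], pod_labels: Dict[str, str]) -> str:
    processed = (process(collect(line), pod_labels) for line in raw_lines)
    return ship([event for event in processed if event is not None])
```

Keeping the stages separate like this is what lets you swap the tool at one layer without rewriting the other two.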

Fluent Bit: The Right Default

Fluent Bit is written in C, typically uses around 10MB of RAM per node, and parses the CRI log format (the one both containerd and CRI-O write) correctly out of the box. If you're starting fresh, this is your answer.

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    multiline.parser  cri
    Tag               kube.*
    Mem_Buf_Limit     50MB
    Skip_Long_Lines   On

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log           On
    Keep_Log            Off
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On

[OUTPUT]
    Name    loki
    Match   *
    Host    loki.monitoring.svc.cluster.local
    Port    3100
    Labels  job=fluent-bit

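The critical gotcha: multiline.parser cri parses the CRI line format and uses its P (partial) and F (full) flags to reassemble long lines that the container runtime split into chunks. Without it, oversized lines arrive in pieces with the CRI prefix still attached. Note that this alone does not join application-level multiline output: a Java stacktrace is written as many separate lines, each flagged F, so you also need a language-aware multiline parser (Fluent Bit ships a built-in java one). Skip that and stacktraces get split into hundreds of single-line log entries, and your alerting on Exception patterns fires constantly on partial lines.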
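For intuition, here is a rough Python sketch of partial-line reassembly based on those flags: chunks flagged P are buffered per stream until a chunk flagged F closes the line. This is not Fluent Bit's implementation; the function name and the choice to emit the closing chunk's timestamp are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def merge_cri_partials(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, stream, message), joining P chunks onto the closing F chunk."""
    pending: Dict[str, List[str]] = {}  # buffered chunks, keyed by stream
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-formatted line; skipped in this sketch
        timestamp, stream, flag = parts[0], parts[1], parts[2]
        chunk = parts[3] if len(parts) == 4 else ""
        pending.setdefault(stream, []).append(chunk)
        if flag == "F":
            # F closes the logical line: emit it with the closing chunk's timestamp.
            yield timestamp, stream, "".join(pending.pop(stream))
```

Buffering per stream matters because stdout and stderr chunks from the same container can interleave in the file.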

Vector: When You Need ETL-Level Processing

Vector (by Datadog, open source) is the right choice when you need complex transformations — routing different log streams to different destinations, applying VRL transforms, or doing real-time aggregation before shipping.

[sources.kubernetes_logs]
type = "kubernetes_logs"
auto_partial_merge = true

[transforms.parse_nginx]
type = "remap"
inputs = ["kubernetes_logs"]
source = '''
  if exists(.kubernetes.pod_labels."app") && .kubernetes.pod_labels."app" == "nginx" {
    . = merge(., parse_nginx_log!(.message, "combined"))
  }
'''

[transforms.add_environment]
type = "remap"
inputs = ["parse_nginx"]
source = '''
  .environment = get_env_var!("ENVIRONMENT")
  .cluster_name = get_env_var!("CLUSTER_NAME")
'''

[sinks.loki]
type = "loki"
inputs = ["add_environment"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels.app = "{{ kubernetes.pod_labels.app }}"
labels.namespace = "{{ kubernetes.pod_namespace }}"

Where Vector shines: cardinality control. You can hash or drop high-cardinality fields (user IDs, session tokens, request IDs) before they hit Loki/Elasticsearch. I've seen Loki clusters go from 500GB/day to 50GB/day after adding a Vector transform that strips request IDs from labels.
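The logic of such a transform is simple enough to sketch in Python. The label names and the bucket count below are hypothetical; in Vector you would express the same idea in VRL inside a remap transform.

```python
import hashlib
from typing import Dict

DROP_LABELS = {"request_id", "session_token"}  # assumed unbounded, never useful as labels
HASH_LABELS = {"user_id"}                      # assumed worth keeping, but only coarsely


def bucket(value: str, buckets: int = 16) -> str:
    """Map an unbounded value onto a fixed number of stable buckets."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return f"bucket-{int.from_bytes(digest[:4], 'big') % buckets}"


def control_cardinality(labels: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of labels with high-cardinality keys dropped or bucketed."""
    out = {}
    for key, value in labels.items():
        if key in DROP_LABELS:
            continue
        out[key] = bucket(str(value)) if key in HASH_LABELS else value
    return out
```

Dropped values can still live in the log line itself; the point is only to keep them out of the label set that the backend indexes.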

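Where Vector struggles: the VRL language has a learning curve, and error handling is verbose. Runtime failures are also easy to miss. By default, a remap transform that errors logs the failure and forwards the original event unmodified; with drop_on_error = true the event is dropped instead, and you only get it back if you also set reroute_dropped = true and consume the transform's dropped output.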
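The pattern worth aiming for is explicit error routing: every event that fails a transform lands in a separate stream with the reason attached, instead of disappearing or passing through unnoticed. A small Python sketch of the idea follows; the function names and the JSON-parsing transform are made up for illustration and are not Vector APIs.

```python
import json
from typing import Callable, Dict, List, Tuple


def parse_json_message(event: Dict) -> Dict:
    """Example transform: raises ValueError if the message is not valid JSON."""
    event["parsed"] = json.loads(event["message"])
    return event


def route_with_errors(
    events: List[Dict], transform: Callable[[Dict], Dict]
) -> Tuple[List[Dict], List[Dict]]:
    """Apply transform to each event; failures go to a dead-letter list with the error."""
    ok, dropped = [], []
    for event in events:
        try:
            ok.append(transform(dict(event)))  # work on a copy, keep the original intact
        except (ValueError, KeyError) as exc:
            dropped.append({"event": event, "error": str(exc)})
    return ok, dropped
```

Counting the dead-letter stream gives you a metric to alert on, which is usually how these failures get noticed at all.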

Logstash: The Legacy Default (Use With Caution)

Logstash still dominates in Elasticsearch-first shops. It's battle-tested, has 200+ input/output plugins, and the grok patterns are well-documented. But it runs on the JVM and uses 500MB-1GB RAM per instance, roughly 50-100x more than Fluent Bit.

input {
  beats {
    port => 5044
  }
}

filter {
  if [kubernetes][container][name] =~ /nginx/ {
    grok {
      match => {
        "message" => [
          '%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"',
          # IPv6 variant
          '%{IPV6:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"'
        ]
      }
    }
  }

  mutate {
    add_field => { "cluster" => "${CLUSTER_NAME}" }
    remove_field => ["agent", "ecs", "input"]
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    index => "kubernetes-logs-%{+YYYY.MM.dd}"
    user => "${ES_USER}"
    password => "${ES_PASSWORD}"
  }
}

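The IPv6 gotcha: patterns built on %{IPV4} or a hand-written dotted-quad regex fail on IPv6 clients. In current logstash-patterns-core, %{IPORHOST} expands to %{IP} or %{HOSTNAME}, and %{IP} already covers IPv6, so the explicit %{IPV6} variant above is only a safety net; check the pattern definitions shipped with your Logstash version. Neither pattern matches the bracketed form [2001:db8::1]. Also note that a failed grok match does not drop the event: Logstash tags it _grokparsefailure and ships it unparsed, so alert on that tag. Unparsed nginx lines that nobody notices are a very common failure mode in access log pipelines.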
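Before writing patterns, it helps to know which address shapes actually appear in the first field of your access logs. A quick sketch using Python's standard ipaddress module can classify a sample; the category names here are arbitrary choices for the example.

```python
import ipaddress


def classify_remote_addr(field: str) -> str:
    """Classify the first field of an access log line by address shape."""
    bracketed = field.startswith("[") and field.endswith("]")
    candidate = field[1:-1] if bracketed else field
    try:
        address = ipaddress.ip_address(candidate)
    except ValueError:
        return "hostname-or-other"  # not an IP literal
    if bracketed:
        return f"ipv{address.version}-bracketed"
    return f"ipv{address.version}"
```

Run it over a day of logs and count the categories; any bucket your grok patterns do not cover is where parse failures will come from.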

Choosing the Right Architecture

Criteria	Fluent Bit	Vector	Logstash
RAM per node	~10MB	~50MB	500MB-1GB
CRI-O multiline	Native	Native	Manual
Transform power	Lua scripts	VRL (powerful)	Ruby/Grok
Plugin ecosystem	Good	Growing	Excellent
Debugging	Hard	OK	Good
Best for	Most K8s setups	Complex routing	Elastic stack
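To put the RAM row in cluster terms, here is a back-of-the-envelope calculation in Python. The per-node figures are the rough numbers from the table, the 50-node cluster size is an assumption, and real usage depends heavily on buffers and throughput.

```python
def daemonset_total_mb(per_node_mb: float, nodes: int) -> float:
    """Total collector memory if one instance runs on every node."""
    return per_node_mb * nodes


def overhead_ratio(other_mb: float, baseline_mb: float) -> float:
    """How many times more memory one collector uses than another."""
    return other_mb / baseline_mb


NODES = 50  # assumed cluster size

fluent_bit_total = daemonset_total_mb(10, NODES)  # 500 MB across the cluster
vector_total = daemonset_total_mb(50, NODES)      # 2500 MB across the cluster

# Logstash at 500MB-1GB (taken as 1000 MB) per instance versus Fluent Bit at ~10 MB
logstash_low = overhead_ratio(500, 10)    # 50.0
logstash_high = overhead_ratio(1000, 10)  # 100.0
```

At this scale the per-node difference between Fluent Bit and Vector is about 2GB cluster-wide, while running Logstash on every node would cost tens of gigabytes.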

The Architecture I'd Deploy Today

For a 20-50 node cluster:

Fluent Bit as DaemonSet collector on every node (handles CRI-O, low overhead)
Vector as a middle aggregator (deployed as Deployment, 2-3 replicas) for transforms, enrichment, routing
Loki as the primary store (much cheaper than Elasticsearch for log retention)
Grafana for querying and alerting

This "fan-in" architecture means your Fluent Bit configs stay simple (just collect and forward to Vector), while Vector handles all the complex logic in one place. When you need to change a parser, you update one Vector config instead of a DaemonSet rollout.

The Patterns That Actually Trip Teams Up

I've compiled 50+ production-tested regex patterns and complete configs for every layer of this stack — CRI-O, containerd, kubelet, Nginx (with IPv6), Spring Boot, Go, Node.js — plus the multiline handling rules that prevent stacktrace mangling.

If you're building or migrating a logging stack, the Kubernetes Logging Architecture Guide covers this in depth with case studies from real migrations (Docker → containerd, ELK → Loki, Logstash → Vector).

Also worth bookmarking: the Production Log Parsing Pack — 50+ copy-paste regex patterns for the formats listed above, tested across 50+ clusters.

Questions? Drop them in the comments — happy to dig into specific edge cases.

James Rivers — DevOps/SRE consultant specialising in observability stacks
