Kubernetes Logging Architecture in 2025: Fluent Bit vs Vector vs Logstash (With Real Configs)
After working with 50+ Kubernetes clusters in production, I've seen teams make the same architectural mistakes with logging. The wrong choice at the collector layer costs you 3x more in compute and 10x more in operational pain.
This post breaks down the three main collectors I've deployed in anger, with real config snippets and the gotchas nobody documents.
The Three-Layer Problem
Kubernetes logging has three distinct concerns that teams conflate:
- Collection: reading container stdout/stderr from the node's log files (CRI log format, written by both containerd and CRI-O)
- Processing: parsing, filtering, enriching (adding pod labels, stripping noise)
- Shipping: sending to your aggregator (Elasticsearch, Loki, Datadog, Grafana Cloud)
Pick the wrong tool at the collection layer and you'll be fighting cardinality explosions and dropped multiline stacktraces for months.
Fluent Bit: The Right Default
Fluent Bit is written in C, uses ~10MB RAM per node, and parses the CRI log format (what both containerd and CRI-O write) out of the box. If you're starting fresh, this is your answer.
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    multiline.parser  cri
    Tag               kube.*
    Mem_Buf_Limit     50MB
    Skip_Long_Lines   On

[FILTER]
    Name                 kubernetes
    Match                kube.*
    Kube_URL             https://kubernetes.default.svc:443
    Kube_CA_File         /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File      /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log            On
    Keep_Log             Off
    K8S-Logging.Parser   On
    K8S-Logging.Exclude  On

[OUTPUT]
    Name    loki
    Match   *
    Host    loki.monitoring.svc.cluster.local
    Port    3100
    Labels  job=fluent-bit
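The critical gotcha: multiline.parser cri reassembles lines that the container runtime split and tagged P (partial) or F (full) in the CRI log format. It does not join application-level multiline output. Each frame of a Java stacktrace is still its own F line, so to keep stacktraces intact you also need a multiline filter after the tail input (Fluent Bit ships built-in java, go and python multiline parsers). Without both, stacktraces arrive as dozens of single-line log entries, and your alerting on Exception patterns fires constantly on fragments.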
Vector: When You Need ETL-Level Processing
Vector (by Datadog, open source) is the right choice when you need complex transformations — routing different log streams to different destinations, applying VRL transforms, or doing real-time aggregation before shipping.
[sources.kubernetes_logs]
type = "kubernetes_logs"
auto_partial_merge = true

[transforms.parse_nginx]
type = "remap"
inputs = ["kubernetes_logs"]
source = '''
if exists(.kubernetes.pod_labels."app") && .kubernetes.pod_labels."app" == "nginx" {
  . = merge(., parse_nginx_log!(.message, "combined"))
}
'''

[transforms.add_environment]
type = "remap"
inputs = ["parse_nginx"]
source = '''
.environment = get_env_var!("ENVIRONMENT")
.cluster_name = get_env_var!("CLUSTER_NAME")
'''

[sinks.loki]
type = "loki"
inputs = ["add_environment"]
endpoint = "http://loki:3100"
encoding.codec = "json"
labels.app = "{{ kubernetes.pod_labels.app }}"
labels.namespace = "{{ kubernetes.pod_namespace }}"
Where Vector shines: cardinality control. You can hash or drop high-cardinality fields (user IDs, session tokens, request IDs) before they hit Loki/Elasticsearch. I've seen Loki clusters go from 500GB/day to 50GB/day after adding a Vector transform that strips request IDs from labels.
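As a rough illustration of that kind of transform, here is a minimal sketch in the same Vector TOML. The transform name limit_cardinality and the field names request_id, session_token and user_id are assumptions for the example, not fields Vector adds on its own; substitute whatever your applications actually emit.

```toml
# Hypothetical transform: remove or hash per-request identifiers before shipping.
[transforms.limit_cardinality]
type = "remap"
inputs = ["add_environment"]
source = '''
# Request IDs and session tokens are unique per request or session, so drop them.
del(.request_id)
del(.session_token)

# Keep user_id correlatable without carrying the raw value downstream.
if is_string(.user_id) {
  .user_id = sha2(string!(.user_id))
}
'''
```

To wire it in, sinks.loki would take inputs = ["limit_cardinality"] instead of ["add_environment"]. Only fields templated under labels.* become Loki labels, so the other half of the fix is never referencing per-request values there.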
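Where Vector struggles: the VRL language has a learning curve, and error handling is verbose. Runtime error behavior is also easy to get wrong. By default, a remap transform that errors at runtime forwards the original, unmodified event, so unparsed lines quietly reach your sink. With drop_on_error = true the event is dropped instead, and it only survives if you also set reroute_dropped = true and consume the transform's .dropped output.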
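One way to make failures visible is to reroute them. The sketch below is a variant of the parse_nginx transform from the config above with drop_on_error and reroute_dropped enabled, so events that fail parsing leave through the parse_nginx.dropped output instead of continuing down the main pipeline. The parse_failures sink name and the choice of a console sink are assumptions; in practice you might send these to a separate Loki stream.

```toml
# Variant of parse_nginx: failed events are rerouted instead of passed through.
[transforms.parse_nginx]
type = "remap"
inputs = ["kubernetes_logs"]
drop_on_error = true     # do not forward events whose VRL program errored
reroute_dropped = true   # send them to the "parse_nginx.dropped" output instead
source = '''
if exists(.kubernetes.pod_labels."app") && .kubernetes.pod_labels."app" == "nginx" {
  . = merge(., parse_nginx_log!(.message, "combined"))
}
'''

# Hypothetical sink: print rerouted events so parse failures are easy to spot.
[sinks.parse_failures]
type = "console"
inputs = ["parse_nginx.dropped"]
encoding.codec = "json"
```

Components that list parse_nginx as an input keep receiving only the events that parsed cleanly.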
Logstash: The Legacy Default (Use With Caution)
Logstash still dominates in Elasticsearch-first shops. It's battle-tested, has 200+ input/output plugins, and the grok patterns are well-documented. But it runs on the JVM and uses 500MB-1GB RAM per instance, roughly 50-100x more than Fluent Bit's ~10MB.
input {
  beats {
    port => 5044
  }
}

filter {
  if [kubernetes][container][name] =~ /nginx/ {
    grok {
      match => {
        "message" => [
          '%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"',
          # IPv6 variant
          '%{IPV6:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:method} %{DATA:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"'
        ]
      }
    }
  }
  mutate {
    add_field => { "cluster" => "${CLUSTER_NAME}" }
    remove_field => ["agent", "ecs", "input"]
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    index => "kubernetes-logs-%{+YYYY.MM.dd}"
    user => "${ES_USER}"
    password => "${ES_PASSWORD}"
  }
}
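The IPv6 gotcha: the stock %{IPORHOST} pattern expands to %{IP} or %{HOSTNAME}, and %{IP} covers both IPv4 and IPv6, so it already matches a bare IPv6 address such as 2001:db8::1 (nginx logs $remote_addr without brackets). The patterns that break are the naive ones: a hard-coded %{IPV4} or a hand-rolled dotted-quad regex. With the stock patterns, the second entry above is a redundant but harmless fallback. Also note that grok does not drop a line it cannot match. It tags the event _grokparsefailure and passes it on unparsed, which looks like missing data in any dashboard that filters on parsed fields. This is a common reason lines seem to vanish in nginx access log pipelines.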
Choosing the Right Architecture
| Criteria | Fluent Bit | Vector | Logstash |
|---|---|---|---|
| RAM per node | ~10MB | ~50MB | 500MB-1GB |
| CRI partial-line merge | Native | Native | Manual |
| Transform power | Lua scripts | VRL (powerful) | Ruby/Grok |
| Plugin ecosystem | Good | Growing | Excellent |
| Debugging | Hard | OK | Good |
| Best for | Most K8s setups | Complex routing | Elastic stack |
The Architecture I'd Deploy Today
For a 20-50 node cluster:
- Fluent Bit as DaemonSet collector on every node (handles the CRI log format, low overhead)
- Vector as a middle aggregator (deployed as Deployment, 2-3 replicas) for transforms, enrichment, routing
- Loki as the primary store (much cheaper than Elasticsearch for log retention)
- Grafana for querying and alerting
This "fan-in" architecture means your Fluent Bit configs stay simple (just collect and forward to Vector), while Vector handles all the complex logic in one place. When you need to change a parser, you update one Vector config instead of a DaemonSet rollout.
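A minimal sketch of the aggregator's receiving side, assuming Fluent Bit ships events with its forward output to a Vector Service on port 24224. The port and the source name from_fluent_bit are assumptions for illustration. Vector's fluent source accepts the Fluentd forward protocol that Fluent Bit speaks.

```toml
# Vector aggregator: accept events forwarded by the Fluent Bit DaemonSet.
[sources.from_fluent_bit]
type = "fluent"
address = "0.0.0.0:24224"

# Downstream transforms take this source as input instead of kubernetes_logs.
[transforms.add_environment]
type = "remap"
inputs = ["from_fluent_bit"]
source = '''
.environment = get_env_var!("ENVIRONMENT")
.cluster_name = get_env_var!("CLUSTER_NAME")
'''
```

One thing to watch: metadata added by Fluent Bit's kubernetes filter is nested under different keys than Vector's own kubernetes_logs source (for example kubernetes.labels rather than kubernetes.pod_labels), so VRL written against one layout needs adjusting for the other.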
The Patterns That Actually Trip Teams Up
I've compiled 50+ production-tested regex patterns and complete configs for every layer of this stack — CRI-O, containerd, kubelet, Nginx (with IPv6), Spring Boot, Go, Node.js — plus the multiline handling rules that prevent stacktrace mangling.
If you're building or migrating a logging stack, the Kubernetes Logging Architecture Guide covers this in depth with case studies from real migrations (Docker → containerd, ELK → Loki, Logstash → Vector).
Also worth bookmarking: the Production Log Parsing Pack — 50+ copy-paste regex patterns for the formats listed above, tested across 50+ clusters.
Questions? Drop them in the comments — happy to dig into specific edge cases.
James Rivers — DevOps/SRE consultant specialising in observability stacks