It’s 9 PM. I’m casually scrolling through YouTube when a Slack notification interrupts the quiet: our logging sidecar has crashed — exit code 137.
At first glance, it looked like a simple backpressure issue. Increase a buffer, maybe tweak a limit, redeploy.
It wasn’t.
That crash sent me down an unexpected rabbit hole — not just into configuration tweaks, but into how our logging pipeline actually works under pressure.
What looks like a “simple” logging sidecar (we use Fluent Bit) turns out to behave more like a small distributed system. It buffers. It retries. It manages memory. It makes trade-offs. And when things go wrong, those trade-offs start to matter.
In this post, I'd like to share my experience and the lessons I learned along the way.
The Logging Pipeline
In our setup, Fluent Bit runs as a sidecar alongside each application container. Its responsibility is straightforward: collect logs from the application, process them, and forward them to an external logging platform.

This post uses Fluent Bit as the example, but the underlying lessons apply to any sidecar-driven logging architecture.
The Crash
Fluent Bit can run out of memory for several reasons. Most of them boil down to an imbalance within the pipeline:
Slow outputs — If downstream systems lag while inputs continue ingesting logs, backpressure builds up and memory usage increases.
Heavy filtering or processing — Filters can temporarily increase in-memory footprint.
Unbounded ingestion vs buffer limits — When log volume exceeds configured memory limits, the sidecar becomes the bottleneck.
How Did We Fix It? Or Did We?
This is where we went down the rabbit hole, so buckle up.
One of the tricky things about abrupt container crashes is the lack of visibility. When the sidecar was killed, we lost the very signals we needed to debug it. Our container memory graphs looked stable, which made the crash even more confusing.
Later, we realized we weren’t observing memory in real time.
To validate what was actually happening inside Fluent Bit, we enabled the Memory Input Plugin and inspected internal metrics through logs. That’s when we finally saw it — sudden memory spikes that weren’t visible in our external monitoring.
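As a rough sketch of what that looked like, a configuration along these lines turns on Fluent Bit's `mem` input so memory usage is sampled and emitted alongside the regular log stream (the tag name and interval here are illustrative, not our production values):

```ini
# Sample memory usage every second and emit it as its own stream
[INPUT]
    Name         mem
    Tag          internal.memory
    Interval_Sec 1

# Print those samples to stdout so they land in the container logs
[OUTPUT]
    Name   stdout
    Match  internal.memory
```

Because the samples flow through the container's own stdout, they capture short-lived spikes that a coarse-grained external monitor can miss.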
From there, we started experimenting.
Iteration 1 — Will Limiting Memory Fix It?
Fluent Bit lets you cap memory per input using Mem_Buf_Limit. It seemed like the obvious first fix — limit memory, prevent OOM.
But what happens when that limit is reached?
The input gets paused.
The application, however, doesn’t pause. It continues emitting logs. If those logs are being streamed in real time rather than written to a durable store, they get dropped.
In our case, logs were forwarded via the forward input plugin. There was no upstream persistence layer. Pausing the input meant losing logs.
The memory pressure would be controlled — but at the cost of reliability.
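For context, a minimal version of that first attempt looks roughly like this (the 5MB cap is an illustrative figure, not our actual setting):

```ini
# forward input with a hard cap on in-memory buffering;
# once the cap is hit, Fluent Bit pauses this input
[INPUT]
    Name          forward
    Listen        0.0.0.0
    Port          24224
    Mem_Buf_Limit 5MB
```

With only memory buffering in play, a paused input means whatever the application streams next has nowhere to go.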
Iteration 2 — Can Disk Buffering Save Us?
If limiting memory risked dropping logs, the next logical step was to reduce memory pressure without sacrificing durability.
Fluent Bit supports filesystem buffering (storage.type filesystem), allowing chunks to be written to disk once memory thresholds are reached. In theory, this shifts the pressure from RAM to disk and prevents abrupt OOM kills.
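A hedged sketch of that configuration, with an illustrative storage path:

```ini
[SERVICE]
    # where chunks are persisted once memory fills up
    storage.path  /var/log/flb-storage/
    storage.sync  normal

[INPUT]
    Name          forward
    Listen        0.0.0.0
    Port          24224
    # spill chunks to disk instead of pausing the input
    storage.type  filesystem
```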
This helped — but not in the way we initially expected.
The sidecar container still crashed with an OOM error.
Iteration 3: Where Else Is Memory Going?
We eventually realized that limiting memory at the input level wasn’t enough. Inputs aren’t the only components consuming RAM — filters and outputs do too.
Increasing the container’s CPU and memory was the obvious workaround. But sidecars are supposed to be lightweight. Scaling them blindly defeats the purpose.
Filters were our next suspect. Parsers like JSON and Multiline temporarily hold records in memory during processing. In our case, the JSON parser turned out to be a major contributor. Since the application emits structured logs, they needed to be parsed before being forwarded.
Unlike some buffering parameters, parser memory isn’t something you can finely tune. So we made a trade-off — we moved part of the parsing responsibility downstream.
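The parser stage in question looked roughly like this (the match pattern and key name are illustrative); records pass through it and are held in memory while being decoded:

```ini
# JSON parser filter of the kind we eventually removed, moving
# the parsing responsibility to the downstream platform instead
[FILTER]
    Name         parser
    Match        app.*
    Key_Name     log
    Parser       json
    Reserve_Data On
```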
Outputs introduced another layer of memory usage. Before logs are sent, they may be compressed, retried, or reloaded from disk buffers. Even with filesystem buffering enabled, chunks need to be brought back into memory before being flushed. If this loading isn’t controlled, memory usage can spike again.
Fluent Bit exposes parameters to limit how much backlog is loaded into memory — and that became the next lever to tune.
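The main lever here is the service-level backlog limit, which bounds how much of the persisted backlog is brought back into memory at once (the value below is illustrative):

```ini
[SERVICE]
    # cap how much on-disk backlog is loaded into memory
    # while waiting to be flushed
    storage.backlog.mem_limit 5M
```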
Things remained stable for a while.
We were hit with another crash — but this time it wasn’t exit code 137 (OOM). The container exited with 139.
A segmentation fault.
That’s when we realized memory pressure wasn’t the only problem we were dealing with.
On digging into the container logs, we found this:
```
[2019/01/09 17:06:01] [error] [plugins/in_forward/forward_fs.c:218 errno=28] No space left on device
[2019/01/09 17:06:01] [error] [in_forward] could not register file into fs_events
```
The disk had run out of space.
That was my exact reaction: What the f*? 🤯 How does a logging sidecar exhaust disk?
On ECS Fargate, the default disk storage allocation is 20GB — which should be more than sufficient for buffering.
So we looked deeper.
When we compared input ingestion metrics with output flush metrics, the pattern became obvious: ingestion rate was significantly higher than flush rate. And that makes sense — memory and disk writes are always faster than network calls.
Here’s what was happening:
Memory buffer fills up.
Chunks spill over to disk.
Outputs load chunks from disk and attempt to flush.
A sudden spike in ingestion occurs.
Flush rate remains steady (or capped).
Disk usage grows faster than it can be drained.
Given enough time under sustained imbalance, filling the disk is not only possible — it’s inevitable.
Enforcing Limits
Rate limiting can be applied at a few key layers:
Application
- Sampling
- Deduplication
- Strict production log levels
Fluent Bit
- Cap disk usage per output
- Drop oldest chunks when limits are hit
- Apply throttle filter for burst control
Operational
- Alert when ingestion consistently exceeds processing
- Increase ephemeral storage (short-term buffer, not a fix)
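On the Fluent Bit side, the first two levers above map to a per-output disk cap (when the cap is reached, the oldest chunks are discarded) and the built-in throttle filter. The numbers below are illustrative, not tuned recommendations, and the output host is a hypothetical placeholder:

```ini
# Drop the oldest buffered chunks once this output's disk usage hits 2G
[OUTPUT]
    Name                     forward
    Match                    *
    Host                     logging-platform.internal
    storage.total_limit_size 2G

# Smooth ingestion bursts: limit throughput to roughly Rate records
# per Interval, averaged over a sliding window
[FILTER]
    Name     throttle
    Match    *
    Rate     800
    Window   5
    Interval 1s
```

Capping disk per output turns "inevitable disk exhaustion" into a bounded, predictable loss policy, which is the trade-off you actually want to be making deliberately.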
Final Thoughts
This wasn’t a memory issue.
It wasn’t a mysterious crash.
It was physics.
When you generate data faster than you can move it, pressure builds.
And pressure always escapes somewhere.
In our case — it escaped through disk exhaustion.
Logging is infrastructure. It deserves guardrails.
If you don’t design for bursts, bursts will design your outage.
Curious how you’d approach it differently.


