Raj Kundalia

From println to Production Logging: Internals and Performance Across Languages and the OS

If you do not want to read the article, that is A-OK:

I got interested in logging — and because we now have LLMs at our fingertips for asking questions, I decided to form a question bank first:

  • How are loggers implemented in different languages and operating systems?
  • How efficient is logging on different operating systems?
  • How much overhead do loggers bring?
  • How are they efficiently implemented?
  • How much of a difference is there between System.out vs. writing to a file vs. a logger vs. streaming logs in terms of efficiency and performance? Can we measure this? Compare similar methods for other languages.
  • How does a logger know which file a log call came from? What is the mechanism for this in different languages? — Very important question
  • What part of log filtering is based on log level?
  • The first thing the logger does is compare the message's level integer against its own threshold integer; if the message level is lower, it returns immediately and nothing else runs. Is this based on configuration?
  • Which is the most efficient language to write loggers in that would still be usable in other languages — or does something like this not make sense?
  • Why are markers used in logging? What do they solve that we cannot already solve without them? I know Java contains Markers, but do other languages contain them?
  • When I write log calls at a lower level but keep a higher log level in the configuration, does it create a performance impact? (e.g., having many DEBUG and TRACE calls while the log level is kept at INFO).
  • In Java, are the placeholders in log calls — such as logger.info("Request was successful user={}", userId) — concatenations, or is some other mechanism used for them?

If you do not want to read the article, you can skip it and use this question bank to form your own understanding.

GitHub - rajkundalia/logger-internals-java: A Java logging library built from scratch - exploring async handlers, structured fields, granular caller info…


TL;DR

  • We all assume a disabled log call costs nothing. It doesn't — the level check is cheap, but any string you constructed before passing it to the logger was already paid for, whether the log fires or not.
  • Every time you see a class name and line number in a log output, something paid for that. In Java, when caller info is enabled, it's a runtime stack walk. In C and Rust, it was resolved at compile time and costs nothing at runtime. Most engineers have never had reason to think about the difference.
  • logger.info("User {}", user) is not just cleaner syntax. It's a different evaluation model — the string is only built if the log actually fires. "User " + user is evaluated before the logger even sees it.
  • Async logging feels like a free upgrade. It isn't. It changes what you can trust about your logs when something crashes — and the logs you lose are exactly the ones you needed.
  • In Rust and C/C++, a disabled log call can be removed from the binary entirely at compile time. In Java and Python, it always exists at runtime, even if it does nothing. The language made this choice.
  • Go and C logging stacks sit closer to the OS than JVM-based logging stacks. There are fewer layers between the log call and the syscall. That distance has a cost, and it compounds under load.

Thanks to LLMs, I could create this: https://github.com/rajkundalia/logger-internals-java


Why Logging Is Not Just Printing

Most of us haven't considered how much happens between our code calling logger.info(...) and that string reaching disk: a level check, a formatter, a handler with its own buffering strategy, a lock or queue depending on sync versus async mode, a syscall into the kernel, and sometimes a second system — syslog, journald — that takes over from there. At scale, that pipeline has real cost. String formatting allocates. Synchronous file writes add latency to every thread that logs. A slow disk creates backpressure that stalls application threads. And in a distributed system where logs are your only audit trail, how that pipeline behaves during a crash is not an edge case — it is a design constraint you either chose or inherited without knowing it. None of that is obvious from a println.


The Pipeline: What a Logger Actually Does

Before pulling any of this apart, it helps to see the whole shape at once:

Application
    ↓
Logger
    ↓
Level Filter
    ↓
Formatter
    ↓
Appender / Handler
    ↓
Operating System
    ↓
Disk / Stream

The application emits a log event with a level, message, and arguments. The logger checks whether the configured threshold allows the event through. If it passes, the formatter constructs the final string — interpolating placeholders, appending timestamps, resolving caller location. The appender or handler takes that string and writes it somewhere: a file, stdout, a socket, a rolling buffer. That write becomes a system call, handing control to the OS, which manages buffering and flush behavior before data actually hits disk. Each stage has cost. Each stage is a place where things can go wrong or get optimized. The rest of this post is about what happens at each one.
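The stages above can be sketched as a toy logger. This is an illustration only, not any real framework's code — MiniLogger and its names are invented for this sketch:

```java
import java.util.function.Consumer;

// Toy pipeline: level filter -> formatter -> appender.
// All names here are invented for illustration.
public class MiniLogger {
    // Levels as integers, mirroring how real frameworks compare them.
    static final int DEBUG = 10, INFO = 20, WARN = 30;

    final int threshold;
    final Consumer<String> appender; // stands in for a file writer or System.out::println

    MiniLogger(int threshold, Consumer<String> appender) {
        this.threshold = threshold;
        this.appender = appender;
    }

    void log(int level, String message) {
        if (level < threshold) return;              // 1. level filter: cheap integer compare
        String line = "[" + level + "] " + message; // 2. formatter builds the final string
        appender.accept(line);                      // 3. appender hands off toward the OS
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        MiniLogger log = new MiniLogger(INFO, sink::append);
        log.log(DEBUG, "dropped");   // filtered out: formatter and appender never run
        log.log(WARN, "kept");       // passes the threshold
        System.out.println(sink);    // prints: [30] kept
    }
}
```

Note where the work sits: the string is only built after the filter passes — a detail the placeholder section below depends on.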


Log Level Filtering Internals

Here's something that seems obvious until you think about it: a DEBUG log call in a hot loop, in a production service configured at INFO, runs on every single iteration. It doesn't log anything — but it doesn't disappear either.

The level check itself is cheap. Each level maps to an integer, and the check is a comparison — INFO against whatever the event's level is, early return if it doesn't pass. No formatting, no allocation, no appender invocation. In Logback, higher integers map to higher severity — TRACE is 5000, ERROR is 40000. java.util.logging follows the same direction but uses a different numeric scale and different level names: FINE is 500, SEVERE is 1000. The ordering is not inverted — the scales and names just don't align. Either way, the comparison is fast.
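The java.util.logging numbers are easy to verify from the standard library — the gate really is just an integer comparison:

```java
import java.util.logging.Level;

// java.util.logging exposes each level's integer directly;
// the filter a logger applies is, in essence, one comparison.
public class LevelValues {
    public static void main(String[] args) {
        System.out.println(Level.FINE.intValue());    // 500
        System.out.println(Level.INFO.intValue());    // 800
        System.out.println(Level.SEVERE.intValue());  // 1000

        Level threshold = Level.INFO;
        Level event = Level.FINE;
        boolean passes = event.intValue() >= threshold.intValue();
        System.out.println(passes);                   // false: FINE is dropped at INFO
    }
}
```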

What I found more interesting is where in the pipeline the check actually happens. I assumed there was one gate. There are often several. In Java's SLF4J backed by Logback, the logger checks first — that's the fast path. But appenders can have their own filter chains, meaning an event can clear the logger-level check and still be dropped downstream. This is deliberate and useful: you can send WARN and above to a file, ERROR and above to an alert sink, and everything to stdout, all from the same pipeline. But it means filtering is not a single decision — it's a sequence of decisions, each adding a small amount of overhead to events that reach it.

The real cost isn't the check. It's everything you did before the call site. If you constructed a string before passing it to the logger, that work happened regardless of whether the log fires. Which is exactly why placeholder syntax exists, and why it's not just a style preference.


How a Logger Knows Where It Came From

You've probably never thought about how a log line knows it came from UserService.java:142. It just appears. What's actually happening underneath varies so much across languages that it's worth making explicit — because the cost difference is not small.

In Java, two approaches exist. The older one constructs a Throwable and extracts the stack trace — the JVM walks the call stack and allocates an array of frame objects. The newer approach, StackWalker introduced in Java 9, is lazy and stream-based: you only materialize the frames you actually need. Both are runtime operations with real cost, which is why caller location logging is configurable in most Java frameworks and off by default in many Logback configurations. You can see how this plays out in the reference implementation at https://github.com/rajkundalia/logger-internals-java.
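A minimal sketch of the StackWalker approach — not any framework's actual code, just the shape of it. The skip count is what a real framework has to tune: it must skip its own frames to report the application's frame, not the logger's:

```java
// Resolving caller location at runtime with StackWalker (Java 9+).
// Every call pays for a stack walk -- the runtime cost that
// __FILE__/__LINE__ in C avoid entirely.
public class CallerInfo {
    static String callerLocation() {
        return StackWalker.getInstance().walk(frames ->
            frames.skip(1)  // skip callerLocation() itself; frameworks skip more
                  .findFirst()
                  .map(f -> f.getFileName() + ":" + f.getLineNumber())
                  .orElse("unknown"));
    }

    public static void main(String[] args) {
        System.out.println(callerLocation()); // e.g. CallerInfo.java:<line of this call>
    }
}
```

The laziness is the point: unlike `new Throwable().getStackTrace()`, the walker only materializes the one frame the lambda consumes.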

Python captures caller information as part of LogRecord creation, inside _log(), which is only reached after the level check passes. The depth of that inspection — whether stack info is captured, whether additional frame walking occurs — depends on configuration and what the formatter requests. The cost is not paid on every call, but it is paid at record creation time, not at formatting time.

Go makes this explicit. runtime.Caller(skip int) returns the file, line, and function name when you ask for it. It's a runtime operation, but controlled — you call it when you need it, rather than it being woven into every log record automatically.

C and C++ sidestep runtime cost entirely. __FILE__ and __LINE__ are preprocessor macros, expanded at compile time. By the time the binary runs, those values are string literals and integers baked into the executable. No stack walking, no frame introspection, nothing.

Rust takes the same approach through the log crate's macro system. log::info!("...") expands at compile time to include the module path and line number as constants. The binary contains no machinery for discovering caller location — it was resolved before the program ran.

The gap between compile-time resolution and runtime stack walking is the kind of thing that's invisible until you're logging at high volume. C/C++ and Rust pay nothing. Java pays on every logged event where caller info is enabled. Go pays when you ask. Most engineers pick a logging framework without knowing which of these models they've signed up for.


Placeholders vs String Concatenation

These two lines look similar. They are not:

// Eager: string is built before the logger is invoked
logger.info("Connected user: " + user.toString());

// Lazy: string is only built if the level check passes
logger.info("Connected user: {}", user);

In the first version, the JVM evaluates user.toString() and concatenates the string before the logger receives anything. If the level check drops the event — which it will, for any DEBUG or TRACE call in a production service configured at INFO — that allocation and work was wasted. At low log volumes this is invisible. Scattered through hot paths at high throughput, it accumulates.

In the second version, user is passed as an object reference. The logger receives the raw argument. Only if the event clears the level filter does the formatter resolve the placeholder and build the final string. toString() is never called otherwise, and no intermediate string is allocated.

This only matters because of how filtering works — specifically the early return discussed in the filtering section. The two design choices reinforce each other: a cheap level check creates the condition under which deferred string construction delivers its benefit. If logging were unconditional, the distinction wouldn't save anything.
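What the formatter does with `{}` is simple substitution, performed only after the filter passes. A toy version — real frameworks (SLF4J's MessageFormatter) add escaping and array handling on top of essentially this:

```java
// Toy deferred placeholder formatting. The argument's toString() is only
// invoked here, after the level check has already let the event through.
public class Placeholders {
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            out.append(template, from, at);   // literal text before the placeholder
            out.append(args[argIdx++]);       // toString() happens here, lazily
            from = at + 2;                    // skip past "{}"
        }
        out.append(template.substring(from)); // trailing literal text
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("Connected user: {} (id={})", "alice", 42));
        // prints: Connected user: alice (id=42)
    }
}
```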


OS Interaction: Where Language Logging Ends and the OS Begins

There's a boundary in every logging pipeline that most application engineers have never had reason to think about: the point where your code hands a string to the OS and stops being in control of what happens next.

When an appender writes to a file, it eventually calls write() — a system call. Everything above that boundary is the language runtime: string formatting, in-memory buffering, lock acquisition. Everything below it is the kernel: its own buffers, filesystem cache, eventual persistence to disk. Crossing that boundary involves a context switch from user space to kernel space. It's not free, and it happens on every unbuffered write.

This is why buffered I/O matters. Rather than one write() per log line, most production logging configurations accumulate output in memory and flush periodically or when the buffer is full. Fewer syscalls, higher throughput. The trade-off: a crash can lose whatever is buffered and not yet flushed. You are always choosing between durability and throughput at that boundary, whether you know it or not.
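The trade-off in miniature, using only the standard library — lines accumulate in a user-space buffer and only reach the kernel via write() when the buffer fills, flush() is called, or the writer is closed. Anything still buffered at crash time is lost:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Buffered file logging sketch: many log lines, far fewer syscalls.
public class BufferedLog {
    static int writeAndCount(Path logFile, int n) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(logFile)) {
            for (int i = 0; i < n; i++) {
                out.write("event " + i);   // absorbed by the in-memory buffer,
                out.newLine();             // no syscall per line
            }
        } // close() flushes remaining buffered bytes to the kernel -- the durability point
        return Files.readAllLines(logFile).size();
    }

    public static void main(String[] args) throws IOException {
        Path logFile = Files.createTempFile("app", ".log");
        System.out.println(writeAndCount(logFile, 1000)); // 1000
        Files.delete(logFile);
    }
}
```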

The OS also offers its own logging infrastructure — syslog on POSIX systems, journald on Linux. These are daemons that accept log messages via a socket and handle buffering, rotation, and persistence outside your application entirely. The boundary shifts: your application writes to a socket, and the daemon takes responsibility for the rest. Structured fields are first-class in journald. Log rotation is not your problem. The cost is IPC (Inter-process communication) overhead — a socket write instead of a local file write.

Go and C-adjacent logging stacks sit naturally close to this boundary. Go's os.File.Write is a thin wrapper over write() with minimal overhead between your code and the syscall. JVM logging absolutely works at scale — but it involves more layers: GC-managed heap allocations, object creation for log events, the JVM's own I/O abstraction. Those layers add up under load.


Synchronous vs Asynchronous Logging

At some point, most engineers configure async logging and move on. Throughput goes up, latency on application threads drops, and nothing seems worse. It feels like a free upgrade.

Here's what actually changed: you no longer have a guarantee that a log line you wrote ever reached disk.

Synchronous logging blocks the calling thread until the write completes. The appender acquires a lock, formats the string, calls write(), releases the lock. Every log call has latency. Under high write volume to a slow disk, this becomes a bottleneck that shows up on every application thread that logs.

Async logging breaks this coupling. Your thread drops an event into a queue and returns immediately. A dedicated logging thread drains the queue, formats events, and writes to the appender. Throughput increases because writes get batched. Thread latency drops to the cost of a queue insertion. This sounds like a strict improvement. It is not.

The queue is bounded. Under sustained high load it fills up. At that point the framework has a decision to make: block the calling thread, drop the event, or expand the queue. Many async logging implementations are configured to drop lower-severity events under pressure unless explicitly set to block — Logback's AsyncAppender, for instance, starts discarding TRACE, DEBUG, and INFO events when the queue reaches 80% capacity by default, while WARN and ERROR are retained. Which means under the conditions where your system is most stressed, in the moments just before something breaks, you may be losing the exact log lines that would have told you why.
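The failure contract can be shown in a few lines: a bounded queue, a producer using offer() (drop when full) rather than put() (block), and a drain step standing in for the background appender thread. This is a sketch of the pattern, not any framework's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Async logging's drop-under-pressure behavior, in miniature.
public class AsyncDrop {
    final BlockingQueue<String> queue;
    int dropped = 0;

    AsyncDrop(int capacity) { this.queue = new ArrayBlockingQueue<>(capacity); }

    // Called on the application thread: never blocks, may lose the event.
    void log(String event) {
        if (!queue.offer(event)) dropped++; // queue full -> the event silently disappears
    }

    List<String> drain() { // normally a dedicated appender thread does this
        List<String> out = new ArrayList<>();
        queue.drainTo(out);
        return out;
    }

    public static void main(String[] args) {
        AsyncDrop log = new AsyncDrop(2);             // tiny queue to force the issue
        for (int i = 0; i < 5; i++) log.log("event " + i);
        System.out.println(log.drain());              // [event 0, event 1]
        System.out.println(log.dropped + " dropped"); // 3 dropped
    }
}
```

Swap offer() for put() and the producer blocks instead — the other branch of the same decision, traded latency for completeness.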

The crash case is worse. Events sitting in the queue when the application crashes never reach the appender. Your crash logs — the ones you needed most — may not exist.

Async logging is worth using. It is the right choice in many high-throughput systems. But it is an architectural decision about what you are willing to lose and when. Using it without understanding the failure contract means you have made that trade without knowing it.


Compile-Time vs Runtime Filtering

Something I hadn't considered when I started this: in Java, Python, and Go, a disabled log call still exists in the binary. In Java and Python this is unambiguous — the level check runs on every call. Go's compiler is more aggressive about inlining and dead code elimination, so the picture is less clear-cut and depends on the logging library and how it's implemented. But in none of these languages can the call be eliminated entirely at compile time the way it can in Rust or C/C++.

Take a TRACE call inside a hot loop in a Java service configured at INFO. On every iteration, the JVM executes an integer comparison and branches. The call is suppressed, but it was visited. At high enough frequency, that cost appears.

In Rust and C/C++, this can be eliminated entirely. A trace!() macro in Rust, conditioned on a compile-time feature flag, is removed by the compiler if tracing is disabled at build time. The instruction does not exist in the binary. There is no branch, no comparison, no overhead of any kind. The code was removed before the program ran.

The trade-off is operational flexibility. A Java application can change its log level at runtime — attach to a running JVM, set the Logback threshold to TRACE, watch debug output appear without a restart. A C binary compiled with TRACE disabled cannot do this. The capability is gone. You traded dynamic observability for zero runtime cost.
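The runtime side of that trade is visible even in java.util.logging: the same logger object flips from suppressing FINE to accepting it with no restart and no recompile (in production this is typically triggered via JMX or a config reload rather than a direct call):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Runtime level adjustment -- the flexibility a compiled-out log call gives up.
public class RuntimeLevel {
    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");

        log.setLevel(Level.INFO);
        System.out.println(log.isLoggable(Level.FINE));  // false: FINE suppressed

        log.setLevel(Level.FINEST);                      // flipped at runtime
        System.out.println(log.isLoggable(Level.FINE));  // true: same process, no restart
    }
}
```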

Which is right depends on context. A long-running service that needs live level adjustment values the runtime flexibility. A systems program where every cycle matters may prefer compile-time elimination. Most languages make this choice implicitly, as part of how their logging ecosystem is designed. It is worth knowing which choice your language made for you.


Cross-Language Comparison Table

| Language | Caller Detection | Filter Type | Async Ecosystem | Compile-time Elimination |
|---|---|---|---|---|
| Java | StackWalker / Throwable | Runtime | Logback AsyncAppender | No |
| Go | runtime.Caller | Runtime | zap, zerolog (non-blocking) | No |
| Python | currentframe / LogRecord | Runtime | QueueHandler | No |
| C/C++ | __FILE__, __LINE__ macros | Runtime / Compile | spdlog async mode | Yes (preprocessor) |
| Rust | Compile-time macro expansion | Runtime / Compile | tracing crate | Yes (feature flags) |

Markers in Java/SLF4J — A Brief Callout

Log levels give you one axis for filtering: severity. But severity alone can't answer a question like "show me all security-related events, regardless of level." That's what Markers solve. In SLF4J, a Marker is a named tag attached to a log event — SECURITY, AUDIT, BILLING — that appenders can filter on independently of level. You can route all AUDIT-marked events to a dedicated file while dropping untagged DEBUG events entirely. It's multi-dimensional filtering: level is one axis, marker is another. Other ecosystems approximate this — Go's zap uses structured fields, Python's logging has Filter objects that can inspect arbitrary LogRecord attributes — but SLF4J Markers are one of the cleaner formulations of the idea, and they're underused in codebases that reach for custom log levels when what they actually need is a second axis.


What Surprised Me

We all assume async logging is a performance upgrade with no real downside. It's a trade — lower latency on application threads in exchange for weaker guarantees about what survives a crash. That trade is often worth making. It's not invisible.

I didn't expect caller detection to have such variance across languages. The gap between __FILE__ resolved at compile time and StackWalker walking the call stack at runtime is not a footnote — it's an architectural difference that shows up under load, and most engineers pick a logging framework without knowing which model they've chosen.

Filtering being a pipeline of gates, not a single check, was more nuanced than I expected. I assumed one threshold, one decision. In practice, logger-level filters and appender-level filters can conflict, and events can be dropped at multiple points for different reasons.

The syscall boundary reframed how I think about logging performance. Everything above it is yours — allocations, formatting, buffering. Everything below it is the kernel's. Understanding where that boundary sits, and how often you cross it, makes the buffering trade-offs obvious in a way they weren't before.

Compile-time log elimination felt genuinely strange when I first understood it. The log crate in Rust doesn't just suppress a call when a level is disabled — the code is removed entirely from the binary by the compiler. That's a fundamentally different model from anything Java or Python offer, and it matters in contexts where it matters.

Markers are really interesting. The logs that are easiest to reason about in production are the ones where someone thought carefully about how to filter them — not just what level to assign, but what category they belong to. It's a small design decision that compounds over time.


Resources

These are the rabbit holes that led here.
