Siddharth Pogul

Building rapidlog: Why I Made a 3x Faster Python Logger

The Problem Nobody Talks About

You're building a production Python application. Everything's humming: API handlers fast, database queries optimized, caching in place.
Then you flip on logging.

Suddenly, throughput tanks. CPU spikes. Latency goes haywire.
Why? Your app logs from 4 worker threads. Each thread competes for the same lock in logging.Handler. While one thread serializes JSON inside that lock, the others block waiting for it. You're not bottlenecked on I/O anymore - you're bottlenecked on a single lock.
This is why I built rapidlog.

The Benchmark That Started It All

Here's what I found comparing Python's stdlib logging against other libraries (all with identical JSON output, 4 threads, 100K logs per thread):

| Library | Throughput | vs stdlib |
| --- | --- | --- |
| rapidlog | 20,133 logs/sec | 3.1x faster |
| structlog | 12,101 logs/sec | 1.86x faster |
| stdlib-json | 6,487 logs/sec | baseline |
| loguru | 3,248 logs/sec | 0.50x (slower!) |

That 3.1x difference? In production, that's 13.6K extra events per second you can handle without scaling up servers.
For a company logging 100M events/day across 4 worker threads, that's the difference between 10 servers and 3 servers.

Why Is stdlib So Slow Under Load?

Let's trace through what happens when you call logger.info():
With stdlib logging:

Thread 1: logger.info() → acquire lock
Thread 2: logger.info() → WAIT (lock held by Thread 1)
Thread 3: logger.info() → WAIT
Thread 4: logger.info() → WAIT
Thread 1: serialize JSON → format record → write to stdout → release lock
Threads 2–4: race to acquire lock (one succeeds, others wait again)

Every single call hits the lock. JSON serialization happens inside the lock. You've made your hot path serialization-bound AND lock-bound.
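This isn't speculation - it's right there in CPython's Lib/logging/__init__.py. Handler.handle wraps every emit() in an acquire/release pair (abridged):

# CPython, Lib/logging/__init__.py (abridged)
def handle(self, record):
    rv = self.filter(record)
    if rv:
        self.acquire()         # one lock per handler instance
        try:
            self.emit(record)  # format + serialize + write happen in here
        finally:
            self.release()
    return rv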
With rapidlog:

Thread 1: logger.info() → append to thread-local buffer (no lock)
Thread 2: logger.info() → append to thread-local buffer (no lock)
Thread 3: logger.info() → append to thread-local buffer (no lock)
Thread 4: logger.info() → append to thread-local buffer (no lock)
Writer thread: drain all buffers → serialize JSON → write to stdout

The hot path is buffer append only. No locks. No serialization. Then a background thread handles the expensive stuff (JSON, I/O) in batches.

Architecture: Per-Thread Buffers + Async Writer

Here's the design in detail:

Layer 1: Hot Path (Per-Thread)

def _log(self, level: str, msg: str, **kwargs):
    if level not in _LEVELS or _LEVELS[level] < self.level_value:
        return  # Quick exit

    # Append to thread-local buffer (single append, no lock)
    self._thread_local.buffer.append([
        time.time_ns(),
        level,
        msg,
        kwargs,
        threading.current_thread().ident
    ])

No dict creation. No serialization. Just a fast append to a pre-allocated list.
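One detail the snippet glosses over is where self._thread_local.buffer comes from and when it flushes. Here's a minimal sketch of that lazy per-thread setup - the names _ThreadLocalBuffers and flush_threshold are mine, not rapidlog's:

import threading

class _ThreadLocalBuffers:
    """Sketch: one private buffer per thread, flushed to the shared queue when full."""

    def __init__(self, queue, flush_threshold: int = 32_768):
        self._local = threading.local()
        self._queue = queue              # the shared RingQueue from Layer 2
        self._flush_threshold = flush_threshold

    def append(self, record) -> None:
        buf = getattr(self._local, "buffer", None)
        if buf is None:
            # First log call on this thread: allocate its private buffer.
            buf = self._local.buffer = []
        buf.append(record)               # hot path: one list append, no lock
        if len(buf) >= self._flush_threshold:
            self._queue.put(buf)         # hand the whole batch to the writer
            self._local.buffer = []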

Layer 2: Cross-Thread Handoff (RingQueue)

When the thread-local buffer fills, it flushes to a bounded RingQueue:

class RingQueue:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = [None] * capacity
        self.write_pos = 0
        self.read_pos = 0
        self.lock = threading.Lock()

    def put(self, item):
        with self.lock:
            if self.write_pos - self.read_pos >= self.capacity:
                # Queue full: signal the caller to wait or drop
                return False
            self.buffer[self.write_pos % self.capacity] = item
            self.write_pos += 1
            return True

This is a multi-producer/single-consumer design. Multiple threads append records, one writer thread drains.
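The writer loop in Layer 3 calls queue.get_many(...), which the class above doesn't show. One plausible way to implement it on the same ring - the timeout handling here is my simplification, not necessarily how rapidlog does it:

import time

# Inside class RingQueue (continuing the sketch above):
def get_many(self, batch_size: int, timeout: float) -> list:
    """Drain up to batch_size items; poll until the timeout if the queue is empty."""
    deadline = time.monotonic() + timeout
    batch = []
    while True:
        with self.lock:
            while self.read_pos < self.write_pos and len(batch) < batch_size:
                batch.append(self.buffer[self.read_pos % self.capacity])
                self.read_pos += 1
        if batch or time.monotonic() >= deadline:
            return batch
        time.sleep(0.001)  # brief backoff instead of busy-spinning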

Layer 3: Writer Thread (Background)

def _writer_loop(self):
    while self.running:
        batch = self.queue.get_many(batch_size=256, timeout=0.01)

        # Serialize all JSON in writer thread
        output = []
        for record in batch:
            json_str = json.dumps({
                "ts_ns": record[0],
                "level": record[1],
                "msg": record[2],
                **record[3],
                "thread": record[4]
            })
            output.append(json_str.encode())

        # Single I/O operation (skip if the batch was empty)
        if output:
            self.stdout.buffer.write(b"\n".join(output) + b"\n")

All JSON serialization happens here, outside the hot path. Batching reduces I/O syscalls.
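Because the writer runs in the background, shutdown has to drain whatever is still buffered - that's why the final example at the end of this post calls logger.close(). A sketch of what that might involve (the _flush_all_thread_buffers hook is hypothetical, not rapidlog's confirmed API):

# Inside the logger class (sketch):
def close(self):
    """Flush outstanding records, then stop the writer thread."""
    self._flush_all_thread_buffers()  # hypothetical: push partial per-thread buffers
    self.running = False              # the loop above exits after its next drain
    self._writer_thread.join()        # a real close() would also drain any leftovers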

The Trade-Off: Memory vs Throughput

This design uses memory to buy speed:
Low-memory preset:

  • Buffer size: 2,048 records per thread
  • Peak memory: ~2–4 MiB
  • Use when: Lambda, containers with tight memory

Balanced preset (default):

  • Buffer size: 32,768 records per thread
  • Peak memory: ~5–10 MiB
  • Use when: General-purpose apps

Throughput preset:

  • Buffer size: 131,072 records per thread
  • Peak memory: ~10–20 MiB
  • Use when: High-volume logging (100K+ logs/sec)

This is an intentional trade-off. The standard library makes the opposite choice (minimal memory, per-handler locks). rapidlog assumes "memory is cheaper than CPU under load" and bets accordingly.
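If the presets are exposed through get_logger, selection might look like this - note the preset keyword is my guess at the API, not confirmed; check the repo for the real parameter names. The comment also works through the balanced preset's worst-case math:

from rapidlog import get_logger

# Hypothetical API: the `preset` keyword is illustrative, not confirmed.
logger = get_logger(level="INFO", preset="throughput")

# Back-of-envelope for the balanced preset:
#   32,768 records/thread x ~100 bytes/record x 4 threads ~= 12.5 MiB
#   if every buffer fills at once; in practice buffers flush long before
#   that, which is how you land in the ~5-10 MiB range above.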

Code Example: Real-World Usage

Before (stdlib + structlog/python-json-logger)

import logging
from pythonjsonlogger import jsonlogger

# Setup is tedious
handler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter('%(asctime)s %(levelname)s %(name)s %(message)s')
handler.setFormatter(formatter)
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# API is awkward
@app.post("/users")
def create_user(user_id: int, email: str):
    logger.info("user_create", extra={"user_id": user_id, "email": email})
    # ...
    return {"status": "ok"}

After (rapidlog)

from rapidlog import get_logger

# One line
logger = get_logger(level="INFO")

# Clean API
@app.post("/users")
def create_user(user_id: int, email: str):
    logger.info("user_create", user_id=user_id, email=email)
    # ...
    return {"status": "ok"}

Both output the same JSON:

{"ts_ns": 1739462130123456789, "level": "INFO", "msg": "user_create", "user_id": 123, "email": "bob@example.com", "thread": 12345}

When to Use rapidlog (And When NOT To)

Use rapidlog when:

✅ You're logging 10K+ events/sec
✅ You have 4+ worker threads logging concurrently
✅ You need structured JSON by default
✅ You want zero external dependencies
✅ Latency matters (e.g., fintech, gaming, real-time APIs)

Don't use rapidlog when:

❌ You're logging <1K events/sec (stdlib is fine, no contention)
❌ You need file rotation (coming in v2, but not there yet)
❌ You need colors/pretty output for development (use stdlib)
❌ Memory is extremely constrained and you can't spare 2+ MiB
❌ You're on Python < 3.10

Benchmarking Methodology (Why I Trust These Numbers)

I benchmarked against 6+ libraries under fair conditions:

  1. Same output format across all libraries (structured JSON, ~100 bytes/log), except fastlogging, which doesn't support structured JSON
  2. Real I/O (logs written to a file, not discarded), because in-memory benchmarks are meaningless for logging
  3. Configurable thread counts (1, 4, 8 workers)
  4. Documented trade-offs (memory, dependencies, features)

All code is in the GitHub repo if you want to reproduce the results.
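If you just want a quick sanity check before cloning the repo, a minimal harness along these lines reproduces the shape of the stdlib test (4 threads x 100K logs, real file I/O). Exact numbers will vary by machine, and the real benchmark uses a JSON formatter per library rather than json.dumps at the call site:

import json
import logging
import threading
import time

def bench_stdlib(path: str, threads: int = 4, logs_per_thread: int = 100_000) -> float:
    """Rough stdlib throughput: JSON-ish logs from N threads to one file."""
    logger = logging.getLogger("bench")
    logger.handlers.clear()
    logger.addHandler(logging.FileHandler(path))
    logger.setLevel(logging.INFO)

    def worker():
        for i in range(logs_per_thread):
            logger.info(json.dumps({"msg": "event", "i": i}))

    start = time.perf_counter()
    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return threads * logs_per_thread / (time.perf_counter() - start)

print(f"stdlib: {bench_stdlib('/tmp/bench.log'):,.0f} logs/sec")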

v2 Roadmap (What's Coming)

v1 is intentionally minimal. v2 will add:

  • Multiple sinks (file, network, cloud)
  • Sampling (drop 1-in-N records for very high volume; see the sketch after this list)
  • Custom encoders (MessagePack, Protobuf)
  • OpenTelemetry integration (correlation IDs, trace context)
  • Datadog/Honeycomb SDK examples
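
Sampling in particular is conceptually tiny; a 1-in-N sampler can be as small as this sketch (illustrative only, not the planned v2 API):

import itertools

class OneInN:
    """Illustrative 1-in-N sampler; not rapidlog's planned API."""

    def __init__(self, n: int):
        self._n = n
        self._counter = itertools.count()  # next() is effectively atomic under the GIL

    def should_log(self) -> bool:
        # Keep record 0, N, 2N, ...; drop everything in between.
        return next(self._counter) % self._n == 0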

Why Open-Source?

Why share this instead of keeping it proprietary?
Selfish reasons:

  1. I care deeply about performance engineering, and open-sourcing helps pressure-test ideas in the real world.
  2. Feedback from real users (you!) makes the library better, faster.
  3. It's a long-term engineering project where I can share trade-offs, benchmarking methodology, and design decisions transparently.

Altruistic reasons:

  1. This is a universal problem. Lock contention under load affects every multi-threaded Python app.
  2. stdlib logging is good enough for 95% of cases. The other 5% shouldn't have to ship custom solutions.

Questions I Expect (And Answers)

Q: Why not just use async/await for logging?
A: Async adds its own costs (event-loop scheduling, context switches). Pre-allocated buffers are simpler and faster for this narrow use case.

Q: What about the GIL?
A: GIL + stdlib's lock is a double hit. rapidlog works within GIL constraints by deferring expensive work (serialization) to a writer thread. This is a known pattern (think: RabbitMQ design).

Q: How does this compare to Loguru?
A: Loguru is more feature-rich and easier to use. rapidlog is faster for high-volume multi-threaded scenarios. Pick based on your constraints.

Q: Can I use this with Django/Flask?
A: Yes! Examples coming in the repo. Anywhere you'd use stdlib logging, rapidlog works.
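Until those examples land, wiring rapidlog into a Flask app is just module-level setup, assuming the get_logger API shown earlier:

from flask import Flask, request
from rapidlog import get_logger

app = Flask(__name__)
logger = get_logger(level="INFO")

@app.route("/users", methods=["POST"])
def create_user():
    data = request.get_json()
    # Structured fields go straight in as kwargs, same as the earlier example.
    logger.info("user_create", user_id=data["user_id"], path=request.path)
    return {"status": "ok"}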

Q: Is this production-ready?
A: v1.0 is stable (37 comprehensive tests). It's intentionally minimal, but what's there is solid.

Try It Out

pip install rapidlog

from rapidlog import get_logger

logger = get_logger()
logger.info("Hello, high-performance logging!", request_id="abc123", latency_ms=42)
logger.close()

Full docs: https://github.com/sid19991/rapidlog

What I Learned Building This

  1. Lock contention is invisible until you benchmark. Profilers don't always show it clearly.
  2. Pre-allocation trades memory for speed. This is unfashionable in today's auto-scaling world, but it works.
  3. The GIL is a constraint, not a blocker. You can build fast Python within its limits.
  4. Benchmarking is hard. Fair comparison requires controlling for output format, I/O, thread count, and more.
  5. Simple designs win. Per-thread buffers + async writer is decades-old (used in real databases). Innovation isn't always new.

Let's Talk

What do you think? Are you logging 10K+ events/sec and hitting lock contention? Would you use this?
Hit me up in the comments.
