<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: François Gauthier</title>
    <description>The latest articles on DEV Community by François Gauthier (@fsg_swl).</description>
    <link>https://dev.to/fsg_swl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3589279%2F04d72c4c-d228-443b-adf7-06f4e31dde49.jpg</url>
      <title>DEV Community: François Gauthier</title>
      <link>https://dev.to/fsg_swl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fsg_swl"/>
    <language>en</language>
    <item>
      <title>🛡️ Loggr: A Real-Time Logging Engine as a Weapon Against DDoS Attacks</title>
      <dc:creator>François Gauthier</dc:creator>
      <pubDate>Sat, 01 Nov 2025 09:20:55 +0000</pubDate>
      <link>https://dev.to/fsg_swl/loggr-a-real-time-logging-engine-as-a-weapon-against-ddos-attacks-6fk</link>
      <guid>https://dev.to/fsg_swl/loggr-a-real-time-logging-engine-as-a-weapon-against-ddos-attacks-6fk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Distributed Denial of Service (DDoS) attacks remain one of the most persistent and costly threats in cybersecurity. They overwhelm infrastructures, obscure visibility, and often leave defenders blind at the very moment they need reliable data the most.&lt;br&gt;&lt;br&gt;
The key to detecting, understanding, and countering these attacks lies in something often underestimated: &lt;strong&gt;logs&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Traditional logging systems struggle under pressure. They sample, drop events, or rely on approximate timestamps that make it impossible to faithfully reconstruct the timeline of an attack.  &lt;/p&gt;

&lt;p&gt;This is precisely the challenge that &lt;strong&gt;Loggr&lt;/strong&gt; was designed to address. Loggr is a high‑performance logging engine capable of ingesting &lt;strong&gt;hundreds of millions of events in seconds&lt;/strong&gt; on standard hardware. Beyond raw throughput, it introduces a critical innovation: &lt;strong&gt;absolute temporal fidelity&lt;/strong&gt;, essential for detection, traceability, and post‑mortem analysis in cybersecurity.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 Real-Time Detection
&lt;/h2&gt;

&lt;p&gt;A DDoS attack is defined by a &lt;strong&gt;sudden surge of activity&lt;/strong&gt;: millions of requests flooding in within seconds.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With conventional pipelines, many events are lost or delayed.
&lt;/li&gt;
&lt;li&gt;With Loggr, ingestion rates push hardware to its limits — tens of millions of logs per second on commodity machines — ensuring that all traffic is captured as long as the system is not saturated.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; security teams can &lt;strong&gt;spot anomalies instantly&lt;/strong&gt;, even before downstream SIEMs or dashboards have processed the data. Loggr acts as a &lt;strong&gt;first‑line sensor&lt;/strong&gt;, maximizing visibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧾 Traceability and Forensics
&lt;/h2&gt;

&lt;p&gt;During an attack, every event matters. Who hit the system, when, and how often?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loggr records &lt;strong&gt;all events that the hardware can absorb&lt;/strong&gt;, without sampling.
&lt;/li&gt;
&lt;li&gt;Logs are compressed and stored with a predictable footprint, enabling full retention of the attack for later analysis.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This near‑exhaustive capture is critical for:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt; (proving what happened).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forensic investigations&lt;/strong&gt; (identifying vectors and patterns).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive defense&lt;/strong&gt; (training detection models on real attack data).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎞️ Replay and Post-Mortem
&lt;/h2&gt;

&lt;p&gt;Once the attack is over, the &lt;strong&gt;post‑mortem&lt;/strong&gt; begins. Without reliable logs, it is impossible to replay the exact sequence of events.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loggr stores events in a strictly deterministic order.
&lt;/li&gt;
&lt;li&gt;Teams can &lt;strong&gt;replay the attack event by event&lt;/strong&gt;, as if watching it unfold again.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enables:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifying bottlenecks.
&lt;/li&gt;
&lt;li&gt;Understanding attack propagation.
&lt;/li&gt;
&lt;li&gt;Strengthening defenses for the future.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🕒 Absolute Temporal Fidelity: Beyond Timestamps
&lt;/h2&gt;

&lt;p&gt;Most logging systems rely on &lt;strong&gt;timestamps&lt;/strong&gt; (milliseconds or microseconds). Under heavy load, multiple events share the same timestamp, making it impossible to know which came first.  &lt;/p&gt;

&lt;p&gt;Loggr takes a radically different approach:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each event is assigned an &lt;strong&gt;atomic inter‑thread sequence number&lt;/strong&gt;, strictly increasing across all threads.
&lt;/li&gt;
&lt;li&gt;Even if two events occur in the same microsecond, they are &lt;strong&gt;differentiated and ordered&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;This guarantees &lt;strong&gt;absolute temporal fidelity&lt;/strong&gt;, without ambiguity.
&lt;/li&gt;
&lt;/ul&gt;
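&lt;p&gt;&lt;em&gt;A minimal sketch of the idea (not Loggr's implementation, which uses a lock-free atomic counter in native C; the Python lock below is only a stand-in): a process-wide counter hands every event a distinct, strictly increasing ID, so even two events landing in the same microsecond remain totally ordered.&lt;/em&gt;&lt;/p&gt;

```python
# Illustrative sketch only: a process-wide, strictly increasing sequence
# number shared by all logging threads. Timestamps can collide under
# load; a sequence number issued this way cannot.
import threading

class Sequencer:
    """Hands out a unique, strictly increasing ID across threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def next_id(self):
        with self._lock:
            seq = self._next
            self._next += 1
            return seq

seq = Sequencer()
ids = []
ids_lock = threading.Lock()

def worker(n):
    # each thread stamps n events with the shared sequencer
    for _ in range(n):
        i = seq.next_id()
        with ids_lock:
            ids.append(i)

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every event received a distinct ID, so a total order is always recoverable.
assert len(ids) == 40_000
assert len(set(ids)) == 40_000
```

&lt;p&gt;&lt;em&gt;In C, the lock would be replaced by a single atomic fetch-and-add, which is what makes the scheme cheap enough for the hot path.&lt;/em&gt;&lt;/p&gt;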

&lt;p&gt;In practice, this means that during a DDoS, when millions of requests hit simultaneously, Loggr can still reconstruct the &lt;strong&gt;exact order&lt;/strong&gt; of events — as long as throughput remains within hardware capacity. If saturation occurs, losses are possible, but Loggr &lt;strong&gt;pushes those thresholds far beyond traditional solutions&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Why It Works
&lt;/h2&gt;

&lt;p&gt;Loggr achieves these results through several design choices:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preprocessing + compression&lt;/strong&gt;: entropy reduction before LZ4, achieving up to 5× compression without sacrificing speed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock‑free pipelines&lt;/strong&gt;: eliminating contention, ensuring no bottlenecks even under extreme load.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable footprint&lt;/strong&gt;: runs on standard hardware, no exotic infrastructure required. The minimal footprint is 20 MB (stable), allowing capture of 1.5 to 8M+ events per second.&lt;/li&gt;
&lt;/ul&gt;
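&lt;p&gt;&lt;em&gt;To make the first design choice concrete, here is a toy version of "reduce entropy, then compress". zlib from the Python standard library stands in for LZ4, and the dictionary encoding is a deliberately simple sketch, not Loggr's proprietary transform: repeated tokens become short indices before compression, and the token table is kept so the transform stays lossless.&lt;/em&gt;&lt;/p&gt;

```python
# Toy "preprocess, then compress" pipeline. zlib (stdlib) stands in for
# LZ4; the principle is the same: lower the entropy of the input so the
# fast general-purpose compressor has less work to do.
import zlib

logs = [
    "[%d] [GET] [/forum/thread/%d.html] [10.0.0.%d] [200]"
    % (i, i % 50, i % 20)
    for i in range(20_000)
]
raw = "\n".join(logs).encode()

def encode(lines):
    # dictionary-encode: every distinct token becomes a small integer index
    table = {}
    coded_lines = []
    for line in lines:
        ids = [str(table.setdefault(tok, len(table))) for tok in line.split()]
        coded_lines.append(" ".join(ids))
    # keep the token table in index order so the transform is reversible
    tokens = sorted(table, key=table.get)
    return tokens, "\n".join(coded_lines)

def decode(tokens, body):
    out = []
    for line in body.split("\n"):
        out.append(" ".join(tokens[int(i)] for i in line.split()))
    return out

tokens, body = encode(logs)
assert decode(tokens, body) == logs  # the preprocessing step is lossless

plain = zlib.compress(raw, 1)
pre = zlib.compress(("\x00".join(tokens) + "\n" + body).encode(), 1)
print("compressor alone:", len(plain), "bytes; preprocessed first:", len(pre), "bytes")
```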




&lt;h2&gt;
  
  
  📌 Positioning in the Security Ecosystem
&lt;/h2&gt;

&lt;p&gt;Loggr is not meant to replace a SIEM or full observability platform. Instead, it acts as an &lt;strong&gt;upstream buffer&lt;/strong&gt;:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Capture&lt;/strong&gt;: massive, reliable ingestion.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt;: reducing volume before storage or transfer.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forwarding&lt;/strong&gt;: sending data to existing tools (Splunk, Elastic, Datadog, etc.).
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By reducing volume at the source, Loggr makes downstream tools more efficient and cost‑effective.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In cybersecurity, visibility is survival. During a DDoS, losing logs means losing the ability to detect, respond, and learn.  &lt;/p&gt;

&lt;p&gt;With Loggr, &lt;strong&gt;no event is lost as long as the hardware holds the load&lt;/strong&gt;. Detection is immediate, traceability is maximized, and post‑mortems are faithful to reality thanks to absolute temporal fidelity.  &lt;/p&gt;

&lt;p&gt;This is not just a logging engine: it is a &lt;strong&gt;strategic weapon&lt;/strong&gt; against one of the oldest and most persistent threats in the digital landscape.&lt;/p&gt;

&lt;p&gt;As data volumes continue to grow, upstream compression and absolute temporal fidelity will become essential pillars of resilient cybersecurity pipelines. &lt;/p&gt;

&lt;p&gt;Architecture overview and detailed benchmarks are available here: &lt;a href="https://medium.com/@fgauthier_36718/loggr-processing-250m-logs-in-11-5s-on-a-laptop-with-on-the-fly-5-compression-7903b3f941d4" rel="noopener noreferrer"&gt;benchmarks and overview&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How We Built Loggr, a Logging Library That Handles 20M+ Logs/second on a Laptop</title>
      <dc:creator>François Gauthier</dc:creator>
      <pubDate>Fri, 31 Oct 2025 10:27:02 +0000</pubDate>
      <link>https://dev.to/fsg_swl/how-we-built-loggr-a-logging-library-that-handles-20m-logssecond-on-a-laptop-2jc9</link>
      <guid>https://dev.to/fsg_swl/how-we-built-loggr-a-logging-library-that-handles-20m-logssecond-on-a-laptop-2jc9</guid>
      <description>&lt;p&gt;&lt;em&gt;Sharing our work on high-performance log compression - curious about this community's scaling experiences&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge That Started It All
&lt;/h2&gt;

&lt;p&gt;Most logging systems are built for the cloud era — they assume abundant CPU power, RAM, cheap bandwidth, and storage.&lt;br&gt;&lt;br&gt;
But what if you're running edge computing, IoT devices, or just want to keep your cloud bills reasonable?&lt;/p&gt;

&lt;p&gt;We set out to build something different.&lt;br&gt;&lt;br&gt;
It began with a simple question: &lt;strong&gt;how many logs per second can you realistically process on consumer hardware before hitting a wall?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The goal was audacious: create a logging library so efficient it could handle &lt;strong&gt;250 million logs at full throughput, with efficient on-the-fly compression and live statistics export,&lt;/strong&gt; on a standard developer laptop without breaking a sweat.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Matters More Than You Think
&lt;/h2&gt;

&lt;p&gt;We discovered that typical log data compresses at &lt;strong&gt;4–6×&lt;/strong&gt; when you apply smart preprocessing.&lt;br&gt;&lt;br&gt;
That means for every terabyte of logs you're storing, you could be storing just &lt;strong&gt;200–250 GB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's the dirty secret of modern logging: you're probably storing the same data dozens of times.&lt;br&gt;&lt;br&gt;
Error messages, API endpoints, user sessions — they follow patterns.&lt;br&gt;&lt;br&gt;
Yet most systems store each instance as if it were unique.&lt;/p&gt;

&lt;p&gt;This translates into concrete &lt;strong&gt;egress cost cuts&lt;/strong&gt; and multiplies long-term storage possibilities.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Breakthrough: On-the-Fly Preprocessing + Compression
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Beyond Standard Compression Techniques
&lt;/h3&gt;

&lt;p&gt;Most solutions throw raw logs at a compressor.&lt;br&gt;&lt;br&gt;
We've developed a preprocessing layer that transforms logs into a more compressible format before even applying LZ4.&lt;br&gt;&lt;br&gt;
This step significantly reduces data entropy, allowing compression to achieve on-the-fly &lt;strong&gt;4–6×&lt;/strong&gt; ratios where standard approaches plateau at &lt;strong&gt;2–3.5×&lt;/strong&gt;, or require significant overhead using HC algorithms with cumbersome post-processing pipelines.&lt;/p&gt;
&lt;h3&gt;
  
  
  Lock-Free (Nearly) Everything
&lt;/h3&gt;

&lt;p&gt;We treated contention as the enemy.&lt;br&gt;&lt;br&gt;
The hot path uses &lt;strong&gt;MPMC queues and ring buffers&lt;/strong&gt; so multiple threads can enqueue logs without blocking each other.&lt;br&gt;&lt;br&gt;
It's like a multi-lane highway where cars merge without stopping.&lt;/p&gt;
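&lt;p&gt;&lt;em&gt;The pipeline shape can be sketched as follows. One heavy hedge: Python's queue.Queue is lock-based, so this only shows the producer/drainer structure, not the lock-free MPMC ring buffers Loggr actually uses in C.&lt;/em&gt;&lt;/p&gt;

```python
# Shape of the hot path: many producers enqueue log records while a
# drainer pulls them off and assembles fixed-size batches. queue.Queue
# is lock-based and stands in for Loggr's lock-free MPMC queues.
import queue
import threading

q = queue.Queue()
SENTINEL = None
batches = []

def producer(tid, n):
    for i in range(n):
        q.put("thread %d event %d" % (tid, i))

def drainer(batch_size):
    batch = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)  # a full batch is ready for compression
            batch = []
    if batch:
        batches.append(batch)      # flush the partial final batch

workers = [threading.Thread(target=producer, args=(t, 1_000)) for t in range(4)]
d = threading.Thread(target=drainer, args=(100,))
d.start()
for w in workers:
    w.start()
for w in workers:
    w.join()
q.put(SENTINEL)
d.join()

print("events drained:", sum(len(b) for b in batches))
```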
&lt;h3&gt;
  
  
  Batch Compression Magic
&lt;/h3&gt;

&lt;p&gt;Instead of compressing each log individually, we batch them and compress entire chunks.&lt;br&gt;&lt;br&gt;
Combined with preprocessing, this gives LZ4 more patterns to work with, dramatically improving ratios.&lt;br&gt;&lt;br&gt;
Our batching strategy pairs with a unique cache approach that handles millions of unique values within tight hardware constraints.&lt;/p&gt;
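&lt;p&gt;&lt;em&gt;A small illustration of the batching effect (zlib stands in for LZ4; this is not Loggr code): compressing each line on its own gives the compressor almost nothing to match against, while compressing a whole batch exposes cross-line repetition.&lt;/em&gt;&lt;/p&gt;

```python
# Compare per-line compression with batch compression on synthetic
# web-style logs. zlib (stdlib) stands in for LZ4; the effect is the
# same: one large batch gives the compressor far more patterns to exploit.
import zlib

logs = [
    ("[%09d] [2025-10-17T15:50:20Z] [/api/v1/orders] [172.16.18.%d] [200]"
     % (i, i % 250)).encode()
    for i in range(10_000)
]

# each tiny message pays fixed header costs and has no shared history
per_line = sum(len(zlib.compress(line, 1)) for line in logs)
# one batch lets repeated URLs, IPs, and statuses be encoded as matches
batched = len(zlib.compress(b"\n".join(logs), 1))

print("per-line total:", per_line, "bytes")
print("one batch:", batched, "bytes")
```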
&lt;h3&gt;
  
  
  Positioning in the Observability Ecosystem
&lt;/h3&gt;

&lt;p&gt;Loggr is not designed to replace full-featured platforms like Datadog or Splunk, but to serve as an upstream gateway — compressing logs at the source before transmission to storage or downstream analysis pipelines. This creates a cost-efficient two-stage architecture where Loggr handles the “heavy lifting” of data reduction, dramatically cutting egress and storage costs while maintaining compatibility with existing tools.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcvxq4rtzxi0qi1zbsyg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcvxq4rtzxi0qi1zbsyg.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Moment of Truth: 250 Million Log Test
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Laptop&lt;/strong&gt;: Lenovo P14s Gen 5 (Ryzen 5 Pro, 96 GB RAM, NVMe SSD)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Environment&lt;/strong&gt;: Windows x64, AVX2 enabled&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Data&lt;/strong&gt;: 1,000 unique URLs × 5,000 endpoints, random IP × URL distribution, along with other randomly generated fields&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sample Format&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[249328097] [2025-10-17T15:50:20.721988Z] [/forum/thread/12345.html] [172.16.18.116] [506] [SEARCH] [59466ms] [16802b] [174]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
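&lt;p&gt;&lt;em&gt;For readers who want to reproduce a comparable workload, here is a hedged sketch of a generator for records in the sample format above. The field meanings (status, verb, latency, bytes, trailing counter) are inferred from the sample line, not taken from the actual benchmark harness.&lt;/em&gt;&lt;/p&gt;

```python
# Generate synthetic records shaped like the sample above. Field
# semantics are inferred from the sample line and are assumptions,
# not the benchmark's real schema.
import random
from datetime import datetime, timezone

def make_log(seq):
    url = "/forum/thread/%d.html" % random.randrange(5_000)
    ip = "172.16.%d.%d" % (random.randrange(256), random.randrange(256))
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return "[%d] [%s] [%s] [%s] [%d] [SEARCH] [%dms] [%db] [%d]" % (
        seq, ts, url, ip,
        random.randrange(100, 600),   # status-like field
        random.randrange(60_000),     # latency in ms
        random.randrange(20_000),     # payload bytes
        seq % 1000)                   # trailing counter-like field

print(make_log(249328097))
```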



&lt;blockquote&gt;
&lt;p&gt;The 250M log benchmark isn't just a party trick — it's proof that with careful engineering, we can handle orders of magnitude more data on the same hardware&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;h4&gt;
  
  
  With 6 caller threads, 2 MB batch size:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🔥 250,000,000 logs processed
&lt;/li&gt;
&lt;li&gt;⏱️ 11.52 seconds total
&lt;/li&gt;
&lt;li&gt;🚀 21.71 million logs/second -&amp;gt; to disk&lt;/li&gt;
&lt;li&gt;💾 5:1 compression ratio
&lt;/li&gt;
&lt;li&gt;🖥️ 100% CPU (6 physical cores used)
&lt;/li&gt;
&lt;li&gt;📊 105 MB RAM footprint (stable)&lt;/li&gt;
&lt;li&gt;✅ 0 lost logs &amp;amp; no back-pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  With 1 caller thread, 512 MB batch size:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🔥 250,000,000 logs processed
&lt;/li&gt;
&lt;li&gt;⏱️ 27.17 seconds total
&lt;/li&gt;
&lt;li&gt;🚀 9.2 million logs/second -&amp;gt; to disk&lt;/li&gt;
&lt;li&gt;💾 5:1 compression ratio
&lt;/li&gt;
&lt;li&gt;🖥️ 20% CPU (1 physical core)
&lt;/li&gt;
&lt;li&gt;📊 1.8 GB RAM footprint (stable)
&lt;/li&gt;
&lt;li&gt;✅ 0 lost logs &amp;amp; no back-pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  With 1 caller thread, 500 KB batch size:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🔥 250,000,000 logs processed
&lt;/li&gt;
&lt;li&gt;⏱️ 40.32 seconds total
&lt;/li&gt;
&lt;li&gt;🚀 6.2 million logs/second -&amp;gt; to disk&lt;/li&gt;
&lt;li&gt;💾 4.6:1 compression ratio
&lt;/li&gt;
&lt;li&gt;🖥️ 20% CPU (1 physical core)
&lt;/li&gt;
&lt;li&gt;📊 16 MB RAM footprint (stable) &lt;/li&gt;
&lt;li&gt;✅ 0 lost logs &amp;amp; no back-pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Detailed benchmarks are available &lt;a href="https://medium.com/@fgauthier_36718/loggr-processing-250m-logs-in-11-5s-on-a-laptop-with-on-the-fly-5-compression-7903b3f941d4" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Implications
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edge computing&lt;/strong&gt;: Run comprehensive logging on resource-constrained devices without worrying about RAM and CPU limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-conscious teams&lt;/strong&gt;: Reduce log volume manyfold = lower storage costs, lower egress fees, and potentially lower licensing costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-throughput systems&lt;/strong&gt;: Maintain detailed logging without becoming I/O bound or drowning in storage costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security traceability&lt;/strong&gt;: Log all events with minimal resources, maintaining absolute temporal order with unique trans-thread atomic IDs.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer Threads → [Lock-free Queue] → Batch Builder → [LZ4 Compression] → Disk Writer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Design Decisions
&lt;/h3&gt;

&lt;p&gt;A single C DLL (170 KB) with no dependencies (plug &amp;amp; play, interoperable with most environments), featuring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AVX2-optimized code paths&lt;/li&gt;
&lt;li&gt;Highly configurable without complex setup&lt;/li&gt;
&lt;li&gt;On-the-fly IP anonymization&lt;/li&gt;
&lt;li&gt;Custom compression level (or none)&lt;/li&gt;
&lt;li&gt;URL param truncation&lt;/li&gt;
&lt;li&gt;Unique per-log cross-thread ID&lt;/li&gt;
&lt;li&gt;Instant backup write path&lt;/li&gt;
&lt;li&gt;Custom memory footprint&lt;/li&gt;
&lt;li&gt;Live usage statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optional cryptographic signing for audit trails (on the roadmap).&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Might Not Need This
&lt;/h2&gt;

&lt;p&gt;Let’s be honest — not every application needs this level of optimization. If you're processing 10,000 logs per second, traditional solutions work fine.&lt;/p&gt;

&lt;p&gt;But if you're dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hundreds of thousands or millions of events per second&lt;/li&gt;
&lt;li&gt;Bandwidth-constrained environments&lt;/li&gt;
&lt;li&gt;Budgets where cloud costs matter&lt;/li&gt;
&lt;li&gt;Regulatory requirements for long-term retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...then host-side compression becomes incredibly valuable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future of Logging
&lt;/h2&gt;

&lt;p&gt;We believe the next frontier in observability isn't collecting more data — it's being smarter about what we keep and how we store it.&lt;br&gt;
By moving compression to the source, we can maintain detailed audit trails without the traditional cost burden.&lt;br&gt;
The 250M log benchmark isn't just a party trick — it's proof that with careful engineering, we can handle orders of magnitude more data on the same hardware.&lt;/p&gt;

&lt;p&gt;And in an era where data growth is outpacing budget growth, that might be the most important optimization of all.&lt;/p&gt;

&lt;p&gt;Want to run your own tests? For organizations conducting formal technical evaluations, a limited demo DLL is available. We're particularly interested in hearing about edge cases and workloads where our approach does (and doesn't) work well.&lt;/p&gt;

&lt;p&gt;Technical specs: Windows x64, AVX2 required, C API (easy bindings for most languages).&lt;/p&gt;

</description>
      <category>scaling</category>
      <category>performance</category>
      <category>logging</category>
      <category>scalability</category>
    </item>
    <item>
      <title>Beyond LZ4 Limits, Logging at high speed with on-the-fly compression</title>
      <dc:creator>François Gauthier</dc:creator>
      <pubDate>Thu, 30 Oct 2025 11:07:38 +0000</pubDate>
      <link>https://dev.to/fsg_swl/beyond-lz4-limits-logging-at-high-speed-with-on-the-fly-compression-p0p</link>
      <guid>https://dev.to/fsg_swl/beyond-lz4-limits-logging-at-high-speed-with-on-the-fly-compression-p0p</guid>
      <description>&lt;p&gt;&lt;em&gt;An introduction, full article linked at the bottom.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;TL;DR&lt;br&gt;
We built Loggr: a tiny (170 KB, no external dependencies) native C logging library that preprocesses, batches and compresses logs at line rate. On a Lenovo P14s developer laptop (Ryzen 5 Pro, NVMe) we processed 250,000,000 synthetic web-style logs in 11.52 seconds (21.71 million logs/second), achieving roughly 5× end-to-end (to disk) compression (preprocessing + LZ4) while keeping RAM usage low and zero lost logs. This article explains the architecture, test methodology, exact parameters, benchmark data, limitations, and how to reproduce the tests.&lt;/p&gt;

&lt;h1&gt;
  
  
  How We Achieved on-the-fly 5× Log Compression Where LZ4 Alone Fails
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;The preprocessing trick that lets fast compression algorithms achieve heavy compression ratios&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Most logging systems assume cloud-era resources: unlimited CPU, RAM, and cheap storage. But what if you're running edge computing, IoT devices, or just want to keep cloud bills under control?&lt;/p&gt;

&lt;p&gt;We started with a simple question: &lt;strong&gt;how many logs can you realistically process on consumer hardware&lt;/strong&gt; before hitting a wall?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Breakthrough
&lt;/h2&gt;

&lt;p&gt;Instead of throwing raw logs at LZ4, we &lt;strong&gt;preprocess them first&lt;/strong&gt; - transforming logs into a low-entropy format that compressors love.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key innovations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Smart preprocessing&lt;/strong&gt; reduces entropy before compression&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lock-free queues&lt;/strong&gt; handle 21M+ logs/sec without contention&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Batch compression&lt;/strong&gt; finds longer patterns for better ratios&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temporal caching&lt;/strong&gt; leverages natural log patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Numbers Don't Lie
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Tested on a stock Lenovo P14s (Ryzen 5 Pro, NVMe SSD, 96GB RAM)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;250 Million Logs - Multiple Configurations&lt;br&gt;
6 threads, 2MB batches:&lt;br&gt;
✅ 250M logs in 11.52 seconds&lt;br&gt;
✅ 21.71 million logs/second -&amp;gt; To disk&lt;br&gt;
✅ 5:1 compression ratio&lt;br&gt;
✅ 105MB RAM footprint (stable)&lt;br&gt;
✅ 0 lost logs&lt;/p&gt;

&lt;p&gt;1 thread, 500KB batch (economy mode):&lt;br&gt;
✅ 250M logs in 29.52 seconds&lt;br&gt;
✅ 8.47 million logs/second -&amp;gt; To disk&lt;br&gt;
✅ 4.6:1 compression ratio&lt;br&gt;
✅ 16MB RAM footprint (stable)&lt;br&gt;
✅ 0 lost logs&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The architecture is centered around ring buffers and lock-free queues.&lt;br&gt;
Producer Threads → [Lock-free Queue] → Batch Builder → [LZ4 Compression] → Storage&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;170KB C DLL - zero dependencies&lt;/li&gt;
&lt;li&gt;AVX2-optimized code paths&lt;/li&gt;
&lt;li&gt;Highly configurable memory footprint (20MB to GB+)&lt;/li&gt;
&lt;li&gt;Live telemetry and atomic sequencing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Edge computing: Full logging on resource-constrained devices&lt;/li&gt;
&lt;li&gt;Cost reduction: 80%+ savings on storage and egress fees
&lt;/li&gt;
&lt;li&gt;High-throughput systems: Maintain detailed logs without I/O bottlenecks&lt;/li&gt;
&lt;li&gt;Security: Complete audit trails with minimal resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;This isn't just about faster compression - it's about &lt;strong&gt;rethinking logging as a data optimization problem&lt;/strong&gt;. By moving intelligence upstream, we can handle orders of magnitude more data on the same hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For the complete technical deep-dive with full benchmark methodology and API documentation, check out the full article on Medium:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://medium.com/@fgauthier_36718/loggr-processing-250m-logs-in-11-5s-on-a-laptop-with-on-the-fly-5-compression-7903b3f941d4" rel="noopener noreferrer"&gt;Loggr: Processing 250M Logs in 11.5s on a Laptop with On-the-Fly 5× Compression&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What logging challenges are you facing with your high-throughput applications?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>c</category>
      <category>performance</category>
      <category>algorithms</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
