<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: François Gauthier</title>
    <description>The latest articles on DEV Community by François Gauthier (@fsg_swl).</description>
    <link>https://dev.to/fsg_swl</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3589279%2F04d72c4c-d228-443b-adf7-06f4e31dde49.jpg</url>
      <title>DEV Community: François Gauthier</title>
      <link>https://dev.to/fsg_swl</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fsg_swl"/>
    <language>en</language>
    <item>
      <title>🛡️ Loggr: A Real-Time Logging Engine as a Weapon Against DDoS Attacks</title>
      <dc:creator>François Gauthier</dc:creator>
      <pubDate>Sat, 01 Nov 2025 09:20:55 +0000</pubDate>
      <link>https://dev.to/fsg_swl/loggr-a-real-time-logging-engine-as-a-weapon-against-ddos-attacks-6fk</link>
      <guid>https://dev.to/fsg_swl/loggr-a-real-time-logging-engine-as-a-weapon-against-ddos-attacks-6fk</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Distributed Denial of Service (DDoS) attacks remain one of the most persistent and costly threats in cybersecurity. They overwhelm infrastructures, obscure visibility, and often leave defenders blind at the very moment they need reliable data the most.&lt;br&gt;&lt;br&gt;
The key to detecting, understanding, and countering these attacks lies in something often underestimated: &lt;strong&gt;logs&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Traditional logging systems struggle under pressure. They sample, drop events, or rely on approximate timestamps that make it impossible to faithfully reconstruct the timeline of an attack.  &lt;/p&gt;

&lt;p&gt;This is precisely the challenge that &lt;strong&gt;Loggr&lt;/strong&gt; was designed to address. Loggr is a high‑performance logging engine capable of ingesting &lt;strong&gt;hundreds of millions of events in seconds&lt;/strong&gt; on standard hardware. Beyond raw throughput, it introduces a critical innovation: &lt;strong&gt;absolute temporal fidelity&lt;/strong&gt;, essential for detection, traceability, and post‑mortem analysis in cybersecurity.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 Real-Time Detection
&lt;/h2&gt;

&lt;p&gt;A DDoS attack is defined by a &lt;strong&gt;sudden surge of activity&lt;/strong&gt;: millions of requests flooding in within seconds.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With conventional pipelines, many events are lost or delayed.
&lt;/li&gt;
&lt;li&gt;With Loggr, ingestion rates push hardware to its limits — tens of millions of logs per second on commodity machines — ensuring that all traffic is captured as long as the system is not saturated.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; security teams can &lt;strong&gt;spot anomalies instantly&lt;/strong&gt;, even before downstream SIEMs or dashboards have processed the data. Loggr acts as a &lt;strong&gt;first‑line sensor&lt;/strong&gt;, maximizing visibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧾 Traceability and Forensics
&lt;/h2&gt;

&lt;p&gt;During an attack, every event matters. Who hit the system, when, and how often?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loggr records &lt;strong&gt;all events that the hardware can absorb&lt;/strong&gt;, without sampling.
&lt;/li&gt;
&lt;li&gt;Logs are compressed and stored with a predictable footprint, enabling full retention of the attack for later analysis.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This near‑exhaustive capture is critical for:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compliance&lt;/strong&gt; (proving what happened).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forensic investigations&lt;/strong&gt; (identifying vectors and patterns).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive defense&lt;/strong&gt; (training detection models on real attack data).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎞️ Replay and Post-Mortem
&lt;/h2&gt;

&lt;p&gt;Once the attack is over, the &lt;strong&gt;post‑mortem&lt;/strong&gt; begins. Without reliable logs, it is impossible to replay the exact sequence of events.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loggr stores events in a strictly deterministic order.
&lt;/li&gt;
&lt;li&gt;Teams can &lt;strong&gt;replay the attack event by event&lt;/strong&gt;, as if watching it unfold again.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enables:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifying bottlenecks.
&lt;/li&gt;
&lt;li&gt;Understanding attack propagation.
&lt;/li&gt;
&lt;li&gt;Strengthening defenses for the future.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🕒 Absolute Temporal Fidelity: Beyond Timestamps
&lt;/h2&gt;

&lt;p&gt;Most logging systems rely on &lt;strong&gt;timestamps&lt;/strong&gt; (milliseconds or microseconds). Under heavy load, multiple events share the same timestamp, making it impossible to know which came first.  &lt;/p&gt;

&lt;p&gt;Loggr takes a radically different approach:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each event is assigned an &lt;strong&gt;atomic inter‑thread sequence number&lt;/strong&gt;, strictly increasing across all threads.
&lt;/li&gt;
&lt;li&gt;Even if two events occur in the same microsecond, they are &lt;strong&gt;differentiated and ordered&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;This guarantees &lt;strong&gt;absolute temporal fidelity&lt;/strong&gt;, without ambiguity.
&lt;/li&gt;
&lt;/ul&gt;
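&lt;p&gt;&lt;em&gt;A minimal sketch of the idea (not Loggr's implementation, which uses a lock-free atomic counter in native C; the Python lock below is only a stand-in): a process-wide counter hands every event a distinct, strictly increasing ID, so even two events landing in the same microsecond remain totally ordered.&lt;/em&gt;&lt;/p&gt;

```python
# Illustrative sketch only: a process-wide, strictly increasing sequence
# number shared by all logging threads. Timestamps can collide under
# load; a sequence number issued this way cannot.
import threading

class Sequencer:
    """Hands out a unique, strictly increasing ID across threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def next_id(self):
        with self._lock:
            seq = self._next
            self._next += 1
            return seq

seq = Sequencer()
ids = []
ids_lock = threading.Lock()

def worker(n):
    # each thread stamps n events with the shared sequencer
    for _ in range(n):
        i = seq.next_id()
        with ids_lock:
            ids.append(i)

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every event received a distinct ID, so a total order is always recoverable.
assert len(ids) == 40_000
assert len(set(ids)) == 40_000
```

&lt;p&gt;&lt;em&gt;In C, the lock would be replaced by a single atomic fetch-and-add, which is what makes the scheme cheap enough for the hot path.&lt;/em&gt;&lt;/p&gt;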

&lt;p&gt;In practice, this means that during a DDoS, when millions of requests hit simultaneously, Loggr can still reconstruct the &lt;strong&gt;exact order&lt;/strong&gt; of events — as long as throughput remains within hardware capacity. If saturation occurs, losses are possible, but Loggr &lt;strong&gt;pushes those thresholds far beyond traditional solutions&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Why It Works
&lt;/h2&gt;

&lt;p&gt;Loggr achieves these results through several design choices:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preprocessing + compression&lt;/strong&gt;: entropy reduction before LZ4, achieving up to 5× compression without sacrificing speed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock‑free pipelines&lt;/strong&gt;: eliminating contention, ensuring no bottlenecks even under extreme load.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable footprint&lt;/strong&gt;: runs on standard hardware, no exotic infrastructure required. The minimal footprint is 20 MB (stable), allowing capture of 1.5 to 8M+ events per second.&lt;/li&gt;
&lt;/ul&gt;
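&lt;p&gt;&lt;em&gt;To make the first design choice concrete, here is a toy version of "reduce entropy, then compress". zlib from the Python standard library stands in for LZ4, and the dictionary encoding is a deliberately simple sketch, not Loggr's proprietary transform: repeated tokens become short indices before compression, and the token table is kept so the transform stays lossless.&lt;/em&gt;&lt;/p&gt;

```python
# Toy "preprocess, then compress" pipeline. zlib (stdlib) stands in for
# LZ4; the principle is the same: lower the entropy of the input so the
# fast general-purpose compressor has less work to do.
import zlib

logs = [
    "[%d] [GET] [/forum/thread/%d.html] [10.0.0.%d] [200]"
    % (i, i % 50, i % 20)
    for i in range(20_000)
]
raw = "\n".join(logs).encode()

def encode(lines):
    # dictionary-encode: every distinct token becomes a small integer index
    table = {}
    coded_lines = []
    for line in lines:
        ids = [str(table.setdefault(tok, len(table))) for tok in line.split()]
        coded_lines.append(" ".join(ids))
    # keep the token table in index order so the transform is reversible
    tokens = sorted(table, key=table.get)
    return tokens, "\n".join(coded_lines)

def decode(tokens, body):
    out = []
    for line in body.split("\n"):
        out.append(" ".join(tokens[int(i)] for i in line.split()))
    return out

tokens, body = encode(logs)
assert decode(tokens, body) == logs  # the preprocessing step is lossless

plain = zlib.compress(raw, 1)
pre = zlib.compress(("\x00".join(tokens) + "\n" + body).encode(), 1)
print("compressor alone:", len(plain), "bytes; preprocessed first:", len(pre), "bytes")
```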




&lt;h2&gt;
  
  
  📌 Positioning in the Security Ecosystem
&lt;/h2&gt;

&lt;p&gt;Loggr is not meant to replace a SIEM or full observability platform. Instead, it acts as an &lt;strong&gt;upstream buffer&lt;/strong&gt;:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Capture&lt;/strong&gt;: massive, reliable ingestion.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt;: reducing volume before storage or transfer.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forwarding&lt;/strong&gt;: sending data to existing tools (Splunk, Elastic, Datadog, etc.).
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By reducing volume at the source, Loggr makes downstream tools more efficient and cost‑effective.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In cybersecurity, visibility is survival. During a DDoS, losing logs means losing the ability to detect, respond, and learn.  &lt;/p&gt;

&lt;p&gt;With Loggr, &lt;strong&gt;no event is lost as long as the hardware holds the load&lt;/strong&gt;. Detection is immediate, traceability is maximized, and post‑mortems are faithful to reality thanks to absolute temporal fidelity.  &lt;/p&gt;

&lt;p&gt;This is not just a logging engine: it is a &lt;strong&gt;strategic weapon&lt;/strong&gt; against one of the oldest and most persistent threats in the digital landscape.&lt;/p&gt;

&lt;p&gt;As data volumes continue to grow, upstream compression and absolute temporal fidelity will become essential pillars of resilient cybersecurity pipelines. &lt;/p&gt;

&lt;p&gt;Architecture overview and detailed benchmarks are available here: &lt;a href="https://medium.com/@fgauthier_36718/loggr-processing-250m-logs-in-11-5s-on-a-laptop-with-on-the-fly-5-compression-7903b3f941d4" rel="noopener noreferrer"&gt;benchmarks and overview&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How We Built Loggr, a Logging Library That Handles 20M+ Logs/second on a Laptop</title>
      <dc:creator>François Gauthier</dc:creator>
      <pubDate>Fri, 31 Oct 2025 10:27:02 +0000</pubDate>
      <link>https://dev.to/fsg_swl/how-we-built-loggr-a-logging-library-that-handles-20m-logssecond-on-a-laptop-2jc9</link>
      <guid>https://dev.to/fsg_swl/how-we-built-loggr-a-logging-library-that-handles-20m-logssecond-on-a-laptop-2jc9</guid>
      <description>&lt;p&gt;&lt;em&gt;Sharing our work on high-performance log compression - curious about this community's scaling experiences&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge That Started It All
&lt;/h2&gt;

&lt;p&gt;Most logging systems are built for the cloud era — they assume abundant CPU power, RAM, cheap bandwidth, and storage.&lt;br&gt;&lt;br&gt;
But what if you're running edge computing, IoT devices, or just want to keep your cloud bills reasonable?&lt;/p&gt;

&lt;p&gt;We set out to build something different.&lt;br&gt;&lt;br&gt;
It began with a simple question: &lt;strong&gt;how many logs per second can you realistically process on consumer hardware before hitting a wall?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The goal was audacious: create a logging library so efficient it could handle &lt;strong&gt;250 million logs at full throughput, with efficient on-the-fly compression and live statistics export,&lt;/strong&gt; on a standard developer laptop without breaking a sweat.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Matters More Than You Think
&lt;/h2&gt;

&lt;p&gt;We discovered that typical log data compresses at &lt;strong&gt;4–6×&lt;/strong&gt; when you apply smart preprocessing.&lt;br&gt;&lt;br&gt;
That means for every terabyte of logs you're storing, you could be storing just &lt;strong&gt;200–250 GB&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's the dirty secret of modern logging: you're probably storing the same data dozens of times.&lt;br&gt;&lt;br&gt;
Error messages, API endpoints, user sessions — they follow patterns.&lt;br&gt;&lt;br&gt;
Yet most systems store each instance as if it were unique.&lt;/p&gt;

&lt;p&gt;This translates into concrete &lt;strong&gt;egress cost cuts&lt;/strong&gt; and multiplies long-term storage possibilities.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Breakthrough: On-the-Fly Preprocessing + Compression
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Beyond Standard Compression Techniques
&lt;/h3&gt;

&lt;p&gt;Most solutions throw raw logs at a compressor.&lt;br&gt;&lt;br&gt;
We've developed a preprocessing layer that transforms logs into a more compressible format before even applying LZ4.&lt;br&gt;&lt;br&gt;
This step significantly reduces data entropy, allowing compression to achieve on-the-fly &lt;strong&gt;4–6×&lt;/strong&gt; ratios where standard approaches plateau at &lt;strong&gt;2–3.5×&lt;/strong&gt;, or require significant overhead using HC algorithms with cumbersome post-processing pipelines.&lt;/p&gt;
&lt;h3&gt;
  
  
  Lock-Free (Nearly) Everything
&lt;/h3&gt;

&lt;p&gt;We treated contention as the enemy.&lt;br&gt;&lt;br&gt;
The hot path uses &lt;strong&gt;MPMC queues and ring buffers&lt;/strong&gt; so multiple threads can enqueue logs without blocking each other.&lt;br&gt;&lt;br&gt;
It's like a multi-lane highway where cars merge without stopping.&lt;/p&gt;
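&lt;p&gt;&lt;em&gt;The pipeline shape can be sketched as follows. One heavy hedge: Python's queue.Queue is lock-based, so this only shows the producer/drainer structure, not the lock-free MPMC ring buffers Loggr actually uses in C.&lt;/em&gt;&lt;/p&gt;

```python
# Shape of the hot path: many producers enqueue log records while a
# drainer pulls them off and assembles fixed-size batches. queue.Queue
# is lock-based and stands in for Loggr's lock-free MPMC queues.
import queue
import threading

q = queue.Queue()
SENTINEL = None
batches = []

def producer(tid, n):
    for i in range(n):
        q.put("thread %d event %d" % (tid, i))

def drainer(batch_size):
    batch = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)  # a full batch is ready for compression
            batch = []
    if batch:
        batches.append(batch)      # flush the partial final batch

workers = [threading.Thread(target=producer, args=(t, 1_000)) for t in range(4)]
d = threading.Thread(target=drainer, args=(100,))
d.start()
for w in workers:
    w.start()
for w in workers:
    w.join()
q.put(SENTINEL)
d.join()

print("events drained:", sum(len(b) for b in batches))
```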
&lt;h3&gt;
  
  
  Batch Compression Magic
&lt;/h3&gt;

&lt;p&gt;Instead of compressing each log individually, we batch them and compress entire chunks.&lt;br&gt;&lt;br&gt;
Combined with preprocessing, this gives LZ4 more patterns to work with, dramatically improving ratios.&lt;br&gt;&lt;br&gt;
Our batching strategy pairs with a unique cache approach that handles millions of unique values within tight hardware constraints.&lt;/p&gt;
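&lt;p&gt;&lt;em&gt;A small illustration of the batching effect (zlib stands in for LZ4; this is not Loggr code): compressing each line on its own gives the compressor almost nothing to match against, while compressing a whole batch exposes cross-line repetition.&lt;/em&gt;&lt;/p&gt;

```python
# Compare per-line compression with batch compression on synthetic
# web-style logs. zlib (stdlib) stands in for LZ4; the effect is the
# same: one large batch gives the compressor far more patterns to exploit.
import zlib

logs = [
    ("[%09d] [2025-10-17T15:50:20Z] [/api/v1/orders] [172.16.18.%d] [200]"
     % (i, i % 250)).encode()
    for i in range(10_000)
]

# each tiny message pays fixed header costs and has no shared history
per_line = sum(len(zlib.compress(line, 1)) for line in logs)
# one batch lets repeated URLs, IPs, and statuses be encoded as matches
batched = len(zlib.compress(b"\n".join(logs), 1))

print("per-line total:", per_line, "bytes")
print("one batch:", batched, "bytes")
```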
&lt;h3&gt;
  
  
  Positioning in the Observability Ecosystem
&lt;/h3&gt;

&lt;p&gt;Loggr is not designed to replace full-featured platforms like Datadog or Splunk, but to serve as an upstream gateway — compressing logs at the source before transmission to storage or downstream analysis pipelines. This creates a cost-efficient two-stage architecture where Loggr handles the “heavy lifting” of data reduction, dramatically cutting egress and storage costs while maintaining compatibility with existing tools.&lt;/p&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcvxq4rtzxi0qi1zbsyg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpcvxq4rtzxi0qi1zbsyg.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Moment of Truth: 250 Million Log Test
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Laptop&lt;/strong&gt;: Lenovo P14s Gen 5 (Ryzen 5 Pro, 96 GB RAM, NVMe SSD)&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Environment&lt;/strong&gt;: Windows x64, AVX2 enabled&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Data&lt;/strong&gt;: 1,000 unique URLs × 5,000 endpoints, random IP × URL distribution, along with other randomly generated fields&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sample Format&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[249328097] [2025-10-17T15:50:20.721988Z] [/forum/thread/12345.html] [172.16.18.116] [506] [SEARCH] [59466ms] [16802b] [174]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
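&lt;p&gt;&lt;em&gt;For readers who want to reproduce a comparable workload, here is a hedged sketch of a generator for records in the sample format above. The field meanings (status, verb, latency, bytes, trailing counter) are inferred from the sample line, not taken from the actual benchmark harness.&lt;/em&gt;&lt;/p&gt;

```python
# Generate synthetic records shaped like the sample above. Field
# semantics are inferred from the sample line and are assumptions,
# not the benchmark's real schema.
import random
from datetime import datetime, timezone

def make_log(seq):
    url = "/forum/thread/%d.html" % random.randrange(5_000)
    ip = "172.16.%d.%d" % (random.randrange(256), random.randrange(256))
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return "[%d] [%s] [%s] [%s] [%d] [SEARCH] [%dms] [%db] [%d]" % (
        seq, ts, url, ip,
        random.randrange(100, 600),   # status-like field
        random.randrange(60_000),     # latency in ms
        random.randrange(20_000),     # payload bytes
        seq % 1000)                   # trailing counter-like field

print(make_log(249328097))
```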



&lt;blockquote&gt;
&lt;p&gt;The 250M log benchmark isn't just a party trick — it's proof that with careful engineering, we can handle orders of magnitude more data on the same hardware&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;h4&gt;
  
  
  With 6 caller threads, 2 MB batch size:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🔥 250,000,000 logs processed
&lt;/li&gt;
&lt;li&gt;⏱️ 11.52 seconds total
&lt;/li&gt;
&lt;li&gt;🚀 21.71 million logs/second -&amp;gt; to disk&lt;/li&gt;
&lt;li&gt;💾 5:1 compression ratio
&lt;/li&gt;
&lt;li&gt;🖥️ 100% CPU (6 physical cores used)
&lt;/li&gt;
&lt;li&gt;📊 105 MB RAM footprint (stable)&lt;/li&gt;
&lt;li&gt;✅ 0 lost logs &amp;amp; no back-pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  With 1 caller thread, 512 MB batch size:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🔥 250,000,000 logs processed
&lt;/li&gt;
&lt;li&gt;⏱️ 27.17 seconds total
&lt;/li&gt;
&lt;li&gt;🚀 9.2 million logs/second -&amp;gt; to disk&lt;/li&gt;
&lt;li&gt;💾 5:1 compression ratio
&lt;/li&gt;
&lt;li&gt;🖥️ 20% CPU (1 physical core)
&lt;/li&gt;
&lt;li&gt;📊 1.8 GB RAM footprint (stable)
&lt;/li&gt;
&lt;li&gt;✅ 0 lost logs &amp;amp; no back-pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  With 1 caller thread, 500 KB batch size:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;🔥 250,000,000 logs processed
&lt;/li&gt;
&lt;li&gt;⏱️ 40.32 seconds total
&lt;/li&gt;
&lt;li&gt;🚀 6.2 million logs/second -&amp;gt; to disk&lt;/li&gt;
&lt;li&gt;💾 4.6:1 compression ratio
&lt;/li&gt;
&lt;li&gt;🖥️ 20% CPU (1 physical core)
&lt;/li&gt;
&lt;li&gt;📊 16 MB RAM footprint (stable) &lt;/li&gt;
&lt;li&gt;✅ 0 lost logs &amp;amp; no back-pressure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Detailed benchmarks are available &lt;a href="https://medium.com/@fgauthier_36718/loggr-processing-250m-logs-in-11-5s-on-a-laptop-with-on-the-fly-5-compression-7903b3f941d4" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Implications
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edge computing&lt;/strong&gt;: Run comprehensive logging on resource-constrained devices without worrying about RAM and CPU limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-conscious teams&lt;/strong&gt;: Reduce log volume manyfold = lower storage costs, lower egress fees, and potentially lower licensing costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-throughput systems&lt;/strong&gt;: Maintain detailed logging without becoming I/O bound or drowning in storage costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security traceability&lt;/strong&gt;: Log all events with minimal resources, maintaining absolute temporal order with unique trans-thread atomic IDs.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer Threads → [Lock-free Queue] → Batch Builder → [LZ4 Compression] → Disk Writer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Design Decisions
&lt;/h3&gt;

&lt;p&gt;A single C DLL (170 KB) with no dependencies (plug &amp;amp; play, interoperable with most environments), featuring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AVX2-optimized code paths&lt;/li&gt;
&lt;li&gt;Highly configurable without complex setup&lt;/li&gt;
&lt;li&gt;On-the-fly IP anonymization&lt;/li&gt;
&lt;li&gt;Custom compression level (or none)&lt;/li&gt;
&lt;li&gt;URL param truncation&lt;/li&gt;
&lt;li&gt;Unique per-log cross-thread ID&lt;/li&gt;
&lt;li&gt;Instant backup write path&lt;/li&gt;
&lt;li&gt;Custom memory footprint&lt;/li&gt;
&lt;li&gt;Live usage statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optional cryptographic signing for audit trails (on the roadmap).&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Might Not Need This
&lt;/h2&gt;

&lt;p&gt;Let’s be honest — not every application needs this level of optimization. If you're processing 10,000 logs per second, traditional solutions work fine.&lt;/p&gt;

&lt;p&gt;But if you're dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hundreds of thousands or millions of events per second&lt;/li&gt;
&lt;li&gt;Bandwidth-constrained environments&lt;/li&gt;
&lt;li&gt;Budgets where cloud costs matter&lt;/li&gt;
&lt;li&gt;Regulatory requirements for long-term retention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...then host-side compression becomes incredibly valuable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future of Logging
&lt;/h2&gt;

&lt;p&gt;We believe the next frontier in observability isn't collecting more data — it's being smarter about what we keep and how we store it.&lt;br&gt;
By moving compression to the source, we can maintain detailed audit trails without the traditional cost burden.&lt;br&gt;
The 250M log benchmark isn't just a party trick — it's proof that with careful engineering, we can handle orders of magnitude more data on the same hardware.&lt;/p&gt;

&lt;p&gt;And in an era where data growth is outpacing budget growth, that might be the most important optimization of all.&lt;/p&gt;

&lt;p&gt;Want to run your own tests? For organizations conducting formal technical evaluations, a limited demo DLL is available. We're particularly interested in hearing about edge cases and workloads where our approach does (and doesn't) work well.&lt;/p&gt;

&lt;p&gt;Technical specs: Windows x64, AVX2 required, C API (easy bindings for most languages).&lt;/p&gt;

</description>
      <category>scaling</category>
      <category>performance</category>
      <category>logging</category>
      <category>scalability</category>
    </item>
    <item>
      <title>Beyond LZ4 Limits, Logging at high speed with on-the-fly compression</title>
      <dc:creator>François Gauthier</dc:creator>
      <pubDate>Thu, 30 Oct 2025 11:07:38 +0000</pubDate>
      <link>https://dev.to/fsg_swl/beyond-lz4-limits-logging-at-high-speed-with-on-the-fly-compression-p0p</link>
      <guid>https://dev.to/fsg_swl/beyond-lz4-limits-logging-at-high-speed-with-on-the-fly-compression-p0p</guid>
      <description>&lt;p&gt;&lt;em&gt;An introduction, full article linked at the bottom.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;TL;DR&lt;br&gt;
We built Loggr: a tiny (170 KB, no external dependencies) native C logging library that preprocesses, batches and compresses logs at line rate. On a Lenovo P14s developer laptop (Ryzen 5 Pro, NVMe) we processed 250,000,000 synthetic web-style logs in 11.52 seconds (21.71 million logs/second), achieving roughly 5× end-to-end (to disk) compression (preprocessing + LZ4) while keeping RAM usage low and zero lost logs. This article explains the architecture, test methodology, exact parameters, benchmark data, limitations, and how to reproduce the tests.&lt;/p&gt;

&lt;h1&gt;
  
  
  How We Achieved on-the-fly 5× Log Compression Where LZ4 Alone Fails
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;The preprocessing trick that lets fast compression algorithms achieve heavy compression ratios&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Most logging systems assume cloud-era resources: unlimited CPU, RAM, and cheap storage. But what if you're running edge computing, IoT devices, or just want to keep cloud bills under control?&lt;/p&gt;

&lt;p&gt;We started with a simple question: &lt;strong&gt;how many logs can you realistically process on consumer hardware&lt;/strong&gt; before hitting a wall?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Breakthrough
&lt;/h2&gt;

&lt;p&gt;Instead of throwing raw logs at LZ4, we &lt;strong&gt;preprocess them first&lt;/strong&gt; - transforming logs into a low-entropy format that compressors love.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key innovations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Smart preprocessing&lt;/strong&gt; reduces entropy before compression&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lock-free queues&lt;/strong&gt; handle 21M+ logs/sec without contention&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Batch compression&lt;/strong&gt; finds longer patterns for better ratios&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Temporal caching&lt;/strong&gt; leverages natural log patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Numbers Don't Lie
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Tested on a stock Lenovo P14s (Ryzen 5 Pro, NVMe SSD, 96GB RAM)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;250 Million Logs - Multiple Configurations&lt;br&gt;
6 threads, 2MB batches:&lt;br&gt;
✅ 250M logs in 11.52 seconds&lt;br&gt;
✅ 21.71 million logs/second -&amp;gt; To disk&lt;br&gt;
✅ 5:1 compression ratio&lt;br&gt;
✅ 105MB RAM footprint (stable)&lt;br&gt;
✅ 0 lost logs&lt;/p&gt;

&lt;p&gt;1 thread, 500KB batch (economy mode):&lt;br&gt;
✅ 250M logs in 29.52 seconds&lt;br&gt;
✅ 8.47 million logs/second -&amp;gt; To disk&lt;br&gt;
✅ 4.6:1 compression ratio&lt;br&gt;
✅ 16MB RAM footprint (stable)&lt;br&gt;
✅ 0 lost logs&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The architecture is centered around ring buffers and lock-free queues.&lt;br&gt;
Producer Threads → [Lock-free Queue] → Batch Builder → [LZ4 Compression] → Storage&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;170KB C DLL - zero dependencies&lt;/li&gt;
&lt;li&gt;AVX2-optimized code paths&lt;/li&gt;
&lt;li&gt;Highly configurable memory footprint (20MB to GB+)&lt;/li&gt;
&lt;li&gt;Live telemetry and atomic sequencing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Edge computing: Full logging on resource-constrained devices&lt;/li&gt;
&lt;li&gt;Cost reduction: 80%+ savings on storage and egress fees
&lt;/li&gt;
&lt;li&gt;High-throughput systems: Maintain detailed logs without I/O bottlenecks&lt;/li&gt;
&lt;li&gt;Security: Complete audit trails with minimal resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;This isn't just about faster compression - it's about &lt;strong&gt;rethinking logging as a data optimization problem&lt;/strong&gt;. By moving intelligence upstream, we can handle orders of magnitude more data on the same hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For the complete technical deep-dive with full benchmark methodology and API documentation, check out the full article on Medium:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://medium.com/@fgauthier_36718/loggr-processing-250m-logs-in-11-5s-on-a-laptop-with-on-the-fly-5-compression-7903b3f941d4" rel="noopener noreferrer"&gt;Loggr: Processing 250M Logs in 11.5s on a Laptop with On-the-Fly 5× Compression&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What logging challenges are you facing with your high-throughput applications?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>c</category>
      <category>performance</category>
      <category>algorithms</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
