<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yu Qian Yang</title>
    <description>The latest articles on DEV Community by Yu Qian Yang (@yu_qianyang_db00999789f2).</description>
    <link>https://dev.to/yu_qianyang_db00999789f2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878715%2Fd15d7afc-e5d1-40ad-a32f-da57f1977247.png</url>
      <title>DEV Community: Yu Qian Yang</title>
      <link>https://dev.to/yu_qianyang_db00999789f2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yu_qianyang_db00999789f2"/>
    <language>en</language>
    <item>
      <title>How We Built a Disaggregated Storage Layer for a Columnar MPP Database (And What Broke in Production)</title>
      <dc:creator>Yu Qian Yang</dc:creator>
      <pubDate>Thu, 16 Apr 2026 10:30:27 +0000</pubDate>
      <link>https://dev.to/yu_qianyang_db00999789f2/how-we-built-a-disaggregated-storage-layer-for-a-columnar-mpp-database-and-what-broke-in-2i24</link>
      <guid>https://dev.to/yu_qianyang_db00999789f2/how-we-built-a-disaggregated-storage-layer-for-a-columnar-mpp-database-and-what-broke-in-2i24</guid>
      <description>&lt;h1&gt;
  
  
  How I Built a Disaggregated Storage Layer for a Columnar MPP Database
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;And what broke in production.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Enterprise data warehouses are expensive to run. When your database stores data on local SSDs, you're paying SSD prices for data that might only be queried once a month. For a bank managing petabytes of historical transaction data, this adds up fast.&lt;/p&gt;

&lt;p&gt;The standard answer in 2022 was "move cold data to object storage" — S3, Alibaba OSS, Huawei OBS, take your pick. Object storage is cheap. The problem is that it's also slow: latency in the hundreds of milliseconds per request, compared to microseconds for local NVMe.&lt;/p&gt;

&lt;p&gt;Our task: make object storage fast enough that a columnar MPP database could use it as a primary storage backend without destroying query performance.&lt;/p&gt;

&lt;p&gt;The result was UniStore, a disaggregated storage layer that sits between the query engine and cloud object storage. This is the architecture we built, the tradeoffs we made, and the bugs we found the hard way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;UniStore runs as a separate process on each database node. From the query engine's perspective, it looks like a local filesystem. Under the hood, it manages a local cache (around 1TB per node) backed by object storage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query Engine
     |
     | (POSIX-like file API)
     v
  UniStore Frontend
     |
     | (internal protocol)
     v
  UniStore Backend
     |          |
     v          v
Local Cache   Object Storage
(NVMe, ~1TB)  (S3/OSS/OBS/HDFS)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend handles file open/read/write calls from the query engine. The backend manages the cache, coordinates uploads to object storage, and handles prefetching. They communicate over a Unix socket.&lt;/p&gt;

&lt;p&gt;Object files are split into fixed-size blocks. Each block is independently cached, uploaded, and tracked. The metadata — which blocks exist, where they are, their cache state — is stored in RocksDB for persistence across restarts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Engineering: The Prefetch Engine
&lt;/h2&gt;

&lt;p&gt;Cold reads from object storage are slow (~200ms per request vs ~100μs for local NVMe — roughly 2000x slower). The only way to make this tolerable is to have data in cache before the query asks for it.&lt;/p&gt;

&lt;p&gt;UniStore's prefetch engine works by predicting sequential scan patterns. When the query engine opens a file and starts reading blocks sequentially, the backend detects the pattern and begins fetching ahead.&lt;/p&gt;

&lt;p&gt;Two mechanisms drive this. The first is a sequential scan predictor: given the current read position and recent access history, it estimates which blocks will be needed next and issues prefetch requests proactively. The second is a pre-registration API: when the query optimizer knows in advance which files it will access, it registers this list before execution starts, and the backend begins warming the cache early.&lt;/p&gt;

&lt;p&gt;To handle variable object storage latency, the prefetch timing uses a sliding-window average of recent round-trip times. If the storage backend is responding slowly, the prefetch window expands to compensate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance result:&lt;/strong&gt; Cold start (empty cache) is roughly 20x slower than SSD. Warm cache (data already prefetched) performs on par with SSD. In practice, for repeated workloads on the same dataset, queries run at SSD speed despite the underlying storage being object storage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Engineering: The Cache Layer
&lt;/h2&gt;

&lt;p&gt;The cache uses a block-level LRU policy with pin/unpin support. Blocks referenced by active queries are pinned — they cannot be evicted until the query releases them. This prevents a pathological case where a long-running scan evicts its own prefetched data.&lt;/p&gt;

&lt;p&gt;Eviction is triggered when cache utilization crosses a threshold. The eviction policy walks the LRU list, skipping pinned blocks, and frees the least recently used eligible blocks.&lt;/p&gt;

&lt;p&gt;One design decision worth noting: we track metadata (block location, cache state, reference counts) in RocksDB rather than in-memory only. This means cache metadata survives process restarts — after a crash, UniStore knows exactly which blocks are in the local cache without having to scan the filesystem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Cloud Backend Abstraction
&lt;/h2&gt;

&lt;p&gt;The customer required support for multiple cloud providers simultaneously, with the ability to switch backends or add redundancy without query engine changes.&lt;/p&gt;

&lt;p&gt;We implemented a worker abstraction: separate backend implementations for each storage type (OSS-compatible, HDFS, and local filesystem) all satisfy the same interface. The frontend doesn't know or care which backend is active. A factory component selects the right implementation based on configuration.&lt;/p&gt;

&lt;p&gt;This turned out to be more useful than we expected. During deployment, we needed to migrate data between cloud providers while keeping the database online. Because the worker abstraction was clean, we could run two backends simultaneously and drain one while filling the other — all transparent to the query engine.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Broke in Production
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bug 1: Silent Data Corruption on Large File Writes
&lt;/h3&gt;

&lt;p&gt;This one was subtle.&lt;/p&gt;

&lt;p&gt;Large files are written as a sequence of blocks. The frontend issues completion signals one by one as each block finishes. The backend adds these to an async task queue — so in theory, blocks could be processed out of order.&lt;/p&gt;

&lt;p&gt;The bug: when the last block was uploaded first (out of queue ordering), the multipart upload part number assignment collided with earlier blocks. The last block's data silently overwrote part of an earlier block. No error was returned. The file appeared to complete successfully, but the data was corrupt.&lt;/p&gt;

&lt;p&gt;The root cause was in the upload type selection logic. The code was checking only buffer size to decide between single-upload and multipart-upload mode — but for files with multiple blocks, this check was wrong for any block beyond the first. It chose single-upload mode when it should have chosen multipart, which conflicted with an already-in-progress multipart upload session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Gate the upload type decision on block position as well as buffer size. For any block beyond the first, force multipart mode.&lt;/p&gt;

&lt;p&gt;We also added a fault injection flag that forces the last block to be uploaded first, making the race condition deterministically reproducible in CI. This is the kind of test infrastructure we should have built before shipping — async upload pipelines have inherent ordering ambiguity, and that should have been a first-class test scenario from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 2: Cache Deadlock from Tmpfile Leak
&lt;/h3&gt;

&lt;p&gt;The original write path worked like this: when writing a file, data was first buffered into a local temporary file, and a corresponding cache object was created. When the file write completed normally, the temporary file was transitioned to a state where it could eventually be evicted.&lt;/p&gt;

&lt;p&gt;The problem: if an exception occurred before the file write completed — say, if object storage became temporarily unavailable — the temporary file and its associated cache objects remained in a state where they could not be evicted. Cache space leaked. Given enough retries (which the query engine would do automatically), the entire cache filled up with unevictable incomplete objects. New queries blocked waiting for cache space that would never be freed.&lt;/p&gt;

&lt;p&gt;A secondary problem: block identifiers are allocated serially, so retrying the same failed write created new identifiers and new temporary files with each attempt — accelerating the leak.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Added a configuration flag that bypasses the local cache entirely for writes, streaming data directly to object storage via dedicated write implementations for each backend type. In the worst case, only one temporary file leaks (named by object path rather than block identifier).&lt;/p&gt;

&lt;p&gt;The tradeoff: writes no longer populate the cache, so data written and immediately read will hit object storage instead of cache. For our workload (write-once, read-later analytics), this was acceptable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Test your failure modes, not just your happy path.&lt;/strong&gt;&lt;br&gt;
The data corruption bug only manifested when the task queue processed blocks out of order. This was always possible, but we didn't test for it. Fault injection infrastructure should be built alongside the feature, not after the bug is found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Resource cleanup ordering in async systems needs explicit contracts.&lt;/strong&gt;&lt;br&gt;
Both bugs above were variations of the same mistake: implicit assumptions about ordering in async code. The tmpfile bug assumed writes would always complete cleanly. The data corruption bug assumed blocks would always be processed in submission order. Neither assumption was documented, and neither was enforced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Metadata persistence matters.&lt;/strong&gt;&lt;br&gt;
Storing cache metadata in RocksDB rather than memory-only was the right call. It added complexity (serialization, recovery logic), but it meant that a crashed UniStore process could restart without invalidating the entire cache or losing track of in-flight uploads.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Deployed at a major bank's big data center, managing thousands of PBs across multiple cloud providers (AWS S3, Alibaba Cloud OSS, Huawei Cloud OBS, Tencent Cloud). Cache size per node: ~1TB. Concurrent client connections supported: up to 17,500.&lt;/p&gt;

&lt;p&gt;Cold start performance: ~20x slower than SSD (dominated by object storage round-trip latency).&lt;br&gt;
Warm cache performance: on par with SSD.&lt;/p&gt;

&lt;p&gt;For the bank's analytics workloads — where the same datasets are queried repeatedly within a business day — warm cache performance means object storage costs with SSD-equivalent query speed.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're working on similar storage infrastructure problems — disaggregated storage, object storage integration, or database performance engineering — feel free to reach out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>performance</category>
      <category>cpp</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>How I Used Bit Manipulation to Speed Up Float-to-Int Conversion in a Storage Engine</title>
      <dc:creator>Yu Qian Yang</dc:creator>
      <pubDate>Wed, 15 Apr 2026 02:36:14 +0000</pubDate>
      <link>https://dev.to/yu_qianyang_db00999789f2/how-i-used-bit-manipulation-to-speed-up-float-to-int-conversion-in-a-storage-engine-4p5e</link>
      <guid>https://dev.to/yu_qianyang_db00999789f2/how-i-used-bit-manipulation-to-speed-up-float-to-int-conversion-in-a-storage-engine-4p5e</guid>
      <description>&lt;p&gt;In a columnar time-series database, one of the most effective compression tricks is&lt;br&gt;
deceptively simple: &lt;strong&gt;if a float value is actually an integer, store it as one.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Integers Compress Better Than Floats
&lt;/h2&gt;

&lt;p&gt;Integer compression algorithms like Delta-of-Delta, ZigZag, and Simple8b work by&lt;br&gt;
exploiting predictable bit patterns — small deltas between adjacent values, values&lt;br&gt;
that fit in fewer than 64 bits, and so on. They can pack multiple values into a&lt;br&gt;
single 64-bit word.&lt;/p&gt;

&lt;p&gt;Floats don't cooperate with these schemes. Even &lt;code&gt;1.0&lt;/code&gt; and &lt;code&gt;2.0&lt;/code&gt; have completely&lt;br&gt;
different IEEE 754 bit representations (&lt;code&gt;0x3FF0000000000000&lt;/code&gt; and &lt;code&gt;0x4000000000000000&lt;/code&gt;).&lt;br&gt;
Their XOR is large, their delta is meaningless as an integer, and bit-packing is useless.&lt;/p&gt;

&lt;p&gt;So when a column is declared as &lt;code&gt;FLOAT&lt;/code&gt; but actually contains values like &lt;code&gt;12.0&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;18.0&lt;/code&gt;, &lt;code&gt;25.0&lt;/code&gt; — which happens more often than you'd expect, either because the schema&lt;br&gt;
was designed generically or because the upstream system always emits &lt;code&gt;.0&lt;/code&gt; values — you're&lt;br&gt;
leaving significant compression headroom on the table.&lt;/p&gt;

&lt;p&gt;The fix: &lt;strong&gt;detect these integer-valued floats at encode time, convert them losslessly&lt;br&gt;
to integers, and route them through the integer compression path.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A temperature sensor that reports &lt;code&gt;21.0&lt;/code&gt;, &lt;code&gt;21.5&lt;/code&gt;, &lt;code&gt;22.0&lt;/code&gt; is a good example. Multiply&lt;br&gt;
by 10 and you get &lt;code&gt;210&lt;/code&gt;, &lt;code&gt;215&lt;/code&gt;, &lt;code&gt;220&lt;/code&gt; — plain integers with small, predictable deltas.&lt;br&gt;
Delta-of-Delta or Simple8b will compress these far more efficiently than any&lt;br&gt;
float-specific scheme.&lt;/p&gt;

&lt;p&gt;The challenge: before converting, you need to check whether the scaled value can be&lt;br&gt;
losslessly represented as an integer. The naive check — &lt;code&gt;std::isnan&lt;/code&gt; + range comparison —&lt;br&gt;
works but it's slower than it needs to be on the hot encoding path.&lt;/p&gt;

&lt;p&gt;Here's the faster approach I implemented, using nothing but bit manipulation.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Setup: Scaling Floats to Integers
&lt;/h2&gt;

&lt;p&gt;The encoding scheme works in two steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scale&lt;/strong&gt;: multiply the float by &lt;code&gt;10^scale&lt;/code&gt; (configurable per column)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convert&lt;/strong&gt;: cast the scaled value to integer using &lt;code&gt;std::lround&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example, with &lt;code&gt;scale = 2&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;1.23&lt;/code&gt; → &lt;code&gt;1.23 * 100 = 123.0&lt;/code&gt; → &lt;code&gt;123&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;45.678&lt;/code&gt; → &lt;code&gt;45.678 * 100 = 4567.8&lt;/code&gt; → overflow risk or precision loss&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 2 only makes sense if the scaled value actually fits in the target integer type.&lt;br&gt;
That's the overflow check.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Overflow Check
&lt;/h2&gt;

&lt;p&gt;The function takes a pointer to the raw float bytes and the target integer width in bytes.&lt;br&gt;
It returns non-zero if the value would overflow.&lt;/p&gt;

&lt;p&gt;Called before every conversion — if it fires, skip the integer path and fall back to&lt;br&gt;
float encoding.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;you can determine whether a float overflows a given integer type&lt;br&gt;
purely from the float's exponent bits, without doing any arithmetic.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's why.&lt;/p&gt;


&lt;h2&gt;
  
  
  IEEE 754 in One Paragraph
&lt;/h2&gt;

&lt;p&gt;A double-precision float is stored as 64 bits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ sign: 1 bit ][ exponent: 11 bits ][ fraction: 52 bits ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The value is: &lt;code&gt;1.fraction × 2^(exponent − 1023)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;1023&lt;/code&gt; is the bias — it allows the 11-bit exponent field to represent negative&lt;br&gt;
exponents. The real exponent is &lt;code&gt;stored_exponent − 1023&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For 32-bit floats: 8 exponent bits, bias 127, fraction 23 bits.&lt;/p&gt;


&lt;h2&gt;
  
  
  Extracting the Exponent
&lt;/h2&gt;

&lt;p&gt;For a &lt;code&gt;double&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                        &lt;span class="cm"&gt;/* safe type-pun, no UB */&lt;/span&gt;
&lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x07ff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1023&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step by step:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;memcpy&lt;/code&gt; into a &lt;code&gt;uint64_t&lt;/code&gt; — reinterpret the 8 bytes as a 64-bit integer (no arithmetic, just bits)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;gt;&amp;gt; 52&lt;/code&gt; — shift right past the 52 fraction bits, bringing the exponent to the low end&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;amp; 0x07ff&lt;/code&gt; — mask off the sign bit, keep only the 11 exponent bits&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;- 1023&lt;/code&gt; — subtract the bias to get the real exponent&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a &lt;code&gt;float&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same logic: shift past 23 fraction bits, mask 8 exponent bits, subtract bias 127.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Overflow Condition
&lt;/h2&gt;

&lt;p&gt;Once you have the real exponent, the overflow check is one comparison:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;is_overflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;int_typewidth&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where does &lt;code&gt;- 2&lt;/code&gt; come from?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;−1 for the sign bit&lt;/strong&gt;: a signed integer of N bits can hold values up to &lt;code&gt;2^(N-1) - 1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;−1 for the implicit leading 1&lt;/strong&gt;: in IEEE 754, the fraction is &lt;code&gt;1.fraction&lt;/code&gt;, not &lt;code&gt;0.fraction&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So a float with real exponent &lt;code&gt;E&lt;/code&gt; represents a value with &lt;code&gt;E + 1&lt;/code&gt; significant bits&lt;br&gt;
(the implicit 1 plus E fraction bits). For it to fit in a signed N-bit integer, you&lt;br&gt;
need &lt;code&gt;E + 1 ≤ N - 1&lt;/code&gt;, which simplifies to &lt;code&gt;E ≤ N - 2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Full implementation in C:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="cm"&gt;/* Returns 1 if the double at src overflows a signed integer of int_bytes bytes. */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;double_overflow_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x07ff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1023&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* Returns 1 if the float at src overflows a signed integer of int_bytes bytes. */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;float_overflow_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int16_t&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;bits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0xff&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;real_exp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;int_bytes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total cost: one &lt;code&gt;memcpy&lt;/code&gt;, one shift, one AND, one subtract, one compare.&lt;br&gt;
No floating-point arithmetic, no branches on the value itself.&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Fits into the Encoder
&lt;/h2&gt;

&lt;p&gt;The encoder scales the value first, then calls the overflow check on the scaled result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orig&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;               &lt;span class="cm"&gt;/* scale: e.g. orig * 100.0 */&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;double_overflow_check&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ENCODE_OVERFLOW&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                  &lt;span class="cm"&gt;/* fall back to float encoding */&lt;/span&gt;

&lt;span class="kt"&gt;int64_t&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llround&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;            &lt;span class="cm"&gt;/* safe: overflow already ruled out */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scale factor is stored in the column header so the decoder can reverse the&lt;br&gt;
operation: &lt;code&gt;decoded = (double)stored_integer / pow(10, scale)&lt;/code&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Not Just Use &lt;code&gt;std::isnan&lt;/code&gt; + Range Check?
&lt;/h2&gt;

&lt;p&gt;The conventional approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;isnan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;INT64_MAX&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;INT64_MIN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This involves floating-point comparisons, which on many architectures require the&lt;br&gt;
value to be loaded into a float register before comparison. On a hot encoding path&lt;br&gt;
processing millions of values, the difference adds up.&lt;/p&gt;

&lt;p&gt;The bit manipulation approach operates entirely on integer registers. The float's&lt;br&gt;
bytes are reinterpreted as an integer — no floating-point unit involved until the&lt;br&gt;
final &lt;code&gt;std::lround&lt;/code&gt; conversion, which only happens when you've already confirmed&lt;br&gt;
no overflow.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Enables
&lt;/h2&gt;

&lt;p&gt;This check is the entry gate for the full encoding chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;float column
    ↓
check_float_overflow   ← this article
    ↓ (passes)
float → integer cast
    ↓
Delta+ZigZag encoding
    ↓
Simple8b bit-packing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without a cheap overflow gate, the chain can't run on untrusted float data. With it,&lt;br&gt;
each value costs one check before entering the integer compression path — which can&lt;br&gt;
achieve far better compression ratios than float-specific schemes on "integer-like"&lt;br&gt;
time-series data.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This article is part of a series on compression engineering in time-series databases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1&lt;/strong&gt;: Runtime adaptive compression — how the system selects the best algorithm
without scanning all data &lt;em&gt;(published)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: Chained encoding — the full float-to-integer → Delta+ZigZag → Simple8b pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4&lt;/strong&gt;: An improved floating-point compression algorithm based on ELF&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I'm currently available for freelance work on backend systems, storage engineering,&lt;br&gt;
and systems integration. Feel free to reach out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>performance</category>
      <category>database</category>
      <category>algorithms</category>
    </item>
    <item>
      <title>How I Built a Runtime Adaptive Compression System That Selects the Best Algorithm Without Scanning All Data</title>
      <dc:creator>Yu Qian Yang</dc:creator>
      <pubDate>Tue, 14 Apr 2026 13:31:07 +0000</pubDate>
      <link>https://dev.to/yu_qianyang_db00999789f2/how-i-built-a-runtime-adaptive-compression-system-that-selects-the-best-algorithm-without-scanning-j30</link>
      <guid>https://dev.to/yu_qianyang_db00999789f2/how-i-built-a-runtime-adaptive-compression-system-that-selects-the-best-algorithm-without-scanning-j30</guid>
      <description>&lt;p&gt;When I was working on a columnar time-series database built on Greenplum, we faced a&lt;br&gt;
problem that every time-series storage engineer eventually hits:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;No single compression algorithm works best for all data.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A sensor reading temperature every second looks nothing like a financial tick stream.&lt;br&gt;
A column of integer IDs compresses completely differently from a column of floating-point&lt;br&gt;
measurements. And in the real world, the same column can change character over time —&lt;br&gt;
steady signal for hours, then suddenly a burst of noise.&lt;/p&gt;

&lt;p&gt;The naive answer is: let the user pick an algorithm. But that puts the burden on the user,&lt;br&gt;
and in practice they almost always pick wrong — or just leave the default.&lt;/p&gt;

&lt;p&gt;So I designed &lt;strong&gt;automode&lt;/strong&gt;: a runtime analyzer that samples incoming data, detects its&lt;br&gt;
statistical character, and selects the best encoding chain automatically.&lt;/p&gt;

&lt;p&gt;Here's how it works.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Encoding Chain
&lt;/h2&gt;

&lt;p&gt;Before talking about the analyzer, I need to briefly explain what it's selecting between.&lt;/p&gt;

&lt;p&gt;We built a set of encoding algorithms, each optimized for a specific data pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run-Length Encoding&lt;/strong&gt; — all values are identical&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delta-of-Delta&lt;/strong&gt; — slowly changing values (timestamps, smooth sensors)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delta + ZigZag&lt;/strong&gt; — values oscillating around a baseline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bit Packing&lt;/strong&gt; — small integers that can be bit-packed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gorilla&lt;/strong&gt; — floating-point values with small XOR between adjacent values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The insight that made automode possible: these algorithms operate on fundamentally&lt;br&gt;
different &lt;em&gt;statistical features&lt;/em&gt; of data. If I can identify the dominant feature of a&lt;br&gt;
data stream, I can map it directly to the right algorithm.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Design Problem
&lt;/h2&gt;

&lt;p&gt;The obvious approach — scan all the data and compute statistics — doesn't work in a&lt;br&gt;
storage engine. The whole point of compression is to be fast. Scanning every value&lt;br&gt;
before encoding defeats the purpose.&lt;/p&gt;

&lt;p&gt;I needed a way to characterize a data stream with &lt;strong&gt;minimal overhead&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The solution: &lt;strong&gt;sample windows&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of scanning the full column, I divide the input stream into fixed-size windows&lt;br&gt;
and analyze only a small number of items from each window. Within each window, the&lt;br&gt;
sample start position is randomized to avoid being fooled by local patterns.&lt;/p&gt;

&lt;p&gt;Why randomize? Consider this sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[1 1 1 1 1 1 1 1 1 1 1 1 2 3 2 3 2 3 2 3 2 3 2 3]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you always sample from the start, you'd classify this as "all equal" and pick&lt;br&gt;
Run-Length Encoding. But if you look at the full stream, it's clearly an alternating&lt;br&gt;
pattern — RLE would be terrible. Random sampling within the window gives a much more&lt;br&gt;
representative picture.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Five Features
&lt;/h2&gt;

&lt;p&gt;Each sampled window is analyzed for five statistical features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All Equal&lt;/strong&gt; — all sampled values are identical&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Values bitwise small&lt;/strong&gt; — the values themselves fit in few bits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1st-order delta small&lt;/strong&gt; — differences between adjacent values are small&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2nd-order delta small&lt;/strong&gt; — differences between differences are small (delta-of-delta)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Float XOR small&lt;/strong&gt; — XOR of adjacent float values is small (Gorilla pattern)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each window, the analyzer computes 0th, 1st, and 2nd order differentials, plus&lt;br&gt;
XOR pairs for float data. "Small" is defined bitwise: a value is small if its&lt;br&gt;
significant bit length is below a configurable threshold. This makes the check&lt;br&gt;
robust against occasional spike values — a single outlier doesn't ruin the&lt;br&gt;
classification.&lt;/p&gt;




&lt;h2&gt;
  
  
  Strong vs. Weak Features
&lt;/h2&gt;

&lt;p&gt;Not all features are equal. I introduced a &lt;strong&gt;Strong/Weak classification&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strongest&lt;/strong&gt;: All Equal → Run-Length Encoding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong&lt;/strong&gt;: 2nd-order delta small → Delta-of-Delta&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strong&lt;/strong&gt;: Float XOR small → Gorilla&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak&lt;/strong&gt;: Values bitwise small → Bit Packing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak&lt;/strong&gt;: 1st-order delta small → Delta + ZigZag&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why does this matter? Consider a case where Delta-of-Delta and Bit Packing score&lt;br&gt;
similarly. Delta-of-Delta is fundamentally better for smoothly varying time-series&lt;br&gt;
data, even with a slightly lower score. The weight system ensures strong methods&lt;br&gt;
are preferred when they're competitive.&lt;/p&gt;




&lt;h2&gt;
  
  
  One Edge Case Worth Noting
&lt;/h2&gt;

&lt;p&gt;The alternating pattern &lt;code&gt;[A B A B A B]&lt;/code&gt; is a trap.&lt;/p&gt;

&lt;p&gt;It looks like "values bitwise small", but Bit Packing performs poorly on alternating&lt;br&gt;
sequences. A general-purpose compressor actually handles this pattern better.&lt;/p&gt;

&lt;p&gt;I added an explicit check: if odd-indexed and even-indexed items within the sample&lt;br&gt;
are each internally equal, it's an alternating pattern — skip the Bit Packing&lt;br&gt;
classification and fall back to general compression.&lt;/p&gt;

&lt;p&gt;This kind of edge case only appears when you test against real production data. In our&lt;br&gt;
case, it showed up in sensor data from a deployment at a large financial institution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Feature → Algorithm Mapping
&lt;/h2&gt;

&lt;p&gt;Once features are aggregated across all sample windows, the dominant feature maps to&lt;br&gt;
an algorithm:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All Equal&lt;/strong&gt; → Run-Length Encoding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2nd-order delta small&lt;/strong&gt; → Delta-of-Delta&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Values bitwise small&lt;/strong&gt; → Bit Packing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1st-order delta small&lt;/strong&gt; → Delta + ZigZag&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Float XOR small&lt;/strong&gt; → Gorilla&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No clear pattern&lt;/strong&gt; → General Compression (fallback)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For floating-point columns, the encoding chain adds a float-to-integer conversion pass&lt;br&gt;
first — converting floats that happen to be integers (like &lt;code&gt;1.0&lt;/code&gt;, &lt;code&gt;2.0&lt;/code&gt;) into actual&lt;br&gt;
integers before passing to the integer encoding path. This is worth a separate article.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Modes
&lt;/h2&gt;

&lt;p&gt;The system supports two optimization priorities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compression-rate mode&lt;/strong&gt;: maximize storage efficiency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed mode&lt;/strong&gt;: maximize decompression speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, compression-rate mode is used for cold storage and analytics workloads,&lt;br&gt;
and speed mode for hot data that's frequently queried. The same feature detection runs&lt;br&gt;
in both cases — only the final algorithm selection weights differ.&lt;/p&gt;




&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;The system is deployed in production at scale, including a large-scale installation&lt;br&gt;
at a major financial institution managing thousands of PBs of time-series data. In&lt;br&gt;
testing, automode consistently matched or outperformed manually-tuned single-algorithm&lt;br&gt;
compression across diverse real-world datasets.&lt;/p&gt;

&lt;p&gt;The key insight that made it work: &lt;strong&gt;you don't need to scan all the data to understand&lt;br&gt;
its character&lt;/strong&gt;. A few dozen samples, taken randomly from fixed-size windows, are&lt;br&gt;
enough to make a reliable classification — as long as you handle the edge cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This article is part of a series on compression engineering in time-series databases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: Float-to-integer encoding — how I used IEEE 754 bit manipulation to detect
convertibility with near-zero overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: Chained encoding — a unified float compression pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4&lt;/strong&gt;: An improved floating-point compression algorithm based on ELF&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I'm currently available for freelance work on backend systems, storage engineering,&lt;br&gt;
and systems integration. Feel free to reach out.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>performance</category>
      <category>cpp</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
