<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Parth Khare</title>
    <description>The latest articles on DEV Community by Parth Khare (@parth_khare_84da5090de191).</description>
    <link>https://dev.to/parth_khare_84da5090de191</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3923282%2F4a0e5aa3-31cf-40d9-88b3-da543edbf94a.png</url>
      <title>DEV Community: Parth Khare</title>
      <link>https://dev.to/parth_khare_84da5090de191</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/parth_khare_84da5090de191"/>
    <language>en</language>
    <item>
      <title>How We Compressed 63.5 GB of Financial Tick Data to 5.5 GB</title>
      <dc:creator>Parth Khare</dc:creator>
      <pubDate>Sun, 10 May 2026 13:33:32 +0000</pubDate>
      <link>https://dev.to/parth_khare_84da5090de191/how-we-compressed-635-gb-of-financial-tick-data-to-55-gb-56dm</link>
      <guid>https://dev.to/parth_khare_84da5090de191/how-we-compressed-635-gb-of-financial-tick-data-to-55-gb-56dm</guid>
      <description>&lt;p&gt;At AlphaBots, we run an algorithmic trading platform that processes live market data across Indian equity and derivatives markets. Every second, we capture 1-second snapshot data and full tick data across Nifty, BankNifty, and equity instruments. It adds up fast — gigabytes of new data every single trading day, compounding.&lt;/p&gt;

&lt;p&gt;We store this data for backtesting, strategy validation, and compliance. After a few months of live operation, the storage bill started hurting. Loading large Parquet files for backtesting runs was slow — we were spending more time moving data around than actually running strategies.&lt;/p&gt;

&lt;p&gt;We tried Parquet's built-in ZSTD. It helped, but not enough. So we built our own compression engine. Here's what we learned.&lt;/p&gt;




&lt;h2&gt;The Insight: Tick Data Has Exploitable Structure&lt;/h2&gt;

&lt;p&gt;Financial tick data is not random. It has properties general-purpose compressors ignore:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prices move in tiny increments.&lt;/strong&gt; A Nifty futures price might go 22,450.25 → 22,450.50 → 22,450.25. The raw float64 values look different. But the &lt;em&gt;differences&lt;/em&gt; — +0.25, -0.25 — are tiny and repetitive. Store differences instead of raw values and the data collapses dramatically.&lt;/p&gt;
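
&lt;p&gt;A minimal sketch of that collapse (plain NumPy, not TSC's internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

prices = np.array([22450.25, 22450.50, 22450.25])
deltas = np.diff(prices)   # [ 0.25, -0.25]

# The raw float64 values share almost no byte patterns;
# the deltas are tiny, repetitive, and highly compressible.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;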

&lt;p&gt;&lt;strong&gt;Columns are homogeneous.&lt;/strong&gt; All prices are floats in a similar range. All timestamps are sequential. Columnar storage exploits this — you compress each column independently, so the compressor sees 8 million prices together, not interleaved with volumes and symbols.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data is written once and read rarely.&lt;/strong&gt; Tick archives are almost never updated after writing. We can afford to spend more time compressing, because decompression happens only a handful of times per dataset.&lt;/p&gt;

&lt;p&gt;These three properties together suggested a pipeline general-purpose tools weren't exploiting.&lt;/p&gt;




&lt;h2&gt;The Pipeline: Four Steps Before ZSTD Sees Anything&lt;/h2&gt;

&lt;p&gt;We built TSC as a Rust-native engine. Here's the pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Columnar layout.&lt;/strong&gt; Split the dataset into individual columns. Process each independently. The compressor sees homogeneous data — all prices together, all timestamps together.&lt;/p&gt;
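
&lt;p&gt;In spirit, the split looks like this (the column names are hypothetical, and this is a sketch of the idea rather than TSC's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd

df = pd.read_parquet("tick_data.parquet")

# Each column becomes an independent, homogeneous stream.
columns = {name: df[name].to_numpy() for name in df.columns}
# columns["ltp"]:       millions of floats in a narrow range
# columns["timestamp"]: millions of monotonically increasing integers
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;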

&lt;p&gt;&lt;strong&gt;Step 2 — Delta encoding.&lt;/strong&gt; Store the &lt;em&gt;difference&lt;/em&gt; between consecutive values instead of raw values. For a price column: 22450.25 (baseline), +0.25, -0.25. For timestamps with 1-second resolution: differences are often literally 1. They compress to almost nothing.&lt;/p&gt;
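
&lt;p&gt;Concretely (a sketch; scaling prices to integer ticks is our illustration here, not necessarily how TSC handles floats internally):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

# Timestamps at 1-second resolution: store the baseline once,
# then deltas, which are almost all literally 1.
ts = np.arange(1_700_000_000, 1_700_000_008, dtype=np.int64)
ts_base, ts_deltas = ts[0], np.diff(ts)            # deltas: [1, 1, ..., 1]

# Prices: scale to integer ticks so deltas are small, exact integers.
prices = np.array([22450.25, 22450.50, 22450.25])
ticks = np.round(prices / 0.05).astype(np.int64)   # 0.05 = tick size
px_base, px_deltas = ticks[0], np.diff(ticks)      # deltas: [5, -5]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;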

&lt;p&gt;&lt;strong&gt;Step 3 — Bit-packing.&lt;/strong&gt; After delta encoding, each value fits in far fewer bits. Small deltas that fit in 8 bits get stored in 8 bits, not 64.&lt;/p&gt;
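
&lt;p&gt;A byte-level sketch of the idea (real bit-packing goes below byte granularity, and the zig-zag step here is a standard trick, not confirmed TSC behaviour):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

deltas = np.array([5, -5, 1, 0, -2], dtype=np.int64)

# Zig-zag: fold the sign into the low bit so small negatives stay small.
zz = 2 * np.abs(deltas) - np.signbit(deltas)          # [10, 9, 2, 0, 3]

# Store in the narrowest unsigned type that fits.
packed = zz.astype(np.min_scalar_type(int(zz.max())))
print(packed.dtype, packed.nbytes)                    # uint8 5 (vs 40 raw bytes)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;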

&lt;p&gt;&lt;strong&gt;Step 4 — ZSTD as the final pass.&lt;/strong&gt; Only now does ZSTD see the data — working on already-small packed integers, not raw floats. This is the key insight: &lt;strong&gt;ZSTD on pre-processed data significantly outperforms ZSTD on raw data.&lt;/strong&gt; The pre-processing is what beats Parquet's built-in compression.&lt;/p&gt;
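
&lt;p&gt;You can reproduce the effect on synthetic data with the zstandard package (numbers vary with the data; this demonstrates the principle, it is not a TSC benchmark):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
import zstandard as zstd

# A random walk in 0.05 ticks, stored as raw float64 vs packed deltas.
rng = np.random.default_rng(0)
steps = rng.integers(-3, 4, size=1_000_000)
ticks = 449_000 + np.cumsum(steps)
raw = (ticks * 0.05).tobytes()                        # raw float64 prices
packed = (2 * np.abs(steps) - np.signbit(steps)).astype(np.uint8).tobytes()

c = zstd.ZstdCompressor(level=19)
print(len(c.compress(raw)), len(c.compress(packed)))  # packed is far smaller
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;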

&lt;p&gt;The pipeline runs in O(1) memory — fixed-size chunks, constant RAM regardless of input size.&lt;/p&gt;
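
&lt;p&gt;The shape of that, in sketch form (CHUNK is an illustrative number, and this is the pattern rather than TSC's actual code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
import zstandard as zstd

CHUNK = 1_000_000  # rows per chunk: RAM is bounded by this, not by file size

def compress_column(values, out):
    """Delta-encode and compress each fixed-size chunk independently."""
    c = zstd.ZstdCompressor(level=19)
    for start in range(0, len(values), CHUNK):
        chunk = values[start:start + CHUNK]
        # Store the chunk's first value as a baseline, then the deltas,
        # so each chunk decodes on its own: np.cumsum(payload) restores it.
        payload = np.concatenate([chunk[:1], np.diff(chunk)])
        out.write(c.compress(payload.tobytes()))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;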




&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;All tests were 100% lossless — every row and column verified after a full round-trip.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Rows&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;TSC&lt;/th&gt;
&lt;th&gt;vs Parquet ZSTD&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nifty historical&lt;/td&gt;
&lt;td&gt;~15M&lt;/td&gt;
&lt;td&gt;63.5 GB&lt;/td&gt;
&lt;td&gt;5.5 GB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;91.6% smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EQY US ALL BBO&lt;/td&gt;
&lt;td&gt;8.8M&lt;/td&gt;
&lt;td&gt;118.92 MB&lt;/td&gt;
&lt;td&gt;30.09 MB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;74.7% smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Options Greeks&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66.6% smaller&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Compared against gzip on the 63.5 GB dataset: TSC produced 5.5 GB vs gzip's 7.5 GB — &lt;strong&gt;27% smaller than gzip&lt;/strong&gt;, with under 7 GB RAM throughout.&lt;/p&gt;
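
&lt;p&gt;The losslessness claim is easy to check on your own data with the API shown below in &lt;em&gt;Using It&lt;/em&gt; (a sketch; if your build reorders rows via sort_key, sort both frames before comparing):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import pandas as pd
import tsc

df = pd.read_parquet("tick_data.parquet")
restored = tsc.decompress(tsc.compress(df, mode="balanced", sort_key="auto"))
assert restored.equals(df)   # every row and column survives the round-trip
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;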

&lt;p&gt;For AlphaBots, this translated into direct storage cost reduction. Months of tick data now fits in a fraction of its previous space. Backtesting loads faster.&lt;/p&gt;




&lt;h2&gt;Honest Trade-offs&lt;/h2&gt;

&lt;p&gt;TSC is not a Parquet replacement. Use Parquet when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random-access queries&lt;/strong&gt; — TSC decompresses full chunks, not individual rows, so point queries are slower than Parquet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast writes&lt;/strong&gt; — TSC's pipeline takes more time to compress than Parquet. That's a deliberate trade-off for a better archival ratio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixed-type / sparse data&lt;/strong&gt; — delta encoding doesn't help strings or sparse columns, so gains are minimal on wide tables with lots of non-numeric data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The sweet spot:&lt;/strong&gt; dense numeric time-series, write-once, read in batch. Financial tick data. IoT sensor telemetry. Metrics archives.&lt;/p&gt;




&lt;h2&gt;Using It&lt;/h2&gt;

&lt;p&gt;Built in Rust with Python bindings via PyO3. Zero-copy Arrow/Polars/Pandas integration. Pre-built wheels for Linux and Windows (Python 3.11 and 3.12).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tsc&lt;/span&gt;

&lt;span class="c1"&gt;# Compress
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tick_data.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tsc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sort_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Decompress
&lt;/span&gt;&lt;span class="n"&gt;restored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tsc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decompress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Parquet/CSV/DuckDB file workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;alphabots_tsc_wrapper&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TSCompressor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TSDecompressor&lt;/span&gt;

&lt;span class="nc"&gt;TSCompressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;balanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;compress_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.tsc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TSDecompressor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;decompress_polars&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data.tsc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try it on your own data — no install needed:&lt;/strong&gt;&lt;br&gt;
Upload a Parquet or CSV file (up to 200 MB) and see the compression ratio on your actual data in about two minutes.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://parthpc.tail210df5.ts.net/static/index.html" rel="noopener noreferrer"&gt;TSC Compression Service&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-built wheels + docs:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/adminalphabots/alphabots-tsc-engine" rel="noopener noreferrer"&gt;GitHub — adminalphabots/alphabots-tsc-engine&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;We built TSC for our own use at AlphaBots. The benchmarks are strong enough that we think it has broader applicability — particularly for platforms storing large volumes of financial or IoT time-series.&lt;/p&gt;

&lt;p&gt;We're exploring commercial licensing and IP transfer. If you're working on a TSDB, market data platform, or storage infrastructure where compression ratio matters, reach out: &lt;strong&gt;&lt;a href="mailto:parth.k@alphabots.in"&gt;parth.k@alphabots.in&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;TSC is free for evaluation and non-commercial use. Commercial licensing: &lt;a href="mailto:parth.k@alphabots.in"&gt;parth.k@alphabots.in&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>compression</category>
      <category>timeseries</category>
    </item>
  </channel>
</rss>
