<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: contour</title>
    <description>The latest articles on DEV Community by contour (@yasha1971coder).</description>
    <link>https://dev.to/yasha1971coder</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3935596%2F0df6e97a-b14f-429a-9d8d-18f701448faa.jpg</url>
      <title>DEV Community: contour</title>
      <link>https://dev.to/yasha1971coder</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yasha1971coder"/>
    <language>en</language>
    <item>
      <title>I Shipped a Database in 4 Days a Week. Here's What Git Says About My Productivity.</title>
      <dc:creator>contour</dc:creator>
      <pubDate>Tue, 02 Jun 2026 14:20:00 +0000</pubDate>
      <link>https://dev.to/yasha1971coder/i-shipped-a-database-in-4-days-a-week-heres-what-git-says-about-my-productivity-3a09</link>
      <guid>https://dev.to/yasha1971coder/i-shipped-a-database-in-4-days-a-week-heres-what-git-says-about-my-productivity-3a09</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccaer0dz9a1ouykx9cip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fccaer0dz9a1ouykx9cip.png" alt=" " width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm not going to tell you whether 4-day weeks work.&lt;/p&gt;

&lt;p&gt;I'm going to show you my git log for the last 90 days. You decide.&lt;/p&gt;

&lt;p&gt;Context: I write C++ search engines. Solo. No meetings. No manager. Just me and a codebase called glyph-engine.&lt;/p&gt;

&lt;p&gt;For 3 months, I worked Mon-Thu. Fri-Sun I wrote zero code. No laptop. No "quick fixes".&lt;/p&gt;

&lt;p&gt;Here's what happened to my output, my bugs, and my brain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data: My GitHub Doesn't Lie
&lt;/h2&gt;

&lt;p&gt;First, proof I'm not making this up. This is my May 2026: 363 commits across 6 repos.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncr9djhk8zep4g9zk4pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncr9djhk8zep4g9zk4pk.png" alt=" " width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice the pattern? Heavy Mon-Thu. Dead Fri-Sun. That's intentional.&lt;/p&gt;

&lt;p&gt;187 of those commits went to glyph-engine, a disk-based FM-index I built:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1bcaqpxtw50lsrkz5mx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1bcaqpxtw50lsrkz5mx.png" alt=" " width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repo got traction too. 1,043 clones and 414 unique cloners in 14 days:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpg0osyz3g8m927r0488o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpg0osyz3g8m927r0488o.png" alt=" " width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, the actual experiment. I compared 3 months of 5-day weeks vs 3 months of 4-day weeks on the same codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;5-Day Weeks&lt;/th&gt;
&lt;th&gt;4-Day Weeks&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Commits / Week&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;-36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines Added / Week&lt;/td&gt;
&lt;td&gt;1,240&lt;/td&gt;
&lt;td&gt;780&lt;/td&gt;
&lt;td&gt;-37%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines Deleted / Week&lt;/td&gt;
&lt;td&gt;410&lt;/td&gt;
&lt;td&gt;1,190&lt;/td&gt;
&lt;td&gt;+190%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bugs per 1k LOC&lt;/td&gt;
&lt;td&gt;4.2&lt;/td&gt;
&lt;td&gt;1.7&lt;/td&gt;
&lt;td&gt;-59%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Days to Merge Feature&lt;/td&gt;
&lt;td&gt;9.2&lt;/td&gt;
&lt;td&gt;4.8&lt;/td&gt;
&lt;td&gt;-48%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Translation: I added 37% fewer lines. But I deleted 190% more garbage. Net result: features merged roughly 2x faster (9.2 days down to 4.8) with about 2.5x fewer bugs per 1k LOC.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Worked: The 3-Day Defrag
&lt;/h2&gt;

&lt;p&gt;My brain has cache. Just like a CPU.&lt;/p&gt;

&lt;p&gt;Day 1 off: L1 cache flush. I'm still thinking about yesterday's segfault. Useless.&lt;/p&gt;

&lt;p&gt;Day 2 off: L2 cache flush. I start forgetting variable names. Good.&lt;/p&gt;

&lt;p&gt;Day 3 off: L3 cache flush. This is where magic happens. I wake up and realize my suffix array layout thrashes the TLB. The fix is obvious.&lt;/p&gt;

&lt;p&gt;With 2-day weekends, I never hit Day 3. I was stuck debugging symptoms. With 3 days, I fixed root causes before I wrote them.&lt;/p&gt;

&lt;p&gt;My CPU does 37 GB/s. My brain does ~37 thoughts/s. Both need idle cycles to defrag. 2 days isn't enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  When This Fails: The Data Is Brutal
&lt;/h2&gt;

&lt;p&gt;I'm not selling 4-day weeks. They broke my workflow 3 times:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;On-call week.&lt;/strong&gt; Server panic on Friday. I was "off". Worked 6 hours anyway. Result: 5-day week + guilt + broken family time. Net negative.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream dependency.&lt;/strong&gt; My colleague ships on Fridays. I was blocked every Monday until he replied. Lost 20% of Mon to context-switching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crunch time.&lt;/strong&gt; 2 weeks before a demo, I switched back to 7 days. 4-day weeks are for marathons. Sprints need all hands.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;External data backs this up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cambridge University 2022:&lt;/strong&gt; 61 companies, 6-month trial. 71% of employees reported less burnout; revenue rose about 1.4% on average. You feel better. The company doesn't print money.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer 2020:&lt;/strong&gt; Tried 4-day for engineers. Rolled back. Reason: collaboration overhead killed flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Japan 2019:&lt;/strong&gt; Sales per employee up about 40%. But that measures sales output, not engineers building databases.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Actual Takeaway: Maker Schedule, Not 4-Day Week
&lt;/h2&gt;

&lt;p&gt;Don't ask your boss for a 4-day workweek. Ask for this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Block 4 days.&lt;/strong&gt; No meetings. No Slack. No code review. Just creation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Take 3 days off.&lt;/strong&gt; Really off. No laptop. Your L3 cache needs it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rotate on-call.&lt;/strong&gt; Pay whoever is on call 2x. A 4-day week with on-call is just a 5-day week with lies.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you can't get #1, a 4-day week will make you slower.&lt;br&gt;
If you can, your git log might look like mine.&lt;/p&gt;

&lt;p&gt;My boss didn't give me a 4-day week. I gave it to myself. I tracked the data.&lt;br&gt;
And for this project, alone, writing C++, the data says it worked.&lt;/p&gt;

&lt;p&gt;Your codebase is different. Your team is different.&lt;/p&gt;

&lt;p&gt;Don't trust me. Trust your git log.&lt;br&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; github.com/yasha1971-coder/glyph-engine&lt;br&gt;
&lt;strong&gt;Traffic:&lt;/strong&gt; 1,043 clones in 14 days during this experiment.&lt;/p&gt;
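&lt;p&gt;If you want to check your own weekday pattern, here is a minimal sketch. It assumes git is on your PATH and that your locale produces English day abbreviations; the function names are mine, not part of any tool mentioned in this post.&lt;/p&gt;

```python
import subprocess
from collections import Counter

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def commits_per_weekday(day_names):
    """Count commits per weekday from an iterable of abbreviated day names."""
    counts = Counter(name.strip() for name in day_names if name.strip())
    return {day: counts.get(day, 0) for day in WEEKDAYS}


def weekday_names_from_repo(repo_path=".", since="90 days ago"):
    """Ask git for the author-date weekday of every commit since `since`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--pretty=format:%ad", "--date=format:%a"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

&lt;p&gt;Run &lt;code&gt;commits_per_weekday(weekday_names_from_repo("path/to/repo"))&lt;/code&gt; inside any clone and compare Mon-Thu against Fri-Sun.&lt;/p&gt;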

</description>
      <category>productivity</category>
      <category>programming</category>
      <category>career</category>
      <category>github</category>
    </item>
    <item>
      <title>Reviving glyph-v8: STRIDE, a Deterministic Field-Aware Integer Analyzer (GitHub Finish-Up-A-Thon Challenge submission)</title>
      <dc:creator>contour</dc:creator>
      <pubDate>Mon, 25 May 2026 09:56:04 +0000</pubDate>
      <link>https://dev.to/yasha1971coder/reviving-glyph-v8-from-a-forgotten-prototype-to-stride-a-field-aware-integer-coder-h24</link>
      <guid>https://dev.to/yasha1971coder/reviving-glyph-v8-from-a-forgotten-prototype-to-stride-a-field-aware-integer-coder-h24</guid>
      <description>&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;STRIDE is a deterministic, field-aware integer analysis engine revived from the abandoned glyph-v8 prototype.&lt;/p&gt;

&lt;p&gt;Not a general compressor. A precision primitive that does one thing no existing tool does: profile binary protocol data field by field, build per-field entropy models, and identify exactly where compression gains are possible.&lt;/p&gt;

&lt;p&gt;General compressors like zstd see a byte stream. STRIDE sees structure.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Binary protocols move billions of messages daily — Protobuf, MessagePack, Thrift. Their integer fields are not random:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timestamps differ from the previous value by a small delta&lt;/li&gt;
&lt;li&gt;Status codes are almost always 200&lt;/li&gt;
&lt;li&gt;IDs increment monotonically&lt;/li&gt;
&lt;li&gt;Enums repeat from a tiny set&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;zstd doesn’t know this. It models the input as one undifferentiated byte stream, with no notion of which bytes belong to which field. STRIDE knows field boundaries, and that changes what is compressible.&lt;/p&gt;
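&lt;p&gt;To make the field-level argument concrete, here is a minimal sketch of delta coding on a timestamp field. It is an illustration, not STRIDE's code: the function names, the 15 ms spacing, and the LEB128-style varint size estimate are all assumptions, and it only handles non-decreasing values.&lt;/p&gt;

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def varint_len(n):
    """Bytes needed to store a non-negative integer as a 7-bit-per-byte varint."""
    length = 1
    n //= 128
    while n != 0:
        length += 1
        n //= 128
    return length


# Hypothetical field: 1,000 millisecond timestamps, 15 ms apart.
timestamps = [1_700_000_000_000 + 15 * i for i in range(1000)]
deltas = delta_encode(timestamps)

raw_bytes = sum(varint_len(t) for t in timestamps)
delta_bytes = sum(varint_len(d) for d in deltas)
print(raw_bytes, delta_bytes)
```

&lt;p&gt;On this synthetic field the varint size drops from 6,000 bytes to 1,005, and the gain comes entirely from knowing that the values belong to one monotonic field.&lt;/p&gt;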

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;Repository: github.com/yasha1971-coder/glyph-v8&lt;/p&gt;

&lt;p&gt;Live benchmark: enwik8 (100,000,000 bytes, OVH EPYC server)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ stride container-bytefreq enwik8.stridebin --top 5
Total bytes processed: 100,000,000
  32  0x20  13,519,824  (13.52%)  ← space dominates
 101  0x65   8,001,205  (8.00%)
 116  0x74   6,154,908  (6.15%)
  97  0x61   5,712,026  (5.71%)
 105  0x69   5,227,649  (5.23%)

$ stride container-hotspots enwik8.stridebin --top 3
Chunk 635  Entropy: 5.685  ← highest information density
Chunk 634  Entropy: 5.609
Chunk 636  Entropy: 5.534

$ stride container-headersketch enwik8.stridebin --size 8
Bucket 15: 0.574
Bucket 33: 0.663
Bucket 41: 0.605
Bucket 48: 0.660
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Timing on 100MB corpus:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ByteFreq&lt;/td&gt;
&lt;td&gt;1.97s&lt;/td&gt;
&lt;td&gt;256-bin byte histogram&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hotspots&lt;/td&gt;
&lt;td&gt;4.17s&lt;/td&gt;
&lt;td&gt;Entropy map across 1,526 chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HeaderSketch&lt;/td&gt;
&lt;td&gt;4.40s&lt;/td&gt;
&lt;td&gt;64-slot structural profile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fingerprint&lt;/td&gt;
&lt;td&gt;71.6s&lt;/td&gt;
&lt;td&gt;128 MinHash values &lt;em&gt;(known bottleneck: O(n·k) rolling hash)&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  ⚡ STRIDE vs zstd — I/O Performance
&lt;/h2&gt;

&lt;p&gt;STRIDE is not a compressor — it's a deterministic container. Comparison is I/O throughput only.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Encode&lt;/td&gt;
&lt;td&gt;STRIDE&lt;/td&gt;
&lt;td&gt;0.173s&lt;/td&gt;
&lt;td&gt;96MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Encode&lt;/td&gt;
&lt;td&gt;zstd -1&lt;/td&gt;
&lt;td&gt;0.240s&lt;/td&gt;
&lt;td&gt;39MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Encode&lt;/td&gt;
&lt;td&gt;zstd -9&lt;/td&gt;
&lt;td&gt;2.146s&lt;/td&gt;
&lt;td&gt;31MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decode&lt;/td&gt;
&lt;td&gt;STRIDE&lt;/td&gt;
&lt;td&gt;0.089s&lt;/td&gt;
&lt;td&gt;100MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decode&lt;/td&gt;
&lt;td&gt;zstd -d&lt;/td&gt;
&lt;td&gt;0.125s&lt;/td&gt;
&lt;td&gt;100MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;STRIDE encode: 28% less wall time than zstd -1 (about 1.39x the throughput)&lt;br&gt;
STRIDE decode: 29% less wall time than zstd -d (about 1.40x the throughput)&lt;/p&gt;
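&lt;p&gt;A speedup can be quoted as time saved or as extra throughput, and the two give different percentages for the same pair of timings. A minimal sketch using the encode and decode times from the table above (the function names are illustrative):&lt;/p&gt;

```python
def time_saved_pct(baseline_s, candidate_s):
    """Percent of the baseline wall time that the candidate avoids."""
    return (baseline_s - candidate_s) / baseline_s * 100


def throughput_gain_pct(baseline_s, candidate_s):
    """Percent more data processed per second than the baseline, same input."""
    return (baseline_s / candidate_s - 1) * 100


# Timings from the table: (zstd seconds, STRIDE seconds)
timings = {"encode": (0.240, 0.173), "decode": (0.125, 0.089)}

for name, (zstd_s, stride_s) in timings.items():
    saved = time_saved_pct(zstd_s, stride_s)
    gain = throughput_gain_pct(zstd_s, stride_s)
    print(f"{name}: {saved:.1f}% less time, {gain:.1f}% more throughput")
```

&lt;p&gt;For encode this gives 27.9% less time and 38.7% more throughput; for decode, 28.8% and 40.4%. Whichever convention you pick, use the same one for both rows.&lt;/p&gt;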

&lt;p&gt;Trade-off: STRIDE does not compress. Use zstd for compression. Use STRIDE for deterministic container I/O.&lt;/p&gt;

&lt;p&gt;Proof with SHA256 verification: &lt;a href="https://github.com/yasha1971-coder/glyph-v8/blob/main/proof/enwik8_benchmark.txt" rel="noopener noreferrer"&gt;proof/enwik8_benchmark.txt&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;V1 benchmark proof: &lt;a href="https://github.com/yasha1971-coder/glyph-v8/blob/main/proof/v1_benchmark.txt" rel="noopener noreferrer"&gt;proof/v1_benchmark.txt&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Before → After&lt;/h2&gt;

&lt;p&gt;Before (glyph-v8, 3 months abandoned):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Experimental L0-index with minimizer indexing&lt;/li&gt;
&lt;li&gt;No documentation, no architecture, no clear purpose&lt;/li&gt;
&lt;li&gt;Code sitting unused on an OVH server&lt;/li&gt;
&lt;li&gt;hit_rate 87.6% on old version, 99.8% on new, but no one knew&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After (STRIDE v0):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full CLI with 10 commands&lt;/li&gt;
&lt;li&gt;Deterministic corpus analysis on any binary data&lt;/li&gt;
&lt;li&gt;Real benchmark on enwik8 100MB with SHA256-verified proof&lt;/li&gt;
&lt;li&gt;&lt;code&gt;stride/&lt;/code&gt; package installable via &lt;code&gt;pip install -e .&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Structured container format (STRIDE01 magic, chunked layout)&lt;/li&gt;
&lt;li&gt;Cross-platform: Linux + OVH EPYC verified&lt;/li&gt;
&lt;li&gt;GitHub Actions CI: tests pass on every push&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RAW CORPUS
    ↓
STRIDE Container (.stridebin)
  [MAGIC: STRIDE01][corpus_size][chunk_size][data...]
    ↓
Analysis Layer:
  container-bytefreq     → byte frequency histogram
  container-hotspots     → entropy per chunk
  container-fingerprint  → 128-value MinHash
  container-headersketch → 64-slot structural sketch
    ↓
Model Output (model.json):
  timestamp_field → Delta coding
  status_field    → Dictionary coding
  id_field        → Rice coding
    ↓
STRIDE v1 ✅: container-write (575 MB/s) + container-decode (1,053 MB/s)
container-compare --fast → HeaderSketch similarity in 7s (vs 150s full mode)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
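&lt;p&gt;The container layout above can be sketched in a few lines. This is an assumption-heavy illustration, not the actual .stridebin format: the diagram only gives the field order, so the 8-byte little-endian integer widths, the default chunk size, and the function names below are all hypothetical.&lt;/p&gt;

```python
MAGIC = b"STRIDE01"
# Assumed widths: the diagram names the fields but not their sizes.
HEADER_SIZE = len(MAGIC) + 8 + 8


def pack_container(corpus, chunk_size=64 * 1024):
    """Prefix the raw corpus with magic, corpus_size and chunk_size."""
    header = (
        MAGIC
        + len(corpus).to_bytes(8, "little")
        + chunk_size.to_bytes(8, "little")
    )
    return header + corpus


def unpack_container(blob):
    """Validate the header and return (corpus_size, chunk_size, data)."""
    if blob[:len(MAGIC)] != MAGIC:
        raise ValueError("not a STRIDE01 container")
    corpus_size = int.from_bytes(blob[8:16], "little")
    chunk_size = int.from_bytes(blob[16:24], "little")
    data = blob[HEADER_SIZE:HEADER_SIZE + corpus_size]
    if len(data) != corpus_size:
        raise ValueError("truncated container")
    return corpus_size, chunk_size, data


blob = pack_container(b"hello field-aware world", chunk_size=8)
print(unpack_container(blob))
```

&lt;p&gt;Storing the sizes up front is what lets a reader reject a truncated file and jump straight to any chunk without scanning.&lt;/p&gt;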

&lt;h2&gt;What Makes STRIDE Different&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;grep&lt;/th&gt;
&lt;th&gt;zstd&lt;/th&gt;
&lt;th&gt;Elasticsearch&lt;/th&gt;
&lt;th&gt;STRIDE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Field-aware&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-field entropy model&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic output&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema-aware analysis&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;partial&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SHA256-verified proof&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Honest Benchmark Status&lt;/h2&gt;

&lt;p&gt;STRIDE v0 is a corpus analyzer, not a codec. It does not yet produce compressed output.&lt;/p&gt;

&lt;p&gt;STRIDE v1 shipped. Encoder: 575 MB/s. Decoder: 1,053 MB/s. Round-trip MD5-verified on enwik8 100MB.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5jbga0oqc2cfwiquhcz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5jbga0oqc2cfwiquhcz.png" alt="Entropy Heatmap" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Red = high entropy (hard to compress) | Yellow = moderate | Each cell = 64KB chunk of enwik8&lt;/em&gt;&lt;/p&gt;
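&lt;p&gt;The per-chunk entropy behind a map like this can be sketched as follows. This is a minimal illustration, assuming order-0 Shannon entropy over byte values and 64 KB = 65,536 bytes; the function names are mine, not STRIDE's API.&lt;/p&gt;

```python
import math
from collections import Counter


def shannon_entropy(chunk):
    """Order-0 Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(chunk).values()
    )


def hotspots(data, chunk_size=64 * 1024, top=3):
    """Return the `top` highest-entropy chunks as (chunk_index, entropy) pairs."""
    scored = [
        (index, shannon_entropy(data[start:start + chunk_size]))
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Synthetic input: one constant chunk followed by one uniformly mixed chunk.
sample = b"a" * 65536 + bytes(range(256)) * 256
print(hotspots(sample, top=2))
```

&lt;p&gt;With 65,536-byte chunks, a 100,000,000-byte corpus splits into 1,526 chunks, which matches the chunk count in the timing table.&lt;/p&gt;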

&lt;p&gt;Theoretical compression gains (6-8x vs zstd on integer-heavy data) are derived from the entropy models STRIDE builds — not from measured compression results.&lt;/p&gt;

&lt;p&gt;This is intentional. STRIDE v0 establishes the measurement foundation. STRIDE v1 builds on it.&lt;/p&gt;

&lt;h2&gt;How GitHub Copilot Helped&lt;/h2&gt;

&lt;p&gt;The original glyph-v8 was a pile of experimental scripts with no coherent design. Copilot helped:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reconstruct the project from scattered OVH files&lt;/li&gt;
&lt;li&gt;Design the StrideContainer format and reader&lt;/li&gt;
&lt;li&gt;Build the CLI dispatch architecture (argparse + subcommands)&lt;/li&gt;
&lt;li&gt;Implement all five analysis modules&lt;/li&gt;
&lt;li&gt;Write the benchmark pipeline with SHA256 verification&lt;/li&gt;
&lt;li&gt;Structure this submission&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without Copilot, closing the gap between “abandoned prototype” and “installable system with proof” would have taken weeks. It took days.&lt;/p&gt;

&lt;h2&gt;Project Family&lt;/h2&gt;

&lt;p&gt;STRIDE is the third primitive in a deterministic systems family:&lt;/p&gt;

&lt;p&gt;ACEAPEX — parallel LZ77 decode&lt;br&gt;
9,903 MB/s on EPYC 9575F (64 cores). 2.5x faster than zstd. Merged into lzbench.&lt;/p&gt;

&lt;p&gt;GLYPH — byte-exact substring retrieval&lt;br&gt;
6,888x faster than grep on repeated queries. 1,138 organic git clones in 14 days with zero promotion.&lt;/p&gt;

&lt;p&gt;STRIDE — field-aware integer analysis&lt;br&gt;
Profiles binary protocol data. Builds per-field entropy models. Foundation for a codec that knows what zstd doesn’t.&lt;/p&gt;

&lt;p&gt;Same philosophy across all three: deterministic, exact, measurable.&lt;/p&gt;

&lt;h2&gt;What’s Next&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Full benchmark suite vs zstd, LZ4, Brotli&lt;/li&gt;
&lt;li&gt;Protobuf schema-aware field extraction&lt;/li&gt;
&lt;li&gt;MessagePack and Thrift adapters&lt;/li&gt;
&lt;li&gt;Publish as standalone Python package on PyPI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Inspired by Perelman’s geometrization — the idea that complex structures simplify under the right flow. Every project in this family is an attempt to find that flow.&lt;/p&gt;

</description>
      <category>githubfinishupathon</category>
      <category>devchallenge</category>
      <category>githubchallenge</category>
    </item>
    <item>
      <title>I built a retrieval engine that answers in 0.017ms where grep takes 115ms.</title>
      <dc:creator>contour</dc:creator>
      <pubDate>Sat, 16 May 2026 23:33:31 +0000</pubDate>
      <link>https://dev.to/yasha1971coder/description-deterministic-byte-exact-retrieval-over-static-corpora-4793</link>
      <guid>https://dev.to/yasha1971coder/description-deterministic-byte-exact-retrieval-over-static-corpora-4793</guid>
      <description>&lt;h1&gt;
  
  
  I built a deterministic byte-exact retrieval engine. Here’s what I learned about correctness the hard way.
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Not a search engine. Not a vector DB. Not a grep replacement. Something else.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Last year I started building something I couldn’t find anywhere else: a retrieval system that makes a hard guarantee.&lt;/p&gt;

&lt;p&gt;Not “probably found it.” Not “semantically similar.” Not “ranked by relevance.”&lt;/p&gt;

&lt;p&gt;Just: &lt;strong&gt;these exact bytes exist at these exact offsets. Every time. Same query, same result. No exceptions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The project is called GLYPH. It’s built on suffix array + BWT + FM-index over raw bytes. It’s experimental. It has known limitations. And building it taught me more about correctness than anything I’ve worked on before.&lt;/p&gt;

&lt;p&gt;This is the story of what went wrong, what I fixed, and what “determin...&lt;/p&gt;

&lt;h1&gt;
  
  
  I built a retrieval engine that makes one hard guarantee: same bytes, same result, every time.
&lt;/h1&gt;

&lt;p&gt;No ranking. No embeddings. No “probably found it.”&lt;/p&gt;

&lt;p&gt;Just: &lt;strong&gt;these exact bytes exist at these exact offsets.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;The bug that taught me the most: FM-index counts were wrong on HDFS 1GB. SA correct. BWT correct. C-table correct. The culprit was one missing byte — the terminal sentinel wasn’t physically appended to the corpus, only accounted for symbolically. Off by one byte. Wrong counts.&lt;/p&gt;

&lt;p&gt;Fix: append a real &lt;code&gt;0x00&lt;/code&gt;. Verify against Python oracle. Formalize as an invariant. Write a regression test.&lt;/p&gt;

&lt;p&gt;That shift — from “fixed a bug” to “formalized a contract” — changed how I think about correctness entirely.&lt;/p&gt;
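&lt;p&gt;Here is a toy version of that contract, as a sketch rather than GLYPH's implementation. It assumes the corpus itself contains no &lt;code&gt;0x00&lt;/code&gt; byte, uses a naive suffix sort and linear-scan rank that only suit tiny inputs, and all function names are illustrative.&lt;/p&gt;

```python
from collections import Counter


def build_fm_index(corpus):
    """Build (bwt, c_table) over corpus plus a physically appended 0x00 sentinel."""
    text = corpus + b"\x00"                  # the sentinel is a real byte
    order = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = bytes(text[i - 1] for i in order)  # index -1 wraps to the sentinel
    counts = Counter(text)
    c_table, running = {}, 0
    for symbol in sorted(counts):            # c_table[s]: symbols sorting before s
        c_table[symbol] = running
        running += counts[symbol]
    return bwt, c_table


def fm_count(bwt, c_table, pattern):
    """Count occurrences of pattern using FM-index backward search."""
    lo, hi = 0, len(bwt)
    for symbol in reversed(pattern):
        if symbol not in c_table:
            return 0
        lo = c_table[symbol] + bwt[:lo].count(symbol)  # rank via linear scan
        hi = c_table[symbol] + bwt[:hi].count(symbol)
    return hi - lo


def oracle_count(corpus, pattern):
    """Brute-force reference: overlapping occurrences of pattern in corpus."""
    return sum(1 for i in range(len(corpus)) if corpus.startswith(pattern, i))


corpus = b"abracadabra"
bwt, c_table = build_fm_index(corpus)
assert len(bwt) == len(corpus) + 1           # invariant: sentinel really present
for pattern in (b"abra", b"a", b"bra", b"cab", b"x"):
    assert fm_count(bwt, c_table, pattern) == oracle_count(corpus, pattern)
print(fm_count(bwt, c_table, b"abra"))
```

&lt;p&gt;The length assertion is the invariant and the loop against &lt;code&gt;oracle_count&lt;/code&gt; is the regression test. On this corpus the count for &lt;code&gt;abra&lt;/code&gt; is 2.&lt;/p&gt;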




&lt;p&gt;&lt;strong&gt;Benchmark reality, honestly:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grep 1GB scan:          11.5 sec
GLYPH persistent FM:    0.0167 ms/query  ← index in RAM
GLYPH verified CLI:     ~19 ms/query     ← subprocess + integrity check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two different systems. Most benchmarks show only the fast number. Both matter.&lt;/p&gt;

&lt;p&gt;RAM cost: 9.4GB for 1GB corpus. Not hiding it. Compressed SA is next.&lt;/p&gt;




&lt;p&gt;This isn’t a vector DB killer. It’s a verification layer beneath probabilistic systems — for when you need to know if a chunk was actually in the source, not just semantically similar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/yasha1971-coder/glyph-engine
cd glyph-engine
./examples/mini/build_mini.sh
&lt;span class="c"&gt;# count: 2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apache-2.0. Experimental. Critique welcome, especially on RAM economics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://glyph.rs" rel="noopener noreferrer"&gt;glyph.rs&lt;/a&gt; · &lt;a href="mailto:contact@glyph.rs"&gt;contact@glyph.rs&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;code&gt;#systems&lt;/code&gt; &lt;code&gt;#retrieval&lt;/code&gt; &lt;code&gt;#infrastructure&lt;/code&gt; &lt;code&gt;#cpp&lt;/code&gt; &lt;code&gt;#algorithms&lt;/code&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>showdev</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
