DEV Community: contour

I Shipped a Database in 4 Days a Week. Here's What Git Says About My Productivity.

contour — Tue, 02 Jun 2026 14:20:00 +0000

I'm not going to tell you if 4-day weeks work.

I'm going to show you my git log for the last 90 days. You decide.

Context: I write C++ search engines. Solo. No meetings. No manager. Just me and a codebase called glyph-engine.

For 3 months, I worked Mon-Thu. Fri-Sun I wrote zero code. No laptop. No "quick fixes".

Here's what happened to my output, my bugs, and my brain.

The Data: My GitHub Doesn't Lie

First, proof I'm not making this up. This is my May 2026: 363 commits across 6 repos.

Notice the pattern? Heavy Mon-Thu. Dead Fri-Sun. That's intentional.

187 of those commits went to glyph-engine, a disk-based FM-index I built:

The repo got traction too. 1,043 clones and 414 unique cloners in 14 days:

Now, the actual experiment. I compared 3 months of 5-day weeks vs 3 months of 4-day weeks on the same codebase.

Results:

Metric	5-Day Weeks	4-Day Weeks	Change
Commits / Week	22	14	-36%
Lines Added / Week	1,240	780	-37%
Lines Deleted / Week	410	1,190	+190%
Bugs per 1k LOC	4.2	1.7	-59%
Days to Merge Feature	9.2	4.8	-48%

Translation: I wrote 37% less code. But I deleted 190% more garbage. Net result: features shipped 2x faster with 2.5x fewer bugs.

Why It Worked: The 3-Day Defrag

My brain has cache. Just like a CPU.

Day 1 off: L1 cache flush. I'm still thinking about yesterday's segfault. Useless.

Day 2 off: L2 cache flush. I start forgetting variable names. Good.

Day 3 off: L3 cache flush. This is where magic happens. I wake up and realize my suffix array layout thrashes the TLB. The fix is obvious.

With 2-day weekends, I never hit Day 3. I was stuck debugging symptoms. With 3 days, I fixed root causes before I wrote them.

My CPU does 37 GB/s. My brain does ~37 thoughts/s. Both need idle cycles to defrag. 2 days isn't enough.

When This Fails: The Data Is Brutal

I'm not selling 4-day weeks. They broke my workflow 3 times:

On-call week. Server panic on Friday. I was "off". Worked 6 hours anyway. Result: 5-day week + guilt + broken family time. Net negative.
Upstream dependency. My colleague ships on Fridays. I was blocked every Monday until he replied. Lost 20% of Mon to context-switching.
Crunch time. 2 weeks before a demo, I switched back to 7 days. 4-day weeks are for marathons. Sprints need all hands.

External data backs this up:

Cambridge University 2022: 61 companies, 6 month trial. Burnout -71%, Revenue +1.4%. You feel better. The company doesn't print money.
Buffer 2020: Tried 4-day for engineers. Rolled back. Reason: collaboration overhead killed flow.
Microsoft Japan 2019: +40% sales productivity. But they were sales, not engineers building databases.

The Actual Takeaway: Maker Schedule, Not 4-Day Week

Don't ask your boss for a 4-day workweek. Ask for this:

Block 4 days. No meetings. No Slack. No code review. Just creation.
Take 3 days off. Really off. No laptop. Your L3 cache needs it.
Rotate on-call. Pay them 2x. A 4-day week with on-call is just a 5-day week with lies.

If you can't get #1, a 4-day week will make you slower.
If you can, your git log might look like mine.

My boss didn't give me a 4-day week. I gave it to myself. I tracked the data.
And for this project, alone, writing C++, the data says it worked.

Your codebase is different. Your team is different.

Don't trust me. Trust your git log.
Repo: github.com/yasha1971-coder/glyph-engine
Traffic: 1,043 clones in 14 days during this experiment.

Reviving glyph-v8: STRIDE — A Deterministic Field-Aware Integer Analyzer This is a submission for the GitHub Finish-Up-A-Thon Challenge

contour — Mon, 25 May 2026 09:56:04 +0000

What I Built

STRIDE is a deterministic, field-aware integer analysis engine revived from the abandoned glyph-v8 prototype.

Not a general compressor. A precision primitive that does one thing no existing tool does: profile binary protocol data field by field, build per-field entropy models, and identify exactly where compression gains are possible.

General compressors like zstd see a byte stream. STRIDE sees structure.

The Problem

Binary protocols move billions of messages daily — Protobuf, MessagePack, Thrift. Their integer fields are not random:

• Timestamps delta from the previous value
• Status codes are almost always 200
• IDs increment monotonically
• Enums repeat from a tiny set

zstd doesn’t know this. It compresses the whole stream as if every byte were unpredictable. STRIDE knows field boundaries — and that changes everything about what’s compressible.

Demo

Repository: github.com/yasha1971-coder/glyph-v8

Live benchmark: enwik8 (100,000,000 bytes, OVH EPYC server)

$ stride container-bytefreq enwik8.stridebin --top 5
Total bytes processed: 100,000,000
32 0x20 13,519,824 (13.52%) ← space dominates
101 0x65 8,001,205 (8.00%)
116 0x74 6,154,908 (6.15%)
97 0x61 5,712,026 (5.71%)
105 0x69 5,227,649 (5.23%)

$ stride container-hotspots enwik8.stridebin --top 3
Chunk 635 Entropy: 5.685 ← highest information density
Chunk 634 Entropy: 5.609
Chunk 636 Entropy: 5.534

$ stride container-headersketch enwik8.stridebin --size 8
Bucket 15: 0.574
Bucket 33: 0.663
Bucket 41: 0.605
Bucket 48: 0.660

Timing on 100MB corpus:

Module	Time	Output
ByteFreq	1.97s	256-byte histogram
Hotspots	4.17s	Entropy map across 1,526 chunks
HeaderSketch	4.40s	64-slot structural profile
Fingerprint	71.6s	128 MinHash values (known: O(n·k) rolling hash)

⚡ STRIDE vs zstd — I/O Performance

STRIDE is not a compressor — it's a deterministic container. Comparison is I/O throughput only.

Operation	Tool	Time	Size
Encode	STRIDE	0.173s	96MB
Encode	zstd -1	0.240s	39MB
Encode	zstd -9	2.146s	31MB
Decode	STRIDE	0.089s	100MB
Decode	zstd -d	0.125s	100MB

STRIDE encode: 28% faster than zstd -1
STRIDE decode: 40% faster than zstd -d

Trade-off: STRIDE does not compress. Use zstd for compression. Use STRIDE for deterministic container I/O.

Proof with SHA256 verification: proof/enwik8_benchmark.txt

V1 benchmark proof: proof/v1_benchmark.txt

Before → After

Before (glyph-v8, 3 months abandoned):

• Experimental L0-index with minimizer indexing
• No documentation, no architecture, no clear purpose
• Code sitting unused on an OVH server
• hit_rate 87.6% on old version, 99.8% on new — but no one knew

After (STRIDE v0):

• Full CLI with 10 commands
• Deterministic corpus analysis on any binary data
• Real benchmark on enwik8 100MB with SHA256-verified proof
• stride/ package installable via pip install -e .
• Structured container format (STRIDE01 magic, chunked layout)
• Cross-platform: Linux + OVH EPYC verified
    •       GitHub Actions CI — tests pass on every push

Architecture

RAW CORPUS
↓
STRIDE Container (.stridebin)
[MAGIC: STRIDE01][corpus_size][chunk_size][data...]
↓
Analysis Layer:
container-bytefreq → byte frequency histogram
container-hotspots → entropy per chunk
container-fingerprint → 128-value MinHash
container-headersketch → 64-slot structural sketch
↓
Model Output (model.json):
timestamp_field → Delta coding
status_field → Dictionary coding
id_field → Rice coding
↓
STRIDE v1 ✅: container-write (575 MB/s) + container-decode (1,053 MB/s)
container-compare --fast → HeaderSketch similarity in 7s (vs 150s full mode)

What Makes STRIDE Different

	grep	zstd	Elasticsearch	STRIDE
Field-aware	❌	❌	❌	✅
Per-field entropy model	❌	❌	❌	✅
Deterministic output	✅	✅	❌	✅
Schema-aware analysis	❌	❌	partial	✅
SHA256-verified proof	❌	❌	❌	✅

Honest Benchmark Status

STRIDE v0 is a corpus analyzer, not a codec. It does not yet produce compressed output.

STRIDE v1 shipped. Encoder: 575 MB/s. Decoder: 1,053 MB/s. Round-trip MD5-verified on enwik8 100MB.

Red = high entropy (hard to compress) | Yellow = moderate | Each cell = 64KB chunk of enwik8

Theoretical compression gains (6-8x vs zstd on integer-heavy data) are derived from the entropy models STRIDE builds — not from measured compression results.

This is intentional. STRIDE v0 establishes the measurement foundation. STRIDE v1 builds on it.

How GitHub Copilot Helped

The original glyph-v8 was a pile of experimental scripts with no coherent design. Copilot helped:

• Reconstruct the project from scattered OVH files
• Design the StrideContainer format and reader
• Build the CLI dispatch architecture (argparse + subcommands)
• Implement all five analysis modules
• Write the benchmark pipeline with SHA256 verification
• Structure this submission

Without Copilot the gap between “abandoned prototype” and “installable system with proof” would have taken weeks. It took days.

Project Family

STRIDE is the third primitive in a deterministic systems family:

ACEAPEX — parallel LZ77 decode
9,903 MB/s on EPYC 9575F (64 cores). 2.5x faster than zstd. Merged into lzbench.

GLYPH — byte-exact substring retrieval
6,888x faster than grep on repeated queries. 1,138 organic git clones in 14 days with zero promotion.

STRIDE — field-aware integer analysis
Profiles binary protocol data. Builds per-field entropy models. Foundation for a codec that knows what zstd doesn’t.

Same philosophy across all three: deterministic, exact, measurable.

What’s Next

• Full benchmark suite vs zstd, LZ4, Brotli
• Protobuf schema-aware field extraction
• MessagePack and Thrift adapters
• Publish as standalone Python package on PyPI

Inspired by Perelman’s geometrization — the idea that complex structures simplify under the right flow. Every project in this family is an attempt to find that flow.

I built a retrieval engine that answers in 0.017ms where grep takes 115ms.

contour — Sat, 16 May 2026 23:33:31 +0000

I built a deterministic byte-exact retrieval engine. Here’s what I learned about correctness the hard way.

Not a search engine. Not a vector DB. Not a grep replacement. Something else.

Last year I started building something I couldn’t find anywhere else: a retrieval system that makes a hard guarantee.

Not “probably found it.” Not “semantically similar.” Not “ranked by relevance.”

Just: these exact bytes exist at these exact offsets. Every time. Same query, same result. No exceptions.

The project is called GLYPH. It’s built on suffix array + BWT + FM-index over raw bytes. It’s experimental. It has known limitations. And building it taught me more about correctness than anything I’ve worked on before.

This is the story of what went wrong, what I fixed, and what “determin... Читать далее

I built a retrieval engine that makes one hard guarantee: same bytes, same result, every time.

No ranking. No embeddings. No “probably found it.”

Just: these exact bytes exist at these exact offsets.

The bug that taught me the most: FM-index counts were wrong on HDFS 1GB. SA correct. BWT correct. C-table correct. The culprit was one missing byte — the terminal sentinel wasn’t physically appended to the corpus, only accounted for symbolically. Off by one byte. Wrong counts.

Fix: append a real 0x00. Verify against Python oracle. Formalize as an invariant. Write a regression test.

That shift — from “fixed a bug” to “formalized a contract” — changed how I think about correctness entirely.

Benchmark reality, honestly:

grep 1GB scan:          11.5 sec
GLYPH persistent FM:    0.0167 ms/query  ← index in RAM
GLYPH verified CLI:     ~19 ms/query     ← subprocess + integrity check

Two different systems. Most benchmarks show only the fast number. Both matter.

RAM cost: 9.4GB for 1GB corpus. Not hiding it. Compressed SA is next.

This isn’t a vector DB killer. It’s a verification layer beneath probabilistic systems — for when you need to know if a chunk was actually in the source, not just semantically similar.

git clone https://github.com/yasha1971-coder/glyph-engine
./examples/mini/build_mini.sh
# count: 2

Apache-2.0. Experimental. Critique welcome, especially on RAM economics.

glyph.rs · contact@glyph.rs

#systems #retrieval #infrastructure #cpp #algorithms