I store LLM training data. Every tool I found either compresses it or encrypts it — nothing did both. So I built QUANTUM-PULSE.
## The pipeline

```
payload → MsgPack → Zstd-L22 + corpus dict → AES-256-GCM → SHA3-256 Merkle
```
## Step 1: MsgPack over JSON

Before compression, MsgPack shrinks the payload by ~22%:

```python
import msgpack

raw = msgpack.packb(payload, use_bin_type=True)
# ~22% smaller than json.dumps().encode() — better input = better downstream ratio
```
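A quick round trip shows the packing is lossless; the record below is a made-up stand-in for a real training sample, and the exact savings depend on your data:

```python
import json
import msgpack

# Hypothetical record shaped like an LLM training sample
payload = {"prompt": "Translate to French", "completion": "bonjour", "tokens": list(range(50))}

raw = msgpack.packb(payload, use_bin_type=True)  # binary MsgPack encoding
as_json = json.dumps(payload).encode()           # the JSON baseline

assert msgpack.unpackb(raw, raw=False) == payload  # lossless round trip
print(f"msgpack: {len(raw)} B, json: {len(as_json)} B")
```

Integer-heavy fields like token lists are where MsgPack wins most: small ints pack into one byte each instead of multi-character decimal text.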
## Step 2: The dictionary insight

Standard Zstd builds a probability model from scratch every time. For training records sharing the same schema, this is wasted work.

Train once:

```python
import zstandard as zstd

dict_data = zstd.train_dictionary(131072, corpus_samples[:200])
cctx = zstd.ZstdCompressor(level=22, dict_data=dict_data)
compressed = cctx.compress(raw)
```
Result: 28.46× with dict vs 14.64× vanilla — +94.4% improvement, 29% faster.
The dictionary retrains automatically every 24h via APScheduler as new data arrives.
## Step 3: AES-256-GCM with per-blob HKDF keys

One passphrase → master key → unique key per blob:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Unique key per blob — one compromise reveals nothing about others
blob_key = HKDF(
    algorithm=hashes.SHA256(), length=32,
    salt=blob_salt, info=pulse_id.encode(),
).derive(master_key)

nonce = os.urandom(12)  # fresh per seal
ciphertext = AESGCM(blob_key).encrypt(nonce, compressed, None)
```
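Unsealing works because HKDF is deterministic: re-deriving with the stored salt and pulse id reproduces the blob key. A runnable round-trip sketch, using a random stand-in for the passphrase-derived master key (names like `pulse-0001` are hypothetical):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_blob_key(master_key: bytes, blob_salt: bytes, pulse_id: str) -> bytes:
    # HKDF is deterministic: same key + salt + info always yields the same 32 bytes
    return HKDF(
        algorithm=hashes.SHA256(), length=32,
        salt=blob_salt, info=pulse_id.encode(),
    ).derive(master_key)

master_key = os.urandom(32)                         # stand-in for the real master key
blob_salt, pulse_id = os.urandom(16), "pulse-0001"  # stored alongside the blob

nonce = os.urandom(12)
key = derive_blob_key(master_key, blob_salt, pulse_id)
ct = AESGCM(key).encrypt(nonce, b"compressed-bytes", None)

# Unseal: re-derive the key from the stored salt and pulse id, then decrypt.
# GCM authenticates the ciphertext, so tampering raises InvalidTag here.
pt = AESGCM(derive_blob_key(master_key, blob_salt, pulse_id)).decrypt(nonce, ct, None)
assert pt == b"compressed-bytes"
```

Only the salt, nonce, and pulse id need to be stored with the ciphertext; none of them are secret.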
## Step 4: SHA3-256 Merkle tree
Every unseal verifies a Merkle proof before returning any data. Silent corruption — bit rot, tampered storage, partial writes — is caught cryptographically, not by hoping checksums match.
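The post doesn't show the tree itself, so here is a minimal SHA3-256 Merkle root/proof sketch (my own illustration, not QUANTUM-PULSE's actual implementation) to make the verification step concrete:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def _pad(level):
    # Duplicate the last node so every level pairs up cleanly
    return level + [level[-1]] if len(level) % 2 else level

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _pad(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    level, proof = [h(leaf) for leaf in leaves], []
    while len(level) > 1:
        level = _pad(level)
        # Record the sibling hash and whether our node sits on the left
        proof.append((level[index ^ 1], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, node_is_left in proof:
        node = h(node + sibling) if node_is_left else h(sibling + node)
    return node == root

blobs = [b"blob-%d" % i for i in range(5)]
root = merkle_root(blobs)
assert verify(blobs[3], merkle_proof(blobs, 3), root)        # intact blob passes
assert not verify(b"tampered", merkle_proof(blobs, 3), root)  # any bit flip fails
```

A proof is only log2(n) hashes, so verifying one blob never requires rehashing the whole store.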
## Benchmark results
| Algorithm | Ratio | Time | Enc | Integrity |
|---|---|---|---|---|
| snappy | 12× | 1.3ms | ✗ | ✗ |
| gzip-9 | 62× | 9.9ms | ✗ | ✗ |
| zstd-L3 | 76× | 1.6ms | ✗ | ✗ |
| QUANTUM-PULSE | 95× | 590ms | ✓ | ✓ |
| zstd-L22 | 99× | 1745ms | ✗ | ✗ |
| brotli-11 | 112× | 1441ms | ✗ | ✗ |
QUANTUM-PULSE is the only option with both encryption and integrity — and it's 3× faster than vanilla zstd-L22.
## Honest limitations
- No formal third-party crypto audit yet (private reporting in SECURITY.md)
- PBKDF2-SHA256 over Argon2 — Argon2 planned for v1.1
- MongoDB-first — S3/GCS backends on the roadmap
## Try it

```shell
git clone https://github.com/Naveenub/quantum-pulse
cp .env.example .env   # set QUANTUM_PASSPHRASE
docker-compose up -d

qp seal dataset.json --tag version=v1
qp unseal <pulse-id>

python scripts/benchmark_demo.py   # reproduce the numbers
```
MIT license. 277 tests.
→ https://github.com/Naveenub/quantum-pulse
