What if you could take any file — a photo, a database dump, a movie — and split it into two parts where neither part is useful on its own? Not encryption. Not compression. Just a clean cut.
That's bitsplit. It's pure Python, zero dependencies, and the entire restore operation is a single line:
restored = (data << count) | indices
The idea
Treat the whole file as one giant integer. Slice the top 128 bits off the front. Those 128 bits become your key (a short text string). Everything else becomes your block (a binary file).
File (bytes) --> Number --> [ data: 128 bits | indices: the rest ]
                                   |                  |
                                key file          data file
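To restore: shift the key left by the number of block bits, OR it with the block, and write the result out as bytes. Done. (In the one-liner at the top, data is the key, count is the shift, and indices is the block.)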
photo.jpg  -->  data.bin  +  key.txt
1.05 MB         1.05 MB      102 B
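Why does this work? Because the 128 missing bits sit at the most significant positions of the number. Without them, the block is a number whose top is unknown, and there are 2^128 possible tops (~3.4 × 10^38). If those bits are unpredictable, brute-forcing them takes longer than the age of the universe. That "if" matters: the top 128 bits are roughly the first 16 bytes of the file, and many formats begin with fixed, well-known headers, so an attacker may be able to guess most of them without searching at all.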
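To make the arithmetic concrete, here is a toy version of the split on a three-byte value. It uses an 8-bit key instead of 128 purely for readability; `KEY_BITS` and the sample bytes are illustrative assumptions, not values from the library.

```python
# Toy split: 8-bit "key" instead of 128, so the numbers stay readable.
KEY_BITS = 8  # assumption for illustration; the article uses 128

n = int.from_bytes(b"\xab\xcd\xef", "big")  # the "file" as one integer: 0xABCDEF
shift = n.bit_length() - KEY_BITS           # 24 - 8 = 16 bits stay in the block
key = n >> shift                            # top 8 bits: 0xAB
block = n & ((1 << shift) - 1)              # low 16 bits: 0xCDEF

# Restore with the same shift-and-OR used in the article's one-liner.
restored = (key << shift) | block
assert restored == n

print(hex(key), hex(block))
print(2**128)  # number of candidate tops when the key is 128 bits
```

With a real 128-bit key the structure is identical; only the width of the slice changes.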
Try it
pip install bitsplit
bitsplit encode photo.jpg
# -> photo.jpg.dat + photo.jpg.key
bitsplit decode restored.jpg
# -> restored.jpg
Or from Python:
from bitsplit import encode, decode
block, key = encode(open("photo.jpg", "rb").read())
# key looks like: "340079864808174098294188674279182237768:8843264:1105424"
restored = decode(block, key)
The key has three parts: the 128-bit number, the bit shift count, and the original byte size. That's all you need to reconstruct the file.
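As a rough sketch of what those three fields carry, the snippet below parses the example key string shown above. `ParsedKey` and `parse_key` are hypothetical helpers written for this illustration; they are not part of the bitsplit API.

```python
from typing import NamedTuple


class ParsedKey(NamedTuple):
    """Hypothetical container for the three colon-separated key fields."""
    top_bits: int  # the most significant bits of the file, as an integer
    shift: int     # how many bits remain in the block
    size: int      # original file size in bytes


def parse_key(key_str: str) -> ParsedKey:
    """Split 'number:shift:size' into integers (illustrative helper)."""
    top_bits, shift, size = map(int, key_str.split(":"))
    return ParsedKey(top_bits, shift, size)


k = parse_key("340079864808174098294188674279182237768:8843264:1105424")

print(k.top_bits.bit_length())                     # width of the key in bits
print(k.shift // 8)                                # block length in bytes
print(k.top_bits.to_bytes(16, "big")[:2].hex())    # first two bytes of the file
```

For this key the fields are consistent with each other: 128 key bits plus 8,843,264 block bits equals 1,105,424 × 8. The last line prints `ffd8`, the JPEG start-of-image marker, which shows that here the key is simply the file's first 16 bytes written as a decimal number.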
Where it's actually useful
This isn't a replacement for AES. It's a different tool for a different shape of problem: you want one piece of data to be useless without another, and you want to control where each piece lives.
- Split storage — block in S3, key on your laptop. A bucket leak reveals nothing.
- Two-channel transfer — block over Telegram, key over SMS. Intercepting one channel is worthless.
- Offline backups — drive in a drawer, key on paper in a safe.
- Shared access — Alice holds the key, Bob holds the block. Both required.
- CI/CD secrets — commit the block, store the key in env vars.
- Geo-distribution — block in eu-west, key in us-east. Single-region breach, no data.
Performance
Two bitwise ops, no rounds, no block processing. On an Apple M2:
| File size | bitsplit | OpenSSL AES-256 | GPG AES-256 | 7-Zip AES-256 |
|---|---|---|---|---|
| 100 MB | 0.13 s | 0.64 s | 2.43 s | 4.86 s |
| 1 GB | 1.45 s | 5.11 s | 3.58 s | 3.16 s |
| 5 GB | 15.6 s | 58.8 s | 148.5 s | 372.2 s |
Output size equals input size — no overhead. Streaming I/O keeps memory flat at ~20 MB regardless of file size. All files restored with identical SHA-256 checksums.
What it is NOT
I want to be loud about this, because it matters:
bitsplit is not encryption.
No ciphers. No rounds. No key derivation. No authentication. No padding. No tamper detection.
If you need compliance, audits, or signatures — use AES-GCM or ChaCha20-Poly1305. Those exist for a reason.
bitsplit is a different primitive. Think of it as tearing a document in half, not locking it in a safe. The 128-bit key makes brute-force infeasible, but an attacker who can flip bits in the block can corrupt your data, and nothing in the format will flag it: decoding still succeeds and simply returns different bytes.
For a lot of real-world use cases — split storage, two-channel transfer, offline backup — that's exactly what you want. For others, it's not enough. Pick the right tool.
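If silent corruption is a concern, one option is to keep a SHA-256 digest of the original next to the key and compare it after decoding. This is a minimal sketch under that assumption: the helper names are made up for illustration, and bitsplit's key format as described here has no such field.

```python
import hashlib
import hmac


def digest_of(data: bytes) -> str:
    """Hex SHA-256 of the original bytes, to be stored alongside the key."""
    return hashlib.sha256(data).hexdigest()


def matches(restored: bytes, expected_hex: str) -> bool:
    """True if the restored bytes hash to the digest recorded at split time."""
    return hmac.compare_digest(digest_of(restored), expected_hex)


original = b"example payload"
expected = digest_of(original)  # record this when you split the file

# Simulate a single flipped bit somewhere in the restored data.
tampered = bytes([original[0] ^ 0x01]) + original[1:]

print(matches(original, expected))  # True
print(matches(tampered, expected))  # False
```

Because the digest travels with the key rather than with the block, someone who can only modify the block cannot update it to match. This detects changes; it does not turn the scheme into authenticated encryption.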
The whole library
The core is essentially this:
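def encode(data: bytes) -> tuple[bytes, str]:
    n = int.from_bytes(data, "big")        # whole input as one big-endian integer
    bits = n.bit_length()                  # significant bits (leading zeros excluded)
    key_bits = min(128, bits)              # short inputs fit entirely in the key
    shift = bits - key_bits                # number of bits left for the block
    key = n >> shift                       # top key_bits bits
    block = n & ((1 << shift) - 1)         # everything below the key
    block_bytes = block.to_bytes((shift + 7) // 8, "big")
    return block_bytes, f"{key}:{shift}:{len(data)}"

def decode(block: bytes, key_str: str) -> bytes:
    key, shift, size = map(int, key_str.split(":"))
    n = (key << shift) | int.from_bytes(block, "big")
    return n.to_bytes(size, "big")         # size restores any leading zero bytes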
That's the whole idea. Everything else is CLI, file handling, and streaming for huge files.
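Since the edge cases are where bugs tend to hide, here is a small round-trip check over a few awkward inputs. It repeats the `encode` and `decode` functions from the listing above so that it runs on its own; the choice of test inputs is mine, not taken from the project's test suite.

```python
import os


def encode(data: bytes) -> tuple[bytes, str]:
    n = int.from_bytes(data, "big")
    bits = n.bit_length()
    key_bits = min(128, bits)
    shift = bits - key_bits
    key = n >> shift
    block = n & ((1 << shift) - 1)
    block_bytes = block.to_bytes((shift + 7) // 8, "big")
    return block_bytes, f"{key}:{shift}:{len(data)}"


def decode(block: bytes, key_str: str) -> bytes:
    key, shift, size = map(int, key_str.split(":"))
    n = (key << shift) | int.from_bytes(block, "big")
    return n.to_bytes(size, "big")


cases = [
    b"",                       # empty input
    b"\x00",                   # a single zero byte
    b"\x00\x00\xff",           # leading zero bytes
    b"\xff" * 16,              # exactly 128 bits
    b"\xff" * 17,              # one byte more than the key holds
    b"\x01" + b"\x00" * 19,    # bit length not a multiple of 8
    os.urandom(1000),          # random data
]

for data in cases:
    block, key = encode(data)
    assert decode(block, key) == data

# Inputs of 16 bytes or fewer leave nothing for the block.
short_block, short_key = encode(b"short")
print(len(short_block))  # 0
```

The last check is worth knowing about: for inputs of 16 bytes or fewer, the block is empty and the key holds the entire content.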
Try it, break it, tell me what's wrong
Repo: github.com/frolpaxa/bitsplit
Issues, PRs, and "actually you're wrong because…" comments very welcome. The math is simple enough that bugs hide in the I/O and edge cases, not the algorithm — exactly the kind of thing more eyes help with.