If you’ve ever tried to version a 10GB video, a folder of Unreal assets, or a dataset that mutates daily, you’ve probably had the same experience:
Git is incredible… right up until it isn’t.
And then the workflow becomes some combination of:
- “Just upload it to Drive”
- “Zip it and put it in Slack”
- “Name it final_final_v12_really_final.zip”
- “Please don’t touch that folder, it’s fragile”
- “We’ll figure out versioning later” (Narrator: they never do)
DITS is my attempt to attack that problem at the root.
Repo: https://github.com/byronwade/dits
Concept site: https://dits.byronwade.com
This post is long because the problem is not “a missing feature.”
It’s a missing primitive.
Why I Started Thinking About This
Git changed the world because it gave us a new way to think:
- history is a first-class citizen
- collaboration is distributed
- storage is optimized around changes, not copies
- identity is based on content, not filenames
But Git assumes the thing you’re versioning is text-like:
- line-based diffs
- merge-friendly changes
- small-ish files
- frequent small commits
That’s not how video or game assets behave.
A one-second trim in a video editor can rewrite massive spans of bytes.
A small change in a 3D scene can reshuffle asset metadata.
A “minor” export can produce a completely different binary layout.
So when people say “just use Git LFS,” it’s usually because they’ve never tried to scale a workflow where the primary artifacts are large binaries.
Git LFS can help with storage location.
But it doesn’t solve the deeper issues:
- the network cost of change
- the lack of meaningful diffs
- the inefficiency of re-uploading big blobs
- the inability to treat big assets as collaborative, composable history
I wanted something that preserves Git’s spirit while being honest about binary reality.
The Core Question
What is Git’s real magic?
Most people think it’s version history or branches.
I think Git’s core magic is this:
You don’t store files.
You store content, and you store it in a way that makes “change” cheap.
A commit in Git is essentially:
- a snapshot pointer to a tree of objects
- built from content-addressed data
- where unchanged content is reused automatically
That model is insanely powerful.
So the real question became:
Can we build that same model for large binary assets — where change is cheap — even if the file itself is hostile to diffing?
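To make that concrete, here is a minimal sketch of content addressing in Python (illustrative only, not Git’s actual object format): identity comes from the hash of the bytes, so identical content is stored once and reused for free.

```python
import hashlib

# Minimal sketch of content addressing (not Git's actual object format):
# the key for a piece of data is the hash of its bytes, so identical
# content is stored exactly once and reused automatically.
store: dict[str, bytes] = {}

def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    store.setdefault(key, data)  # already present? nothing new to store
    return key

# Two versions that share an unchanged asset reference the same object.
a = put(b"unchanged asset bytes")
b = put(b"unchanged asset bytes")
assert a == b and len(store) == 1
```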
What “Cheap Change” Means for Binary Media
For text:
- a few lines change → store a small diff → ship a small diff
For video / binaries:
- a small edit can cause huge byte-level changes → naive diff becomes meaningless
So we need a different strategy.
The strategy I’m exploring with DITS is:
Content-defined chunking
Instead of treating a file as one giant blob, you split it into chunks.
But importantly: not fixed-size chunks.
Fixed-size chunking is fragile: insert data at the front and everything shifts → every chunk changes → dedupe fails.
Content-defined chunking (like FastCDC and related approaches) anchors chunk boundaries based on the content stream itself, so edits tend to affect local regions rather than shifting the whole file.
That gives you two huge wins:
- Dedup across versions (reuse unchanged chunks)
- Dedup across projects (identical chunks used in different files become shared)
If you’ve ever wondered why Git feels “magical” even when you branch like a raccoon on espresso… this is basically why: reuse.
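To make the idea tangible, here is a toy content-defined chunker in the spirit of FastCDC (it is not the real algorithm, and every constant is illustrative): a rolling gear-style hash picks boundaries from the bytes themselves, with min/max sizes as guardrails, so an insertion near the front only disturbs nearby chunks.

```python
import hashlib, random

# Toy content-defined chunker in the spirit of FastCDC (not the real
# algorithm; every constant here is illustrative). A rolling "gear" hash
# picks boundaries from the bytes themselves, with min/max sizes as
# guardrails against pathological inputs.
random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]
MIN_SIZE, BOUNDARY_MASK, MAX_SIZE = 2048, (1 << 13) - 1, 65536  # ~8 KiB average

def chunk(data: bytes) -> list[bytes]:
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & BOUNDARY_MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def chunk_ids(data: bytes) -> set[str]:
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}

# Insert 4 bytes near the front: boundaries re-synchronize, so almost
# every chunk after the edit region keeps its identity and is reused.
v1 = random.randbytes(200_000)
v2 = v1[:1000] + b"EDIT" + v1[1000:]
print(f"reused {len(chunk_ids(v1) & chunk_ids(v2))} of {len(chunk_ids(v1))} chunks")
```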
The Mental Model of DITS
Here’s the version I keep in my head:
- A “file” is a manifest
- The manifest points to chunks
- Chunks are content-addressed (hash-identified)
- A “version” is a manifest + metadata + parent pointer(s)
- Syncing is: “do you already have these chunks? no? here are the missing ones.”
So rather than uploading a 20GB file every time, you’re uploading:
- some metadata
- and the subset of chunks that are actually new
This is the first principle I care about:
Bandwidth cost should scale with the edit, not the file size.
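Here is roughly what that mental model looks like as data (hypothetical names, not the actual DITS schema):

```python
from dataclasses import dataclass, field

# Illustrative data model only; the real DITS structures will differ.
@dataclass
class Manifest:
    chunk_ids: list[str]                      # ordered chunk hashes that rebuild the file
    metadata: dict = field(default_factory=dict)

@dataclass
class Version:
    manifest: Manifest
    parents: list[str]                        # parent version id(s) form the history graph
    message: str = ""

def chunks_to_send(local: Manifest, remote_has: set[str]) -> list[str]:
    """Sync is a set question: which of my chunks are you missing?"""
    return [cid for cid in local.chunk_ids if cid not in remote_has]
```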
Why This Is Bigger Than “Another Tool”
This isn’t just about “file storage.”
This is about building a system where you can:
- collaborate on large media assets
- branch and merge workflows
- maintain history with integrity
- cache and reuse data efficiently
- synchronize across machines without redundancy
If it works, it affects:
- indie game development
- film & editing pipelines
- ML datasets
- CAD and 3D assets
- and frankly… cloud providers who pay the tax for redundant upload/download
A lot of modern workflows are basically:
“just re-upload the world because versioning is annoying.”
That’s insane.
The Hard Parts (And Why They’re Interesting)
This is where the project gets real.
1) Chunking is not “just split it”
Chunking affects everything:
- dedup ratio
- reconstruction speed
- memory usage
- integrity verification
- parallelization
- latency
DITS needs chunking that is:
- stable across small edits
- fast enough to run on normal machines
- tunable for different file types
- safe against pathological cases
2) Delta compression vs chunk reuse
Chunk reuse gets you far — but not always.
Sometimes you have chunks that are similar but not identical.
Delta compression can reduce storage further, but then reconstruction gets heavier and you introduce new complexity:
- delta chains
- dependency graphs
- worst-case rebuild time
- corruption blast radius
A big part of the thought process is:
where to be aggressive vs where to stay simple.
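As a sketch of the tradeoff (the patch format here is a toy, not a real delta encoding), notice how reconstruction has to walk the chain; that walk is exactly where rebuild time and corruption blast radius come from.

```python
# Toy model of delta chains; the patch format (offset, replacement bytes) is
# purely illustrative, not a real delta encoding.
store: dict[str, tuple] = {}   # chunk_id -> ("full", data) or ("delta", base_id, patches)
MAX_CHAIN_DEPTH = 4            # caps rebuild time and corruption blast radius

def resolve(chunk_id: str, depth: int = 0) -> bytes:
    kind, *payload = store[chunk_id]
    if kind == "full":
        return payload[0]
    if depth >= MAX_CHAIN_DEPTH:
        raise RuntimeError("delta chain too deep; re-store this chunk as full")
    base_id, patches = payload
    data = bytearray(resolve(base_id, depth + 1))   # every level of the chain costs a rebuild
    for offset, replacement in patches:
        data[offset:offset + len(replacement)] = replacement
    return bytes(data)
```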
3) Reconstruction speed matters as much as storage
If you store a file as a thousand chunks, you have to rebuild it at checkout time.
That rebuild must be:
- fast
- parallelizable
- streamable
- resilient
A system that saves bandwidth but makes checkout take 5 minutes is just a new kind of pain.
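One plausible shape for this (a sketch, not the actual DITS checkout path): fetch chunks in parallel, but consume them in manifest order, so the output can be streamed while later chunks are still in flight.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence

# Sketch of one checkout strategy (not the actual DITS implementation):
# fetch chunks in parallel, but yield them in manifest order so the file
# can be streamed to disk while later chunks are still in flight.
def reconstruct(chunk_ids: Sequence[str],
                fetch: Callable[[str], bytes],
                workers: int = 8) -> Iterator[bytes]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # A real implementation would bound the number of in-flight fetches
        # and verify each chunk's hash before writing it out.
        futures = [pool.submit(fetch, cid) for cid in chunk_ids]
        for future in futures:                 # in-order consumption keeps it streamable
            yield future.result()

# with open("asset.mp4", "wb") as out:
#     for piece in reconstruct(manifest.chunk_ids, fetch=chunk_store.get):
#         out.write(piece)
```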
4) Metadata is the real “Git layer”
Git isn’t “files.” Git is metadata about files.
Same here.
DITS needs a robust model for:
- manifests
- versions
- history
- references
- integrity verification
- partial fetch (“give me only what I need to preview”)
This is where the system becomes a platform.
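Partial fetch is a good example of why this layer matters. If the manifest records chunk sizes, answering “which chunks cover this byte range?” is a cheap index lookup. Here is an illustrative sketch with a hypothetical helper:

```python
from bisect import bisect_right
from itertools import accumulate

# Illustrative helper: if the manifest records chunk sizes, "give me only
# what I need to preview bytes [start, end)" becomes a cheap index lookup.
def chunks_for_range(chunk_sizes: list[int], start: int, end: int) -> list[int]:
    offsets = [0] + list(accumulate(chunk_sizes))   # chunk i covers [offsets[i], offsets[i+1])
    first = bisect_right(offsets, start) - 1
    last = bisect_right(offsets, end - 1) - 1
    return list(range(first, last + 1))

# A 4 MiB preview window out of a 5 GiB file only touches four 1 MiB chunks.
print(chunks_for_range([1 << 20] * 5120, start=100 << 20, end=104 << 20))
```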
A Concrete Example (The “One-Second Edit” Problem)
Imagine a 5GB video file.
You open it, trim 1 second, export again.
Most systems treat this as:
- Old file: 5GB
- New file: 5GB
- Upload: 5GB again
- Storage: 10GB
But conceptually, the edit is tiny.
With chunk-based versioning:
- many chunks remain identical
- some chunks near the edit region change
- some metadata changes
Best-case outcome:
- you store and transmit a fraction of the file
Even if video codecs cause changes beyond the trimmed region, you still often get meaningful reuse depending on format and chunking strategy.
And even if you don’t get perfect reuse, the system gives you a framework to improve:
- file-type aware chunking
- codec-aware strategies
- optional delta layers
- smarter previews
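Some back-of-the-envelope numbers for the example above (the 3% churn figure is purely an assumption; real reuse depends on codec behavior and chunking strategy):

```python
# Back-of-the-envelope only; the 3% churn figure is an assumption and real
# reuse depends heavily on codec behavior and chunking strategy.
file_size = 5 * 1024**3                 # 5 GiB export
avg_chunk = 1 * 1024**2                 # ~1 MiB average chunk
total_chunks = file_size // avg_chunk   # 5,120 chunks
changed = int(total_chunks * 0.03)      # assume ~3% of chunks churn near the edit
print(f"upload ~{changed * avg_chunk / 1024**2:.0f} MiB instead of {file_size / 1024**2:.0f} MiB")
# upload ~153 MiB instead of 5120 MiB
```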
Speculative Uploads (Because Humans Don’t Wait)
One idea I keep coming back to is speculative transport:
When someone is editing a file, we already know something:
- there will be a “next version”
- much of the content stream can be chunked as it’s being produced
- we can start uploading chunks before the final render completes
This is the same kind of thinking that makes modern web apps feel instant:
- pipeline the work
- overlap compute and network
- reduce perceived latency
For large media pipelines, “upload time” is part of the workflow tax.
If DITS can shrink that tax by shipping incrementally, it changes how collaboration feels.
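Here is a sketch of what that could look like (placeholder names, not a real DITS API): hash and ship chunks as the export produces them, and only commit the version metadata once the render finishes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Sketch of speculative transport (placeholder names, not a real DITS API):
# hash and ship chunks as the export produces them, so network time overlaps
# render time; the version's metadata is only committed once the render ends.
def speculative_push(chunks_in_progress: Iterable[bytes],
                     upload: Callable[[str, bytes], None],
                     already_remote: set[str],
                     workers: int = 4) -> list[str]:
    manifest: list[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for data in chunks_in_progress:          # arrives while the render is still running
            cid = hashlib.sha256(data).hexdigest()
            manifest.append(cid)
            if cid not in already_remote:
                pool.submit(upload, cid, data)   # retries/error handling omitted
                already_remote.add(cid)
    return manifest                              # ready to commit when the export finishes
```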
What DITS Is Not Trying to Be (Yet)
This part matters because it’s easy to assume too much.
DITS is not (right now):
- a polished UI
- a full asset manager
- a replacement for every DAM / MAM
- a Dropbox competitor
It’s closer to:
- a content-addressed engine
- a history graph
- a transport + dedup primitive
- a “Git-like substrate” for binary assets
The UI layer can come later.
The primitive must be right first.
The Shape of the Tooling (What I Imagine Using)
The end-user experience I want is boring in the best way:
- dits init — initialize a workspace
- dits add <path> — chunk, hash, and track assets
- dits commit -m "trim intro" — create a version snapshot
- dits push — sync missing chunks + metadata
- dits pull — fetch what you don’t have
- dits checkout <version> — reconstruct the assets
Under the hood it’s complicated.
On the surface it should feel as natural as Git.
Because that’s how you know you’ve built a good abstraction:
the complexity is there, but the workflow is simple.
The Goal: Make “History” Normal for Binary Work
The biggest cultural shift Git gave developers was:
history isn’t optional
Binary workflows still live in a world where history is either:
- manual (“v7_final”)
- expensive (duplicating huge files)
- fragile (ad hoc scripts)
- centralized (vendor lock-in)
If DITS can make “binary history” normal — cheap, verifiable, collaborative — it becomes a missing piece in modern production pipelines.
Where This Goes Next
Right now DITS is a public build-in-the-open project:
- ideas → prototypes → measurements → iteration
The near-term goals are:
- prove chunking + manifests behave as expected
- prove sync efficiency
- prove reconstruction speed
- test on real media workloads (video, 3D assets, etc.)
- validate the graph model for versions and references
Longer term:
- remote chunk stores
- distributed caching
- integrity + verification tooling
- smarter diff/preview layers
- “DITSHub” style compute services for heavy reconstruction workflows (optional)
But the foundation remains the same:
make change cheap.
If You Want to Help (Or Tear It Apart)
I genuinely want:
- skepticism
- edge cases
- “this will fail because X”
- “have you considered Y”
- “here’s a paper to read”
Because this is the kind of idea that only survives if it’s attacked early.
Repo: https://github.com/byronwade/dits
Site: https://dits.byronwade.com
If you’ve dealt with:
- video pipelines
- game asset workflows
- ML dataset versioning
- distributed storage / dedup systems
…tell me what would make this actually useful in your world.
And if you’re still naming files final_final_v12.zip, you’re among friends here.