If you’ve ever tried to version a 10GB video, a folder of Unreal assets, or a dataset that mutates daily, you’ve probably had the same experience:
Git is incredible… right up until it isn’t.
And then the workflow becomes some combination of:
- “Just upload it to Drive”
- “Zip it and put it in Slack”
- “Name it final_final_v12_really_final.zip”
- “Please don’t touch that folder, it’s fragile”
- “We’ll figure out versioning later” (Narrator: they never do)
DITS is my attempt to attack that problem at the root.
Repo: https://github.com/byronwade/dits
Concept site: https://dits.byronwade.com
This post is long because the problem is not “a missing feature.”
It’s a missing primitive.
Why I Started Thinking About This
Git changed the world because it gave us a new way to think:
- history is a first-class citizen
- collaboration is distributed
- storage is optimized around changes, not copies
- identity is based on content, not filenames
But Git assumes the thing you’re versioning is text-like:
- line-based diffs
- merge-friendly changes
- small-ish files
- frequent small commits
That’s not how video or game assets behave.
A one-second trim in a video editor can rewrite massive spans of bytes.
A small change in a 3D scene can reshuffle asset metadata.
A “minor” export can produce a completely different binary layout.
So when people say “just use Git LFS,” it’s usually because they’ve never tried to scale a workflow where the primary artifacts are large binaries.
Git LFS can help with storage location.
But it doesn’t solve the deeper issues:
- the network cost of change
- the lack of meaningful diffs
- the inefficiency of re-uploading big blobs
- the inability to treat big assets as collaborative, composable history
I wanted something that preserves Git’s spirit while being honest about binary reality.
The Core Question
What is Git’s real magic?
Most people think it’s version history or branches.
I think Git’s core magic is this:
You don’t store files.
You store content, and you store it in a way that makes “change” cheap.
A commit in Git is essentially:
- a snapshot pointer to a tree of objects
- built from content-addressed data
- where unchanged content is reused automatically
That model is insanely powerful.
So the real question became:
Can we build that same model for large binary assets — where change is cheap — even if the file itself is hostile to diffing?
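To make that concrete, here is a minimal sketch of content addressing in Python (illustrative only, not Git’s actual object format): identity comes from the hash of the bytes, so identical content is stored once and reused for free.

```python
import hashlib

# Minimal sketch of content addressing (not Git's actual object format):
# the key for a piece of data is the hash of its bytes, so identical
# content is stored exactly once and reused automatically.
store: dict[str, bytes] = {}

def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    store.setdefault(key, data)  # already present? nothing new to store
    return key

# Two versions that share an unchanged asset reference the same object.
a = put(b"unchanged asset bytes")
b = put(b"unchanged asset bytes")
assert a == b and len(store) == 1
```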
What “Cheap Change” Means for Binary Media
For text:
- a few lines change → store a small diff → ship a small diff
For video / binaries:
- a small edit can cause huge byte-level changes → naive diff becomes meaningless
So we need a different strategy.
The strategy I’m exploring with DITS is:
Content-defined chunking
Instead of treating a file as one giant blob, you split it into chunks.
But importantly: not fixed-size chunks.
Fixed-size chunking is fragile: insert data at the front and everything shifts → every chunk changes → dedupe fails.
Content-defined chunking (like FastCDC and related approaches) anchors chunk boundaries based on the content stream itself, so edits tend to affect local regions rather than shifting the whole file.
That gives you two huge wins:
- Dedup across versions (reuse unchanged chunks)
- Dedup across projects (identical chunks used in different files become shared)
If you’ve ever wondered why Git feels “magical” even when you branch like a raccoon on espresso… this is basically why: reuse.
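To make the idea tangible, here is a toy content-defined chunker in the spirit of FastCDC (it is not the real algorithm, and every constant is illustrative): a rolling gear-style hash picks boundaries from the bytes themselves, with min/max sizes as guardrails, so an insertion near the front only disturbs nearby chunks.

```python
import hashlib, random

# Toy content-defined chunker in the spirit of FastCDC (not the real
# algorithm; every constant here is illustrative). A rolling "gear" hash
# picks boundaries from the bytes themselves, with min/max sizes as
# guardrails against pathological inputs.
random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]
MIN_SIZE, BOUNDARY_MASK, MAX_SIZE = 2048, (1 << 13) - 1, 65536  # ~8 KiB average

def chunk(data: bytes) -> list[bytes]:
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & BOUNDARY_MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def chunk_ids(data: bytes) -> set[str]:
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}

# Insert 4 bytes near the front: boundaries re-synchronize, so almost
# every chunk after the edit region keeps its identity and is reused.
v1 = random.randbytes(200_000)
v2 = v1[:1000] + b"EDIT" + v1[1000:]
print(f"reused {len(chunk_ids(v1) & chunk_ids(v2))} of {len(chunk_ids(v1))} chunks")
```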
The Mental Model of DITS
Here’s the version I keep in my head:
- A “file” is a manifest
- The manifest points to chunks
- Chunks are content-addressed (hash-identified)
- A “version” is a manifest + metadata + parent pointer(s)
- Syncing is: “do you already have these chunks? no? here are the missing ones.”
So rather than uploading a 20GB file every time, you’re uploading:
- some metadata
- and the subset of chunks that are actually new
This is the first principle I care about:
Bandwidth cost should scale with the edit, not the file size.
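Here is roughly what that mental model looks like as data (hypothetical names, not the actual DITS schema):

```python
from dataclasses import dataclass, field

# Illustrative data model only; the real DITS structures will differ.
@dataclass
class Manifest:
    chunk_ids: list[str]                      # ordered chunk hashes that rebuild the file
    metadata: dict = field(default_factory=dict)

@dataclass
class Version:
    manifest: Manifest
    parents: list[str]                        # parent version id(s) form the history graph
    message: str = ""

def chunks_to_send(local: Manifest, remote_has: set[str]) -> list[str]:
    """Sync is a set question: which of my chunks are you missing?"""
    return [cid for cid in local.chunk_ids if cid not in remote_has]
```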
Why This Is Bigger Than “Another Tool”
This isn’t just about “file storage.”
This is about building a system where you can:
- collaborate on large media assets
- branch and merge workflows
- maintain history with integrity
- cache and reuse data efficiently
- synchronize across machines without redundancy
If it works, it affects:
- indie game development
- film & editing pipelines
- ML datasets
- CAD and 3D assets
- and frankly… cloud providers who pay the tax for redundant upload/download
A lot of modern workflows are basically:
“just re-upload the world because versioning is annoying.”
That’s insane.
The Hard Parts (And Why They’re Interesting)
This is where the project gets real.
1) Chunking is not “just split it”
Chunking affects everything:
- dedup ratio
- reconstruction speed
- memory usage
- integrity verification
- parallelization
- latency
DITS needs chunking that is:
- stable across small edits
- fast enough to run on normal machines
- tunable for different file types
- safe against pathological cases
2) Delta compression vs chunk reuse
Chunk reuse gets you far — but not always.
Sometimes you have chunks that are similar but not identical.
Delta compression can reduce storage further, but then reconstruction gets heavier and you introduce new complexity:
- delta chains
- dependency graphs
- worst-case rebuild time
- corruption blast radius
A big part of the thought process is:
where to be aggressive vs where to stay simple.
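As a sketch of the tradeoff (the patch format here is a toy, not a real delta encoding), notice how reconstruction has to walk the chain; that walk is exactly where rebuild time and corruption blast radius come from.

```python
# Toy model of delta chains; the patch format (offset, replacement bytes) is
# purely illustrative, not a real delta encoding.
store: dict[str, tuple] = {}   # chunk_id -> ("full", data) or ("delta", base_id, patches)
MAX_CHAIN_DEPTH = 4            # caps rebuild time and corruption blast radius

def resolve(chunk_id: str, depth: int = 0) -> bytes:
    kind, *payload = store[chunk_id]
    if kind == "full":
        return payload[0]
    if depth >= MAX_CHAIN_DEPTH:
        raise RuntimeError("delta chain too deep; re-store this chunk as full")
    base_id, patches = payload
    data = bytearray(resolve(base_id, depth + 1))   # every level of the chain costs a rebuild
    for offset, replacement in patches:
        data[offset:offset + len(replacement)] = replacement
    return bytes(data)
```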
3) Reconstruction speed matters as much as storage
If you store a file as a thousand chunks, you have to rebuild it at checkout time.
That rebuild must be:
- fast
- parallelizable
- streamable
- resilient
A system that saves bandwidth but makes checkout take 5 minutes is just a new kind of pain.
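One plausible shape for this (a sketch, not the actual DITS checkout path): fetch chunks in parallel, but consume them in manifest order, so the output can be streamed while later chunks are still in flight.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence

# Sketch of one checkout strategy (not the actual DITS implementation):
# fetch chunks in parallel, but yield them in manifest order so the file
# can be streamed to disk while later chunks are still in flight.
def reconstruct(chunk_ids: Sequence[str],
                fetch: Callable[[str], bytes],
                workers: int = 8) -> Iterator[bytes]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # A real implementation would bound the number of in-flight fetches
        # and verify each chunk's hash before writing it out.
        futures = [pool.submit(fetch, cid) for cid in chunk_ids]
        for future in futures:                 # in-order consumption keeps it streamable
            yield future.result()

# with open("asset.mp4", "wb") as out:
#     for piece in reconstruct(manifest.chunk_ids, fetch=chunk_store.get):
#         out.write(piece)
```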
4) Metadata is the real “Git layer”
Git isn’t “files.” Git is metadata about files.
Same here.
DITS needs a robust model for:
- manifests
- versions
- history
- references
- integrity verification
- partial fetch (“give me only what I need to preview”)
This is where the system becomes a platform.
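Partial fetch is a good example of why this layer matters. If the manifest records chunk sizes, answering “which chunks cover this byte range?” is a cheap index lookup. Here is an illustrative sketch with a hypothetical helper:

```python
from bisect import bisect_right
from itertools import accumulate

# Illustrative helper: if the manifest records chunk sizes, "give me only
# what I need to preview bytes [start, end)" becomes a cheap index lookup.
def chunks_for_range(chunk_sizes: list[int], start: int, end: int) -> list[int]:
    offsets = [0] + list(accumulate(chunk_sizes))   # chunk i covers [offsets[i], offsets[i+1])
    first = bisect_right(offsets, start) - 1
    last = bisect_right(offsets, end - 1) - 1
    return list(range(first, last + 1))

# A 4 MiB preview window out of a 5 GiB file only touches four 1 MiB chunks.
print(chunks_for_range([1 << 20] * 5120, start=100 << 20, end=104 << 20))
```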
A Concrete Example (The “One-Second Edit” Problem)
Imagine a 5GB video file.
You open it, trim 1 second, export again.
Most systems treat this as:
- Old file: 5GB
- New file: 5GB
- Upload: 5GB again
- Storage: 10GB
But conceptually, the edit is tiny.
With chunk-based versioning:
- many chunks remain identical
- some chunks near the edit region change
- some metadata changes
Best-case outcome:
- you store and transmit a fraction of the file
Even if video codecs cause changes beyond the trimmed region, you still often get meaningful reuse depending on format and chunking strategy.
And even if you don’t get perfect reuse, the system gives you a framework to improve:
- file-type aware chunking
- codec-aware strategies
- optional delta layers
- smarter previews
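Some back-of-the-envelope numbers for the example above (the 3% churn figure is purely an assumption; real reuse depends on codec behavior and chunking strategy):

```python
# Back-of-the-envelope only; the 3% churn figure is an assumption and real
# reuse depends heavily on codec behavior and chunking strategy.
file_size = 5 * 1024**3                 # 5 GiB export
avg_chunk = 1 * 1024**2                 # ~1 MiB average chunk
total_chunks = file_size // avg_chunk   # 5,120 chunks
changed = int(total_chunks * 0.03)      # assume ~3% of chunks churn near the edit
print(f"upload ~{changed * avg_chunk / 1024**2:.0f} MiB instead of {file_size / 1024**2:.0f} MiB")
# upload ~153 MiB instead of 5120 MiB
```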
Speculative Uploads (Because Humans Don’t Wait)
One idea I keep coming back to is speculative transport:
When someone is editing a file, we already know something:
- there will be a “next version”
- much of the content stream can be chunked as it’s being produced
- we can start uploading chunks before the final render completes
This is the same kind of thinking that makes modern web apps feel instant:
- pipeline the work
- overlap compute and network
- reduce perceived latency
For large media pipelines, “upload time” is part of the workflow tax.
If DITS can shrink that tax by shipping incrementally, it changes how collaboration feels.
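Here is a sketch of what that could look like (placeholder names, not a real DITS API): hash and ship chunks as the export produces them, and only commit the version metadata once the render finishes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Sketch of speculative transport (placeholder names, not a real DITS API):
# hash and ship chunks as the export produces them, so network time overlaps
# render time; the version's metadata is only committed once the render ends.
def speculative_push(chunks_in_progress: Iterable[bytes],
                     upload: Callable[[str, bytes], None],
                     already_remote: set[str],
                     workers: int = 4) -> list[str]:
    manifest: list[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for data in chunks_in_progress:          # arrives while the render is still running
            cid = hashlib.sha256(data).hexdigest()
            manifest.append(cid)
            if cid not in already_remote:
                pool.submit(upload, cid, data)   # retries/error handling omitted
                already_remote.add(cid)
    return manifest                              # ready to commit when the export finishes
```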
What DITS Is Not Trying to Be (Yet)
This part matters because it’s easy to assume too much.
DITS is not (right now):
- a polished UI
- a full asset manager
- a replacement for every DAM / MAM
- a Dropbox competitor
It’s closer to:
- a content-addressed engine
- a history graph
- a transport + dedup primitive
- a “Git-like substrate” for binary assets
The UI layer can come later.
The primitive must be right first.
The Shape of the Tooling (What I Imagine Using)
The end-user experience I want is boring in the best way:
- dits init — initialize a workspace
- dits add <path> — chunk, hash, and track assets
- dits commit -m "trim intro" — create a version snapshot
- dits push — sync missing chunks + metadata
- dits pull — fetch what you don’t have
- dits checkout <version> — reconstruct the assets
Under the hood it’s complicated.
On the surface it should feel as natural as Git.
Because that’s how you know you’ve built a good abstraction:
the complexity is there, but the workflow is simple.
The Goal: Make “History” Normal for Binary Work
The biggest cultural shift Git gave developers was:
history isn’t optional
Binary workflows still live in a world where history is either:
- manual (“v7_final”)
- expensive (duplicating huge files)
- fragile (ad hoc scripts)
- centralized (vendor lock-in)
If DITS can make “binary history” normal — cheap, verifiable, collaborative — it becomes a missing piece in modern production pipelines.
Where This Goes Next
Right now DITS is a public build-in-the-open project:
- ideas → prototypes → measurements → iteration
The near-term goals are:
- prove chunking + manifests behave as expected
- prove sync efficiency
- prove reconstruction speed
- test on real media workloads (video, 3D assets, etc.)
- validate the graph model for versions and references
Longer term:
- remote chunk stores
- distributed caching
- integrity + verification tooling
- smarter diff/preview layers
- “DITSHub” style compute services for heavy reconstruction workflows (optional)
But the foundation remains the same:
make change cheap.
If You Want to Help (Or Tear It Apart)
I genuinely want:
- skepticism
- edge cases
- “this will fail because X”
- “have you considered Y”
- “here’s a paper to read”
Because this is the kind of idea that only survives if it’s attacked early.
Repo: https://github.com/byronwade/dits
Site: https://dits.byronwade.com
If you’ve dealt with:
- video pipelines
- game asset workflows
- ML dataset versioning
- distributed storage / dedup systems
…tell me what would make this actually useful in your world.
And if you’re still naming files final_final_v12.zip, you’re among friends here.