Marius-Florin Cristian

Originally published at keibisoft.com

Make git clone work between encrypted P2P FUSE peers (7 bugs)

Part of the KEIBIDROP development blog. KEIBIDROP is in active development. Release is coming soon.

KEIBIDROP mounts a virtual FUSE filesystem for each peer. When Peer A creates or modifies files, the changes sync to Peer B's mount in real time over encrypted gRPC. We had already made git work on a single FUSE mount (previous post). The open question was: what happens when Peer A clones a repo into their mount and Peer B tries to use it through theirs?

Bob (FUSE mount)                     Alice (FUSE mount)
+-- .git/                            +-- .git/          <- synced from Bob
|   +-- HEAD                         |   +-- HEAD
|   +-- config                       |   +-- config
|   +-- objects/pack/                |   +-- objects/pack/
|   |   +-- pack-abc123.pack (2MB)   |   |   +-- pack-abc123.pack
|   +-- refs/heads/main              |   +-- refs/heads/main
+-- go.mod                           +-- go.mod
+-- README.md                        +-- README.md

Seven bugs appeared across three debugging sessions.

Bug 1: The rename race

Git uses atomic writes: create config.lock, write content, close, rename to config. KEIBIDROP sends ADD_FILE on close and RENAME_FILE on rename. The prefetch goroutine would download config.lock, finish before the RENAME_FILE arrived, and skip the disk rename because the in-memory path hadn't changed yet.

Fix: the RENAME_FILE handler now does os.Rename on disk before updating maps. The prefetch cleanup becomes a redundant safety net.
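
The ordering in that fix can be sketched as follows. This is a minimal illustration, not KEIBIDROP's actual handler: `fileIndex`, `handleRename`, and `demo` are hypothetical names, and the single `known` map stands in for whatever in-memory maps the real code keeps.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// fileIndex is a hypothetical stand-in for the peer's in-memory path maps.
type fileIndex struct {
	mu    sync.Mutex
	root  string          // local directory that backs the mount
	known map[string]bool // relative paths this peer knows about
}

// handleRename applies a RENAME_FILE notification: disk first, maps second.
func (ix *fileIndex) handleRename(oldPath, newPath string) error {
	ix.mu.Lock()
	defer ix.mu.Unlock()

	src := filepath.Join(ix.root, oldPath)
	dst := filepath.Join(ix.root, newPath)

	// If a prefetch already wrote src, move it now. A missing source only
	// means the download has not landed yet, so it is not an error here.
	if err := os.Rename(src, dst); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("rename %s -> %s: %w", oldPath, newPath, err)
	}

	delete(ix.known, oldPath)
	ix.known[newPath] = true
	return nil
}

// demo simulates a finished prefetch of config.lock followed by the rename
// notification, and reports whether config ended up on disk and in the map.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "rename-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	lock := filepath.Join(root, "config.lock")
	if err := os.WriteFile(lock, []byte("[core]\n"), 0o644); err != nil {
		return false, err
	}
	ix := &fileIndex{root: root, known: map[string]bool{"config.lock": true}}
	if err := ix.handleRename("config.lock", "config"); err != nil {
		return false, err
	}
	_, err = os.Stat(filepath.Join(root, "config"))
	return err == nil && ix.known["config"] && !ix.known["config.lock"], nil
}

func main() {
	ok, err := demo()
	fmt.Println(ok, err)
}
```

Because the handler moves the file itself, it no longer matters whether the prefetch finished before or after the notification arrived.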

Bug 2: Pack file corruption (20 missing bytes)

Alice's copy of the pack file was 2,182,186 bytes. Bob's original was 2,182,206 bytes, 20 bytes more. Git's index-pack appends a 20-byte SHA-1 checksum after the file is closed but before the rename. KEIBIDROP sent the ADD_FILE notification at close time with the pre-checksum size, so Alice fetched a file without its trailer.

Fix: after renaming on disk, compare local file size with the size in the RENAME_FILE notification. If they differ, trigger a re-download.
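
A minimal sketch of that size check, assuming the rename notification carries the final size as an integer. `needsRedownload` and `demo` are hypothetical names, and the 1000-byte body is made up for the demo; only the 20-byte trailer comes from the bug above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// needsRedownload reports whether the file at path has a different size
// than the one announced in the RENAME_FILE notification.
func needsRedownload(path string, announcedSize int64) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	return info.Size() != announcedSize, nil
}

// demo writes a pack body without its 20-byte trailer, then checks it
// against the announced final size and against its own size.
func demo() (mismatch, match bool, err error) {
	dir, err := os.MkdirTemp("", "size-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "pack-abc123.pack")
	body := make([]byte, 1000) // what the peer fetched at close time
	if err := os.WriteFile(path, body, 0o644); err != nil {
		return false, false, err
	}

	// The sender's file grew by 20 bytes after close: sizes differ.
	mismatch, err = needsRedownload(path, int64(len(body))+20)
	if err != nil {
		return false, false, err
	}
	// Same size announced: no re-download needed.
	again, err := needsRedownload(path, int64(len(body)))
	return mismatch, !again, err
}

func main() {
	fmt.Println(demo())
}
```

A size comparison only catches changes that alter the length. Comparing a content hash would be stricter, at the cost of reading the file again.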

Bug 3: macFUSE cache poisoning

git status showed deleted: go.mod even though the file existed on disk. macFUSE's negative_vncache option caches ENOENT results. macOS probed Alice's mount (Spotlight, fsevents) before the file arrived, the kernel cached "doesn't exist", and subsequent reads never reached our FUSE handler.

Fix: remove negative_vncache from mount options.

Bug 4: The 612-file hang

Cloning a large repo (612 files) hung after checkout completed. Every FUSE Release sent a gRPC notification. 612 files closing in rapid succession overwhelmed the connection.

Three-part fix:

  1. BatchNotify RPC -- batch 64 notifications into a single gRPC call
  2. Per-path debounce (200ms deadline, reset on each update) on the sender side
  3. Prefetch semaphore (max 8 concurrent) on the receiver side

rpc BatchNotify (BatchNotifyRequest) returns (BatchNotifyResponse);

message BatchNotifyRequest {
  repeated NotifyRequest notifications = 1;
  uint64 seq = 2;
  uint64 timestamp = 3;
}
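
The sender side of that batching can be sketched with a bounded channel and a single worker. Everything here is illustrative: `notification`, `runBatcher`, and `demo` are hypothetical names, the queue capacity of 128 and the `maxWait` flush interval are assumptions, and in this demo the producer simply blocks when the queue is full, which may differ from how the real sender handles a full queue.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// notification is a hypothetical stand-in for the NotifyRequest message.
type notification struct{ Path string }

// runBatcher drains in and calls send with up to batchSize items at a time.
// A partial batch is flushed when maxWait elapses or when in is closed.
func runBatcher(in <-chan notification, batchSize int, maxWait time.Duration, send func([]notification)) {
	ticker := time.NewTicker(maxWait)
	defer ticker.Stop()

	batch := make([]notification, 0, batchSize)
	flush := func() {
		if len(batch) == 0 {
			return
		}
		send(batch) // one BatchNotify call in the real system
		batch = make([]notification, 0, batchSize)
	}

	for {
		select {
		case n, ok := <-in:
			if !ok {
				flush()
				return
			}
			batch = append(batch, n)
			if len(batch) == batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

// demo pushes 612 notifications through a bounded queue and returns the
// size of each batch handed to send.
func demo() []int {
	queue := make(chan notification, 128) // bounded queue (capacity assumed)
	var sizes []int
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// maxWait is set very high so only full batches and the final
		// flush on close occur, which keeps the demo deterministic.
		runBatcher(queue, 64, time.Hour, func(b []notification) {
			sizes = append(sizes, len(b))
		})
	}()
	for i := 0; i < 612; i++ {
		queue <- notification{Path: fmt.Sprintf("file-%d", i)}
	}
	close(queue)
	wg.Wait()
	return sizes
}

func main() {
	fmt.Println(demo())
}
```

With a batch size of 64, the 612 closes collapse into ten calls instead of 612.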

Bug 5: 190,000 log lines

After fixing the notification flood, the clone still hung for 30 seconds. Turns out we had 207,000 getattr calls and 190,000 ENOENT error logs, all written synchronously. macOS probes hundreds of paths per directory (Spotlight, fsevents, .DS_Store, resource forks) and without negative_vncache every probe hits our handler.

Fix: remove trace logs from hot-path handlers. Stop logging ENOENT as an error.

grep -c "FUSE getattr" Log_Bob.txt      # 207,137
grep -c "Failed to lstat" Log_Bob.txt   # 190,614
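
One way to express "stop logging ENOENT as an error" is a small predicate that the getattr path consults before writing anything. This is a sketch under assumptions: `worthLogging` and `demo` are hypothetical names, and the real handler may classify errors differently.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// worthLogging reports whether an lstat error deserves an error-level log.
// A missing file is an expected answer to a probe, so it stays silent.
func worthLogging(err error) bool {
	return err != nil && !errors.Is(err, fs.ErrNotExist)
}

// demo probes a path that does not exist, the way macOS probes .DS_Store,
// and compares it with a permission error.
func demo() (missingLogged, permLogged bool, err error) {
	dir, err := os.MkdirTemp("", "enoent-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	_, statErr := os.Lstat(filepath.Join(dir, ".DS_Store"))
	if statErr == nil {
		return false, false, errors.New("expected the probe to miss")
	}
	return worthLogging(statErr), worthLogging(fs.ErrPermission), nil
}

func main() {
	fmt.Println(demo())
}
```

The handler still returns ENOENT to the kernel. Only the log line goes away.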

Bug 6: Hook permission denied

The clone failed with fatal: cannot exec '.git/hooks/post-checkout': Operation not permitted. macOS Gatekeeper blocks executing scripts from FUSE mounts.

Fix: add defer_permissions to mount options.

Bug 7: LFS corruption (intermediate sizes + stale notifications)

Git LFS downloads a 420MB XML file incrementally into .git/lfs/incomplete/<hash>. Each intermediate close triggers ADD_FILE with the current size. The peer starts prefetching at 100MB, restarts at 200MB, restarts at 300MB. Then LFS renames incomplete/<hash> to objects/<sha>/<hash>, but the debounced ADD_FILE for the old path fires after the rename, pointing to a file that no longer exists.

Fix: per-path debounce with RENAME retargeting. When a RENAME arrives, the pending ADD_FILE for the old path gets its path rewritten to the new path instead of being deleted or sent stale.

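case bindings.NotifyType_RENAME_FILE:
    // A debounced ADD_FILE may still be waiting under the old path.
    if old, exists := pending[req.OldPath]; exists {
        delete(pending, req.OldPath)
        old.req.Path = req.Path // retarget to the new path
        pending[req.Path] = old
    }
    // The rename itself is queued for immediate delivery.
    immediate = append(immediate, req)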

We tried three approaches: deleting the pending notification (broke content delivery), sending it stale (wrong path), retargeting (correct).
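
For readers who want to run the retargeting idea in isolation, here is a self-contained sketch. It is not the KEIBIDROP implementation: `debouncer`, `pendingAdd`, and `demo` are hypothetical, time is passed in explicitly instead of using timers so the behavior is deterministic, and the paths in the demo are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pendingAdd is a hypothetical debounced ADD_FILE waiting for its deadline.
type pendingAdd struct {
	Path     string
	Size     int64
	Deadline time.Time
}

// debouncer keeps at most one pending ADD_FILE per path.
type debouncer struct {
	delay   time.Duration
	pending map[string]*pendingAdd
}

func newDebouncer(delay time.Duration) *debouncer {
	return &debouncer{delay: delay, pending: make(map[string]*pendingAdd)}
}

// add records or replaces the ADD_FILE for path and resets its deadline.
func (d *debouncer) add(path string, size int64, now time.Time) {
	d.pending[path] = &pendingAdd{Path: path, Size: size, Deadline: now.Add(d.delay)}
}

// rename retargets a pending ADD_FILE from oldPath to newPath, so it is
// neither dropped nor sent with a path that no longer exists.
func (d *debouncer) rename(oldPath, newPath string) {
	if p, ok := d.pending[oldPath]; ok {
		delete(d.pending, oldPath)
		p.Path = newPath
		d.pending[newPath] = p
	}
}

// due removes and returns every pending ADD_FILE whose deadline has passed.
func (d *debouncer) due(now time.Time) []pendingAdd {
	var out []pendingAdd
	for path, p := range d.pending {
		if !now.Before(p.Deadline) {
			out = append(out, *p)
			delete(d.pending, path)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

// demo replays the LFS pattern: several intermediate closes, then a rename,
// all inside the debounce window.
func demo() []pendingAdd {
	const mib = int64(1) << 20
	d := newDebouncer(200 * time.Millisecond)
	t0 := time.Unix(0, 0)

	for i, size := range []int64{100 * mib, 200 * mib, 300 * mib, 420 * mib} {
		d.add("incomplete/abc", size, t0.Add(time.Duration(i)*50*time.Millisecond))
	}
	d.rename("incomplete/abc", "objects/ab/abc")
	return d.due(t0.Add(time.Second))
}

func main() {
	fmt.Printf("%+v\n", demo())
}
```

The four intermediate closes and the rename leave exactly one notification to send, carrying the final size and the final path.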

The result

After all fixes, the full git workflow works between encrypted P2P FUSE peers:

# Bob clones a repo
$ cd MountBob && git clone git@github.com:user/repo.git
Cloning into 'repo'... done.

# Alice sees it immediately
$ cd MountAlice/repo && git status
On branch main -- nothing to commit, working tree clean

# Alice creates a branch and commits
$ git checkout -b test_me
$ echo "test" > tst.txt && git add tst.txt && git commit -m "Commit for test"

# Bob sees the branch and commit
$ cd MountBob/repo && git log
commit 38e23bc (HEAD -> test_me)
    Commit for test

Tested with a small repo (24 objects, 2MB, 8 files) and a large repo (2339 objects, 257MB + 3.71 GiB LFS, 612 files).

What we learned

  1. FUSE handlers must rename on disk, not just in memory. Goroutines race with notification handlers, especially for tiny files that download in milliseconds.
  2. Git writes data, closes the file, then renames. Between close and rename, other processes can modify the file (index-pack appending checksums). The rename notification has the correct size. Use it.
  3. macFUSE's negative_vncache silently caches ENOENT results. Our handler was never called. These bugs don't show up in FUSE logs because the kernel short-circuits the call.
  4. 612 files = 612 goroutines = 612 gRPC streams = connection collapse. A bounded channel + batching worker gives backpressure without blocking the caller.
  5. Roughly 400,000 synchronous log writes (207,000 getattr traces plus 190,000 ENOENT errors) caused a 30-second hang. The logging seemed harmless with 20 files. 612 files combined with macOS probing turned it into a bottleneck.
  6. For debounce + rename: you can't delete the pending notification (peer never gets content) and you can't send it stale (wrong path). Retarget it to the new path. It took three attempts to get this right.
  7. None of these bugs are git-specific. Any application that creates files through one peer's mount and reads them through another would hit the same problems. Git just exercises every edge case.
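
The receiver-side half of lesson 4, the cap of 8 concurrent prefetches from Bug 4, can be sketched with a buffered channel used as a counting semaphore. `prefetchAll` and `demo` are hypothetical names, and the sleeping `fetch` function stands in for a real download.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// prefetchAll runs fetch for every path with at most limit calls in flight.
// It returns the highest concurrency observed and the number of completions.
func prefetchAll(paths []string, limit int, fetch func(string)) (peak, done int) {
	sem := make(chan struct{}, limit) // one slot per allowed prefetch
	var wg sync.WaitGroup
	var mu sync.Mutex
	inFlight := 0

	for _, p := range paths {
		sem <- struct{}{} // blocks while limit prefetches are running
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when finished

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			fetch(p)

			mu.Lock()
			inFlight--
			done++
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return peak, done
}

// demo prefetches 64 placeholder paths with a limit of 8.
func demo() (peak, done int) {
	paths := make([]string, 64)
	for i := range paths {
		paths[i] = fmt.Sprintf("file-%d", i)
	}
	return prefetchAll(paths, 8, func(string) {
		time.Sleep(time.Millisecond) // stand-in for a download
	})
}

func main() {
	fmt.Println(demo())
}
```

The loop that launches prefetches blocks once all slots are taken, so a burst of notifications waits in line instead of opening hundreds of downloads at once.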

More from the KEIBIDROP blog: full series | product page | FAQ
