# Taking Nexio from Local-Only to Collaborative
## Introduction
In my previous posts, I covered how I built Nexio from scratch as a local VCS, then optimized storage with content-addressable blobs, and most recently migrated metadata from JSON to SQLite. Through all of that, Nexio had one glaring limitation printed right in the README: "No remote repository support."
A version control system that can't sync between machines isn't much more than a personal backup tool. In this post, I'll walk through how I added remote state management to Nexio using AWS S3, enabling push, pull, and clone operations: the three pillars of collaborative version control.
## The Goal
Three new commands, each addressing a core workflow:
| Command | Purpose |
|---|---|
| `nexio push` | Upload local commits and blobs to S3 |
| `nexio pull` | Download remote commits and blobs, merge into local |
| `nexio clone` | Create a full local copy from a remote S3 source |
The non-goals were equally important: no merge conflict resolution (push fails if remote is ahead), no multi-backend abstraction (S3 only for now), and no credential management (rely on the standard AWS credential chain).
## Architecture Decisions
### Why S3?
S3 is ubiquitous, cheap, and durable. Most developers already have AWS credentials configured, and S3's consistency model (strong read-after-write since December 2020) is more than sufficient here, since a lock serializes push/pull operations anyway. There's no need for a custom server: S3 acts as a dumb object store and Nexio handles all the logic client-side.
### Remote Storage Layout
The remote mirrors the local `.nexio/` structure inside an S3 prefix:

```
s3://my-bucket/nexio-repo/
├── index.db        # Full SQLite database
├── objects/        # Content-addressable blob store
│   ├── ab/
│   │   └── 3f7c9e2d...   # Compressed blob (identical to local)
│   └── ...
├── config.json     # Repository config
└── nexio.lock      # Lock file for push/pull serialization
```
### Why Upload the Full Database?
This was the biggest design decision. I had two options:
- **Row-level sync**: Track which commits/files/branches changed and sync individual rows
- **Full database upload**: Upload the entire `index.db` on push, download and merge on pull

I chose option 2. The SQLite database is small, typically under 1 MB even for repositories with thousands of commits. Uploading the whole thing avoids the complexity of tracking deltas, handling schema versions across remotes, and resolving partial sync failures. The trade-off is a bit more bandwidth, but for a file that's rarely more than a few hundred KB, that cost is negligible.
### URL Format
Remotes use the S3 URL scheme:

```
s3://bucket/prefix
```

For example: `s3://my-bucket/team/project-alpha`. The prefix acts as the repository root; all objects are stored under it, which means you can host multiple Nexio repositories in a single bucket.
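As a rough sketch of how such a URL might be split into its bucket and prefix parts (the function name `parseRemoteURL` is illustrative, not Nexio's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// parseRemoteURL splits an s3://bucket/prefix URL into its bucket and
// prefix parts. The prefix may be empty (repository at the bucket root).
func parseRemoteURL(raw string) (bucket, prefix string, err error) {
	const scheme = "s3://"
	if !strings.HasPrefix(raw, scheme) {
		return "", "", fmt.Errorf("unsupported remote URL: %s", raw)
	}
	rest := strings.TrimPrefix(raw, scheme)
	bucket, prefix, _ = strings.Cut(rest, "/")
	if bucket == "" {
		return "", "", fmt.Errorf("missing bucket in remote URL: %s", raw)
	}
	return bucket, strings.TrimSuffix(prefix, "/"), nil
}

func main() {
	b, p, err := parseRemoteURL("s3://my-bucket/team/project-alpha")
	fmt.Println(b, p, err) // my-bucket team/project-alpha <nil>
}
```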
## Remote Locking
Concurrent push/pull operations could corrupt the remote state. I needed a locking mechanism, and I had two realistic options:
| Option | Pros | Cons |
|---|---|---|
| DynamoDB | Strongly consistent, atomic | Extra AWS service, more config |
| Lock file in S3 | Simple, no extra dependencies | Not truly atomic (small race-condition window) |
I went with the S3 approach: a `nexio.lock` file stored at the remote prefix (not to be confused with AWS's S3 Object Lock feature). For a tool used by small teams, the tiny race-condition window is acceptable, and it avoids requiring DynamoDB setup.
The lock file contains:
```json
{
  "holder": "John Doe <john@example.com>",
  "timestamp": "2026-03-06T12:00:00Z",
  "operation": "push"
}
```
Three rules govern the lock:
- **Fresh lock exists** → abort with a message showing who holds it
- **Stale lock** (older than 5 minutes) → overwrite it, assuming the holder crashed
- **`--force` flag** → overwrite regardless
The lock is always released in a `defer`, so even panics or errors trigger cleanup.
## Push
The push algorithm is designed to minimize data transfer by leveraging the existing content-addressable blob store:
1. **Validate** → no uncommitted staged changes
2. **Lock** → acquire the remote lock
3. **Download** → fetch remote `index.db` (if it exists)
4. **Diff** → compare local vs. remote commit IDs
5. **Fast-forward** → verify remote HEAD is an ancestor of local HEAD
6. **Upload blobs** → only blobs that don't exist remotely (`HeadObject` check)
7. **Upload DB** → replace remote `index.db` with local
8. **Clean** → remove orphaned blobs locally
9. **Release lock**
The fast-forward check is critical. If the remote has commits that don't exist locally, someone else pushed changes that you haven't pulled yet. Rather than silently overwriting their work, push aborts with: "Remote has commits not present locally. Run `nexio pull` first."
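For a linear history, that ancestor check can be as simple as walking parent pointers from the local HEAD back to the root. A minimal illustrative sketch (the `parents` map stands in for Nexio's SQLite commit table):

```go
package main

import "fmt"

// isAncestor walks the parent chain from head back to the root and
// reports whether target appears on it. parents maps each commit ID
// to its parent ID ("" for the root commit). Linear history assumed.
func isAncestor(target, head string, parents map[string]string) bool {
	for id := head; id != ""; id = parents[id] {
		if id == target {
			return true
		}
	}
	return false
}

func main() {
	// History: c1 <- c2 <- c3
	parents := map[string]string{"c1": "", "c2": "c1", "c3": "c2"}
	fmt.Println(isAncestor("c1", "c3", parents)) // true: fast-forward, push allowed
	fmt.Println(isAncestor("c3", "c2", parents)) // false: remote is ahead, abort
}
```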
The blob deduplication on transfer is where Nexio's content-addressable architecture really pays off. Before uploading each blob, a `HeadObject` call checks whether it already exists remotely. Same content, same hash: no need to upload it again. This means pushing a commit that modifies one file in a repository with thousands of tracked files only uploads that one changed blob.
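The dedup step might look something like this sketch, with the S3 `HeadObject` call abstracted behind a callback so the selection logic is plain set difference (the names are illustrative, not Nexio's actual code):

```go
package main

import "fmt"

// blobExists abstracts the remote existence check; in practice this
// would be an S3 HeadObject call against objects/<hash>.
type blobExists func(hash string) bool

// blobsToUpload returns only the blobs missing from the remote, so an
// unchanged file (same hash) is never transferred twice.
func blobsToUpload(local []string, existsRemotely blobExists) []string {
	var missing []string
	for _, h := range local {
		if !existsRemotely(h) {
			missing = append(missing, h)
		}
	}
	return missing
}

func main() {
	remote := map[string]bool{"ab3f": true, "cd12": true}
	local := []string{"ab3f", "cd12", "ef99"} // ef99 is the one changed blob
	fmt.Println(blobsToUpload(local, func(h string) bool { return remote[h] }))
	// [ef99]
}
```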
## Pull
Pull is conceptually the inverse of push, but the merge step is more involved:
1. **Validate** → no uncommitted staged changes
2. **Lock** → acquire the remote lock
3. **Download** → fetch remote `index.db` to a temp file
4. **Diff** → compare remote vs. local commit IDs
5. **Fast-forward** → verify local HEAD is an ancestor of remote HEAD
6. **Download blobs** → only blobs that don't exist locally
7. **Merge DB** → integrate remote data into the local database
8. **Sync workdir** → restore new/changed files, remove deleted files
9. **Clean** → remove orphaned blobs locally
10. **Release lock**
### Database Merge via ATTACH
The most interesting part of pull is the database merge. SQLite's built-in `ATTACH DATABASE` command lets you open a second database and query across both within a single transaction:
```sql
ATTACH DATABASE '/tmp/remote_index.db' AS remote;

BEGIN;  -- ATTACH itself can't run inside a transaction, so it comes first

-- Insert new commits (skip any that already exist)
INSERT OR IGNORE INTO commits
SELECT * FROM remote.commits;

-- Insert new file records
INSERT OR IGNORE INTO files
SELECT * FROM remote.files;

-- Update branch heads to match the remote
UPDATE branches SET head_commit = (
    SELECT head_commit FROM remote.branches
    WHERE remote.branches.name = branches.name
) WHERE name IN (SELECT name FROM remote.branches);

COMMIT;

DETACH DATABASE remote;
```
This is elegant because the merge happens atomically: either everything succeeds or nothing changes. No custom diff logic, no conflict resolution code. The `INSERT OR IGNORE` pattern works because commit IDs and file record IDs are unique hashes; if they already exist locally, they're identical to what's on the remote.
### Working Directory Sync
There was a subtle bug in my first implementation: pull downloaded the blobs and merged the database correctly, but never actually updated the files on disk. Running `nexio workdir` would show the new files (the database was updated), but they were invisible in the actual working directory.
The fix was adding a working directory sync step after the merge. It:
- Captures the old HEAD commit before merging
- Compares old vs. new HEAD file lists by blob hash
- Restores new or changed files via `RestoreBlob()`
- Removes files that were tracked in the old HEAD but no longer exist in the new HEAD
This is essentially a lightweight checkout operation that only touches files that actually changed.
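The diff behind that sync step is essentially two map comparisons. A hypothetical sketch (`syncPlan` is my name, not Nexio's):

```go
package main

import (
	"fmt"
	"sort"
)

// syncPlan compares the path->blob-hash maps of the old and new HEAD
// commits and decides which paths to restore and which to delete.
func syncPlan(oldHead, newHead map[string]string) (restore, remove []string) {
	for path, hash := range newHead {
		if oldHead[path] != hash { // new file, or content changed
			restore = append(restore, path)
		}
	}
	for path := range oldHead {
		if _, ok := newHead[path]; !ok { // tracked before, gone now
			remove = append(remove, path)
		}
	}
	sort.Strings(restore) // map iteration order is random; sort for stable output
	sort.Strings(remove)
	return restore, remove
}

func main() {
	oldHead := map[string]string{"a.txt": "h1", "b.txt": "h2"}
	newHead := map[string]string{"a.txt": "h1", "b.txt": "h3", "c.txt": "h4"}
	restore, remove := syncPlan(oldHead, newHead)
	fmt.Println(restore, remove) // [b.txt c.txt] []
}
```

Only `b.txt` (changed hash) and `c.txt` (new) are touched; `a.txt` is left alone, which is exactly the "lightweight checkout" behavior described above.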
## Clone
Clone is the simplest of the three commands; it's basically "create directory, download everything, restore working directory":
1. **Parse args** → remote URL plus optional local directory name
2. **Verify remote** → check that `index.db` exists at the remote
3. **Create dirs** → `.nexio/` and `.nexio/objects/`
4. **Download DB** → fetch `index.db`
5. **Download blobs** → fetch all objects
6. **Write config** → set the remote URL in `config.json`
7. **Restore** → check out the HEAD commit to the working directory
One detail worth noting: clone automatically writes the remote URL into the cloned repository's `config.json`, so `nexio push` and `nexio pull` work immediately without additional configuration. This mirrors the Git experience, where `git clone` sets up the `origin` remote for you.
## Configuration
The remote URL is stored in the existing `config.json`:

```json
{
  "name": "John Doe",
  "email": "john@example.com",
  "remote": "s3://my-bucket/nexio-repo"
}
```

I added `set remote` and `get remote` subcommands to `nexio config`:

```bash
nexio config set remote s3://my-bucket/nexio-repo
nexio config get remote
```
All three commands also accept a `--remote` flag to override the configured remote for a single operation, which is useful for pushing to a different location without changing your default config.
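The resolution order is simple: flag wins, then config, then error. An illustrative sketch (the function name is hypothetical):

```go
package main

import "fmt"

// resolveRemote picks the remote for one operation: an explicit
// --remote flag wins, otherwise the value from config.json is used.
func resolveRemote(flag, configured string) (string, error) {
	if flag != "" {
		return flag, nil
	}
	if configured != "" {
		return configured, nil
	}
	return "", fmt.Errorf("no remote configured; run `nexio config set remote <url>` or pass --remote")
}

func main() {
	r, _ := resolveRemote("", "s3://my-bucket/nexio-repo")
	fmt.Println(r) // s3://my-bucket/nexio-repo
	r, _ = resolveRemote("s3://other/location", "s3://my-bucket/nexio-repo")
	fmt.Println(r) // s3://other/location
}
```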
## The Implementation
The implementation added five new files and modified four existing ones:
| New File | Purpose |
|---|---|
| `remote.go` | S3 client init, URL parsing, upload/download/list helpers |
| `remote_lock.go` | Lock acquire/release with staleness detection |
| `push.go` | Push command and blob diff logic |
| `pull.go` | Pull command, DB merge via ATTACH, working directory sync |
| `clone.go` | Clone command, full download, working directory restore |
The AWS SDK (`aws-sdk-go-v2`) is the only new external dependency. Credentials are resolved through the standard AWS chain (environment variables, `~/.aws/credentials`, IAM roles); Nexio doesn't manage credentials itself.
## A Complete Workflow
Here's what a typical multi-machine workflow looks like:

```bash
# Machine A: initialize and push
nexio init
nexio config set name "Alice"
nexio config set email "alice@example.com"
nexio config set remote s3://team-bucket/project
nexio stage .
nexio commit -m "Initial commit"
nexio push

# Machine B: clone and make changes
nexio clone s3://team-bucket/project
cd project
echo "new file" > feature.txt
nexio stage feature.txt
nexio commit -m "Add feature"
nexio push

# Machine A: pull the changes
nexio pull
# feature.txt now appears in the working directory
```
## Lessons Learned
**Content-addressable storage makes sync easy.** The blob deduplication from the previous optimization wasn't just about saving disk space; it made remote sync almost trivial. Same hash means same content, so a `HeadObject` check is all you need to know whether to transfer a blob.

**SQLite's ATTACH is powerful.** The ability to open two databases and merge them in a single transaction eliminated an entire class of merge complexity. No custom diff algorithms, no conflict tracking; just `INSERT OR IGNORE`.

**Don't forget the working directory.** The database and blob store are the "truth," but users interact with files on disk. My initial pull implementation updated the database perfectly but left the working directory stale. Always sync the user-visible state.

**Simple locking is good enough to start.** A DynamoDB-based lock would be more correct, but the S3 object approach works well for small teams and avoids extra infrastructure. You can always upgrade later.

**The first push initializes the remote.** There's no explicit "init remote" command; pushing to an empty S3 prefix creates the entire remote structure. This keeps the UX simple and avoids a separate setup step.
## Future
With remote sync working, several improvements become natural next steps:
- **Parallel blob uploads/downloads**: Currently sequential; parallelizing with goroutines would significantly speed up large transfers
- **Progress bars**: A nice UX improvement for long-running push/pull operations
- **DynamoDB locking**: For teams that need stronger concurrency guarantees
- **Conflict detection and resolution**: Currently, diverged histories are rejected; supporting merges would be a significant feature
## Resources
- **AWS SDK for Go v2**: the SDK used for S3 operations
- **SQLite ATTACH DATABASE**: how SQLite handles cross-database queries
- **S3 consistency model**: strong read-after-write consistency since December 2020
Check out Nexio on GitHub.
You can also read this post on my portfolio page.