Odejobi Abiola Samuel

Posted on Jul 5

I Built a Sync Engine for a 0-Dep Client-Side Database, Here's What I Learned

#typescript #database #opensource #webdev

Sync logic outgrew the core database code

I spent three months adding offline sync to ctrodb, a client-side database I'd been working on for about a year before that. The database itself was straightforward: schema validation, queries, a reactive signal system. A couple thousand lines. Nothing wild.

Sync was different.

It took more code than the rest of the database combined. I rewrote the conflict resolver three times. I shipped a transport system I later threw out. I learned what "eventually consistent" actually means when you're the one who has to implement it.

This is what I learned.

npm: npm install ctrodb
GitHub: github.com/ctrotech-tutor/ctrodb
Docs: ctrodb.vercel.app/docs

The Hard Part Is Not the Data Transfer

When I started, I thought sync would be: get data from server, merge with local data, done.

The first version was exactly that. A push endpoint, a pull endpoint, a simple merge. It worked for the demo. Then I tried editing the same record on two devices at once.

The second device's push would silently overwrite the first device's changes. Data gone. No notification. No way to recover.

That's when I realized sync isn't about moving bytes. It's about knowing what changed, in what order, and what to do when two changes conflict.

Change Tracking: The Foundation

The sync engine is a plugin that hooks into the database's write pipeline. When you create, update, or delete a record, the plugin writes a change record to an internal change log.

interface SyncChangeRecord {
  id: string
  collection: string
  recordId: string
  type: "create" | "update" | "delete"
  data: Record<string, unknown> | null
  prevData: Record<string, unknown> | null
  timestamp: string
  status: "pending" | "syncing" | "committed" | "failed"
  retries: number
  errorMessage: string | null
  createdAt: string
  updatedAt: string
}

The change log write happens inside the same transaction as the main data write. If one fails, both fail. No orphaned changes.
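To make the all-or-nothing rule concrete, here is a minimal in-memory sketch. It is not ctrodb's code: `MemoryStore` and `writeWithChangeLog` are hypothetical names. Both writes are staged on copies and swapped in only at the end, which mimics what a single IndexedDB readwrite transaction spanning both object stores provides.

```typescript
type ChangeType = "create" | "update" | "delete"

interface ChangeEntry {
  recordId: string
  type: ChangeType
}

// Hypothetical in-memory stand-in for two object stores.
interface MemoryStore {
  records: Map<string, Record<string, unknown>>
  changeLog: ChangeEntry[]
}

// Writes a record and appends its change-log entry as one unit. Both writes
// are staged on copies; the live store is swapped only if every step succeeds,
// so a failed log append can never leave an untracked record behind.
function writeWithChangeLog(
  store: MemoryStore,
  recordId: string,
  data: Record<string, unknown>,
  appendLog: (log: ChangeEntry[], entry: ChangeEntry) => void,
): void {
  const records = new Map(store.records)
  const changeLog = [...store.changeLog]

  const type: ChangeType = records.has(recordId) ? "update" : "create"
  records.set(recordId, data)
  appendLog(changeLog, { recordId, type }) // may throw, e.g. quota exceeded

  // Commit point: nothing above touched the live store.
  store.records = records
  store.changeLog = changeLog
}
```

Copying the whole map on every write is only there to keep the demo short; the part that matters is that the commit happens once, after both writes have succeeded.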

The plugin hooks are onAfterCreate, onAfterUpdate, and onAfterDelete. Each one appends to the change log.

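// tracker and engine are created once the database initializes;
// config holds the options passed to syncPlugin()
let tracker: ChangeTracker
let engine: SyncEngine

const plugin: CtroDBPlugin = {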
  name: "sync",
  async onDatabaseInit(db) {
    tracker = new ChangeTracker(db._getAdapter())
    engine = new SyncEngine(db, config)
    await engine.init()
  },
  async onAfterCreate(collection, record) {
    await tracker.append("create", collection, record.id, record)
  },
  async onAfterUpdate(collection, id, record, oldRecord) {
    await tracker.append("update", collection, id, record, oldRecord ?? null)
  },
  async onAfterDelete(collection, id, oldRecord) {
    await tracker.append("delete", collection, id, null, oldRecord ?? null)
  },
}

The plugin also registers store indexes — the change log is an IndexedDB object store with indexes on status and timestamp.
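The tracker itself boils down to a few queries, and the status and timestamp indexes exist so that "everything pending or failed, oldest first" stays cheap. Below is a rough in-memory approximation. Only the method names `append`, `getPending`, `markSyncing`, `markCommitted`, and `markFailed` come from the snippets in this post; the class name, the method bodies, and the `chg-N` id scheme are hypothetical.

```typescript
type ChangeType = "create" | "update" | "delete"
type ChangeStatus = "pending" | "syncing" | "committed" | "failed"

interface ChangeRecord {
  id: string
  collection: string
  recordId: string
  type: ChangeType
  data: Record<string, unknown> | null
  prevData: Record<string, unknown> | null
  timestamp: string
  status: ChangeStatus
  retries: number
  errorMessage: string | null
}

// Hypothetical in-memory change tracker. An IndexedDB-backed version would
// answer getPending() from the status and timestamp indexes instead of scanning.
class MemoryChangeTracker {
  private log = new Map<string, ChangeRecord>()
  private seq = 0

  append(
    type: ChangeType,
    collection: string,
    recordId: string,
    data: Record<string, unknown> | null,
    prevData: Record<string, unknown> | null = null,
  ): ChangeRecord {
    const change: ChangeRecord = {
      id: `chg-${++this.seq}`, // stand-in for a real client-generated id (e.g. a UUID)
      collection, recordId, type, data, prevData,
      timestamp: new Date().toISOString(),
      status: "pending", retries: 0, errorMessage: null,
    }
    this.log.set(change.id, change)
    return change
  }

  // Pending and failed changes, oldest first. The sort is stable, so changes
  // that share a millisecond keep their insertion order.
  getPending(): ChangeRecord[] {
    return [...this.log.values()]
      .filter((c) => c.status === "pending" || c.status === "failed")
      .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0))
  }

  markSyncing(ids: string[]): void {
    for (const id of ids) this.setStatus(id, "syncing")
  }

  markCommitted(id: string): void {
    this.setStatus(id, "committed")
  }

  markFailed(id: string, error: string): void {
    const change = this.setStatus(id, "failed")
    change.retries++
    change.errorMessage = error
  }

  private setStatus(id: string, status: ChangeStatus): ChangeRecord {
    const change = this.log.get(id)
    if (!change) throw new Error(`unknown change ${id}`)
    change.status = status
    return change
  }
}
```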

Pull: Cursor-Based Pagination

Pull requests ask the server for changes since the last sync. The actual implementation uses cursor-based pagination, not sequence numbers.

interface SyncTransport {
  pull(options?: {
    cursor?: string | null
    collections?: string[]
    batchSize?: number
    signal?: AbortSignal
  }): Promise<{
    changes: SyncChangeRecord[]
    cursor: string | null
    hasMore: boolean
  }>
}

The cursor is an opaque token — usually a timestamp or a record ID. The client sends cursor: null for the first pull (fetch everything) and the last cursor from the previous sync for subsequent pulls.

The server returns changes plus the next cursor. If hasMore is true, the client keeps pulling in a loop (capped at 1000 pages as a safety valve).
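Here is a sketch of that loop against the `SyncTransport.pull` shape above. `pullAll`, `applyRemote`, and the batch size of 100 are illustrative. Applying each page before advancing the cursor is an ordering assumption on my part: if the app crashes mid-sync, the next pull resumes from the last page that was actually applied.

```typescript
interface PulledChange {
  id: string
  collection: string
  recordId: string
  type: "create" | "update" | "delete"
  data: Record<string, unknown> | null
}

interface PullPage {
  changes: PulledChange[]
  cursor: string | null
  hasMore: boolean
}

interface PullTransport {
  pull(options?: { cursor?: string | null; batchSize?: number; signal?: AbortSignal }): Promise<PullPage>
}

const MAX_PAGES = 1000 // safety valve against a server that never stops saying hasMore

// Pulls every page after `startCursor`, applying each page before moving on.
// Returns the cursor to persist for the next sync.
async function pullAll(
  transport: PullTransport,
  startCursor: string | null,
  applyRemote: (changes: PulledChange[]) => Promise<void>,
  signal?: AbortSignal,
): Promise<string | null> {
  let cursor = startCursor
  for (let page = 0; page < MAX_PAGES; page++) {
    const result = await transport.pull({ cursor, batchSize: 100, signal })
    await applyRemote(result.changes)
    cursor = result.cursor ?? cursor // keep the old cursor if the server sends none
    if (!result.hasMore) break
  }
  return cursor
}
```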

Push: Batch and Retry

Push collects all pending and failed changes, sends them in batches, and processes the server's response.

async function pushChanges() {
  const pending = await tracker.getPending() // pending + failed, sorted by timestamp
  if (pending.length === 0) return

  const batch = pending.slice(0, pushBatchSize)
  const batchIds = batch.map((change) => change.id)
  await tracker.markSyncing(batchIds)

  try {
    const result = await transport.push(batch, { signal })
    for (const accepted of result.accepted) await tracker.markCommitted(accepted.id, ...)
    for (const conflict of result.conflicts) await resolveConflict(conflict)
    for (const err of result.errors) await tracker.markFailed(err.id, err.error)
  } catch {
    // Network failure: mark the batch back as pending so the next sync retries it
  }
}

The server returns three lists: accepted (applied), conflicts (need resolution), errors (rejected). Each is handled differently.
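The function above handles a single batch. Looping until the queue drains looks roughly like this sketch, which tracks statuses in a plain `Map` instead of the real tracker; `pushAll`, the `push` and `resolveConflict` callbacks, and the default batch size of 50 are hypothetical. Note the revert path: it is only safe if the server deduplicates on change id, because the server may have applied the batch before the response was lost.

```typescript
type Status = "pending" | "syncing" | "committed" | "failed"

interface Change {
  id: string
  status: Status
  retries: number
  error?: string
}

interface PushResult {
  accepted: { id: string }[]
  conflicts: { changeId: string }[]
  errors: { id: string; error: string }[]
}

// Pushes every pending/failed change in batches. On a network error the
// in-flight batch goes back to "pending" and the loop stops, so the caller
// can back off and try again later.
async function pushAll(
  queue: Map<string, Change>,
  push: (ids: string[]) => Promise<PushResult>,
  resolveConflict: (changeId: string) => Promise<void>,
  batchSize = 50,
): Promise<void> {
  const ready = [...queue.values()].filter((c) => c.status === "pending" || c.status === "failed")
  for (let i = 0; i < ready.length; i += batchSize) {
    const batch = ready.slice(i, i + batchSize)
    for (const c of batch) c.status = "syncing"

    const result = await push(batch.map((c) => c.id)).catch(() => null)
    if (result === null) {
      for (const c of batch) c.status = "pending" // server state unknown: retry later
      return
    }

    for (const { id } of result.accepted) queue.get(id)!.status = "committed"
    for (const { changeId } of result.conflicts) await resolveConflict(changeId)
    for (const { id, error } of result.errors) {
      const c = queue.get(id)!
      c.status = "failed"
      c.retries++
      c.error = error
    }
  }
}
```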

Conflict Resolution: The Part I Got Wrong

My first conflict resolver used last-write-wins (strategy: "lww"). Whoever wrote last, their data survives. Simple. Wrong.

Consider this: Alice edits a note's title on her phone while offline. Bob edits the same note's body on his laptop, also offline. Both sync when they reconnect. With last-write-wins, whichever syncs second loses. Bob's body edit gets wiped because Alice's phone happened to flush its sync queue a few seconds later.

That's not Bob's fault. He changed different fields. The data should merge.

The built-in strategies are:

type ConflictStrategy = "lww" | "client-wins" | "server-wins" | "custom"

"lww" — last-write-wins by timestamp. Default.
"client-wins" — local version always wins.
"server-wins" — remote version always wins.
"custom" — you provide a conflictResolver function.

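The three fixed strategies above reduce to a few lines each. This is a hypothetical sketch of how they might map onto a resolution object shaped like the `ConflictResolution` type in this post. It compares ISO timestamps with `Date.parse`, and letting the remote side win exact ties is my assumption.

```typescript
type Strategy = "lww" | "client-wins" | "server-wins"

interface Conflict {
  local: Record<string, unknown> | null
  remote: Record<string, unknown> | null
  localTimestamp: string
  remoteTimestamp: string
}

type Resolution = { resolution: "local" | "remote" | "merged"; merged?: Record<string, unknown> | null }

// Maps a built-in strategy to a resolution. "custom" is left out because it
// delegates to the user's conflictResolver instead.
function resolveBuiltIn(strategy: Strategy, c: Conflict): Resolution {
  switch (strategy) {
    case "client-wins":
      return { resolution: "local" }
    case "server-wins":
      return { resolution: "remote" }
    case "lww": {
      // Wall-clock comparison: only as trustworthy as both devices' clocks.
      const local = Date.parse(c.localTimestamp)
      const remote = Date.parse(c.remoteTimestamp)
      return { resolution: local > remote ? "local" : "remote" } // tie goes to remote
    }
  }
}
```

The comment in the `lww` branch is the whole problem with the default: whole records are compared by a single timestamp, so whichever device wrote last erases every field the other one changed.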
For field-level merge, use the custom resolver:

const db = new Database({
  name: "my-app",
  schema: { ... },
  plugins: [syncPlugin({
    transport,
    strategy: "custom",
    conflictResolver: (conflict) => {
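      // Shallow merge: fields present on only one side survive,
      // but the remote value wins for every field both sides contain
      const merged = { ...conflict.local, ...conflict.remote }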
      return { resolution: "merged", merged }
    },
  })],
})

The conflict object gives you both versions and a list of conflicting fields:

interface SyncConflict {
  changeId: string
  recordId: string
  collection: string
  local: Record<string, unknown> | null
  remote: Record<string, unknown> | null
  localTimestamp: string
  remoteTimestamp: string
  fieldConflicts: string[]
}

The resolver returns a ConflictResolution:

type ConflictResolution = {
  resolution: "local" | "remote" | "merged"
  merged?: Record<string, unknown> | null
}

If you return "local", the server version is discarded. "remote" overwrites local. "merged" applies your merged object.
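A shallow spread can't actually rescue Bob's edit: both devices send the full record, so without knowing what it looked like before either of them touched it, there is no way to tell who changed which field. A three-way merge solves that. The sketch below assumes the resolver can get a common base snapshot (for example, the `prevData` of the pending local change). `SyncConflict` as shown doesn't carry one, so `base` is a hypothetical extra input, and the function name is illustrative.

```typescript
type Doc = Record<string, unknown>

interface MergeConflict {
  local: Doc | null
  remote: Doc | null
  localTimestamp: string
  remoteTimestamp: string
}

type Resolution = { resolution: "local" | "remote" | "merged"; merged?: Doc | null }

// Compares values by their JSON form. Fine for flat records; nested objects
// with a different key order would compare as changed.
const same = (a: unknown, b: unknown): boolean => JSON.stringify(a) === JSON.stringify(b)

// Three-way merge against `base`, the last version both sides agreed on.
function threeWayMerge(conflict: MergeConflict, base: Doc): Resolution {
  const { local, remote } = conflict
  const localNewer = Date.parse(conflict.localTimestamp) > Date.parse(conflict.remoteTimestamp)

  // A delete on either side is not a field-level change: fall back to LWW.
  if (local === null || remote === null) {
    return { resolution: localNewer ? "local" : "remote" }
  }

  const merged: Doc = {}
  const keys = new Set([...Object.keys(base), ...Object.keys(local), ...Object.keys(remote)])
  for (const key of keys) {
    const localChanged = !same(local[key], base[key])
    const remoteChanged = !same(remote[key], base[key])
    let value: unknown
    if (localChanged && !remoteChanged) value = local[key]
    else if (remoteChanged && !localChanged) value = remote[key]
    else value = localNewer ? local[key] : remote[key] // both changed (real conflict) or neither
    if (value !== undefined) merged[key] = value // undefined means the field was removed
  }
  return { resolution: "merged", merged }
}
```

With the note as it was before Alice and Bob went offline as the base, her title and his body both survive; only a field both of them edited falls back to a timestamp.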

Transport: HTTP with a Single URL

The HTTP transport takes a single base URL, not separate push/pull URLs. It appends /push and /pull paths automatically.

const transport = new HttpTransport({
  url: "https://api.myapp.com/sync",
  headers: { Authorization: "Bearer ..." },
  timeoutMs: 10000,
})

It implements connect(), disconnect(), isConnected(), push(), and pull(). Push sends a POST to /push, pull sends a POST (or GET) to /pull.
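Stripped to its core, a transport like this is two POSTs with a per-request timeout. The sketch takes an injectable `fetchLike` function so it can run without a network; the global `fetch` fits that shape. The class name and the JSON body shapes (`{ changes }`, `{ cursor }`) are assumptions, not ctrodb's wire format.

```typescript
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>

// Hypothetical minimal transport: one base URL, with /push and /pull appended.
class SketchHttpTransport {
  private baseUrl: string
  private fetchLike: FetchLike
  private headers: Record<string, string>
  private timeoutMs: number

  constructor(url: string, fetchLike: FetchLike, headers: Record<string, string> = {}, timeoutMs = 10000) {
    this.baseUrl = url.replace(/\/+$/, "") // tolerate a trailing slash
    this.fetchLike = fetchLike
    this.headers = headers
    this.timeoutMs = timeoutMs
  }

  push(changes: unknown[]): Promise<unknown> {
    return this.post("/push", { changes })
  }

  pull(cursor: string | null): Promise<unknown> {
    return this.post("/pull", { cursor })
  }

  private async post(path: string, payload: unknown): Promise<unknown> {
    const controller = new AbortController()
    const timer = setTimeout(() => controller.abort(), this.timeoutMs) // per-request timeout
    try {
      const res = await this.fetchLike(this.baseUrl + path, {
        method: "POST",
        headers: { "Content-Type": "application/json", ...this.headers },
        body: JSON.stringify(payload),
        signal: controller.signal,
      })
      if (!res.ok) throw new Error(`sync ${path} failed: HTTP ${res.status}`)
      return await res.json()
    } finally {
      clearTimeout(timer) // don't keep the process alive after the request settles
    }
  }
}
```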

I added rate-limit detection: if the server returns 429, the transport parses the Retry-After header and surfaces it. The sync engine uses this to back off.
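`Retry-After` can carry either a number of seconds or an HTTP date, so a parser has to handle both forms. A minimal sketch; the function name and the 5-second fallback are assumptions.

```typescript
// Converts a Retry-After header value into a delay in milliseconds.
// Accepts delay-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
// Returns `fallbackMs` when the header is missing or unparseable.
function retryAfterMs(header: string | null, now = Date.now(), fallbackMs = 5000): number {
  if (header === null) return fallbackMs
  const trimmed = header.trim()
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000
  const date = Date.parse(trimmed)
  if (Number.isNaN(date)) return fallbackMs
  return Math.max(0, date - now) // a date in the past means "retry now"
}
```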

WebSocket transport is available for real-time push scenarios, but most apps don't need it. HTTP is simpler, stateless, and easier to debug.

What I'd Do Differently

Test with real offline scenarios earlier. My first tests used simulated networks with controlled latency. Real offline is messier: partial connectivity, brief flips between online and offline, race conditions on reconnect. I added a faulty transport helper that drops every nth request and randomizes delay. That caught bugs that network throttling never triggered.
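A wrapper along these lines is enough to reproduce that kind of flakiness. It is a hypothetical reconstruction of the idea, not the helper from the ctrodb repo: it wraps any transport with a `send` method, fails every `dropEvery`-th call, and sleeps a random delay before each call. The optional `loseResponse` mode forwards the request and then drops the reply, which is the case where the server has already applied a change the client thinks failed.

```typescript
interface Transport<Req, Res> {
  send(req: Req): Promise<Res>
}

interface FaultOptions {
  dropEvery: number // fail every nth call (n >= 1)
  maxDelayMs: number // random delay in [0, maxDelayMs) before each call
  loseResponse?: boolean // forward the request, then drop the reply
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Wraps a transport so it misbehaves like a real network.
function faulty<Req, Res>(inner: Transport<Req, Res>, opts: FaultOptions): Transport<Req, Res> {
  let calls = 0
  return {
    async send(req) {
      calls++
      await sleep(Math.random() * opts.maxDelayMs)
      const drop = calls % opts.dropEvery === 0
      if (drop && !opts.loseResponse) throw new Error(`faulty transport: dropped request #${calls}`)
      const res = await inner.send(req)
      // The server handled the request, but the client never hears back.
      if (drop) throw new Error(`faulty transport: lost response #${calls}`)
      return res
    },
  }
}
```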

Build devtools from day one. Debugging sync is hard because the state is distributed. The devtools — inspectSyncQueue(db), getSyncStats(db), retryFailedSync(db) — saved me more times than I can count. I added them halfway through. Should have been first.

import { inspectSyncQueue, retryFailedSync } from "ctrodb"

const queue = await inspectSyncQueue(db)
console.log(queue.stats)
// { total: 42, pending: 3, syncing: 0, committed: 38, failed: 1 }

await retryFailedSync(db) // marks failed items back to pending and triggers sync
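The stats object is just a count per status, and the first step of a retry is a status flip. Two hypothetical helpers that show the same shape over a plain array of change records; ctrodb's `inspectSyncQueue` and `retryFailedSync` may compute these differently.

```typescript
type SyncStatus = "pending" | "syncing" | "committed" | "failed"

interface QueueStats {
  total: number
  pending: number
  syncing: number
  committed: number
  failed: number
}

// Counts change records by status, matching the shape of queue.stats above.
function queueStats(changes: { status: SyncStatus }[]): QueueStats {
  const stats: QueueStats = { total: changes.length, pending: 0, syncing: 0, committed: 0, failed: 0 }
  for (const c of changes) stats[c.status]++
  return stats
}

// Flips failed changes back to pending; returns how many were reset.
function resetFailed(changes: { status: SyncStatus }[]): number {
  let reset = 0
  for (const c of changes) {
    if (c.status === "failed") {
      c.status = "pending"
      reset++
    }
  }
  return reset
}
```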

Don't try to handle every edge case. I spent two weeks on a sync ordering dependency system — ensuring referenced records sync before their dependents. The database already handled this at the app level. I removed it and sync worked fine.

Trade-offs I'm Still Thinking About

The current design trades real-time collaboration for simplicity. Two users on the same document won't see each other's changes until they sync. True real-time would need a CRDT or OT layer.

But for most apps — todos, notes, offline-first dashboards — the sync-every-few-seconds model works. The complexity of CRDTs isn't worth it until you're building Google Docs.

npm: npm install ctrodb
Sync code: github.com/ctrotech-tutor/ctrodb (in src/sync/)
Sync docs: ctrodb.vercel.app/docs/sync/overview

Top comments (11)

mote • Jul 9

The change log written in the same transaction as the data write is the bit most people skip and then regret. Orphaned changes are how you get a "synced" database that's quietly lying to you. Conflict resolution is where it gets expensive though. Last-write-wins is fine until two devices edit the same record and the first edit disappears with no signal.

We hit the same wall building moteDB, a Rust embedded multimodal store. The write-plus-changelog atomicity is the same pattern, except on embedded and edge devices the "server" is usually another local process or a peer, not a cloud. Curious how ctrodb behaves when the sync target is unreachable for days instead of seconds. Did you ever measure change-log growth on a long-lived offline client, or is compaction something you pushed down the road?

Odejobi Abiola Samuel • Jul 11

The write-plus-changelog atomicity turned out to be one of those things that looks obvious in hindsight but is easy to miss in a first pass. Glad it came through clearly.

On compaction: compactSyncQueue deduplicates superseded pending/failed changes for the same record, but committed entries accumulate until clearCommittedSync runs (auto-triggered after a successful push). I haven't stress-tested it with weeks of offline use yet; that's on the near-term test list. If growth becomes a problem on real devices, I'll likely add a max-age TTL and a periodic vacuum before shipping the production sync docs.

Still actively testing the offline durability story. Appreciate the signal on what matters at the embedded/edge scale; that's a whole different class of constraints.

mote • Jul 12

Thanks for the detailed walkthrough of the compaction pipeline. The committed-vs-pending split is clean. One thing we hit with moteDB's WAL on edge devices: if you TTL-vacuum entries before a long-delayed sync, there's no merge base left to reconstruct from. A minimum retained batch count as a safety floor solved it. Wondering if you're considering something similar alongside the age-based TTL, or if the auto-trigger after successful push is strict enough that you don't need it.

Raju Dandigam • Jul 7

“Sync isn’t about moving bytes” is the sentence that usually separates the demo from the real system. The first overwrite-on-two-devices moment is where offline-first work stops being a transport problem and becomes a history/conflict problem, and your writeup frames that transition clearly. I also appreciate that you highlighted how much more code sync demanded than the base database, because teams routinely underestimate that multiplier. For browser-local systems, the quality of change tracking and merge semantics matters more than the storage layer itself once real users start editing in parallel. Curious what finally made the third conflict resolver feel correct enough to keep instead of rewrite again?

Odejobi Abiola Samuel • Jul 11

Thanks, appreciate the close read. The third resolver stuck because I stopped trying to handle every case upfront and made the engine composable instead. The first two were attempts at a universal strategy (field-level auto-merge first, then a CRDT-lite thing) and both collapsed under their own complexity.

The keeper is the "custom" strategy hook. The engine tracks conflicting fields and surfaces both versions, but what actually happens is the caller's decision. That made the resolver itself trivial; the hard part is just plumbing the data correctly so the app can decide. Once I accepted that the library doesn't know the business semantics of a record, the whole thing got simpler.

Still iterating on it. More strategies and better dev tooling for debugging conflicts are in progress.

Vinicius Pereira • Jul 5

Really like that you led with the failure cases instead of a polished "look what I built." The one I would push to the top of the list, and it is the exact bug hiding behind "tested with simulated networks first," is idempotency on push. Your syncing -> pending revert on network failure is correct except for the single ordering that actually matters: the server commits the batch and THEN the response gets lost. Client sees a failure, reverts to pending, re-pushes the same ops, and now the change is applied twice server-side with nothing on the client aware of it. A client-generated change id per op that the server dedups on turns the replay into a no-op. Without it, retries are only safe as long as the network fails in the polite direction, which real networks don't.

The other one waiting for you is deletes. Once onAfterDelete is just another change-log entry, delete-vs-edit across two offline clients is the nastiest conflict in the whole engine: LWW by timestamp will either resurrect a row someone deleted or silently drop an edit, depending on whose clock ran ahead. Tombstones (the delete syncs as a marker instead of an absence) kill the resurrection, and then you inherit the genuinely hard part: deciding when it is safe to GC a tombstone without a client that has been offline for a month re-introducing the row on its next sync. Worth designing before you have delete data in the wild; retrofitting it is brutal. And if you ever want to de-fang the LWW clock-skew problem without going full CRDT, a hybrid logical clock is the cheap middle ground: it makes "last" mean last in a skew-resistant order instead of trusting two drifted wall clocks.

Odejobi Abiola Samuel • Jul 11

This is gold, thanks for writing it up.

The idempotency hole is real and I've already been burned by it in testing. Client-generated change IDs are in the tracker already (every SyncChangeRecord has an id field), but the server-side dedup logic isn't wired yet. That's the next thing on the build list before the sync transport docs go live.

On deletes: you're right that tombstones are the honest answer, and tombstone GC is where the real pain lives. The change log already records delete operations as full entries (type, recordId, prevData, timestamp), so the substrate for it is there; what's missing is the server protocol for tombstone acknowledgment before pruning. Haven't shipped it yet. Still designing the ack lifecycle.

HLC is something I've been reading up on but haven't committed to. The current timestamp is new Date().toISOString() which is obviously naive. LWW is the default for a reason (simplicity), but I want to offer better defaults before calling sync production-ready. Appreciate the nudge.

Still building and testing all of this; the posts are real-time design notes more than a finished manual. Feedback like this directly shapes what gets fixed next.

Vinicius Pereira • Jul 11

The ack lifecycle might be cheaper than it looks: you may not need per-tombstone acknowledgment at all. This is the same trick Postgres replication slots use: the WAL only advances past what every consumer has consumed. If each client already syncs from a monotonic cursor over the change log, "acked" falls out for free: a tombstone is prunable once every known client's cursor has passed its sequence. The one real decision left is the policy knob for the client that never comes back: pick a max offline age, and past it the cursor is invalidated and the client does a full resync on return. That single rule GCs tombstones and kills the month-offline row resurrection in the same move, because the stale client never gets to replay its old ops at all.

Mads Hansen • Jul 12

“Sync is not about moving bytes” is the key lesson.

The hard part is deciding what a conflict means in product language. Record-level last-write-wins is simple until the record represents something users think of as multiple independent facts. Then a title edit, status change, and comment append may each need different merge rules.

I like that you kept a change log. That is the piece teams often skip in the first version, and it becomes painful later. A durable sync log is not just for retries; it is also how you explain to a user why their local edit lost, merged, or needs manual resolution.

Valentyn Kit • Jul 15

Rewriting the conflict resolver three times is the rite of passage that ends at either CRDTs or a version vector, right about when last-write-wins first eats a write under clock skew. The transport you threw out was probably the cheaper lesson of the two.

PubliFlow • Jul 18

Conflict resolution in client-side sync is notoriously tricky, especially when dealing with concurrent offline edits without relying on heavy CRDT libraries. I'm curious how you handled the transport layer's backpressure when a client comes back online with a massive queue of local changes. I actually ran into similar offline-first architecture headaches when putting together our SaaS boilerplate, PubliFlow, though we ended up leaning on Supabase's real-time subscriptions to avoid building the sync engine from scratch.
