sravan27

Posted on Jun 6 • Edited on Jun 8

How I shipped 8 silent-row-loss fixes to one OSS database in 24 hours - the same class repeating

#javascript #webdev #database #correctness

Update June 8: tracking page with full audit summary + sprint pricing at https://silentdrop-sravan.vercel.app.

A JS database that re-implements SQL operators silently returns the wrong rows. Not occasionally. Not for one obscure case. Every day, in production, for any user data containing characters outside a-z and a 16-bit integer.

I went looking for one of these in PowerSync's sync rules engine last month. I found four, got them merged, and got paid for a 48-hour hardening sprint. This week I came back and found four more. All same class. None of them throw. All of them silently return the wrong rows.

Here are the four open today:

1. JOINs in Sync Streams silently sync zero rows

streams:
  user_chat:
    queries:
      - >
        SELECT cm.* FROM chat_messages cm
        INNER JOIN chat_conversations ON cm.conversation_id = chat_conversations.id
        WHERE chat_conversations.user_id = auth.user_id()

Looks fine. Aliases the source table inside an INNER JOIN. The result: zero rows sync to the client. No validation error. No runtime error. Just an empty client database and a user wondering why their chat history is gone.

Root cause: the parser's filter compilation doesn't track joined tables, so chat_conversations.user_id resolves to a value the SqlTools doesn't recognize, and the filter degenerates silently.

#662 surfaces this as a loud actionable error with the working subquery rewrite (#665 in the issue's test matrix). It doesn't fix JOIN support — that's a bigger change — but it stops the silent data loss.

2. `upper()` / `lower()` use Unicode case-folding while SQLite uses ASCII

const upper = {
  call(value) {
    return value?.toUpperCase() ?? null;
  },
};

String.prototype.toUpperCase() is locale-aware and length-changing. SQLite's default upper() is ASCII-only. They disagree the moment your data contains ß:

Input	JS `toUpperCase()`	SQLite `upper()`
`"hello"`	`"HELLO"`	`"HELLO"`
`"straße"`	`"STRASSE"`	`"STRAßE"`
`"ﬁle"`	`"FILE"`	`"ﬁLE"`

SELECT upper(name) AS bucket_key FROM users produces "STRASSE" server-side and "STRAßE" client-side. The client queries the "STRAßE" bucket and finds nothing. Row silently lost.

#663 replaces the JS calls with explicit ASCII-only fold loops.

3. `length()` counts UTF-16 code units instead of characters

String.prototype.length is a UTF-16 code-unit count. SQLite's length() is a character count (Unicode code points). For BMP characters they agree. For anything else — emoji, CJK Extension B-G, ancient scripts, U+10000 and up — they diverge by exactly 2x per character:

'😀'.length       // 2 (surrogate pair, two code units)
'a😀b'.length    // 4
'𐀀'.length       // 2
// SQLite: length('😀') = 1, length('a😀b') = 3, length('𐀀') = 1

A bucket-parameter query like SELECT * FROM messages WHERE length(content) > 100 will route messages with emoji to a different bucket than the client query asks for. Silent miss.

#664 walks the string with for..of (which iterates code points) and counts.

4. `substring()` slices on code units, not characters — splits surrogate pairs

Same root cause as length(). String.prototype.substring(0, 2) on "a😀bc" returns "a\uD83D" — an unpaired surrogate, which isn't a valid string. SQLite's substr('a😀bc', 1, 2) returns "a😀".

#665 spreads the string into a code-point array ([...text]) and slices that.

The pattern

Every one of these is the same shape:

The server-side evaluator re-implements a SQL operation in JavaScript.
The client SQLite implements the same operation differently.
The two agree for the data the maintainer tested with (ASCII strings, small integers, single-byte boundaries).
They disagree silently for everything else.
Bucket keys diverge, filters return the wrong rows, downstream code consumes corrupted state.

No error. No log line. Just wrong data.

I've shipped this exact bug class to PowerSync (8 times now), Rocicorp's Zero (2 merged), InstantDB (1 merged), ElectricSQL (PR open), and Dexie (PR open). I packaged the audit as an open-source checker so any team can run it against their own DB query layer:

→ silentdrop (npm i silentdrop)

Run it. The four checks took twenty minutes to write and have found a real bug in every JS database I've pointed them at.

If your sync/database layer is correctness-critical and you'd rather have the whole operator surface hardened by hand — same pass as the eight PowerSync PRs and the four other databases — I take that on as a fixed 48-hour sprint. Details and Polar checkout in the silentdrop repo.

DEV Community

How I shipped 8 silent-row-loss fixes to one OSS database in 24 hours - the same class repeating

1. JOINs in Sync Streams silently sync zero rows

2. `upper()` / `lower()` use Unicode case-folding while SQLite uses ASCII

3. `length()` counts UTF-16 code units instead of characters

4. `substring()` slices on code units, not characters — splits surrogate pairs

The pattern

Top comments (0)

1. JOINs in Sync Streams silently sync zero rows

2. upper() / lower() use Unicode case-folding while SQLite uses ASCII

3. length() counts UTF-16 code units instead of characters

4. substring() slices on code units, not characters — splits surrogate pairs

The pattern

2. `upper()` / `lower()` use Unicode case-folding while SQLite uses ASCII

3. `length()` counts UTF-16 code units instead of characters

4. `substring()` slices on code units, not characters — splits surrogate pairs