DEV Community

rinat kozin
rinat kozin

Posted on

SQLite provider anatomy: moving away from EF Core — typed object storage for desktop, mobile, and Blazor WASM

redb.sqlite

Series: redb — ecosystem, engineering teardown following the 3.2.0 SQLite announcement.

What this is

When we shipped the SQLite provider in redb 3.2.1, the pitch was two sentences: same LINQ, one line in DI. This post is the opposite of that pitch. Not "what shipped" — how it's built and where it leaked. Concretely: how redb's query engine moved into a native C extension on a database with no stored procedures; how we store DateTimeOffset in a database that has no date type; and three real bugs from this release, each with the filter JSON, the generated SQL, and the fix.

It's long and it has code. If you want the short version, the announcement is linked above. If you want what's under "one line in DI," pull up a chair.

For context, if redb is new to you (the rest assumes you've seen this):

Two disclaimers before we start, so nothing trips us up later.

One — about a word. Yes, redb's model is "flexible": a class is spread across rows in a values table. No, this is not EAV in the dismissive sense people throw that acronym around. What lives in a redb database is RTTI — real type information: schemes, structures, field types, references. The DB knows that EmployeeProps.HireDate is a DateTime, that Contacts is an array of objects, that CurrentProject is a reference to another scheme. That's a runtime type system at the storage layer, not a blind bag of tuples. You'll see below why neither the materializer nor the query compiler could physically exist without it — at every step they need the type of the thing they're materializing or filtering.

Two — about fit, so nobody's disappointed later. redb is for complex business classes: object graphs, nesting, cross-scheme references, trees, dictionaries, arrays of objects. Pumping flat, high-volume streams into it — a coordinate feed, sensor telemetry, thousands of metrics a second — technically works, but it's an anti-pattern. That's what time-series and columnar stores exist for; redb pays for its flexibility and typing exactly where the data is rich and connected, not where you've got one (timestamp, value) table at a billion rows. Short version: redb shines when you have a real domain model.


Part 0. The constraint that shapes everything

SQLite has no procedural language. No PL/pgSQL, no T-SQL — nowhere to put server-side logic. And redb's "Free" tier on Postgres and MSSql is built exactly around having that: the heavy machinery (query compiler, materializer, soft-delete, permission views) lives inside the database as server-side functions. That's literally the Free/Pro line: where JSON gets materialized and who generates the SQL.

"No stored procedures" forks into exactly two paths, and we took both — one per tier:

Pro — pure C#. ProSqlBuilder generates the query SQL in code, props materialize in code, zero DB-function calls. Consequence: it runs anywhere Microsoft.Data.Sqlite runs — including Blazor WebAssembly and mobile (MAUI/iOS/Android), where you flat-out cannot load a native SQLite extension.

Free — a native C extension. Same shape as Postgres/MSSql: the engine lives in the database. On SQLite, "in the database" means a loadable extension written in C (redb.dll / .so / .dylib) on top of sqlite3ext.h. It runs wherever native code loads: desktop, server, CI — and it lets a non-.NET host (Python, the sqlite3 CLI) talk to a redb database directly if it wants to.

Both stories at full size below. Native first (it's the fun one), then dates (the important one), then the bugs, then the traps.


Part 0.5. What the provider reproduces: _objects and _values

Before diving into the native side — what it actually reads and writes. redb's model is ~13 tables (full teardown in the "13 tables" post), but two are load-bearing for the SQLite story, and the provider had to recreate both column-for-column.

_objects is the "header" of every object: identity, tree, ownership, dates, the reference to a scheme (i.e. to a type):

CREATE TABLE _objects(
    _id             INTEGER NOT NULL PRIMARY KEY,
    _id_parent      INTEGER NULL,            -- tree (parent)
    _id_scheme      INTEGER NOT NULL,        -- RTTI: which CLASS this object is
    _id_owner       INTEGER NOT NULL,
    _id_who_change  INTEGER NOT NULL,
    _date_create    REAL    NOT NULL DEFAULT (julianday('now')),  -- UTC Julian day (REAL)
    _date_modify    REAL    NOT NULL DEFAULT (julianday('now')),  -- UTC Julian day (REAL)
    _name           TEXT    NULL,
    _hash           TEXT    NULL,            -- hash of the props set (delta/cache)
    -- slots for RedbPrimitive<T> (when Props IS the primitive, no nested structure)
    _value_long     INTEGER NULL,
    _value_string   TEXT    NULL,
    _value_bool     INTEGER NULL,            -- bool = 0/1
    _value_double   REAL    NULL,
    _value_numeric  REAL    NULL,            -- NUMERIC(38,18): REAL by default
    ...
);
Enter fullscreen mode Exit fullscreen mode

_values is the row-per-property store. One row per (object, structure, [array index]). The key idea is typed column slots: a value lives not in one "universal" text column, but in the column for its type:

CREATE TABLE _values(
    _id              INTEGER NOT NULL PRIMARY KEY,
    _id_structure    INTEGER NOT NULL,       -- RTTI: WHICH field this is
    _id_object       INTEGER NOT NULL,
    _String          TEXT    NULL,
    _Long            INTEGER NULL,
    _Guid            TEXT    NULL,
    _Double          REAL    NULL,
    _DateTimeOffset  REAL    NULL,           -- DateTime/DateTimeOffset/DateOnly as UTC Julian
    _Boolean         INTEGER NULL,           -- 0/1
    _ByteArray       BLOB    NULL,
    _Numeric         REAL    NULL,
    _ListItem        INTEGER NULL,
    _Object          INTEGER NULL,           -- reference to another object
    _array_parent_id INTEGER NULL,           -- arrays/dictionaries — relational
    _array_index     INTEGER NULL
);
Enter fullscreen mode Exit fullscreen mode

Two consequences that everything else rests on:

  • Typed slots, not "everything as a string." _Boolean is 0/1, _DateTimeOffset is REAL Julian, _Long is an integer. So comparisons and sorts in SQL run on the column's native type (and use indexes) instead of a string cast. That's the "not EAV" part: _values is a typed props store, and _id_structure_structures carries RTTI about which field this is and what type it is. The query compiler needs the field's type at every step — to pick the slot column for MAX(...) FILTER in the pivot; without RTTI it wouldn't know which column LastName comes out of.
  • Arrays and dictionaries are relational, via _array_parent_id/_array_index, not a JSON blob. So the materializer assembles them with GROUP BY over the index, and the compiler can filter through them (_array_index IS NULL in the pivot is exactly what separates a field's scalar row from its array elements).

From here, "the engine in the database" = functions that read/write exactly these two tables, checking the metadata in _schemes/_structures.


Part 1. The native extension: anatomy

Entry point

A SQLite loadable extension is a .so/.dll/.dylib with one exported init function. The default name is derived from the file basename: for redb.dll, that's sqlite3_redb_init. Inside, it registers all our SQL functions:

SQLITE_EXTENSION_INIT1

int sqlite3_redb_init(sqlite3 *db, char **pzErrMsg, const sqlite3_api_routines *pApi){
  SQLITE_EXTENSION_INIT2(pApi);
  int rc;

  rc = sqlite3_create_function(db, "get_object_json", 1, SQLITE_UTF8, 0,
                               getObjectJsonFunc, 0, 0);
  if(rc != SQLITE_OK) return rc;
  rc = sqlite3_create_function(db, "get_object_json", 2, SQLITE_UTF8, 0,
                               getObjectJsonFunc, 0, 0);   // overload: (_id, max_depth)
  if(rc != SQLITE_OK) return rc;

  rc = sqlite3_create_function(db, "save_object_json", 1, SQLITE_UTF8, 0,
                               saveObjectJsonFunc, 0, 0);
  if(rc != SQLITE_OK) return rc;

  rc = redbRegisterPvt(db);   // the whole pvt_* compiler
  return rc;
}
Enter fullscreen mode Exit fullscreen mode

SQLITE_EXTENSION_INIT1/INIT2 are macros from sqlite3ext.h that rewrite direct sqlite3_* calls into calls through the pApi pointer table. The gotcha that's easy to miss: after INIT2, every SQLite API call inside the extension goes through that table. Forget INIT2 and the extension compiles fine — then segfaults on the first sqlite3_* call through a garbage pointer.

Loading — on every connection

The defining trait of loadable extensions: they are not persistent. SQLite forgets registered functions on a new connection. And Microsoft.Data.Sqlite pools connections. So you must load the extension on every connection, after PRAGMAs. Our wrapper around SqliteConnection does it: open → set foreign_keys=ON and busy_timeout → load the extension → hand it to the pool.

The binary is located by:

  1. An explicit SqliteDataSource.NativeExtensionPath (if set in code).
  2. The REDB_SQLITE_EXTENSION environment variable.
  3. Otherwise redb.{dll,so,dylib} from the NuGet runtimes/<rid>/native/ folder; in dev, by walking up the directory tree to redb.SQLite/native/build.

Pro never sets a path: it doesn't need the native code, and in WASM/mobile there wouldn't be any.

Shipping the binary: why it cross-compiles for free

The extension is a loadable module, and that has a packaging payoff people don't expect: it links against nothing. sqlite3ext.h hands you the API as a pointer table resolved from the host at load time (that's what SQLITE_EXTENSION_INIT2 wires up) — so there's no libsqlite3 to find, no import library, no target sysroot. Cross-compiling collapses to "point CMake at a cross compiler":

# linux-arm64 from an x64 box, in a throwaway container — no sysroot needed
docker run --rm -v "$PWD:/work" debian:bookworm bash -c '
  apt-get update && apt-get install -y cmake make gcc-aarch64-linux-gnu
  cd /work/redb.SQLite/native
  cmake -S . -B build-linux-arm64 -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc
  cmake --build build-linux-arm64'
# → build-linux-arm64/redb.so: ELF 64-bit LSB shared object, ARM aarch64
Enter fullscreen mode Exit fullscreen mode

A native lib that linked libsqlite3 would need the arm64 build of that library to link against; this one doesn't, because it only ever calls SQLite through the host's pointer table. One cross-gcc, one valid arm64 binary.

Delivery has its own twist. SQLite loads the extension by an explicit path (conn.LoadExtension(...)), not like a P/Invoke native lib the host resolves from the NuGet cache — so the file must physically sit in the app output. A RID-targeted dotnet publish -r <rid> flattens runtimes/<rid>/native/ for you; a framework-dependent build (no RID) doesn't, so the package ships a buildTransitive .targets that copies the OS-matching binary into the consumer's output, Exists-gated per RID. That's why "ship one more platform" = "produce one more redb.{so,dylib} and drop it in native/build-<rid>/" — the csproj and .targets already enumerate all five RIDs.

War story #5: the .targets that broke every consumer (3.2.0 → 3.2.1)

That delivery .targets is also where this release drew blood — and since this is an honest teardown, here it is. We shipped 3.2.1 with a banner comment in redb.SQLite.targets that had a row of - characters inside <!-- … -->. XML comments cannot contain --, so MSBuild refused to even load the file: every consumer that pulled the package got

error MSB4024: An XML comment cannot contain '--', and '-' cannot be the last character.
Enter fullscreen mode Exit fullscreen mode

and couldn't build at all. And it rode transitively — redb.SQLite.Pro depends on redb.SQLite, and buildTransitive assets flow to dependents, so the Pro package was just as broken.

Why didn't the 200/200 suite catch it? Because buildTransitive only imports when the package is consumed as a NuGet package. Our own solution references the projects by ProjectReference, which skips the build/ import entirely — so the malformed file sat in every green build, invisible, until the first real dotnet add package redb.SQLite. We found it the moment we built a packaged consumer (a sample app in the public repo), not a second earlier.

The fix is a one-line deletion (and yes — I reproduced the identical bug in a csproj comment while writing the fix; -- is a persistent little landmine). Because you can't overwrite a published version, the corrected packages went out as redb.SQLite / redb.SQLite.Pro 3.2.1, and the broken 3.2.1 pair was unlisted. So, to be exact about versions: the engine, the date handling, the bugs above — all 3.2.1; the two SQLite NuGet packages are 3.2.1. The unglamorous lesson: a package's build/ and buildTransitive assets are shipped code too — validate them from a packaged consumer in CI, because ProjectReference will happily lie to you.

What's inside: a materializer and a compiler

Two big subsystems live in the extension.

get_object_json(_id [, max_depth]) — a recursive materializer. Takes an _id, reads _values, assembles the JSON object that System.Text.Json then deserializes into a typed RedbObject<TProps>. It assembles everything: base fields from _objects, scalars, arrays and dictionaries (relationally, via _array_index / _array_parent_id), nested Class fields, object references, ListItem (including a ListItem that itself carries an Object). This is exactly where RTTI is required: to assemble a nested object, the materializer must know its scheme, its field types, that this field is an array and that one is a reference. Without type metadata, it'd just be a pile of rows in _values.

The pvt_* compiler — ~9k lines of PL/pgSQL logic ported into C as a SQL-string generator: pvt_build_query_sql, pvt_build_aggregate_sql, pvt_build_groupby_sql, pvt_build_window_sql, pvt_build_projection_sql, pvt_build_array_groupby_sql. The pipeline:

LINQ expression
   → (C#) facet filter as JSON + a field list
   → (native) pvt_build_query_sql(scheme, filterJson, ...)
   → a finished SQL SELECT string
   → (C#) execute it, materialize via get_object_json
Enter fullscreen mode Exit fullscreen mode

So the native side is a filter-JSON-to-SQL translator. C# doesn't build the SQL itself (Pro does that) — it builds a filter JSON and asks the database to assemble the SQL. It sounds inside-out, but it's exactly what buys parity: the same filter JSON, on Postgres and on SQLite Free, runs through the same engine — just emitting different dialects.

A bit deeper on get_object_json: recursion, arrays, references

"Assembles JSON" sounds simple until you remember it's assembling a graph. The materializer walks the scheme's structure and, for each field, decides what it is from the RTTI in _structures:

  • scalar — reads the value from the matching _values slot column (for a date, wraps it in strftime so an ISO string goes out);
  • array/dictionary — gathers all rows with the same _array_parent_id, ordered by _array_index, into a JSON array/object (this is why the pivot uses _array_index IS NULL — to separate a field's "scalar" row from its elements);
  • nested Class — recurses into the subtree assembly;
  • reference (_Object) — by max_depth, either expands the target object (recursion with decremented depth) or leaves an id reference;
  • ListItem — a lookup-list entry; and a ListItem can itself carry an Object, so that's one more level of expansion.

The max_depth parameter (the second overload of the function) is a guard against infinite recursion on cyclic references and, at the same time, a budget for how deep to pull related objects into one JSON. None of these transitions is possible without RTTI: the materializer must know that this field is an array and that one is a reference, or it's just rows in _values with no meaning.

The other direction: save_object_json — the write path

Reading has a symmetric twin — save_object_json(json). It takes an object's JSON and, by the scheme, lays it back out into _objects (base) and _values (props). Two SQLite-specific subtleties:

  • Dates on write. In the incoming JSON, dates are ISO strings (that's how System.Text.Json serializes a DateTimeOffset). The native write wraps them in julianday('<iso>') so _DateTimeOffset gets a REAL Julian. It's the mirror of the read-side strftime — and exactly what got fixed this release under "fix save_object_json too": before the fix, a date went in as a string and didn't agree with the REAL column.
  • The props-save strategy. On Free it's PropsSaveStrategy.DeleteInsert: a save = delete the object's existing _values rows and insert the new set (ChangeTracking — a delta off _hash — is a Pro feature). Simple, predictable, no tracking layer — at the cost of extra rewrites; fine for an embedded load.

Loading the right type: the CLR registry and polymorphism

get_object_json hands back JSON — but deserialize it into what? For LoadAsync<EmployeeProps>(id) the answer is in the generic. But for polymorphic loads (GetChildren on a tree where the children are objects of different schemes; LoadDynamicObject) the type is unknown at compile time: you have to map the object's _id_scheme to the *Props class to materialize into.

This got rebuilt this same cycle into a two-layer CLR registry (we were fixing a polymorphic LoadAsync that returned the wrong type out of nowhere):

  • a global scheme name ↔ Type index — self-healing: subscribed to AppDomain.AssemblyLoad, so loading an assembly with new *Props bumps a "generation" and the index rebuilds lazily (no "register all your types at startup");
  • a per-domain scheme_id → Type cache — where "domain" = SHA256(sanitized connection string) or an explicit CacheDomain. Per-domain partitioning matters because one process can hold several databases (including several SQLite files), and scheme_id=1000010 in one is not the same class as in another.

This ties straight to SQLite: the materializer hands back JSON + _id_scheme, and the registry turns _id_scheme into a Type to deserialize a polymorphic child into. Get it wrong and a tree of mixed-type children all materializes into one (wrong) type.

The Postgres → SQLite phrasebook

A port isn't "retype it in C," it's a dialect translation. The most frequent swaps (all real lines from SqliteDialect/the native side):

Postgres SQLite where
array_agg(x) FILTER (WHERE …) json_group_array(x) FILTER (WHERE …) pivoting arrays
x = ANY($1) x IN (SELECT value FROM json_each($1)) IN-lists (SQLite can't bind array params)
EXTRACT(year FROM x) CAST(strftime('%Y', x) AS INTEGER) date parts
col ILIKE $1 col LIKE $1 ESCAPE '\' case-insensitive LIKE
DELETE … WHERE _id = ANY($1) DELETE … WHERE _id IN (SELECT value FROM json_each($1)) delete-by-id-list
DISTINCT ON (col) ROW_NUMBER() OVER (PARTITION BY col) + WHERE rn=1 DistinctBy (its own section below)

That ESCAPE '\' deserves a sentence, because it's a classic landmine. Postgres uses \ as the default LIKE escape character. SQLite (and MSSQL) do not. Meanwhile UserProviderBase escapes user input with a backslash (50%50\%), assuming PG semantics. On SQLite, without an explicit ESCAPE '\', that escaped \% starts matching a literal backslash followed by anything — a silent search corruption. So the SQLite dialect always emits LIKE with an explicit ESCAPE '\'. Small thing, easy half-day to lose.

Minimum version, and why

SQLite 3.44.0+ (Nov 2023). We deliberately lean on FILTER (WHERE …), modern window functions, RETURNING, JSON1, and recursive CTEs. The goal is to keep the SQLite SQL structurally close to the Postgres SQL, not rewrite it from scratch. The closer the dialects, the fewer places they diverge in behavior (not just syntax) — and behavioral divergences are the ones you find in prod, not in the compiler.

Identifiers: AUTOINCREMENT instead of sequences

SQLite has no sequences. redb needs globally-unique ids that both the .NET key generator and the native side (for non-.NET hosts) can hand out. The solution: a native AUTOINCREMENT table with sqlite_sequence as a shared high-water mark. Both the C extension and the C# key generator advance the same counter and reserve id blocks from it. The result: ids are unique no matter who handed them out — .NET or a Python process writing to the same file.

War story #1: %% in sqlite3_mprintf

So that "porting PL/pgSQL to C" doesn't sound sterile — here's the kind of bug the port hands you for free.

The materializer has a column-list macro VCOLS that gets interpolated into an sqlite3_mprintf format string:

char *sql = sqlite3_mprintf("SELECT" VCOLS "FROM _values WHERE _id_structure=?1 "
                            "AND _id_object=?2 AND %s LIMIT 1", cond);
Enter fullscreen mode Exit fullscreen mode

When I switched the datetime columns to emit via strftime('%Y-%m-%dT%H:%M:%fZ', _DateTimeOffset), I dutifully put that strftime right into VCOLS. Everything fell over silently: objects started loading with empty properties, while base dates were fine.

I spent half an hour looking in the wrong place. The reveal: VCOLS flows into the format string of sqlite3_mprintf, and %Y %m %d %H %M %f are format specifiers to mprintf. It started eating the cond argument at the first %Y, the formatting derailed, the SQL came out malformed, sqlite3_prepare_v2 silently returned an error code, and the props function returned empty. Base dates survived because their SELECT goes through prepare_v2 directly, no mprintf.

The fix is escaping — %% (mprintf collapses %%% before SQLite ever sees the string):

// was: ... strftime('%Y-%m-%dT%H:%M:%fZ',_DateTimeOffset) ...   // breaks mprintf
// now:
#define VCOLS " _id,_String,_Long,_Guid,_Double, " \
  "strftime('%%Y-%%m-%%dT%%H:%%M:%%fZ',_DateTimeOffset), " \
  "_Boolean,_ByteArray,_Numeric,_ListItem,_Object,_array_parent_id,_array_index "
Enter fullscreen mode Exit fullscreen mode

The lasting lesson: when you generate SQL through a printf-style formatter, any % in the data is a landmine. What makes it nasty is that the failure is silent: prepare doesn't throw, it returns an error code that's easy to skip, and it surfaces as "why are my props empty."


Part 2. SQLite has no date type. How we lived with that for a release

This is the central engineering story of the release, and it didn't exist yet at announcement time.

Three storage classes, zero date types

In SQLite, datetime is a convention over three storage classes, not a type:

  • TEXT — an ISO-8601 string ('2024-06-15 13:45:30').
  • REAL — a Julian day, a floating-point number (astronomical day count).
  • INTEGER — Unix epoch seconds.

A .NET DateTimeOffset doesn't drop into any of these on its own. You pick a representation and hold it everywhere: on write (the parameter binder), on read (the materializer + scalar converters), in filter comparisons, in aggregates. One miss in any of those points and dates "kind of work" but lie at the boundaries.

Why not TEXT (even though that's how it started)

The first version stored dates as ISO text. It broke exactly the way string-comparing dates breaks.

datetime('now') in SQLite returns a string with a space between date and time: 2024-06-15 13:45:30. But the literal the C# layer interpolates into a comparison arrives with a T: 2024-06-15T13:45:30. TEXT comparison in SQLite is lexicographic, byte-by-byte. Here's what happens at position 10:

'2024-06-15 13:45:30'   byte[10] = 0x20 (space)
'2024-06-15T13:45:30'   byte[10] = 0x54 ('T')
0x20 < 0x54  →  the space string is ALWAYS "less than" the 'T' string
Enter fullscreen mode Exit fullscreen mode

So any "stored (with a space) ⟷ literal (with a T)" comparison goes one way always, regardless of the actual time. Range filters started lying silently. In a cluster, this once marked a live node dead: the heartbeat comparison last_seen < cutoff was "always true" because the stored last_seen (with a space) sorts lexicographically below the cutoff literal (with a T). Node's alive; monitoring thinks it's a corpse.

We could normalize the separator. But that treats the symptom: TEXT comparison stays lexicographic, and any other format drift (milliseconds, a timezone suffix, leading zeros) reopens the same hole. We had to leave strings.

The fix: REAL Julian day, everything in UTC

We moved to REAL Julian day, everything in UTC — exactly how Postgres keeps timestamptz in UTC. Three reasons this is a right choice, not "yet another convention":

1. SQLite's native functions eat the Julian number directly. julianday(), strftime(), datetime(), date() take a REAL Julian as-is, no wrapping:

sqlite> SELECT strftime('%Y-%m-%dT%H:%M:%fZ', 2460477.0732638887);
2024-06-15T13:45:30.000Z
sqlite> SELECT datetime(2460477.0732638887);
2024-06-15 13:45:30
sqlite> SELECT julianday('2024-06-15T13:45:30Z');
2460477.0732638887
Enter fullscreen mode Exit fullscreen mode

So to emit a date into JSON, the materializer just wraps the column in strftime and gets ISO that System.Text.Json parses natively.

2. Comparison becomes numeric. col < X on a double is correct and unambiguous. No lexicographic surprise, because you're comparing numbers, not bytes.

3. And, crucially for prod — it's sargable (index-friendly). Here's the subtlety the whole thing was for. Wrap the column in a function — julianday(col) < X — and the index on col dies: the optimizer can't use an index over an expression of the column. But comparing the raw REAL column to a constantcol < julianday('2024-06-15') — is indexable: bare column on the left, constant on the right. So in SQL generation we put julianday(...) on the literal side, never on the column:

-- NOT this (kills the index on _date_create):
WHERE julianday(o._date_create) >= julianday('2023-01-01')
-- this (sargable):
WHERE o._date_create >= julianday('2023-01-01')
Enter fullscreen mode Exit fullscreen mode

The column stores a Julian number → compare it to a Julian number computed from the literal, on the constant side.

Conversion: no magic, built-in .NET

Converting DateTime/DateTimeOffset ⟷ Julian is arithmetic on the built-in ToOADate/FromOADate. An OLE Automation date (epoch 1899-12-30) differs from a Julian day by exactly the constant 2415018.5:

internal static class SqliteJulian
{
    // Julian = OADate + 2415018.5. ToOADate/FromOADate are built-in and lossless
    // within double precision — the same precision SQLite's julianday() lives in.
    private const double OADateToJulianOffset = 2415018.5;

    // DateTimeOffset → UTC Julian. .UtcDateTime APPLIES the offset → the true UTC instant.
    public static double ToJulian(DateTimeOffset dto) => dto.UtcDateTime.ToOADate() + OADateToJulianOffset;

    // DateTime → UTC Julian. The clock value is treated as UTC per redb's contract
    // (NormalizeForStorage sets Kind=Utc without converting). ToOADate ignores Kind — they agree.
    public static double ToJulian(DateTime dt) => dt.ToOADate() + OADateToJulianOffset;

    // REAL Julian → DateTimeOffset(+00:00)
    public static DateTimeOffset FromJulian(double julian)
    {
        var utc = DateTime.SpecifyKind(DateTime.FromOADate(julian - OADateToJulianOffset), DateTimeKind.Utc);
        return new DateTimeOffset(utc, TimeSpan.Zero);
    }
}
Enter fullscreen mode Exit fullscreen mode

Where it plugs in: four points

For dates not to lie, the REAL-Julian representation has to hold at every point where a value crosses the C# ⟷ SQLite boundary:

Write — the central parameter binder. Every write (base dates and props alike) goes through one place — CreateCommand in SqliteRedbConnection. There, DateTimeOffset/DateTime turn into double:

switch (param)
{
    case DateTimeOffset dto:
        // REAL Julian day (UTC) — SQLite's native date format.
        sqliteParam.Value = SqliteJulian.ToJulian(dto);
        break;
    case DateTime dt2:
        sqliteParam.Value = SqliteJulian.ToJulian(dt2);
        break;
    // ...
}
Enter fullscreen mode Exit fullscreen mode

Read scalars — ConvertScalar. A value from SQLite arrives as double; for a temporal target we convert back:

if (targetType == typeof(DateTimeOffset))
    return (T)(object)(value is double jdo ? SqliteJulian.FromJulian(jdo)
        : value is DateTimeOffset d ? d
        : value is DateTime dt ? new DateTimeOffset(dt, TimeSpan.Zero)
        : DateTimeOffset.Parse(value.ToString()!, ...));
Enter fullscreen mode Exit fullscreen mode

Read rows — MapRow. The same double → DateTimeOffset/DateTime/DateOnly when mapping columns to properties.

Native. get_object_json emits the date via strftime(ISO, col) (see the %% story), and pvt comparisons wrap the literal in julianday('<iso>').

Three CLR types on one column

The _values._DateTimeOffset column (REAL) serves three CLR types:

  • DateTimeOffset — directly.
  • DateTime — clock value as UTC (redb's contract).
  • DateOnly — UTC midnight.

(TimeOnly/TimeSpan go to _String.) The scheme's RTTI disambiguates them: the field's db-type tells the materializer what to unfold the double into.

Timezones: why +4 resolves correctly

A recurring comment-section question: "if I write a DateTimeOffset with a +04:00 zone in LINQ, does it compare correctly against what's in the DB?" Yes, for two reasons:

  • C# side: ToJulian(DateTimeOffset) takes dto.UtcDateTime — and .UtcDateTime applies the offset and yields the true UTC instant. So +04:00 collapses to UTC before Julian.
  • Native side: even if an offset leaks into the literal, the comparison wraps it in julianday('<iso-with-offset>'), and julianday() parses the offset itself:
sqlite> SELECT julianday('2026-06-25T20:00:00+04:00') = julianday('2026-06-25T16:00:00Z');
1
Enter fullscreen mode Exit fullscreen mode

Stored UTC → compared by UTC instant → it agrees with any incoming offset.

War story #2: analytics and a FormatException from nowhere

The trickiest part of the datetime story.

A normal object load goes through get_object_json — it emits the date as an ISO string (via strftime), and C# parses it normally. But analyticsMinRedbAsync/MaxRedbAsync, AggregateRedbAsync, windows, group-bys — bypass get_object_json. They pull the date column straight into the SELECT and hand the raw Julian number to the core converter. The converter expected a string or a DateTime. The result: a FormatException out of nowhere (and funnier still: elem.GetInt64() on a fractional 2460477.07 — because the code assumed an integer Unix timestamp).

The fix was a design fork. We could patch a SQLite hack right into the core — but redb.Core has no knowledge (and shouldn't) of Julian: that's a SQLite storage detail, and the core serves three dialects. PG/MSSql return dates fine, and dragging the word "Julian" into the shared converter would leak one provider's detail into all of them.

We did it through a neutral extension point. In the core, an optional "number → temporal type" hook, empty by default:

// redb.Core: the core does NOT know the word "Julian". Only: "if a NUMBER targets a
// date and a decoder is registered, ask it."
public static class TemporalDecoder
{
    public static Func<double, Type, object?>? NumericDecoder;

    public static bool IsTemporal(Type t) =>
        t == typeof(DateTime) || t == typeof(DateTimeOffset) || t == typeof(DateOnly);

    // Convert.ChangeType that first lets a number-to-date pass through the decoder.
    public static object ChangeType(object value, Type targetType) =>
        TryDecode(value, targetType, out var d) && d != null ? d
        : System.Convert.ChangeType(value, targetType);
}
Enter fullscreen mode Exit fullscreen mode

And the SQLite provider itself registers the decoder — in SqliteDataSource's static constructor, which runs for both Free and Pro:

TemporalDecoder.NumericDecoder = static (julian, targetType) =>
{
    var dto = SqliteJulian.FromJulian(julian);
    if (targetType == typeof(DateTimeOffset)) return dto;
    if (targetType == typeof(DateOnly))       return DateOnly.FromDateTime(dto.UtcDateTime);
    return dto.UtcDateTime; // DateTime
};
Enter fullscreen mode Exit fullscreen mode

Then two places in the core, through which all analytics materialization flows (for both Free and Pro, because Pro has no materializer of its own — it reuses the core converters):

  1. JsonValueConverter — the Number branch → temporal type (group-by, window, projections).
  2. A TemporalDecoder.ChangeType wrapper at scalar points (MinRedbAsync/MaxRedbAsync, AggregateResult.Get<T>).

PG/MSSql never return a number for a date — their NumericDecoder stays null, behavior unchanged to the byte. The constant 2415018.5 and the word "Julian" stay inside redb.SQLite, and the core stays storage-agnostic.

That, to my taste, is the shape of a right fix: the problem is local (SQLite stores dates as numbers), and the cure isn't smearing SQLite specifics across the core — it's one generic extension point that only the party who needs it ever pulls.


Part 3. The pvt compiler: how a filter becomes SQL

Since the engine is a filter-JSON-to-SQL translator, let's take apart a piece of that translator. This is the most "a database inside the database" part.

The filter JSON

C# folds a LINQ Where into a facet filter — JSON the native side understands. Say Where(e => e.LastName == "NullableTest") on a prop field gives:

{ "LastName": { "$eq": "NullableTest" } }
Enter fullscreen mode Exit fullscreen mode

And WhereRedb(o => o.ParentId == null) on a base field (the 0$: marker means "this is base, not a prop"):

{ "0$:ParentId": null }
Enter fullscreen mode Exit fullscreen mode

Combining the two conditions is an implicit $and over the object's keys:

{ "0$:ParentId": null, "LastName": { "$eq": "NullableTest" } }
Enter fullscreen mode Exit fullscreen mode

The split: push vs residual

The key function is pvtSplitFilter. It divides the filter into two parts:

  • push — conditions on base fields and props that can be pushed inside the CTE (into the _objects/_values subquery).
  • residual — what's applied outside, on top of the assembled pivot.

That split decides which "shape" the query takes. There are three:

  • Shape A — pure-base flat: the filter is base-only, no props. No CTE at all: SELECT _id FROM _objects o WHERE o._id_scheme=? AND <push>.
  • narrow — props exist, the filter reduces to the pivot: build a _pvt_cte (pivoting the needed structures via MAX(...) FILTER (WHERE _id_structure=? AND _array_index IS NULL)), join _objects.
  • non-narrow — there are non-pushable checks (e.g. presence), an outer WHERE is needed.

The actual assembled SQL for Where(LastName) + WhereRedb(ParentId IS NULL) (narrow shape):

WITH _pvt_cte AS (
    SELECT v._id_object,
           MAX(v._String) FILTER (WHERE v._id_structure = 1000012 AND v._array_index IS NULL) AS "LastName"
    FROM _values v
    WHERE v._id_structure IN (1000012)
      AND v._id_object IN (
            SELECT o._id FROM _objects o
            WHERE o._id_scheme = 1000010 AND o._id_parent IS NULL   -- ← pushed base condition
          )
    GROUP BY v._id_object
)
SELECT o._id FROM _pvt_cte
JOIN _objects o ON o._id = _pvt_cte._id_object
WHERE "LastName" = 'NullableTest'                                    -- ← residual prop condition
Enter fullscreen mode Exit fullscreen mode

Notice: the base condition o._id_parent IS NULL went inside the _objects subquery (push), and the prop condition on LastName stayed outside (residual). That's not incidental — it's exactly what pvtSplitFilter does, and exactly where we had a bug (next).

War story #3: a multi-key filter dropped null

WhereRedb(o => o.ParentId == null).Where(e => e.LastName == "X") on Free returned rows that have a parent. The IS NULL condition silently vanished — but only when combined with a prop filter. Base-only WhereRedb(o => o.ParentId == null) worked.

Diagnosis. Calling the native function directly on three filters:

-- 1) base IS NULL only — WORKS:
SELECT pvt_build_query_sql(1000010, '{"0$:ParentId":null}', ...);
 ... WHERE o._id_scheme = 1000010 AND o._id_parent IS NULL

-- 2) base equality — WORKS:
SELECT pvt_build_query_sql(1000010, '{"0$:ParentId":5}', ...);
 ... WHERE o._id_scheme = 1000010 AND o._id_parent = 5

-- 3) base IS NULL + prop — the base condition VANISHED:
SELECT pvt_build_query_sql(1000010, '{"0$:ParentId":null,"LastName":{"$eq":"X"}}', ...);
 ... (only LastName in the CTE, no o._id_parent IS NULL)
Enter fullscreen mode Exit fullscreen mode

So {"0$:ParentId":null} on its own is fine, but as part of multiple keys it's lost. The root is in pvtSplitFilter's multi-key branch. To split each key separately, it reassembles a single-key filter object per key via pvtSingleton:

static char *pvtSingleton(sqlite3 *db, const char *k, const char *v_json){
  sqlite3_stmt *st = 0; char *r = 0;
  sqlite3_prepare_v2(db, "SELECT json_object(?1, json(?2))", -1, &st, 0);
  // ...
}
Enter fullscreen mode Exit fullscreen mode

And the loop took each key's value from json_each:

// WAS:
"SELECT key, value FROM json_each(?1)"
Enter fullscreen mode Exit fullscreen mode

The trap: for a JSON null, the value column in json_each is SQL NULL. So v_json arrived as an empty string, json("") is a parse error, json_object(...) returned NULL, the singleton came out NULL → pvtSplitFilter on a NULL filter returned "nothing" → the condition silently disappeared. (The same would happen with a bare text value: json_each.value returns text without quotes, and json("NullableTest") is again a parse error.) The base-only path worked because it takes the value's type from a separate type column, where null is detected correctly.

The fix — reassemble the value into a valid JSON atom by type, in SQL, before pvtSingleton:

// NOW:
"SELECT key, CASE type "
"  WHEN 'text'  THEN json_quote(value) "   // "X" with quotes
"  WHEN 'null'  THEN 'null' "              // valid JSON null
"  WHEN 'true'  THEN 'true' "
"  WHEN 'false' THEN 'false' "
"  ELSE value END "                        // integer/real/object/array — already valid JSON
"FROM json_each(?1)"
Enter fullscreen mode Exit fullscreen mode

After re-encoding, {"0$:ParentId":null} stays valid JSON, the singleton assembles, and the condition reaches push and attaches to the _objects subquery. Lesson: json_each.value is a lossy source — it drops type (null → SQL NULL, text → unquoted); if you reconstruct JSON from it, do it off the type column.

DistinctBy: emulating DISTINCT ON with ROW_NUMBER()

Postgres has DISTINCT ON (col) — "one row per value of col." SQLite has none. Pro already solved it in ProSqlBuilder via ROW_NUMBER(); on Free the native distinct_on was ignored (an explicit TODO), and DistinctBy(e => e.Department) returned duplicates.

Bringing it to parity. pvt_build_query_sql accepts a 12th argument distinct_on — but the wrapper function only read up to the 11th, so C# sent the param and the native side dropped it. The fix is threefold:

  1. Read the 12th argument and thread it through.
  2. Pull the distinct field into the pivot. If the field isn't mentioned in the filter/sort, it's not in the collected fields → not in the CTE → nothing to partition on. So the distinct field gets mixed into field collection (pvtCollectFields) by the same mechanism as ORDER BY.
  3. Wrap the result in a _ranked CTE with ROW_NUMBER() and keep rn=1:
WITH _pvt_cte AS ( ... pivot Department ... ),
_ranked AS (
  SELECT o._id AS _id,
         ROW_NUMBER() OVER (PARTITION BY _pvt_cte."Department" ORDER BY o._id) AS _rn
  FROM _pvt_cte
  JOIN _objects o ON o._id = _pvt_cte._id_object
)
SELECT _id FROM _ranked WHERE _rn = 1
Enter fullscreen mode Exit fullscreen mode

The partition expression resolves from field metadata: base → o.<column>, prop → _pvt_cte."<FieldName>" (the pivot column). The group representative is the row with the minimum o._id (same as Pro). All of this engages only when distinct_on is present; ordinary queries take the old path — zero regression risk for 99% of queries.


Part 4. Free vs Pro: where Pro accidentally reached into Free

Architecturally Free and Pro share the base providers (redb.Core) but diverge at materialization:

  • Free calls get_object_json (native) to assemble objects.
  • Pro materializes in C# (ProLazyPropsLoader, ProSqlBuilder) and must never call a native function.

And here a principled bug surfaced. DeleteSubtreeAsync (subtree delete) collects descendant ids via the base TreeProviderBase.CollectDescendantIds. Pro overrides the loading tree methods (GetChildren, GetPolymorphicChildren, LoadDynamicObject) with C# materialization — but it did not override CollectDescendantIds, and that one used a recipe with get_object_json:

-- Tree_SelectPolymorphicChildren — the recipe that called the native function:
SELECT o._id as ObjectId, o._id_scheme as SchemeId, get_object_json(o._id, 1) as JsonData
FROM _objects o WHERE o._id_parent = $1
Enter fullscreen mode Exit fullscreen mode

On PG/MSSql this passes silently: get_object_json there is a server-side function present in every tier. On SQLite Pro the function isn't there (Pro doesn't load the native code) → a hard crash, no such function: get_object_json. And on PG/MSSql Pro it was silent waste: materializing each subtree node's full JSON only to throw the JSON away and take the _id.

The fix isn't a Pro override (that would leave the base calling get_object_json for a method that doesn't need JSON); it's removing JSON from the base method itself — it only needs an id list. We added an id-only recipe to all three dialects:

// ISqlDialect + PostgreSqlDialect / MsSqlDialect / SqliteDialect:
public string Tree_SelectChildrenIds() =>
    "SELECT o._id FROM _objects o WHERE o._id_parent = $1 ORDER BY o._name, o._id";
Enter fullscreen mode Exit fullscreen mode
// CollectDescendantIds — was QueryAsync<ChildObjectInfo>(Tree_SelectPolymorphicChildren), now:
var childIds = await Context.QueryScalarListAsync<long>(Sql.Tree_SelectChildrenIds(), parentId);
foreach (var childId in childIds) { ids.Add(childId); await CollectDescendantIds(childId, ids, ...); }
Enter fullscreen mode Exit fullscreen mode

Now the leak is closed at the source for every tier: Free doesn't materialize extra JSON for an id list, Pro doesn't reach into the native code, PG/MSSql Pro stop materializing for nothing. Pro source now has exactly zero get_object_json calls. And, incidentally, SQLite Pro turned out to be the perfect detector for these leaks: it crashes on any native call out of Pro — the thing PG/MSSql silently tolerate.

Also: DeleteSubtree and cascade

The same DeleteSubtree returned the wrong delete count. On SQLite the schema has FK _id_parent ... ON DELETE CASCADE — delete a parent and the children go by cascade. But changes() (rows-affected) does not count cascade-deleted rows. So DELETE WHERE _id IN (parent, children) could return 1 (only the parent deleted directly; children by cascade). We fixed the semantics: the method returns the size of the collected subtree (objectIds.Count), not the cascade-dependent rows-affected. On PG/MSSql (no cascade on _id_parent) it's the same number — no divergence.

Bool is an INTEGER

One more small one, surfaced in group-bys. SQLite has no boolean type — it stores 0/1 as INTEGER. In a pivot/projection, a bool value reaches the shared converter as a JSON number, and the bool branch in JsonValueConverter only caught true/string → the number 1 produced false. Group-by on a bool key collapsed (everything "false"). The fix — accept Number in the bool branch:

Type t when t == typeof(bool) => elem.ValueKind == JsonValueKind.True
    || (elem.ValueKind == JsonValueKind.Number && elem.TryGetDouble(out var bn) && bn != 0)
    || (elem.ValueKind == JsonValueKind.String && bool.TryParse(elem.GetString(), out var bl) && bl),
Enter fullscreen mode Exit fullscreen mode

PG/MSSql send true/false — untouched.


Part 5. How to poke at it yourself

This isn't article pseudocode. The repo ships two tools you verify all of this with by hand.

redb.Examples — ~150 runnable examples that run on any provider, SQLite included:

dotnet run --project redb.Examples -- E021 E146 E148   # date filter, aggregates, windows
Enter fullscreen mode Exit fullscreen mode

Flip AddRedb/AddRedbPro + UseSqlite and the same suite runs live on SQLite Free or Pro. The same code runs on Postgres/MSSql unchanged.

redb.CLI — a global .NET tool for schema and data management, supports sqlite across every command:

redb schema -p sqlite -o redb_sqlite.sql            # dump the full schema SQL (review/CI)
redb init   -p sqlite -c "Data Source=app.db"       # create the tables in an empty DB
redb export -p sqlite -c "Data Source=app.db" -o data.redb --compress
redb import -p sqlite -c "Data Source=app.db" -i data.redb --clean
Enter fullscreen mode Exit fullscreen mode

And the part that matters — trust is verified by tests. SQLite Free and Pro each pass the integration suite 200/200 — the same suite that gates Postgres and MSSql. For a brand-new provider that's worth more than adjectives: same suite, same bar.

War story #4: the CLI that "supported" sqlite — but silently didn't

While preparing this post, I wanted to show redb schema -p sqlite — and walked straight into my own tooling landmine. redb.CLI's code did support sqlite: a full SqliteProvider, the ProviderFactory.Create("sqlite") factory, the redbSqlite.sql schema resource. The csproj did not:

<!-- redb.CLI.csproj — pulled the engine from an old NuGet version: -->
<PackageReference Include="redb.SQLite" Version="1.2.*" />
Enter fullscreen mode Exit fullscreen mode

1.2.* is a version where the SQLite provider didn't exist at all (it's new, at 3.2.1). So typeof(redb.SQLite.RedbService).Assembly and the embedded redbSqlite.sql resource resolved against an assembly that doesn't have them, and any -p sqlite command silently drifted from the real code. The fix — point the references at the local projects (as redb.Examples already does):

<ProjectReference Include="..\redb.SQLite\redb.SQLite.csproj" />
Enter fullscreen mode Exit fullscreen mode

Same moral as the %% story: "the code supports it" ≠ "the build sees it." A version pin is part of the contract too, and a stale pin breaks a feature as quietly as a typo in SQL. It's trivially checkable — by running the command itself: redb schema -p sqlite now dumps the real schema (REAL Julian, _DateTimeOffset REAL) instead of failing on an empty resource.


Part 6. Traps you will hit

This series is honest about "what's not done and what you'll trip over," so no glossing:

  • The .db path follows the working directory. A relative string (Data Source=app.db) creates the file relative to the process cwd, not the project folder. I lost a couple of hours to this myself: dotnet run from different directories wrote to different files, and tests "passed/failed" against different DBs. Use an absolute path or pin the cwd.
  • :memory: is per-connection. For a connection pool to see one in-memory DB you need Mode=Memory;Cache=Shared plus one held-open connection. That's SQLite's lifecycle, not redb's: close the last connection and the DB evaporates.
  • NUMERICREAL by default. Fast, but lossy past double range. An exact path via TEXT is a planned setting. A known SQLite weak spot.
  • SQLite is single-writer. One writer per file; redb sets busy_timeout for concurrent writes, but don't expect Postgres-grade parallelism. For embedded/local, that's normal.
  • Free's native binaries ship for Windows x64, Linux x64 and Linux arm64. All three are packed into runtimes/<rid>/native/ and delivered to framework-dependent builds by a buildTransitive .targets — the extension is loaded by an explicit path, so the file must physically land in your output, and NuGet doesn't flatten runtimes/ for a no-RID build. macOS (osx-x64/osx-arm64 .dylib) builds from the same CMake project but needs a macOS runner — the one remaining gap, slated for the CI matrix. Pro has no native dependency — it's already everywhere today, which is exactly what WASM/mobile need.
  • bool in raw form is 0/1. Remember it when debugging _Boolean/_value_bool: true is stored as 1.

None of these poke through ordinary code — but in a debug session each one saves an evening.


Part 7. Pro on mobile and in the browser — and yes, free

Back to what the whole SQLite effort was for.

Writing Blazor WebAssembly, MAUI, or a standalone client? You want SQLite Pro: pure C#, loads no native code, runs in the browser sandbox and on a phone. A typed LINQ store in a single file inside your app.

And the part that draws skepticism, so plainly: Pro for this is free.

  • Go to redbase.app, register, and email a key request — you get a free license key back.
  • No bank/payment details. No card asked. Registration is the key-issuing mechanism, not a sales funnel.
  • The key goes in .WithLicense(...); the wiring instructions are right there after you register.

The barrier to a client-side scenario is zero.


Wrap-up

The SQLite provider made us do two non-obvious things and catch three bugs that would've cost more in prod than any review.

The non-obvious things: move the whole query engine into a C extension where the database has no stored procedures (and where a % in your data breaks mprintf), and re-answer "how do you store a date" for a database that has no date type — REAL Julian in UTC, sargable comparisons, julianday() on the literal side, and a neutral core hook instead of leaking SQLite specifics.

The bugs: a silently-dropped IS NULL in a multi-key filter (json_each.value is lossy by type), DISTINCT ON via ROW_NUMBER() instead of an ignore, and Pro accidentally reaching into the Free-only get_object_json on the subtree-delete path.

These are exactly the spots where the "one LINQ for every database" abstraction either holds or leaks. Ours holds: Free and Pro are green at 200/200 on the same suite as the other dialects, and you can poke at it with redb.Examples and redb.CLI from the repo.

Repo, docs, packages — redbase.app. The stack is at 3.2.1; the two SQLite NuGet packages (redb.SQLite / redb.SQLite.Pro) are 3.2.1 — a buildTransitive .targets hotfix, see War story #5. Questions like "does it do X on SQLite," bug reports — bring them; the provider's new and the feedback channel is open.

Top comments (0)