rinat kozin

Posted on May 21

An EF Core alternative for .NET apps with complex object graphs — full LINQ, no migrations, no DbContext

#dotnet #csharp #database #opensource

Today I'd like to slow down a bit and talk about redb.Core — the data engine at the heart of the RedBase ecosystem. The other pieces (redb.Route for pipelines, redb.Tsak for cluster runtime) lean on it, but this post is just about the database part.

I've been working on this project for several years. It started as an attempt to get rid of migrations and turned into what it is now — a typed object store for .NET over PostgreSQL and MSSQL.

It's not a weekend prototype. The free packages on NuGet are at version 2.0, there are 43 packages across the ecosystem, the architecture went through three rewrites, and as of this week it's been running 3 months on production at a 30-year national food distributor (more on that below).

This post is a technical walkthrough of redb.Core — what it is, how it differs from EF Core, what the generated SQL actually looks like, what the production workload looks like, and what's shipping next.

What redb.Core actually is

github.com/redbase-app/redb — Apache 2.0

RedBase stores typed C# objects in two tables (_objects + _values) over PostgreSQL or Microsoft SQL Server. Not JSON blobs. Not JSONB. Real typed columns with FK constraints — NUMERIC(38,18) for money, timestamptz for dates, uuid for GUIDs. Real B-tree indexes. ACID transactions.

The schema is your C# class:

[RedbScheme("Employee")]
public class EmployeeProps
{
    public string FirstName   { get; set; } = "";
    public string LastName    { get; set; } = "";
    public int    Age         { get; set; }
    public decimal Salary     { get; set; }
    public DateTime HireDate  { get; set; }
    public string[]? Skills   { get; set; }
    public Address? HomeAddress { get; set; }
    public Dictionary<int, decimal>? BonusByYear { get; set; }
}

That attribute is the entire schema definition. Call SyncSchemeAsync<EmployeeProps>() once — done. Add a property next sprint — redeploy, call sync again. Old objects still work. No migration files. No DBA ticket. No 2am rollback story.

// Save — entire object graph, one call
await redb.SaveAsync(employee);

// Load — full graph, arrays, dicts, nested classes — all materialized
var e = await redb.LoadAsync<EmployeeProps>(id);

// Query — real LINQ, real SQL
var seniors = await redb.Query<EmployeeProps>()
    .Where(e => e.Salary > 100_000 && e.Age >= 35)
    .OrderByDescending(e => e.Salary)
    .Take(50)
    .ToListAsync();

No DbContext. No Include chains. No Add-Migration. No mapper layer.

Object graphs in one call

This is the part that surprises people coming from EF. Props can contain other Props — single references, arrays, dictionaries — and the entire graph saves and loads as one operation:

[RedbScheme("Order")]
public class OrderProps
{
    public Customer Customer { get; set; }                          // nested class
    public Address ShippingAddress { get; set; }                    // nested class
    public Product[] Products { get; set; }                         // array of classes
    public RedbObject<PaymentProps>[] Payments { get; set; }        // array of full objects with own IDs
    public Dictionary<string, RedbObject<CouponProps>> Coupons { get; set; }  // dict of objects
    public Dictionary<(int Year, string Quarter), string> Reviews { get; set; } // tuple-key dict
}

await redb.SaveAsync(order);   // entire graph persisted, FK ordering handled
var loaded = await redb.LoadAsync<OrderProps>(id);
// loaded.Props.Customer.Address — ready
// loaded.Props.Payments[0].Props — ready (full RedbObject with own Id, DateCreate, etc.)
// loaded.Props.Coupons["SUMMER20"].Props — ready

In EF Core this would be 28 tables, ~40 Include/ThenInclude calls, manual junction tables for the many-to-many, and INSERT ordering that breaks every time someone adds a non-nullable FK without a default.

In RedBase: one SaveAsync, one LoadAsync. The nested RedbObject instances are real first-class objects — they have their own IDs, their own timestamps, they can be queried independently, they participate in tree structures. They are not denormalised JSON glued to the parent.

What the LINQ actually compiles to

Where(e => e.Salary > 100_000 && e.Age >= 35) doesn't get serialized to JSON and re-parsed (that's the Free engine's path — covered later). In Pro, the C# expression tree is walked node-by-node and emitted as parameterized SQL. Roughly:

WITH pvt AS (
  SELECT v._id_object,
         (array_agg(v._Numeric) FILTER (WHERE v._id_structure = $1))[1] AS "Salary",
         (array_agg(v._Long)    FILTER (WHERE v._id_structure = $2))[1] AS "Age"
    FROM _values v
   WHERE v._id_structure = ANY($3::bigint[])
     AND v._id_object IN (SELECT o._id FROM _objects o WHERE o._id_scheme = $4)
   GROUP BY v._id_object
)
SELECT o.*
  FROM _objects o
  JOIN pvt ON pvt._id_object = o._id
 WHERE pvt."Salary" > $5
   AND pvt."Age"    >= $6
 ORDER BY pvt."Salary" DESC
 LIMIT 50;

Parameterized. Plan-cached by PostgreSQL. One index scan on (_id_structure, _id_object), one aggregation pass, B-tree filter on flat columns. The number of filter fields doesn't change the shape of the query.

The C# → SQL compiler handles arithmetic (*, +, %), Math.*, String.Contains/StartsWith/Trim/ToLower, DateTime.Year/Month/..., nullable navigation (x.Address?.City) compiled to IS NOT NULL, the ternary operator compiled to CASE WHEN, StringComparison.OrdinalIgnoreCase compiled to native ILIKE, dictionary access dict["key"] compiled to pivot columns, and a few more edge cases that EF Core itself doesn't always handle.

You can preview the SQL of any query without executing it:

var sql = await query.ToSqlStringAsync();

Like IQueryable.ToQueryString() in EF, but works for trees, GroupBy, window functions too.

Why bulk save is so fast — two tables, two streams

Something to understand before the change-tracking section: the storage layout is two tables, so SaveAsync of a batch is two bulk operations, not N round-trips.

On PostgreSQL the provider uses Npgsql's BeginBinaryImportAsync — native COPY protocol, binary format:

// from redb.Postgres/Data/NpgsqlBulkOperations.cs
await using var writer = await conn.BeginBinaryImportAsync(
    "COPY _objects (_id, _id_parent, _id_scheme, _name, ...) FROM STDIN (FORMAT BINARY)");
foreach (var obj in objectsList) {
    await writer.StartRowAsync();
    await writer.WriteAsync(obj.Id,        NpgsqlDbType.Bigint);
    await writer.WriteAsync(obj.IdScheme,  NpgsqlDbType.Bigint);
    // ... typed writes
}
await writer.CompleteAsync();

For a batch save: one COPY stream writes the _objects rows, another writes the _values rows. Two streams, no per-row round-trips, no string-formatted INSERTs. MSSQL uses SqlBulkCopy for the same role.

This is why 1000 routes × ~40 fields = ~40,000 value rows save in tens of milliseconds inside the 200–300 ms budget. The bottleneck is the network round-trip and the COPY write, not the ORM machinery — there isn't any ORM machinery in the hot path.

Change tracking without DbContext (Pro)

DbContext keeps an in-memory snapshot of every entity you load — that's how it knows what changed. It's also why it isn't thread-safe and why the cache dies with the request.

RedBase Pro takes a different approach. With PropsSaveStrategy.ChangeTracking (Pro only; the free tier uses DeleteInsert), SaveAsync does this on a batch:

One bulk SELECT of existing values for all object IDs being saved.
Build two ValueTreeNode trees — one from your in-memory objects, one from the DB state.
Structural diff — subtrees with matching hashes are skipped entirely (no value comparison, no child traversal). Inserts, updates, and deletes are computed per node.
Three bulk operations — BulkInsertValuesAsync, BulkUpdateValuesAsync, BulkDeleteValuesAsync — each a single round-trip. Inserts go through COPY BINARY again.

Net effect: changing one field in a deeply nested object emits one UPDATE, not a full delete-and-reinsert of the entire props graph. And the comparison happens in C# on the application side — no DbContext lifetime to worry about, safe to run from a background Channel consumer or a Parallel.ForEach.

(In the production code from the previous section, the application also does its own obj.ComputeHash() check at the route level. That part is optional business-code — you could call SaveAsync on every route and the Pro tree-diff would still skip unchanged values internally. It's there as a coarse pre-filter so unchanged routes don't even enter the save pipeline at all, saving the diff work too.)

Production deployment — the numbers

The biggest deployment right now:

30-year national HoReCa food distributor
~150,000 orders/month, ~20,000 B2B customers, 600+ cities
3-node cluster, 4 cores / 8 GB RAM / 50 GB SSD per node
~550 daily internal users (operators, drivers, supervisors, dispatch, back-office)
10–15% CPU under full load
Integrations: SAP, Kafka, RabbitMQ, GPS feeds, Mercury / EGAIS / government APIs

Three months in production, no data-layer incidents. Two projects in the company use it, the second one came after the first one proved stable.

Real workload, real timings

The hottest pipeline is the SAP monitoring sync. Every 60 seconds a SQL polling consumer calls a stored procedure on SAP S/4 (usp_TsUM_MonitoringReport_xml), gets ~1000 transportation orders back as XML, syncs reference dictionaries (drivers, vehicles, list items), bulk-loads existing routes from RedBase, hash-compares each one, and saves only the changed ones.

The whole loop fits in ~200–300 ms for ~1000 routes. The actual production code:

From("direct://tsum")
    .RouteId("tsum-processing")
    .ProcessWithRedb(async (redb, exchange, ct) =>
    {
        var orders = (List<TransportationOrder>)exchange.In.Body;

        var sw = Stopwatch.StartNew();

        // 1. Sync dictionaries (Drivers, Vehicles, ListItems) — only new/changed
        var dicts = await DictionarySyncService.SyncFromOrdersAsync(redb, orders, ct);
        if (dicts.DriversNew + dicts.DriversChanged + dicts.VehiclesNew
            + dicts.VehiclesChanged + dicts.ListItemsRelinked > 0)
            await RefDataCache.RefreshAsync(redb);
        var syncMs = sw.ElapsedMilliseconds;

        // 2. Bulk load existing routes by Code — one query, ~1000 codes
        var codes = orders.Select(o => o.Code).ToList();
        var existing = await redb.Query<TransportationRoute>()
            .WhereRedb(o => codes.Contains(o.ValueString!))
            .ToListAsync();
        var existingByCode = existing.ToDictionary(o => o.ValueString!, o => o);

        // 3. Merge: update if hash changed, insert if new, skip if unchanged
        var toSave = new List<IRedbObject>();
        var updatedCount = 0;
        var skippedCount = 0;
        foreach (var order in orders)
        {
            var routeProps = MapOrderToRouteProps(order, dicts);
            if (existingByCode.TryGetValue(order.Code, out var obj))
            {
                var hashBefore = obj.ComputeHash();
                EnrichRouteFromOrder(obj.Props!, routeProps);
                if (obj.ComputeHash() != hashBefore) { toSave.Add(obj); updatedCount++; }
                else skippedCount++;
            }
            else
            {
                toSave.Add(new RedbObject<TransportationRoute>
                {
                    name = $"Route {order.Code}",
                    value_string = order.Code,
                    Props = routeProps
                });
            }
        }

        // 4. One batched save — mixed inserts + updates
        if (toSave.Count > 0)
            await redb.SaveAsync(toSave);

        Logger.LogInformation(
            "[TSUM] orders={Orders} routes(+{Created} ~{Updated} ={Skipped}) " +
            "drivers(+{DN} ~{DC}) vehicles(+{VN} ~{VC}) " +
            "sync={Sync} query={Query} save={Save} total={Total}ms",
            orders.Count, toSave.Count - updatedCount, updatedCount, skippedCount,
            dicts.DriversNew, dicts.DriversChanged,
            dicts.VehiclesNew, dicts.VehiclesChanged,
            syncMs, queryMs, saveMs, sw.ElapsedMilliseconds);
    });

TransportationRoute has ~40 fields and 12 RedbListItem references (Driver, Vehicle, CarMark, ShippingPoint, BusinessType, PlaceTo, PlaceFrom, LoadingZone, TransportStatus, Risk, DeliveryStatus, LoadStatus) plus 2 object references to AD users. Every single one is a foreign key in the database. None of them require a JOIN at query time — the materializer handles it.

Smaller queries (point-lookup of a single route, dictionary fetch, REST endpoints for the UI) run in 50–100 ms including HTTP overhead.

What's actually in the database

After running this in production, the storage looks like this:

~1000 transportation routes/day, plus delivery points, transport snapshots, garage states, slice settings, slice snapshots, drivers, vehicles, yard places, AD user refs — about a dozen [RedbScheme] classes in active use
~500,000 objects in _objects (routes, points, snapshots, dictionary items)
~15,000,000 rows in _values (every typed property of every object)
All of that lives in 2 tables — _objects and _values — plus the system tables for schemes, structures, lists, users, permissions

15 million typed value rows. Two tables. No 30-table schema. No 200 migration files. The query times above are on this dataset.

If you tried to model the same domain in EF Core flat tables, you'd end up with roughly: Routes, RoutePoints, TransportSnapshots, GarageStates, SliceSettings, SliceSnapshots, Drivers, Vehicles, CarMarks, ShippingPoints, BusinessTypes, YardPlaces, LoadingZones, TransportStatuses, DeliveryStatuses, Risks, TripRisks, AdUsers — plus junction tables and audit tables for each. 30+ tables, dozens of migrations, and every schema change is a deploy event.

In RedBase the schema change is git push. The next InitializeAsync() call adds the new structure rows. Done.

What would EF Core look like here?

I asked myself the same question before starting. Let's count what EF would need:

TransportationRoute entity → 1 table
12 ListItem references → 12 lookup tables + 12 nullable FKs
2 AD user references → 2 more FKs
The 1000-route batch update → 1000 entities tracked in DbContext for change detection
Hash-based skip of unchanged objects → doesn't exist in EF out of the box; you write it manually
The same logic across multiple processes/routes → DbContext is per-request, the change-tracker cache dies at the end of every batch

The realistic EF flow looks like this:

// Load existing — needs Include for every lookup or you get N+1
var existing = await db.Routes
    .Include(r => r.Driver)
    .Include(r => r.Vehicle)
    .Include(r => r.CarMark)
    .Include(r => r.ShippingPoint)
    .Include(r => r.BusinessType)
    .Include(r => r.PlaceTo)
    .Include(r => r.PlaceFrom)
    .Include(r => r.LoadingZone)
    .Include(r => r.TransportStatus)
    .Include(r => r.Risk)
    .Include(r => r.DeliveryStatus)
    .Include(r => r.LoadStatus)
    .Include(r => r.RiskSetBy)
    .Include(r => r.KcResponsible)
    .Where(r => codes.Contains(r.Code))
    .ToListAsync();

// Map, mutate, SaveChanges — 1000 tracked entities, full diff snapshot in RAM
// SaveChanges fires N UPDATE statements (one per row) in a transaction
await db.SaveChangesAsync();

In practice on a workload like this:

The Include chain on 14 lookups produces a query that PostgreSQL/MSSQL can't always optimise cleanly (cartesian explosion risk; you split-query and pay multiple round trips).
SaveChanges on 1000 modified entities emits 1000 individual UPDATEs unless you reach for EFCore.BulkExtensions or similar third-party libraries. RedBase ships COPY BINARY for inserts and a batched UPDATE path for updates in the box.
The DbContext snapshot of 1000 tracked entities, each with 14 included lookups, is real memory pressure. AsNoTracking() is faster but you lose change detection — you have to re-implement it.
The 200–300 ms budget on a 4-core container is not realistic in this scenario without significant manual optimisation and probably a separate bulk-update path.

I'm not saying EF can't do it. I'm saying the equivalent EF implementation is more code, more moving parts, and significantly slower without third-party packages. RedBase ships a single SaveAsync(toSave) and a hash-based skip primitive in the box.

The hash comparison itself (obj.ComputeHash() != hashBefore) is application-level business code — it's optional; the Pro tree-diff inside SaveAsync would skip unchanged values anyway. But used like this it lets the application skip the ~95% of routes that didn't change between SAP polls before they even enter the save pipeline. Combined with COPY BINARY into two tables and the Pro tree-diff for the routes that do change, the whole loop fits the 200–300 ms budget. No equivalent set of built-in primitives exists in EF Core; you either snapshot manually, re-save everything, or pull in third-party bulk packages.

What EF Core does with complex objects (vs what RedBase does)

The E000 benchmark uses EmployeeProps — a realistic model with nested classes, arrays, dictionaries, and RedbObject references:

What classic ORM needs	RedBase
~28 tables	2 tables
FK ordering on every INSERT	Single `SaveAsync`
40+ `Include`/`ThenInclude` calls	`LoadAsync<T>(id)` — one line
Migration file for every new field	Add property, call `SyncSchemeAsync`, done
~5,000 INSERTs for 100 employees	1 `BulkInsert` via COPY protocol

For 100 complex employees: EF Core ≈ 4,000–6,000 INSERTs across 28 tables in FK order. RedBase: ~3,000 typed value rows, one COPY command.

And the test results are published — 525 automated tests across all editions and both databases. All green.

The query engine: free vs Pro

This is where it gets technically interesting.

Free edition compiles your LINQ lambda to a JSON facet format, then calls a plpgsql stored procedure that parses that JSON and generates SQL dynamically. For simple 1–2 field filters, PostgreSQL optimizes correlated EXISTS subqueries well. It works.

Pro edition walks the C# expression tree directly in C# (ExpressionToSqlCompiler), emits native parameterized SQL. Same plan every call — PostgreSQL can cache it. 3–10× faster on complex multi-field filters.

But here's what's happening right now: I'm porting the Pro PVT CTE query engine into the free tier. The plan is full parity on PVT — same CTE shape, same expression coverage, same projection capabilities. Free and Pro will speak the same query dialect.

The remaining differences between Free and Pro will be elsewhere:

Query plan caching. Pro builds parameterized SQL in C# (ExpressionToSqlCompiler) — same shape every call, PostgreSQL caches the plan once and reuses it. The free tier generates SQL inside plpgsql with literal values inlined (properly escaped to prevent injection), so each call can produce a slightly different plan. Same correctness, different plan-cache behavior.
Materialization. Pro has a parallel materializer (multiple values streams hydrated concurrently into the object graph). Free uses a simpler sequential path.
Save strategy. PropsSaveStrategy.ChangeTracking (tree-diff with hash-based subtree skip) is Pro-only. Free uses DeleteInsert — still bulk via COPY, but no per-field diff.

So Pro stays faster on hot paths and large object graphs, but the query language itself — what you can write in .Where, .Select, projections, grouping, having, FTS, regex — will be the same on both sides.

Show me the SQL it generates

Here's a real example — a projection query on EmployeeProps with 16 computed columns:
full names, salary calculations, date extractions, seniority classification via CASE, coalesce for nullable fields, explicit type cast. This is the free tier:

WITH _pvt_cte AS (
    SELECT
        v._id_object,
        (array_agg(v._Long)    FILTER (WHERE v._id_structure = 1000017))[1] AS "Age",
        (array_agg(v._Numeric) FILTER (WHERE v._id_structure = 1000020))[1] AS "Salary",
        (array_agg(v._DateTimeOffset) FILTER (WHERE v._id_structure = 1000018))[1] AS "HireDate",
        (array_agg(v._String)  FILTER (WHERE v._id_structure = 1000016))[1] AS "LastName",
        (array_agg(v._String)  FILTER (WHERE v._id_structure = 1000015))[1] AS "FirstName"
    FROM _values v
    WHERE v._id_structure = ANY(ARRAY[1000015,1000016,1000017,1000018,1000020]::bigint[])
      AND v._id_object IN (SELECT o._id FROM _objects o WHERE o._id_scheme = 1000014)
    GROUP BY v._id_object
)
SELECT
    o._id                                                          AS id,
    ("FirstName" || ' ' || "LastName")                            AS full_name,
    UPPER("FirstName")                                             AS upper_name,
    LENGTH("FirstName")                                            AS name_len,
    ("Salary" * 12)                                                AS yearly_x12,
    (("Salary" * 0.15) - ("Salary" * 0.013))                      AS bonus_after_tax,
    ("Salary" / ("Age" + 1))                                       AS avg_per_year,
    ("Age" % 10)                                                    AS age_mod10,
    ABS(("Age" - 35))                                              AS abs_diff,
    FLOOR(("Salary" / 2))                                          AS floor_half_salary,
    EXTRACT(YEAR  FROM "HireDate")                                 AS hire_year,
    EXTRACT(MONTH FROM "HireDate")                                 AS hire_month,
    EXTRACT(YEAR  FROM AGE("HireDate", '2026-05-20'))              AS tenure_years,
    CASE
        WHEN ("Age" >= 60) THEN 'senior'
        WHEN ("Age" >= 35) THEN 'mid'
        ELSE 'junior'
    END                                                            AS seniority,
    COALESCE("FirstName", '<unknown>')                             AS display_name,
    ("Age")::text                                                  AS age_as_text
FROM _pvt_cte
JOIN _objects o ON o._id = _pvt_cte._id_object
WHERE "Salary" > '0'::numeric
ORDER BY "Salary" DESC
LIMIT 3

One _values scan. One GROUP BY. Sixteen computed output columns. Standard SQL — no magic, fully readable in pgAdmin, fully EXPLAIN ANALYZE-able.

This query was generated by pvt_build_projection_sql() — a plpgsql function that is shipping in the next free release.

When this fits and when it doesn't

Honest take.

Fits well: business apps with collections, nested structures, hierarchies, dictionaries, or schemas that change often. Apps where a business analyst can ask "add a field" and you want to ship the same day. Anything where you'd otherwise end up with 28 tables for one logical entity.

Doesn't fit: flat reporting databases with two or three fixed tables. Use Dapper there — it's honest and fast.

The common pushback is: "EF Core is good enough for 80% of projects." We'd flip that around: RedBase fits 80% of typical business projects — anything with collections, nested structures, lookup tables, hierarchies, dictionaries, or a schema that evolves alongside the product. The 20% where it's overkill is the genuinely flat case: two or three fixed tables, pure reporting, no evolving model. Use Dapper there; it's honest and fast. For everything else — which is most business software — the EF migration tax is real and it compounds with every sprint.

Three more axes that usually get left out of that comparison:

Extensibility. Adding a field is a property in C# and one InitializeAsync() call — no migration script, no review, no deploy window. The same change in an EF stack is at minimum: model edit + Add-Migration + reviewed SQL + coordinated deploy. Multiply by every iteration in a year.
Floor skill level. With RedBase, a junior developer who knows C# can add entities, fields, queries, and relationships on day one — the schema is just a class, the query is just LINQ. With EF Core on a complex model, you need to understand N+1, cartesian explosion from Include chains, DbContext lifetime, migration conflicts on shared branches, and when to reach for AsNoTracking(). That knowledge takes months to build and years to apply consistently. RedBase moves that complexity into the framework and out of the application code.
Cost of development. On the TsUM project the data-layer work — schemas, storage, queries, sync pipelines, change-tracking, bulk save — came in at roughly 128 person-hours. The estimate for the same scope on a classic EF + handwritten bulk-update + ASP.NET Identity + 30-table migration stack was on the order of 3000 person-hours. That's not a typo and it's not marketing — it's the gap between "add a property, redeploy" and "add a column, write migration, update DTOs, update mapper, update repository, write tests for all of it". The 80%/20% framing only holds if you count lines of code; once you count engineer-hours over a project lifetime it inverts.

What else is in the box

195+ working examples in the repo. The shortlist of things people usually don't expect to be built-in:

// Tree queries — recursive CTE underneath, no CTE to write yourself
var products = await redb.TreeQuery<Product>(londonHQ.Id, maxDepth: 10)
    .Where(p => p.InStock)
    .ToListAsync();

// Soft delete with atomic mark + background purge + progress in DB
await redb.SoftDeleteAsync(ids, user);

Also shipping in the core (this is the short list — the real list is much longer):

GroupBy + window functions in one query
Built-in users / roles / permissions (no ASP.NET Identity dependency, no separate auth schema)
Export / import via .redb files (PostgreSQL ↔ MSSQL portability)
Multi-database in one process with full domain isolation — separate connections, separate scheme caches, separate object caches per domain; one app can talk to several independent RedBase databases at the same time without their metadata leaking into each other
Scheme cache and object cache (both per-domain), invalidated on SyncSchemeAsync / SaveAsync, used by the materializer to skip repeated metadata lookups
Atomic soft-delete with background purge and progress visible in the DB
Tree queries with recursive CTE
Polymorphic trees — different C# types at each level of the same tree

There's a lot more that doesn't fit in one section. The architecture page has the full map.

What's coming next

The FreePvtQuery release is the nearest milestone. This is a complete rewrite of the free tier's query engine based on the PVT CTE architecture — bringing it to full parity with Pro on the query side:

Single-pass _values scan (vs correlated EXISTS per field)
Full expression system: arithmetic, string, date, math, regex, FTS, CASE, coalesce, cast
Projections with arbitrary computed columns (like the SQL above)
HAVING, DISTINCT ON, GROUP BY extensions

After that: MSSQL support for the new engine, C# LINQ-to-JSON bridge so you keep writing .Where(e => e.Salary > 100_000) without touching JSON. Pro will keep its edge on plan caching (parameterized SQL), parallel materialization, and tree-diff change tracking.

This is maybe 10% of what's in there

This post covered the basics: storage model, query engine, free PVT demo, production deployment. Topics I didn't touch:

Polymorphic tree hierarchies (different C# types at every level, same tree API)
Object graphs with RedbObject<T> references inside Props
Data migrations (Pro): fluent API, computed columns, in-place type changes
Export/import: .redb files (JSONL/ZIP), PostgreSQL ↔ MSSQL portability
Multi-database in one process (domain-isolated caches)
redb.CLI — schema management from command line
redb.PropsEditor — runtime props editing UI
Integration with redb.Route (22-transport pipeline engine) — separate article

If there's a direction you want covered next — migrations, polymorphic trees, benchmarks, internals of the diff-tree change tracking — drop it in the comments and that's what the next post will be.

Top comments (2)

BlinkNBuild • May 21

The "zero migrations" constraint is genuinely interesting as a forcing function — it pushes all schema evolution into the application layer, which trades migration complexity for query complexity. Curious how you handle multi-version deploys: if two app versions are running simultaneously during a rollout and they disagree on what's valid in the Properties bag, do you enforce a read-time schema contract or just let old readers ignore unknown keys?

rinat kozin • May 21 • Edited

Good question — this is the constraint, not the strength,
of the design, so let me be precise about what actually
happens in the code.

Scheme metadata in REDB lives in two physical tables:
_schemes (one row per Props class) and _structures
(one row per field, including nested classes / array
elements / dictionary entries). Each app instance
loads the scheme cache from the DB —
SELECT * FROM _structures WHERE _id_scheme = ? —
not from its own C# class. So old and new instances
can see different sets of structures, and which side
"wins" depends on who synced last.

That split divides the problem cleanly into reads
and writes.

READS — graceful by construction. The materializer
iterates structures, looks up _values rows by
(_id_object, _id_structure), and pushes them into
the Props class via reflection. If a structure
exists in DB but has no matching property on the
C# class (old reader, new field), SetPropertyValue
has nothing to assign to and the value is silently
ignored. No exception, no fallback. An old binary
keeps reading rows written by a new binary just fine.

WRITES — this is where rolling deploys need
discipline today, and I won't pretend otherwise.
Two save strategies, both with the same net effect
for unknown structures:

Free tier (DeleteInsert): SaveAsync does DELETE FROM _values WHERE _id_object IN (...) and then COPY BINARY of the new tree. No filter by _id_structure — it wipes everything for that object.
Pro tier (ChangeTracking, tree-diff): builds an in-memory tree from the C# instance and a DB tree from existing _values, both projected through the scheme cache. Nodes present in DB but absent in memory get marked Delete. So if the old version saves an object that the new version had enriched with extra structures, those extra _values get removed.

There's also a third edge that's worth being honest
about: the default of SyncStructuresFromTypeAsync is
strictDeleteExtra = true. AutoSyncSchemesAsync at
startup will remove _structures rows that aren't in
the running binary's Props classes. That means an
old version restarting mid-rollout can wipe schema
additions a newer version made — DDL-level data loss,
not just per-row. Both of these defaults are on my
list to flip: strictDeleteExtra → false by default,
and a new save mode PreserveUnknownStructures that
filters the Delete set in both engines to "only
structures the current C# class knows about". The
result would be: read graceful, write graceful,
sync graceful — concurrent multi-version writers
survive at the cost of saying explicitly "I don't
manage what I don't know".

What makes the rollout policy actually work in
production today is that we don't run the app as a
single monolithic binary. Write paths live inside
named "routing contexts" hosted by a small runtime
container we built on top of REDB and open-sourced
alongside it — redb.Route (the routing primitives,
~20 transport packages on NuGet incl. Kafka, RabbitMQ,
gRPC, Tcp, etc.) and redb.Tsak (the context container

admin API + Blazor dashboard, repo at github.com/redbase-app/redb-tsak). Each context has an explicit lifecycle: the API can mark a context for stop, the container stops dispatching new messages to it, in-flight handlers are allowed to finish naturally (no cancelled tokens through user code), and only then the context is reported Stopped. So "drain old writers before flipping schema" is not a manual checklist — it's an API call with a deterministic Stopped signal we can wait on before the new binary takes over the write path. That's what lets us treat strictDeleteExtra=false + single-writer-per-object as a workable contract today, in real production (3-node cluster, ~550 users, ~500k objects, ~15M _values rows, 3 months in), rather than as theory.

Concretely the playbook is:

Schema changes are additive. New fields go into Props classes, deploy goes out, new structure rows land in _structures the first time the new version runs InitializeAsync.
Old write contexts are drained via the runtime API before the new binary becomes the writer; reads stay live throughout (graceful path above). strictDeleteExtra = false is set in config so a stray old-version restart can't wipe new schema rows.
Removals go in two phases: mark obsolete in code one release, physically drop the field the next release after all writers are confirmed on the new version.

So in honest terms: the design forces schema
evolution into the application layer, which buys
you "no migration windows and no lock-the-table"
but costs you a deployment-coordination contract —
the same shape of problem as adding a NOT NULL
column under EF, with the cost moved rather than
removed. The interesting work now is making the
engine make that contract opt-in-safe instead of
leaning on the runtime to enforce it.

If you've solved this shape of problem in another
engine I'd genuinely like to hear how — your
question is exactly what pushed PreserveUnknownStructures
from "nice to have" to "next sprint".

DEV Community