A couple of weeks ago I published the redb.Core intro post — what RedBase is at the API level, why I wrote it, what production looks like, the LINQ surface, what generated SQL looks like for nested dictionary lookups. If you haven't read it, start there — it's the wide-angle shot.
This post starts a new series — "REDB inside" — that drills down into the engine. One article per layer:
- Part 1 (this post) — the database schema. 13 tables, what each one does, why the design is what it is, and the SQL you'd run to dump any object flat.
- Part 2 — code-first schemes. How
SyncSchemeAsync<T>walks a C# class and turns it into rows in_schemes+_structures, the_structure_hashmechanism, automatic onboarding viaInitializeAsync. - Part 3 — CRUD internals.
SaveAsync,LoadAsync, the TreeDiff change-tracking algorithm, COPY-protocol bulk insert, lazy loading. - Part 4 — LINQ-to-SQL. How
Where(x => x.Salary > 80000)becomesWHERE _id_structure = X AND _Long > 80000, the pivot CTE patterns, dialect differences (Postgresarray_agg FILTERvs MSSQLMAX CASE WHEN). - Part 5 — trees.
LoadTreeAsync,GetDescendantsAsync,WhereHasAncestor, closure-table vs recursive CTE. - Part 6 — window functions.
Win.RowNumber(),Win.Rank(),PartitionBy/OrderByover REDB objects.
Each one stands alone. You don't need to read them in order — but if you want to understand why anything in parts 2-6 works the way it does, you need this post. Everything else is built on the 13 tables.
The whole RedBase engine runs on 13 tables. No JSON blob hiding the schema, no
NVARCHAR(MAX)catch-all column — every C# type lands in its own typed column. Let me show you how that works, and why this isn't classical EAV even though it kind of looks like it at first glance.
"Wait, isn't this just EAV?"
It's the most common reaction I get, and it deserves a real answer before we look at any DDL.
Classical EAV (Entity–Attribute–Value) looks like this:
object_id | attribute_name | value
----------|----------------|---------
42 | "FirstName" | "Alice"
42 | "Age" | "28"
42 | "Salary" | "85000"
Everything in one table. Types erased. Attribute names are strings. Any non-trivial query becomes a self-join or a giant PIVOT. The filter "employees earning over $80k" turns into:
SELECT object_id
FROM attributes
WHERE attribute_name = 'Salary'
AND value::numeric > 80000; -- runtime cast, no usable index
Add three more conditions and you're looking at three self-joins on the same table. Explain plans get embarrassing.
REDB doesn't do that. Here's what _values actually looks like:
CREATE TABLE _values (
_id bigint NOT NULL,
_id_structure bigint NOT NULL, -- FK → _structures (which field this is)
_id_object bigint NOT NULL, -- FK → _objects
-- typed value columns, exactly one is non-NULL per row:
_String text NULL,
_Long bigint NULL,
_Guid uuid NULL,
_Double float NULL,
_DateTimeOffset timestamptz NULL,
_Boolean boolean NULL,
_ByteArray bytea NULL,
_Numeric numeric(38,18) NULL,
_ListItem bigint NULL, -- FK → _list_items
_Object bigint NULL, -- FK → _objects (cross-object reference)
-- relational collection storage:
_array_parent_id bigint NULL, -- FK → _values (parent element)
_array_index text NULL -- '0','1','2' for arrays, key for dictionaries
);
The field identity is a foreign key (_id_structure), not a string. The value lives in a typed column chosen at write time based on the field's declared C# type. Think of it as runtime type information (RTTI) persisted into the schema: the engine always knows what type each field is, because that's recorded once in _structures._id_type and reused for every value of that field.
Reading values back is one CASE expression dispatched on type, not a self-join:
CASE c.db_type
WHEN 'String' THEN v._String
WHEN 'Long' THEN v._Long::text
WHEN 'Guid' THEN v._Guid::text
...
END
The practical differences:
| Classical EAV | REDB _values
|
|
|---|---|---|
| Field identity | string in attribute_name
|
FK → _structures → _types
|
| Where the value lives | one text/varchar(max) column |
typed column per C# type |
Filter Salary > 80000
|
WHERE attribute_name='Salary' AND value::numeric > 80000 |
WHERE _id_structure = $1 AND _Long > 80000 |
| Index on the value | string index + runtime cast | partial B-tree index on the typed column |
| Arrays/dictionaries | separate table or JSON blob |
_array_parent_id + _array_index in the same row |
| Schema metadata | implicit in attribute names | first-class rows in _schemes/_structures/_types, denormalized into a metadata cache |
So yes, the row shape rhymes with EAV — but the type system and the indexing story are completely different. That's why I've been pushing back on the EAV label.
The 13 tables, at a glance
_types — type catalog (~37 system rows)
_schemes — schemes (C# classes mapped to DB rows)
_structures — fields of schemes (with nesting and collection metadata)
_objects — objects (data rows, tree-shaped via self-FK)
_values — field values (typed columns + relational collections)
_lists — pick-list/dictionary catalog
_list_items — pick-list entries
_users — users (system IDs −1, 0, 1)
_roles — roles
_users_roles — M2M user ↔ role
_permissions — permissions on objects (inherited along the tree)
_links — M2M relations between objects
_functions — stored expressions attached to schemes
_dependencies — cross-scheme dependencies
─────────────────────────────────────────────
_scheme_metadata_cache — denormalized cache of structures × types
_migrations — history of props-schema migrations
The last two live in their own SQL files but matter just as much in practice. Let's walk through them layer by layer.
Layer 1 — the type catalog: _types
CREATE TABLE _types (
_id bigint NOT NULL PRIMARY KEY,
_name text NOT NULL UNIQUE,
_db_type text NULL, -- which _values column to use: 'Long', 'String', 'Guid', ...
_type text NULL -- C# type name: 'long', 'string', 'Guid', ...
);
System type IDs are negative constants near long.MinValue. A small sample:
_id |
_name |
_db_type |
C# type |
|---|---|---|---|
| -9223372036854775709 | Boolean | Boolean | bool |
| -9223372036854775708 | DateTime | DateTimeOffset | DateTime |
| -9223372036854775704 | Long | Long | long |
| -9223372036854775700 | String | String | string |
| -9223372036854775695 | Decimal | Numeric | decimal |
| -9223372036854775675 | Class | — | nested class (marker only) |
| -9223372036854775668 | Array | — |
T[] / List<T> (marker only) |
| -9223372036854775667 | Dictionary | — |
Dictionary<K,V> (marker only) |
The scary-looking numbers are just constants picked far from anything a user-generated key could ever hit (user IDs start at 1_000_000 via a global_identity sequence). Class, Array, and Dictionary have no column of their own in _values — they're marker types; the actual leaves live in child rows.
There are ~37 built-in types total. Numeric ones (Int, Short, Byte, Float) physically store as Long/Double. Strings include semantic variants (Email, Url, Phone) that all use _String. Then DateOnly/TimeOnly/TimeSpan, geo (Latitude/Longitude), file metadata (FilePath/MimeType), Enum/EnumInt, and collection markers (Array/Dictionary/JsonDocument/XDocument).
Layer 2 — schemes and fields: _schemes + _structures
CREATE TABLE _schemes (
_id bigint NOT NULL,
_id_parent bigint NULL, -- nesting (namespaces)
_name text NOT NULL UNIQUE, -- e.g. 'MyApp.Models.EmployeeProps'
_alias text NULL,
_structure_hash uuid NULL, -- field hash for fast change detection
_type bigint NOT NULL -- FK → _types (Class by default)
);
CREATE TABLE _structures (
_id bigint NOT NULL,
_id_parent bigint NULL, -- nested props class
_id_scheme bigint NOT NULL, -- FK → _schemes
_id_type bigint NOT NULL, -- FK → _types
_id_list bigint NULL, -- FK → _lists (for ListItem fields)
_name text NOT NULL, -- C# property name
_alias text NULL,
_order bigint NULL,
_collection_type bigint NULL, -- NULL=scalar, Array_ID or Dictionary_ID
_key_type bigint NULL, -- key type for Dictionary<K,V>
_readonly boolean NULL,
_allow_not_null boolean NULL,
_is_compress boolean NULL,
_store_null boolean NULL,
_default_value bytea NULL
);
When you write:
[RedbScheme("Employee")]
public class EmployeeProps
{
public string FirstName { get; set; } = "";
public int Age { get; set; }
public decimal Salary { get; set; }
public string[]? Skills { get; set; }
public Address? HomeAddress { get; set; } // nested class
}
…and await redb.SyncSchemeAsync<EmployeeProps>() fires for the first time, the engine:
- Inserts a row into
_schemeswith_name = "MyApp.EmployeeProps". - Inserts one
_structuresrow per public property. - For
Skills: sets_collection_type = Array_ID. - For
HomeAddress: sets_id_type = Class_IDand recursively creates child_structuresrows whose_id_parentpoints back at the parent structure.
Then it computes a hash over all those structures and writes it to _schemes._structure_hash. Next time you call sync, comparing one UUID tells the engine whether anything actually changed — no row-by-row diff needed.
There's a DB-level trigger that validates field names: no system-reserved (_id, _name, _date_create), no C# keywords (class, int, string), no leading digits. If you accidentally try to name a property int, the INSERT throws before the bad row ever lands.
Layer 3 — objects: _objects
CREATE TABLE _objects (
_id bigint NOT NULL,
_id_parent bigint NULL, -- tree parent (self-FK)
_id_scheme bigint NOT NULL, -- FK → _schemes
_id_owner bigint NOT NULL, -- FK → _users
_id_who_change bigint NOT NULL, -- FK → _users
_date_create timestamptz NOT NULL,
_date_modify timestamptz NOT NULL,
_date_begin timestamptz NULL,
_date_complete timestamptz NULL,
_key bigint NULL,
_name text NULL,
_note text NULL,
_hash uuid NULL,
-- value columns for RedbPrimitive<T>:
_value_long bigint NULL,
_value_string text NULL,
_value_guid uuid NULL,
_value_bool boolean NULL,
_value_double float NULL,
_value_numeric numeric(38,18) NULL,
_value_datetime timestamptz NULL,
_value_bytes bytea NULL
);
Three things worth pointing out:
The tree via _id_parent is ON DELETE CASCADE. Drop a root, the whole subtree goes with it. Depth is unbounded. This is the primary organizational structure in REDB: sections, categories, folders, org charts, project trees — they're all just _objects rows pointing at a parent.
The _value_* columns are for RedbPrimitive<T>. When an object is conceptually a single primitive (e.g. RedbObject<long> for a counter, RedbObject<string> for a token), there's no need to spin up _values rows — the value rides in the object row itself. Eight columns, one per _db_type.
Soft delete is a scheme called @@__deleted (_id = -10). mark_for_deletion() walks the subtree via recursive CTE and atomically reparents everything under a trash container with _id_scheme = -10. Actual physical deletion is a separate, batched purge_trash(). This means you can offer "undelete" cheaply, and your data lake/CDC tools never see a destructive delete on the hot path.
Layer 4 — the values: _values
This is the table that earns its keep. Everything else exists to make this one fast and consistent.
The DDL was up top. The interesting part is how collections fit into a flat row layout without a side table.
Scalar field
Age = 28 produces exactly one row:
_id_structure=struct_Age _id_object=42 _Long=28 _array_parent_id=NULL _array_index=NULL
Array of primitives
Skills = ["C#", "SQL", "React"] produces a marker row plus one row per element:
-- marker: "the array property exists" (without it, the property is NULL, not [])
_id=100 _id_structure=struct_Skills _id_object=42 _array_index=NULL _array_parent_id=NULL
-- elements; _array_parent_id points at the marker; _array_index is the position
_id=101 _id_structure=struct_Skills _id_object=42 _String="C#" _array_index='0' _array_parent_id=100
_id=102 _id_structure=struct_Skills _id_object=42 _String="SQL" _array_index='1' _array_parent_id=100
_id=103 _id_structure=struct_Skills _id_object=42 _String="React" _array_index='2' _array_parent_id=100
That marker row matters: it's how the engine tells null (no marker, no elements) apart from [] (marker present, zero children). The same shape works for empty dictionaries too.
Dictionary
PhoneDir = { "home": "+7 999…", "work": "+7 495…" }:
-- marker
_id=200 _id_structure=struct_PhoneDir _id_object=42 _array_index=NULL _array_parent_id=NULL
-- entries; _array_index holds the dictionary key (as text)
_id=201 _id_structure=struct_PhoneDir _id_object=42 _String="+7 999..." _array_index='home' _array_parent_id=200
_id=202 _id_structure=struct_PhoneDir _id_object=42 _String="+7 495..." _array_index='work' _array_parent_id=200
_array_index is text precisely so dictionaries with string keys work without a separate table. Numeric dictionaries store keys as their string representation.
Nested class
HomeAddress.City = "Moscow" works the same way. The _structures rows for Address.City, Address.Street, etc. carry an _id_parent pointing at the parent structure (HomeAddress). The _values rows for those leaves carry an _array_parent_id pointing at the marker row for HomeAddress on this particular object.
Three unique indexes keep all of this consistent
CREATE UNIQUE INDEX UIX__values__structure_object
ON _values (_id_structure, _id_object)
WHERE _array_index IS NULL AND _array_parent_id IS NULL;
CREATE UNIQUE INDEX UIX__values__structure_object_parent
ON _values (_id_structure, _id_object, _array_parent_id)
WHERE _array_index IS NULL AND _array_parent_id IS NOT NULL;
CREATE UNIQUE INDEX UIX__values__structure_object_array_index
ON _values (_id_structure, _id_object, _array_parent_id, _array_index)
WHERE _array_index IS NOT NULL;
These three together guarantee: (a) at most one scalar/marker row per (structure, object), (b) at most one nested-class marker per (structure, object, parent), and (c) at most one element per (structure, object, parent, index). Try to insert a duplicate array element and the DB rejects it before any logic bug can corrupt the shape.
Layer 5 — permissions: _permissions
CREATE TABLE _permissions (
_id bigint NOT NULL,
_id_role bigint NULL, -- XOR with _id_user (CHECK constraint)
_id_user bigint NULL,
_id_ref bigint NOT NULL, -- 0 = global, otherwise FK → _objects
_select boolean NULL,
_insert boolean NULL,
_update boolean NULL,
_delete boolean NULL
);
Permissions inherit along the object tree. A recursive CTE walks upwards from the target object up to 50 levels looking for the nearest ancestor that has an explicit permission row. _id_ref = 0 is the global fallback ("can this principal touch anything at all?"). Resolution priority is: user > role, specific object > global.
There's an automatic trigger that, when a child object is created without its own permission row, copies down the resolved permission from the nearest ancestor. The point isn't to materialize every permission — it's to keep the recursive CTE short. After a few months of activity the depth the resolver has to climb stays roughly constant.
_scheme_metadata_cache — why a cache table
Every query needs to know: "for an object with _id_scheme = X, which _structures rows exist, and what's the type of each?" That's a JOIN through _structures → _types that would otherwise fire on every single read.
So that JOIN is denormalized into a separate table that gets refreshed when the scheme changes:
CREATE TABLE _scheme_metadata_cache (
_scheme_id bigint NOT NULL,
_structure_id bigint NOT NULL,
_parent_structure_id bigint,
_name text NOT NULL,
type_name text NOT NULL, -- 'Long', 'String', 'Guid', ...
db_type text NOT NULL, -- 'Long', 'String', 'Guid', ...
type_semantic text NOT NULL, -- 'Object', '_RObject', 'Array', ...
_collection_type bigint,
collection_type_name text,
_key_type bigint,
key_type_name text
-- ... all the other _structures attributes inlined
);
A trigger on _schemes._structure_hash invalidates the cache for that scheme; on the next read, sync_metadata_cache_for_scheme(scheme_id) rebuilds it lazily. The big build_hierarchical_properties_optimized() function — the one that materializes an object's full property tree into JSON — never JOINs _structures or _types directly. It reads from this cache, and only from this cache.
Two SQL queries: dump any object flat
Here are two queries that show exactly what's in the box for a given object. The first uses raw JOINs (use this for ad-hoc debugging in DataGrip/SSMS). The second uses _scheme_metadata_cache — what the engine actually runs at scale.
Query 1 — flat pivot, no cache
-- PostgreSQL
SELECT
s._name AS scheme_name,
st._name AS field_name,
t._name AS type_name,
CASE t._db_type
WHEN 'String' THEN v._String
WHEN 'Long' THEN v._Long::text
WHEN 'Guid' THEN v._Guid::text
WHEN 'Double' THEN v._Double::text
WHEN 'Boolean' THEN v._Boolean::text
WHEN 'DateTimeOffset' THEN v._DateTimeOffset::text
WHEN 'Numeric' THEN v._Numeric::text
WHEN 'ListItem' THEN v._ListItem::text
WHEN 'Object' THEN v._Object::text
WHEN 'ByteArray' THEN encode(v._ByteArray, 'base64')
ELSE NULL
END AS value_text,
CASE
WHEN v._array_parent_id IS NULL
AND v._array_index IS NULL
AND t._name IN ('Array','Dictionary','Class') THEN 'collection_marker'
WHEN v._array_parent_id IS NULL
AND v._array_index IS NULL THEN 'scalar'
WHEN v._array_parent_id IS NOT NULL THEN 'element[' || v._array_index || ']'
ELSE 'scalar'
END AS slot,
v._array_index,
v._array_parent_id
FROM _values v
JOIN _structures st ON st._id = v._id_structure
JOIN _schemes s ON s._id = st._id_scheme
JOIN _types t ON t._id = st._id_type
WHERE v._id_object = :object_id
ORDER BY st._order NULLS LAST, v._array_index NULLS FIRST;
-- MS SQL Server
SELECT
s._name AS scheme_name,
st._name AS field_name,
t._name AS type_name,
CASE t._db_type
WHEN 'String' THEN v._String
WHEN 'Long' THEN CAST(v._Long AS nvarchar(MAX))
WHEN 'Guid' THEN CAST(v._Guid AS nvarchar(MAX))
WHEN 'Double' THEN CAST(v._Double AS nvarchar(MAX))
WHEN 'Boolean' THEN CASE v._Boolean WHEN 1 THEN 'true' WHEN 0 THEN 'false' END
WHEN 'DateTimeOffset' THEN CAST(v._DateTimeOffset AS nvarchar(MAX))
WHEN 'Numeric' THEN CAST(v._Numeric AS nvarchar(MAX))
WHEN 'ListItem' THEN CAST(v._ListItem AS nvarchar(MAX))
WHEN 'Object' THEN CAST(v._Object AS nvarchar(MAX))
WHEN 'ByteArray' THEN N'<binary, base64 in app code>'
ELSE NULL
END AS value_text,
CASE
WHEN v._array_parent_id IS NULL AND v._array_index IS NULL THEN 'scalar'
WHEN v._array_parent_id IS NOT NULL THEN 'element[' + ISNULL(v._array_index,'') + ']'
ELSE 'scalar'
END AS slot,
v._array_index,
v._array_parent_id
FROM [dbo].[_values] v
JOIN [dbo].[_structures] st ON st._id = v._id_structure
JOIN [dbo].[_schemes] s ON s._id = st._id_scheme
JOIN [dbo].[_types] t ON t._id = st._id_type
WHERE v._id_object = @object_id
ORDER BY st._order, v._array_index;
This is a diagnostic query — paste it into psql/DataGrip/SSMS, plug in any object ID, and you see exactly what's stored: which fields, which slot (scalar / collection marker / array element), which type. The marker rows for arrays show up too, which is exactly what you want when you're hunting down a null vs [] regression.
Query 2 — same result via _scheme_metadata_cache
-- PostgreSQL (via _scheme_metadata_cache)
SELECT
c._scheme_id AS scheme_id,
c._name AS field_name,
c.type_name AS type_name,
c.db_type AS db_type,
c.collection_type_name AS collection_type,
CASE c.db_type
WHEN 'String' THEN v._String
WHEN 'Long' THEN v._Long::text
WHEN 'Guid' THEN v._Guid::text
WHEN 'Double' THEN v._Double::text
WHEN 'Boolean' THEN v._Boolean::text
WHEN 'DateTimeOffset' THEN v._DateTimeOffset::text
WHEN 'Numeric' THEN v._Numeric::text
WHEN 'ListItem' THEN v._ListItem::text
WHEN 'Object' THEN v._Object::text
WHEN 'ByteArray' THEN encode(v._ByteArray, 'base64')
ELSE NULL
END AS value_text,
v._array_index,
v._array_parent_id,
c._order
FROM _values v
JOIN _scheme_metadata_cache c ON c._structure_id = v._id_structure
WHERE v._id_object = :object_id
ORDER BY c._order NULLS LAST, v._array_index NULLS FIRST;
-- MS SQL Server (via _scheme_metadata_cache)
SELECT
c.[_scheme_id] AS scheme_id,
c.[_name] AS field_name,
c.[type_name] AS type_name,
c.[db_type] AS db_type,
c.[collection_type_name] AS collection_type,
CASE c.[db_type]
WHEN 'String' THEN v._String
WHEN 'Long' THEN CAST(v._Long AS nvarchar(MAX))
WHEN 'Guid' THEN CAST(v._Guid AS nvarchar(MAX))
WHEN 'Double' THEN CAST(v._Double AS nvarchar(MAX))
WHEN 'Boolean' THEN CASE v._Boolean WHEN 1 THEN 'true' WHEN 0 THEN 'false' END
WHEN 'DateTimeOffset' THEN CAST(v._DateTimeOffset AS nvarchar(MAX))
WHEN 'Numeric' THEN CAST(v._Numeric AS nvarchar(MAX))
WHEN 'ListItem' THEN CAST(v._ListItem AS nvarchar(MAX))
WHEN 'Object' THEN CAST(v._Object AS nvarchar(MAX))
WHEN 'ByteArray' THEN N'<binary>'
ELSE NULL
END AS value_text,
v._array_index,
v._array_parent_id,
c.[_order]
FROM [dbo].[_values] v
JOIN [dbo].[_scheme_metadata_cache] c ON c.[_structure_id] = v.[_id_structure]
WHERE v.[_id_object] = @object_id
ORDER BY c.[_order], v.[_array_index];
Why the second one is the production query:
- Two heavy JOINs (
_structures,_types) gone. The cache row carries everything those joins would have produced. - The cache already has a B-tree on
_structure_idand a stable_orderfor ordering — no extra sorts. - The cache is refreshed only when the scheme changes (
_structure_hashflips), not on every read. Steady-state queries pay zero cost for it. -
build_hierarchical_properties_optimized()goes one step further: it slurps all_valuesrows for one object into a_values[]array in a single SELECT, then walks the property tree purely in memory usingunnest(). Recursion into nested classes and array elements never touches the table again. For deeply nested object graphs this is a big deal — you can materialize a 40-row, 8-level-deep object with two SELECTs total.
How this connects to the C# API
For context — what those tables look like from a SaveAsync/LoadAsync perspective. The full API tour is in the intro post; here's just the mapping back to the tables we just looked at:
[RedbScheme("Employee")]
public class EmployeeProps
{
public string FirstName { get; set; } = "";
public string LastName { get; set; } = "";
public int Age { get; set; }
public decimal Salary { get; set; }
public string[]? Skills { get; set; }
}
// InitializeAsync scans the assembly →
// - inserts/updates rows in _schemes + _structures for each [RedbScheme]
// - refreshes _scheme_metadata_cache on changed schemes
await redb.InitializeAsync(typeof(EmployeeProps).Assembly);
// SaveAsync →
// - one row into _objects
// - one row per scalar into _values (using the typed column for the C# type)
// - for Skills: one marker row + one row per element, linked via _array_parent_id
var employee = new RedbObject<EmployeeProps>
{
name = "Alice Johnson",
Props = new EmployeeProps
{
FirstName = "Alice",
LastName = "Johnson",
Age = 28,
Salary = 120_000m,
Skills = ["C#", "PostgreSQL", "Redis"] // → 1 marker + 3 element rows
}
};
long id = await redb.SaveAsync(employee);
// LoadAsync →
// - SELECT _objects WHERE _id = id
// - SELECT * FROM _values WHERE _id_object = id (one shot, into a _values[])
// - build_hierarchical_properties_optimized() materializes the C# graph
var loaded = await redb.LoadAsync<EmployeeProps>(id);
SaveAsync reads _scheme_metadata_cache to know which column to write each value to, mints IDs from the global_identity sequence, and writes _objects + _values. LoadAsync reads _objects first, then one SELECT pulls every _values row for that object into a memory array, and the recursive materializer never goes back to the database for that load.
A few design choices that aren't obvious
Negative constants for system IDs. User-generated keys come from a sequence starting at 1_000_000. System types, schemes, users live near long.MinValue. The two ranges can never collide. This means _types._id = -10 for the @@__deleted scheme isn't a special-case in any query — it's just a perfectly normal FK that happens to be negative.
_structure_hash on _schemes. Without it, every SyncSchemeAsync<T> call would have to re-read the schema structure and compare row-by-row. With it, comparing one UUID tells you whether anything changed. The cache-invalidation trigger fires only on real changes, so steady-state operation pays nothing.
Marker rows. The trickiest design choice in _values. There's no separate "collections" table — instead, a row with _array_index = NULL AND _array_parent_id = NULL and a collection-typed _id_structure is the marker, and child rows fan out from it via _array_parent_id. This is what makes null vs [] distinguishable, lets dictionaries with string keys work without a side table, and keeps nested-class hierarchies in the same physical structure as flat fields.
_Numeric NUMERIC(38, 18). Deliberate. double for money quietly loses pennies; nobody wants a 0.0000000001-off invoice total in production. The 38/18 precision/scale is more than enough for currency, percentage, weight, even small molar quantities. The cost is storage size (16 bytes vs 8), which on the kind of property volume REDB sees is noise.
Postgres vs MSSQL — the cascade-delete story. On Postgres, _values has ON DELETE CASCADE on its FK to _structures. On MSSQL the same constraint would create multiple cascade paths and SQL Server refuses to compile that. The workaround: an INSTEAD OF DELETE trigger on _structures that manually cascades into _values. Same observable behavior, different machinery. There are a handful of places in the codebase where the dialect abstraction exists specifically to paper over this kind of thing.
Series links
- Intro post — An EF Core alternative for .NET apps with complex object graphs (published)
-
Part 1 (this post) — the 13 tables, RTTI vs EAV,
_values, collection storage,_scheme_metadata_cache, two diagnostic queries -
Part 2 — code-first schemes:
SyncSchemeAsync<T>,_structure_hash, automatic onboarding -
Part 3 — CRUD internals:
SaveAsync/LoadAsync, TreeDiff change tracking, COPY-protocol bulk insert -
Part 4 — LINQ → SQL: pivot CTEs, dialect splits, the
OfficeLocations["HQ"].Citywalkthrough -
Part 5 — trees:
LoadTreeAsync,WhereHasAncestor, the closure-table vs recursive-CTE story -
Part 6 — window functions:
Win.RowNumber(),Win.Rank(),PartitionBy/OrderByover REDB objects
Where to look
- github.com/redbase-app — all repos in the ecosystem
- github.com/redbase-app/redb — the redb.Core repo
- Postgres schema (redbPostgre.sql) — all 13 tables + 40+ indexes + triggers + stored functions in one file
- MSSQL schema (redbMSSQL.sql) — same shape, dialect-adjusted
- redbase.app — docs and worked examples (EN)
Questions/critique very welcome in the comments — especially if you've built something similar and have war stories about indexing strategies on the values table, that's exactly the discussion I want to have.

Top comments (0)