DEV Community

rinat kozin
rinat kozin

Posted on

REDB inside, part 1.1 — why the same 13 tables stay fast no matter how many classes you throw at them

REDB Index

A couple of weeks ago I published REDB inside, part 1 — the 13 tables the whole engine runs on. It walked through _objects, _values, _structures, why this isn't classical EAV, and what the _scheme_metadata_cache does. If you haven't read it, start there — the rest of this post assumes you know the layout.

This is part 1.1, not part 2. Same physical-storage conversation, different angle. Part 2 is going to be about code-first schemes (SyncSchemeAsync<T>), and the deep C# dive — LINQ translator, CRUD internals, trees — that's parts 3 through 5 of the series. This post stays in the database layer.

The reason for a 1.1 instead of jumping to 2 is simple: every time I publish part 1, the same question comes back in the comments and in DMs:

"Fine, typed columns beat string EAV. But you've still got 9.7 million rows in _values. Any real query — WHERE Salary > 80000, WHERE OrderDate >= '2026-06-01' — has to scan that table. Without one index per field per schema you're living in Seq Scan."

Fair question. Let's actually answer it, with the DDL, the prod numbers, and a side note about how to remove indexes once your app stabilizes.

Numbers in this post come from TSUM — a logistics system handling truck movements and orders through distribution centers. Real production, no benchmark setup.

TL;DR: In a classical EF schema, indexes multiply with tables. In REDB, the index set is designed once in DDL and serves any business schema you put on top — adding the 33rd class to your domain doesn't add a single line to redbPostgre.sql. The active surface for any business class is just 2 tables (_objects + _values), plus 2 more for lookups and 3 for RTTI metadata; the rest is infrastructure that doesn't grow with your class count. On TSUM prod this gives 999 orders / 991 ms on 2 cores with default Postgres settings and zero framework-level cache: one SELECT pulls all 999 existing routes (139 ms), bulk-save writes 32 changed objects through the COPY protocol (154 ms), and 967 unchanged rows never reach the database thanks to an app-level ComputeHash() short-circuit.


The "9.7 million rows" objection, defused in two sentences

Before we open any DDL, here's the structural answer to the objection above.

Classical EAV has rows like (object_id, attribute_name TEXT, value TEXT). An index on value is useless there because values are heterogeneous, casts happen at runtime, and selectivity on attribute_name is low. REDB is not that.

Two things change the picture:

  1. Values are already split into typed columns. _Long, _String, _Numeric, _DateTimeOffset, _Boolean, _Guid, _ListItem, _Object. No casts at query time. An index on _Long is a normal bigint btree, not a polymorphic guess.
  2. Every row in _values carries _id_structure — a bigint pointing at the schema field it belongs to. With 401 fields across the TSUM domain, WHERE _id_structure = X discards roughly 99.75% of the table before the optimizer even looks at the value column.

So a facet query never scans 9.7M rows. It scans ~_values_count / structure_count rows, then does an Index Only Scan on a typed column. We'll see this in the EXPLAIN-equivalent reasoning below.

That's the architecture. Now the question that actually matters: how exactly are the indexes laid out so that this runs in milliseconds?


The architectural payoff: indexes don't grow with your domain

In a classical EF layout, the table count grows linearly with business entities. Each table averages 3–5 indexes — primary key, FK indexes, business filters. So the total number of indexes in the database grows multiplicatively with project size.

Take TSUM. Here's what's in tsum.Domain/Entities:

  • TransportationRoute (the actual route)
  • TransportationPoint (route stops)
  • Driver, Vehicle, ShippingPoint, YardPlace
  • SliceSettings, SliceSnapshot, TransportSnapshot, TransportNorm
  • SpecialRcSettings, TonnageGroupSettings, GarageState
  • TsumAdUserRef, UserFilterPreference

Fifteen "root" entities. But that's just the start. Inside TransportationRoute alone you'll find RedbListItem references to: Drivers, Vehicles, CarMarks, ShippingPoints, BusinessTypes, YardPlaces (×2 — PlaceTo / PlaceFrom), LoadingZones, TransportStatuses, DeliveryStatuses, TripRisks. Every other entity has its own set: chassis types, readiness statuses, delay reasons, route classifications.

In a classical schema, every one of those lookups is its own table. Plus audit/history tables (TransportSnapshot, SliceSnapshot are clearly snapshot entities). Plus M2M tables for collection links. A realistic estimate for TSUM in a classical layout is 60–80 tables.

At 60–80 tables and 3–4 indexes per table, you're looking at 200–320 indexes to design, maintain, reindex, and drop when stale. Every release in an EF project ships an AddColumn + CreateIndex migration. By year two, somebody walks the schema with a flashlight removing duplicate indexes that piled up because three different developers added similar-but-not-identical indexes for slightly different queries.

In REDB the picture is different. The same 32 schemes / 401 properties in TSUM physically have 2 primary keys: pk__objects and pk__values. The full index list is hardcoded in redbPostgre.sql and runs around 50 indexes — for the entire system. Not per class. For the system.

Now here's the clarification I owe readers from part 1. I called REDB "13 tables" in that post, and a fair number of people walked away with the impression that all 13 are involved every time you query a class. They aren't. The actual breakdown is:

  • Data storage — 2 tables: _objects (the header) and _values (the field values). Every Query<T>(), every SaveAsync, every LINQ filter lands here.
  • Lookups — 2 tables: _lists and _list_items. These power RedbListItem fields (Driver, Vehicle, ShippingPoint, etc.). Lookups are shared across classes and don't grow when you add a new business entity.
  • RTTI / schema metadata — 3 tables: _types (primitives), _schemes (classes), _structures (class properties). The engine reads these through _scheme_metadata_cache; business queries never touch them.
  • Everything else — infrastructure: _users, _roles, _permissions, _links, _dependencies, _functions, _scheme_metadata_cache, soft-delete via @@__deleted. These don't grow with class count and aren't on the hot path.

So whatever class a developer writes in tsum.Domain, its data physically lives in 2 tables; reference fields hit 2 more for lookups; schema metadata sits in 3 RTTI tables. The six infrastructure tables look identical for a 5-class project and a 500-class one. That is what "indexes invariant of class count" really means — not "all 13 tables on every query," but "the active footprint for business data is two tables, and it doesn't change."

When a developer adds the 33rd class to tsum.Domain — say, WarehouseSlot — the database gets:

  • one row in _schemes,
  • N rows in _structures (one per property),
  • zero index migrations. None.

Because the entire query surface for the new class is served by the same indexes on _objects and _values that served the previous 32. WHERE _id_scheme = ID(WarehouseSlot) uses IX__objects__schemes. Sort by creation date — IX__objects__scheme_date_create. Find an object by some property value — IX__values__structure_object_lookup. Parent–child tree — IX__objects__scheme_parent.

This is the architectural payoff the post is built around. The index plan is fixed at engine design time, not smeared across business-app releases. The skeptic will counter: "so generic indexes on _values must be slower than per-field indexes in an EF schema." They aren't — and we'll see why below.

The same applies to MSSQL. Worth saying out loud: this post isn't Postgres-specific. redb.MSSql/sql/redbMSSQL.sql carries a mirror set of indexes with minor dialect differences (covered in a later section). The architectural trick — indexes on the physical schema, not on business tables — ports between engines without giving anything up.


A tour of redbPostgre.sql, grouped by what it's for

If you open the DDL and read top to bottom you'll see ~50 indexes interleaved. They're easier to understand grouped by use case rather than by table. Let's do it that way.

1. Storage uniqueness in _values

Three partial-unique indexes, one per shape:

-- Scalar fields: one (object, field) → one value
CREATE UNIQUE INDEX UIX__values__structure_object
ON _values (_id_structure, _id_object)
WHERE _array_index IS NULL AND _array_parent_id IS NULL;

-- The marker row for a nested class/array inside a parent
CREATE UNIQUE INDEX UIX__values__structure_object_parent
ON _values (_id_structure, _id_object, _array_parent_id)
WHERE _array_index IS NULL AND _array_parent_id IS NOT NULL;

-- Collection elements indexed by position/key
CREATE UNIQUE INDEX UIX__values__structure_object_array_index
ON _values (_id_structure, _id_object, _array_parent_id, _array_index)
WHERE _array_index IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

Why three instead of one. Each shape has a different uniqueness key. Scalars are unique by (structure, object). Nested-structure markers add parent. Array elements add index. A single covering unique index can't express this (NULLs would break the semantics), but three partials map cleanly onto the physical model.

Bonus: each partial only stores its slice of rows, so the btree is smaller than one giant unique index would be. On 9.7M rows that's tens of percent saved in page cache.

2. Facet search — the hottest path

A facet query in REDB is "find objects whose field X compares to Y." In SQL terms:

SELECT v._id_object FROM _values v
WHERE v._id_structure = @structure_id
  AND v._Long > 80000;
Enter fullscreen mode Exit fullscreen mode

That pattern is served by a single composite index:

CREATE INDEX IX__values__structure_object_lookup
ON _values (_id_structure, _id_object,
            _Long, _DateTimeOffset, _Boolean, _Double, _Guid, _Numeric, _ListItem, _Object);
Enter fullscreen mode Exit fullscreen mode

Every typed column except _String is in the key. Index Only Scan: Postgres finds the matching rows and grabs the value directly from the index — no heap fetch.

In parallel, two covering indexes with INCLUDE:

CREATE INDEX IX__values__object_structure_lookup
ON _values (_id_object, _id_structure, _array_index)
INCLUDE (_Long, _Double, _DateTimeOffset, _Boolean, _Guid, _Numeric, _ListItem, _Object);

CREATE INDEX IX__values__object_array_null
ON _values (_id_object, _id_structure)
INCLUDE (_Long, _Double, _DateTimeOffset, _Boolean, _Guid, _Numeric, _ListItem, _Object)
WHERE _array_index IS NULL;
Enter fullscreen mode Exit fullscreen mode

These cover the opposite direction: "load all values of an object in one query," for LoadAsync<T>. Same trick — Index Only Scan, zero heap reads.

Why _String is excluded. Postgres btree caps at ~2700 bytes per index row. If a JWT payload ends up in _String at 3 KB, the insert blows up. So strings get their own partial index with a length guard:

CREATE INDEX IX__values__String_not_null
ON _values (_id_structure, _id_object, _String)
WHERE _String IS NOT NULL AND length(_String) < 2000;
Enter fullscreen mode Exit fullscreen mode

Long strings (CMS bodies, base64 payloads) aren't covered — and they aren't filtered with = either. For substring search there's a separate GIN:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IX__values__String_pattern
ON _values USING gin (_String gin_trgm_ops)
WHERE _String IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

This serves LIKE, ILIKE, $contains, $startsWith, $endsWith, regex — all through pg_trgm. The MSSQL counterpart is full-text search or a persisted computed column with a filtered index; the choice is in the dialect's DDL.

3. Collections (arrays and dictionaries)

Part 1 covered relational storage of collections — a marker row plus child rows linked by _array_parent_id and _array_index. The supporting indexes:

-- traverse collection elements by index
CREATE INDEX IX__values__array_parent_index
ON _values (_array_parent_id, _array_index);

-- look up by dictionary key (string keys)
CREATE INDEX IX__values__array_key
ON _values (_id_structure, _array_index)
WHERE _array_index IS NOT NULL;

-- partial: only rows that belong to collections
CREATE INDEX IX__values__parent_structure
ON _values (_array_parent_id, _id_structure)
WHERE _array_parent_id IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

The third one feeds the pivot CTE that reads nested Class/Dictionary fields like AddressBook["home"].City.

4. Object trees (_objects._id_parent)

Trees are the backbone of REDB — sections, categories, org charts, all linked through _id_parent:

-- direct children: scheme + parent → id
CREATE INDEX IX__objects__scheme_parent
ON _objects (_id_scheme, _id_parent, _id);

-- covering index for subtree traversal (metadata in INCLUDE)
CREATE INDEX IX__objects__parent_id_descendant_lookup
ON _objects (_id_parent, _id_scheme)
INCLUDE (_id, _id_owner, _date_create, _date_modify)
WHERE _id_parent IS NOT NULL;

-- partial: roots (objects without a parent)
CREATE INDEX IX__objects__root_objects
ON _objects (_id_scheme, _id)
WHERE _id_parent IS NULL;

-- reverse path: child → parent → scheme
CREATE INDEX IX__objects__id_parent_scheme
ON _objects (_id, _id_parent, _id_scheme)
WHERE _id_parent IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

WhereHasAncestor, WhereHasDescendant, LoadTreeAsync all run through this quartet.

5. Sorting and metadata

-- main "newest first" feed index
CREATE INDEX IX__objects__scheme_date_create
ON _objects (_id_scheme, _date_create DESC, _id);

-- sort by name within a scheme
CREATE INDEX IX__objects__scheme_name
ON _objects (_id_scheme, _name, _id);

-- global indexes for cross-scheme search
CREATE INDEX IX__objects__name ON _objects (_name);
CREATE INDEX IX__objects__hash ON _objects (_hash);
Enter fullscreen mode Exit fullscreen mode

And the indexes for RedbPrimitive<T> (when an object is a primitive — counter, token, single value):

CREATE INDEX IX__objects__value_long ON _objects (_value_long) WHERE _value_long IS NOT NULL;
CREATE INDEX IX__objects__value_string ON _objects (_value_string) WHERE _value_string IS NOT NULL;
CREATE INDEX IX__objects__value_guid ON _objects (_value_guid) WHERE _value_guid IS NOT NULL;
CREATE INDEX IX__objects__value_datetime ON _objects (_value_datetime) WHERE _value_datetime IS NOT NULL;
CREATE INDEX IX__objects__value_numeric ON _objects (_value_numeric) WHERE _value_numeric IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

All partial. Each one only indexes rows where its value column is set.

6. Schemes and structures (_structures, _schemes)

These are the cold tables — orders of magnitude smaller than _values, mostly read while building the metadata cache. They still need to be sub-millisecond:

-- covering index for ORDER BY field name
CREATE INDEX IX__structures__name
ON _structures (_name)
INCLUDE (_id, _id_type, _collection_type, _id_scheme);

-- covering lookup by structure ID
CREATE INDEX IX__structures__id_lookup
ON _structures (_id)
INCLUDE (_id_type, _name, _collection_type, _id_scheme);

-- partial: split collection vs non-collection
CREATE INDEX IX__structures__not_collection
ON _structures (_id, _name, _id_scheme)
WHERE _collection_type IS NULL;

CREATE INDEX IX__structures__collection
ON _structures (_id, _id_scheme, _collection_type)
WHERE _collection_type IS NOT NULL;
Enter fullscreen mode Exit fullscreen mode

The DDL comments record the reason: these indexes kill Seq Scan on InitPlan and EXISTS subqueries inside build_hierarchical_properties_optimized, dropping query cost from 6.10 to 4.29 (-30%).

What got removed

redbPostgre.sql has a commented-out block worth quoting:

-- ============================================
-- REMOVED REDUNDANT INDEXES (migration_drop_redundant_indexes.sql)
-- Reason: Covered by composite index IX__values__structure_object_lookup
-- Facet search ALWAYS filters by (_id_structure, _id_object) BEFORE value
-- ============================================
-- CREATE INDEX "IX__values__String" ON _values (_String) ...;
-- CREATE INDEX "IX__values__Long" ON _values (_Long) ...;
-- CREATE INDEX "IX__values__Guid" ON _values (_Guid) ...;
-- ...
Enter fullscreen mode Exit fullscreen mode

That's a cleanup history committed straight into the schema file. Early REDB shipped one index per typed column. Real workloads showed they never lit up: facet queries always start from _id_structure, never from a raw value column. The composite (_id_structure, _id_object, _Long, ...) swallowed them whole. The single-column ones got pulled.


Why this is fast on prod

Architecture is half the story. The other half is the actual numbers. Here's a typical TSUM order-processing tick from production logs:

[TSUM] orders=999 routes(+2 ~30 =967) drivers(+0 ~0) vehicles(+0 ~0)
       sync=482 query=139 save=154 total=991ms
Enter fullscreen mode Exit fullscreen mode

What it means:

[TSUM] orders=999 routes(+2 ~30 =967) drivers(+0 ~0) vehicles(+0 ~0) sync=482 query=139 save=154 total=991ms
Enter fullscreen mode Exit fullscreen mode
  • 999 orders arrived in one XML batch from SAP S/4 (the stored procedure usp_TsUM_MonitoringReport_xml).
  • routes(+2 ~30 =967): of the 999 incoming, 2 were new routes, 30 changed, 967 weren't touched at all (content unchanged).
  • drivers / vehicles (+0 ~0): the driver and vehicle dictionaries had no changes in this batch.
  • sync = 482 ms — this is lookup-dictionary sync (DictionarySyncService.SyncFromOrdersAsync): walk all 999 orders, reconcile drivers / vehicles / shipping points / business types / delivery statuses against REDB lists, upsert any new list_items if needed. This is not schema sync.
  • query = 139 msone SELECT to REDB: WhereRedb(o => codes.Contains(o.ValueString)) — pulls all 999 existing routes by their codes in a single query.
  • save = 154 msSaveAsync for 32 objects (2 created + 30 updated). The 967 unchanged ones never get here.
  • Environment: 2 cores, default shared_buffers, no pg_prewarm, no framework-level cache.

Important caveat: schema sync (SyncSchemeAsync<T> — the bit that compares _structure_hash and refreshes _scheme_metadata_cache) is a separate step that runs once at process startup, before any orders flow. It's not in this log line at all. The "sync" number here is application-level dictionary reconciliation, not schema reconciliation.

Let's break down what's physically happening behind each number.

Query: 139 ms for 999 objects in one call

The most striking number in the log. This isn't 999 separate LoadAsync calls — it's one facet query:

var existing = await redb.Query<TransportationRoute>()
    .WhereRedb(o => codes.Contains(o.ValueString!))
    .ToListAsync();
Enter fullscreen mode Exit fullscreen mode

It translates roughly to:

-- find all routes whose _value_string is in the list of 999 codes
SELECT o.* FROM _objects o
WHERE o._id_scheme = @routeSchemeId
  AND o._value_string = ANY(@codes);
Enter fullscreen mode Exit fullscreen mode

After that, one batched query for _values of all matched objects via IX__values__object_array_null (covering INCLUDE) — Index Only Scan, no heap access. Output: 999 fully hydrated RedbObject<TransportationRoute> instances with all ~50 properties each, including ListItem references to dictionaries.

139 ms / 999 objects = 0.14 ms per object on paper, but that math is misleading — the real cost is two SQL round trips (one for _objects, one for _values) plus C# hydration.

No N+1. No Includes. ListItem references resolve through RefDataCache in memory, so Vehicle, Driver, ShippingPoint get filled without extra SQL.

Save: 154 ms for 32 changed objects

Two things matter here, and both are critical.

First — application-level change tracking. Before SaveAsync, every existing route runs through ComputeHash():

var hashBefore = obj.ComputeHash();
EnrichRouteFromOrder(obj.Props, routeProps);
if (obj.ComputeHash() != hashBefore)
{
    toSave.Add(obj);  // data really changed — queue for save
    updatedCount++;
}
else
{
    skippedCount++;   // byte-identical — don't touch the database
}
Enter fullscreen mode Exit fullscreen mode

Of the 999 incoming orders, 967 are skipped right here. They never hit _objects/_values, never generate a single DML statement. This isn't a REDB-engine optimization — it's an application-code pattern, but REDB supports it cheaply via ComputeHash() over Props.

Second — bulk save in a single transaction. The remaining 32 objects go in one call:

await redb.SaveAsync(toSave);   // toSave: List<IRedbObject>, 32 items
Enter fullscreen mode Exit fullscreen mode

logs select and materialize from 9m properties

Here's what happens inside. _objects gets normalized in a batch first — for all 32 objects, one INSERT/UPDATE pass (new rows go via COPY/UNNEST-INSERT, modified rows via batched UPDATE). Then _values: for each modified object the value rows are recreated, and again not row-by-row but in one bulk call through Postgres COPY protocol. All in a single transaction_objects and _values commit atomically.

COPY matters. It's a streaming binary protocol that ingests N rows without a per-row round trip — the driver writes a continuous byte stream into the socket. For 32 objects × ~50 properties = ~1600 value rows, that's one COPY operation instead of 1600 INSERTs. Btree indexes get their batch and update amortized.

So 154 ms for 32 objects (~4.8 ms per object on full overwrite) isn't "the engine is very fast" — it's "the engine doesn't do unnecessary work." Bulk where it can bulk; early skip where data didn't change.

Sync: 482 ms — what it really is

482 ms is the heaviest line in the log, and it's also the most commonly misread one. It's not schema sync — schema sync already ran at Worker startup, before any orders showed up. The 482 ms is DictionarySyncService.SyncFromOrdersAsync:

  1. Five parallel dictionary loads from REDB: Drivers, Vehicles, ShippingPoints, BusinessTypes, DeliveryStatuses — via redb.ListProvider.GetListByNameWithItemsAsync(...). Pulls _lists plus all their _list_items.
  2. Walk over 999 orders: each DriverId, CarId, ShippingPoint, BusinessTypes is checked against existing list items. If new, an upsert is queued.
  3. If dictionaries changed — refresh RefDataCache (a static in-memory cache so the Order → RouteProps mapping doesn't re-query the DB).

This batch had zero deltas on drivers and vehicles (+0 ~0), but 482 ms still went into proving they didn't change — that's 999 rows being checked against thousands of existing list items across multiple dictionaries.

This is application-level sync. It belongs to TSUM's logistics process, not REDB's engine. But it makes a useful point: even when "discovering nothing needs to change" costs more than the actual save (482 ms sync vs 154 ms save), the whole tick still fits in under a second for a thousand orders.

Schema sync is a separate story. Cold start of the Worker — tens to hundreds of milliseconds, one-off (32 schemes, hash comparison). Hot restart with no model changes — single-digit milliseconds. It happens before the first [TSUM] log line and isn't counted in the per-tick budget.

Coming back to the 9.7M-row objection

The original objection was "how can this be fast at that table size." The answer: the engine never scans _values whole. Any query starts from one of these four anchors:

  • PK (LoadAsync by ID) — O(log n) on the btree.
  • _id_scheme + _value_string for facets through a RedbPrimitive column, like the TSUM case above — Index Only Scan on IX__objects__value_string.
  • _id_structure (regular facet search) — discards ~99.75% of rows immediately. With 401 properties, the average _id_structure covers ~24K of the 9.7M total.
  • _id_object (LoadAsync — all values for an object) — discards even more aggressively, since an object has tens of values, not thousands.

Table size in rows only affects the index size in pages. Postgres btree fan-out is in the hundreds, so 9.7M rows is a 4–5 level deep tree. Search is logarithmic, not linear.

logs select and materialize from 9m properties


"You don't always need everything you've got": pruning indexes after stabilization

redbPostgre.sql carries the full spectrum of indexes for every REDB scenario: facet search, trees, collections, GIN full-text, partial indexes for NOT NULL, indexes for recursive permission CTEs. That's right for a library: you don't know up front which workload profile a given app will have.

But in your specific app, some of those indexes may never light up. That's fine. TSUM, for instance, never calls WhereHasDescendant because its domain hierarchies are flat — so IX__objects__parent_id_descendant_lookup sits in pages but no scan ever goes through it.

After the app's been in production for two to four weeks under stable load, pull the index stats and trim aggressively:

SELECT schemaname, relname, indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS sz
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND pg_relation_size(indexrelid) > 10*1024*1024  -- only indexes > 10 MB
ORDER BY pg_relation_size(indexrelid) DESC;
Enter fullscreen mode Exit fullscreen mode

idx_scan

On TSUM, this query surfaces about 4423 MB of indexes with idx_scan = 0. That doesn't mean those indexes are bad. It means nobody's hitting those query paths in this workload profile.

Working with the result:

  1. Don't drop everything immediately. First, walk the list: which indexes serve scenarios that "haven't fired yet" (quarterly reports, year-end reconciliations, big imports)? Leave those alone — re-creation later costs more than the disk you'd save now.

  2. The rest are candidates for DROP INDEX CONCURRENTLY. It's safe — no table lock — and you can always restore the index with the same CREATE INDEX CONCURRENTLY, since the DDL knows what shape it should have.

  3. After dropping, watch for two to four more weeks. If no unexpectedly slow queries show up in logs, the call was right. If something regresses, restore.

This is a standard PostgreSQL practice, not a REDB-specific trick. But it works cleaner in REDB because the index set is designed once and frozen in DDL. Pruning is one targeted decision in one place — not a hunt through three years of accumulated migrations across business tables.

The other direction works too. If you spot a single facet query that's hot enough to warrant an extra push, nothing stops you from adding a local index just for it:

-- e.g., "find routes for a specific driver in the last 7 days"
CREATE INDEX IX_my_app__route_by_driver
ON _values (_id_structure, _ListItem, _id_object)
WHERE _id_structure = <driver_field_id>;
Enter fullscreen mode Exit fullscreen mode

That kind of index is application-side optimization. It lives in your app's migration, not in the REDB core. And the same pg_stat_user_indexes audit applies to it after a few weeks.


What's deliberately not in the DDL, and why

A second-level skeptic asks the next question: "Why no hash index on _String? Why no BRIN on _date_create? Why no expression index on LOWER(_String)?"

Hash indexes. Historically weaker than btree on most metrics in Postgres, and they don't support partial conditions. Almost every REDB index is partial — so hash is structurally a non-fit, not a coin-flip choice.

BRIN on _date_create. A reasonable candidate on big archival tables, but redundant for the current REDB profile. IX__objects__scheme_date_create (DESC, _id) already serves the exact scenario BRIN would compete with. BRIN would only win in a cold-storage case (>10M rows on a single scheme) — that's a separate "archival classes" phase, and the current core DDL doesn't address it yet.

Expression indexes (LOWER(_String), _DateTimeOffset::date, etc.). Deliberately omitted. This is an application choice: case-insensitive search on a specific text field, aggregation by day/month — these are options that belong in the app's migration, not the engine. GIN with pg_trgm already provides case-insensitive substring search via ILIKE, and that covers most needs.

Bloom, BRIN-multi, GIN on jsonb. All candidates for narrow scenarios. If you have one very specific query that's provably hot and provably not covered by the standard set — you add a local index for it in a separate migration. Without touching redbPostgre.sql.

The principle is general: the core DDL covers 95% of scenarios, the remaining 5% lives in the application layer. That's the opposite of the EF approach, where the DDL belongs to the application from day one and every index is part of its migration history.


MSSQL: nearly the same picture

While we're in DDL territory, worth saying clearly: this post isn't Postgres-specific. redb.MSSql/sql/redbMSSQL.sql carries a mirror set of indexes, and the architectural trick — "indexes don't grow with class count" — works identically on both engines.

Ports cleanly:

  • All three partial-unique indexes on _values (in MSSQL — filtered indexes with WHERE, different syntax, same semantics).
  • Composite indexes for facet search (IX__values__structure_object_lookup exists in both DDLs with an identical key shape).
  • INCLUDE covering indexes — natively supported in MSSQL since 2005, port one-to-one.
  • Object-tree indexes (IX__objects__scheme_parent, IX__objects__parent_id_descendant_lookup) — unchanged.
  • Partial / filtered indexes for RedbPrimitive<T> — unchanged.

Differs:

  • GIN + pg_trgm for substring search on _String. The MSSQL counterpart is full-text search or a filtered index over a computed column. redbMSSQL.sql takes a different path (LIKE over a regular btree with a length-bound partial).
  • NULLS NOT DISTINCT in Postgres 15+ — MSSQL spells it differently (UNIQUE + filtered WHERE NOT NULL).
  • ON DELETE CASCADE from _structures to _values works directly in Postgres but not in MSSQL because of multiple cascade paths — replaced with an INSTEAD OF DELETE trigger. The DELETE plan is slightly different in MSSQL but the indexes are the same.
  • Per-type filtered indexes are spelled out more explicitly in MSSQL (IX__values__Long_filter, IX__values__Guid_filter, ...) — in the Postgres DDL they're commented out as "covered by composite." That's a strategy difference: the MSSQL optimizer is a bit more partial to narrow filtered indexes per type.

But the main point — architectural invariance is preserved. Same 2+2+3+infrastructure layout, same facet schema, same fact that the index set doesn't grow when you add a class. To a developer writing C# models, the difference between Postgres and MSSQL is invisible.


TSUM as the proof case

Bringing the numbers from the top of the post into one place.

What REDB serves in TSUM:

  • 32 classes in tsum.Domain (from TransportationRoute to UserFilterPreference),
  • 401 properties across those classes,
  • 227,896 objects at the time these metrics were captured,
  • 9,773,174 rows in _values.

This whole schema lands not in 13 tables per entity, but in 2 data tables (_objects + _values) + 2 lookup tables (_lists + _list_items) + 3 RTTI tables (_types + _schemes + _structures). The remaining six are infrastructure — permissions, users, roles, links, dependencies, functions, soft-delete, metadata cache — invariant of business class count. The full index set is the same redbPostgre.sql you'd ship for a five-class project. In a classical EF layout, those 32 classes with their lookups and snapshots would expand to roughly 60–80 tables and 200+ indexes, with a migration history accumulating across the project's lifetime.

Production metrics for a typical order-processing tick:

  • 999 orders in one XML batch from SAP, processed in 991 ms total:
    • sync = 482 ms — reconciling lookup dictionaries (Drivers, Vehicles, ShippingPoints, BusinessTypes, DeliveryStatuses) against REDB lists,
    • query = 139 msone SELECT loading all 999 existing routes by their codes,
    • save = 154 msSaveAsync for 32 changed objects (2 created + 30 updated). The other 967 are short-circuited by ComputeHash() before they reach the database.
  • Infrastructure — 2 cores, stock PostgreSQL settings (no shared_buffers tuning, no pg_prewarm, no bumped work_mem).
  • No REDB-framework-level cache — every Query<T>() round-trips to the database.

This isn't a hand-picked benchmark — it's a real production tick where a developer writes ordinary C# models with a [RedbScheme] attribute and the logistics process just runs. Bulk on the read (one SELECT for 999 objects), bulk on the write (_objects and _values together in one transaction over the COPY protocol), early skip where data didn't change.

The technical takeaway: REDB hits these numbers not through magic, but because the engine doesn't do unnecessary work. The index plan is baked into the storage schema; bulk runs where bulk applies; application-level change tracking saves the most expensive thing — DML for changes that aren't there.


Wrap-up and what's coming next

Short summary. REDB ships one index set in DDL — about 50 indexes for the entire system. That set serves any business schema: 32 classes, 100 classes, 500 classes — same indexes. That's the inverse of the EF approach, where indexes multiply with tables and slowly become technical debt that needs periodic pruning.

Operational pattern — once your application stabilizes, run pg_stat_user_indexes, find the idx_scan = 0 candidates, and prune carefully (with DROP INDEX CONCURRENTLY for safe rollback). Add your own local indexes for hot application-specific queries — in a separate migration, leaving the engine DDL alone.

MSSQL is nearly the same picture. Minor differences in text indexing and per-type filtered indexes, but architectural invariance carries over.

Coming up in the series:

  • Part 2 — Code-first schemes: how SyncSchemeAsync<T> turns a C# class into rows in _schemes + _structures, what _structure_hash is, how automatic onboarding works when you add a new property.
  • Part 3 — CRUD internals: SaveAsync and LoadAsync from inside, change tracking via TreeDiff, COPY-protocol bulk insert, lazy loading.
  • Part 4 — LINQ to SQL: how Where(x => x.Salary > 80000) becomes CASE WHEN _id_structure = X THEN _Long END > 80000, how OrderBy and window functions work.
  • Part 5 — Object trees: LoadTreeAsync, GetDescendantsAsync, WhereHasAncestor, recursive CTEs.
  • Part 6 — Window functions: Win.RowNumber(), Win.Rank(), PartitionBy/OrderBy over REDB objects.

The deep C# dive is parts 3–5. In the storage layer, one topic remains — schema migrations — and that's where the next post picks up.


Links


Earlier in the series (on dev.to)

Top comments (0)