A couple of weeks ago I published REDB inside, part 1 — the 13 tables the whole engine runs on. It walked through _objects, _values, _structures, why this isn't classical EAV, and what the _scheme_metadata_cache does. If you haven't read it, start there — the rest of this post assumes you know the layout.
This is part 1.1, not part 2. Same physical-storage conversation, different angle. Part 2 is going to be about code-first schemes (SyncSchemeAsync<T>), and the deep C# dive — LINQ translator, CRUD internals, trees — that's parts 3 through 5 of the series. This post stays in the database layer.
The reason for a 1.1 instead of jumping to 2 is simple: every time I publish part 1, the same question comes back in the comments and in DMs:
"Fine, typed columns beat string EAV. But you've still got 9.7 million rows in
_values. Any real query —WHERE Salary > 80000,WHERE OrderDate >= '2026-06-01'— has to scan that table. Without one index per field per schema you're living in Seq Scan."
Fair question. Let's actually answer it, with the DDL, the prod numbers, and a side note about how to remove indexes once your app stabilizes.
Numbers in this post come from TSUM — a logistics system handling truck movements and orders through distribution centers. Real production, no benchmark setup.
TL;DR: In a classical EF schema, indexes multiply with tables. In REDB, the index set is designed once in DDL and serves any business schema you put on top — adding the 33rd class to your domain doesn't add a single line to
redbPostgre.sql. The active surface for any business class is just 2 tables (_objects+_values), plus 2 more for lookups and 3 for RTTI metadata; the rest is infrastructure that doesn't grow with your class count. On TSUM prod this gives 999 orders / 991 ms on 2 cores with default Postgres settings and zero framework-level cache: one SELECT pulls all 999 existing routes (139 ms), bulk-save writes 32 changed objects through the COPY protocol (154 ms), and 967 unchanged rows never reach the database thanks to an app-levelComputeHash()short-circuit.
The "9.7 million rows" objection, defused in two sentences
Before we open any DDL, here's the structural answer to the objection above.
Classical EAV has rows like (object_id, attribute_name TEXT, value TEXT). An index on value is useless there because values are heterogeneous, casts happen at runtime, and selectivity on attribute_name is low. REDB is not that.
Two things change the picture:
-
Values are already split into typed columns.
_Long,_String,_Numeric,_DateTimeOffset,_Boolean,_Guid,_ListItem,_Object. No casts at query time. An index on_Longis a normal bigint btree, not a polymorphic guess. -
Every row in
_valuescarries_id_structure— a bigint pointing at the schema field it belongs to. With 401 fields across the TSUM domain,WHERE _id_structure = Xdiscards roughly 99.75% of the table before the optimizer even looks at the value column.
So a facet query never scans 9.7M rows. It scans ~_values_count / structure_count rows, then does an Index Only Scan on a typed column. We'll see this in the EXPLAIN-equivalent reasoning below.
That's the architecture. Now the question that actually matters: how exactly are the indexes laid out so that this runs in milliseconds?
The architectural payoff: indexes don't grow with your domain
In a classical EF layout, the table count grows linearly with business entities. Each table averages 3–5 indexes — primary key, FK indexes, business filters. So the total number of indexes in the database grows multiplicatively with project size.
Take TSUM. Here's what's in tsum.Domain/Entities:
-
TransportationRoute(the actual route) -
TransportationPoint(route stops) -
Driver,Vehicle,ShippingPoint,YardPlace -
SliceSettings,SliceSnapshot,TransportSnapshot,TransportNorm -
SpecialRcSettings,TonnageGroupSettings,GarageState -
TsumAdUserRef,UserFilterPreference
Fifteen "root" entities. But that's just the start. Inside TransportationRoute alone you'll find RedbListItem references to: Drivers, Vehicles, CarMarks, ShippingPoints, BusinessTypes, YardPlaces (×2 — PlaceTo / PlaceFrom), LoadingZones, TransportStatuses, DeliveryStatuses, TripRisks. Every other entity has its own set: chassis types, readiness statuses, delay reasons, route classifications.
In a classical schema, every one of those lookups is its own table. Plus audit/history tables (TransportSnapshot, SliceSnapshot are clearly snapshot entities). Plus M2M tables for collection links. A realistic estimate for TSUM in a classical layout is 60–80 tables.
At 60–80 tables and 3–4 indexes per table, you're looking at 200–320 indexes to design, maintain, reindex, and drop when stale. Every release in an EF project ships an AddColumn + CreateIndex migration. By year two, somebody walks the schema with a flashlight removing duplicate indexes that piled up because three different developers added similar-but-not-identical indexes for slightly different queries.
In REDB the picture is different. The same 32 schemes / 401 properties in TSUM physically have 2 primary keys: pk__objects and pk__values. The full index list is hardcoded in redbPostgre.sql and runs around 50 indexes — for the entire system. Not per class. For the system.
Now here's the clarification I owe readers from part 1. I called REDB "13 tables" in that post, and a fair number of people walked away with the impression that all 13 are involved every time you query a class. They aren't. The actual breakdown is:
-
Data storage — 2 tables:
_objects(the header) and_values(the field values). EveryQuery<T>(), everySaveAsync, every LINQ filter lands here. -
Lookups — 2 tables:
_listsand_list_items. These powerRedbListItemfields (Driver, Vehicle, ShippingPoint, etc.). Lookups are shared across classes and don't grow when you add a new business entity. -
RTTI / schema metadata — 3 tables:
_types(primitives),_schemes(classes),_structures(class properties). The engine reads these through_scheme_metadata_cache; business queries never touch them. -
Everything else — infrastructure:
_users,_roles,_permissions,_links,_dependencies,_functions,_scheme_metadata_cache, soft-delete via@@__deleted. These don't grow with class count and aren't on the hot path.
So whatever class a developer writes in tsum.Domain, its data physically lives in 2 tables; reference fields hit 2 more for lookups; schema metadata sits in 3 RTTI tables. The six infrastructure tables look identical for a 5-class project and a 500-class one. That is what "indexes invariant of class count" really means — not "all 13 tables on every query," but "the active footprint for business data is two tables, and it doesn't change."
When a developer adds the 33rd class to tsum.Domain — say, WarehouseSlot — the database gets:
- one row in
_schemes, - N rows in
_structures(one per property), - zero index migrations. None.
Because the entire query surface for the new class is served by the same indexes on _objects and _values that served the previous 32. WHERE _id_scheme = ID(WarehouseSlot) uses IX__objects__schemes. Sort by creation date — IX__objects__scheme_date_create. Find an object by some property value — IX__values__structure_object_lookup. Parent–child tree — IX__objects__scheme_parent.
This is the architectural payoff the post is built around. The index plan is fixed at engine design time, not smeared across business-app releases. The skeptic will counter: "so generic indexes on _values must be slower than per-field indexes in an EF schema." They aren't — and we'll see why below.
The same applies to MSSQL. Worth saying out loud: this post isn't Postgres-specific. redb.MSSql/sql/redbMSSQL.sql carries a mirror set of indexes with minor dialect differences (covered in a later section). The architectural trick — indexes on the physical schema, not on business tables — ports between engines without giving anything up.
A tour of redbPostgre.sql, grouped by what it's for
If you open the DDL and read top to bottom you'll see ~50 indexes interleaved. They're easier to understand grouped by use case rather than by table. Let's do it that way.
1. Storage uniqueness in _values
Three partial-unique indexes, one per shape:
-- Scalar fields: one (object, field) → one value
CREATE UNIQUE INDEX UIX__values__structure_object
ON _values (_id_structure, _id_object)
WHERE _array_index IS NULL AND _array_parent_id IS NULL;
-- The marker row for a nested class/array inside a parent
CREATE UNIQUE INDEX UIX__values__structure_object_parent
ON _values (_id_structure, _id_object, _array_parent_id)
WHERE _array_index IS NULL AND _array_parent_id IS NOT NULL;
-- Collection elements indexed by position/key
CREATE UNIQUE INDEX UIX__values__structure_object_array_index
ON _values (_id_structure, _id_object, _array_parent_id, _array_index)
WHERE _array_index IS NOT NULL;
Why three instead of one. Each shape has a different uniqueness key. Scalars are unique by (structure, object). Nested-structure markers add parent. Array elements add index. A single covering unique index can't express this (NULLs would break the semantics), but three partials map cleanly onto the physical model.
Bonus: each partial only stores its slice of rows, so the btree is smaller than one giant unique index would be. On 9.7M rows that's tens of percent saved in page cache.
2. Facet search — the hottest path
A facet query in REDB is "find objects whose field X compares to Y." In SQL terms:
SELECT v._id_object FROM _values v
WHERE v._id_structure = @structure_id
AND v._Long > 80000;
That pattern is served by a single composite index:
CREATE INDEX IX__values__structure_object_lookup
ON _values (_id_structure, _id_object,
_Long, _DateTimeOffset, _Boolean, _Double, _Guid, _Numeric, _ListItem, _Object);
Every typed column except _String is in the key. Index Only Scan: Postgres finds the matching rows and grabs the value directly from the index — no heap fetch.
In parallel, two covering indexes with INCLUDE:
CREATE INDEX IX__values__object_structure_lookup
ON _values (_id_object, _id_structure, _array_index)
INCLUDE (_Long, _Double, _DateTimeOffset, _Boolean, _Guid, _Numeric, _ListItem, _Object);
CREATE INDEX IX__values__object_array_null
ON _values (_id_object, _id_structure)
INCLUDE (_Long, _Double, _DateTimeOffset, _Boolean, _Guid, _Numeric, _ListItem, _Object)
WHERE _array_index IS NULL;
These cover the opposite direction: "load all values of an object in one query," for LoadAsync<T>. Same trick — Index Only Scan, zero heap reads.
Why _String is excluded. Postgres btree caps at ~2700 bytes per index row. If a JWT payload ends up in _String at 3 KB, the insert blows up. So strings get their own partial index with a length guard:
CREATE INDEX IX__values__String_not_null
ON _values (_id_structure, _id_object, _String)
WHERE _String IS NOT NULL AND length(_String) < 2000;
Long strings (CMS bodies, base64 payloads) aren't covered — and they aren't filtered with = either. For substring search there's a separate GIN:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IX__values__String_pattern
ON _values USING gin (_String gin_trgm_ops)
WHERE _String IS NOT NULL;
This serves LIKE, ILIKE, $contains, $startsWith, $endsWith, regex — all through pg_trgm. The MSSQL counterpart is full-text search or a persisted computed column with a filtered index; the choice is in the dialect's DDL.
3. Collections (arrays and dictionaries)
Part 1 covered relational storage of collections — a marker row plus child rows linked by _array_parent_id and _array_index. The supporting indexes:
-- traverse collection elements by index
CREATE INDEX IX__values__array_parent_index
ON _values (_array_parent_id, _array_index);
-- look up by dictionary key (string keys)
CREATE INDEX IX__values__array_key
ON _values (_id_structure, _array_index)
WHERE _array_index IS NOT NULL;
-- partial: only rows that belong to collections
CREATE INDEX IX__values__parent_structure
ON _values (_array_parent_id, _id_structure)
WHERE _array_parent_id IS NOT NULL;
The third one feeds the pivot CTE that reads nested Class/Dictionary fields like AddressBook["home"].City.
4. Object trees (_objects._id_parent)
Trees are the backbone of REDB — sections, categories, org charts, all linked through _id_parent:
-- direct children: scheme + parent → id
CREATE INDEX IX__objects__scheme_parent
ON _objects (_id_scheme, _id_parent, _id);
-- covering index for subtree traversal (metadata in INCLUDE)
CREATE INDEX IX__objects__parent_id_descendant_lookup
ON _objects (_id_parent, _id_scheme)
INCLUDE (_id, _id_owner, _date_create, _date_modify)
WHERE _id_parent IS NOT NULL;
-- partial: roots (objects without a parent)
CREATE INDEX IX__objects__root_objects
ON _objects (_id_scheme, _id)
WHERE _id_parent IS NULL;
-- reverse path: child → parent → scheme
CREATE INDEX IX__objects__id_parent_scheme
ON _objects (_id, _id_parent, _id_scheme)
WHERE _id_parent IS NOT NULL;
WhereHasAncestor, WhereHasDescendant, LoadTreeAsync all run through this quartet.
5. Sorting and metadata
-- main "newest first" feed index
CREATE INDEX IX__objects__scheme_date_create
ON _objects (_id_scheme, _date_create DESC, _id);
-- sort by name within a scheme
CREATE INDEX IX__objects__scheme_name
ON _objects (_id_scheme, _name, _id);
-- global indexes for cross-scheme search
CREATE INDEX IX__objects__name ON _objects (_name);
CREATE INDEX IX__objects__hash ON _objects (_hash);
And the indexes for RedbPrimitive<T> (when an object is a primitive — counter, token, single value):
CREATE INDEX IX__objects__value_long ON _objects (_value_long) WHERE _value_long IS NOT NULL;
CREATE INDEX IX__objects__value_string ON _objects (_value_string) WHERE _value_string IS NOT NULL;
CREATE INDEX IX__objects__value_guid ON _objects (_value_guid) WHERE _value_guid IS NOT NULL;
CREATE INDEX IX__objects__value_datetime ON _objects (_value_datetime) WHERE _value_datetime IS NOT NULL;
CREATE INDEX IX__objects__value_numeric ON _objects (_value_numeric) WHERE _value_numeric IS NOT NULL;
All partial. Each one only indexes rows where its value column is set.
6. Schemes and structures (_structures, _schemes)
These are the cold tables — orders of magnitude smaller than _values, mostly read while building the metadata cache. They still need to be sub-millisecond:
-- covering index for ORDER BY field name
CREATE INDEX IX__structures__name
ON _structures (_name)
INCLUDE (_id, _id_type, _collection_type, _id_scheme);
-- covering lookup by structure ID
CREATE INDEX IX__structures__id_lookup
ON _structures (_id)
INCLUDE (_id_type, _name, _collection_type, _id_scheme);
-- partial: split collection vs non-collection
CREATE INDEX IX__structures__not_collection
ON _structures (_id, _name, _id_scheme)
WHERE _collection_type IS NULL;
CREATE INDEX IX__structures__collection
ON _structures (_id, _id_scheme, _collection_type)
WHERE _collection_type IS NOT NULL;
The DDL comments record the reason: these indexes kill Seq Scan on InitPlan and EXISTS subqueries inside build_hierarchical_properties_optimized, dropping query cost from 6.10 to 4.29 (-30%).
What got removed
redbPostgre.sql has a commented-out block worth quoting:
-- ============================================
-- REMOVED REDUNDANT INDEXES (migration_drop_redundant_indexes.sql)
-- Reason: Covered by composite index IX__values__structure_object_lookup
-- Facet search ALWAYS filters by (_id_structure, _id_object) BEFORE value
-- ============================================
-- CREATE INDEX "IX__values__String" ON _values (_String) ...;
-- CREATE INDEX "IX__values__Long" ON _values (_Long) ...;
-- CREATE INDEX "IX__values__Guid" ON _values (_Guid) ...;
-- ...
That's a cleanup history committed straight into the schema file. Early REDB shipped one index per typed column. Real workloads showed they never lit up: facet queries always start from _id_structure, never from a raw value column. The composite (_id_structure, _id_object, _Long, ...) swallowed them whole. The single-column ones got pulled.
Why this is fast on prod
Architecture is half the story. The other half is the actual numbers. Here's a typical TSUM order-processing tick from production logs:
[TSUM] orders=999 routes(+2 ~30 =967) drivers(+0 ~0) vehicles(+0 ~0)
sync=482 query=139 save=154 total=991ms
What it means:
[TSUM] orders=999 routes(+2 ~30 =967) drivers(+0 ~0) vehicles(+0 ~0) sync=482 query=139 save=154 total=991ms
-
999 orders arrived in one XML batch from SAP S/4 (the stored procedure
usp_TsUM_MonitoringReport_xml). - routes(+2 ~30 =967): of the 999 incoming, 2 were new routes, 30 changed, 967 weren't touched at all (content unchanged).
- drivers / vehicles (+0 ~0): the driver and vehicle dictionaries had no changes in this batch.
-
sync = 482 ms — this is lookup-dictionary sync (
DictionarySyncService.SyncFromOrdersAsync): walk all 999 orders, reconcile drivers / vehicles / shipping points / business types / delivery statuses against REDB lists, upsert any newlist_itemsif needed. This is not schema sync. -
query = 139 ms — one SELECT to REDB:
WhereRedb(o => codes.Contains(o.ValueString))— pulls all 999 existing routes by their codes in a single query. -
save = 154 ms —
SaveAsyncfor 32 objects (2 created + 30 updated). The 967 unchanged ones never get here. - Environment: 2 cores, default
shared_buffers, nopg_prewarm, no framework-level cache.
Important caveat: schema sync (
SyncSchemeAsync<T>— the bit that compares_structure_hashand refreshes_scheme_metadata_cache) is a separate step that runs once at process startup, before any orders flow. It's not in this log line at all. The "sync" number here is application-level dictionary reconciliation, not schema reconciliation.
Let's break down what's physically happening behind each number.
Query: 139 ms for 999 objects in one call
The most striking number in the log. This isn't 999 separate LoadAsync calls — it's one facet query:
var existing = await redb.Query<TransportationRoute>()
.WhereRedb(o => codes.Contains(o.ValueString!))
.ToListAsync();
It translates roughly to:
-- find all routes whose _value_string is in the list of 999 codes
SELECT o.* FROM _objects o
WHERE o._id_scheme = @routeSchemeId
AND o._value_string = ANY(@codes);
After that, one batched query for _values of all matched objects via IX__values__object_array_null (covering INCLUDE) — Index Only Scan, no heap access. Output: 999 fully hydrated RedbObject<TransportationRoute> instances with all ~50 properties each, including ListItem references to dictionaries.
139 ms / 999 objects = 0.14 ms per object on paper, but that math is misleading — the real cost is two SQL round trips (one for _objects, one for _values) plus C# hydration.
No N+1. No Includes. ListItem references resolve through RefDataCache in memory, so Vehicle, Driver, ShippingPoint get filled without extra SQL.
Save: 154 ms for 32 changed objects
Two things matter here, and both are critical.
First — application-level change tracking. Before SaveAsync, every existing route runs through ComputeHash():
var hashBefore = obj.ComputeHash();
EnrichRouteFromOrder(obj.Props, routeProps);
if (obj.ComputeHash() != hashBefore)
{
toSave.Add(obj); // data really changed — queue for save
updatedCount++;
}
else
{
skippedCount++; // byte-identical — don't touch the database
}
Of the 999 incoming orders, 967 are skipped right here. They never hit _objects/_values, never generate a single DML statement. This isn't a REDB-engine optimization — it's an application-code pattern, but REDB supports it cheaply via ComputeHash() over Props.
Second — bulk save in a single transaction. The remaining 32 objects go in one call:
await redb.SaveAsync(toSave); // toSave: List<IRedbObject>, 32 items
Here's what happens inside. _objects gets normalized in a batch first — for all 32 objects, one INSERT/UPDATE pass (new rows go via COPY/UNNEST-INSERT, modified rows via batched UPDATE). Then _values: for each modified object the value rows are recreated, and again not row-by-row but in one bulk call through Postgres COPY protocol. All in a single transaction — _objects and _values commit atomically.
COPY matters. It's a streaming binary protocol that ingests N rows without a per-row round trip — the driver writes a continuous byte stream into the socket. For 32 objects × ~50 properties = ~1600 value rows, that's one COPY operation instead of 1600 INSERTs. Btree indexes get their batch and update amortized.
So 154 ms for 32 objects (~4.8 ms per object on full overwrite) isn't "the engine is very fast" — it's "the engine doesn't do unnecessary work." Bulk where it can bulk; early skip where data didn't change.
Sync: 482 ms — what it really is
482 ms is the heaviest line in the log, and it's also the most commonly misread one. It's not schema sync — schema sync already ran at Worker startup, before any orders showed up. The 482 ms is DictionarySyncService.SyncFromOrdersAsync:
- Five parallel dictionary loads from REDB:
Drivers,Vehicles,ShippingPoints,BusinessTypes,DeliveryStatuses— viaredb.ListProvider.GetListByNameWithItemsAsync(...). Pulls_listsplus all their_list_items. - Walk over 999 orders: each
DriverId,CarId,ShippingPoint,BusinessTypesis checked against existing list items. If new, an upsert is queued. - If dictionaries changed — refresh
RefDataCache(a static in-memory cache so the Order → RouteProps mapping doesn't re-query the DB).
This batch had zero deltas on drivers and vehicles (+0 ~0), but 482 ms still went into proving they didn't change — that's 999 rows being checked against thousands of existing list items across multiple dictionaries.
This is application-level sync. It belongs to TSUM's logistics process, not REDB's engine. But it makes a useful point: even when "discovering nothing needs to change" costs more than the actual save (482 ms sync vs 154 ms save), the whole tick still fits in under a second for a thousand orders.
Schema sync is a separate story. Cold start of the Worker — tens to hundreds of milliseconds, one-off (32 schemes, hash comparison). Hot restart with no model changes — single-digit milliseconds. It happens before the first [TSUM] log line and isn't counted in the per-tick budget.
Coming back to the 9.7M-row objection
The original objection was "how can this be fast at that table size." The answer: the engine never scans _values whole. Any query starts from one of these four anchors:
-
PK (
LoadAsyncby ID) — O(log n) on the btree. -
_id_scheme+_value_stringfor facets through aRedbPrimitivecolumn, like the TSUM case above — Index Only Scan onIX__objects__value_string. -
_id_structure(regular facet search) — discards ~99.75% of rows immediately. With 401 properties, the average_id_structurecovers ~24K of the 9.7M total. -
_id_object(LoadAsync— all values for an object) — discards even more aggressively, since an object has tens of values, not thousands.
Table size in rows only affects the index size in pages. Postgres btree fan-out is in the hundreds, so 9.7M rows is a 4–5 level deep tree. Search is logarithmic, not linear.
"You don't always need everything you've got": pruning indexes after stabilization
redbPostgre.sql carries the full spectrum of indexes for every REDB scenario: facet search, trees, collections, GIN full-text, partial indexes for NOT NULL, indexes for recursive permission CTEs. That's right for a library: you don't know up front which workload profile a given app will have.
But in your specific app, some of those indexes may never light up. That's fine. TSUM, for instance, never calls WhereHasDescendant because its domain hierarchies are flat — so IX__objects__parent_id_descendant_lookup sits in pages but no scan ever goes through it.
After the app's been in production for two to four weeks under stable load, pull the index stats and trim aggressively:
SELECT schemaname, relname, indexrelname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS sz
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND pg_relation_size(indexrelid) > 10*1024*1024 -- only indexes > 10 MB
ORDER BY pg_relation_size(indexrelid) DESC;
On TSUM, this query surfaces about 4423 MB of indexes with idx_scan = 0. That doesn't mean those indexes are bad. It means nobody's hitting those query paths in this workload profile.
Working with the result:
Don't drop everything immediately. First, walk the list: which indexes serve scenarios that "haven't fired yet" (quarterly reports, year-end reconciliations, big imports)? Leave those alone — re-creation later costs more than the disk you'd save now.
The rest are candidates for
DROP INDEX CONCURRENTLY. It's safe — no table lock — and you can always restore the index with the sameCREATE INDEX CONCURRENTLY, since the DDL knows what shape it should have.After dropping, watch for two to four more weeks. If no unexpectedly slow queries show up in logs, the call was right. If something regresses, restore.
This is a standard PostgreSQL practice, not a REDB-specific trick. But it works cleaner in REDB because the index set is designed once and frozen in DDL. Pruning is one targeted decision in one place — not a hunt through three years of accumulated migrations across business tables.
The other direction works too. If you spot a single facet query that's hot enough to warrant an extra push, nothing stops you from adding a local index just for it:
-- e.g., "find routes for a specific driver in the last 7 days"
CREATE INDEX IX_my_app__route_by_driver
ON _values (_id_structure, _ListItem, _id_object)
WHERE _id_structure = <driver_field_id>;
That kind of index is application-side optimization. It lives in your app's migration, not in the REDB core. And the same pg_stat_user_indexes audit applies to it after a few weeks.
What's deliberately not in the DDL, and why
A second-level skeptic asks the next question: "Why no hash index on _String? Why no BRIN on _date_create? Why no expression index on LOWER(_String)?"
Hash indexes. Historically weaker than btree on most metrics in Postgres, and they don't support partial conditions. Almost every REDB index is partial — so hash is structurally a non-fit, not a coin-flip choice.
BRIN on _date_create. A reasonable candidate on big archival tables, but redundant for the current REDB profile. IX__objects__scheme_date_create (DESC, _id) already serves the exact scenario BRIN would compete with. BRIN would only win in a cold-storage case (>10M rows on a single scheme) — that's a separate "archival classes" phase, and the current core DDL doesn't address it yet.
Expression indexes (LOWER(_String), _DateTimeOffset::date, etc.). Deliberately omitted. This is an application choice: case-insensitive search on a specific text field, aggregation by day/month — these are options that belong in the app's migration, not the engine. GIN with pg_trgm already provides case-insensitive substring search via ILIKE, and that covers most needs.
Bloom, BRIN-multi, GIN on jsonb. All candidates for narrow scenarios. If you have one very specific query that's provably hot and provably not covered by the standard set — you add a local index for it in a separate migration. Without touching redbPostgre.sql.
The principle is general: the core DDL covers 95% of scenarios, the remaining 5% lives in the application layer. That's the opposite of the EF approach, where the DDL belongs to the application from day one and every index is part of its migration history.
MSSQL: nearly the same picture
While we're in DDL territory, worth saying clearly: this post isn't Postgres-specific. redb.MSSql/sql/redbMSSQL.sql carries a mirror set of indexes, and the architectural trick — "indexes don't grow with class count" — works identically on both engines.
Ports cleanly:
- All three partial-unique indexes on
_values(in MSSQL — filtered indexes withWHERE, different syntax, same semantics). - Composite indexes for facet search (
IX__values__structure_object_lookupexists in both DDLs with an identical key shape). -
INCLUDEcovering indexes — natively supported in MSSQL since 2005, port one-to-one. - Object-tree indexes (
IX__objects__scheme_parent,IX__objects__parent_id_descendant_lookup) — unchanged. - Partial / filtered indexes for
RedbPrimitive<T>— unchanged.
Differs:
-
GIN +
pg_trgmfor substring search on_String. The MSSQL counterpart is full-text search or a filtered index over a computed column.redbMSSQL.sqltakes a different path (LIKEover a regular btree with a length-bound partial). -
NULLS NOT DISTINCTin Postgres 15+ — MSSQL spells it differently (UNIQUE+ filteredWHERE NOT NULL). -
ON DELETE CASCADEfrom_structuresto_valuesworks directly in Postgres but not in MSSQL because of multiple cascade paths — replaced with anINSTEAD OF DELETEtrigger. The DELETE plan is slightly different in MSSQL but the indexes are the same. -
Per-type filtered indexes are spelled out more explicitly in MSSQL (
IX__values__Long_filter,IX__values__Guid_filter, ...) — in the Postgres DDL they're commented out as "covered by composite." That's a strategy difference: the MSSQL optimizer is a bit more partial to narrow filtered indexes per type.
But the main point — architectural invariance is preserved. Same 2+2+3+infrastructure layout, same facet schema, same fact that the index set doesn't grow when you add a class. To a developer writing C# models, the difference between Postgres and MSSQL is invisible.
TSUM as the proof case
Bringing the numbers from the top of the post into one place.
What REDB serves in TSUM:
- 32 classes in
tsum.Domain(fromTransportationRoutetoUserFilterPreference), - 401 properties across those classes,
- 227,896 objects at the time these metrics were captured,
- 9,773,174 rows in
_values.
This whole schema lands not in 13 tables per entity, but in 2 data tables (_objects + _values) + 2 lookup tables (_lists + _list_items) + 3 RTTI tables (_types + _schemes + _structures). The remaining six are infrastructure — permissions, users, roles, links, dependencies, functions, soft-delete, metadata cache — invariant of business class count. The full index set is the same redbPostgre.sql you'd ship for a five-class project. In a classical EF layout, those 32 classes with their lookups and snapshots would expand to roughly 60–80 tables and 200+ indexes, with a migration history accumulating across the project's lifetime.
Production metrics for a typical order-processing tick:
-
999 orders in one XML batch from SAP, processed in 991 ms total:
-
sync = 482 ms— reconciling lookup dictionaries (Drivers, Vehicles, ShippingPoints, BusinessTypes, DeliveryStatuses) against REDB lists, -
query = 139 ms— one SELECT loading all 999 existing routes by their codes, -
save = 154 ms—SaveAsyncfor 32 changed objects (2 created + 30 updated). The other 967 are short-circuited byComputeHash()before they reach the database.
-
- Infrastructure — 2 cores, stock PostgreSQL settings (no
shared_bufferstuning, nopg_prewarm, no bumpedwork_mem). - No REDB-framework-level cache — every
Query<T>()round-trips to the database.
This isn't a hand-picked benchmark — it's a real production tick where a developer writes ordinary C# models with a [RedbScheme] attribute and the logistics process just runs. Bulk on the read (one SELECT for 999 objects), bulk on the write (_objects and _values together in one transaction over the COPY protocol), early skip where data didn't change.
The technical takeaway: REDB hits these numbers not through magic, but because the engine doesn't do unnecessary work. The index plan is baked into the storage schema; bulk runs where bulk applies; application-level change tracking saves the most expensive thing — DML for changes that aren't there.
Wrap-up and what's coming next
Short summary. REDB ships one index set in DDL — about 50 indexes for the entire system. That set serves any business schema: 32 classes, 100 classes, 500 classes — same indexes. That's the inverse of the EF approach, where indexes multiply with tables and slowly become technical debt that needs periodic pruning.
Operational pattern — once your application stabilizes, run pg_stat_user_indexes, find the idx_scan = 0 candidates, and prune carefully (with DROP INDEX CONCURRENTLY for safe rollback). Add your own local indexes for hot application-specific queries — in a separate migration, leaving the engine DDL alone.
MSSQL is nearly the same picture. Minor differences in text indexing and per-type filtered indexes, but architectural invariance carries over.
Coming up in the series:
-
Part 2 — Code-first schemes: how
SyncSchemeAsync<T>turns a C# class into rows in_schemes+_structures, what_structure_hashis, how automatic onboarding works when you add a new property. -
Part 3 — CRUD internals:
SaveAsyncandLoadAsyncfrom inside, change tracking via TreeDiff, COPY-protocol bulk insert, lazy loading. -
Part 4 — LINQ to SQL: how
Where(x => x.Salary > 80000)becomesCASE WHEN _id_structure = X THEN _Long END > 80000, howOrderByand window functions work. -
Part 5 — Object trees:
LoadTreeAsync,GetDescendantsAsync,WhereHasAncestor, recursive CTEs. -
Part 6 — Window functions:
Win.RowNumber(),Win.Rank(),PartitionBy/OrderByover REDB objects.
The deep C# dive is parts 3–5. In the storage layer, one topic remains — schema migrations — and that's where the next post picks up.
Links
- GitHub: redbase-app
- redb.Core repo
- Postgres DDL (redbPostgre.sql)
- MSSQL DDL (redbMSSQL.sql)
- Docs and samples (EN)
Earlier in the series (on dev.to)
- An EF Core alternative for .NET apps with complex object graphs — full LINQ, no migrations, no DbContext — the wide-angle intro to RedBase: LINQ surface, production story, generated SQL.
- REDB inside, part 1 — the 13 tables the whole engine runs on (with the actual SQL, and why it's not EAV) — the schema walkthrough this post builds on.
- redb.Route 3.0.1 — flat DSL navigation, CRTP refactor, and a silent null fix — separate cycle on the ESB framework that sits on top of redb.




Top comments (0)