rinat kozin

Posted on May 29 • Edited on Jun 1

Why our Free was so far behind Pro — and what we just shipped in RedBase 3.0.0 (with the actual SQL)

#dotnet #csharp #postgres #opensource

Since the launch posts I keep getting the same question, in DMs, issues, private chats:

"Why is the Free version so different from Pro? It feels like two different products."

It's a fair question, and the honest answer isn't a marketing answer — it's a "this is how the codebase grew over the years" answer. So let me just tell you, and then show you the SQL — both tiers, with the actual parameter values, so you can see exactly what hits the database.

This is one post. There won't be a second one. I'd rather over-explain once than dribble out a series.

30-second background

RedBase is a typed object store. You write a C# class, decorate it with [RedbScheme("name")], the engine syncs the scheme into PostgreSQL or MSSql, and you query with normal LINQ:

[RedbScheme("Employee")]
public class EmployeeProps
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public string City { get; set; } = "";
    public Address HomeAddress { get; set; } = new();
    public Dictionary<string, Address> OfficeLocations { get; set; } = new();
    public RedbListItem? Status { get; set; }
    public DateTimeOffset DateCreate { get; set; }
}

var employees = await service.Query<EmployeeProps>()
    .Where(p => p.Age > 30 && p.City == "London")
    .OrderBy(p => p.Name)
    .ToListAsync();

Storage is six tables: _objects, _values, _structures, _schemes, _list_items, _users. One row per object lives in _objects. Each property of that object lives as one or more rows in _values, keyed by (_id_object, _id_structure, _array_index). So filtering by N properties means joining/pivoting N rows back into one logical row per object.

How you do that pivot defines the performance ceiling of the engine. That's the part 3.0.0 fixes for the Free tier.

How we got here — the two paths that existed before 3.0.0

Pro path (since ~2024) — compiled in C

Pro walks the LINQ expression tree in process via ExpressionToSqlCompiler + SqlExpressionVisitor, registers bind variables through SqlParameterCollector, and emits a single PVT (pivot) CTE per query. For the LINQ above, with _id_scheme=42, structure id 101=Age, 102=City:

-- Pro on PostgreSQL — exact text Npgsql sends to the server
WITH pvt_cte AS (
  SELECT v._id_object,
    (array_agg(v._Long)   FILTER (WHERE v._id_structure = $1 AND v._array_index IS NULL))[1] AS "Age",
    (array_agg(v._String) FILTER (WHERE v._id_structure = $2 AND v._array_index IS NULL))[1] AS "City"
  FROM _values v
  JOIN (SELECT _id FROM _objects WHERE _id_scheme = $3) o ON v._id_object = o._id
  WHERE v._id_structure = ANY($4)
  GROUP BY v._id_object
)
SELECT o.* FROM _objects o
JOIN pvt_cte pvt ON o._id = pvt._id_object
WHERE pvt."Age" > $5 AND pvt."City" = $6
ORDER BY (array_agg(v._String) FILTER (WHERE ...))[1];

-- bound parameters (NpgsqlParameter, never interpolated into text):
--   $1 = 101         (bigint, structure id of Age)
--   $2 = 102         (bigint, structure id of City)
--   $3 = 42          (bigint, scheme id)
--   $4 = {101, 102}  (bigint[])
--   $5 = 30          (bigint)
--   $6 = 'London'    (text)

What this gets you:

One index hit on (_id_structure, _id_object) — _id_structure = ANY($4) uses the structure-id covering index, narrowing to only the props the query touches.
One GROUP BY pass — FILTER (WHERE …) is PostgreSQL's way of splitting values into columns during aggregation. Each scalar pivot field costs one array_agg aggregator, all running in the same scan. The [1] subscript picks the single element because scalar fields have at most one _values row.
Outer WHERE on flat columns — pvt."Age" > $5 is a normal B-tree comparison on the CTE's projected columns. The optimizer reorders these freely.
Stable parameterized SQL text — same $1..$N placeholders every call, so PostgreSQL's prepared-statement plan cache works. Connection pools love this.

5 fields or 50 — same query shape, same cost class.

And here is the same query through Free 3.0.0 on PostgreSQL — byte-for-byte the same shape:

-- Free 3.0.0 on PostgreSQL — emitted by pvt_build_query_sql(scheme, facets_jsonb)
WITH pvt_cte AS (
  SELECT v._id_object,
    (array_agg(v._Long)   FILTER (WHERE v._id_structure = 101 AND v._array_index IS NULL))[1] AS "Age",
    (array_agg(v._String) FILTER (WHERE v._id_structure = 102 AND v._array_index IS NULL))[1] AS "City"
  FROM _values v
  JOIN (SELECT _id FROM _objects WHERE _id_scheme = 42) o ON v._id_object = o._id
  WHERE v._id_structure IN (101, 102)
  GROUP BY v._id_object
)
SELECT o.* FROM _objects o
JOIN pvt_cte pvt ON o._id = pvt._id_object
WHERE pvt."Age" > 30 AND pvt."City" = 'London';

-- The literals you see here are NOT string-concatenated user input.
-- They are emitted by format(..., %L) with explicit type casts inside
-- the plpgsql builder. The C# caller made two parameterized roundtrips:
--   1) SELECT pvt_build_query_sql($1, $2::jsonb, $3, $4)   -- returns SQL text
--        with ($1=schemeId, $2=facets, $3=limit, $4=offset) as Npgsql parameters
--   2) EXECUTE the returned text on the connection → collect _id values
--   3) SELECT * FROM get_objects_json($1::bigint[], $2)    -- hydrate by ids
--        with ($1={...matched ids...}, $2=maxDepth) as Npgsql parameters
-- See the SQL-injection section below for the full chain.

Same CTE, same array_agg FILTER, same outer flat WHERE. The only structural difference: the SQL text is built once per call by pvt_build_query_sql inside the database, so the outer EXECUTE runs a unique string each call — values come baked in as %L-quoted literals with type casts (30, 'London') rather than as $N placeholders. That has plan-cache implications versus Pro (the SQL text varies, so PG can't reuse a plan across calls with different values), but the index path and join shape the planner picks are identical. For the parameterized story, see how this gets called from any language a few sections down.

Free path (until 3.0.0) — interpreted on the database side

The Free path serialized the LINQ filter into a JSON facet structure, shipped it to a plpgsql function, and that function built dynamic SQL inside the database. For the same LINQ above:

// C# side just packs the facets and calls one function
var facets = new {
    @and = new object[] {
        new { Age  = new { gt = 30 } },
        new { City = new { eq = "London" } },
    }
};
await conn.QueryAsync("SELECT * FROM search_objects_with_facets($1, $2)",
                      schemeId, facets);

The function then emitted, for those exact arguments:

-- Old Free on PostgreSQL — N props in WHERE → N correlated EXISTS
SELECT o.*
FROM _objects o
WHERE o._id_scheme = 42
  AND EXISTS (
    SELECT 1 FROM _values v
    WHERE v._id_object = o._id
      AND v._id_structure = 101
      AND v._Long > 30
  )
  AND EXISTS (
    SELECT 1 FROM _values v
    WHERE v._id_object = o._id
      AND v._id_structure = 102
      AND v._String = 'London'
  );

PostgreSQL optimizes correlated EXISTS well when the right index is there, so this is fine at 1–3 props. By 10 props the planner is doing real work to choose join order. By 30 props you're seeing query times you don't want. (For curiosity: the legacy path stays in the database under the name search_objects_with_facets() for back-compat — old callers keep working.)

On MSSql Free the picture was uglier still — a wide-CASE inline pivot that hadn't received the same iterations as the PG side. Different SQL shape from PG, different code path inside the library, different bugs. The two Free implementations had drifted.

Why the gap existed for so long

I built the Pro engine because we needed it. We use Pro internally — that's what runs the production workloads. The Free path kept working for its users, but parity was always "next quarter." It's not a single feature, it's:

pivot CTE generation for both dialects;
base-field pushdown into the inner _objects subquery;
nested-dict accessor (Field[key].Child);
ListItem .Value / .Alias via a single JOIN;
GroupBy, HAVING, ArrayGroupBy, DistinctBy, Sql.Function whitelist, $expr trees;
null semantics, $exists / $notNull;
a filter-splitting optimizer that decides what's a Shape A (pure base), B (narrow) or C (wide pivot) query;
and all of it round-tripping through 200+ integration tests per backend.

Spread that across two databases and you understand why it kept slipping. The interest spike from the launch posts is what pushed me to drop the half-written stuff and ship this. It's not finished-finished — that's why it's an OSS release instead of a private polish round.

What 3.0.0 actually changes

The headline: Free and Pro now emit the same PVT CTE shape, on both PostgreSQL and MSSql.

1. MSSql Free hit full v2-pvt parity — 145/145

The old MSSql Free path is gone. The new one is a 27-file SQL module under redb.MSSql/sql/v2-pvt/, assembled by an MSBuild target into a single pvt_bundle.sql resource embedded in RedBase.MSSql.dll:

00_module_init.sql           -- version sentinel, drop-and-replace bootstrap
10_pvt_field_collection.sql  -- walks the facet JSON, harvests structure ids
13_pvt_condition.sql         -- per-field WHERE fragments
14_pvt_where.sql             -- recursive $and/$or/$expr walker
15_pvt_order.sql             -- ORDER BY building (incl. $expr ordering)
16_pvt_split.sql             -- Shape A/B/C classifier, pushdown engine
17_pvt_expr.sql              -- $expr classifier + scalar expression compiler
20_pvt_build_query_sql.sql   -- the entry point — returns the final SQL text
99_smoke_auto.sql            -- 195 PASS / 0 FAIL / 1 SKIP regression suite
... (and friends)

The MSSql dialect uses MAX(CASE WHEN …) instead of array_agg FILTER (SQL Server has no FILTER clause), but the rest of the shape is the same. Same LINQ as above, on MSSql Free, becomes:

-- Free MSSql 3.0.0 — emitted by the T-SQL builder under [dbo].[pvt_build_query_sql]
WITH pvt_cte AS (
  SELECT v.[_id_object],
    MAX(CASE WHEN v.[_id_structure] = 101 AND v.[_array_index] IS NULL THEN v.[_Long]   END) AS [Age],
    MAX(CASE WHEN v.[_id_structure] = 102 AND v.[_array_index] IS NULL THEN v.[_String] END) AS [City]
  FROM [_values] v
  JOIN (SELECT [_id] FROM [_objects] WHERE [_id_scheme] = 42) o
    ON v.[_id_object] = o.[_id]
  WHERE v.[_id_structure] IN (101, 102)
  GROUP BY v.[_id_object]
)
SELECT o.* FROM [_objects] o
JOIN pvt_cte pvt ON o.[_id] = pvt.[_id_object]
WHERE pvt.[Age] > 30 AND pvt.[City] = N'London';

-- Same idea as the PG Free block above: literals are inlined by the T-SQL
-- builder via QUOTENAME() for identifiers and quoted-literal emission for
-- values (numeric literals stay bare, strings become N'...' with embedded
-- quotes escaped). The C# caller made two parameterized roundtrips:
--   1) EXEC sp_executesql N'SELECT [dbo].[pvt_build_query_sql](@scheme, @facets, @limit, @offset)',
--        N'@scheme bigint, @facets nvarchar(max), @limit int, @offset int', ...   -- returns SQL text
--   2) EXEC sp_executesql @sql                                                    -- collect _id values
--   3) SELECT * FROM [dbo].[get_objects_json](@ids, @maxDepth)                    -- hydrate by ids
-- User input never touches a SQL parser unquoted. The SQL-injection section
-- below walks the chain.

MAX(CASE WHEN ...) is the SQL Server idiom that mirrors array_agg FILTER: NULLs are ignored by aggregates, so the CASE acts as a "pass through if matching, otherwise null" gate. Same one-pass GROUP BY, same outer WHERE on flat columns. (Unlike Pro — which sends a parameterized statement with @p1..@pN placeholders and benefits from SQL Server's plan cache — Free's text varies per call, so plan reuse depends on SQL Server's auto-parameterization. Shape and index path are identical to Pro either way.)

Everything PG Free had now works identically on MSSql Free: flat + tree queries, scalar / array / dict fields, nested-dict Field[key].Child, ListItem joins for .Id / .Value / .Alias, same-scheme nested POCO compound paths, OrderBy / Take / Skip / DistinctBy, full GroupBy with HAVING, ArrayGroupBy via OUTER APPLY, array aggregates, Sql.Function whitelist, $expr, null semantics.

2. Auto-deploy of the v2-pvt bundle

ISqlDialect got a new method:

public interface ISqlDialect
{
    // ... existing dialect surface ...
    string? Query_PvtRequiredVersion();   // semver the embedded bundle ships
}

RedbServiceBase.EnsurePvtModuleDeployedAsync runs on InitializeAsync():

var required = Dialect.Query_PvtRequiredVersion();
if (required is null) return;                 // dialect doesn't use v2-pvt

var deployed = await Context.ExecuteScalarAsync<string?>(
    Dialect.SelectPvtModuleVersionSql());     // calls pvt_module_version()

if (string.Equals(deployed, required, StringComparison.Ordinal))
    return;                                    // exact match — already deployed

Logger?.LogInformation("v2-pvt mismatch (deployed={D}, required={R}). Applying bundle.",
                       deployed, required);
await Context.ExecuteNonQueryAsync(ReadEmbeddedBundle());

The match is exact-string, not semver-numeric. Any change to any of the 27 source files bumps the bundle version, the MSBuild target regenerates pvt_bundle.sql on the next build, and on next service startup the new bundle gets applied. No DBA in the loop, no "did you forget to run the migration?" tickets.

(One MSBuild gotcha cost me half a day: declaring <EmbeddedResource Include="sql\v2-pvt\pvt_bundle.sql" /> silently renames the manifest resource from redb.MSSql.sql.v2-pvt.pvt_bundle.sql to redb.MSSql.sql.v2_pvt.pvt_bundle.sql — dashes become underscores. GetManifestResourceStream returns null and you stare at it for a while. Fix: pin <LogicalName>redb.MSSql.sql.v2-pvt.pvt_bundle.sql</LogicalName> explicitly. Free tip if you ever ship SQL through embedded resources.)

3. Base-field pushdown — the 894 ms → 11 ms one

This is the regression that originally motivated 2.0.1. When a query mixes a base-field predicate (on _objects columns) with a props predicate:

var children = await service.Query<EmployeeProps>()
    .WhereRedb(o => o._id_parent == 555)   // base field, manager id 555
    .Where(p => p.Age > 30)                 // props field
    .ToListAsync();

The naive layout puts the base predicate in the outer WHERE — that's what 2.0.0 did:

-- Bad: 894 ms on a scheme with millions of _values rows
WITH pvt_cte AS (
  SELECT v._id_object,
    (array_agg(v._Long) FILTER (WHERE v._id_structure = $1))[1] AS "Age"
  FROM _values v
  JOIN _objects o ON v._id_object = o._id
  WHERE o._id_scheme = $2 AND v._id_structure = ANY($3)
  GROUP BY v._id_object
)
SELECT o.* FROM _objects o
JOIN pvt_cte pvt ON o._id = pvt._id_object
WHERE o._id_parent = $4    -- outer: applies AFTER the CTE pivoted the entire scheme
  AND pvt."Age" > $5;

-- $1=101, $2=42, $3={101}, $4=555, $5=30

PostgreSQL aggregates _values for every employee in the scheme and only then narrows by _id_parent. Self-inflicted full pivot.

3.0.0 (both Free and Pro, both dialects) classifies the base predicate at compile time and injects it into the inner _objects subquery of the PVT CTE:

-- Good: 11 ms
WITH pvt_cte AS (
  SELECT v._id_object,
    (array_agg(v._Long) FILTER (WHERE v._id_structure = $1))[1] AS "Age"
  FROM _values v
  JOIN (SELECT _id FROM _objects
        WHERE _id_scheme = $2
          AND _id_parent = $3   -- pushed in
       ) o ON v._id_object = o._id
  WHERE v._id_structure = ANY($4)
  GROUP BY v._id_object
)
SELECT o.* FROM _objects o
JOIN pvt_cte pvt ON o._id = pvt._id_object
WHERE pvt."Age" > $5;

-- $1=101, $2=42, $3=555, $4={101}, $5=30

Free 3.0.0 emits the same shape — here it is on PostgreSQL, with the values baked in by pvt_split_filter:

-- Free 3.0.0 on PostgreSQL — pushdown identical to Pro
WITH pvt_cte AS (
  SELECT v._id_object,
    (array_agg(v._Long) FILTER (WHERE v._id_structure = 101))[1] AS "Age"
  FROM _values v
  JOIN (SELECT _id FROM _objects
        WHERE _id_scheme = 42
          AND _id_parent = 555           -- pushed in by pvt_split_filter
       ) o ON v._id_object = o._id
  WHERE v._id_structure IN (101)
  GROUP BY v._id_object
)
SELECT o.* FROM _objects o
JOIN pvt_cte pvt ON o._id = pvt._id_object
WHERE pvt."Age" > 30;

And on MSSql Free 3.0.0 — again, literals inlined by the T-SQL builder, not parameters:

-- Free MSSql 3.0.0 — pushdown via the same classifier, MAX/CASE pivot, values inlined
WITH pvt_cte AS (
  SELECT v.[_id_object],
    MAX(CASE WHEN v.[_id_structure] = 101 AND v.[_array_index] IS NULL THEN v.[_Long] END) AS [Age]
  FROM [_values] v
  JOIN (SELECT [_id] FROM [_objects]
        WHERE [_id_scheme] = 42
          AND [_id_parent] = 555         -- pushed in by the T-SQL split classifier
       ) o ON v.[_id_object] = o.[_id]
  WHERE v.[_id_structure] IN (101)
  GROUP BY v.[_id_object]
)
SELECT o.* FROM [_objects] o
JOIN pvt_cte pvt ON o.[_id] = pvt.[_id_object]
WHERE pvt.[Age] > 30;

The inner subquery resolves via the (_id_scheme, _id_parent) covering index, returns ~N matching _ids, and the pivot only sees _values rows for those N objects. Pro got this in 2.0.1; 3.0.0 brings the same to Free through pvt_split_filter and the pvt_expr_is_base_only classifier in 17_pvt_expr.sql.

4. Nested-dict accessor short-circuit

For Dictionary<string, T> fields, Where(o => o.OfficeLocations["HQ"].City == "New York") builds the dict-keyed pivot column in the CTE. There's no separate _dict_key column in _values — the existing _array_index slot (text) doubles as the dict key for dictionary fields (it stays an integer-as-text for arrays, holds the key for dicts, is NULL for scalars):

(array_agg(v._String) FILTER (
  WHERE v._id_structure = 117    -- structure id of City inside OfficeLocations
    AND v._array_index = 'HQ'    -- _array_index doubles as the dict key
))[1] AS "OfficeLocations[HQ].City"

The bug fixed in this cycle (MSSql 0.1.3, PG 0.6.1 earlier): the outer WHERE was also emitting a separate EXISTS over _values to re-check the same dict key, even though the CTE had already done that work. The six-line fix in 13_pvt_condition.sql makes the outer WHERE just reference the pivot column directly:

-- Before
WHERE EXISTS (
  SELECT 1 FROM _values v
  WHERE v._id_object = o._id
    AND v._id_structure = 117
    AND v._array_index = 'HQ'
    AND v._String = 'New York'
)

-- After
WHERE pvt."OfficeLocations[HQ].City" = $4    -- $4='New York'

One predicate, no subquery, the optimizer can reorder it freely with the rest of the WHERE.

5. What landed in Pro this cycle

While Free was catching up, Pro added its own things (the version is 3.0.0 for both):

GroupBy + HAVING in Pro. HavingAsync existed in Free since 1.2.x but had no Pro counterpart. Now:

var stats = await service.Query<EmployeeProps>()
    .GroupBy(p => p.Department.Value)
    .Where(g => g.Count() > 10 && g.Average(p => p.Age) < 40)   // HAVING
    .Select(g => new { Dept = g.Key, N = g.Count() })
    .ToListAsync();

→

WITH pvt_cte AS (
  SELECT v._id_object,
    (array_agg(v._listitem) FILTER (WHERE v._id_structure = $1))[1] AS "Department_id",
    (array_agg(v._Long)     FILTER (WHERE v._id_structure = $2))[1] AS "Age"
  FROM _values v
  JOIN (SELECT _id FROM _objects WHERE _id_scheme = $3) o ON v._id_object = o._id
  WHERE v._id_structure = ANY($4)
  GROUP BY v._id_object
)
SELECT li."_value" AS "Dept", COUNT(*) AS "N"
FROM _objects o
JOIN pvt_cte pvt ON o._id = pvt._id_object
LEFT JOIN _list_items li ON li._id = pvt."Department_id"
GROUP BY li."_value"
HAVING COUNT(*) > $5 AND AVG(pvt."Age") < $6;

-- $1=130, $2=101, $3=42, $4={130,101}, $5=10, $6=40

The Free side has had HavingAsync since 1.2.x — here's the same LINQ through Free 3.0.0 on PostgreSQL for parity, so you can see they really do meet in the middle now:

-- Free 3.0.0 PG — same HAVING shape, values inlined by pvt_build_query_sql
WITH pvt_cte AS (
  SELECT v._id_object,
    (array_agg(v._listitem) FILTER (WHERE v._id_structure = 130))[1] AS "Department_id",
    (array_agg(v._Long)     FILTER (WHERE v._id_structure = 101))[1] AS "Age"
  FROM _values v
  JOIN (SELECT _id FROM _objects WHERE _id_scheme = 42) o ON v._id_object = o._id
  WHERE v._id_structure IN (130, 101)
  GROUP BY v._id_object
)
SELECT li."_value" AS "Dept", COUNT(*) AS "N"
FROM _objects o
JOIN pvt_cte pvt ON o._id = pvt._id_object
LEFT JOIN _list_items li ON li._id = pvt."Department_id"
GROUP BY li."_value"
HAVING COUNT(*) > 10 AND AVG(pvt."Age") < 40;

And on MSSql Free 3.0.0 — same shape, literals inlined:

-- Free MSSql 3.0.0 — same HAVING shape, MAX/CASE pivot, values inlined by the T-SQL builder
WITH pvt_cte AS (
  SELECT v.[_id_object],
    MAX(CASE WHEN v.[_id_structure] = 130 THEN v.[_listitem] END) AS [Department_id],
    MAX(CASE WHEN v.[_id_structure] = 101 THEN v.[_Long]     END) AS [Age]
  FROM [_values] v
  JOIN (SELECT [_id] FROM [_objects] WHERE [_id_scheme] = 42) o
    ON v.[_id_object] = o.[_id]
  WHERE v.[_id_structure] IN (130, 101)
  GROUP BY v.[_id_object]
)
SELECT li.[_value] AS [Dept], COUNT(*) AS [N]
FROM [_objects] o
JOIN pvt_cte pvt ON o.[_id] = pvt.[_id_object]
LEFT JOIN [_list_items] li ON li.[_id] = pvt.[Department_id]
GROUP BY li.[_value]
HAVING COUNT(*) > 10 AND AVG(pvt.[Age]) < 40;

33/33 HAVING + 6/6 no-HAVING in GroupByHavingTestsBase, green on PG.Pro and MSSql.Pro — and identical row counts on the Free side.

ArrayGroupBy unified across all four tiers — GroupBy(items => items.SelectMany(o => o.Skills)) works identically on PG Free, PG.Pro, MSSql Free, MSSql.Pro.

MSSql.Pro AggregateBatch parity with PG.Pro — non-numeric MinAsync / MaxAsync (over string/DateTime/Guid) and a Where(...) filter inside a batch aggregation now produce the same plan shape on MSSql.Pro as PG.Pro.

Plus a stack of bug-fixes documented in CHANGELOG.md.

"Wait — Free builds SQL dynamically inside plpgsql. What about SQL injection?"

Right question. If you skimmed the Free SQL above and noticed it's EXECUTE-ing a dynamically-built string inside a database function — your instinct is correct, that would be a problem in a naively-written generator. Here's what actually guards against it.

Pro side (C#) — boring and safe

Pro builds parameterized SQL via SqlParameterCollector. Every value the user expression captured (30, "London", managerId) becomes an NpgsqlParameter / SqlParameter. The text Npgsql sends to PostgreSQL contains $1..$N placeholders only; values travel out-of-band through the wire protocol and never see a SQL parser. No template, no concatenation, no format(). There's literally no path for an injection because there's no string-to-SQL bridge for values.

Free side on PostgreSQL (plpgsql) — three layers, on purpose

Free has it harder than Pro because the SQL builder runs inside the database, so the values arrive baked into the JSON. The defenses, layer by layer:

1. Values pass through format(..., %L) with explicit type casts.

%L is PostgreSQL's literal-quoting format specifier — same semantics as quote_literal(). It single-quotes the value, escapes embedded quotes, and returns a SQL-safe literal. Example from 04_pvt_inner_condition.sql:

-- The leaf emitter for "Long $gt N" looks like this:
RETURN format('fv._Long %s %L::bigint', op_symbol, operator_value);

If a malicious payload arrives as operator_value = "1; DROP TABLE _objects --", %L produces:

fv._Long > '1; DROP TABLE _objects --'::bigint

The ::bigint cast then fails at parse time with a normal type error. The single quotes from %L make the injection inert; the type cast makes it a hard error rather than silent data corruption. Every value branch in 04_pvt_inner_condition.sql uses this pattern: %L::bigint, %L::numeric, %L::timestamptz, %L::uuid, %L::boolean, %L::interval. Strings use plain %L (which is '…'-quoting), and any further use as text is still inside single quotes.

2. Identifiers come from the database, not from the JSON.

The user's JSON facets refer to fields by name ("Age", "OfficeLocations[HQ].City"). The builder doesn't concatenate that string into the SQL. Instead, pvt_field_collection looks the name up in _structures for the queried scheme — by joining _structures._name = $jsonKey AND _structures._id_scheme = $scheme. If the name doesn't exist, the field is rejected. If it does exist, what gets emitted is the matched _id_structure (a bigint, always safe) and a pivot column alias built with quote_ident(structure_name_from_db) or %I. So:

The structure id is a number from the DB.
The pivot column name is quote_ident'd, so "weird name with quotes" becomes a double-quoted SQL identifier.
Reserved bases (0$:id, 0$:parent_id, etc.) are dispatched through a fixed CASE — unknown bases raise:

  RAISE EXCEPTION 'Unknown RedbObject base field: "%" ...'

No path lets arbitrary text become an unquoted identifier.

3. Operator names are a fixed whitelist.

$eq, $gt, $lt, $ne, $in, $like, $ilike, $arrayContains, $expr, etc. are matched by a CASE/IF tree in 04_pvt_inner_condition.sql and friends. Unknown operators don't fall through to anything — they RAISE EXCEPTION:

RAISE EXCEPTION 'pvt_build_agg_expr: unsupported aggregate "%"', v_op;

Same for Sql.Function — only names in the whitelist (COALESCE, LOWER, UPPER, …) survive; everything else raises.

4. The final EXECUTE evaluates a string built only from the three pieces above.

So even though EXECUTE is involved, every byte of that string came from either:

a quote_literal'd (%L) value with a type cast, or
a quote_ident'd (%I) identifier sourced from _structures by id lookup, or
a hardcoded operator template chosen by a whitelist switch.

That's three independent gates, and any of them is sufficient to defeat injection. The plpgsql generator is paranoid by design — format/%L/%I are used uniformly, not as a one-off. The 99-test smoke suite (99_smoke_auto.sql) includes adversarial payloads ("; DROP TABLE …, embedded quotes, type-mismatched literals) and expects parse errors or rejection, not successful execution.

If you want to audit the PG side: every emitter is in 04_pvt_inner_condition.sql, 05_pvt_single_facet.sql, 13_pvt_condition.sql. Grep for format( — every value site has %L and a cast, every identifier site has %I or quote_ident.

Free side on MSSql (T-SQL) — the same three layers, different primitives

SQL Server has no format(..., %L/%I) and no quote_literal/quote_ident shorthand, so the MSSql builder under redb.MSSql/sql/v2-pvt/ implements each layer with the T-SQL primitives that are available. Same three gates, just spelled differently.

1. Values are quoted with REPLACE(..., '''', '''''') and/or validated with TRY_CAST before they get inlined.

For strings (including dict keys and $expr literals) the builder wraps the value in N'…' after doubling embedded single quotes — the SQL Server equivalent of quote_literal. From the actual pivot builder (pvt_bundle.sql, the dict-key branch):

-- Real emitter for dict-key match in the CTE
-- (the dict key goes into _values._array_index — same slot used for array indices)
SET @dict_key_literal =
    N'N''' + REPLACE(@dict_key, N'''', N'''''') + N'''';
-- Then concatenated into:
--   ... AND v.[_array_index] = N'HQ' THEN v.[_String] END) AS [...]

If @dict_key = "HQ'; DROP TABLE _values; --", the REPLACE doubles the embedded quote to '', and the emitted SQL becomes:

v.[_array_index] = N'HQ''; DROP TABLE _values; --'

The whole payload stays a single string literal. The tail of the payload never escapes the quotes, so it never gets executed.

For numerics the builder doesn't even concatenate raw user text — it first parses it with TRY_CAST and only inlines the parsed value if the parse succeeded. From 06_pvt_hierarchical.sql:

DECLARE @ha_id BIGINT = TRY_CAST(@ha_raw AS BIGINT);
IF @ha_id IS NOT NULL
    SET @result += N' AND dbo.pvt_is_descendant_of(' + @alias + N'.[_id], '
                 + CAST(@ha_id AS NVARCHAR(20)) + N') = 1';
-- If @ha_id is NULL the clause is silently dropped — bad input never
-- becomes SQL text. The only thing that ever gets inlined is a real BIGINT.

TRY_CAST returns NULL rather than raising on bad input, and the builder simply omits the clause in that case — so "1; DROP TABLE _objects --" never reaches the SQL text. The behaviour is fail-closed (no clause emitted) rather than fail-loud (no exception thrown); the predicate effectively becomes a no-op. Same pattern for WhereLevel (TRY_CAST AS INT), WhereHasDescendant, WhereContainsObject, and the other base-id predicates. For dates it's TRY_CONVERT(datetimeoffset, …), for GUIDs it's TRY_CONVERT(uniqueidentifier, …) — quote-then-typed-parse, drop-on-fail.

2. Identifiers come from _structures via QUOTENAME(), never from raw JSON.

Every pivot column alias, every projection alias, every table/column name in the dynamic SQL goes through QUOTENAME(). From the pivot CTE builder:

DECLARE @alias NVARCHAR(420) = QUOTENAME(@field_name);  -- @field_name comes from _structures
-- ...
RETURN N'o.' + QUOTENAME(@col) + N' AS ' + @alias;

QUOTENAME('weird]name') returns [weird]]name] — embedded ] is doubled, the result is always a single safely-bracketed identifier. As on the PG side, the source of @field_name is a _structures._name row matched by (_id_scheme = @scheme AND _name = @json_key) — if the JSON refers to a field the queried scheme doesn't have, the lookup returns nothing and the field is rejected with a RAISERROR before any SQL text is built.

For base-fields (0$:id, 0$:parent_id, 0$:scheme_id, …) there's a fixed CASE; unknown bases RAISERROR with the same "Unknown RedbObject base field" message as the PG side.

3. Operator names are a fixed CASE/IF whitelist with RAISERROR on miss.

Same logic as PG: $eq, $gt, $lt, $in, $like, $arrayContains, $expr, etc. are matched by a CASE tree. Anything not on the list raises:

RAISERROR(N'pvt: unsupported operator "%s"', 16, 1, @op);

Sql.Function calls go through the same whitelist — COALESCE, LOWER, UPPER, LEN, LEFT, RIGHT, ISNULL, etc. Unknown function names raise; the user can't smuggle xp_cmdshell through Sql.Function<int>("xp_cmdshell", "...") because that name isn't in the CASE.

4. The dynamic SQL is built only from the three pieces above and executed with sp_executesql.

Final execution is EXEC sp_executesql @sql (with parameterized @scheme_id, @limit, @offset etc. as sp_executesql arguments, not concatenated). Every byte of @sql came from either:

a REPLACE-quoted string literal in N'…', or
a TRY_CAST-validated typed value inlined as its parsed form, or
a QUOTENAME-bracketed identifier sourced from _structures by id lookup, or
a hardcoded operator template chosen by a whitelist CASE.

The 99_smoke_auto.sql regression suite on MSSql includes the same adversarial payloads as the PG suite — '; DROP TABLE …, embedded quotes, ] in field names, type-mismatched literals, unknown operators — and expects each one to be either rejected with RAISERROR (operators / unknown bases / unknown identifiers), silently dropped from the predicate (typed-value parse failures), or come out the other side as an inert quoted literal that does nothing (string values).

Summary

Layer	PG Free emits via	MSSql Free emits via
String values	`format('… %L', val)`	`N'' + REPLACE(val, '''', '''''') + N''`
Numeric / date / GUID values	`%L::bigint` / `%L::numeric` / `%L::timestamptz` / `%L::uuid` (parse-or-die at runtime)	`TRY_CAST(... AS BIGINT)` / `TRY_CONVERT(datetimeoffset, ...)` / `TRY_CONVERT(uniqueidentifier, ...)` (parse-or-drop-clause before inline)
Identifiers	`%I` / `quote_ident(name_from__structures)`	`QUOTENAME(name_from__structures)`
Operators / function names	CASE whitelist → `RAISE EXCEPTION`	CASE whitelist → `RAISERROR`
Final execution	`EXECUTE` of the assembled string	`EXEC sp_executesql` of the assembled string

Same three independent gates on both databases. Any one of them is sufficient to defeat injection; the suite tests all three.

If you find a leak in either dialect, file an issue with the JSON payload that triggers it. I'll treat it as a security bug, not a feature request.

The polyglot facet API — Free's one structural advantage

A side-effect of Free's SQL-side builder: it's accessible from any language. Pro compiles in C#, which is its strength but also its limitation — you need a .NET process. Free compiles in SQL, so the same builder C# Free clients hit is callable from any driver that can run SQL.

The shipped surface is two functions, not one:

Function	Input	Output
`pvt_build_query_sql(scheme_id, facets jsonb, limit, offset, order jsonb, max_depth, …)`	facets	the assembled `SELECT _id FROM …` as `text`
`get_objects_json(ids bigint[], max_depth integer DEFAULT 10)`	matched ids	table of `(Id bigint, JsonData text)` with fully hydrated objects

The honest flow is therefore two SQL roundtrips: build → EXECUTE the returned text → hydrate. The 10-line plpgsql wrapper below collapses that into a single call you can ship into your DB once and then call from any language:

-- one-time setup: paste into your DB as a wrapper of the two shipped functions
CREATE OR REPLACE FUNCTION redb_run_facets(
    p_scheme_id bigint,
    p_facets    jsonb   DEFAULT NULL,
    p_limit     integer DEFAULT NULL,
    p_offset    integer DEFAULT 0,
    p_order     jsonb   DEFAULT NULL,
    p_max_depth integer DEFAULT 10
) RETURNS TABLE("Id" bigint, "JsonData" text)
LANGUAGE plpgsql STABLE AS $$
DECLARE v_sql text; v_ids bigint[];
BEGIN
    v_sql := pvt_build_query_sql(p_scheme_id, p_facets, p_limit, p_offset, p_order, p_max_depth);
    EXECUTE 'SELECT array_agg(_id) FROM (' || v_sql || ') t' INTO v_ids;
    IF v_ids IS NULL THEN RETURN; END IF;
    RETURN QUERY SELECT g."Id", g."JsonData" FROM get_objects_json(v_ids, p_max_depth) g;
END $$;

With that wrapper in place:

# Python — psycopg
import psycopg, json

facets = {
    "$and": [
        {"Age":  {"$gt": 30}},
        {"City": {"$eq": "London"}},
    ]
}

with psycopg.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute(
        'SELECT "JsonData" FROM redb_run_facets(%s, %s::jsonb, %s, %s)',
        (scheme_id, json.dumps(facets), 100, 0),    # limit, offset
    )
    employees = [row[0] for row in cur.fetchall()]
    # each row is JSON text — nested objects, arrays, dictionaries,
    # ListItem references — all fully reconstructed.

// Node — pg
const facets = { $and: [{ Age: { $gt: 30 } }, { City: { $eq: 'London' } }] };
const { rows } = await client.query(
  'SELECT "JsonData" FROM redb_run_facets($1, $2::jsonb, $3, $4)',
  [schemeId, JSON.stringify(facets), 100, 0],
);
const employees = rows.map(r => JSON.parse(r.JsonData));

// Go — pgx
facets := []byte(`{"$and":[{"Age":{"$gt":30}},{"City":{"$eq":"London"}}]}`)
rows, err := conn.Query(ctx,
    `SELECT "JsonData" FROM redb_run_facets($1, $2::jsonb, $3, $4)`,
    schemeID, facets, 100, 0,
)
// scan each row into a json.RawMessage

// Java — JDBC
String facets = "{\"$and\":[{\"Age\":{\"$gt\":30}},{\"City\":{\"$eq\":\"London\"}}]}";
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT \"JsonData\" FROM redb_run_facets(?, ?::jsonb, ?, ?)")) {
    ps.setLong(1, schemeId);
    ps.setString(2, facets);
    ps.setInt(3, 100);
    ps.setInt(4, 0);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            String hydratedJson = rs.getString(1);
        }
    }
}

-- raw psql — same engine, no driver at all
SELECT "JsonData"
FROM redb_run_facets(
    42::bigint,
    '{"$and":[{"Age":{"$gt":30}},{"City":{"$eq":"London"}}]}'::jsonb,
    100, 0
);

If you'd rather skip the wrapper and call the shipped functions directly, the two-roundtrip pattern is:

-- 1. build the query text
SELECT pvt_build_query_sql(42, '{"$and":[…]}'::jsonb, 100, 0) AS sql;
-- 2. EXECUTE the returned text on the client → collect _id values
-- 3. hydrate
SELECT * FROM get_objects_json(ARRAY[101,102,103]::bigint[], 10);

The JSON facet schema is a stable contract — see FREE-OVER-PRO.md for the full operator list. Tuple-key dictionaries, nested RedbObject<T> references, RedbListItem references with .Id/.Value/.Alias — all reconstruct correctly in the JSON get_objects_json emits. No schema knowledge needed on the client side.

And the injection story above applies the same way here — your psycopg/pgx/JDBC driver binds the JSON as a jsonb parameter, so even your facet payload doesn't touch a SQL parser until it's already inside the parameterized builder.

This is Free-only by design. Pro's edge has always been the CLR-side compiled pipeline plus parallel materialization (ProLazyPropsLoader does 2 bulk _values SELECTs, indexes them into ILookup, and ProPropsMaterializer runs Parallel.ForEach to assemble 20+ value types, nested objects, arrays, dictionaries, ListItem refs into typed RedbObject<T> — all CPU-bound, zero further DB access). That stays where it is.

The honest matrix

Capability	Free 3.0.0	Pro 3.0.0
Query shape (PVT CTE, base pushdown, MAX/CASE ↔ array_agg)	Same	Same
`GroupBy + HAVING`	✓	✓
`ArrayGroupBy` (unnest / OUTER APPLY)	✓	✓
`DistinctBy`, `Sql.Function` (whitelist), `$expr` trees	✓	✓
Tree queries (`WhereHasAncestor`, `WhereLevel`, etc.)	✓	✓
Window functions (`Lag`/`Lead`/`Rank`/`DenseRank`)	✓	✓
Polyglot facet API (Python / Node / Go / Java / raw SQL)	✓	—
Compiled in C# (no JSON intermediate)	—	✓
Parallel CLR materialization (`Parallel.ForEach`)	—	✓
Bulk save via `COPY` protocol	—	✓
Hash-based `ChangeTracking` (diff only what changed)	—	✓
Combined `GroupBy` + window + `AggregateBatch`	✓	✓

Two things to be honest about:

Free has higher latency on full-object load than Pro — get_object_json builds JSON in plpgsql and the client deserializes. Pro's bulk-+-parallel materialization is a real win when you're loading thousands of complex objects with 20+ nested fields each.
Pro still has features Free won't get — parallel bulk materialization (ProLazyPropsLoader + ProPropsMaterializer) and hash-based ChangeTracking remain Pro-only. The query shape itself is now identical.

But the query shape — the part that determines whether the database uses your indexes properly, whether the plan caches, whether 30 props in a WHERE becomes 30 EXISTS or one pivot — that's the same in both tiers now.

Migration notes

If you're on 2.0.x Free, upgrading to 3.0.0:

Update packages — RedBase.Core, RedBase.Postgres / RedBase.MSSql to 3.0.0.
First startup applies the v2-pvt bundle automatically. The new module is additive at the C# API level — your existing LINQ-style queries just compile through the new path. Note: the legacy plpgsql function search_objects_with_facets() (used only by pre-2.0 raw-SQL callers) is no longer installed by redb_init.sql — it lives under redb.Postgres/sql/deprecated/ if you still need it; paste it manually before upgrading if you had direct SQL callers.
Pre-existing config flag is now honored. DefaultStrictDeleteExtra in built-in presets Development, HighPerformance, Migration was set to false but silently ignored before 3.0.0. It now actually works — meaning those presets will no longer auto-delete _structures rows for fields removed from your Props class on startup. If you were relying on the old (silently-broken) behavior, set c.DefaultStrictDeleteExtra = true explicitly.
MSSql Free users: the wide-CASE inline pivot is gone. If you had query-text-matching log scrapers, the new shape is WITH pvt_cte AS (... MAX(CASE WHEN _id_structure = @p1 AND _array_index IS NULL THEN _X END) ...) SELECT o.* FROM _objects o JOIN pvt_cte pvt ON ....

No source-level breaking API changes. No data migrations. If you've never touched the SQL surface directly, you can upgrade and not notice anything except your slow queries getting faster.

Known gaps I'd like help finding

Nested-field cross-scheme JOIN — Where(o => o.CurrentProject.Props.Name == "X") where CurrentProject is a RedbObject<OtherScheme> reference. Free PVT rejects the path; Pro SchemeFieldResolver throws. Needs new infrastructure on both sides. Tracked in FREE-OVER-PRO.md §2 #6b.
MSSql Free is fresh. Two weeks of intensive testing, not two years. EXPLAIN traces and failing repros are very welcome.
Polyglot facet API documentation is thinner than the C# API. The operator surface is stable (it's what the C# parser emits), but it's not yet a documented wire-protocol RFC.

There will be follow-up fixes after this post lands — that's expected. If you hit something, please open a thread in GitHub Discussions rather than sitting on it.

Try it

# Free
dotnet add package RedBase.Core      --version 3.0.0
dotnet add package RedBase.Postgres  --version 3.0.0     # or RedBase.MSSql

# Pro (commercial)
dotnet add package RedBase.Core.Pro       --version 3.0.0
dotnet add package RedBase.Postgres.Pro   --version 3.0.0   # or RedBase.MSSql.Pro

Repo: https://github.com/redbase-app/redb
Full 3.0.0 changelog: CHANGELOG.md
Architecture deep-dive (with the same LINQ→SQL examples): https://redbase.app/architecture
Issues / bug reports: https://github.com/redbase-app/redb/issues — please cite the SQL and the dialect.

If you read all the way to here, thank you. Tell me what's wrong with it.

DEV Community