.NET Learning Notes: Custom In-Memory Provider(4) - ReadPath - From IQueryable to Result Execution

#csharp #database #dotnet #learning

The read path in this provider follows EF Core’s standard query pipeline: expression construction, translation, compilation, and execution. The provider does not implement individual LINQ operators as standalone execution logic. Instead, it integrates into EF Core’s existing pipeline and participates at the appropriate extension points.

Composing a query does not execute it. Methods such as Where, Select, OrderBy, ThenBy, Skip, and Take are non-terminal operations: each call only wraps the existing query expression in another method-call node. At this stage, nothing is executed and no entities are materialized.
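To make the deferred behavior concrete, here is a minimal sketch that uses LINQ to Objects' AsQueryable() instead of the provider itself. The list and variable names are illustrative assumptions, not part of the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Illustrative in-memory source; a real query would start from context.Set<T>().
var numbers = new List<int> { 5, 3, 8, 1 };

IQueryable<int> query = numbers.AsQueryable()
    .Where(x => x > 2)
    .OrderBy(x => x)
    .Skip(1)
    .Take(2);

// Nothing has run yet: the query is only an expression tree whose
// outermost node is the last operator that was composed.
var outermost = (MethodCallExpression)query.Expression;
Console.WriteLine(outermost.Method.Name); // prints: Take

// The source is still live. Adding an element before enumeration
// changes the result, which shows that no work was done earlier.
numbers.Add(4);

// Enumeration (here via ToList) is what finally executes the pipeline.
List<int> result = query.ToList();
Console.WriteLine(string.Join(", ", result)); // prints: 4, 5
```

The element added after the query was composed still shows up in the result, which is only possible because the operators were recorded rather than run.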

Translation

The provider integrates during the translation phase by subclassing QueryableMethodTranslatingExpressionVisitor, which EF Core obtains through IQueryableMethodTranslatingExpressionVisitorFactory. EF Core parses the LINQ method calls and invokes this visitor to produce a ShapedQueryExpression. In this implementation, the provider does not translate LINQ into a custom DSL or separate query representation. Instead, it constructs a ShapedQueryExpression that carries two things: the underlying row-level query expression and the associated shaper expression. The query expression records each translated operator together with its arguments.

It is important to understand that translation is not initiated by the provider. EF Core drives the entire process. LINQ method calls are first parsed into method-call expressions, and EF Core then invokes the appropriate TranslateXXX methods on the provider’s visitor. The provider does not “scan” or interpret LINQ directly — it simply responds to EF Core’s translation callbacks and records the query semantics accordingly.

QueryRoot and ShapedQuery

A QueryRoot represents the starting point of a query. When application code writes:

context.Set<TestEntity>()

EF Core does not interpret this as “give me a collection of TestEntity”. Instead, it treats it as a declarative query origin: “This query starts from the entity type TestEntity.” Internally, EF Core represents this as a query root expression associated with the entity type. At this stage, no data is accessed, no enumeration occurs, and no filtering or projection happens. A QueryRoot is purely symbolic. It describes what is being queried, not how to retrieve it. Every EF Core query, no matter how simple, must begin with a QueryRoot.

A shaped query is the translated form of a query. It represents the result of translating a QueryRoot (and any LINQ operators) into a representation that the compilation stage can turn into executable code. A shaped query combines two essential components:

Query Expression: Describes how to retrieve raw data (rows, objects, records) from the data source.
Shaper Expression: Describes how to transform each retrieved item into the final result type returned to the user.

Together, they define “how the query executes” and “what shape the results take.”

This is why EF Core uses the term “shaped”—the query not only fetches data, but also defines how that data is molded into entities, projections, or scalar values.
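The query/shaper split can be illustrated without EF Core. In the sketch below, two plain delegates stand in for the two halves of a shaped query; the row layout and all names are assumptions made for illustration, and EF Core's actual ShapedQueryExpression holds expression trees rather than delegates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Raw rows as a data source might return them: untyped value arrays.
var rows = new List<object[]>
{
    new object[] { 1, "Keyboard" },
    new object[] { 2, "Mouse" },
};

// Query half: which rows to read (here, all of them).
Func<IEnumerable<object[]>> query = () => rows;

// Shaper half: how one raw row becomes one typed result element.
Func<object[], (int Id, string Name)> shaper =
    row => ((int)row[0], (string)row[1]);

// Only the combination of both halves yields typed results.
var results = query().Select(shaper).ToList();
Console.WriteLine(results[1].Name); // prints: Mouse
```

Swapping the shaper for one that returns only the name would change the result shape without touching the row query, which is the separation the term “shaped” refers to.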

/// <summary>
/// Entry point from QueryRoot to ShapedQuery.
/// For a full table scan:
/// QueryExpression = CustomMemoryQueryExpression(entityType)
/// ShaperExpression = CustomMemoryEntityShaperExpression(entityType)
/// </summary>
protected override ShapedQueryExpression? CreateShapedQueryExpression(IEntityType entityType)  
{  
    if (entityType == null) throw new ArgumentNullException(nameof(entityType));  
    var queryExpression = new CustomMemoryQueryExpression(entityType);  
    var shaperExpression = new CustomMemoryEntityShaperExpression(entityType);  
    return new ShapedQueryExpression(queryExpression, shaperExpression);  
}

Recording Query Semantics

In this provider, the translation stage is intentionally “record-only”. EF Core parses a LINQ query into method-call expressions and invokes CustomMemoryQueryableMethodTranslatingExpressionVisitor to translate each operator, but the provider does not execute anything at this point. Instead, it stores the translated intent inside CustomMemoryQueryExpression, and passes that forward to the compilation stage.

CustomMemoryQueryExpression acts as a lightweight query model owned by the provider. It does not try to become a DSL, and it does not attempt to replace EF Core’s pipeline. Its job is simpler: capture the entity type being queried and record every non-terminal operator in the exact order it appears in the original LINQ chain. The ordering requirement is the key reason this type exists. Operators like OrderBy, Skip, and Take are not commutative, and their meaning depends on sequencing. OrderBy().Skip(10).Take(5) is a different query from Skip(10).Take(5).OrderBy(). If the provider only stored “there is a skip and a take somewhere” without preserving their positions, it would not be able to reproduce correct behavior during execution.
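The sequencing point is easy to verify with plain LINQ to Objects. This small sketch uses an arbitrary sample array, not provider data:

```csharp
using System;
using System.Linq;

int[] data = { 5, 3, 8, 1 };

// Sort first, then page: Skip drops the smallest element.
var sortedThenPaged = data.OrderBy(x => x).Skip(1).Take(2).ToArray();

// Page first, then sort: Skip drops whatever comes first in source order.
var pagedThenSorted = data.Skip(1).Take(2).OrderBy(x => x).ToArray();

Console.WriteLine(string.Join(", ", sortedThenPaged)); // prints: 3, 5
Console.WriteLine(string.Join(", ", pagedThenSorted)); // prints: 3, 8
```

The same three operators produce different results depending on their position, so a query model that forgot the order could not be replayed faithfully.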

This is why CustomMemoryQueryExpression keeps Steps as an ordered list rather than a set of flags. Each translation method appends a new step and returns a new CustomMemoryQueryExpression instance, so the expression remains immutable and composable while still preserving the original operator sequence.
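The following is a minimal sketch of what such an append-only query model could look like. It is not the provider's actual CustomMemoryQueryExpression: QueryModelSketch, QueryStep, and the simplified step records are hypothetical stand-ins, and the real type would derive from Expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

Expression<Func<int, bool>> predicate = x => x > 2;

var empty = new QueryModelSketch(typeof(int), Array.Empty<QueryStep>());
var withWhere = empty.AddStep(new WhereStep(predicate));
var withSkip = withWhere.AddStep(new SkipStep(Expression.Constant(10)));

// Earlier instances are untouched; each AddStep produced a new model.
Console.WriteLine(empty.Steps.Count);     // prints: 0
Console.WriteLine(withWhere.Steps.Count); // prints: 1
Console.WriteLine(withSkip.Steps.Count);  // prints: 2

// Hypothetical step types; the provider's real steps may carry more data.
public abstract record QueryStep;
public sealed record WhereStep(LambdaExpression Predicate) : QueryStep;
public sealed record SkipStep(Expression Count) : QueryStep;

// Hypothetical stand-in for the provider's query model.
public sealed class QueryModelSketch
{
    public QueryModelSketch(Type elementType, IReadOnlyList<QueryStep> steps)
    {
        ElementType = elementType;
        Steps = steps;
    }

    public Type ElementType { get; }

    // Ordered list: position in the list equals position in the LINQ chain.
    public IReadOnlyList<QueryStep> Steps { get; }

    // Copy-on-append keeps every instance immutable while preserving order.
    public QueryModelSketch AddStep(QueryStep step)
    {
        var next = new List<QueryStep>(Steps) { step };
        return new QueryModelSketch(ElementType, next);
    }
}
```

Copying the list on every append costs a little allocation, but it means an intermediate query can be reused as the base of several different chains without one chain leaking steps into another.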

For example, when EF Core encounters a Where(...) call, the visitor handles it by reading the existing CustomMemoryQueryExpression from the shaped query, appending a WhereStep(predicate), and returning a new ShapedQueryExpression that carries the updated query expression:

protected override ShapedQueryExpression? TranslateWhere(
    ShapedQueryExpression source, 
    LambdaExpression predicate)
{
    if (source.QueryExpression is not CustomMemoryQueryExpression q)
    {
        return null;
    }

    var q2 = q.AddStep(new WhereStep(predicate));

    return new ShapedQueryExpression(q2, source.ShaperExpression);
}

A concrete example makes the “record in order” design easier to see. Suppose the user writes:

var query =
    context.Set<Product>()
        .Where(p => p.Price > 1000)
        .OrderBy(p => p.Id)
        .Skip(10)
        .Take(5);

EF Core translates this chain one operator at a time. TranslateWhere runs first, exactly as shown above; TranslateOrderBy, TranslateSkip, and TranslateTake then follow the same pattern:

protected override ShapedQueryExpression? TranslateOrderBy(ShapedQueryExpression source,  
    LambdaExpression keySelector, bool ascending)  
{  
    if (source.QueryExpression is not CustomMemoryQueryExpression q)  
        return null;  
    var kind = ascending ? CustomMemoryQueryStepKind.OrderBy : CustomMemoryQueryStepKind.OrderByDescending;  
    // The sort direction is encoded in the step kind.
    var q2 = q.AddStep(new OrderStep(kind, keySelector));  

    return new ShapedQueryExpression(q2, source.ShaperExpression);  
}

protected override ShapedQueryExpression? TranslateSkip(ShapedQueryExpression source, Expression count)  
{  
    if (source.QueryExpression is not CustomMemoryQueryExpression q)  
        return null;  

    // record skip  
    var q2 = q.AddStep(new SkipStep(count));  

    return new ShapedQueryExpression(q2, source.ShaperExpression);  
}

protected override ShapedQueryExpression? TranslateTake(ShapedQueryExpression source, Expression count)  
{  
    if (source.QueryExpression is not CustomMemoryQueryExpression q)  
        return null;  

    var q2 = q.AddStep(new TakeStep(count));  

    return new ShapedQueryExpression(q2, source.ShaperExpression);  
}

During translation, none of these methods runs a scan, applies a filter, or materializes an entity. Instead, the visitor folds the query into a single CustomMemoryQueryExpression whose Steps sequence matches the LINQ chain: first a WhereStep (Price > 1000), then an OrderStep (ascending by Id), then a SkipStep (10), then a TakeStep (5). That ordered list is the only thing the compilation stage needs in order to replay the semantics correctly over the underlying row source. In other words, translation is where the provider captures “what the query means”, and compilation is where it later decides “how to execute it”.

This also explains why the translation methods return new ShapedQueryExpression(q2, source.ShaperExpression) rather than producing results. The shaped query is the carrier EF Core expects at the boundary between translation and compilation. The provider simply plugs its recorded query model (CustomMemoryQueryExpression) into that carrier, so the next phase can compile it into executable logic without losing any of the original LINQ intent.

In short, translation is not a provider-controlled stage but an EF Core-driven callback sequence. The provider participates by responding to translation hooks and capturing semantics, rather than independently parsing or executing LINQ.

Compilation Stage and Identity Resolution Integration

The compilation stage transforms the recorded query semantics into an executable expression tree. At this point, translation has already produced a ShapedQueryExpression containing a CustomMemoryQueryExpression. That object records the entity type, the ordered non-terminal steps, and any terminal operator. No execution has occurred yet.

A ShapedQueryExpression consists of two parts: a query expression and a shaper expression. The query expression describes how raw data should be retrieved. The shaper expression describes how each element should be shaped into the expected result type. Translation constructs both pieces, but neither is executed during that phase.

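EF Core runs this pipeline when the query is actually executed, for example by enumerating it or by calling a terminal operation such as ToList, First, Single, Count, or Any. After translation completes, EF Core enters the compilation phase and invokes the provider’s ShapedQueryCompilingExpressionVisitor subclass, which it obtains through IShapedQueryCompilingExpressionVisitorFactory. The responsibility of this phase is to construct an expression tree that EF Core will later compile into a delegate. Execution only happens after that delegate is produced.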

Compilation begins by constructing the row-level source:

IMemoryTable<TEntity>.QueryRows

This yields an IQueryable<SnapshotRow>, representing the storage-level rows. No entity instances exist at this stage.

The provider then builds a projection that converts each SnapshotRow into a tracked entity instance. This is done by inserting a Queryable.Select call whose selector invokes a helper method such as:

TrackFromRow<TEntity>(...)

The resulting pipeline conceptually resembles:

QueryRows
    .Select(row => TrackFromRow<TEntity>(...))
    .Where(...)
    .OrderBy(...)
    ...

The non-terminal steps recorded during translation are replayed in their original order. Each recorded step is translated into the corresponding Queryable method call expression and appended to the expression tree. For example, a recorded WhereStep becomes a Queryable.Where call over the current source expression. Terminal operators are appended as the final node, shaping the overall result form.
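The replay idea can be sketched with System.Linq.Expressions alone. The example below is an assumption-laden simplification: recorded steps are reduced to (method name, argument) tuples, the Replay helper is hypothetical, and the source is a LINQ to Objects queryable instead of IMemoryTable<TEntity>.QueryRows. It only demonstrates how each recorded step becomes a Queryable method-call node wrapped around the current source expression.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

var source = new[] { 5, 3, 8, 1, 9, 2 }.AsQueryable();

Expression<Func<int, bool>> predicate = x => x > 2;
Expression<Func<int, int>> keySelector = x => x;

// Recorded in LINQ-chain order: Where, OrderBy, Skip, Take.
var steps = new (string Method, Expression Argument)[]
{
    ("Where", predicate),
    ("OrderBy", keySelector),
    ("Skip", Expression.Constant(1)),
    ("Take", Expression.Constant(2)),
};

Expression pipeline = Replay.Apply(source.Expression, typeof(int), steps);

// Let the LINQ to Objects provider compile and run the finished tree.
var result = source.Provider.CreateQuery<int>(pipeline).ToList();
Console.WriteLine(string.Join(", ", result)); // prints: 5, 8

// Hypothetical helper; the provider's compiling visitor would do the
// equivalent work while walking its own step types.
public static class Replay
{
    public static Expression Apply(
        Expression source,
        Type elementType,
        IEnumerable<(string Method, Expression Argument)> steps)
    {
        var current = source;
        foreach (var (method, argument) in steps)
        {
            // OrderBy is generic over the key type as well as the element type.
            var typeArgs = argument is LambdaExpression lambda && method == "OrderBy"
                ? new[] { elementType, lambda.ReturnType }
                : new[] { elementType };

            // Wrap the current source in Queryable.<method>(current, argument).
            current = Expression.Call(typeof(Queryable), method, typeArgs, current, argument);
        }
        return current;
    }
}
```

Because each iteration wraps the previous expression, the outermost node of the finished tree is the last recorded step, mirroring the shape the original LINQ chain would have produced.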

The final result of compilation is a complete expression tree describing the entire query. EF Core compiles this tree into a delegate and executes it.

Reusing EF Core Identity Resolution and Fix-Up

Identity management is not implemented by the provider. Instead, the provider integrates with EF Core’s tracking infrastructure at materialization time.

Inside TrackFromRow<TEntity>, the provider first ensures that the QueryContext has an initialized state manager. It then attempts identity resolution using the entity’s primary key:

var entry = qc.TryGetEntry(
    primaryKey,
    row.Key,
    throwOnNullKey: false,
    out var hasNullKey);

If a tracked instance already exists for that key, it is returned immediately. This guarantees that within a single DbContext, only one object instance exists per primary key.

If no tracked entry is found, the provider materializes a new instance using EF Core’s IEntityMaterializerSource. The snapshot is converted into a ValueBuffer, and EF Core’s materializer is invoked to construct the entity. After materialization, the provider calls:

qc.StartTracking(entityType, instance, originalValues);

This inserts the entity into EF Core’s identity map. From that point forward, identity resolution, navigation fix-up, and relationship consistency are handled entirely by EF Core.

The provider does not maintain its own identity map and does not implement fix-up logic. It supplies stable row identity and defers object identity semantics to EF Core. Because tracking is injected at materialization time, the provider remains fully compatible with change tracking and relationship fix-up without reimplementing those mechanisms.
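The resolve-or-materialize flow inside TrackFromRow<TEntity> can be pictured with a deliberately simplified sketch. The dictionary below merely stands in for EF Core's state manager so the example runs without EF Core; the provider itself keeps no such map. IdentityMapSketch, SnapshotRowSketch, and Product are hypothetical names, and the real code goes through QueryContext and IEntityMaterializerSource as described above.

```csharp
using System;
using System.Collections.Generic;

var tracker = new IdentityMapSketch();
var row = new SnapshotRowSketch(Key: 1, Name: "Keyboard");

var first = tracker.TrackFromRow(row);
var second = tracker.TrackFromRow(row);

// The second call resolves to the instance that is already tracked.
Console.WriteLine(ReferenceEquals(first, second)); // prints: True

// Hypothetical storage-level row: key plus snapshot values.
public sealed record SnapshotRowSketch(int Key, string Name);

public sealed class Product
{
    public int Id { get; init; }
    public string Name { get; init; } = "";
}

// Stand-in for EF Core's tracking infrastructure, for illustration only.
public sealed class IdentityMapSketch
{
    private readonly Dictionary<int, Product> _tracked = new();

    public Product TrackFromRow(SnapshotRowSketch row)
    {
        // 1. Identity resolution: reuse the tracked instance for this key, if any.
        if (_tracked.TryGetValue(row.Key, out var existing))
            return existing;

        // 2. Materialize a new instance from the row's snapshot values.
        var instance = new Product { Id = row.Key, Name = row.Name };

        // 3. Start tracking so later rows with the same key resolve to it.
        _tracked[row.Key] = instance;
        return instance;
    }
}
```

In the provider, steps 1 and 3 correspond to the TryGetEntry and StartTracking calls on the QueryContext, which is why no provider-side dictionary is needed.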

Architectural Separation

The responsibilities remain clearly separated.

The storage layer provides SnapshotRow and guarantees stable row identity.
The compilation layer constructs the executable query pipeline.
The tracking layer, including identity resolution and fix-up, is fully managed by EF Core.

By delegating object identity to EF Core and keeping the storage model snapshot-based, the provider preserves EF Core’s semantics while maintaining a clean separation between row storage and entity instantiation.