TL;DR:
EfCore.StorageEstimator combines EF Core model metadata with explicit workload assumptions so you can estimate PostgreSQL heap size, index size, and entity-graph row growth before production data exists.
If you use EF Core with PostgreSQL, you can usually answer schema questions like these:
- What tables will exist?
- Which columns are nullable?
- Which indexes are we creating?
- Are we using owned types, many-to-many, complex properties, or inheritance?
What you usually cannot answer quickly is this: how much storage will this model consume once real tenants, projects, assets, or events start piling up?
That is the gap EfCore.StorageEstimator is trying to close. It takes the parts EF Core already knows from the compiled model and combines them with the parts only you know: expected row counts, field fill rates, average payload lengths, and navigation multiplicities.
The tool estimates PostgreSQL table size, index size, and how row counts grow across related entities. Here is the kind of report it produces from a real EF Core model:
| Path | Entity | Rows | Heap Bytes | Index Bytes | Total Bytes | Table | Properties | Indexes |
|---|---|---|---|---|---|---|---|---|
| SampleProject | SampleProject | 120 | 81920 | 49152 | 131072 | sample_projects | 9 | 2 |
| SampleProject.Assets | SampleAsset | 1440 | 11796480 | 204800 | 12001280 | sample_assets | 7 | 1 |
| SampleProject.Tags | SampleTag | 480 | 40960 | 32768 | 73728 | sample_tags | 2 | 1 |
- Total Estimated Rows: 2040
- Total Estimated Bytes: 12206080
## Why this exists
PostgreSQL storage is not intuitive to estimate by hand. Tables and indexes are stored in fixed-size pages, commonly 8 KB, with per-page headers, per-row headers, alignment rules, null bitmaps, and index entry overhead. Large values can also be pushed into TOAST storage. The official PostgreSQL docs explain the physical model, but that still leaves you doing a lot of manual math for real schemas.
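Those page-level rules are concrete enough to sketch. The following is a back-of-the-envelope heap estimate built on PostgreSQL's documented defaults (8 KB pages, 24-byte page header, 4-byte line pointers, 23-byte tuple headers aligned to 8 bytes). It is not the library's exact formula, and it ignores null bitmaps, fillfactor, and TOAST:

```python
# Rough PostgreSQL heap sizing using default 8 KB pages.
# Illustrative only: real layout also depends on null bitmaps,
# column order, fillfactor, and TOAST.
PAGE_SIZE = 8192        # default BLCKSZ
PAGE_HEADER = 24        # per-page header
ITEM_POINTER = 4        # per-row line pointer in the page header array
TUPLE_HEADER = 23       # HeapTupleHeader, padded to the next 8 bytes

def align8(n: int) -> int:
    return (n + 7) & ~7

def heap_bytes(row_count: int, payload_bytes: int) -> int:
    """Estimate heap size for row_count rows of a given payload size."""
    row = ITEM_POINTER + align8(TUPLE_HEADER + payload_bytes)
    rows_per_page = (PAGE_SIZE - PAGE_HEADER) // row
    pages = -(-row_count // rows_per_page)  # ceiling division
    return pages * PAGE_SIZE

# e.g. 1440 rows with ~60 bytes of payload each
print(heap_bytes(1440, 60))  # 139264
```

Even this simplified version shows why page packing matters: a few extra payload bytes per row can drop the rows-per-page count and add whole 8 KB pages to the bill.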
EF Core already has the schema part. Its metadata model describes how your entity types map to the database. If you already use Npgsql, that model also knows PostgreSQL-specific store types like jsonb, bytea, arrays, and numeric precision.
Most teams still do capacity planning in a spreadsheet, a Notion page, or a rough architecture note. That usually drifts away from the actual model. Indexes get missed. Owned types get forgotten. Many-to-many join tables are hand-waved. Row counts are discussed separately from the schema that creates the storage bill.
EfCore.StorageEstimator puts those two sides together.
## A minimal example
Start by annotating the parts of the model that EF Core cannot infer from the schema alone.
```csharp
using EfCore.StorageEstimator.Planning;

[StorageEntity(12)]
public sealed class Project
{
    [StorageField(AverageLength = 64)]
    public string Name { get; set; } = string.Empty;

    [StorageNavigation(8)]
    public IReadOnlyList<TaskItem> Tasks { get; } = [];
}

[StorageEntity(1)]
public sealed class TaskItem
{
    [StorageField(0.5d, AverageLength = 120)]
    public string? Notes { get; set; }
}
```
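To see what these annotations feed into, here is a rough sketch of how a fill rate and an average length could blend into an expected per-row byte cost for a nullable text column. The 1-byte short varlena header for values under 127 bytes is a PostgreSQL detail; the blending formula and the `expected_text_bytes` name are illustrative assumptions, not the library's actual sizing math:

```python
# Expected per-row cost of a nullable text column, assuming a simple
# fill-rate-weighted average. Header sizes follow PostgreSQL's varlena
# rules: 1 byte for short values (< 127 bytes), 4 bytes otherwise.
def expected_text_bytes(fill_rate: float, average_length: int) -> float:
    header = 1 if average_length < 127 else 4
    return fill_rate * (header + average_length)

# TaskItem.Notes: present in ~50% of rows, ~120 bytes when present
print(expected_text_bytes(0.5, 120))  # 60.5
```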
Then run the estimate from a dedicated analysis process:
```csharp
using EfCore.StorageEstimator;
using EfCore.StorageEstimator.Estimation;
using EfCore.StorageEstimator.Rendering;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection()
    .AddStorageEstimator()
    .BuildServiceProvider();

var estimator = services.GetRequiredService<IStorageEstimator>();

// dbContext is your application's DbContext instance.
var report = estimator.Estimate(new StorageEstimateRequest
{
    Model = dbContext.Model,
    Roots =
    [
        new StorageTraversalRoot(typeof(Project)),
        new StorageTraversalRoot(typeof(Project))
        {
            EntityCountOverride = 5_000,
            Label = "Large Tenant"
        }
    ]
});

var markdown = new MarkdownReportRenderer().Render(report);
Console.WriteLine(markdown);
```
A few details matter here:
- `Model = dbContext.Model` tells the estimator to read the real EF Core model.
- `EntityCountOverride` lets you compare scenarios without editing attributes.
- `Label` gives the root a friendlier name in the output.
## What the report looks like
The sample app in the repository renders a Markdown report like this:
| Path | Entity | Rows | Heap Bytes | Index Bytes | Total Bytes | Table | Properties | Indexes |
|---|---|---|---|---|---|---|---|---|
| SampleProject | SampleProject | 120 | 81920 | 49152 | 131072 | sample_projects | 9 | 2 |
| SampleProject.Assets | SampleAsset | 1440 | 11796480 | 204800 | 12001280 | sample_assets | 7 | 1 |
| SampleProject.Tags | SampleTag | 480 | 40960 | 32768 | 73728 | sample_tags | 2 | 1 |
- Total Estimated Rows: 2040
- Total Estimated Heap Bytes: 11919360
- Total Estimated Index Bytes: 286720
- Total Estimated Bytes: 12206080
That report shape is useful because it answers several questions at once:
- which path in the graph is producing the most rows
- which table is consuming most of the heap
- how much of the bill comes from indexes
- how much of the estimate is grounded in real schema metadata
In this sample, the root entity is small, but the Assets branch dominates total storage once multiplicity is applied. That is the kind of result that is easy to miss in a spreadsheet and obvious in a traversal-based report.
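The row propagation behind that result is simple multiplication down the traversal: each branch multiplies its parent's row count by the navigation's multiplicity. This sketch reproduces the sample numbers (120 roots, with per-project multiplicities of 12 and 4 read off the report); the library's actual traversal is attribute-driven and recursive, so treat this as an illustration:

```python
# One level of traversal: child rows = parent rows * multiplicity.
def traverse(rows: int, children: dict[str, int]) -> dict[str, int]:
    counts = {"root": rows}
    for name, multiplicity in children.items():
        counts[name] = rows * multiplicity
    return counts

counts = traverse(120, {"Assets": 12, "Tags": 4})
print(counts)                # {'root': 120, 'Assets': 1440, 'Tags': 480}
print(sum(counts.values()))  # 2040
```

Multiplicities compound at every level, which is why a modest-looking branch can dominate total storage.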
## What the packages do
The repository currently exposes three main surfaces:
| Package | Purpose |
|---|---|
| `EfCore.StorageEstimator.Contracts` | Planning attributes and DTOs |
| `EfCore.StorageEstimator` | Runtime estimator, EF Core schema reader, PostgreSQL sizing math, Markdown and JSON renderers |
| `EfCore.StorageEstimator.Analyzers` | Roslyn diagnostics for invalid planning metadata |
It is a library-first toolchain: you run it from a dedicated console app, internal tool, test harness, or analysis host. The estimator reads `dbContext.Model`; it does not need a live database connection to calculate an estimate.
## Warnings and analyzers
The estimator emits warnings when traversal stops at an unannotated branch or when storage sizing falls back to default assumptions.
The analyzer package catches invalid metadata earlier in the build:
| Rule | Meaning |
|---|---|
| EFSA001 | `[StorageNavigation]` targets a type without `[StorageEntity]` |
| EFSA002 | `[StorageEntity]` row count must be greater than zero |
| EFSA003 | `[StorageField]` fill rate must be between 0 and 1 |
| EFSA004 | `[StorageField(AverageLength = ...)]` must be zero or greater |
| EFSA005 | `[StorageNavigation]` multiplicity must be greater than zero |
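For intuition, the rules above amount to a handful of range checks. This sketch restates two of them as runtime validation; the `validate_field` helper is hypothetical, and the real analyzers inspect attribute arguments at compile time rather than runtime values:

```python
# Range checks mirroring two of the analyzer diagnostics.
# Bounds here are assumptions (inclusive 0..1 for fill rate).
def validate_field(fill_rate: float, average_length: int) -> list[str]:
    errors = []
    if not (0 <= fill_rate <= 1):
        errors.append("EFSA003: fill rate must be between 0 and 1")
    if average_length < 0:
        errors.append("EFSA004: AverageLength must be zero or greater")
    return errors

print(validate_field(0.5, 120))  # []
print(validate_field(1.5, -1))   # both diagnostics fire
```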
That combination matters because the estimate stays explicit. Schema facts come from EF Core, workload assumptions live in code, and missing planning data shows up as warnings instead of disappearing into hand-wavy math.
## Summary
If you already use EF Core and PostgreSQL, you already own half the information needed for capacity planning. EF Core knows the shape of the schema. Your team knows the expected scale. EfCore.StorageEstimator combines those two inputs into a report you can run early, review often, and tighten over time.
It gives you a concrete estimate before production data exists, using the model you already have.
Learn more:
- GitHub repository: https://github.com/Alos-no/EfCore.StorageEstimator
- NuGet package: https://www.nuget.org/packages/EfCore.StorageEstimator/