DEV Community: bertrand HARTWIG

Running PostgreSQL Correctly with Docker Compose

bertrand HARTWIG — Wed, 17 Jun 2026 15:01:31 +0000

This guide explains how to run a PostgreSQL instance with Docker Compose using a configuration that provides a solid baseline for a properly configured PostgreSQL deployment (excluding backup and monitoring).

The goal is not only to start PostgreSQL, but to start it with sane defaults, persistent storage, proper health checks, query observability, and memory settings aligned with the resources allocated to the container.

The example below is generated by pgAssistant, which provides this kind of PostgreSQL Docker configuration for free.

Why this configuration matters

A PostgreSQL container should not be treated as a simple throwaway service when it stores application data. Several Docker Compose options have a direct impact on reliability, performance, security, and observability.

The most important points are:

restart: always
shm_size
persistent volumes
container CPU and memory limits
POSTGRES_INITDB_ARGS
a proper health check
pg_stat_statements
autovacuum=on
PostgreSQL tuning parameters generated from pgTune

Example `docker-compose.yml`

services:
  northwind-db:
    restart: always
    image: postgres:17-alpine
    shm_size: 1GB

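    # Optional (Swarm mode): use a tmpfs mount for /dev/shm instead of shm_size.
    # Add this entry to the service's existing volumes list below rather than
    # declaring a second `volumes:` key, which would be a duplicate YAML key.
    #   - type: tmpfs
    #     target: /dev/shm
    #     tmpfs:
    #       size: 1073741824  # 1 GiB, same as shm_size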

    volumes:
      - northwind_data:/var/lib/postgresql/data

    ports:
      - "5432:5432"

    deploy:
      resources:
        limits:
          cpus: "6.0"
          memory: 4GB

    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=xxxxx
      - POSTGRES_DB=northwind
      - POSTGRES_INITDB_ARGS=--auth-local=scram-sha-256 --auth-host=scram-sha-256

    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 10s
      timeout: 5s
      retries: 5

    command: >
      postgres
        -c shared_preload_libraries='pg_stat_statements'
        -c autovacuum=on
        -c max_connections=200
        -c shared_buffers='1GB'
        -c effective_cache_size='3GB'
        -c maintenance_work_mem='256MB'
        -c checkpoint_completion_target=0.9
        -c wal_buffers='16MB'
        -c default_statistics_target=100
        -c random_page_cost=1.1
        -c effective_io_concurrency=200
        -c work_mem='5242kB'
        -c min_wal_size='1GB'
        -c max_wal_size='4GB'
        -c huge_pages='off'
        -c max_worker_processes=6
        -c max_parallel_workers=6
        -c max_parallel_maintenance_workers=3
        -c max_parallel_workers_per_gather=3

volumes:
  northwind_data:

`restart: always`

restart: always

This option tells Docker to restart the PostgreSQL container whenever it exits. It also tells Docker to start the container again when the Docker daemon restarts, for example after a host reboot.

For a database service, this is important because PostgreSQL is usually a critical dependency for the application. Without a restart policy, the database may remain stopped after a crash, host reboot, or Docker daemon restart.

restart: always does not replace monitoring, backups, replication, or high availability, but it provides a basic resilience layer that should almost always be enabled for a database container.

Shared memory and `shm_size`

shm_size: 1GB

PostgreSQL relies on shared memory for several internal operations. By default, Docker gives containers a 64 MB /dev/shm, which is often too small for a properly tuned PostgreSQL instance.

A good rule is to align shm_size with the PostgreSQL shared_buffers value.

For example:

shm_size: 512MB

should be consistent with:

-c shared_buffers='512MB'

In the example generated by pgAssistant, both values are set to 1GB:

shm_size: 1GB
-c shared_buffers='1GB'

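shared_buffers defines how much memory PostgreSQL uses for its own shared buffer cache, but on Linux that cache is not allocated in /dev/shm. What lives in /dev/shm by default are PostgreSQL's dynamic shared memory segments, used for example by parallel query workers. If /dev/shm is undersized, queries can fail under load with errors such as "could not resize shared memory segment ... No space left on device". Matching shm_size to shared_buffers is therefore a conservative rule of thumb that leaves enough headroom, not a strict requirement.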
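The alignment rule above can be checked before deploying. Below is a minimal sketch that compares the two values as written in the Compose file. It assumes that both sizes carry an explicit unit, as in the example. The helper names are hypothetical and not part of pgAssistant.

```python
import re

# Binary multipliers: Docker Compose byte values ("1GB", "512m") and
# PostgreSQL memory units ("1GB", "512MB", "5242kB") both use powers of 1024.
_UNITS = {
    "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024**2, "mb": 1024**2,
    "g": 1024**3, "gb": 1024**3,
}


def to_bytes(size: str) -> int:
    """Parse '1GB', "'512MB'" or '5242kB' into bytes.

    A unit is required: a bare PostgreSQL number such as shared_buffers=131072
    is counted in 8kB pages, not bytes, so it is rejected here.
    """
    match = re.fullmatch(r"\s*'?(\d+)\s*([A-Za-z]+)'?\s*", size)
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    if unit.lower() not in _UNITS:
        raise ValueError(f"unknown unit in {size!r}")
    return int(number) * _UNITS[unit.lower()]


def shm_covers_shared_buffers(shm_size: str, shared_buffers: str) -> bool:
    """Rule of thumb from the text: shm_size should be at least shared_buffers."""
    return to_bytes(shm_size) >= to_bytes(shared_buffers)


print(shm_covers_shared_buffers("1GB", "'1GB'"))    # True
print(shm_covers_shared_buffers("512MB", "'1GB'"))  # False
```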

Persistent volume

volumes:
  - northwind_data:/var/lib/postgresql/data

The mount point is different with PostgreSQL 18:

volumes:
  - northwind_data:/var/lib/postgresql

A PostgreSQL database must use persistent storage.

The official PostgreSQL Docker image stores the database cluster in /var/lib/postgresql/data. PostgreSQL 18 images mount /var/lib/postgresql instead and keep the cluster in a version-specific subdirectory. If this directory is not backed by a Docker volume or another persistent storage mechanism, the data may be lost when the container is removed.

The named volume:

volumes:
  northwind_data:

ensures that database files survive container recreation.

This is one of the most important parts of the configuration.

Container resources

deploy:
  resources:
    limits:
      cpus: "6.0"
      memory: 4GB

The CPU and memory allocated to the container are not just Docker-level constraints. They are also the values that should be used to calculate PostgreSQL tuning parameters.

Tools such as pgTune need to know how many CPUs and how much memory are available to PostgreSQL. The values used in pgTune must match the resources allocated to the container.

For example, if the container is limited to:

cpus: "6.0"
memory: 4GB

then pgTune should be configured using:

6 CPUs
4 GB of RAM

Using host-level resources instead of container-level resources would produce an incorrect PostgreSQL configuration.
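To confirm which limits a running container actually sees, you can read its cgroup files from inside it. The sketch below assumes cgroup v2, where the memory limit is exposed in /sys/fs/cgroup/memory.max and the CPU quota in /sys/fs/cgroup/cpu.max as "<quota> <period>". Hosts using cgroup v1 expose different files. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumption: cgroup v2 (unified hierarchy) mounted at /sys/fs/cgroup
# inside the container.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_memory_max(text: str) -> Optional[int]:
    """memory.max holds a byte count, or 'max' when no limit is set."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text: str) -> Optional[float]:
    """cpu.max holds '<quota> <period>' in microseconds; quota may be 'max'."""
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


def container_limits(root: Path = CGROUP_ROOT) -> Tuple[Optional[float], Optional[int]]:
    """Return (cpus, memory_bytes) as seen from inside the container."""
    cpus = parse_cpu_max((root / "cpu.max").read_text())
    memory = parse_memory_max((root / "memory.max").read_text())
    return cpus, memory
```

Run inside the PostgreSQL container. With the example limits, you would expect container_limits() to report about 6 CPUs and 4 GiB. Those are the values to feed into pgTune.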

POSTGRES_INITDB_ARGS

POSTGRES_INITDB_ARGS=--auth-local=scram-sha-256 --auth-host=scram-sha-256

POSTGRES_INITDB_ARGS allows passing arguments to initdb when the PostgreSQL data directory is initialized for the first time.

In this example, it enables SCRAM-SHA-256 authentication for both local and host connections:

--auth-local=scram-sha-256
--auth-host=scram-sha-256

This matters because initdb writes these authentication methods into pg_hba.conf when it creates the cluster, so they are part of the initial database cluster setup.

Note that these options only apply when the PostgreSQL data directory is empty. If the persistent volume already contains an initialized database, changing POSTGRES_INITDB_ARGS will not reinitialize the database.

Health check

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
  interval: 10s
  timeout: 5s
  retries: 5

A health check allows Docker to determine whether PostgreSQL is actually ready to accept connections.

This is different from simply checking whether the container process is running. PostgreSQL may be running but not ready yet, especially during startup, crash recovery, or initialization.

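The command:

pg_isready -U $POSTGRES_USER -d $POSTGRES_DB

checks the readiness of the PostgreSQL server. In the Compose file, the doubled $$ stops Compose from interpolating the variables itself. The shell inside the container expands them instead, where POSTGRES_USER and POSTGRES_DB are defined.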

A proper health check is important when other services depend on the database. It allows orchestration tools, deployment scripts, and dependent containers to wait until PostgreSQL is ready before connecting.
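The interaction between interval, timeout and retries is easy to misread. Below is a simplified model, not Docker's actual implementation, that ignores start_period. In this model, one successful probe marks the container healthy, and it becomes unhealthy only after `retries` consecutive probes fail or time out. With interval: 10s and retries: 5, a database that stops answering is flagged only after five failed probes spaced by the interval.

```python
from typing import Iterable


def health_status(probe_results: Iterable[bool], retries: int = 5) -> str:
    """Simplified model of Docker health states for a sequence of probes.

    Each item is True if pg_isready succeeded within the timeout, else False.
    start_period is ignored in this sketch.
    """
    status = "starting"
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            status = "healthy"
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status
```

For example, four failed probes during startup still leave the status at "starting". A fifth consecutive failure would switch it to "unhealthy".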

`pg_stat_statements`

-c shared_preload_libraries='pg_stat_statements'

pg_stat_statements is essential for PostgreSQL query optimization.

It tracks SQL execution statistics, including how often queries run, how long they take, and how much load they generate. This makes it possible to identify expensive queries that need indexes, rewriting, caching, or schema improvements.

Enabling it requires two steps.

First, it must be loaded at server startup:

-c shared_preload_libraries='pg_stat_statements'

Second, the extension must be created inside the target database:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

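A common way to do this with Docker is to mount an initialization SQL script into /docker-entrypoint-initdb.d/. Like POSTGRES_INITDB_ARGS, scripts in that directory run only when the data directory is initialized for the first time, and .sql files run against the POSTGRES_DB database. The script can simply contain: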

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Without pg_stat_statements, database optimization is mostly guesswork. With it, you can focus on the queries that actually consume time and resources.

Useful example query:

SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

`autovacuum=on`

-c autovacuum=on

Autovacuum should remain enabled.

PostgreSQL uses MVCC, which means updates and deletes leave behind dead tuples. Autovacuum cleans up these dead tuples and helps prevent table and index bloat.

Disabling autovacuum is dangerous for most applications. It can lead to degraded performance, excessive storage growth, transaction ID wraparound risks, and poor query plans due to outdated statistics.

In most cases, the right approach is not to disable autovacuum, but to tune it if the workload requires it.

pgTune-generated PostgreSQL parameters

The remaining PostgreSQL parameters passed to the postgres command are generated from pgTune:

-c max_connections=200
-c shared_buffers='1GB'
-c effective_cache_size='3GB'
-c maintenance_work_mem='256MB'
-c checkpoint_completion_target=0.9
-c wal_buffers='16MB'
-c default_statistics_target=100
-c random_page_cost=1.1
-c effective_io_concurrency=200
-c work_mem='5242kB'
-c min_wal_size='1GB'
-c max_wal_size='4GB'
-c huge_pages='off'
-c max_worker_processes=6
-c max_parallel_workers=6
-c max_parallel_maintenance_workers=3
-c max_parallel_workers_per_gather=3

These values should not be copied blindly from another machine.

They depend on:

allocated memory
available CPUs
storage type
expected workload
maximum number of connections
PostgreSQL version

pgTune provides a strong baseline configuration based on the resources allocated to the PostgreSQL container. It does not replace workload-specific tuning, but it gives a much better starting point than PostgreSQL defaults for many real-world deployments.
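To show how the resource-dependent values follow from the container limits, here is a simplified approximation of pgTune-style "web application" heuristics on SSD storage. The formulas are assumptions, chosen because they reproduce the values in the example for 6 CPUs and 4 GB. They are not pgTune's or pgAssistant's actual code, and real pgTune versions may apply further adjustments. Constants such as random_page_cost=1.1 or effective_io_concurrency=200 do not depend on the container's resources here, so they are left out.

```python
import math


def fmt_kb(kb: int) -> str:
    """Format a kB amount the way the example command does (GB, MB or kB)."""
    if kb % (1024 * 1024) == 0:
        return f"{kb // (1024 * 1024)}GB"
    if kb % 1024 == 0:
        return f"{kb // 1024}MB"
    return f"{kb}kB"


def pgtune_like(cpus: int, memory_gb: int, max_connections: int = 200) -> dict:
    """Approximate pgTune-style settings (assumed formulas, see lead-in)."""
    ram_kb = memory_gb * 1024 * 1024
    shared_buffers_kb = ram_kb // 4                                # 25% of RAM
    effective_cache_kb = ram_kb * 3 // 4                           # 75% of RAM
    maintenance_kb = min(ram_kb // 16, 2 * 1024 * 1024)            # RAM/16, capped at 2GB
    wal_buffers_kb = min(shared_buffers_kb * 3 // 100, 16 * 1024)  # 3%, capped at 16MB
    # Exclude shared_buffers and assume ~3 work_mem allocations per connection.
    work_mem_kb = max((ram_kb - shared_buffers_kb) // (max_connections * 3), 64)
    half_cpus = min(math.ceil(cpus / 2), 4)                        # assumed cap of 4
    return {
        "max_connections": max_connections,
        "shared_buffers": fmt_kb(shared_buffers_kb),
        "effective_cache_size": fmt_kb(effective_cache_kb),
        "maintenance_work_mem": fmt_kb(maintenance_kb),
        "wal_buffers": fmt_kb(wal_buffers_kb),
        "work_mem": fmt_kb(work_mem_kb),
        "max_worker_processes": cpus,
        "max_parallel_workers": cpus,
        "max_parallel_workers_per_gather": half_cpus,
        "max_parallel_maintenance_workers": half_cpus,
    }


def to_command_flags(settings: dict) -> str:
    """Render settings as `-c name=value` flags for the postgres command."""
    return " ".join(f"-c {name}={value}" for name, value in settings.items())
```

Because the inputs are the container limits rather than the host's, changing `cpus` or `memory` in the Compose file and rerunning the calculation keeps both sides consistent.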

Important relationship between Docker resources and pgTune

The values used in pgTune must match the resources actually available to PostgreSQL.

For example, this Docker resource limit:

cpus: "6.0"
memory: 4GB

must be reflected in the pgTune input.

That is why pgAssistant generates both:

the Docker resource limits, and
the PostgreSQL runtime parameters derived from those limits.

This avoids a common mistake: tuning PostgreSQL for the host machine while the container is limited to fewer resources.

About pgAssistant

pgAssistant generates this kind of PostgreSQL Docker Compose configuration for free.

It helps create a more reliable and better-tuned PostgreSQL setup.

This provides a solid minimum configuration for running PostgreSQL properly with Docker.

What’s New in pgAssistant Since Version 2.8

bertrand HARTWIG — Sun, 07 Jun 2026 05:48:09 +0000


Since version 2.8, pgAssistant has evolved significantly.

The initial goal was to introduce a Global Advisor capable of combining multiple PostgreSQL signals—schema design, indexes, maintenance statistics, configuration, and workload activity—to provide higher-level recommendations.

Several releases later, this experimental feature has become a much more mature expert system. At the same time, the Query Advisor, Index Advisor, ranking engine, collector integrations, and developer-facing maintenance views have also improved.

This article summarizes the main changes introduced between pgAssistant 2.8.0 and 2.9.2.

Global Advisor: from an experiment to a broader expert system

Version 2.8 introduced the first version of the Global Advisor.

Its purpose was to move beyond isolated query analysis and evaluate the database as a whole. Instead of looking only at a single execution plan, pgAssistant started correlating signals from:

database schemas;
foreign keys;
indexes;
table statistics;
vacuum and analyze activity;
configuration settings;
storage usage.

The first release was intentionally experimental, but the advisor quickly expanded.

By version 2.8.2, the Global Advisor included 14 different recommendations, together with:

a summary of the detected findings;
clearer recommendation grouping;
a button to display all generated SQL suggestions;
one-click copying of all suggested commands.

This made the advisor more useful as both a diagnostic tool and a review checklist.

Better foreign-key diagnostics

Foreign keys were one of the first areas improved after the initial Global Advisor release.

Data type mismatches

pgAssistant now detects foreign-key columns whose data types differ from the referenced columns.

The detection logic was refined to reduce false positives and provide clearer remediation guidance.

The recommendation also takes into account that changing a column type can:

rewrite the table;
rebuild dependent indexes;
acquire an ACCESS EXCLUSIVE lock;
require additional disk space;
require a maintenance window.

The suggested SQL includes both the type change and a subsequent ANALYZE.

Missing foreign-key indexes

The advisor also detects useful indexes missing from foreign-key columns.

The logic now considers the size of both the child and referenced tables, reducing noise for very small tables and prioritizing cases where the index is more likely to improve:

joins;
lookups;
parent-table UPDATE operations;
parent-table DELETE operations.

Stronger index diagnostics

Index analysis has received several important improvements.

Better index recommendations from execution plans

The Index Advisor can now identify a potentially better index even when an execution-plan node already uses an existing index.

Previously, the presence of an index scan could hide opportunities for a more selective or better-aligned index. The new logic evaluates whether the existing index is actually the best available access path.

Duplicate and redundant indexes

The Global Advisor now identifies:

strictly duplicate indexes;
unused indexes;
non-unique indexes fully covered by equivalent unique indexes;
tables with an unusually high index-to-table size ratio.

For strictly duplicate indexes, pgAssistant now keeps the most-used equivalent index rather than selecting one only by object identifier.

This is important because identical index definitions do not necessarily have identical operational value.
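The "keep the most-used equivalent" idea can be sketched as follows. The `definition_key` field is a hypothetical normalized definition (table, columns, operator classes, predicate), and the sketch is not pgAssistant's code. Indexes backing constraints need separate handling before anything is dropped.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class IndexInfo(NamedTuple):
    name: str
    oid: int
    definition_key: str  # hypothetical normalized definition of the index
    idx_scan: int        # pg_stat_user_indexes.idx_scan


def duplicates_to_drop(indexes: List[IndexInfo]) -> List[str]:
    """Within each group of identical definitions, keep the most-used index."""
    groups: Dict[str, List[IndexInfo]] = defaultdict(list)
    for index in indexes:
        groups[index.definition_key].append(index)

    to_drop: List[str] = []
    for members in groups.values():
        if len(members) < 2:
            continue
        # Highest idx_scan wins; the lowest OID only breaks ties.
        keep = max(members, key=lambda i: (i.idx_scan, -i.oid))
        to_drop.extend(i.name for i in members if i.oid != keep.oid)
    return sorted(to_drop)
```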

Safer unused-index detection

An index with idx_scan = 0 is not automatically useless.

The counter may have been reset recently, or the observed workload may not yet be representative.

The unused-index checks now expose:

the database statistics reset timestamp;
the observation age;
table scan activity;
table write activity.

A generic unused index is reported only when:

at least 24 hours of statistics are available; and
the table has experienced meaningful activity.

The current thresholds require at least:

100 table scans; or
1,000 row modifications.

For structurally redundant indexes, the statistics age remains informational rather than blocking the recommendation, because the redundancy can already be proven from PostgreSQL system catalogs.
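The gating rule for generic unused indexes can be expressed as a single predicate. The inputs are assumptions, not pgAssistant's implementation:

- observation_age is now() minus pg_stat_database.stats_reset;
- table_scans is seq_scan + idx_scan from pg_stat_user_tables;
- row_modifications is n_tup_ins + n_tup_upd + n_tup_del from the same view.

```python
from datetime import timedelta

# Thresholds quoted in the text.
MIN_OBSERVATION = timedelta(hours=24)
MIN_TABLE_SCANS = 100
MIN_ROW_MODIFICATIONS = 1_000


def report_generic_unused_index(idx_scan: int, observation_age: timedelta,
                                table_scans: int, row_modifications: int) -> bool:
    """True when an index with no scans is backed by enough evidence."""
    if idx_scan > 0:
        return False
    if observation_age < MIN_OBSERVATION:
        return False  # statistics too young to be representative
    # The table must have seen meaningful activity during the observation.
    return table_scans >= MIN_TABLE_SCANS or row_modifications >= MIN_ROW_MODIFICATIONS
```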

Invalid index handling

The invalid-index recommendation has also been redesigned.

Instead of always suggesting a drop and manual recreation, pgAssistant now prefers:

REINDEX INDEX schema.index_name;

This preserves the index definition and is generally safer for indexes supporting constraints.

The advisor also recognizes common artifacts left by failed concurrent rebuilds:

_ccnew;
_ccold.

These cases receive specific guidance, because blindly rebuilding or dropping such indexes could leave duplicates or remove the wrong object.

PostgreSQL release and support checks

Version 2.9.1 introduced PostgreSQL release checks in the Global Advisor.

pgAssistant retrieves the official release information from:

https://www.postgresql.org/versions.json

It can now detect two different situations.

A newer minor release is available

For example:

PostgreSQL 17.6 is not the latest minor release available for branch 17. Upgrade to PostgreSQL 17.10, released on 2026-05-14, to benefit from the latest bug, security, reliability, and data-integrity fixes.

The recommendation includes:

the installed release;
the latest release in the same major branch;
the publication date of that release;
a recommendation to review intermediate release notes.

The PostgreSQL branch is no longer supported

The advisor also reports end-of-life versions, even if the final minor release is already installed.

For example:

PostgreSQL 13.23 is the latest minor release of the 13 branch, but this branch is no longer supported. Plan a major upgrade to a supported PostgreSQL version. End-of-life date: 2025-11-13.

This distinction matters because installing the latest minor release does not restore support for an obsolete major branch.
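Both checks reduce to two comparisons, sketched below. The sketch assumes the installed version, the latest minor release of the branch and the end-of-life date have already been extracted from versions.json; it does not model that file's schema. Versions are compared as numeric tuples because plain string comparison puts "17.10" before "17.6".

```python
from datetime import date
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'17.10' -> (17, 10). Numeric tuples avoid string-comparison traps."""
    return tuple(int(part) for part in version.split("."))


def release_findings(installed: str, latest_in_branch: str,
                     eol_date: Optional[date], today: date) -> List[str]:
    """Report a newer minor release and/or an unsupported branch."""
    findings = []
    if parse_version(latest_in_branch) > parse_version(installed):
        findings.append(f"newer minor release available: {latest_in_branch}")
    if eol_date is not None and eol_date <= today:
        findings.append(f"branch no longer supported (EOL {eol_date.isoformat()})")
    return findings
```

The two findings are independent. For example, 13.23 with an end-of-life date of 2025-11-13 produces only the unsupported-branch finding, even though it is the latest minor release.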

More accurate maintenance recommendations

Maintenance recommendations were refined to reduce unnecessary findings.

Tables never vacuumed

The previous implementation could report very small tables simply because no vacuum timestamp was available.

The rule now requires stronger evidence, combining:

table size;
an absolute number of dead tuples;
dead tuple percentage;
absence of recorded vacuum activity.

This prevents pgAssistant from recommending maintenance for insignificant tables containing only a few rows or dead tuples.

Stale statistics

The stale-statistics check considers:

the age of the last manual or automatic analyze;
the amount of data modified since the last analyze;
the table size;
the modification ratio.

The recommendation is therefore based on workload and table significance rather than only on a timestamp.

Vacuum recommendations

Starting with version 2.8.4, pgAssistant recommends ANALYZE or VACUUM only when the latest relevant maintenance operation is older than six days.

This reduces repetitive advice for tables that were maintained recently.

Autovacuum urgency

A dedicated recommendation calculates whether a table has exceeded its effective autovacuum threshold.

The calculation considers:

global autovacuum settings;
table-specific reloptions;
estimated row count;
dead tuple count;
configured maximum thresholds.

This gives a more meaningful signal than a simple dead tuple percentage.
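The effective threshold follows PostgreSQL's documented formula. The base threshold (autovacuum_vacuum_threshold, default 50) plus the scale factor (autovacuum_vacuum_scale_factor, default 0.2) times the estimated row count gives the trigger point. Per-table reloptions override the global settings. The optional cap models autovacuum_vacuum_max_threshold, which we assume is only available in PostgreSQL 18 and later. This is a sketch of the formula, not pgAssistant's code.

```python
from typing import Optional


def autovacuum_vacuum_threshold(reltuples: float,
                                base_threshold: int = 50,
                                scale_factor: float = 0.2,
                                max_threshold: Optional[int] = None) -> float:
    """Dead tuples a table may accumulate before autovacuum triggers a VACUUM.

    Pass table reloptions as arguments to override the global defaults.
    max_threshold models autovacuum_vacuum_max_threshold; None or -1 means no cap.
    """
    threshold = base_threshold + scale_factor * reltuples
    if max_threshold is not None and max_threshold >= 0:
        threshold = min(threshold, float(max_threshold))
    return threshold


def vacuum_overdue(dead_tuples: int, reltuples: float, **settings) -> bool:
    # Autovacuum vacuums a table once dead tuples exceed the threshold.
    return dead_tuples > autovacuum_vacuum_threshold(reltuples, **settings)
```

For a one-million-row table, the defaults give a threshold of 200,050 dead tuples. A table-level scale factor of 0.05 lowers it to 50,050, which is why reloptions must be part of the calculation.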

Version-aware PostgreSQL configuration checks

Configuration recommendations now consider the installed PostgreSQL version.

This is necessary because default values have changed between major releases.

The advisor checks settings such as:

autovacuum;
track_counts;
track_activities;
log_checkpoints;
log_autovacuum_min_duration;
checkpoint_completion_target;
checkpoint_timeout.

For example, checkpoint_completion_target defaulted to 0.5 before PostgreSQL 14 (0.9 since then). PostgreSQL 15 turned log_checkpoints on by default and set log_autovacuum_min_duration to 10 minutes.

The recommendation text now explains whether a value was historically normal for that PostgreSQL version or differs from a modern default.

Sequence exhaustion detection

Version 2.8.3 introduced a recommendation for sequences approaching their maximum value.

pgAssistant calculates the percentage of the available sequence range already consumed and reports:

a medium-level warning above 75%;
a high-level warning above 90%.

This helps identify future insert failures before the sequence is exhausted.
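A minimal sketch of the percentage calculation, assuming the inputs come from the pg_sequences view (last_value, min_value, max_value, increment_by). Descending sequences consume their range from max_value downward. The warning levels are the 75% and 90% thresholds quoted above.

```python
from typing import Optional


def sequence_usage_pct(last_value: Optional[int], min_value: int,
                       max_value: int, increment_by: int) -> Optional[float]:
    """Percentage of the sequence range already consumed."""
    if last_value is None:  # pg_sequences.last_value is NULL before first use
        return None
    span = max_value - min_value
    used = last_value - min_value if increment_by > 0 else max_value - last_value
    return 100.0 * used / span


def sequence_warning(pct: Optional[float]) -> Optional[str]:
    """Map a usage percentage to the warning levels described in the text."""
    if pct is None:
        return None
    if pct > 90:
        return "high"
    if pct > 75:
        return "medium"
    return None
```

For an integer sequence starting at 1 (maximum 2,147,483,647), a last_value of 1,800,000,000 is roughly 84% used and yields a medium-level warning.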

Better workload prioritization

The query ranking algorithm was improved in version 2.8.4.

Sorting queries only by mean execution time or total execution time often produces misleading priorities.

A query executed once may be slow but have little overall impact, while a moderately slow query executed thousands of times can consume much more of the workload.

The updated ranking gives more weight to:

execution frequency;
total workload impact;
repeatability;
technical signals.

At the same time, it reduces the influence of low-impact one-off queries.

The result is a more practical answer to the question:

Which PostgreSQL queries should I optimize first?

Richer context for AI-assisted query analysis

The query-analysis prompt is now enriched with column statistics collected by pgAssistant.

Execution plans alone rarely provide enough context for accurate recommendations.

The additional statistics help the AI reason about:

cardinality;
null fractions;
common values;
data distributions;
selectivity;
correlations.

This reinforces pgAssistant’s hybrid approach:

deterministic expert-system rules for reliable technical diagnostics;
AI for explanation, synthesis, and contextual analysis.

Table Definition Helper redesign

The Table Definition Helper interface was redesigned to provide a clearer view of:

table size;
index footprint;
row estimates;
dead tuples;
estimated bloat;
schema information.

The new card-based interface includes:

summary indicators;
immediate search while typing;
client-side filters;
sorting;
pagination;
visual severity levels.

This makes it easier to navigate large databases without repeatedly submitting server-side searches.

New Table Health view

Version 2.9.2 introduces a new Table Health page in the DBA Corner.

Despite its location, the goal is not to turn pgAssistant into a general-purpose DBA console.

The purpose remains developer-oriented:

Help developers understand the state of a table before escalating the issue to a DBA.

For every schema and table, the view displays:

table size;
index size;
estimated row count;
dead tuple count and percentage;
tuples modified since the last analyze;
update activity;
HOT update percentage;
latest manual vacuum;
latest autovacuum;
latest manual analyze;
latest autoanalyze;
vacuum age;
analyze age;
database statistics reset timestamp.

Tables are classified using statuses such as:

HEALTHY;
ANALYZE_DUE;
NEVER_ANALYZED;
HIGH_DEAD_TUPLES.

Users can filter, search, sort, and inspect tables immediately.

They can also launch:

ANALYZE schema.table;

or:

VACUUM (ANALYZE) schema.table;

through the existing SQL execution endpoint, with a confirmation dialog showing the exact command and execution result.

Collector and Grafana integration

Version 2.9.0 added two API endpoints used by:

This allows pgAssistant findings to be collected over time and visualized across multiple PostgreSQL instances.

The integration makes it possible to build fleet-level views such as:

recommendation evolution;
database design issues;
maintenance risks;
corrected findings;
environment-level comparisons.

Broader schema coverage

Several Global Advisor rules were unintentionally limited to:

n.nspname = 'public'

This meant that databases using application-specific schemas could be only partially analyzed.

The rules now inspect all user schemas while excluding:

pg_catalog;
information_schema;
pg_toast;
temporary schemas;
temporary TOAST schemas.

This bug fix affects several important checks, including:

foreign-key type mismatches;
missing foreign-key indexes;
duplicate indexes;
redundant indexes;
unused indexes.

Additional fixes and compatibility improvements

Other changes since version 2.8 include:

support for the Qwen 3.6 model;
improved connection-form behavior when using a PostgreSQL connection URI;

These fixes improve both model compatibility and the initial connection experience.

A clearer direction for pgAssistant

The releases since version 2.8 reflect a broader direction for the project.

pgAssistant is not intended to replace PostgreSQL DBAs.

Instead, it aims to help developers:

understand PostgreSQL behavior;
identify the most important problems;
distinguish strong evidence from weak signals;
generate safer remediation commands;
collect the context required before involving a DBA.

The product increasingly combines two complementary approaches:

A deterministic expert system, based on PostgreSQL catalogs, statistics, configuration, and execution plans.
AI-assisted explanations, used to summarize findings and make complex PostgreSQL behavior easier to understand.

The goal is not to automate every database decision.

The goal is to make PostgreSQL diagnostics more transparent, more explainable, and more actionable.

PostgreSQL: Which Queries Should You Optimize First?

bertrand HARTWIG — Thu, 14 May 2026 08:08:52 +0000

When investigating PostgreSQL performance, the usual starting point is pg_stat_statements. From there, many teams sort queries by mean_exec_time or total_exec_time and start optimizing the first rows in the list.

That approach is simple, but it often leads to the wrong priorities.

A query that takes five seconds but runs twice a day is not necessarily more important than a query that takes five milliseconds and runs millions of times. Conversely, a query with a high total execution time may simply be a normal core workload query, not necessarily the best optimization target.
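The first half of that argument is simple arithmetic: total time is calls multiplied by mean time. The sketch below uses the two queries from the paragraph. The labels are hypothetical, and 1,000,000 calls stands in for "millions". It only contrasts two sort keys and is not pgAssistant's ranking algorithm, which also weighs repeatability and technical signals.

```python
from typing import NamedTuple


class QueryStat(NamedTuple):
    label: str
    calls: int
    mean_exec_ms: float

    @property
    def total_exec_ms(self) -> float:
        return self.calls * self.mean_exec_ms


stats = [
    QueryStat("nightly report", calls=2, mean_exec_ms=5_000.0),  # 10,000 ms in total
    QueryStat("hot lookup", calls=1_000_000, mean_exec_ms=5.0),  # 5,000,000 ms in total
]

by_mean = [q.label for q in sorted(stats, key=lambda q: q.mean_exec_ms, reverse=True)]
by_total = [q.label for q in sorted(stats, key=lambda q: q.total_exec_ms, reverse=True)]
print(by_mean)   # ['nightly report', 'hot lookup']
print(by_total)  # ['hot lookup', 'nightly report']
```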

The real question is not:

Which query is the slowest?

It is:

Which query has the highest operational impact and the clearest optimization potential?

This is the principle behind the query-ranking algorithm implemented in pgAssistant.


Indexing every WHERE column is not PostgreSQL optimization

bertrand HARTWIG — Mon, 11 May 2026 15:20:15 +0000

One PostgreSQL indexing mistake I see often:

“The query filters on A, B and C, so let’s create an index on A, B, C.”

That may work, but it may also be the wrong index.

For composite B-tree indexes, PostgreSQL cares about predicate type, column order, selectivity, table size, and the actual execution plan.

In this post, I explain why equality predicates usually belong before range predicates, why n_distinct from statistics matters, and why a theoretically good index is useless if the planner never uses it.

I also show how pgAssistant turns this into an automated index recommendation workflow using EXPLAIN ANALYZE and planner statistics.

Full write-up:
https://beh74.github.io/pgassistant-blog/post/query_advisor/

Designing the Right PostgreSQL Index Using Query Plans and Statistics

bertrand HARTWIG — Mon, 11 May 2026 06:04:15 +0000

PostgreSQL index design is often misunderstood.

Many developers think that creating a good index simply means:

“Create an index containing the columns from the WHERE clause.”

In reality, efficient index design is far more nuanced.

The order of columns inside a composite index matters enormously, and the best choice depends on:

Predicate types (=, >=, BETWEEN, LIKE, etc.)
Column selectivity
Table size
PostgreSQL planner statistics
Actual execution plans

This article explains the core principles behind efficient PostgreSQL index design before showing how pgAssistant automates this process using execution plans and database statistics.

Why Index Design Is Difficult

Consider the following query:

SELECT
    order_id,
    customer_id,
    employee_id,
    order_date,
    ship_country
FROM public.orders
WHERE customer_id = $1
  AND employee_id = $2
  AND order_date >= DATE $3;

At first glance, several index definitions may appear reasonable:

(customer_id, employee_id, order_date)

(order_date, customer_id, employee_id)

(employee_id, customer_id, order_date)

But these indexes do not behave the same way.

Choosing the correct ordering requires understanding how PostgreSQL traverses B-Tree indexes.

The Fundamental Rule of B-Tree Indexes

For PostgreSQL B-Tree indexes:

Equality predicates should come first
Range predicates should come last

This is the single most important rule in multi-column index design.

Equality Predicates Are Highly Selective

Predicates such as:

=
IN (...)
IS NULL

allow PostgreSQL to navigate directly to a very precise section of the index tree.

In our query:

customer_id = $1
employee_id = $2

are equality predicates.

If the index begins with these columns:

(customer_id, employee_id, ...)

PostgreSQL can rapidly narrow the search space.

Conceptually:

customer_id = exact branch
employee_id = exact sub-branch

The index scan can jump almost directly to the matching rows.

Why Range Predicates Should Come Last

Now consider:

order_date >= DATE $3

This is a range predicate.

Once PostgreSQL enters a range scan inside a B-Tree index, the remaining columns become far less useful for navigation.

For example, with this index:

(order_date, customer_id, employee_id)

the index scan must first walk through every entry matching the date condition:

order_date >= DATE $3

which may represent a very large portion of the table.

Only afterward can additional filtering occur.

This usually produces significantly more index scanning.

That is why range predicates are generally placed at the end of composite indexes.

Column Order Among Equality Predicates

Once equality predicates are identified, the next challenge is:

Which equality column should come first?

The answer depends on selectivity.

PostgreSQL exposes this information through planner statistics.

PostgreSQL Statistics Drive Good Index Design

For our query, PostgreSQL statistics are:

order_date [>=]:
    n_distinct=814
    null_frac=0.0000
    mcv_count=100
    histogram_bounds=101

customer_id [=]:
    n_distinct=89
    null_frac=0.0000
    mcv_count=89
    histogram_bounds=0

employee_id [=]:
    n_distinct=9
    null_frac=0.0000
    mcv_count=9
    histogram_bounds=0

The most important metric here is:

n_distinct

Understanding `n_distinct`

n_distinct estimates the number of distinct values in a column.

Higher n_distinct usually means:

higher selectivity
fewer matching rows
better filtering efficiency

In our example:

Column	n_distinct
customer_id	89
employee_id	9

customer_id is significantly more selective.

Therefore, PostgreSQL benefits more from filtering by customer_id first.

Why Selectivity Matters

Imagine the table contains 1 million rows.

Filtering by:

employee_id

may still leave:

1,000,000 / 9 ≈ 111,111 rows

Filtering by:

customer_id

may reduce the result to:

1,000,000 / 89 ≈ 11,236 rows

Starting with the most selective equality predicate drastically reduces the search space.

This improves:

index scan efficiency
cache locality
heap access reduction
execution time
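The row counts above come from dividing the table size by n_distinct, under the assumption that values are evenly distributed. The sketch below also combines both equality filters under an independence assumption. The planner makes the same assumption unless extended statistics exist. One detail matters when reading pg_stats: a negative n_distinct means a fraction of the row count rather than an absolute number.

```python
from typing import Sequence


def rows_per_value(reltuples: float, n_distinct: float) -> float:
    """Estimated rows matching `col = value`, assuming uniformly spread values.

    A negative pg_stats.n_distinct is minus the fraction of distinct rows
    (-1 means every row is unique).
    """
    if n_distinct == 0:
        raise ValueError("n_distinct is unknown (0)")
    distinct = n_distinct if n_distinct > 0 else -n_distinct * reltuples
    return reltuples / distinct


def rows_for_all(reltuples: float, n_distincts: Sequence[float]) -> float:
    """Combined estimate for several equality filters, assuming independence."""
    rows = reltuples
    for nd in n_distincts:
        rows *= rows_per_value(reltuples, nd) / reltuples
    return rows


print(round(rows_per_value(1_000_000, 9)))      # 111111
print(round(rows_per_value(1_000_000, 89)))     # 11236
print(round(rows_for_all(1_000_000, [89, 9])))  # 1248
```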

The Correct Index Design

Applying these principles:

Equality predicates first
Most selective equality columns first
Range predicates last

produces:

CREATE INDEX CONCURRENTLY
    pga_idx_orders_customer_id_employee_id_order_date
ON public.orders
    (customer_id, employee_id, order_date);

This ordering allows PostgreSQL to:

Navigate efficiently using exact matches
Reduce scanned rows as early as possible
Apply the range scan only after narrowing the search space
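The three ordering rules can be sketched as a classification followed by a sort. This is an illustration under stated assumptions, not pgAssistant's implementation. The operator sets mirror the lists in this article, "LIKE prefix" is a hypothetical label for LIKE 'prefix%', and predicates with other operators are simply ignored.

```python
from typing import List, NamedTuple

EQUALITY_OPS = {"=", "IN", "IS NULL"}
RANGE_OPS = {">", ">=", "<", "<=", "BETWEEN", "LIKE prefix"}


class Predicate(NamedTuple):
    column: str
    op: str
    n_distinct: float  # pg_stats.n_distinct; negative means a fraction of rows


def distinct_values(pred: Predicate, reltuples: float) -> float:
    return pred.n_distinct if pred.n_distinct > 0 else -pred.n_distinct * reltuples


def index_column_order(predicates: List[Predicate], reltuples: float) -> List[str]:
    """Equality columns first (most distinct values first), range columns last."""
    def by_selectivity(preds: List[Predicate]) -> List[Predicate]:
        return sorted(preds, key=lambda p: distinct_values(p, reltuples), reverse=True)

    equality = by_selectivity([p for p in predicates if p.op in EQUALITY_OPS])
    # Only the first range column bounds the B-tree scan; any later ones
    # are merely checked against each index entry.
    ranges = by_selectivity([p for p in predicates if p.op in RANGE_OPS])
    return [p.column for p in equality + ranges]


query_predicates = [
    Predicate("order_date", ">=", 814),
    Predicate("customer_id", "=", 89),
    Predicate("employee_id", "=", 9),
]
print(index_column_order(query_predicates, reltuples=1_000_000))
# ['customer_id', 'employee_id', 'order_date']
```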

Good Index Design Also Depends on Table Size

One of the biggest misconceptions about PostgreSQL optimization is:

“Indexes are always faster.”

This is false.

For small tables, PostgreSQL often prefers a Sequential Scan (Seq Scan) even when an index exists.

Why?

Because using an index has overhead:

traversing the B-Tree
reading index pages
performing heap lookups
random I/O access

For sufficiently small tables, scanning the entire table sequentially is cheaper.

Query Plans Matter More Than Theory

A theoretically perfect index is useless if PostgreSQL never uses it.

That is why index recommendation engines should never rely only on SQL syntax.

They must also inspect:

execution plans
estimated costs
table statistics
row estimates
planner decisions

The Importance of Execution Plans

The execution plan reveals how PostgreSQL actually executes a query.

For example:

EXPLAIN ANALYZE
SELECT ...

may show:

Seq Scan on orders

or:

Index Scan using ...

This distinction is critical.

A query may contain filter predicates that look index-friendly, but PostgreSQL may correctly determine that:

the table is too small
selectivity is too low
too many rows would still be scanned

and therefore prefer a sequential scan.

How pgAssistant Recommends Indexes

pgAssistant does not simply parse SQL queries.

It combines multiple sources of information:

1. Query Plan

pgAssistant analyzes nodes in the query plan to identify candidate index columns.

2. Predicate Types

It classifies predicates into categories:

Equality predicates

=
IN
IS NULL

Range predicates

>
>=
<
<=
BETWEEN
LIKE 'prefix%'

Equality predicates are prioritized before range predicates.

3. Column Statistics

pgAssistant uses PostgreSQL planner statistics such as:

n_distinct
null_frac
most_common_vals
most_common_freqs
histogram_bounds

to estimate column selectivity.

Columns with higher selectivity are prioritized earlier in the index definition.

4. Table Statistics

pgAssistant also evaluates table-level statistics, including:

estimated row counts
table size
planner cost estimates

This is extremely important because some tables are simply too small to justify an index.

In these cases, recommending an index would create unnecessary maintenance overhead without improving performance.

How pgAssistant Recommends PostgreSQL Indexes

Why Query Syntax Alone Is Not Enough

A good recommendation depends on:

predicate types
column selectivity
table statistics
planner estimates
execution plans
existing indexes already used by PostgreSQL

This is precisely the approach implemented by pgAssistant.

pgAssistant Uses Execution Plans First

pgAssistant starts from the PostgreSQL execution plan.

It analyzes:

EXPLAIN (ANALYZE, FORMAT JSON)

This is extremely important because the execution plan reveals:

whether PostgreSQL uses a Seq Scan
whether an index is already being used
whether residual filtering still occurs after index access
whether planner row estimates are inaccurate
whether a composite index could reduce heap filtering

This avoids many false-positive recommendations.

A query may look index-friendly while PostgreSQL is already using the optimal access path.

pgAssistant Analyzes Existing Access Paths

The advisor first determines how PostgreSQL currently accesses the table.

Examples:

Seq Scan
Index Scan
Index Only Scan
Bitmap Heap Scan

This distinction is critical.
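The following is a minimal Python sketch of this first step. It assumes the plan was captured as text from EXPLAIN (ANALYZE, FORMAT JSON). That output is a JSON array whose first element holds the root node under "Plan", and each node lists its children under "Plans". The sample plan is hand-written for illustration and is not real pgAssistant output.

```python
import json
from typing import Iterator

# Node types that read a table directly (a Bitmap Heap Scan is fed by a Bitmap Index Scan).
ACCESS_NODES = {"Seq Scan", "Index Scan", "Index Only Scan", "Bitmap Heap Scan"}


def iter_nodes(node: dict) -> Iterator[dict]:
    """Yield a plan node and all of its descendants, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from iter_nodes(child)


def access_paths(explain_json: str) -> list[tuple[str, str]]:
    """Return (node type, relation name) for every table access node."""
    root = json.loads(explain_json)[0]["Plan"]
    return [
        (n["Node Type"], n.get("Relation Name", "?"))
        for n in iter_nodes(root)
        if n["Node Type"] in ACCESS_NODES
    ]


# Hand-written sample in the shape of EXPLAIN (ANALYZE, FORMAT JSON) output.
SAMPLE_PLAN = """
[{"Plan": {"Node Type": "Nested Loop", "Plans": [
  {"Node Type": "Seq Scan", "Relation Name": "employees"},
  {"Node Type": "Index Scan", "Relation Name": "orders", "Index Name": "idx_customer"}
]}}]
"""

if __name__ == "__main__":
    print(access_paths(SAMPLE_PLAN))
```
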

Sequential Scan Case

If PostgreSQL performs a Seq Scan, pgAssistant evaluates whether an index could realistically improve performance.

Indexed Access Case

If PostgreSQL already uses an index, pgAssistant does not stop there.

It also analyzes:

Index Cond
Filter
Recheck Cond
Rows Removed by Filter

This allows pgAssistant to detect situations such as:

Index Scan using idx_customer on orders
  Index Cond: (customer_id = 42)
  Filter: (employee_id = 5)

In this case, PostgreSQL uses an index, but every row matching customer_id = 42 is fetched from the heap before the Filter on employee_id discards the non-matching ones.

pgAssistant can therefore recommend a more selective composite index such as:

(customer_id, employee_id)

instead of considering the existing index “good enough”.

Predicate Classification

Once predicates are extracted from the execution plan, pgAssistant classifies them by operator type.

Internally, predicates are ranked according to B-Tree efficiency.

Equality Predicates

Highest priority:

=
IN
IS NULL

These predicates allow PostgreSQL to navigate directly to a very small portion of the index tree.

Prefix Search Predicates

Second priority:

LIKE 'abc%'

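Prefix searches can still use B-Tree traversal efficiently, provided the column uses the C collation or the index is built with the text_pattern_ops (or varchar_pattern_ops) operator class.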

Range Predicates

Lowest priority:

>
>=
<
<=
BETWEEN

Once PostgreSQL enters a range scan, subsequent columns become far less effective for index navigation.

That is why range predicates are typically placed last in composite indexes.
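The effect of a leading range predicate can be shown with a toy Python model. A sorted list of (a, b) tuples stands in for the leaf level of a composite B-Tree index on (a, b), and bisect stands in for the tree descent. The model only counts index entries read. It is a simplification and ignores optimizations such as the B-Tree skip scan added in recent PostgreSQL releases.

```python
from bisect import bisect_left

# Toy composite index on (a, b): entries sorted by a, then by b, like B-Tree leaves.
ENTRIES = sorted((a, b) for a in range(100) for b in range(100))


def scan_eq_a_range_b(a: int, b_min: int) -> tuple[int, int]:
    """WHERE a = :a AND b >= :b_min -> one descent, then one contiguous run.

    Returns (entries visited, entries matching both predicates)."""
    i = bisect_left(ENTRIES, (a, b_min))
    visited = matched = 0
    while i < len(ENTRIES) and ENTRIES[i][0] == a:
        visited += 1
        matched += 1  # sorted order guarantees b >= b_min within this run
        i += 1
    return visited, matched


def scan_range_a_eq_b(a_min: int, b: int) -> tuple[int, int]:
    """WHERE a >= :a_min AND b = :b -> every entry with a >= a_min is read,
    and b can only be checked entry by entry along the way."""
    i = bisect_left(ENTRIES, (a_min,))
    visited = matched = 0
    while i < len(ENTRIES):
        visited += 1
        if ENTRIES[i][1] == b:
            matched += 1
        i += 1
    return visited, matched


if __name__ == "__main__":
    print(scan_eq_a_range_b(5, 90))  # equality first: 10 entries read, 10 match
    print(scan_range_a_eq_b(90, 5))  # range first: 1,000 entries read, 10 match
```

Both queries return 10 rows. With the range on the leading column, the scan reads a hundred times more index entries to find them.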

How pgAssistant Orders Index Columns

After classifying predicates, pgAssistant computes candidate index ordering.

The internal ordering logic is:

1. Equality predicates first
2. Prefix predicates second
3. Range predicates last
4. Inside each category:
   highest cardinality first

This logic is implemented directly inside:

reorder_index_candidate_columns()

The advisor therefore builds indexes that align with PostgreSQL B-Tree traversal behavior.

Why Column Cardinality Matters

pgAssistant uses PostgreSQL statistics to estimate selectivity.

The most important metric is:

n_distinct

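which estimates the number of distinct values in a column. When it is negative, it is the negative of the number of distinct values divided by the number of rows (-1 means every value is unique).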

Higher cardinality usually means:

fewer matching rows
better filtering
smaller index scan ranges

For our example:

customer_id [=]: n_distinct=89
employee_id [=]: n_distinct=9
order_date [>=]: n_distinct=814

Although order_date has the highest cardinality, it is a range predicate and therefore placed last.

Among equality predicates, customer_id comes before employee_id

because:

89 > 9

The final ordering becomes:

(customer_id, employee_id, order_date)
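The ordering rules above can be sketched in a few lines of Python: equality, then prefix LIKE, then range, with higher n_distinct first inside each class. This is a hypothetical re-implementation for illustration, not pgAssistant's reorder_index_candidate_columns(). The Predicate class is invented here, and it assumes n_distinct has already been converted to an absolute count.

```python
from dataclasses import dataclass

EQUALITY_OPS = {"=", "IN", "IS NULL"}
RANGE_OPS = {">", ">=", "<", "<=", "BETWEEN"}


@dataclass
class Predicate:
    column: str
    operator: str       # "=", "IN", "IS NULL", ">=", "LIKE", ...
    n_distinct: float   # absolute distinct count from pg_stats
    pattern: str = ""   # LIKE pattern, only used when operator is LIKE


def predicate_class(p: Predicate) -> int:
    """0 = equality, 1 = prefix LIKE, 2 = range, 3 = not usable for ordering."""
    op = p.operator.upper()
    if op in EQUALITY_OPS:
        return 0
    # 'abc%' is a prefix search; '%abc' or 'a_c%' are not.
    if op == "LIKE" and p.pattern.endswith("%") and not any(c in "%_" for c in p.pattern[:-1]):
        return 1
    if op in RANGE_OPS:
        return 2
    return 3


def order_index_columns(preds: list[Predicate]) -> list[str]:
    """Class first (equality, prefix, range), then higher n_distinct first."""
    usable = [p for p in preds if predicate_class(p) < 3]
    usable.sort(key=lambda p: (predicate_class(p), -p.n_distinct))
    return [p.column for p in usable]


if __name__ == "__main__":
    example = [
        Predicate("order_date", ">=", 814),
        Predicate("employee_id", "=", 9),
        Predicate("customer_id", "=", 89),
    ]
    print(order_index_columns(example))
```

Run on the example predicates, it returns ['customer_id', 'employee_id', 'order_date'], matching the ordering above.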

pgAssistant Uses PostgreSQL Statistics

pgAssistant enriches every recommendation using planner statistics extracted from PostgreSQL.

Examples include:

n_distinct
null_frac
most_common_vals
most_common_freqs
histogram_bounds

The advisor even exposes these statistics in its recommendation reasoning.

Example:

customer_id [=]: n_distinct=89, null_frac=0.0000
employee_id [=]: n_distinct=9, null_frac=0.0000
order_date [>=]: n_distinct=814

This makes the recommendation transparent and explainable.

pgAssistant Also Uses Table Statistics

An index is not always beneficial.

This is one of the most important concepts in PostgreSQL optimization.

For small tables, PostgreSQL often correctly prefers:

Seq Scan

instead of:

Index Scan

because:

sequential reads are cheap
index traversal has overhead
heap fetches introduce random I/O
scanning the entire table may cost less

This is why pgAssistant also evaluates:

estimated row counts
table size
planner costs
execution frequency
workload intensity

The advisor does not blindly recommend indexes whenever a sequential scan appears.

Detecting Inefficient Indexed Access

One particularly powerful aspect of pgAssistant is its ability to analyze residual filtering.

For indexed scans, the advisor evaluates:

Rows Removed by Filter

If PostgreSQL retrieves many tuples from the index only to discard them afterward, pgAssistant detects that the current index may be incomplete.

Internally, the advisor computes:

Residual filter kept X% of tuples visited

This helps identify situations where adding an additional column to a composite index could drastically reduce heap filtering.
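The following is a minimal Python sketch of the residual-filter measurement. It assumes a scan node taken from EXPLAIN (ANALYZE, FORMAT JSON) and relies on the "Actual Rows" and "Rows Removed by Filter" keys. The 10% threshold and the node values are hypothetical, chosen only to reproduce the kind of message quoted above.

```python
def residual_filter_ratio(node: dict):
    """Fraction of tuples visited by a scan node that survive its Filter.

    EXPLAIN ANALYZE reports both "Actual Rows" and "Rows Removed by Filter"
    as per-loop averages, so their ratio does not depend on "Actual Loops".
    Returns None when the node has no Filter or visited nothing.
    """
    removed = node.get("Rows Removed by Filter")
    if removed is None:
        return None
    kept = node.get("Actual Rows", 0)
    visited = kept + removed
    return kept / visited if visited else None


def residual_filter_finding(node: dict, threshold: float = 0.10):
    """Return an explanatory message when too few visited tuples are kept."""
    ratio = residual_filter_ratio(node)
    if ratio is None or ratio > threshold:
        return None
    return f"Residual filter kept {ratio:.1%} of tuples visited"


if __name__ == "__main__":
    # Hypothetical Index Scan node shaped like EXPLAIN (ANALYZE, FORMAT JSON) output.
    node = {"Node Type": "Index Scan", "Index Name": "idx_customer",
            "Actual Rows": 2, "Rows Removed by Filter": 18}
    print(residual_filter_finding(node))
```
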

Detecting Planner Estimation Problems

pgAssistant also compares:

plan_rows
vs
actual_rows

Large estimation gaps may indicate:

stale statistics
data skew
correlation issues
missing extended statistics

This additional analysis improves the reliability of recommendations.
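One common way to quantify this comparison is the q-error: the larger of estimate/actual and actual/estimate. The Python sketch below computes it from a JSON plan node. "Plan Rows" is the planner's per-loop estimate and "Actual Rows" the measured per-loop average. The threshold of 10 is a hypothetical choice, not a pgAssistant setting.

```python
def q_error(node: dict) -> float:
    """How far the planner's row estimate was from reality (1.0 = exact)."""
    # Clamp to one row so an empty result does not divide by zero.
    est = max(float(node["Plan Rows"]), 1.0)
    act = max(float(node["Actual Rows"]), 1.0)
    return max(est / act, act / est)


def misestimated(node: dict, threshold: float = 10.0) -> bool:
    """Flag nodes whose estimate is off by more than `threshold` times in either direction."""
    return q_error(node) > threshold


if __name__ == "__main__":
    # Hypothetical node: the planner expected 5 rows, the executor produced 500.
    print(q_error({"Plan Rows": 5, "Actual Rows": 500}))
```
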

The pgAssistant Recommendation Algorithm

Conceptually, pgAssistant follows this workflow:

1. Analyze EXPLAIN ANALYZE JSON plan
2. Detect access paths
3. Extract predicates from:
   - Index Cond
   - Filter
   - Recheck Cond
4. Classify predicates by operator type
5. Rank predicates for B-Tree efficiency
6. Use PostgreSQL statistics to estimate selectivity
7. Order columns by:
   - predicate class
   - cardinality
8. Evaluate table statistics
9. Evaluate execution costs
10. Detect residual filtering
11. Compare against existing indexes
12. Recommend index only if beneficial
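Step 3 of this workflow can be approximated in Python with regular expressions over the condition strings EXPLAIN prints. This is a simplified sketch that rests on several assumptions:

- It only recognizes column comparisons and IS NULL.
- It relies on EXPLAIN displaying LIKE as ~~ and IN (...) as = ANY (...).
- It does not parse string literals, casts, or expressions.

pgAssistant's actual parser may work quite differently.

```python
import re

# Comparisons as EXPLAIN prints them; LIKE appears as ~~ (but ILIKE as ~~*),
# and IN lists appear as "= ANY (...)".
_CMP = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*(>=|<=|<>|=|>|<|~~(?!\*))\s*(ANY\b)?")
_IS_NULL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s+IS\s+NULL\b")
CONDITION_KEYS = ("Index Cond", "Filter", "Recheck Cond")


def extract_predicates(node: dict) -> list[tuple[str, str, str]]:
    """Return (condition key, column, operator) triples found on one plan node."""
    found = []
    for key in CONDITION_KEYS:
        text = node.get(key)
        if not text:
            continue
        for column, op, any_kw in _CMP.findall(text):
            if op == "~~":
                op = "LIKE"
            elif op == "=" and any_kw:
                op = "IN"
            found.append((key, column, op))
        for column in _IS_NULL.findall(text):
            found.append((key, column, "IS NULL"))
    return found


if __name__ == "__main__":
    # Hypothetical node, written in the style of EXPLAIN (FORMAT JSON) output.
    node = {
        "Node Type": "Index Scan",
        "Index Cond": "(customer_id = 42)",
        "Filter": "((employee_id = ANY ('{5,6}'::smallint[])) AND (ship_country ~~ 'Fr%'::text))",
    }
    print(extract_predicates(node))
```
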

Example: Final Recommendation

Given:

SELECT
    order_id,
    customer_id,
    employee_id,
    order_date,
    ship_country
FROM public.orders
WHERE customer_id = $1
  AND employee_id = $2
  AND order_date >= $3::date;

pgAssistant evaluates:

| Column | Predicate | Priority | n_distinct |
|---|---|---|---|
| customer_id | `=` | Equality | 89 |
| employee_id | `=` | Equality | 9 |
| order_date | `>=` | Range | 814 |

The advisor therefore generates:

CREATE INDEX CONCURRENTLY
    "pga_idx_orders_customer_id_employee_id_order_date"
ON "public"."orders"
    ("customer_id", "employee_id", "order_date");

This ordering follows PostgreSQL B-Tree optimization principles while also considering:

planner statistics
table characteristics
execution plan behavior
residual filtering
existing indexes already in use
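The following Python sketch turns an ordered column list into a statement in the same shape as the one above. The pga_idx_ prefix comes from the example, and the helper names are hypothetical. PostgreSQL truncates identifiers longer than 63 bytes (NAMEDATALEN - 1 in a default build). The sketch therefore truncates the generated name itself, so that the stored name is predictable.

```python
MAX_IDENTIFIER_BYTES = 63  # NAMEDATALEN - 1 in a default PostgreSQL build


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_index_sql(schema: str, table: str, columns: list[str],
                     prefix: str = "pga_idx") -> str:
    """Build a CREATE INDEX CONCURRENTLY statement for the given column order."""
    name = "_".join([prefix, table, *columns])
    # Truncate on a UTF-8 byte budget and drop any partially cut character.
    name = name.encode("utf-8")[:MAX_IDENTIFIER_BYTES].decode("utf-8", "ignore")
    cols = ", ".join(quote_ident(c) for c in columns)
    return (f"CREATE INDEX CONCURRENTLY {quote_ident(name)}\n"
            f"ON {quote_ident(schema)}.{quote_ident(table)} ({cols});")


if __name__ == "__main__":
    print(create_index_sql("public", "orders", ["customer_id", "employee_id", "order_date"]))
```

CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so a generated statement like this has to be executed in autocommit mode.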

Live demo

A public demo is available here:

https://ov-004f8b.infomaniak.ch/

Demo connection:

postgresql://postgres:demo@demo-db:5432/northwind

The public demo intentionally runs without AI.

Project links

GitHub: https://github.com/beh74/pgassistant-community
Documentation: https://beh74.github.io/pgassistant-blog/
Docker image: https://hub.docker.com/r/bertrand73/pgassistant

Feedback welcome

The project is still evolving and many parts can certainly be improved.

If you work with PostgreSQL and have ideas, feedback, or criticisms, feel free to open an issue or discussion on GitHub.

Thanks for reading.

pgAssistant 2.8 — Deterministic PostgreSQL Analysis with the new Global Advisor

bertrand HARTWIG — Fri, 08 May 2026 06:01:15 +0000

For the past few months, I have been working on a simple idea around PostgreSQL tooling:

before using AI, start with deterministic analysis.

This is the direction behind pgAssistant 2.8.

This release introduces a new component called Global Advisor, alongside many improvements around ranking, schema analysis, maintenance diagnostics, and index recommendations.

The project remains open-source and focused on practical PostgreSQL analysis.

What is pgAssistant?

pgAssistant is an open-source PostgreSQL analysis tool.

It helps developers:

inspect database structures
analyze execution plans
detect schema and maintenance issues
review indexes and foreign keys
understand PostgreSQL behavior more easily

The project combines:

deterministic analysis
execution-plan analysis (EXPLAIN ANALYZE)
optional AI-assisted reasoning

The goal is not to replace PostgreSQL expertise.
The goal is simply to make PostgreSQL diagnostics more accessible and more contextual.

The main addition in 2.8: Global Advisor

Before pgAssistant 2.8, most checks existed independently.

Now they are consolidated into a single entry point:

Global Advisor

The Global Advisor performs a database-wide deterministic analysis and aggregates findings into a unified recommendation list.

Each recommendation now includes:

a rank
a confidence score
an estimated impact
an estimated implementation effort
a suggested SQL statement when relevant

The objective is not to claim certainty.

It is to help prioritize investigations.
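The following hypothetical Python sketch shows how such fields can drive a deterministic ordering. The field names, the 1 to 3 impact and effort scale, and the score formula (impact times confidence, divided by effort) are assumptions made for this example, not pgAssistant's actual ranking. Ties are broken by title, so the same findings always produce the same ranking.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical 1..3 scale for impact and effort; not pgAssistant's real values.
LEVEL = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Recommendation:
    title: str
    confidence: float        # 0.0 to 1.0
    impact: str              # "low" | "medium" | "high"
    effort: str              # "low" | "medium" | "high"
    sql: Optional[str] = None
    rank: int = 0


def score(r: Recommendation) -> float:
    """Hypothetical priority: impact weighted by confidence, per unit of effort."""
    return LEVEL[r.impact] * r.confidence / LEVEL[r.effort]


def rank_recommendations(recs: list[Recommendation]) -> list[Recommendation]:
    """Sort by descending score; ties broken by title so the order is reproducible."""
    ordered = sorted(recs, key=lambda r: (-score(r), r.title))
    for position, r in enumerate(ordered, start=1):
        r.rank = position
    return ordered


if __name__ == "__main__":
    findings = [
        Recommendation("Unused index idx_x", 0.5, "medium", "low"),
        Recommendation("Table without primary key", 0.8, "high", "medium"),
        Recommendation("Missing index on foreign key", 0.9, "high", "low",
                       sql="CREATE INDEX ON orders (employee_id);"),
    ]
    for r in rank_recommendations(findings):
        print(r.rank, r.title)
```
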

Deterministic first

One important design choice in pgAssistant is that the Global Advisor is intentionally deterministic.

The analysis is based directly on PostgreSQL catalogs and statistics:

pg_stat_user_tables
pg_stat_user_indexes
pg_constraint
pg_index
pg_settings
pg_stats
execution plans

This means:

same input → same output
no hallucinations
explainable findings
reproducible analysis

AI is still supported as an optional layer.

Examples of checks now included

The Global Advisor currently includes checks such as:

missing indexes on foreign keys
redundant or duplicate indexes
unused indexes
invalid indexes
datatype inconsistencies on foreign keys
tables without primary keys
stale statistics
tables never vacuumed
estimated table bloat
excessive index-to-table ratio
low foreign key coverage
PostgreSQL configuration checks
sequences approaching exhaustion

Most recommendations also include suggested SQL.
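Two of these checks can rest on the same idea: whether one column list is a leading prefix of another. The Python sketch below shows that logic on plain column lists. In practice these lists might be read from pg_constraint.conkey and pg_index.indkey after mapping attribute numbers to names. It is a simplified illustration with hypothetical index names. A real check must also compare the following before calling an index redundant:

- uniqueness
- partial-index predicates
- expressions
- operator classes
- access method

```python
def is_leading_prefix(prefix: list[str], columns: list[str]) -> bool:
    """True if `prefix` matches the first len(prefix) columns of `columns`, in order."""
    return len(prefix) <= len(columns) and list(columns[:len(prefix)]) == list(prefix)


def fk_has_supporting_index(fk_columns: list[str], indexes: dict[str, list[str]]) -> bool:
    """An FK lookup can use an index whose leading columns are the FK columns (in any order)."""
    n = len(fk_columns)
    return any(len(cols) >= n and set(cols[:n]) == set(fk_columns) for cols in indexes.values())


def redundant_indexes(indexes: dict[str, list[str]]) -> list[tuple[str, str]]:
    """(candidate, covered_by) pairs where the candidate's columns are a leading
    prefix of another index's columns. Identical lists are reported once."""
    pairs = []
    for a, cols_a in sorted(indexes.items()):
        for b, cols_b in sorted(indexes.items()):
            if a == b or not is_leading_prefix(cols_a, cols_b):
                continue
            # Strictly shorter, or an exact duplicate reported under the later name.
            if len(cols_a) < len(cols_b) or a > b:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    orders_indexes = {
        "orders_pkey": ["order_id"],
        "idx_customer": ["customer_id"],
        "idx_customer_employee": ["customer_id", "employee_id"],
    }
    print(fk_has_supporting_index(["customer_id"], orders_indexes))  # True
    print(fk_has_supporting_index(["employee_id"], orders_indexes))  # False
    print(redundant_indexes(orders_indexes))
```
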

Query analysis is still there

The query advisor based on real EXPLAIN ANALYZE plans remains a core part of pgAssistant.

The idea is now:

Global Advisor → broad database analysis
Query Advisor → detailed query-level investigation

These two approaches complement each other.

About AI

AI support remains optional.

pgAssistant can work entirely without an LLM.

When enabled, AI features receive contextual PostgreSQL information:

schema definitions
indexes
execution plans
statistics
database settings

This significantly improves the relevance of generated suggestions compared to generic SQL prompting.

Supported providers currently include:

Ollama
OpenAI-compatible APIs

Why I built it this way

A lot of PostgreSQL tooling focuses on metrics dashboards.

Those tools are useful, but I often felt there was still a gap between:

seeing a metric
understanding the cause
deciding what to change

pgAssistant tries to reduce that gap.

The project is still evolving, but the Global Advisor is an important step toward a more coherent analysis workflow.

Live demo

A public demo is available here:

https://ov-004f8b.infomaniak.ch/

Demo connection:

postgresql://postgres:demo@demo-db:5432/northwind

The public demo intentionally runs without AI.

Project links

GitHub: https://github.com/beh74/pgassistant-community
Documentation: https://beh74.github.io/pgassistant-blog/
Docker image: https://hub.docker.com/r/bertrand73/pgassistant

Feedback welcome

The project is still evolving and many parts can certainly be improved.

If you work with PostgreSQL and have ideas, feedback, or criticisms, feel free to open an issue or discussion on GitHub.

Thanks for reading.

pgAssistant v1.7 released

bertrand HARTWIG — Sat, 01 Feb 2025 13:09:19 +0000

I'm excited to share that we just released pgAssistant v1.7.

PGAssistant is an open-source tool designed to help developers gain deeper insights into their PostgreSQL databases and optimize performance efficiently.

It analyzes database behavior, detects schema-related issues, and provides actionable recommendations to resolve them.

One of the goals of PGAssistant is to help developers optimize their database and fix potential issues on their own before needing to seek assistance from a DBA.

🚀 AI-Powered Optimization: PGAssistant leverages AI-driven language models like ChatGPT, Claude, and on-premise solutions such as Ollama to assist developers in refining complex queries and enhancing database efficiency.

🔗 GitHub Repository: PGAssistant

🚀 Easy Deployment with Docker: PGAssistant is Docker-based, making it simple to run. Get started effortlessly using the provided Docker Compose file.

I’d love to hear your feedback! If you find PGAssistant useful, feel free to contribute or suggest new features. Let’s make PostgreSQL databases easy for dev teams!

pgAssistant - PostgreSQL tool 4 dev

bertrand HARTWIG — Mon, 12 Aug 2024 05:00:03 +0000

I wrote this small tool because I noticed that our young developers were struggling to understand the behavior of their PostgreSQL databases, and therefore had even more difficulty optimizing them.

Gradually, developers started using it. The tool helped the development teams become much more autonomous, and I was asked for help much less often.

Feel free to use it and give me your feedback.

I try to enhance the application whenever time allows.

You can find it here: https://github.com/nexsol-technologies/pgassistant
